customized comparator in groupByKey

2014-05-06 Thread Ameet Kini
I'd like to override the logic of comparing keys for equality in groupByKey. Kind of like how combineByKey allows you to pass in the combining logic for values, I'd like to do the same for keys. My code looks like this: val res = rdd.groupBy(myPartitioner) Here, rdd is of type RDD[(MyKey, …
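A minimal plain-Scala sketch of one common workaround for the question above: Spark's groupByKey relies on the key type's equals/hashCode (a Partitioner only picks which partition a key lands in, it does not redefine equality), so instead of passing a comparator one can map each key to a canonical form before grouping. MyKey and its fields here are hypothetical stand-ins, and the snippet uses Seq#groupBy rather than an RDD so it runs without a Spark dependency.

```scala
// Hypothetical key type; Spark's groupByKey would use its equals/hashCode.
case class MyKey(id: Int, tag: String)

val pairs = Seq(MyKey(1, "a") -> 10, MyKey(1, "b") -> 20, MyKey(2, "a") -> 30)

// Treat two keys as "equal" when their ids match, ignoring tag,
// by projecting each key to the canonical form (here: just the id):
val grouped: Map[Int, Seq[Int]] =
  pairs.groupBy { case (k, _) => k.id }
       .map { case (id, kvs) => id -> kvs.map(_._2) }

// grouped(1) == Seq(10, 20); grouped(2) == Seq(30)
```

On an actual RDD the same projection would be `rdd.map { case (k, v) => (k.id, v) }.groupByKey()`; the partitioner argument shown in the email controls placement, not key equality.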

Re: question on setup() and cleanup() methods for map() and reduce()

2014-04-28 Thread Ameet Kini
I don't think there is a setup() or cleanup() in Spark, but you can usually achieve the same effect using mapPartitions, with the setup code at the top of the mapPartitions closure and the cleanup code at the end. This usually works because, in Hadoop map/reduce, each map task runs over an input split.
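A sketch of the setup/cleanup-per-partition pattern described above, shown on a plain Scala Iterator so it runs standalone; in Spark this function would be the argument to rdd.mapPartitions. The buffer is a stand-in for a real resource such as a database connection (which is an assumption, not something from the thread).

```scala
// One partition's worth of records comes in as an Iterator.
def processPartition(records: Iterator[String]): Iterator[Int] = {
  // setup: runs once per partition (stand-in for e.g. opening a connection)
  val resource = new scala.collection.mutable.ArrayBuffer[String]()

  // Force evaluation before cleanup: the iterators handed to
  // mapPartitions are lazy, so cleaning up before the iterator is
  // consumed would close the resource while it is still needed.
  val results = records.map { r => resource += r; r.length }.toList

  // cleanup: runs once per partition, after all records are processed
  resource.clear()
  results.iterator
}

val out = processPartition(Iterator("spark", "rdd")).toList
// out == List(5, 3)
```

Materializing with toList trades memory for safety; for large partitions a streaming variant that closes the resource when the iterator is exhausted is the usual refinement.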

Re: shuffle memory requirements

2014-04-11 Thread Ameet Kini
…at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:679)
Thanks, Ameet
On Wed, Apr 9, 2014 at 10:48 PM, Ameet Kini ameetk...@gmail.com wrote: val hrdd …

Re: shuffle memory requirements

2014-04-11 Thread Ameet Kini
A typo - I meant section 2.1.2.5 (ulimit and nproc) of https://hbase.apache.org/book.html Ameet On Fri, Apr 11, 2014 at 10:32 AM, Ameet Kini ameetk...@gmail.com wrote: Turns out that my ulimit settings were too low. I bumped them up and the job successfully completes. Here's what I have now …
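Since the fix in this thread was raising ulimit, here is a short shell sketch of checking and raising the limits the thread refers to. The value 4096 is illustrative only, not a recommendation from the thread; pick something under your hard limit (`ulimit -Hn`), and note that a permanent change goes in /etc/security/limits.conf rather than the shell.

```shell
# Inspect the per-process limits that shuffle-heavy Spark jobs can hit
# (each shuffle spill is an open file; executors spawn many threads):
ulimit -n        # max open file descriptors
ulimit -u        # max user processes/threads (the "nproc" limit)

# Raise the soft open-file limit for the current shell before launching
# Spark (4096 is illustrative; it must not exceed the hard limit):
ulimit -S -n 4096
```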

Re: sort order after reduceByKey / groupByKey

2014-03-20 Thread Ameet Kini
On Thu, Mar 20, 2014 at 3:20 PM, Ameet Kini ameetk...@gmail.com wrote: val rdd2 = rdd.partitionBy(my partitioner).reduceByKey(some function) I see that rdd2's partitions are not internally sorted. Can someone confirm …
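A plain-Scala illustration of the observation above, runnable without Spark: reducing by key with a hash-based structure gives no ordering guarantee, so sorting within a partition has to be an explicit extra step (in Spark that would be e.g. sortByKey, or repartitionAndSortWithinPartitions in releases after these emails). The data below is made up for the example.

```scala
val pairs = Seq("b" -> 1, "a" -> 2, "b" -> 3, "c" -> 4)

// Hash-based reduce-by-key: the key order of the result is unspecified,
// just as an RDD partition after reduceByKey is not internally sorted.
val reduced: Map[String, Int] =
  pairs.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2).sum }

// Sorting by key is a separate, explicit step:
val sorted: List[(String, Int)] = reduced.toList.sortBy(_._1)
// sorted == List(("a", 2), ("b", 4), ("c", 4))
```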