I'd like to override the logic of comparing keys for equality in
groupByKey. Kinda like how combineByKey allows you to pass in the combining
logic for values, I'd like to do the same for keys.
My code looks like this:
val res = rdd.groupBy(myPartitioner)
Here, rdd is of type RDD[(MyKey,
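As far as I know, groupByKey has no pluggable key comparator: keys are grouped by their own hashCode/equals (and the default HashPartitioner hashes the key the same way). So one common workaround is to bake the equality you want into the key class itself. A sketch, with a hypothetical MyKey whose equality ignores a `label` field (names are illustrative, not from the thread):

```scala
// Hypothetical key class: two MyKeys compare equal when their
// normalized ids match, regardless of the `label` field.
class MyKey(val id: String, val label: String) extends Serializable {
  private def norm: String = id.toLowerCase

  override def hashCode: Int = norm.hashCode

  override def equals(other: Any): Boolean = other match {
    case k: MyKey => k.id.toLowerCase == norm
    case _        => false
  }
}

// groupByKey then groups by this equality, since both the hash
// partitioner and the per-partition grouping use hashCode/equals.
```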
I don't think there is a setup() or cleanup() in Spark, but you can usually
achieve the same with mapPartitions: put the setup code at the top of the
mapPartitions closure and the cleanup code at the end.
The reason why this usually works is that in Hadoop map/reduce, each map
task runs over an input split.
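A minimal sketch of that pattern (openConnection and process are hypothetical stand-ins for whatever per-partition setup and work you need, e.g. a database client):

```scala
val results = rdd.mapPartitions { iter =>
  val conn = openConnection()   // setup: runs once per partition

  // Caution: iterators are lazy, so the connection must not be closed
  // before the iterator is consumed. One simple (if eager) approach is
  // to materialize the partition first:
  val processed = iter.map(rec => process(conn, rec)).toList

  conn.close()                  // cleanup: runs once per partition
  processed.iterator
}
```

If partitions are too large to materialize, the cleanup instead has to be deferred until the iterator is exhausted.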
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:679)
Thanks,
Ameet
On Wed, Apr 9, 2014 at 10:48 PM, Ameet Kini ameetk...@gmail.com wrote:
val hrdd
A typo - I meant section 2.1.2.5 ulimit and nproc of
https://hbase.apache.org/book.html
Ameet
On Fri, Apr 11, 2014 at 10:32 AM, Ameet Kini ameetk...@gmail.com wrote:
Turns out that my ulimit settings were too low. I bumped them up and the job
now completes successfully. Here's what I have now
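For anyone hitting the same thing, here is one way to check and raise those limits; the values below are illustrative only, not the settings from this thread:

```shell
# Check the current limits for the shell that launches the Spark workers.
ulimit -n   # max open file descriptors ("nofile")
ulimit -u   # max user processes ("nproc")

# Illustrative values: raise them persistently in /etc/security/limits.conf
# (then log in again for the change to take effect), e.g.:
#   sparkuser  soft  nofile  32768
#   sparkuser  hard  nofile  32768
#   sparkuser  soft  nproc   32768
#   sparkuser  hard  nproc   32768
```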
://www.sigmoidanalytics.com
@mayur_rustagi https://twitter.com/mayur_rustagi
On Thu, Mar 20, 2014 at 3:20 PM, Ameet Kini ameetk...@gmail.com wrote:
val rdd2 = rdd.partitionBy(myPartitioner).reduceByKey(someFunction)
I see that rdd2's partitions are not internally sorted. Can someone
confirm
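For what it's worth, reduceByKey makes no intra-partition ordering guarantee, so unsorted partitions are expected. If sorted partitions are needed, one option is to sort each partition explicitly afterwards. A sketch, assuming an implicit Ordering on the key type:

```scala
// Sort each partition of rdd2 by key. preservesPartitioning keeps the
// existing partitioner metadata, since the keys themselves are unchanged.
val sorted = rdd2.mapPartitions(
  iter => iter.toSeq.sortBy(_._1).iterator,
  preservesPartitioning = true
)
```

This materializes each partition in memory to sort it, so it is only suitable when individual partitions fit on one executor.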