Re: Spark reduce serialization question

2016-03-06 Thread Holden Karau
You might want to try treeAggregate.

On Sunday, March 6, 2016, Takeshi Yamamuro wrote:
> Hi,
> I'm not exactly sure what your code looks like, but ISTM this is correct
> behaviour. If the size of the data that the driver fetches exceeds the
> limit, the driver throws this
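A minimal sketch of the treeAggregate suggestion, assuming a spark-shell session where `sc` is the SparkContext; the Long sum here stands in for a real model merge:

    // treeAggregate merges partial results through intermediate tasks
    // instead of shipping every partition's result straight to the driver,
    // so the driver never fetches all partial results in one step.
    val data = sc.parallelize(1 to 1000000, 16)
    val total = data.treeAggregate(0L)(
      (acc, x) => acc + x, // seqOp: fold one record into the partial result
      (a, b) => a + b,     // combOp: merge two partial results
      depth = 2            // more tree levels => smaller fetch per step
    )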

Re: Spark reduce serialization question

2016-03-06 Thread Takeshi Yamamuro
Hi, I'm not exactly sure what your code looks like, but ISTM this is correct behaviour: if the size of the data that the driver fetches exceeds the limit, the driver throws this exception. (See
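The limit being referred to is presumably spark.driver.maxResultSize, which caps the total size of serialized task results fetched back to the driver (1g by default). A hedged example of raising it, if changing the configuration rather than the algorithm is acceptable:

    import org.apache.spark.SparkConf

    // Raise the cap on serialized results the driver may fetch.
    // Setting it to "0" disables the check entirely, at the risk of
    // an out-of-memory error on the driver.
    val conf = new SparkConf()
      .setAppName("kmeans")
      .set("spark.driver.maxResultSize", "4g")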

Spark reduce serialization question

2016-03-04 Thread James Jia
I'm running a distributed KMeans algorithm with 4 executors. I have an RDD[Data]. I use mapPartitions to run a learner on each data partition, then call reduce with my custom model-merging function to combine the partial models and start a new iteration. The model size is around 330 MB. I
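A minimal sketch of the pattern described above; Data, Model, and the training logic are hypothetical placeholders, not James's actual code:

    import org.apache.spark.rdd.RDD

    case class Data(features: Array[Double])
    case class Model(weights: Array[Double]) {
      // Hypothetical merge: average the weights of two partial models.
      def merge(other: Model): Model =
        Model(weights.zip(other.weights).map { case (a, b) => (a + b) / 2 })
    }

    def trainPartition(it: Iterator[Data]): Iterator[Model] =
      Iterator(Model(new Array[Double](10))) // placeholder for the real learner

    def iterate(data: RDD[Data]): Model =
      data.mapPartitions(trainPartition) // one partial model per partition
          .reduce(_ merge _)             // every partition's result is fetched
                                         // to the driver before the final merge

With reduce, each partition's locally reduced result is sent to the driver, so four partitions at roughly 330 MB each would total about 1.3 GB, which is consistent with tripping a driver-side fetch limit of around 1 GB.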