Hi Will,

When you call collect(), the items you are collecting need to fit in memory on the driver. Is it possible your driver program does not have enough memory?
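If driver memory is the limit, a minimal sketch of raising it at launch time (assuming the application is started with spark-submit; the class name and jar are placeholders, and the value should exceed the size of the collected map):

    # Give the driver enough heap to hold the collected HashMap.
    # spark.driver.maxResultSize (available from Spark 1.2) caps the total
    # size of results pulled back by collect(); 0 disables the cap.
    spark-submit \
      --driver-memory 32g \
      --conf spark.driver.maxResultSize=0 \
      --class com.example.MyApp myapp.jar

Note that spark.driver.memory has to be set before the driver JVM starts, which is why it goes on the spark-submit command line rather than in SparkConf.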
- Patrick

On Wed, Dec 24, 2014 at 9:34 PM, Will Yang <[email protected]> wrote:
> Hi all,
> In my case, I have a huge HashMap[(Int, Long), (Double, Double,
> Double)], say several GB to tens of GB. After each iteration, I need to
> collect() this HashMap and perform some calculation, and then
> broadcast() it to every node. I now have 20GB per executor, and after
> it performs collect(), it gets stuck at "Added rdd_xx_xx" with no
> further response shown on the Application UI.
>
> I've tried lowering spark.shuffle.memoryFraction and
> spark.storage.memoryFraction, but it seems that it can only handle a
> HashMap of up to about 2GB. What should I optimize for such conditions?
>
> (ps: sorry for my bad English & grammar)
>
> Thanks
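For reference, a minimal, self-contained sketch of the iterate / collect() / broadcast() pattern Will describes (the input data, keys, and update rule are hypothetical placeholders; only the control flow matters). Unpersisting the previous broadcast before the next iteration frees the executor-side copies between rounds:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.SparkContext._ // implicits for reduceByKey (Spark 1.x)
    import scala.collection.mutable

    object IterativeBroadcast {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("iterative-broadcast"))
        val data = sc.parallelize(1L to 1000000L) // placeholder input
        val model = mutable.HashMap.empty[(Int, Long), (Double, Double, Double)]

        for (_ <- 1 to 10) {
          val bc = sc.broadcast(model) // ship the current map to every executor
          val updates = data
            .map { i =>
              val key = ((i % 100).toInt, i % 1000)   // placeholder key
              val prev = bc.value.getOrElse(key, (0.0, 0.0, 0.0))
              (key, (prev._1 + 1.0, prev._2 + i, prev._3)) // placeholder update
            }
            .reduceByKey((a, b) => (a._1 + b._1, a._2 + b._2, a._3 + b._3))
            .collect() // every entry lands in driver memory here
          updates.foreach { case (k, v) => model(k) = v } // driver-side merge
          bc.unpersist() // drop executor copies before the next round
        }
        sc.stop()
      }
    }

The collect() call is the step that requires the whole map to fit on the driver, which is why the driver heap, not executor memory or the shuffle/storage fractions, is the setting to look at first.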
