Hi Will,

When you call collect(), the data being collected must fit in memory on
the driver. Is it possible your driver program does not have enough
memory?
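
As a rough sketch (values and class names are just placeholders, not from
your setup): the driver heap is usually sized via spark-submit or
spark-defaults.conf, since spark.driver.memory set on an already-running
driver JVM is ignored.

    # size the driver to hold the collected map plus JVM overhead
    spark-submit --driver-memory 32g --class your.MainClass your-app.jar

    // or, equivalently, when the config is read before the driver JVM starts:
    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("big-collect")
      .set("spark.driver.memory", "32g") // only honored at driver launch time
    val sc = new SparkContext(conf)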

- Patrick

On Wed, Dec 24, 2014 at 9:34 PM, Will Yang <era.ye...@gmail.com> wrote:
> Hi all,
> In my case, I have a huge HashMap[(Int, Long), (Double, Double,
> Double)], say several GB to tens of GB. After each iteration, I need to
> collect() this HashMap, perform some calculation, and then broadcast()
> it to every node. I currently give each executor 20GB, and after it
> performs collect(), the job gets stuck at "Added rdd_xx_xx" with no
> further response shown in the Application UI.
>
> I've tried lowering spark.shuffle.memoryFraction and
> spark.storage.memoryFraction, but it seems this can only handle a
> HashMap of up to about 2GB. What should I optimize in such a situation?
>
> (ps: sorry for my bad English & Grammar)
>
>
> Thanks
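
For reference, a minimal sketch of the collect-then-broadcast iteration
described above (the method name and variable names are illustrative; the
key/value types follow the HashMap mentioned in the thread). The point is
that collectAsMap() materialises the whole dataset on the driver, which is
where the memory pressure comes from:

    import org.apache.spark.SparkContext
    import org.apache.spark.SparkContext._   // pair-RDD implicits on Spark 1.x
    import org.apache.spark.rdd.RDD

    def iterate(sc: SparkContext,
                data: RDD[((Int, Long), (Double, Double, Double))]): Unit = {
      // collectAsMap() pulls the entire keyed dataset onto the driver,
      // so the driver heap must be able to hold it all at once
      val table = data.collectAsMap()

      // ... driver-side calculation over `table` goes here ...

      // ship the (possibly updated) table back to every executor
      val bcTable = sc.broadcast(table)

      // unpersist()/destroy() the previous round's broadcast when it is no
      // longer needed, or stale copies accumulate on the executors
    }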

