Re: Broadcast failure with variable size of ~ 500mb with "key already cancelled ?"

2014-11-11 Thread Davies Liu
There is an open PR [1] to support broadcast variables larger than 2 GB; could you try it?

[1] https://github.com/apache/spark/pull/2659

On Tue, Nov 11, 2014 at 6:39 AM, Tom Seddon wrote:
> Hi,
>
> Just wondering if anyone has any advice about this issue, as I am
> experiencing the same thing. I'm working w
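For readers wondering why 2 GB is the boundary: JVM byte arrays and ByteBuffers are indexed by a signed 32-bit int, so a single serialized blob tops out at 2**31 - 1 bytes. One general way around that, assumed here for illustration only (this is not the code from the PR above), is to pickle the object and split the bytes into chunks that each stay under the limit, then rejoin them before unpickling. The helper names `split_pickled` and `join_pickled` are hypothetical.

```python
import pickle

# Assumed per-chunk ceiling: Integer.MAX_VALUE bytes, the largest
# length a single JVM byte array can have (the ~2 GB limit).
MAX_CHUNK = 2**31 - 1


def split_pickled(obj, chunk_size=MAX_CHUNK):
    """Hypothetical helper: pickle obj and return byte chunks of at most chunk_size bytes."""
    data = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def join_pickled(chunks):
    """Hypothetical helper: concatenate chunks from split_pickled and unpickle them."""
    return pickle.loads(b"".join(chunks))


if __name__ == "__main__":
    # Use a tiny chunk size so a small object still spans several chunks.
    payload = {"weights": list(range(1000))}
    chunks = split_pickled(payload, chunk_size=1024)
    print(len(chunks), "chunks")
    assert all(len(c) <= 1024 for c in chunks)
    assert join_pickled(chunks) == payload
```

Note that slicing a multi-GB `bytes` object copies each chunk, so a real implementation would likely stream chunks to disk or use `memoryview` instead.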

Re: Broadcast failure with variable size of ~ 500mb with "key already cancelled ?"

2014-11-11 Thread Tom Seddon
Hi,

Just wondering if anyone has any advice about this issue, as I am experiencing the same thing. I'm working with multiple broadcast variables in PySpark; most of them are small, but one is around 4.5 GB. I'm using 10 workers with 31 GB of memory each, and a driver with the same spec. It's not running out of m