Re: Broadcast failure with variable size of ~ 500mb with key already cancelled ?

2014-11-11 Thread Tom Seddon
Hi, Just wondering if anyone has any advice about this issue, as I am experiencing the same thing. I'm working with multiple broadcast variables in PySpark, most of which are small, but one of around 4.5GB, using 10 workers with 31GB of memory each and a driver with the same spec. It's not running out of
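(A workaround not discussed in the thread, sketched here as a hypothetical: when a single broadcast variable is far over the 2 GB limit, one option is to split it into several smaller broadcasts and route lookups by key hash. The `sc`/`big_lookup` names below are illustrative only; the split itself is plain Python and runs locally.)

```python
def split_dict(d, n_parts):
    """Partition a dict into n_parts smaller dicts by hashing the keys,
    so each piece can be broadcast separately under the 2 GB block limit."""
    parts = [{} for _ in range(n_parts)]
    for k, v in d.items():
        parts[hash(k) % n_parts][k] = v
    return parts

# On the driver (sketch only, assumes a live SparkContext `sc`):
#   pieces = [sc.broadcast(p) for p in split_dict(big_lookup, 4)]
# On a worker, find the right piece by the same hash:
#   value = pieces[hash(k) % 4].value.get(k)

# Local demonstration that the split preserves every entry:
parts = split_dict({i: i * i for i in range(100)}, 4)
print(sum(len(p) for p in parts))  # → 100
```

The trade-off is one extra hash per lookup in exchange for keeping every serialized block under the limit.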

Re: Broadcast failure with variable size of ~ 500mb with key already cancelled ?

2014-11-11 Thread Davies Liu
There is an open PR [1] to support broadcasts larger than 2G, could you try it? [1] https://github.com/apache/spark/pull/2659 On Tue, Nov 11, 2014 at 6:39 AM, Tom Seddon mr.tom.sed...@gmail.com wrote: Hi, Just wondering if anyone has any advice about this issue, as I am experiencing the same
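(Not from the thread, but as background to the 2G figure above: PySpark ships broadcast variables to workers in pickled form, and Spark versions of this era capped a block at roughly `Integer.MAX_VALUE` bytes. A hedged sketch of a driver-side pre-check, measuring the pickled size before attempting the broadcast; `check_broadcastable` is a hypothetical helper, not a Spark API:)

```python
import pickle

TWO_GB = 2 * 1024 ** 3  # approximate historical per-block limit

def serialized_size(obj):
    """Pickled size of obj in bytes (roughly what gets shipped to workers)."""
    return len(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL))

def check_broadcastable(obj):
    """Return the serialized size, or raise if it would exceed the 2 GB limit."""
    size = serialized_size(obj)
    if size >= TWO_GB:
        raise ValueError(
            f"serialized size {size} bytes exceeds the ~2 GB block limit; "
            "split the variable or use a Spark build with large-broadcast support"
        )
    return size

# Small example: a lookup table that is safely under the limit
lookup = {i: i * i for i in range(1000)}
print(check_broadcastable(lookup))
```

Failing fast on the driver like this is cheaper than waiting for the broadcast to blow up on the workers.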

Broadcast failure with variable size of ~ 500mb with key already cancelled ?

2014-10-24 Thread htailor
Hi All, I am relatively new to Spark and currently having trouble with broadcasting large variables ~500mb in size. The broadcast fails with the error shown below, and memory usage on the hosts also blows up. Our hardware consists of 8 hosts (1 x 64gb (driver) and 7 x 32gb (workers)) and we