Pyspark not using all cores

2015-03-10 Thread htailor
Hi All, I need some help with a problem in pyspark which is causing a major issue. Recently I've noticed that the behaviour of the python.daemons on the worker nodes for compute-intensive tasks has changed from using all the available cores to using only a single core. On each worker node, 8

Broadcast failure with variable size of ~ 500mb with key already cancelled ?

2014-10-24 Thread htailor
Hi All, I am relatively new to Spark and am currently having trouble broadcasting large variables of ~500 MB in size. The broadcast fails with an error shown below and the memory usage on the hosts also blows up. Our hardware consists of 8 hosts (1 x 64 GB (driver) and 7 x 32 GB (workers)) and we