How to share memory in a broadcast between tasks in the same executor?

2015-09-22 Thread Clément Frison
Hello, my team and I have a 32-core machine and we would like to use a huge object, for example a large dictionary, in a map transformation, using all our cores in parallel by sharing this object among tasks. We broadcast our large dictionary: dico_br = sc.broadcast(dico). We use it in

Re: How to share memory in a broadcast between tasks in the same executor?

2015-09-22 Thread Utkarsh Sengar
If the broadcast variable doesn't fit in memory, I think broadcast is not the right fit for you. You can think about representing it as an RDD of tuples alongside the other data you are working on. Say you are working on an RDD (rdd in your case); run a map/reduce to convert it to a pair RDD, so now

Re: How to share memory in a broadcast between tasks in the same executor?

2015-09-22 Thread Deenar Toraskar
Clement, in local mode all worker threads run in the driver JVM. Your dictionary should not be copied 32 times; in fact it won't be broadcast at all. Have you tried increasing spark.driver.memory to ensure that the driver uses all the memory on the machine? Deenar On 22 September 2015 at 19:42,
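Raising driver memory for local mode, as Deenar suggests, has to happen before the driver JVM starts, so it is typically passed on the command line rather than set in code (the 24g value and app.py name are illustrative):

```shell
# Run locally on 32 cores with a larger driver heap;
# --driver-memory must be set at launch, not via SparkConf at runtime.
spark-submit --master "local[32]" --driver-memory 24g app.py
```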