Hi,
When I broadcast a HashMap, it runs much slower than broadcasting the same
data as an array.
The job hangs at "SparkContext: Created broadcast 0" for a few seconds (about
30 s), while an array does not.
The broadcast dataset is about 1 GB.
best!
huanglr
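
One possible contributor to the slowdown described above, offered only as an assumption since the thread does not confirm the cause, is serialization cost: Spark serializes a broadcast value before shipping it, and a HashMap of boxed integers serializes to far more bytes than a primitive array holding the same numbers. The sketch below uses plain Java serialization outside Spark to compare the two. The class name `BroadcastSizeSketch`, the helper methods, and the sample data are all hypothetical and are not taken from the original job.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

public class BroadcastSizeSketch {

    // Serialize a value with plain Java serialization and return the byte count.
    public static int serializedSize(Object value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    // Hypothetical data: n key/value pairs held in a HashMap (boxed Integers).
    public static Map<Integer, Integer> buildMap(int n) {
        Map<Integer, Integer> map = new HashMap<>();
        for (int i = 0; i < n; i++) {
            map.put(i, i * 2);
        }
        return map;
    }

    // The same pairs flattened into one primitive array: [k0, v0, k1, v1, ...].
    public static int[] buildArray(int n) {
        int[] flat = new int[2 * n];
        for (int i = 0; i < n; i++) {
            flat[2 * i] = i;
            flat[2 * i + 1] = i * 2;
        }
        return flat;
    }

    public static void main(String[] args) {
        int n = 100_000;
        System.out.println("HashMap bytes: " + serializedSize(buildMap(n)));
        System.out.println("int[] bytes:   " + serializedSize(buildArray(n)));
    }
}
```

If the map turns out to be much larger once serialized, flattening it into arrays before broadcasting, or trying a different serializer, may be worth measuring on the real 1 GB dataset. This only compares serialized sizes and does not reproduce the 30 s pause itself.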
But Spark broadcasts them 48 times (if I understand correctly).
Is there a way to broadcast just one copy to each node and share it among all
tasks running on that node?
Much appreciated!
best!
huanglr
have 48 partitions
corresponding to 48 tasks (or closures), where each task gets a broadcast value
(I see this from the memory usage and the API doc). Is there a way to share the
value across all 48 partitions and their 48 tasks?
best!
huanglr
From: Ashic Mahtab
Date: 2015-07-10 17:02
To: huanglr