What is your scenario? Do you need to use broadcast heavily? If not, it would be
better to focus on other parts of the pipeline.
At this time, there is no better method than TorrentBroadcast. Although the
blocks are fetched one by one, as soon as a node has the data, it can
immediately act as a data source for other nodes.
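To see why "every node becomes a source" matters, here is a toy model, not Spark's actual scheduling. It assumes each holder of the payload serves exactly one new node per round. With a single source, the number of rounds grows linearly with the number of workers. When every node that already has the data also serves it, the number of holders doubles each round, so the rounds grow roughly as log2 of the cluster size. The function names are hypothetical.

```python
def rounds_single_source(num_workers: int) -> int:
    # Toy model: only the driver serves the payload, one worker per round.
    return num_workers


def rounds_peer_to_peer(num_workers: int) -> int:
    # Toy model: every node that already holds the payload (driver included)
    # serves one new node per round, so the number of holders doubles.
    holders = 1  # start with just the driver
    rounds = 0
    while holders < num_workers + 1:
        holders *= 2
        rounds += 1
    return rounds


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, rounds_single_source(n), rounds_peer_to_peer(n))
```

For 100 workers this model gives 100 rounds from a single source versus 7 rounds peer-to-peer. Real transfers are split into blocks and overlap, so treat this only as an intuition for the scaling.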
Guys,
I have a question regarding the Spark 1.1 broadcast implementation.
In our pipeline, we have a large multi-class LR model, which is about 1 GiB
in size.
To take advantage of Spark's parallelism, a natural idea is to
broadcast this model file to the worker nodes.
However, it seems that
In the current implementation of TorrentBroadcast, the blocks are fetched one by one in a single thread, so it cannot fully utilize the network bandwidth.
Davies
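The point about single-threaded fetching can be illustrated with a small sketch. It is not Spark code. `fetch_block` is a hypothetical stand-in that sleeps to mimic network latency, and the 4 MiB block size used in the arithmetic is an assumption (the real value depends on `spark.broadcast.blockSize` and your Spark version). It contrasts fetching blocks one at a time with keeping several fetches in flight through a thread pool.

```python
import time
from concurrent.futures import ThreadPoolExecutor

MIB = 1024 ** 2
GIB = 1024 ** 3


def num_blocks(total_bytes: int, block_bytes: int) -> int:
    # Ceiling division: the last block may be partial.
    return -(-total_bytes // block_bytes)


def fetch_block(block_id: int, latency_s: float = 0.01) -> bytes:
    # Hypothetical remote fetch: sleep to mimic latency, return a tiny payload.
    time.sleep(latency_s)
    return block_id.to_bytes(4, "big")


def fetch_sequential(block_ids):
    # One block at a time in a single thread.
    return [fetch_block(b) for b in block_ids]


def fetch_parallel(block_ids, max_workers: int = 8):
    # Several fetches in flight at once; map() preserves input order,
    # so the blocks can be reassembled directly.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch_block, block_ids))


if __name__ == "__main__":
    print(num_blocks(GIB, 4 * MIB))  # blocks for a 1 GiB model at an assumed 4 MiB
    ids = range(32)
    for fetch in (fetch_sequential, fetch_parallel):
        start = time.perf_counter()
        fetch(ids)
        print(fetch.__name__, round(time.perf_counter() - start, 3))
```

With latency-bound fetches, the parallel version finishes in roughly 1/`max_workers` of the sequential time. In practice the gain is capped by bandwidth and by how many peers can serve blocks concurrently.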
On Fri, Jan 9, 2015 at 2:11 AM, Jun Yang yangjun...@gmail.com wrote:
Guys,
I have a question regarding the Spark 1.1 broadcast
You can try the following:
- Increase spark.akka.frameSize (default is 10 MB)
- Try using TorrentBroadcast
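As a sketch of how those two suggestions might be expressed as configuration, the snippet below renders them as `spark-defaults.conf` lines (one whitespace-separated key/value pair per line). The frame size of 128 MB is an example value, not a recommendation. Selecting TorrentBroadcast through `spark.broadcast.factory` applies to Spark 1.x, where HTTP broadcast was still an option; check the documentation for your exact version.

```python
# Example settings for Spark 1.x; the values are illustrative only.
SETTINGS = {
    # Maximum Akka message size, in MB (Spark 1.x default: 10).
    "spark.akka.frameSize": "128",
    # Use the torrent-based broadcast implementation.
    "spark.broadcast.factory": "org.apache.spark.broadcast.TorrentBroadcastFactory",
}


def to_spark_defaults(settings: dict) -> str:
    # spark-defaults.conf: one "key value" pair per line, whitespace-separated.
    return "\n".join(f"{k} {v}" for k, v in sorted(settings.items())) + "\n"


if __name__ == "__main__":
    print(to_spark_defaults(SETTINGS), end="")
```

The same keys can also be set programmatically on a `SparkConf` before the `SparkContext` is created.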
Thanks
Best Regards
On Fri, Jan 9, 2015 at 3:41 PM, Jun Yang yangjun...@gmail.com wrote:
Guys,
I have a question regarding the Spark 1.1 broadcast implementation.
In our pipeline, we