Re: Is It Feasible for Spark 1.1 Broadcast to Fully Utilize the Ethernet Card Throughput?
What is your use case? Do you need to broadcast a lot? If not, it would be better to focus on other things. At this time, there is no better method than TorrentBroadcast. Although the blocks are fetched one by one, once a node has the data it can immediately act as a data source for other nodes.
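The point about a node serving data as soon as it has it can be illustrated with a rough, idealized model. This is only a sketch, not how TorrentBroadcast is implemented: it assumes whole-object transfers, that every holder (the driver included) serves exactly one new worker per round, and the function names are hypothetical.

```python
def rounds_with_peer_sources(num_workers: int) -> int:
    """Rounds until all workers hold the data when every holder
    (driver included) serves one new worker per round (idealized)."""
    holders = 1  # only the driver holds the data at the start
    rounds = 0
    while holders < num_workers + 1:
        holders *= 2  # each current holder feeds one new worker
        rounds += 1
    return rounds


def rounds_driver_only(num_workers: int) -> int:
    """Rounds needed if only the driver can serve, one worker per round."""
    return num_workers


if __name__ == "__main__":
    for n in (1, 3, 100):
        print(n, rounds_with_peer_sources(n), rounds_driver_only(n))
```

Under these assumptions the number of rounds grows logarithmically with the number of workers rather than linearly, which is why peer-to-peer distribution scales better than having every worker fetch from the driver.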
Re: Is It Feasible for Spark 1.1 Broadcast to Fully Utilize the Ethernet Card Throughput?
In the current implementation of TorrentBroadcast, the blocks are fetched one by one in a single thread, so it cannot fully utilize the network bandwidth.

Davies

On Fri, Jan 9, 2015 at 2:11 AM, Jun Yang wrote:
> Is this the real limitation of Spark 1.1 broadcast implementation? Or there
> may be some configuration or tricks
> that can help make Spark broadcast perform better.
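To make the single-threaded, block-by-block explanation concrete, here is a minimal back-of-the-envelope model. It is a sketch under stated assumptions: the 4 MiB block size and the 0.06 s per-block overhead (request round trip, deserialization, and so on) are hypothetical values picked for illustration, not measurements from this thread, and the function name is made up.

```python
def effective_throughput(block_mib: float,
                         link_mib_per_s: float,
                         per_block_overhead_s: float) -> float:
    """Throughput (MiB/s) when blocks are fetched strictly one after another.

    Each block costs its wire time plus a fixed overhead during which
    the link sits idle, because no other fetch is in flight.
    """
    wire_time_s = block_mib / link_mib_per_s
    return block_mib / (wire_time_s + per_block_overhead_s)


if __name__ == "__main__":
    # Hypothetical inputs: 4 MiB blocks, a 100 MiB/s link.
    for overhead in (0.0, 0.02, 0.06):
        rate = effective_throughput(4.0, 100.0, overhead)
        print(f"overhead={overhead:.2f}s -> {rate:.1f} MiB/s")
```

In this model any fixed per-block cost lowers the achieved rate below the link rate, and fetching several blocks concurrently would hide that idle time. Whether that accounts for the numbers reported in this thread is not something the model can confirm.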
Re: Is It Feasible for Spark 1.1 Broadcast to Fully Utilize the Ethernet Card Throughput?
You can try the following:

- Increase spark.akka.frameSize (default is 10 MB)
- Try using TorrentBroadcast

Thanks
Best Regards

On Fri, Jan 9, 2015 at 3:41 PM, Jun Yang wrote:
> Is this the real limitation of Spark 1.1 broadcast implementation? Or
> there may be some configuration or tricks
> that can help make Spark broadcast perform better.
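The two suggestions in this reply could be expressed as entries in conf/spark-defaults.conf, roughly as below. This is a sketch assuming Spark 1.x property names; the frame size of 64 is an arbitrary illustrative value, and the block size line is an extra knob not mentioned in the reply. Check the configuration page for your exact Spark version before relying on any of these.

```
# conf/spark-defaults.conf (assumed Spark 1.x property names)

# Maximum Akka message size in MB; 64 is an illustrative value only.
spark.akka.frameSize       64

# Select the torrent-style broadcast implementation.
spark.broadcast.factory    org.apache.spark.broadcast.TorrentBroadcastFactory

# Size of each broadcast block in KB (assumption: not suggested in the thread).
spark.broadcast.blockSize  4096
```

The same properties can also be passed per job with `spark-submit --conf key=value`.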
Is It Feasible for Spark 1.1 Broadcast to Fully Utilize the Ethernet Card Throughput?
Guys,

I have a question regarding the Spark 1.1 broadcast implementation.

In our pipeline, we have a large multi-class LR model, about 1 GiB in size. To take advantage of Spark's parallelism, a natural approach is to broadcast this model file to the worker nodes.

However, broadcast performance does not look very good. While the model file was being broadcast, I monitored the network card throughput on the worker nodes: their recv/write throughput was only around 30~40 MiB/s (our servers are equipped with 100 MiB/s Ethernet cards).

Is this a real limitation of the Spark 1.1 broadcast implementation? Or are there configuration settings or tricks that can help Spark broadcast perform better?

Thanks

--
yangjun...@gmail.com
http://hi.baidu.com/yjpro
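For a sense of scale, the figures in this message can be turned into per-worker transfer times. This is a rough sketch that assumes the quoted numbers are rates in MiB per second and ignores serialization and protocol overhead; the names are hypothetical.

```python
MODEL_SIZE_MIB = 1024.0  # the roughly 1 GiB model from the message


def transfer_seconds(size_mib: float, rate_mib_per_s: float) -> float:
    """Time to move size_mib at a constant rate, ignoring all overhead."""
    return size_mib / rate_mib_per_s


if __name__ == "__main__":
    # 30 and 40 are the observed rates; 100 is the stated card capacity.
    for rate in (30.0, 40.0, 100.0):
        secs = transfer_seconds(MODEL_SIZE_MIB, rate)
        print(f"{rate:5.0f} MiB/s -> {secs:5.1f} s")
```

At the observed rates a single worker needs roughly half a minute to receive the model, versus about ten seconds if the card were saturated.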