Hi all, I have a question about the broadcast implementation in Spark 1.1.
Our pipeline uses a large multi-class LR model, about 1 GiB in size. To take advantage of Spark's parallelism, the natural approach is to broadcast this model file to the worker nodes. However, broadcast performance does not look very good. While the model file was being broadcast, I monitored the network card throughput on the worker nodes: their recv/write throughput was only around 30 to 40 MiB/s (our servers are equipped with 100 MiB/s Ethernet cards).

Is this a real limitation of the Spark 1.1 broadcast implementation? Or are there configuration settings or tricks that can make Spark broadcast perform better?

Thanks
--
yangjun...@gmail.com
http://hi.baidu.com/yjpro