Guys,

I have a question about the broadcast implementation in Spark 1.1.

In our pipeline, we have a large multi-class LR model, which is about
1 GiB in size.
To take advantage of Spark's parallelism, the natural approach is to
broadcast this model file to the worker nodes.

However, broadcast performance does not look very good.

While the model file was being broadcast, I monitored the network card
throughput on the worker nodes. Their recv/write throughput was only
around 30-40 MiB/s (our servers are equipped with 100 MiB/s Ethernet
cards).
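
To put those numbers in perspective, here is a rough back-of-the-envelope
sketch in Python of how long one copy of the model takes to reach a single
worker. It assumes the figures above are MiB per second and ignores
serialization and protocol overhead; the function name is only for
illustration.

```python
# Rough transfer-time estimate for one copy of the model to one worker.
# Assumptions: size in MiB, rates in MiB/s, constant rate, no overhead.

MODEL_SIZE_MIB = 1024  # model of about 1 GiB


def transfer_seconds(size_mib, rate_mib_per_s):
    """Return the seconds needed to move size_mib at a constant rate."""
    return size_mib / rate_mib_per_s


if __name__ == "__main__":
    rates = [("observed low", 30), ("observed high", 40), ("card rating", 100)]
    for label, rate in rates:
        seconds = transfer_seconds(MODEL_SIZE_MIB, rate)
        print(f"{label:>13}: {seconds:.1f} s")
```

Under these assumptions, each worker waits roughly 26 to 34 seconds for
the model, versus about 10 seconds if the card were saturated.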

Is this a real limitation of the Spark 1.1 broadcast implementation? Or
are there configuration settings or tricks
that can help make Spark broadcast perform better?

Thanks



-- 
yangjun...@gmail.com
http://hi.baidu.com/yjpro
