Hi,

I've implemented the backpropagation algorithm using MLlib's Gradient class and a simple update rule using the Updater class, and I run it through MLlib's GradientDescent class. I am having trouble scaling this implementation out. My assumption was that if I partition the data across the number of workers, performance would improve, because each worker would run a gradient descent step on its own partition of the data. That is not what I observe: each worker appears to process the entire dataset (when miniBatchFraction == 1.0, as in MLlib's logistic regression implementation). This does not make sense to me, because then a single worker would deliver the same performance as many. Could someone elaborate on this and correct me if I am wrong? How can I scale the algorithm out across many workers?
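For concreteness, here is a minimal sketch of the wiring I mean, using the GradientDescent.runMiniBatchSGD entry point from org.apache.spark.mllib.optimization. The backpropGradient parameter stands in for my Gradient subclass, and numWorkers and the literal hyperparameters are placeholders, not my exact values:

import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.mllib.optimization.{Gradient, GradientDescent, SimpleUpdater}
import org.apache.spark.rdd.RDD

// Train with mini-batch SGD; `backpropGradient` is my custom Gradient
// (hypothetical name), SimpleUpdater is MLlib's plain, unregularized Updater.
def train(data: RDD[(Double, Vector)],
          backpropGradient: Gradient,
          initialWeights: Vector,
          numWorkers: Int): Vector = {
  // One partition per worker, in the hope that each worker
  // only computes gradients over its own slice of the data.
  val partitioned = data.repartition(numWorkers)
  val (weights, _) = GradientDescent.runMiniBatchSGD(
    partitioned,
    backpropGradient,
    new SimpleUpdater,
    0.1,  // stepSize
    100,  // numIterations
    0.0,  // regParam
    1.0,  // miniBatchFraction == 1.0, as in MLlib's logistic regression
    initialWeights)
  weights
}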
Best regards, Alexander