Hi Xiangrui,

Thanks for the explanation, but I'm still missing something. In my experiments, if miniBatchFraction == 1.0, then no matter how the data is partitioned (2, 4, 8, or 16 partitions), the algorithm executes in roughly the same time. (I have 16 Workers.) The reduce in runMiniBatchSGD takes most of the time with 2 partitions; mapPartitionsWithIndex does with 16. I would expect the time to decrease in proportion to the number of data partitions, because each partition should be processed on a separate Worker. Why doesn't the time decrease?
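For reference, my timing setup looks roughly like the sketch below (simplified: the data loading happens elsewhere, LogisticGradient stands in for my back-propagation Gradient, and the step size and iteration count are illustrative):

import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.mllib.optimization.{GradientDescent, LogisticGradient, SimpleUpdater}
import org.apache.spark.rdd.RDD

// Time one full-batch SGD run over a given number of partitions.
def timeRun(data: RDD[(Double, Vector)], numPartitions: Int,
    initialWeights: Vector): Long = {
  val partitioned = data.repartition(numPartitions).cache()
  partitioned.count() // materialize the cache before timing
  val start = System.currentTimeMillis()
  GradientDescent.runMiniBatchSGD(
    partitioned,
    new LogisticGradient(), // my back-propagation Gradient in the real code
    new SimpleUpdater(),
    1.0,  // stepSize
    100,  // numIterations
    0.0,  // regParam
    1.0,  // miniBatchFraction == 1.0, i.e. the full data set
    initialWeights)
  System.currentTimeMillis() - start
}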
Btw, processing of one instance in my algorithm is a heavy computation; this is exactly why I want to parallelize it.

Best regards, Alexander

On 26.08.2014, at 20:54, "Xiangrui Meng" <men...@gmail.com> wrote:

miniBatchFraction uses RDD.sample to get the mini-batch, and sample still needs to visit the elements one after another. So it is not efficient if the task is not computation heavy, and this is why setMiniBatchFraction is marked as experimental. If we can detect that the partition iterator is backed by an ArrayBuffer, maybe we can do a skip iterator to skip elements. -Xiangrui

On Tue, Aug 26, 2014 at 8:15 AM, Ulanov, Alexander <alexander.ula...@hp.com> wrote:

Hi, RJ

https://github.com/avulanov/spark/blob/neuralnetwork/mllib/src/main/scala/org/apache/spark/mllib/classification/NeuralNetwork.scala

Unit tests are in the same branch.

Alexander

From: RJ Nowling [mailto:rnowl...@gmail.com]
Sent: Tuesday, August 26, 2014 6:59 PM
To: Ulanov, Alexander
Cc: dev@spark.apache.org
Subject: Re: Gradient descent and runMiniBatchSGD

Hi Alexander,

Can you post a link to the code?

RJ
--
em rnowl...@gmail.com
c 954.496.2314

On Tue, Aug 26, 2014 at 6:53 AM, Ulanov, Alexander <alexander.ula...@hp.com> wrote:

Hi,

I've implemented a back-propagation algorithm using the Gradient class and a simple update using the Updater class, and I run the algorithm with mllib's GradientDescent class. I am having trouble scaling out this implementation. I thought that if I partitioned my data across the number of workers, performance would increase, because each worker would run a step of gradient descent on its own partition of the data. But this does not happen, and each worker seems to process all the data (if miniBatchFraction == 1.0, as in mllib's logistic regression implementation). This doesn't make sense to me, because then a single Worker would provide the same performance. Could someone elaborate on this and correct me if I am wrong? How can I scale out the algorithm with many Workers?

Best regards, Alexander
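For concreteness, a minimal sketch of the setup described above, assuming Spark 1.x's mllib.optimization API (the BackPropagationGradient class and its placeholder bodies are illustrative, not the actual code from the branch linked earlier):

import org.apache.spark.mllib.linalg.{DenseVector, Vector, Vectors}
import org.apache.spark.mllib.optimization.{Gradient, GradientDescent, SimpleUpdater}
import org.apache.spark.rdd.RDD

// Simplified stand-in for the back-propagation gradient: the real
// compute() would run a forward pass and then back-propagate the error.
class BackPropagationGradient extends Gradient {
  override def compute(data: Vector, label: Double,
      weights: Vector): (Vector, Double) = {
    (Vectors.zeros(weights.size), 0.0) // placeholder (gradient, loss)
  }
  override def compute(data: Vector, label: Double, weights: Vector,
      cumGradient: Vector): Double = {
    val (grad, loss) = compute(data, label, weights)
    // Accumulate into cumGradient (assumes a dense vector for simplicity).
    val cum = cumGradient.asInstanceOf[DenseVector].values
    var i = 0
    while (i < grad.size) { cum(i) += grad(i); i += 1 }
    loss
  }
}

def train(data: RDD[(Double, Vector)], initialWeights: Vector): Vector = {
  val (weights, _) = GradientDescent.runMiniBatchSGD(
    data,
    new BackPropagationGradient(),
    new SimpleUpdater(),
    1.0,  // stepSize
    100,  // numIterations
    0.0,  // regParam
    1.0,  // miniBatchFraction, full batch as in logistic regression
    initialWeights)
  weights
}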