Re: Gradient descent and runMiniBatchSGD

2014-08-26 Thread RJ Nowling
Hi Alexander,

Can you post a link to the code?

RJ


On Tue, Aug 26, 2014 at 6:53 AM, Ulanov, Alexander alexander.ula...@hp.com
wrote:

 Hi,

 I've implemented a back-propagation algorithm using the Gradient class and a
 simple update using the Updater class. Then I run the algorithm with mllib's
 GradientDescent class. I am having trouble scaling out this implementation.
 I thought that if I partitioned my data into as many partitions as there are
 workers, then performance would increase, because each worker would run a
 step of gradient descent on its partition of the data. But this does not
 happen, and each worker seems to process all of the data (if
 miniBatchFraction == 1.0, as in mllib's logistic regression implementation).
 This doesn't make sense to me, because then a single Worker would give the
 same performance. Could someone elaborate on this and correct me if I am
 wrong? How can I scale out the algorithm with many Workers?
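 For reference, here is a minimal sketch of how a custom Gradient plugs into
 GradientDescent. This is not the actual implementation (the real code is
 linked later in the thread); it assumes the Spark 1.x mllib.optimization API,
 and the class name BackPropGradient and the hyperparameter values are
 placeholders only:

  import org.apache.spark.mllib.linalg.{Vector, Vectors}
  import org.apache.spark.mllib.optimization.{Gradient, GradientDescent, SimpleUpdater}
  import org.apache.spark.rdd.RDD

  // Placeholder gradient: the forward pass and back-propagation go here.
  class BackPropGradient extends Gradient {
    override def compute(data: Vector, label: Double, weights: Vector): (Vector, Double) = {
      val cumGradient = Vectors.dense(new Array[Double](weights.size))
      val loss = compute(data, label, weights, cumGradient)
      (cumGradient, loss)
    }

    override def compute(data: Vector, label: Double, weights: Vector,
        cumGradient: Vector): Double = {
      // ... run the forward pass, back-propagate the error, add this
      // example's gradient into cumGradient, and return its loss ...
      0.0
    }
  }

  def train(data: RDD[(Double, Vector)], initialWeights: Vector): Vector = {
    val (weights, _) = GradientDescent.runMiniBatchSGD(
      data,
      new BackPropGradient,
      new SimpleUpdater,   // plain gradient step, no regularization
      0.1,                 // stepSize
      100,                 // numIterations
      0.0,                 // regParam
      1.0,                 // miniBatchFraction
      initialWeights)
    weights
  }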

 Best regards, Alexander




-- 
em rnowl...@gmail.com
c 954.496.2314


Re: Gradient descent and runMiniBatchSGD

2014-08-26 Thread RJ Nowling
Xiangrui,

I posted a note on my JIRA for MiniBatch KMeans about the same problem --
sampling running in O(n).

Can you elaborate on ways to get more efficient sampling?  I think this
will be important for a variety of stochastic algorithms.

RJ


On Tue, Aug 26, 2014 at 12:54 PM, Xiangrui Meng men...@gmail.com wrote:

 miniBatchFraction uses RDD.sample to get the mini-batch, and sample
 still needs to visit the elements one after another. So it is not
 efficient if the task is not computation heavy, which is why
 setMiniBatchFraction is marked as experimental. If we can detect that
 the partition iterator is backed by an ArrayBuffer, maybe we can use a
 skip iterator to skip over elements. -Xiangrui
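
 To make the skip-iterator idea concrete, here is a rough standalone sketch.
 The names are hypothetical and this is not Spark code; it assumes plain
 Bernoulli sampling over an in-memory, indexed partition, jumping between
 sampled elements with geometrically distributed gaps:

  import scala.collection.mutable.ArrayBuffer
  import scala.util.Random

  // Skips directly between sampled elements instead of testing each one:
  // for Bernoulli sampling with probability `fraction`, the gap between
  // consecutive sampled elements is geometrically distributed.
  class SkipSampleIterator[T](data: ArrayBuffer[T], fraction: Double, seed: Long)
    extends Iterator[T] {

    require(fraction > 0.0 && fraction <= 1.0)
    private val rng = new Random(seed)
    private var pos = nextGap() - 1   // index of the next sampled element

    // Draw a gap ~ Geometric(fraction) via inverse transform sampling.
    private def nextGap(): Int =
      math.ceil(math.log(rng.nextDouble()) / math.log(1.0 - fraction)).toInt.max(1)

    override def hasNext: Boolean = pos < data.length

    override def next(): T = {
      val elem = data(pos)
      pos += nextGap()
      elem
    }
  }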

 On Tue, Aug 26, 2014 at 8:15 AM, Ulanov, Alexander
 alexander.ula...@hp.com wrote:
  Hi, RJ
 
 
 https://github.com/avulanov/spark/blob/neuralnetwork/mllib/src/main/scala/org/apache/spark/mllib/classification/NeuralNetwork.scala
 
  Unit tests are in the same branch.
 
  Alexander
 
  From: RJ Nowling [mailto:rnowl...@gmail.com]
  Sent: Tuesday, August 26, 2014 6:59 PM
  To: Ulanov, Alexander
  Cc: dev@spark.apache.org
  Subject: Re: Gradient descent and runMiniBatchSGD
 
  Hi Alexander,
 
  Can you post a link to the code?
 
  RJ
 
  On Tue, Aug 26, 2014 at 6:53 AM, Ulanov, Alexander 
 alexander.ula...@hp.com wrote:
  Hi,
 
  I've implemented a back-propagation algorithm using the Gradient class and a
 simple update using the Updater class. Then I run the algorithm with mllib's
 GradientDescent class. I am having trouble scaling out this implementation.
 I thought that if I partitioned my data into as many partitions as there are
 workers, then performance would increase, because each worker would run a
 step of gradient descent on its partition of the data. But this does not
 happen, and each worker seems to process all of the data (if
 miniBatchFraction == 1.0, as in mllib's logistic regression implementation).
 This doesn't make sense to me, because then a single Worker would give the
 same performance. Could someone elaborate on this and correct me if I am
 wrong? How can I scale out the algorithm with many Workers?
 
  Best regards, Alexander
 
 
 
  --
  em rnowl...@gmail.com
  c 954.496.2314




-- 
em rnowl...@gmail.com
c 954.496.2314


Re: Gradient descent and runMiniBatchSGD

2014-08-26 Thread RJ Nowling
Also, another idea: many algorithms that use sampling tend to do so multiple
times.  It may be beneficial to allow transforming the data into a
representation that is more efficient for multiple rounds of sampling.
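
For example (purely a hypothetical sketch, not an existing Spark API): glom
each partition into an array once and cache it, so that every later sampling
round only pays for the elements it actually draws rather than a full scan:

  import scala.reflect.ClassTag
  import scala.util.Random
  import org.apache.spark.rdd.RDD

  // Materialize each partition as an array once and cache it, so later
  // sampling rounds can index into it instead of re-scanning the partition.
  def toSampleFriendly[T: ClassTag](data: RDD[T]): RDD[Array[T]] =
    data.glom().cache()

  // One sampling round (with replacement, for simplicity): draw roughly
  // `fraction` of each partition by random index, so the per-round cost is
  // proportional to the sample size rather than the partition size.
  def sampleRound[T: ClassTag](arrays: RDD[Array[T]], fraction: Double, seed: Long): RDD[T] =
    arrays.mapPartitionsWithIndex { (idx, iter) =>
      val rng = new Random(seed + idx)
      iter.flatMap { part =>
        if (part.isEmpty) Iterator.empty
        else {
          val k = math.max(1, (fraction * part.length).toInt)
          Iterator.fill(k)(part(rng.nextInt(part.length)))
        }
      }
    }

Whether the one-time conversion pays off would depend on how many sampling
rounds reuse the cached arrays.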


On Tue, Aug 26, 2014 at 4:36 PM, RJ Nowling rnowl...@gmail.com wrote:

 Xiangrui,

 I posted a note on my JIRA for MiniBatch KMeans about the same problem --
 sampling running in O(n).

 Can you elaborate on ways to get more efficient sampling?  I think this
 will be important for a variety of stochastic algorithms.

 RJ


 On Tue, Aug 26, 2014 at 12:54 PM, Xiangrui Meng men...@gmail.com wrote:

 miniBatchFraction uses RDD.sample to get the mini-batch, and sample
 still needs to visit the elements one after another. So it is not
 efficient if the task is not computation heavy, which is why
 setMiniBatchFraction is marked as experimental. If we can detect that
 the partition iterator is backed by an ArrayBuffer, maybe we can use a
 skip iterator to skip over elements. -Xiangrui

 On Tue, Aug 26, 2014 at 8:15 AM, Ulanov, Alexander
 alexander.ula...@hp.com wrote:
  Hi, RJ
 
 
 https://github.com/avulanov/spark/blob/neuralnetwork/mllib/src/main/scala/org/apache/spark/mllib/classification/NeuralNetwork.scala
 
  Unit tests are in the same branch.
 
  Alexander
 
  From: RJ Nowling [mailto:rnowl...@gmail.com]
  Sent: Tuesday, August 26, 2014 6:59 PM
  To: Ulanov, Alexander
  Cc: dev@spark.apache.org
  Subject: Re: Gradient descent and runMiniBatchSGD
 
  Hi Alexander,
 
  Can you post a link to the code?
 
  RJ
 
  On Tue, Aug 26, 2014 at 6:53 AM, Ulanov, Alexander 
 alexander.ula...@hp.com wrote:
  Hi,
 
  I've implemented a back-propagation algorithm using the Gradient class and a
 simple update using the Updater class. Then I run the algorithm with mllib's
 GradientDescent class. I am having trouble scaling out this implementation.
 I thought that if I partitioned my data into as many partitions as there are
 workers, then performance would increase, because each worker would run a
 step of gradient descent on its partition of the data. But this does not
 happen, and each worker seems to process all of the data (if
 miniBatchFraction == 1.0, as in mllib's logistic regression implementation).
 This doesn't make sense to me, because then a single Worker would give the
 same performance. Could someone elaborate on this and correct me if I am
 wrong? How can I scale out the algorithm with many Workers?
 
  Best regards, Alexander
 
 
 
  --
  em rnowl...@gmail.com
  c 954.496.2314




 --
 em rnowl...@gmail.com
 c 954.496.2314




-- 
em rnowl...@gmail.com
c 954.496.2314


Re: Gradient descent and runMiniBatchSGD

2014-08-26 Thread Ulanov, Alexander
Hi Xiangrui,

Thanks for the explanation, but I'm still missing something. In my experiments,
if miniBatchFraction == 1.0, then no matter how the data is partitioned (2, 4,
8, 16 partitions), the algorithm executes in more or less the same time. (I
have 16 Workers.) The reduce from runMiniBatchSGD takes most of the time for 2
partitions, and mapPartitionsWithIndex takes most of it for 16. What I would
expect is that the time decreases in proportion to the number of data
partitions, because each partition will hopefully be processed on a separate
Worker. Why does the time not decrease?

Btw, processing one instance in my algorithm is a heavy computation, which is
exactly why I want to parallelize it.
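
For context, a heavily simplified paraphrase (not the actual MLlib source) of
what one runMiniBatchSGD iteration does: the sampling and per-example gradient
work run in parallel across the partitions, while combining the partial
gradient sums and taking the step happen once per iteration on the driver.

  import org.apache.spark.mllib.linalg.{Vector, Vectors}
  import org.apache.spark.mllib.optimization.{Gradient, Updater}
  import org.apache.spark.rdd.RDD

  // One runMiniBatchSGD iteration, heavily simplified (the real code also
  // tracks the loss and is more careful about closures and numerics):
  // each partition computes gradients over its share of the sampled data in
  // parallel, the partial sums are combined in a reduce, and the driver
  // takes a single step with the Updater.
  def oneIteration(
      data: RDD[(Double, Vector)],
      gradient: Gradient,
      updater: Updater,
      weights: Vector,
      stepSize: Double,
      iter: Int,
      miniBatchFraction: Double): Vector = {
    val sampled = data.sample(false, miniBatchFraction, 42 + iter)
    val (gradSum, count) = sampled
      .map { case (label, features) =>
        (gradient.compute(features, label, weights)._1.toArray, 1L)
      }
      .reduce { case ((g1, n1), (g2, n2)) =>
        (g1.zip(g2).map { case (a, b) => a + b }, n1 + n2)
      }
    val avgGradient = Vectors.dense(gradSum.map(_ / count))
    updater.compute(weights, avgGradient, stepSize, iter, 0.0)._1
  }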

Best regards, Alexander

On 26.08.2014, at 20:54, Xiangrui Meng men...@gmail.com wrote:

miniBatchFraction uses RDD.sample to get the mini-batch, and sample
still needs to visit the elements one after another. So it is not
efficient if the task is not computation heavy, which is why
setMiniBatchFraction is marked as experimental. If we can detect that
the partition iterator is backed by an ArrayBuffer, maybe we can use a
skip iterator to skip over elements. -Xiangrui

On Tue, Aug 26, 2014 at 8:15 AM, Ulanov, Alexander
alexander.ula...@hp.com wrote:
Hi, RJ

https://github.com/avulanov/spark/blob/neuralnetwork/mllib/src/main/scala/org/apache/spark/mllib/classification/NeuralNetwork.scala

Unit tests are in the same branch.

Alexander

From: RJ Nowling [mailto:rnowl...@gmail.com]
Sent: Tuesday, August 26, 2014 6:59 PM
To: Ulanov, Alexander
Cc: dev@spark.apache.org
Subject: Re: Gradient descent and runMiniBatchSGD

Hi Alexander,

Can you post a link to the code?

RJ

On Tue, Aug 26, 2014 at 6:53 AM, Ulanov, Alexander 
alexander.ula...@hp.com wrote:
Hi,

I've implemented a back-propagation algorithm using the Gradient class and a simple
update using the Updater class. Then I run the algorithm with mllib's GradientDescent
class. I am having trouble scaling out this implementation. I thought that if I
partitioned my data into as many partitions as there are workers, then performance
would increase, because each worker would run a step of gradient descent on its
partition of the data. But this does not happen, and each worker seems to process all
of the data (if miniBatchFraction == 1.0, as in mllib's logistic regression
implementation). This doesn't make sense to me, because then a single Worker would
give the same performance. Could someone elaborate on this and correct me if I am
wrong? How can I scale out the algorithm with many Workers?

Best regards, Alexander



--
em rnowl...@gmail.com
c 954.496.2314
