Hi Alexander,

Thanks a lot for your response. Yes, I am considering the use case in which the weight matrix is too large to fit into the main memory of a single machine.
Can you tell me about ways of dividing the weight matrix? From my investigation so far, we can do this in two ways:

1. Parallelize the weight matrix as an RDD using sc.parallelize and then apply suitable map functions in the forward and backward passes.

2. Represent the weight matrix as a RowMatrix / BlockMatrix and do the calculations on it.

Which of these methods would be more efficient? To make the comparison concrete, I have put rough spark-shell sketches of both options at the bottom of this mail, below the quoted thread. Also, I came across an implementation using Akka in which the network is partitioned layer by layer (http://alexminnaar.com/implementing-the-distbelief-deep-neural-network-training-framework-with-akka.html), which I believe is model parallelism in the true sense. Please suggest any other ways or implementations that could help. I would love to hear your remarks on the above.

Thanks and Regards,

Disha

On Wed, Dec 9, 2015 at 1:29 AM, Ulanov, Alexander <alexander.ula...@hpe.com> wrote:

> Hi Disha,
>
> Which use case do you have in mind that would require model parallelism?
> It should have such a large number of weights that they could not fit
> into the memory of a single machine. For example, multilayer perceptron
> topologies that are used for speech recognition have up to 100M weights.
> Present hardware is capable of accommodating this in main memory. That
> might be a problem for GPUs, but that is a different topic.
>
> The straightforward way of model parallelism for fully connected neural
> networks is to distribute horizontal (or vertical) blocks of the weight
> matrices across several nodes. That means that the input data has to be
> reproduced on all of these nodes. The forward and backward passes will
> require re-assembling the outputs and the errors on each of the nodes
> after each layer, because each node can produce only partial results,
> since it holds only a part of the weights. According to my estimates,
> this is inefficient due to the large intermediate traffic between the
> nodes and should be used only if the model does not fit in the memory of
> a single machine. Another way of model parallelism would be to represent
> the network as a graph and use GraphX to write the forward and back
> propagation. However, this option does not seem very practical to me.
>
> Best regards, Alexander
>
> *From:* Disha Shrivastava [mailto:dishu....@gmail.com]
> *Sent:* Tuesday, December 08, 2015 11:19 AM
> *To:* Ulanov, Alexander
> *Cc:* dev@spark.apache.org
> *Subject:* Re: Data and Model Parallelism in MLPC
>
> Hi Alexander,
>
> Thanks for your response. Can you suggest ways to incorporate model
> parallelism in MLPC? I am trying to do the same in Spark. I got hold of
> your post
> http://apache-spark-developers-list.1001551.n3.nabble.com/Model-parallelism-with-RDD-td13141.html
> where you divided the weight matrix across different worker machines. I
> have two basic questions in this regard:
>
> 1. How can we visualize/analyze and control how the nodes of the neural
> network / the weights are divided across different workers?
>
> 2. Is there an alternate way to achieve model parallelism for MLPC in
> Spark? I believe we need some kind of synchronization and control for
> the updates of the weights shared across different workers during
> backpropagation.
>
> Looking forward to your views on this.
>
> Thanks and Regards,
>
> Disha
>
> On Wed, Dec 9, 2015 at 12:36 AM, Ulanov, Alexander <alexander.ula...@hpe.com> wrote:
>
> Hi Disha,
>
> Multilayer perceptron classifier in Spark implements data parallelism.
> Best regards, Alexander
>
> *From:* Disha Shrivastava [mailto:dishu....@gmail.com]
> *Sent:* Tuesday, December 08, 2015 12:43 AM
> *To:* dev@spark.apache.org; Ulanov, Alexander
> *Subject:* Data and Model Parallelism in MLPC
>
> Hi,
>
> I would like to know whether the implementation of MLPC in the latest
> released version of Spark (1.5.2) implements model parallelism and data
> parallelism as done in the DistBelief model implemented by Google:
> http://static.googleusercontent.com/media/research.google.com/hi//archive/large_deep_networks_nips2012.pdf
>
> Thanks and Regards,
>
> Disha
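Here are the sketches mentioned above. Both are meant to be pasted into spark-shell (so sc is already defined); the sizes, the variable names, and the use of Breeze for the local block math are just illustrative assumptions on my part, not a worked-out implementation.

Option 1, along the lines of the horizontal-block scheme Alexander describes: the weight matrix is split into row blocks with sc.parallelize, the input is broadcast (reproduced) on every node, and the partial outputs are re-assembled after the layer:

import breeze.linalg.{DenseMatrix => BDM, DenseVector => BDV}
import breeze.numerics.sigmoid

// Toy sizes; a real model would have nIn and nOut in the thousands.
val nIn = 4; val nOut = 6; val nBlocks = 3

// Horizontal (row) blocks of W, keyed by block index so the output
// can be re-assembled in the right order.
val wBlocks = sc.parallelize(0 until nBlocks)
  .map(b => (b, BDM.rand(nOut / nBlocks, nIn)))

// The input has to be reproduced on every node that holds a block of W.
val x = sc.broadcast(BDV.rand(nIn))

// Forward pass for one layer: each node computes the activations for its
// own rows of W; the partial results are then re-assembled on the driver.
val parts = wBlocks.mapValues(wb => sigmoid(wb * x.value)).collect().sortBy(_._1)
val activations = BDV.vertcat(parts.map(_._2): _*)

The collect() is exactly the re-assembly step Alexander mentions: the partial activations have to be brought together again before the next layer can start, once per layer in each direction, which is where the intermediate traffic comes from.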
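Option 2 with BlockMatrix: the distributed multiply does the block bookkeeping, and the sigmoid can be applied block-wise so that the activations stay distributed between layers. Again a toy-sized sketch with made-up block sizes:

import org.apache.spark.mllib.linalg.{Matrices, Matrix}
import org.apache.spark.mllib.linalg.distributed.BlockMatrix

val bs = 2 // rows/cols per block; in practice closer to ~1000 to keep blocks coarse

// Toy 4x4 weight matrix W, stored as a 2x2 grid of blocks across the cluster.
val wBlocks = sc.parallelize(for (i <- 0 until 2; j <- 0 until 2) yield
  ((i, j), Matrices.dense(bs, bs, Array.fill(bs * bs)(0.01 * (i + j + 1)))))
val W = new BlockMatrix(wBlocks, bs, bs).cache()

// Input batch X (4 x 2), split along the same boundaries as W's column blocks.
val xBlocks = sc.parallelize(for (i <- 0 until 2) yield
  ((i, 0), Matrices.dense(bs, 2, Array.fill(bs * 2)(1.0)))))
val X = new BlockMatrix(xBlocks, bs, 2)

// Forward pass for one layer: Z = W * X as a distributed multiply, then the
// sigmoid applied to each block in place, without collecting to the driver.
val Z = W.multiply(X)
val A = new BlockMatrix(
  Z.blocks.mapValues(m => Matrices.dense(m.numRows, m.numCols,
    m.toArray.map(v => 1.0 / (1.0 + math.exp(-v))))),
  Z.rowsPerBlock, Z.colsPerBlock)

Even here, multiply() shuffles blocks between nodes on every layer, so the traffic concern above does not go away; it is just handled by MLlib instead of by hand.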