Hi Alexander,

Thanks a lot for your response. Yes, I am considering the use case in which the weight matrix is too large to fit into the main memory of a single machine.
Can you tell me about ways of dividing the weight matrix? From my investigation so far, we can do this in two ways:

1. Parallelize the weight matrix as an RDD using sc.parallelize and then apply suitable map functions in the forward and backward passes.

2. Represent the weight matrix as a RowMatrix / BlockMatrix and do the calculations on it.

Which of these methods would be more efficient? To make the comparison concrete, I have put rough spark-shell sketches of both options at the bottom of this mail, below the quoted thread. Also, I came across an implementation using Akka in which the network is partitioned layer by layer (http://alexminnaar.com/implementing-the-distbelief-deep-neural-network-training-framework-with-akka.html), which I believe is model parallelism in the true sense. Please suggest any other ways or implementations that could help. I would love to hear your remarks on the above.

Thanks and Regards,

Disha

On Wed, Dec 9, 2015 at 1:29 AM, Ulanov, Alexander <alexander.ula...@hpe.com> wrote:

> Hi Disha,
>
> Which use case do you have in mind that would require model parallelism?
> It should have such a large number of weights that they could not fit
> into the memory of a single machine. For example, multilayer perceptron
> topologies that are used for speech recognition have up to 100M weights.
> Present hardware is capable of accommodating this in main memory. That
> might be a problem for GPUs, but that is a different topic.
>
> The straightforward way of model parallelism for fully connected neural
> networks is to distribute horizontal (or vertical) blocks of the weight
> matrices across several nodes. That means that the input data has to be
> reproduced on all of these nodes. The forward and backward passes will
> require re-assembling the outputs and the errors on each of the nodes
> after each layer, because each node can produce only partial results,
> since it holds only a part of the weights. According to my estimates,
> this is inefficient due to the large intermediate traffic between the
> nodes and should be used only if the model does not fit in the memory of
> a single machine. Another way of model parallelism would be to represent
> the network as a graph and use GraphX to write the forward and back
> propagation. However, this option does not seem very practical to me.
>
> Best regards, Alexander
>
> *From:* Disha Shrivastava [mailto:dishu....@gmail.com]
> *Sent:* Tuesday, December 08, 2015 11:19 AM
> *To:* Ulanov, Alexander
> *Cc:* dev@spark.apache.org
> *Subject:* Re: Data and Model Parallelism in MLPC
>
> Hi Alexander,
>
> Thanks for your response. Can you suggest ways to incorporate model
> parallelism in MLPC? I am trying to do the same in Spark. I got hold of
> your post
> http://apache-spark-developers-list.1001551.n3.nabble.com/Model-parallelism-with-RDD-td13141.html
> where you divided the weight matrix across different worker machines. I
> have two basic questions in this regard:
>
> 1. How can we visualize/analyze and control how the nodes of the neural
> network / the weights are divided across different workers?
>
> 2. Is there an alternate way to achieve model parallelism for MLPC in
> Spark? I believe we need some kind of synchronization and control for
> the updates of the weights shared across different workers during
> backpropagation.
>
> Looking forward to your views on this.
>
> Thanks and Regards,
>
> Disha
>
> On Wed, Dec 9, 2015 at 12:36 AM, Ulanov, Alexander <alexander.ula...@hpe.com> wrote:
>
> Hi Disha,
>
> Multilayer perceptron classifier in Spark implements data parallelism.
> Best regards, Alexander
>
> *From:* Disha Shrivastava [mailto:dishu....@gmail.com]
> *Sent:* Tuesday, December 08, 2015 12:43 AM
> *To:* dev@spark.apache.org; Ulanov, Alexander
> *Subject:* Data and Model Parallelism in MLPC
>
> Hi,
>
> I would like to know whether the implementation of MLPC in the latest
> released version of Spark (1.5.2) implements model parallelism and data
> parallelism as done in the DistBelief model implemented by Google:
> http://static.googleusercontent.com/media/research.google.com/hi//archive/large_deep_networks_nips2012.pdf
>
> Thanks and Regards,
>
> Disha
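Here are the sketches mentioned above. Both are meant to be pasted into spark-shell (so sc is already defined); the sizes, the variable names, and the use of Breeze for the local block math are just illustrative assumptions on my part, not a worked-out implementation.

Option 1, along the lines of the horizontal-block scheme Alexander describes: the weight matrix is split into row blocks with sc.parallelize, the input is broadcast (reproduced) on every node, and the partial outputs are re-assembled after the layer:

import breeze.linalg.{DenseMatrix => BDM, DenseVector => BDV}
import breeze.numerics.sigmoid

// Toy sizes; a real model would have nIn and nOut in the thousands.
val nIn = 4; val nOut = 6; val nBlocks = 3

// Horizontal (row) blocks of W, keyed by block index so the output
// can be re-assembled in the right order.
val wBlocks = sc.parallelize(0 until nBlocks)
  .map(b => (b, BDM.rand(nOut / nBlocks, nIn)))

// The input has to be reproduced on every node that holds a block of W.
val x = sc.broadcast(BDV.rand(nIn))

// Forward pass for one layer: each node computes the activations for its
// own rows of W; the partial results are then re-assembled on the driver.
val parts = wBlocks.mapValues(wb => sigmoid(wb * x.value)).collect().sortBy(_._1)
val activations = BDV.vertcat(parts.map(_._2): _*)

The collect() is exactly the re-assembly step Alexander mentions: the partial activations have to be brought together again before the next layer can start, once per layer in each direction, which is where the intermediate traffic comes from.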
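Option 2 with BlockMatrix: the distributed multiply does the block bookkeeping, and the sigmoid can be applied block-wise so that the activations stay distributed between layers. Again a toy-sized sketch with made-up block sizes:

import org.apache.spark.mllib.linalg.{Matrices, Matrix}
import org.apache.spark.mllib.linalg.distributed.BlockMatrix

val bs = 2 // rows/cols per block; in practice closer to ~1000 to keep blocks coarse

// Toy 4x4 weight matrix W, stored as a 2x2 grid of blocks across the cluster.
val wBlocks = sc.parallelize(for (i <- 0 until 2; j <- 0 until 2) yield
  ((i, j), Matrices.dense(bs, bs, Array.fill(bs * bs)(0.01 * (i + j + 1)))))
val W = new BlockMatrix(wBlocks, bs, bs).cache()

// Input batch X (4 x 2), split along the same boundaries as W's column blocks.
val xBlocks = sc.parallelize(for (i <- 0 until 2) yield
  ((i, 0), Matrices.dense(bs, 2, Array.fill(bs * 2)(1.0)))))
val X = new BlockMatrix(xBlocks, bs, 2)

// Forward pass for one layer: Z = W * X as a distributed multiply, then the
// sigmoid applied to each block in place, without collecting to the driver.
val Z = W.multiply(X)
val A = new BlockMatrix(
  Z.blocks.mapValues(m => Matrices.dense(m.numRows, m.numCols,
    m.toArray.map(v => 1.0 / (1.0 + math.exp(-v))))),
  Z.rowsPerBlock, Z.colsPerBlock)

Even here, multiply() shuffles blocks between nodes on every layer, so the traffic concern above does not go away; it is just handled by MLlib instead of by hand.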