Re: Data and Model Parallelism in MLPC

2015-12-30 Thread Disha Shrivastava
Hi,

I went through the code for the implementation of MLPC and couldn't understand
why stacking/unstacking of the input data is done. The description says:
"Block size for stacking input data in matrices to speed up the computation.
Data is stacked within partitions. If block size is more than remaining data in
a partition then it is adjusted to the size of this data. Recommended size is
between 10 and 1000. Default: 128". I am not quite sure what this means or how
it speeds up the computation.
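To make the quoted description concrete, below is a minimal Breeze sketch of
why stacking helps; it is my own illustration rather than the MLPC source, and
all names and sizes are made up. Packing a block of rows into one matrix turns
many matrix-vector products into a single matrix-matrix product, which native
BLAS executes far more efficiently.

import breeze.linalg.{DenseMatrix, DenseVector}

val in = 784; val out = 160; val blockSize = 128
// Hypothetical inputs from one partition, plus one layer's weight matrix.
val examples = Seq.fill(blockSize)(DenseVector.rand[Double](in))
val weights = DenseMatrix.rand[Double](out, in)

// Unstacked: one matrix-vector product (BLAS level 2) per example.
val perExample = examples.map(x => weights * x)

// Stacked: the same examples packed as the columns of a single matrix, so the
// whole block is processed with one matrix-matrix product (BLAS level 3).
val stacked = DenseMatrix.horzcat(examples.map(_.toDenseMatrix.t): _*)
val blockOutput = weights * stacked // out x blockSize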

Also, I couldn't find exactly how the data parallelism depicted in
http://static.googleusercontent.com/media/research.google.com/hi//archive/large_deep_networks_nips2012.pdf
is incorporated in the existing code. There seems to be no notion of a
parameter server, and the optimization routine is plain L-BFGS rather than
Sandblaster L-BFGS. The only parallelism seems to come from the way the input
data is read and stored.
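For reference, here is a minimal spark-shell sketch of the kind of data
parallelism in question: each partition reduces its own examples to a partial
gradient, and the partials are summed for a single optimizer step on the
driver, with no parameter server. This is my own illustration of the pattern,
not the MLPC source; the data, gradient function and sizes are hypothetical
placeholders.

import breeze.linalg.DenseVector

// Hypothetical (label, features) pairs spread over 4 partitions.
val numWeights = 100
val trainingRdd = sc.parallelize(Seq.fill(1000)((1.0, DenseVector.rand[Double](numWeights))), 4)

// The current weights are broadcast to the executors once per iteration.
val weights = sc.broadcast(DenseVector.zeros[Double](numWeights))

// Hypothetical per-example gradient/loss; stands in for the ANN forward/backward pass.
def gradientAndLoss(w: DenseVector[Double], ex: (Double, DenseVector[Double])) = {
  val err = (w dot ex._2) - ex._1
  (ex._2 * err, 0.5 * err * err)
}

// Each partition folds its examples into one partial (gradient, loss); the
// partials are then summed, so the only cross-node traffic per iteration is
// one gradient vector per partition.
val (gradSum, lossSum) = trainingRdd.treeAggregate((DenseVector.zeros[Double](numWeights), 0.0))(
  seqOp = { case ((g, l), ex) =>
    val (gi, li) = gradientAndLoss(weights.value, ex)
    (g + gi, l + li)
  },
  combOp = { case ((g1, l1), (g2, l2)) => (g1 + g2, l1 + l2) }
)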

Please correct me if I am wrong and clarify these points.

Thanks and Regards,
Disha

On Tue, Dec 29, 2015 at 5:40 PM, Disha Shrivastava <dishu@gmail.com>
wrote:

> Hi Alexander,
>
> Thanks a lot for your response. Yes, I am considering the use case where the
> weight matrix is too large to fit into the main memory of a single machine.
>
> Can you tell me ways of dividing the weight matrix? According to my
> investigations so far, we can do this in two ways:
>
> 1. By parallelizing the weight matrix as an RDD using sc.parallelize and then
> using suitable map functions in the forward and backward passes.
> 2. By using RowMatrix / BlockMatrix to represent the weight matrix and doing
> the calculations on it (see the sketch just below this list).
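As a concrete illustration of the second option, here is a minimal spark-shell
sketch using mllib's distributed BlockMatrix for a single forward-pass
multiply. It is my own sketch, not code from this thread; the sizes, block
size and random data are made up.

import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.linalg.distributed.{BlockMatrix, IndexedRow, IndexedRowMatrix}

// Hypothetical layer: 'in' inputs, 'out' outputs, and a batch of 'batch' examples.
val in = 1000; val out = 400; val batch = 256

// The weight matrix stored as a BlockMatrix of 128 x 128 blocks spread across the cluster.
val weightRows = sc.parallelize(0 until out).map(i =>
  IndexedRow(i.toLong, Vectors.dense(Array.fill(in)(scala.util.Random.nextGaussian()))))
val weights: BlockMatrix = new IndexedRowMatrix(weightRows, out.toLong, in).toBlockMatrix(128, 128)

// The input batch as a BlockMatrix with a matching block size (one column per example).
val inputRows = sc.parallelize(0 until in).map(i =>
  IndexedRow(i.toLong, Vectors.dense(Array.fill(batch)(scala.util.Random.nextGaussian()))))
val inputs: BlockMatrix = new IndexedRowMatrix(inputRows, in.toLong, batch).toBlockMatrix(128, 128)

// One distributed forward-pass multiply: (out x in) * (in x batch) = (out x batch).
val outputs: BlockMatrix = weights.multiply(inputs)
println(s"layer output: ${outputs.numRows()} x ${outputs.numCols()}")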
>
> Which of these methods would be more efficient? Also, I came across an
> implementation using Akka in which layer-by-layer partitioning of the network
> is done (
> http://alexminnaar.com/implementing-the-distbelief-deep-neural-network-training-framework-with-akka.html),
> which I believe is model parallelism in the true sense.
>
> Please suggest any other approaches/implementations that could help. I would
> love to hear your remarks on the above.
>
> Thanks and Regards,
> Disha
>
> On Wed, Dec 9, 2015 at 1:29 AM, Ulanov, Alexander <
> alexander.ula...@hpe.com> wrote:
>
>> Hi Disha,
>>
>>
>>
>> Which use case do you have in mind that would require model parallelism?
>> It would need to have a large number of weights, so that they could not fit
>> into the memory of a single machine. For example, multilayer perceptron
>> topologies that are used for speech recognition have up to 100M weights.
>> Present hardware is capable of accommodating this in main memory. That might
>> be a problem for GPUs, but that is a different topic.
>>
>>
>>
>> The straightforward way of model parallelism for fully connected neural
>> networks is to distribute horizontal (or vertical) blocks of the weight
>> matrices across several nodes. That means that the input data has to be
>> replicated on all these nodes. The forward and the backward passes will
>> require re-assembling the outputs and the errors on each of the nodes after
>> each layer, because each node can produce only partial results, since it
>> holds only a part of the weights. According to my estimates, this is
>> inefficient due to the large intermediate traffic between the nodes and
>> should be used only if the model does not fit in the memory of a single
>> machine. Another way of doing model parallelism would be to represent the
>> network as a graph and use GraphX to write the forward and back propagation.
>> However, this option does not seem very practical to me.
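To make the row-block scheme above concrete, here is a minimal spark-shell
sketch of a single forward pass with the weight matrix split into horizontal
blocks, the input batch broadcast to every node, and the partial outputs
pulled back and re-assembled. It is my own sketch of the idea described in
this message, not production code; all sizes and data are hypothetical.

import breeze.linalg.DenseMatrix

// Hypothetical layer with 'in' inputs and 'out' outputs, split into 4
// horizontal blocks of out/4 rows, one block per partition.
val in = 1000; val out = 400; val batch = 128; val numBlocks = 4
val weightBlocks = sc.parallelize(
  Seq.tabulate(numBlocks)(i => (i, DenseMatrix.rand[Double](out / numBlocks, in))), numBlocks)

// The input batch has to be replicated on every node that holds a block.
val input = sc.broadcast(DenseMatrix.rand[Double](in, batch))

// Each node produces its rows of the layer output; the pieces are then pulled
// back and re-assembled. Doing this after every layer is the intermediate
// traffic the message above warns about.
val partials = weightBlocks.mapValues(w => w * input.value).collect().sortBy(_._1).map(_._2)
val layerOutput = DenseMatrix.vertcat(partials: _*) // out x batch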
>>
>>
>>
>> Best regards, Alexander
>>
>>
>>
>> *From:* Disha Shrivastava [mailto:dishu@gmail.com]
>> *Sent:* Tuesday, December 08, 2015 11:19 AM
>> *To:* Ulanov, Alexander
>> *Cc:* dev@spark.apache.org
>> *Subject:* Re: Data and Model Parallelism in MLPC
>>
>>
>>
>> Hi Alexander,
>>
>> Thanks for your response. Can you suggest ways to incorporate model
>> parallelism in MLPC? I am trying to do the same in Spark. I got hold of
>> your post
>> http://apache-spark-developers-list.1001551.n3.nabble.com/Model-parallelism-with-RDD-td13141.html
>> where you divided the weight matrix across different worker machines. I
>> have two basic questions in this regard:
>>
>> 1. How can I actually visualize/analyze and control how the nodes/weights of
>> the neural network are divided across different workers?
>>
>> 2. Is there any alternate way to achieve model parallelism for MLPC in
>> Spark? I believe we need some kind of synchronization and control for the
>> update of weights shared across different workers during backpropagation.

Partitioning of RDD across worker machines

2015-12-29 Thread Disha Shrivastava
Hi,

Suppose I have a file locally on my master machine, and the same file is also
present at the same path on all the worker machines, say
/home/user_name/Desktop. I wanted to know: when we partition the data using
sc.parallelize, does Spark ship parts of the RDD to all the worker machines,
or does each worker read the corresponding segment locally from its own copy?

How do I avoid movement of this data? Will it help if I store the file in
HDFS?
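A minimal spark-shell sketch of the two approaches being contrasted here; the
file names and partition counts are hypothetical, and this is an illustration
of the general behaviour rather than a quote from any documentation.

// sc.parallelize distributes a driver-side collection: the driver reads the
// local file and ships the partitions to the executors over the network, so
// the copies sitting on the workers' disks are never consulted.
val localLines = scala.io.Source.fromFile("/home/user_name/Desktop/data.txt").getLines().toList
val fromDriver = sc.parallelize(localLines, numSlices = 10)

// Reading from HDFS instead lets Spark schedule data-local tasks: each
// partition is read on (or near) the node that stores the corresponding block,
// which avoids shipping the whole dataset from the driver.
val fromHdfs = sc.textFile("hdfs:///user/user_name/data.txt", minPartitions = 10)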

Thanks and Regards,
Disha


Akka with Spark

2015-12-26 Thread Disha Shrivastava
Hi,

I wanted to know how to use the Akka framework with Spark, starting from the
basics. I saw online that Spark uses Akka internally, but I am not really sure
whether I can define my own actors and use them in Spark.

Also, how do I integrate Akka with Spark? For instance, how will I know how
many Akka actors are running on each of my worker machines, and can I control
that?

Please help. The only useful resource I could find online was on Akka with
Spark Streaming, and it was also not very clear.

Thanks,

Disha


Data and Model Parallelism in MLPC

2015-12-08 Thread Disha Shrivastava
Hi,

I would like to know if the implementation of MLPC in the latest released
version of Spark (1.5.2) implements model parallelism and data parallelism as
done in the DistBelief model implemented by Google:
http://static.googleusercontent.com/media/research.google.com/hi//archive/large_deep_networks_nips2012.pdf


Thanks and Regards,
Disha


Re: Data and Model Parallelism in MLPC

2015-12-08 Thread Disha Shrivastava
Hi Alexander,

Thanks for your response. Can you suggest ways to incorporate model
parallelism in MLPC? I am trying to do the same in Spark. I got hold of your
post
http://apache-spark-developers-list.1001551.n3.nabble.com/Model-parallelism-with-RDD-td13141.html
where you divided the weight matrix across different worker machines. I have
two basic questions in this regard:

1. How can I actually visualize/analyze and control how the nodes/weights of
the neural network are divided across different workers?

2. Is there any alternate way to achieve model parallelism for MLPC in Spark?
I believe we need some kind of synchronization and control for the update of
weights shared across different workers during backpropagation.

Looking forward to your views on this.

Thanks and Regards,
Disha

On Wed, Dec 9, 2015 at 12:36 AM, Ulanov, Alexander <alexander.ula...@hpe.com
> wrote:

> Hi Disha,
>
>
>
> Multilayer perceptron classifier in Spark implements data parallelism.
>
>
>
> Best regards, Alexander
>
>
>
> *From:* Disha Shrivastava [mailto:dishu@gmail.com]
> *Sent:* Tuesday, December 08, 2015 12:43 AM
> *To:* dev@spark.apache.org; Ulanov, Alexander
> *Subject:* Data and Model Parallelism in MLPC
>
>
>
> Hi,
>
> I would like to know if the implementation of MLPC in the latest released
> version of Spark (1.5.2) implements model parallelism and data parallelism as
> done in the DistBelief model implemented by Google:
> http://static.googleusercontent.com/media/research.google.com/hi//archive/large_deep_networks_nips2012.pdf
>
>
> Thanks and Regards,
>
> Disha
>


Re: Implementation of RNN/LSTM in Spark

2015-11-03 Thread Disha Shrivastava
Hi Julio,

Can you please cite references on the distributed implementation?

On Tue, Nov 3, 2015 at 8:52 PM, Julio Antonio Soto de Vicente <
ju...@esbet.es> wrote:

> Hi,
> It is my understanding that little research has been done yet on distributed
> computation (without access to shared memory) for RNNs. I also look forward
> to contributing in this respect.
>
> On 03/11/2015, at 16:00, Disha Shrivastava <dishu@gmail.com>
> wrote:
>
> I would love to work on this and would appreciate ideas on how it can be
> done, or suggestions of papers as a starting point. Also, I wanted to know if
> Spark would be an ideal platform for a distributed implementation of
> RNN/LSTM.
>
> On Mon, Nov 2, 2015 at 10:52 AM, Sasaki Kai <lewua...@me.com> wrote:
>
>> Hi, Disha
>>
>> There seems to be no JIRA on RNN/LSTM directly, but there were several
>> tickets about other types of networks related to deep learning.
>>
>> Stacked Auto Encoder
>> https://issues.apache.org/jira/browse/SPARK-2623
>> CNN
>> https://issues.apache.org/jira/browse/SPARK-9129
>> https://issues.apache.org/jira/browse/SPARK-9273
>>
>> Roadmap of MLlib deep learning
>> https://issues.apache.org/jira/browse/SPARK-5575
>>
>> I think it may be good to join the discussion on SPARK-5575.
>> Best
>>
>> Kai Sasaki
>>
>>
>> On Nov 2, 2015, at 1:59 PM, Disha Shrivastava <dishu@gmail.com>
>> wrote:
>>
>> Hi,
>>
>> I wanted to know if someone is working on implementing RNN/LSTM in Spark,
>> or has already done so. I am willing to contribute to it and would like some
>> guidance on how to go about it.
>>
>> Thanks and Regards
>> Disha
>> Masters Student, IIT Delhi
>>
>>
>>
>


Implementation of RNN/LSTM in Spark

2015-11-01 Thread Disha Shrivastava
Hi,

I wanted to know if someone is working on implementing RNN/LSTM in Spark, or
has already done so. I am willing to contribute to it and would like some
guidance on how to go about it.

Thanks and Regards
Disha
Masters Student, IIT Delhi


Re: No speedup in MultiLayerPerceptronClassifier with increase in number of cores

2015-10-15 Thread Disha Shrivastava
Hi Alexander,

Thanks for your reply. Actually, I am working with a modified version of the
actual MNIST dataset (maximum samples = 8.2M):
https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass.html. I have
been running different-sized versions (1, 10, 50, 1M, 8M samples) on different
numbers of workers (1, 2, 3, 4, 5) and collecting results. I have observed
that when I specify the partitions manually, the cluster actually scales: the
time taken decreases as the number of cores increases. With the default
settings, Spark automatically divides the data into partitions (I guess based
on data size, etc.), and this number is fixed irrespective of the actual
number of workers present in the cluster.
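For reference, a minimal spark-shell sketch of the two settings being
compared, assuming the same libsvm-format input path used later in this
thread; the partition count of 10 is only illustrative.

import org.apache.spark.mllib.util.MLUtils

// Default: Spark picks the partition count from the input size and block layout.
val data = MLUtils.loadLibSVMFile(sc, "data/1_libsvm")
println(s"default partitions: ${data.partitions.length}")

// Manual: spread the records over an explicit number of partitions, e.g. one
// or more per worker, so that every machine receives tasks.
val spread = data.repartition(10)
println(s"after repartition: ${spread.partitions.length}")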

As far as the data residing on two machines is concerned, I am reading the
data from HDFS (a multi-node Hadoop cluster set up across all the worker
machines). With the default number of partitions, Spark gives better results
(less time and better accuracy) than when I manually set the number of
partitions; but the problem is that I then can't observe the effect of
scalability.

My question is: if I have to obtain both scalability and optimality, how
should I go about it in Spark? Clearly, in my case the scalable configuration
is not necessarily the optimal one. Here, by scalability I mean that if I
increase the number of worker machines, I should get better performance (less
time taken).

Thanks and Regards
Disha

On Mon, Oct 12, 2015 at 11:45 PM, Ulanov, Alexander <
alexander.ula...@hpe.com> wrote:

> Hi Disha,
>
>
>
> The problem might be as follows. The data that you have might physically
> reside on only two nodes, and Spark launches data-local tasks. As a result,
> only two workers are used. You might want to force Spark to distribute the
> data across all nodes; however, it does not seem to be worthwhile for this
> rather small dataset.
>
>
>
> Best regards, Alexander
>
>
>
> *From:* Disha Shrivastava [mailto:dishu@gmail.com]
> *Sent:* Sunday, October 11, 2015 9:29 AM
> *To:* Mike Hynes
> *Cc:* dev@spark.apache.org; Ulanov, Alexander
> *Subject:* Re: No speedup in MultiLayerPerceptronClassifier with increase
> in number of cores
>
>
>
> Actually, I have 5 workers running (1 per physical machine), as displayed by
> the Spark UI at spark://IP_of_the_master:7077. I have entered all the
> physical machines' IPs in a file named slaves in the spark/conf directory
> and use the start-all.sh script to start the cluster.
>
> My question is: is there a way to control how the tasks are distributed
> among the different workers? To my knowledge, this is done by Spark
> automatically and is not in our control.
>
>
>
> On Sun, Oct 11, 2015 at 9:49 PM, Mike Hynes <91m...@gmail.com> wrote:
>
> Having only 2 workers for 5 machines would be your problem: you
> probably want 1 worker per physical machine, which entails running the
> spark-daemon.sh script to start a worker on those machines.
> The partitioning is agnostic to how many executors are available for
> running the tasks, so you can't do scalability tests in the manner
> you're thinking by changing the partitioning.
>
>
> On 10/11/15, Disha Shrivastava <dishu@gmail.com> wrote:
> > Dear Spark developers,
> >
> > I am trying to study the effect of increasing the number of cores (CPUs)
> > on speedup and accuracy (scalability with Spark ANN) for the MNIST
> > dataset, using the ANN implementation provided in the latest Spark release.
> >
> > I have formed a cluster of 5 machines with 88 cores in total. The thing
> > that is troubling me is that even if I have more than 2 workers in my
> > Spark cluster, the job gets divided between only 2 workers (executors),
> > which Spark takes by default, and hence it takes the same time. I know we
> > can set the number of partitions manually using, say,
> > sc.parallelize(train_data, 10), which then divides the data into 10
> > partitions so that all the workers are involved in the computation. I am
> > using the code below:
> >
> >
> > import org.apache.spark.ml.classification.MultilayerPerceptronClassifier
> > import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
> > import org.apache.spark.mllib.util.MLUtils
> > import org.apache.spark.sql.Row
> >
> > // Load training data
> > val data = MLUtils.loadLibSVMFile(sc, "data/1_libsvm").toDF()
> > // Split the data into train and test
> > val splits = data.randomSplit(Array(0.7, 0.3), seed = 1234L)
> > val train = splits(0)
> > val test = splits(1)
> > //val tr=sc.parallelize(train,10);
> > // specify layers for the neural network:
> > // input layer of size 4 (features), two intermediat

No speedup in MultiLayerPerceptronClassifier with increase in number of cores

2015-10-11 Thread Disha Shrivastava
Dear Spark developers,

I am trying to study the effect of increasing the number of cores (CPUs) on
speedup and accuracy (scalability with Spark ANN) for the MNIST dataset, using
the ANN implementation provided in the latest Spark release.

I have formed a cluster of 5 machines with 88 cores in total. The thing that
is troubling me is that even if I have more than 2 workers in my Spark
cluster, the job gets divided between only 2 workers (executors), which Spark
takes by default, and hence it takes the same time. I know we can set the
number of partitions manually using, say, sc.parallelize(train_data, 10),
which then divides the data into 10 partitions so that all the workers are
involved in the computation. I am using the code below:


import org.apache.spark.ml.classification.MultilayerPerceptronClassifier
import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
import org.apache.spark.mllib.util.MLUtils
import org.apache.spark.sql.Row

// Load training data
val data = MLUtils.loadLibSVMFile(sc, "data/1_libsvm").toDF()
// Split the data into train and test
val splits = data.randomSplit(Array(0.7, 0.3), seed = 1234L)
val train = splits(0)
val test = splits(1)
//val tr=sc.parallelize(train,10);
// specify layers for the neural network:
// input layer of size 784 (features), one hidden layer of size 160, and
// output layer of size 10 (classes)
val layers = Array[Int](784,160,10)
// create the trainer and set its parameters
val trainer = new MultilayerPerceptronClassifier()
  .setLayers(layers)
  .setBlockSize(128)
  .setSeed(1234L)
  .setMaxIter(100)
// train the model
val model = trainer.fit(train)
// compute precision on the test set
val result = model.transform(test)
val predictionAndLabels = result.select("prediction", "label")
val evaluator = new MulticlassClassificationEvaluator().setMetricName("precision")
println("Precision:" + evaluator.evaluate(predictionAndLabels))

Can you please suggest how I can ensure that the data/tasks are divided
equally among all the worker machines?

Thanks and Regards,
Disha Shrivastava
Masters student, IIT Delhi