Dear Spark developers,

I am trying to study the effect of increasing the number of cores (CPUs) on
the speedup and accuracy (i.e. the scalability) of the ANN implementation
provided in the latest Spark release, using the MNIST dataset.

I have formed a cluster of 5 machines with 88 cores in total. What is
troubling me is that even though I have more than 2 workers in my Spark
cluster, the job is divided among only 2 workers (executors), which Spark
seems to use by default, and hence it takes the same time. I know we can
set the number of partitions manually using, say,
sc.parallelize(train_data, 10), which then divides the data into 10
partitions so that all the workers are involved in the computation. I am
using the code below:


import org.apache.spark.ml.classification.MultilayerPerceptronClassifier
import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
import org.apache.spark.mllib.util.MLUtils
import org.apache.spark.sql.Row

// Load training data
val data = MLUtils.loadLibSVMFile(sc, "data/10000_libsvm").toDF()
// Split the data into train and test
val splits = data.randomSplit(Array(0.7, 0.3), seed = 1234L)
val train = splits(0)
val test = splits(1)
//val tr=sc.parallelize(train,10);
// specify layers for the neural network:
// input layer of size 784 (features), one intermediate layer of size 160
// and output layer of size 10 (classes)
val layers = Array[Int](784,160,10)
// create the trainer and set its parameters
val trainer = new MultilayerPerceptronClassifier().setLayers(layers).setBlockSize(128).setSeed(1234L).setMaxIter(100)
// train the model
val model = trainer.fit(train)
// compute precision on the test set
val result = model.transform(test)
val predictionAndLabels = result.select("prediction", "label")
val evaluator = new MulticlassClassificationEvaluator().setMetricName("precision")
println("Precision:" + evaluator.evaluate(predictionAndLabels))
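
To make the manual partitioning I mentioned concrete, here is a minimal
sketch of what I have in mind. It assumes a spark-shell session where sc and
the implicits needed for toDF() are already in scope, and it assumes that
DataFrame.repartition is the appropriate call here, since the training data
is a DataFrame rather than a local collection. The value 10 and the name
trainRepartitioned are only illustrative.

```scala
// Sketch only: assumes spark-shell, where `sc` and the toDF() implicits exist.
import org.apache.spark.mllib.util.MLUtils

// Same loading and splitting steps as in the code above
val data = MLUtils.loadLibSVMFile(sc, "data/10000_libsvm").toDF()
val splits = data.randomSplit(Array(0.7, 0.3), seed = 1234L)

// Illustrative partition count, not a tuned value
val numPartitions = 10

// repartition shuffles the rows into the requested number of partitions
val trainRepartitioned = splits(0).repartition(numPartitions)

// Check how many partitions the training data now has
println("Training partitions: " + trainRepartitioned.rdd.partitions.length)
```

The repartitioned DataFrame would then be passed to trainer.fit in place of
train.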

Can you please suggest how I can ensure that the data and tasks are divided
equally among all the worker machines?

Thanks and Regards,
Disha Shrivastava
Masters student, IIT Delhi
