Dear Spark developers,

I am trying to study how increasing the number of cores (CPUs) affects speedup and accuracy (i.e. the scalability of Spark's ANN) on the MNIST dataset, using the ANN implementation provided in the latest Spark release.
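Speedup in such a study is usually taken as S(p) = T(1) / T(p), where T(p) is the wall-clock training time on p cores, and parallel efficiency as E(p) = S(p) / p. The sketch below assumes the baseline time comes from a single-core run. `SpeedupMetrics`, `timed`, `speedup` and `efficiency` are hypothetical helper names, not part of Spark.

```scala
// Hypothetical helpers (not part of Spark) for timing runs and computing
// speedup S(p) = T(1) / T(p) and efficiency E(p) = S(p) / p.
// Assumes `baseline` is the wall-clock time of a single-core run.
object SpeedupMetrics {

  /** Runs `body` once; returns its result and the elapsed wall-clock time in seconds. */
  def timed[A](body: => A): (A, Double) = {
    val start = System.nanoTime()
    val result = body
    val elapsedSeconds = (System.nanoTime() - start) / 1e9
    (result, elapsedSeconds)
  }

  /** Speedup of a run taking `t` seconds relative to a baseline taking `baseline` seconds. */
  def speedup(baseline: Double, t: Double): Double = {
    require(baseline > 0 && t > 0, "times must be positive")
    baseline / t
  }

  /** Parallel efficiency: speedup divided by the number of cores used in the timed run. */
  def efficiency(baseline: Double, t: Double, cores: Int): Double = {
    require(cores > 0, "cores must be positive")
    speedup(baseline, t) / cores
  }
}
```

In a spark-shell session this could wrap the training call, e.g. `val (model, secs) = SpeedupMetrics.timed { trainer.fit(train) }`. Because Spark transformations are lazy, the measured time includes any data loading that `fit` triggers unless the input is cached and materialized first.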
I have formed a cluster of 5 machines with 88 cores in total. What troubles me is that even with more than 2 workers in my Spark cluster, the job is divided among only 2 workers (executors), which Spark seems to use by default, so the run time stays the same. I know we can set the number of partitions manually, for example with sc.parallelize(train_data, 10), which divides the data into 10 partitions so that all the workers take part in the computation. I am using the code below (run in spark-shell, where sc and the toDF implicits are already in scope):

import org.apache.spark.ml.classification.MultilayerPerceptronClassifier
import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
import org.apache.spark.mllib.util.MLUtils
import org.apache.spark.sql.Row

// Load training data
val data = MLUtils.loadLibSVMFile(sc, "data/10000_libsvm").toDF()

// Split the data into train and test
val splits = data.randomSplit(Array(0.7, 0.3), seed = 1234L)
val train = splits(0)
val test = splits(1)

// val tr = sc.parallelize(train, 10)  // does not compile: parallelize expects a local Seq, not a DataFrame

// Specify layers for the neural network:
// input layer of size 784 (features, one per pixel of a 28x28 image),
// one hidden layer of size 160, and output layer of size 10 (classes)
val layers = Array[Int](784, 160, 10)

// Create the trainer and set its parameters
val trainer = new MultilayerPerceptronClassifier()
  .setLayers(layers)
  .setBlockSize(128)
  .setSeed(1234L)
  .setMaxIter(100)

// Train the model
val model = trainer.fit(train)

// Compute precision on the test set
val result = model.transform(test)
val predictionAndLabels = result.select("prediction", "label")
val evaluator = new MulticlassClassificationEvaluator().setMetricName("precision")
println("Precision: " + evaluator.evaluate(predictionAndLabels))

Could you please suggest how I can ensure that the data and tasks are divided equally among all the worker machines?

Thanks and regards,
Disha Shrivastava
Masters student, IIT Delhi