Num iterations: for LR and SVM I am using the default value of 100, and I am also using the default values for all the other parameters. I am pretty much reusing the code from BinaryClassification.scala. For Decision Tree, I don't see a parameter for the number of iterations in the example code, so I did not specify one. I am running each algorithm on my dataset 100 times and taking the average runtime.
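To be concrete, my timing setup looks roughly like the sketch below (not my exact code; the dataset is assumed to already be loaded as an RDD[LabeledPoint], and `averageRuntime` is just a name I am using here):

```scala
import org.apache.spark.mllib.classification.SVMWithSGD
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

// Train SVM with numIterations = 100 (the default I mentioned) and all
// other parameters at their defaults, averaging wall-clock time over runs.
def averageRuntime(data: RDD[LabeledPoint], runs: Int = 100): Double = {
  val times = (1 to runs).map { _ =>
    val start = System.nanoTime()
    SVMWithSGD.train(data, 100) // numIterations = 100
    (System.nanoTime() - start) / 1e9 // seconds
  }
  times.sum / runs
}
```

The LR runs are the same with LogisticRegressionWithSGD.train in place of SVMWithSGD.train.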
My dataset is very dense (hardly any zeros). The labels are 1 and 0. I did not explicitly specify the number of partitions; I did not see any code for this in the MLlib examples for BinaryClassification and DecisionTree.

Hardware:
- Local: Intel Core i7 with 12 cores and 7.8 GB of RAM, of which I am allocating 4 GB for executor memory. According to the application detail stats in the Spark UI, the total memory consumed is around 1.5 GB.
- Cluster: 10 nodes with a total of 320 cores, with 16 GB per node. According to the application detail stats in the Spark UI, the total memory consumed is around 95.5 GB.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/mllib-performance-on-cluster-tp13290p13299.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
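For reference, the partition count can be set explicitly at load time; a minimal sketch of what I could try (the path and the count of 320 are just placeholders, not values from my runs):

```scala
import org.apache.spark.mllib.util.MLUtils

// Load LIBSVM-format data with an explicit minimum number of partitions.
// numFeatures = -1 lets MLlib infer the feature count from the file.
val data = MLUtils.loadLibSVMFile(sc, "data/mydata.txt",
  numFeatures = -1, minPartitions = 320)

// Or repartition an already-loaded RDD before training:
val repartitioned = data.repartition(320)
```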