Num iterations: for LR and SVM I am using the default value of 100, and I am also using the default values for all the other parameters. I am pretty much reusing the code from BinaryClassification.scala. For Decision Tree, I don't see a parameter for the number of iterations in the example code, so I did not specify one. I am running each algorithm on my dataset 100 times and taking the average runtime.
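To be concrete, my timing setup looks roughly like the sketch below (not my exact code; the dataset is assumed to already be loaded as an RDD[LabeledPoint], and `averageRuntime` is just a name I am using here):

```scala
import org.apache.spark.mllib.classification.SVMWithSGD
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

// Train SVM with numIterations = 100 (the default I mentioned) and all
// other parameters at their defaults, averaging wall-clock time over runs.
def averageRuntime(data: RDD[LabeledPoint], runs: Int = 100): Double = {
  val times = (1 to runs).map { _ =>
    val start = System.nanoTime()
    SVMWithSGD.train(data, 100) // numIterations = 100
    (System.nanoTime() - start) / 1e9 // seconds
  }
  times.sum / runs
}
```

The LR runs are the same with LogisticRegressionWithSGD.train in place of SVMWithSGD.train.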
My dataset is very dense (hardly any zeros). The labels are 1 and 0. I did not explicitly specify the number of partitions; I did not see any code for this in the MLlib examples for BinaryClassification and DecisionTree.

Hardware:
- Local: Intel Core i7 with 12 cores and 7.8 GB of RAM, of which I am allocating 4 GB for executor memory. According to the application detail stats in the Spark UI, the total memory consumed is around 1.5 GB.
- Cluster: 10 nodes with a total of 320 cores, with 16 GB per node. According to the application detail stats in the Spark UI, the total memory consumed is around 95.5 GB.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/mllib-performance-on-cluster-tp13290p13299.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
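For reference, the partition count can be set explicitly at load time; a minimal sketch of what I could try (the path and the count of 320 are just placeholders, not values from my runs):

```scala
import org.apache.spark.mllib.util.MLUtils

// Load LIBSVM-format data with an explicit minimum number of partitions.
// numFeatures = -1 lets MLlib infer the feature count from the file.
val data = MLUtils.loadLibSVMFile(sc, "data/mydata.txt",
  numFeatures = -1, minPartitions = 320)

// Or repartition an already-loaded RDD before training:
val repartitioned = data.repartition(320)
```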