Hi,

I have a small dataset (120 training points, 30 test points) that I am
trying to classify into two classes. Each point has 4 numerical features
and a binary label (1 or 0).

I used LogisticRegression and SVM in MLlib and got 100% accuracy in both
cases. But with DecisionTree I get only 33% accuracy: all of the predicted
test labels are 1, whereas only 10 of the 30 test points actually should be.
I tried varying the parameters (maxDepth, maxBins, impurity, etc.) and
still get only 33% accuracy.

I used the same dataset with R's decision tree (rpart) and got
100% accuracy. I would like to understand why MLlib's decision tree
performs so poorly here, and whether there is some way I can improve it.

Thanks



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Decision-tree-classifier-in-MLlib-tp9457.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
