Hi, I have a small dataset (120 training points, 30 test points) that I am trying to classify into two classes. It has 4 numerical features and 1 binary label (1 or 0).
I used LogisticRegression and SVM from MLlib and got 100% accuracy in both cases. But with DecisionTree I get only 33% accuracy: all of the predicted test labels are 1, whereas only 10 of the 30 actually should be. I tried varying the parameters (maxDepth, maxBins, impurity, etc.) and still get only 33% accuracy. With the same dataset, R's decision tree (rpart) gives 100% accuracy.

I would like to understand why MLlib's decision tree model performs so poorly here and whether there is some way I can improve it.

Thanks

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Decision-tree-classifier-in-MLlib-tp9457.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
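P.S. For reference, this is roughly the training call I am using, written from memory as a sketch. The file names are placeholders, and the `trainClassifier` helper assumes the RDD-based MLlib API (Spark 1.1 or later); the parameter values shown are just the defaults I started from before varying them:

```scala
import org.apache.spark.mllib.tree.DecisionTree
import org.apache.spark.mllib.util.MLUtils

// Placeholder paths; the real data is in LIBSVM format
val training = MLUtils.loadLibSVMFile(sc, "train.txt") // 120 points, 4 features
val test = MLUtils.loadLibSVMFile(sc, "test.txt")      // 30 points

// numClasses = 2 for the binary label; all 4 features are numerical,
// so categoricalFeaturesInfo is left empty
val model = DecisionTree.trainClassifier(
  training,
  numClasses = 2,
  categoricalFeaturesInfo = Map[Int, Int](),
  impurity = "gini",
  maxDepth = 5,
  maxBins = 32)

// Fraction of test points whose predicted label matches the true label
val accuracy = test
  .map(p => (model.predict(p.features), p.label))
  .filter { case (pred, label) => pred == label }
  .count().toDouble / test.count()
```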