Hayri Volkan Agun created SPARK-16098:
-----------------------------------------

             Summary: Multiclass SVM Learning
                 Key: SPARK-16098
                 URL: https://issues.apache.org/jira/browse/SPARK-16098
             Project: Spark
          Issue Type: Request
          Components: ML, MLlib
    Affects Versions: 2.0.0, 2.1.0
         Environment: Spark MLlib and ML 1.6.1
            Reporter: Hayri Volkan Agun
             Fix For: 2.1.0


Spark provides a OneVsRest classifier that lets any binary classifier be used for multi-class classification. For a linear SVM, however, OneVsRest can produce imbalanced training sets: each binary sub-problem treats one class as positive and all remaining classes as negative. Spark's SVM performs badly in that situation.

I verified this by writing a LinearSVM classifier that implements the predictRaw method of the ClassificationModel class. In every experiment the F-measure was very poor. The only explanation I can find is that the SVM is very sensitive to imbalanced data, and OneVsRest naturally creates an imbalanced dataset.

For multi-class classification, the linear SVM could be optimized to take imbalanced datasets into account.
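
To make the imbalance concrete, here is a minimal sketch in plain Python (no Spark dependency) of what one-vs-rest relabeling does to a balanced multi-class target. The toy data, the helper names, and the inverse-frequency weighting are all assumptions made for illustration. The weighting is one common heuristic for compensating class imbalance; it is not something the report proposes, and it is not part of Spark's OneVsRest API.

```python
from collections import Counter


def one_vs_rest_labels(labels, positive_class):
    """Relabel a multi-class target as binary: 1 for positive_class, 0 for the rest."""
    return [1 if y == positive_class else 0 for y in labels]


def positive_fraction(binary_labels):
    """Share of examples that carry the positive label."""
    return sum(binary_labels) / len(binary_labels)


def balanced_class_weights(binary_labels):
    """Inverse-frequency weights: n_samples / (n_classes * count(class)).

    Assumption: a common heuristic for imbalanced data, shown only as one
    possible way a learner could account for the skew.
    """
    counts = Counter(binary_labels)
    n = len(binary_labels)
    return {cls: n / (len(counts) * cnt) for cls, cnt in counts.items()}


# Hypothetical toy target: 10 perfectly balanced classes, 100 examples each.
labels = [c for c in range(10) for _ in range(100)]

# Every binary sub-problem built by one-vs-rest has the same 1-in-10 skew,
# even though the original multi-class data is balanced.
for c in (0, 9):
    binary = one_vs_rest_labels(labels, c)
    print(c, positive_fraction(binary), balanced_class_weights(binary))
```

With K balanced classes, each sub-problem has a positive share of 1/K, so the skew grows with the number of classes. That matches the report's point that OneVsRest creates the imbalance by construction.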