GitHub user yanboliang opened a pull request: https://github.com/apache/spark/pull/15851
[SPARK-18412][SPARKR][ML] Fix exception for some SparkR ML algorithms training on libsvm data

## What changes were proposed in this pull request?

* Fix the following exception, which is thrown when ```spark.randomForest``` (classification), ```spark.gbt``` (classification), ```spark.naiveBayes``` and ```spark.glm``` (binomial family) are fitted on libsvm data:
```
java.lang.IllegalArgumentException: requirement failed: If label column already exists, forceIndexLabel can not be set with true.
```
See [SPARK-18412](https://issues.apache.org/jira/browse/SPARK-18412) for more detail about how to reproduce this bug.
* Move ```getFeaturesAndLabels``` into RWrapperUtils, since many ML algorithm wrappers use this function.
* Drop some unwanted columns when making predictions.

## How was this patch tested?

Added unit tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/yanboliang/spark spark-18412

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/15851.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #15851

----
commit 4752fe2c1e0e211ae2e27a0a7807f141c91430a2
Author: Yanbo Liang <yblia...@gmail.com>
Date: 2016-11-11T10:29:27Z

    Handle the case label column already exists and forceIndexLabel = true.

commit 6262178be4b2a085fb48ad0be8b1bf61c7812689
Author: Yanbo Liang <yblia...@gmail.com>
Date: 2016-11-11T10:42:17Z

    Add unit tests.

commit 26eb40aaca3b8e4de4d2f1922a83dc2198754c6a
Author: Yanbo Liang <yblia...@gmail.com>
Date: 2016-11-11T11:16:12Z

    Set correct label column for classification algorithms.

commit d0d7c28b05bbba51266a9a1364b7fe9e4c452ed9
Author: Yanbo Liang <yblia...@gmail.com>
Date: 2016-11-11T11:47:57Z

    Divide spark.gbt test into two parts: classification and regression.
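For reviewers, here is a minimal, Spark-free sketch of the label-column conflict behind SPARK-18412 and one way to sidestep it. It is not the patch's code: `LabelColumnSketch`, `checkLabel` and `resolveLabelCol` are hypothetical names, and renaming the label column with a numeric suffix is an assumption (the real wrappers may pick a different fresh name). The only facts it relies on are the exception text quoted in this PR and that Spark's libsvm data source yields `label` and `features` columns, so a default label column named `label` collides when `forceIndexLabel` is true.

```scala
object LabelColumnSketch {
  // Message taken from the exception quoted in this PR.
  val ConflictMessage: String =
    "If label column already exists, forceIndexLabel can not be set with true."

  // Hypothetical model of the precondition behind the exception: when
  // forceIndexLabel is true, the label column must not already exist.
  // Scala's require throws IllegalArgumentException("requirement failed: ...").
  def checkLabel(dataCols: Seq[String], labelCol: String, forceIndexLabel: Boolean): Unit =
    require(!(forceIndexLabel && dataCols.contains(labelCol)), ConflictMessage)

  // Hypothetical helper: keep the requested label column name unless it would
  // trigger the conflict; otherwise append _1, _2, ... until the name is free.
  def resolveLabelCol(dataCols: Seq[String], labelCol: String, forceIndexLabel: Boolean): String =
    if (forceIndexLabel && dataCols.contains(labelCol)) {
      Iterator.from(1)
        .map(i => s"${labelCol}_$i")
        .find(candidate => !dataCols.contains(candidate))
        .get
    } else {
      labelCol
    }

  def main(args: Array[String]): Unit = {
    // Column names produced by Spark's libsvm data source.
    val libsvmCols = Seq("label", "features")
    val safeLabel = resolveLabelCol(libsvmCols, "label", forceIndexLabel = true)
    checkLabel(libsvmCols, safeLabel, forceIndexLabel = true) // no longer throws
    println(safeLabel) // label_1
  }
}
```

In an actual wrapper, the resolved name would presumably be passed to `RFormula.setLabelCol` before fitting, so the original `label` column from the libsvm data stays untouched while the indexed label is written to a non-conflicting column.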