[ https://issues.apache.org/jira/browse/SPARK-18412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yanbo Liang updated SPARK-18412:
--------------------------------

SparkR spark.randomForest classification throws exception when training on libsvm data
--------------------------------------------------------------------------------------

    Key: SPARK-18412
    URL: https://issues.apache.org/jira/browse/SPARK-18412
    Project: Spark
    Issue Type: Bug
    Components: ML, SparkR
    Reporter: Yanbo Liang

Description:

{{spark.randomForest}} classification throws an exception when training on libsvm data. It can be reproduced as follows:
{code}
df <- read.df("data/mllib/sample_multiclass_classification_data.txt", source = "libsvm")
model <- spark.randomForest(df, label ~ features, "classification")
{code}
The exception is:
{code}
16/11/11 02:16:40 ERROR RBackendHandler: fit on org.apache.spark.ml.r.RandomForestClassifierWrapper failed
java.lang.reflect.InvocationTargetException
......
Caused by: java.lang.IllegalArgumentException: requirement failed: If label column already exists, forceIndexLabel can not be set with true.
......
{code}
This error occurs because the label column of the R formula already exists in the data (the libsvm source produces {{label}} and {{features}} columns), so the label cannot be force-indexed. However, classification algorithms must index the label, so we need to rename {{RFormula.labelCol}} to a new value; the original label can then be indexed.

This issue also affects other algorithms: {{spark.naiveBayes}}, {{spark.glm}} (binomial family only) and {{spark.gbt}} (classification only).
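The failing check and the proposed rename can be sketched in plain R without a Spark session. This is a hypothetical model, not Spark's implementation: `check_label_col` mirrors the "requirement failed" check from the stack trace, and `fresh_label_col` is an illustrative helper that picks a non-colliding output column name (the actual fix may generate the new name differently). It assumes RFormula's default label column name, `label`, which collides with the `label` column produced by the libsvm source.

```r
# Hypothetical sketch (not Spark's actual code): models the RFormula
# requirement that fails on libsvm data, and the proposed workaround of
# renaming the output label column before forcing label indexing.

# Mirrors the check: an existing label column is incompatible with
# forceIndexLabel = TRUE.
check_label_col <- function(data_cols, label_col, force_index_label) {
  if (force_index_label && label_col %in% data_cols) {
    stop("requirement failed: If label column already exists, ",
         "forceIndexLabel can not be set with true.")
  }
  invisible(TRUE)
}

# Hypothetical helper: choose an output label column name that does not
# collide with any existing column, so the original label can be indexed.
fresh_label_col <- function(data_cols, label_col) {
  candidate <- label_col
  i <- 0L
  while (candidate %in% data_cols) {
    i <- i + 1L
    candidate <- paste0(label_col, "_", i)
  }
  candidate
}

# libsvm data already has "label" and "features" columns.
libsvm_cols <- c("label", "features")

# The assumed default label column "label" collides, so the check fails.
failed <- tryCatch({
  check_label_col(libsvm_cols, "label", TRUE)
  FALSE
}, error = function(e) TRUE)

# After renaming the output column, the check passes and the original
# "label" column remains available as the source to be indexed.
new_col <- fresh_label_col(libsvm_cols, "label")
check_label_col(libsvm_cols, new_col, TRUE)
```

Here `new_col` is `"label_1"`. A deterministic suffix keeps the sketch easy to follow; a random unique suffix would serve the same purpose of avoiding collisions.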