[ https://issues.apache.org/jira/browse/SPARK-18412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15656892#comment-15656892 ]
Apache Spark commented on SPARK-18412:
--------------------------------------

User 'yanboliang' has created a pull request for this issue:
https://github.com/apache/spark/pull/15851

> SparkR spark.randomForest classification throws exception when training on libsvm data
> --------------------------------------------------------------------------------------
>
>                 Key: SPARK-18412
>                 URL: https://issues.apache.org/jira/browse/SPARK-18412
>             Project: Spark
>          Issue Type: Bug
>          Components: ML, SparkR
>            Reporter: Yanbo Liang
>
> {{spark.randomForest}} classification throws an exception when training on libsvm data. It can be reproduced as follows:
> {code}
> df <- read.df("data/mllib/sample_multiclass_classification_data.txt", source = "libsvm")
> model <- spark.randomForest(df, label ~ features, "classification")
> {code}
> The exception is:
> {code}
> Error in handleErrors(returnStatus, conn) :
>   java.lang.IllegalArgumentException: requirement failed: If label column already exists, forceIndexLabel can not be set with true.
>       at scala.Predef$.require(Predef.scala:224)
>       at org.apache.spark.ml.feature.RFormula.transformSchema(RFormula.scala:205)
>       at org.apache.spark.ml.PipelineStage.transformSchema(Pipeline.scala:70)
>       at org.apache.spark.ml.feature.RFormula.fit(RFormula.scala:136)
>       at org.apache.spark.ml.r.RandomForestClassifierWrapper$.fit(RandomForestClassificationWrapper.scala:86)
>       at org.apache.spark.ml.r.RandomForestClassifierWrapper.fit(RandomForestClassificationWrapper.scala)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:498)
>       at org.apache.spark.api.r.RBackendHandler.handleMethodCall(RBackendHandler.scala:172)
> {code}
> This error occurs because the label column of the R formula already exists in the input data, so we cannot force the label to be indexed. However, classification algorithms must index the label, so we need to rename RFormula.labelCol to a new value; then we can index the original label.
> This issue also affects other algorithms: spark.naiveBayes, spark.glm (only for the binomial family), and spark.gbt (only for classification).

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
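
The renaming idea described in the issue can be illustrated with a small toy model. This is only a sketch in plain Python, not Spark code: `check_label_col` and `unique_name` are hypothetical helpers, the numeric-suffix scheme is an assumption, and the check merely mimics the requirement quoted in the exception message.

```python
def check_label_col(columns, label_col, force_index_label):
    """Toy stand-in for the schema requirement quoted in the exception."""
    if label_col in columns and force_index_label:
        raise ValueError(
            "requirement failed: If label column already exists, "
            "forceIndexLabel can not be set with true."
        )


def unique_name(base, existing):
    """Return `base` if unused, otherwise `base` plus a numeric suffix."""
    candidate = base
    counter = 0
    while candidate in existing:
        counter += 1
        candidate = f"{base}_{counter}"
    return candidate


# Columns of a DataFrame loaded from libsvm data.
columns = ["label", "features"]

# Using the default output name "label" collides with the input column.
try:
    check_label_col(columns, "label", True)
    failed = False
except ValueError:
    failed = True

# Renaming the output label column avoids the collision, so the
# original "label" column can still be indexed into the new one.
new_label_col = unique_name("label", columns)
check_label_col(columns, new_label_col, True)  # no exception raised
```

In this model the original `label` column is left untouched and the indexed label would be written to the renamed output column, which is the approach the issue proposes for RFormula.labelCol.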