[ https://issues.apache.org/jira/browse/SPARK-15509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15298787#comment-15298787 ]
Xin Ren commented on SPARK-15509:
---------------------------------

Hi Joseph, I'd like to try to fix this one. Thanks a lot :)

> R MLlib algorithms should support input columns "features" and "label"
> ----------------------------------------------------------------------
>
>                 Key: SPARK-15509
>                 URL: https://issues.apache.org/jira/browse/SPARK-15509
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML, SparkR
>    Affects Versions: 2.0.0
>            Reporter: Joseph K. Bradley
>
> Currently in SparkR, when you load a LibSVM dataset using the sqlContext and
> then pass it to an MLlib algorithm, the ML wrappers will fail since they will
> try to create a "features" column, which conflicts with the existing
> "features" column from the LibSVM loader. E.g., using the "mnist" dataset
> from LibSVM:
> {code}
> training <- loadDF(sqlContext, ".../mnist", "libsvm")
> model <- naiveBayes(label ~ features, training)
> {code}
> This fails with:
> {code}
> 16/05/24 11:52:41 ERROR RBackendHandler: fit on org.apache.spark.ml.r.NaiveBayesWrapper failed
> Error in invokeJava(isStatic = TRUE, className, methodName, ...) :
>   java.lang.IllegalArgumentException: Output column features already exists.
>     at org.apache.spark.ml.feature.VectorAssembler.transformSchema(VectorAssembler.scala:120)
>     at org.apache.spark.ml.Pipeline$$anonfun$transformSchema$4.apply(Pipeline.scala:179)
>     at org.apache.spark.ml.Pipeline$$anonfun$transformSchema$4.apply(Pipeline.scala:179)
>     at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:57)
>     at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:66)
>     at scala.collection.mutable.ArrayOps$ofRef.foldLeft(ArrayOps.scala:186)
>     at org.apache.spark.ml.Pipeline.transformSchema(Pipeline.scala:179)
>     at org.apache.spark.ml.PipelineStage.transformSchema(Pipeline.scala:67)
>     at org.apache.spark.ml.Pipeline.fit(Pipeline.scala:131)
>     at org.apache.spark.ml.feature.RFormula.fit(RFormula.scala:169)
>     at org.apache.spark.ml.r.NaiveBayesWrapper$.fit(NaiveBayesWrapper.scala:62)
>     at org.apache.spark.ml.r.NaiveBayesWrapper.fit(NaiveBayesWrapper.sca
> {code}
> The same issue appears for the "label" column once you rename the "features"
> column.
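
A note on the trace above: the exception is raised from VectorAssembler.transformSchema, reached through RFormula.fit, so the failure reads as a name collision between an output column and a column already present in the input. The following is a minimal plain-Python model of that kind of check, not Spark code. The function `add_output_column` and the column name `pixels` are hypothetical and exist only for illustration, and the model simply assumes that the "label" case fails in the same way.

```python
# Toy model of an "output column already exists" schema check.
# NOT Spark code: add_output_column and "pixels" are hypothetical names.

def add_output_column(input_columns, output_col):
    """Return a new column list with output_col appended, refusing duplicates."""
    if output_col in input_columns:
        raise ValueError(f"Output column {output_col} already exists.")
    return input_columns + [output_col]


# A LibSVM-style input already carries "label" and "features".
libsvm_columns = ["label", "features"]

try:
    add_output_column(libsvm_columns, "features")
    outcome = "ok"
except ValueError as err:
    outcome = str(err)

# Renaming "features" avoids the first collision in this toy model,
# but an output column named "label" still collides with the input.
renamed_columns = ["label", "pixels"]
with_features = add_output_column(renamed_columns, "features")

try:
    add_output_column(with_features, "label")
    label_outcome = "ok"
except ValueError as err:
    label_outcome = str(err)

print(outcome)
print(label_outcome)
```

In this model the collision goes away only when neither reserved name is present in the input, which matches the report that renaming "features" alone just moves the failure to "label".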