[ https://issues.apache.org/jira/browse/SPARK-15509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xin Ren updated SPARK-15509:
----------------------------
    Description:
Currently in SparkR, when you load a LibSVM dataset using the sqlContext and then pass it to an MLlib algorithm, the ML wrappers will fail since they will try to create a "features" column, which conflicts with the existing "features" column from the LibSVM loader. E.g., using the "mnist" dataset from LibSVM:
{code}
training <- loadDF(sqlContext, ".../mnist", "libsvm")
model <- naiveBayes(label ~ features, training)
{code}
This fails with:
{code}
16/05/24 11:52:41 ERROR RBackendHandler: fit on org.apache.spark.ml.r.NaiveBayesWrapper failed
Error in invokeJava(isStatic = TRUE, className, methodName, ...) :
  java.lang.IllegalArgumentException: Output column features already exists.
	at org.apache.spark.ml.feature.VectorAssembler.transformSchema(VectorAssembler.scala:120)
	at org.apache.spark.ml.Pipeline$$anonfun$transformSchema$4.apply(Pipeline.scala:179)
	at org.apache.spark.ml.Pipeline$$anonfun$transformSchema$4.apply(Pipeline.scala:179)
	at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:57)
	at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:66)
	at scala.collection.mutable.ArrayOps$ofRef.foldLeft(ArrayOps.scala:186)
	at org.apache.spark.ml.Pipeline.transformSchema(Pipeline.scala:179)
	at org.apache.spark.ml.PipelineStage.transformSchema(Pipeline.scala:67)
	at org.apache.spark.ml.Pipeline.fit(Pipeline.scala:131)
	at org.apache.spark.ml.feature.RFormula.fit(RFormula.scala:169)
	at org.apache.spark.ml.r.NaiveBayesWrapper$.fit(NaiveBayesWrapper.scala:62)
	at org.apache.spark.ml.r.NaiveBayesWrapper.fit(NaiveBayesWrapper.sca
{code}
The same issue appears for the "label" column once you rename the "features" column.

  was:
Currently in SparkR, when you load a LibSVM dataset using the sqlContext and then pass it to an MLlib algorithm, the ML wrappers will fail since they will try to create a "features" column, which conflicts with the existing "features" column from the LibSVM loader. E.g., using the "mnist" dataset from LibSVM:
{code}
training <- loadDF(sqlContext, ".../mnist", "libsvm")
model <- spark.naiveBayes(label ~ features, training)
{code}
This fails with:
{code}
16/05/24 11:52:41 ERROR RBackendHandler: fit on org.apache.spark.ml.r.NaiveBayesWrapper failed
Error in invokeJava(isStatic = TRUE, className, methodName, ...) :
  java.lang.IllegalArgumentException: Output column features already exists.
	at org.apache.spark.ml.feature.VectorAssembler.transformSchema(VectorAssembler.scala:120)
	at org.apache.spark.ml.Pipeline$$anonfun$transformSchema$4.apply(Pipeline.scala:179)
	at org.apache.spark.ml.Pipeline$$anonfun$transformSchema$4.apply(Pipeline.scala:179)
	at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:57)
	at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:66)
	at scala.collection.mutable.ArrayOps$ofRef.foldLeft(ArrayOps.scala:186)
	at org.apache.spark.ml.Pipeline.transformSchema(Pipeline.scala:179)
	at org.apache.spark.ml.PipelineStage.transformSchema(Pipeline.scala:67)
	at org.apache.spark.ml.Pipeline.fit(Pipeline.scala:131)
	at org.apache.spark.ml.feature.RFormula.fit(RFormula.scala:169)
	at org.apache.spark.ml.r.NaiveBayesWrapper$.fit(NaiveBayesWrapper.scala:62)
	at org.apache.spark.ml.r.NaiveBayesWrapper.fit(NaiveBayesWrapper.sca
{code}
The same issue appears for the "label" column once you rename the "features" column.

> R MLlib algorithms should support input columns "features" and "label"
> ----------------------------------------------------------------------
>
>                 Key: SPARK-15509
>                 URL: https://issues.apache.org/jira/browse/SPARK-15509
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML, SparkR
>            Reporter: Joseph K. Bradley
>
> Currently in SparkR, when you load a LibSVM dataset using the sqlContext and
> then pass it to an MLlib algorithm, the ML wrappers will fail since they will
> try to create a "features" column, which conflicts with the existing
> "features" column from the LibSVM loader. E.g., using the "mnist" dataset
> from LibSVM:
> {code}
> training <- loadDF(sqlContext, ".../mnist", "libsvm")
> model <- naiveBayes(label ~ features, training)
> {code}
> This fails with:
> {code}
> 16/05/24 11:52:41 ERROR RBackendHandler: fit on org.apache.spark.ml.r.NaiveBayesWrapper failed
> Error in invokeJava(isStatic = TRUE, className, methodName, ...) :
>   java.lang.IllegalArgumentException: Output column features already exists.
> 	at org.apache.spark.ml.feature.VectorAssembler.transformSchema(VectorAssembler.scala:120)
> 	at org.apache.spark.ml.Pipeline$$anonfun$transformSchema$4.apply(Pipeline.scala:179)
> 	at org.apache.spark.ml.Pipeline$$anonfun$transformSchema$4.apply(Pipeline.scala:179)
> 	at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:57)
> 	at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:66)
> 	at scala.collection.mutable.ArrayOps$ofRef.foldLeft(ArrayOps.scala:186)
> 	at org.apache.spark.ml.Pipeline.transformSchema(Pipeline.scala:179)
> 	at org.apache.spark.ml.PipelineStage.transformSchema(Pipeline.scala:67)
> 	at org.apache.spark.ml.Pipeline.fit(Pipeline.scala:131)
> 	at org.apache.spark.ml.feature.RFormula.fit(RFormula.scala:169)
> 	at org.apache.spark.ml.r.NaiveBayesWrapper$.fit(NaiveBayesWrapper.scala:62)
> 	at org.apache.spark.ml.r.NaiveBayesWrapper.fit(NaiveBayesWrapper.sca
> {code}
> The same issue appears for the "label" column once you rename the "features"
> column.
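
The collision described in the report can be illustrated with a small Spark-free model. This is only a sketch under stated assumptions: it is not Spark's implementation, the names `formula_output_columns` and `try_fit` and the replacement column names `pixels` and `target` are hypothetical, and the only behavior it mirrors is what the report states, namely that the wrapper tries to create a "features" column and then hits the same problem with "label" once "features" is renamed.

```python
# Toy model of the column-name collision from the report; not Spark code.
# All function names here are hypothetical and exist only for illustration.

def formula_output_columns(input_columns, features_col="features", label_col="label"):
    """Return the columns after adding the two output columns.

    Raises ValueError when an output column name is already taken, which
    mimics the "Output column ... already exists." error in the stack trace.
    """
    columns = list(input_columns)
    for output in (features_col, label_col):
        if output in columns:
            raise ValueError(f"Output column {output} already exists.")
        columns.append(output)
    return columns


def try_fit(input_columns):
    """Return the resulting column list, or the error message on a collision."""
    try:
        return formula_output_columns(input_columns)
    except ValueError as err:
        return str(err)


# Columns as produced by the LibSVM loader in the report.
as_loaded = try_fit(["label", "features"])

# After renaming only "features" (to a hypothetical "pixels").
features_renamed = try_fit(["label", "pixels"])

# After renaming both columns (hypothetical names "target" and "pixels").
both_renamed = try_fit(["target", "pixels"])

print(as_loaded)
print(features_renamed)
print(both_renamed)
```

In this model the first call fails on "features", the second on "label", and only the third succeeds, which matches the order of failures the report describes. Whether renaming both columns is a workable workaround in SparkR itself is not confirmed by the report.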