[ https://issues.apache.org/jira/browse/SPARK-15509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15304777#comment-15304777 ]
Xin Ren commented on SPARK-15509:
---------------------------------

Hi [~josephkb], I tried many times but cannot reproduce your error message here. I tried the R naiveBayes package (e1071) and also spark.naiveBayes, but both failed; the e1071 call gives
{code}
naiveBayes formula interface handles data frames or arrays only
{code}
Below is what I did:
{code}
./bin/sparkR --master "local[2]"
> training <- loadDF(sqlContext, "data/mllib/sample_libsvm_data.txt", "libsvm")
> model <- spark.naiveBayes(label ~ features, training)
Error in (function (classes, fdef, mtable) :
  unable to find an inherited method for function ‘spark.naiveBayes’ for signature ‘"formula", "SparkDataFrame"’
> model <- naiveBayes(label ~ features, training)
Error in naiveBayes.formula(label ~ features, training) :
  naiveBayes formula interface handles data frames or arrays only
{code}
Then I tried the example from http://spark.apache.org/docs/latest/sparkr.html#gaussian-glm-model and it works:
{code}
df <- createDataFrame(sqlContext, iris)
model <- glm(Sepal_Length ~ Sepal_Width + Species, data = df, family = "gaussian")
{code}
Comparing these two examples, the schemas differ: the "features" column in training has vector type, while df has only ordinary scalar columns.
{code}
> df
SparkDataFrame[Sepal_Length:double, Sepal_Width:double, Petal_Length:double, Petal_Width:double, Species:string]
> training
SparkDataFrame[label:double, features:vector]
{code}
I also downloaded the "mnist" LibSVM dataset from https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass.html#mnist and got the same error.

Is there anything I'm doing wrong? I'm using the naiveBayes from the e1071 R package (http://www.inside-r.org/packages/cran/e1071/docs/naivebayes); maybe I'm using the wrong package? Thank you very much, Joseph.
> R MLlib algorithms should support input columns "features" and "label"
> ----------------------------------------------------------------------
>
>                 Key: SPARK-15509
>                 URL: https://issues.apache.org/jira/browse/SPARK-15509
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML, SparkR
>            Reporter: Joseph K. Bradley
>
> Currently in SparkR, when you load a LibSVM dataset using the sqlContext and
> then pass it to an MLlib algorithm, the ML wrappers will fail since they will
> try to create a "features" column, which conflicts with the existing
> "features" column from the LibSVM loader. E.g., using the "mnist" dataset
> from LibSVM:
> {code}
> training <- loadDF(sqlContext, ".../mnist", "libsvm")
> model <- naiveBayes(label ~ features, training)
> {code}
> This fails with:
> {code}
> 16/05/24 11:52:41 ERROR RBackendHandler: fit on
> org.apache.spark.ml.r.NaiveBayesWrapper failed
> Error in invokeJava(isStatic = TRUE, className, methodName, ...) :
>   java.lang.IllegalArgumentException: Output column features already exists.
>   at org.apache.spark.ml.feature.VectorAssembler.transformSchema(VectorAssembler.scala:120)
>   at org.apache.spark.ml.Pipeline$$anonfun$transformSchema$4.apply(Pipeline.scala:179)
>   at org.apache.spark.ml.Pipeline$$anonfun$transformSchema$4.apply(Pipeline.scala:179)
>   at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:57)
>   at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:66)
>   at scala.collection.mutable.ArrayOps$ofRef.foldLeft(ArrayOps.scala:186)
>   at org.apache.spark.ml.Pipeline.transformSchema(Pipeline.scala:179)
>   at org.apache.spark.ml.PipelineStage.transformSchema(Pipeline.scala:67)
>   at org.apache.spark.ml.Pipeline.fit(Pipeline.scala:131)
>   at org.apache.spark.ml.feature.RFormula.fit(RFormula.scala:169)
>   at org.apache.spark.ml.r.NaiveBayesWrapper$.fit(NaiveBayesWrapper.scala:62)
>   at org.apache.spark.ml.r.NaiveBayesWrapper.fit(NaiveBayesWrapper.sca
> {code}
> The same issue appears for the "label" column once you rename the "features"
> column.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
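For readers following the trace above, the failure boils down to a name clash: the wrapper's RFormula pipeline hard-codes an output column named "features", and VectorAssembler.transformSchema refuses to add a column that already exists, which it always does for LibSVM-loaded data. Below is a minimal sketch of that check and one possible avoidance strategy (generating a fresh column name). This is an illustration in Python with hypothetical helper names, not Spark's actual implementation:

```python
# Simplified model of the "Output column ... already exists" check, plus a
# wrapper-side workaround: derive a temporary column name that does not
# collide with the input schema. (Hypothetical sketch, not Spark code.)

def transform_schema(schema, output_col):
    """Mimic the conflict check: refuse to add a column that already exists."""
    if output_col in schema:
        raise ValueError("Output column %s already exists." % output_col)
    return schema + [output_col]

def fresh_column_name(schema, base):
    """Pick a name not present in the schema by appending a counter."""
    name = base
    i = 0
    while name in schema:
        name = "%s_%d" % (base, i)
        i += 1
    return name

# A LibSVM-loaded DataFrame already has "label" and "features" columns:
libsvm_schema = ["label", "features"]

# The hard-coded output name clashes ...
try:
    transform_schema(libsvm_schema, "features")
except ValueError as e:
    print(e)  # Output column features already exists.

# ... while a freshly generated name does not:
print(fresh_column_name(libsvm_schema, "features"))  # features_0
```

Under this scheme the wrapper could assemble into the generated name (or simply reuse the existing vector column), which is the kind of behavior the issue title asks for.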