[ https://issues.apache.org/jira/browse/SPARK-6812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529930#comment-14529930 ]
Sun Rui commented on SPARK-6812:
--------------------------------

Interestingly, we have a unit test case for filter() and the test passes.

In R, if multiple attached packages define the same name, the definition in the most recently attached package masks those in the packages attached earlier. If you use bin/sparkR to start a SparkR shell, the search path is as follows:

 [1] ".GlobalEnv"        "package:stats"     "package:graphics"
 [4] "package:grDevices" "package:datasets"  "package:SparkR"
 [7] "package:utils"     "package:methods"   "Autoloads"
[10] "package:base"

You can see that "package:stats" comes before "package:SparkR", so its filter() function masks the one in SparkR.

In the test procedure, however, the search path is different:

.GlobalEnv package:plyr package:SparkR package:testthat package:methods
package:stats package:graphics package:grDevices package:utils
package:datasets Autoloads package:base

You can see that package:SparkR comes before package:stats. That is why filter() in SparkR passes the test.

I don't know why the package loading order is different now.

> filter() on DataFrame does not work as expected
> -----------------------------------------------
>
>                 Key: SPARK-6812
>                 URL: https://issues.apache.org/jira/browse/SPARK-6812
>             Project: Spark
>          Issue Type: Bug
>          Components: SparkR
>            Reporter: Davies Liu
>            Assignee: Sun Rui
>            Priority: Blocker
>
> {code}
> > filter(df, df$age > 21)
> Error in filter(df, df$age > 21) :
>   no method for coercing this S4 class to a vector
> {code}
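
For anyone who wants to reproduce the masking effect described in the comment without a SparkR installation, here is a minimal sketch in plain R. The two environments and the names "demo:pkg_a" and "demo:pkg_b" are hypothetical stand-ins for two packages that both define filter(); they are not part of SparkR or of the original report.

```r
# Two hypothetical "packages", each with its own filter() definition.
pkg_a <- new.env()
pkg_b <- new.env()
assign("filter", function(...) "filter from pkg_a", envir = pkg_a)
assign("filter", function(...) "filter from pkg_b", envir = pkg_b)

# attach() inserts at position 2 of the search path by default, so the
# entry attached last ends up ahead of the one attached first.
attach(pkg_a, name = "demo:pkg_a")
attach(pkg_b, name = "demo:pkg_b")

# Positions of the two entries on the search path (smaller = searched first).
pos_a <- match("demo:pkg_a", search())
pos_b <- match("demo:pkg_b", search())

# An unqualified call resolves to the first filter() found on the search path.
result <- filter()
print(search())
print(result)

# find() lists every search-path entry that defines the name, in lookup order.
hits <- find("filter")
print(hits)

# Clean up the search path.
detach("demo:pkg_b")
detach("demo:pkg_a")
```

The same lookup rule would explain the report: whichever of package:stats and package:SparkR sits earlier on the search path supplies filter(). As a possible workaround in the shell, a namespace-qualified call such as SparkR::filter(df, df$age > 21) should bypass the search-path lookup, assuming filter is exported by SparkR.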