[ https://issues.apache.org/jira/browse/SPARK-10981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14948299#comment-14948299 ]
Sun Rui edited comment on SPARK-10981 at 10/8/15 8:48 AM:
----------------------------------------------------------

yes, this is a bug in SparkR. your fix looks good. Could you submit a PR for this?
In the PR, please:
1. Support all join types defined in sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/joinTypes.scala (You can remove the "_" char from the currently supported join types in SparkR)
2. Add test cases for missing join types including "leftsemi"


was (Author: sunrui):
yes, this is a bug in SparkR. your fix looks good. Could you submit a PR for this?
In the PR, please:
1. Support all join types defined in sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/joinTypes.scala (You can move the "_" char from the currently supported join types in SparkR)
2. Add test cases for missing join types including "leftsemi"

> R semijoin leads to Java errors, R leftsemi leads to Spark errors
> -----------------------------------------------------------------
>
>                 Key: SPARK-10981
>                 URL: https://issues.apache.org/jira/browse/SPARK-10981
>             Project: Spark
>          Issue Type: Bug
>          Components: R
>    Affects Versions: 1.5.0
>         Environment: SparkR from RStudio on Macbook
>            Reporter: Monica Liu
>            Priority: Minor
>              Labels: easyfix, newbie
>
> I am using SparkR from RStudio, and I ran into an error with the join
> function that I recreated with a smaller example:
> {code:title=joinTest.R|borderStyle=solid}
> Sys.setenv(SPARK_HOME="/Users/liumo1/Applications/spark/")
> .libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
> library(SparkR)
> sc <- sparkR.init("local[4]")
> sqlContext <- sparkRSQL.init(sc)
> n = c(2, 3, 5)
> s = c("aa", "bb", "cc")
> b = c(TRUE, FALSE, TRUE)
> df = data.frame(n, s, b)
> df1= createDataFrame(sqlContext, df)
> showDF(df1)
> x = c(2, 3, 10)
> t = c("dd", "ee", "ff")
> c = c(FALSE, FALSE, TRUE)
> dff = data.frame(x, t, c)
> df2 = createDataFrame(sqlContext, dff)
> showDF(df2)
> res = join(df1, df2, df1$n == df2$x, "semijoin")
> showDF(res)
> {code}
> Running this code, I encountered the error:
> {panel}
> Error in invokeJava(isStatic = FALSE, objId$id, methodName, ...) :
>   java.lang.IllegalArgumentException: Unsupported join type 'semijoin'.
> Supported join types include: 'inner', 'outer', 'full', 'fullouter',
> 'leftouter', 'left', 'rightouter', 'right', 'leftsemi'.
> {panel}
> However, if I changed the joinType to "leftsemi",
> {code}
> res = join(df1, df2, df1$n == df2$x, "leftsemi")
> {code}
> I would get the error:
> {panel}
> Error in .local(x, y, ...) :
>   joinType must be one of the following types: 'inner', 'outer',
> 'left_outer', 'right_outer', 'semijoin'
> {panel}
> Since the join function in R appears to invoke a Java method, I went into
> DataFrame.R and changed the code on line 1374 and line 1378 to change the
> "semijoin" to "leftsemi" to match the Java function's parameters. These also
> make the R joinType accepted values match those of Scala's.
> semijoin:
> {code:title=DataFrame.R: join(x, y, joinExpr, joinType)|borderStyle=solid}
> if (joinType %in% c("inner", "outer", "left_outer", "right_outer",
>                     "semijoin")) {
>   sdf <- callJMethod(x@sdf, "join", y@sdf, joinExpr@jc, joinType)
> }
> else {
>   stop("joinType must be one of the following types: ",
>        "'inner', 'outer', 'left_outer', 'right_outer', 'semijoin'")
> }
> {code}
> leftsemi:
> {code:title=DataFrame.R: join(x, y, joinExpr, joinType)|borderStyle=solid}
> if (joinType %in% c("inner", "outer", "left_outer", "right_outer",
>                     "leftsemi")) {
>   sdf <- callJMethod(x@sdf, "join", y@sdf, joinExpr@jc, joinType)
> }
> else {
>   stop("joinType must be one of the following types: ",
>        "'inner', 'outer', 'left_outer', 'right_outer', 'leftsemi'")
> }
> {code}
> This fixed the issue, but I'm not sure if this solution breaks hive
> compatibility or causes other issues, but I can submit a pull request to
> change this
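For illustration only, here is a minimal sketch of the check the comment asks for: accept every join type named in the Java error message quoted in the issue, with any "_" removed before validating. The helper name `normalize_join_type` and the vector `supported_join_types` are hypothetical and not part of SparkR. Lowercasing the input is an extra assumption of this sketch, not something the issue asks for. The block uses base R only, so it runs without a Spark installation.

```r
# Join types copied from the IllegalArgumentException message quoted in the issue.
supported_join_types <- c("inner", "outer", "full", "fullouter",
                          "leftouter", "left", "rightouter", "right",
                          "leftsemi")

# Hypothetical helper (not a SparkR function): strip underscores, lowercase
# (an assumption of this sketch), then validate against the list above.
normalize_join_type <- function(joinType) {
  normalized <- tolower(gsub("_", "", joinType, fixed = TRUE))
  if (!(normalized %in% supported_join_types)) {
    stop("joinType must be one of the following types: ",
         paste0("'", supported_join_types, "'", collapse = ", "))
  }
  normalized
}
```

With this sketch, `normalize_join_type("left_outer")` returns `"leftouter"` and `normalize_join_type("leftsemi")` returns `"leftsemi"`, while `normalize_join_type("semijoin")` stops on the R side with the list of accepted types. In `join()`, the normalized value would presumably be what gets passed to `callJMethod`, but that wiring is left out here because it depends on how the actual PR is written.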