[ 
https://issues.apache.org/jira/browse/SPARK-10981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Monica Liu updated SPARK-10981:
-------------------------------
    Description: 
I am using SparkR from RStudio and ran into an error with the join function, 
which I reproduced with a smaller example:

{code:title=joinTest.R|borderStyle=solid}
Sys.setenv(SPARK_HOME = "/Users/liumo1/Applications/spark/")
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
library(SparkR)
sc <- sparkR.init("local[4]")
sqlContext <- sparkRSQL.init(sc)

n <- c(2, 3, 5)
s <- c("aa", "bb", "cc")
b <- c(TRUE, FALSE, TRUE)
df <- data.frame(n, s, b)
df1 <- createDataFrame(sqlContext, df)
showDF(df1)

x <- c(2, 3, 10)
t <- c("dd", "ee", "ff")
c <- c(FALSE, FALSE, TRUE)
dff <- data.frame(x, t, c)
df2 <- createDataFrame(sqlContext, dff)
showDF(df2)

res <- join(df1, df2, df1$n == df2$x, "semijoin")
showDF(res)
{code}

Running this code, I encountered this error:
{panel}
Error in invokeJava(isStatic = FALSE, objId$id, methodName, ...) : 
  java.lang.IllegalArgumentException: Unsupported join type 'semijoin'. 
Supported join types include: 'inner', 'outer', 'full', 'fullouter', 
'leftouter', 'left', 'rightouter', 'right', 'leftsemi'.
{panel}

However, if I changed the joinType to "leftsemi":
{code}
res <- join(df1, df2, df1$n == df2$x, "leftsemi")
{code}

I would get a different error:
{panel}
Error in .local(x, y, ...) : 
  joinType must be one of the following types: 'inner', 'outer', 'left_outer', 
'right_outer', 'semijoin'
{panel}

Since the join function in R invokes a Java method, I went into DataFrame.R 
and changed "semijoin" to "leftsemi" on lines 1374 and 1378 so that the value 
matches what the Java method accepts. This also makes the joinType values 
accepted by R match those accepted by Scala.

Before (semijoin):
{code:title=DataFrame.R: join(x, y, joinExpr, joinType)|borderStyle=solid}
if (joinType %in% c("inner", "outer", "left_outer", "right_outer", "semijoin")) {
    sdf <- callJMethod(x@sdf, "join", y@sdf, joinExpr@jc, joinType)
} else {
    stop("joinType must be one of the following types: ",
         "'inner', 'outer', 'left_outer', 'right_outer', 'semijoin'")
}
{code}

After (leftsemi):
{code:title=DataFrame.R: join(x, y, joinExpr, joinType)|borderStyle=solid}
if (joinType %in% c("inner", "outer", "left_outer", "right_outer", "leftsemi")) {
    sdf <- callJMethod(x@sdf, "join", y@sdf, joinExpr@jc, joinType)
} else {
    stop("joinType must be one of the following types: ",
         "'inner', 'outer', 'left_outer', 'right_outer', 'leftsemi'")
}
{code}

This fixed the issue for me, but I'm not sure whether the change breaks Hive 
compatibility or causes other problems. I can submit a pull request with this 
change.
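For reference, the underlying incompatibility is a plain string-validation mismatch: the R wrapper and the JVM side each check the joinType against a different list of accepted names. A minimal sketch of how the two lists could be reconciled by normalizing R-style aliases before the JVM call, shown in Python purely for illustration (the function and table names below are mine, not SparkR's actual code; the two value lists are taken from the error messages above):

```python
# Canonical join-type names accepted by Spark's JVM side, per the
# IllegalArgumentException message quoted in this report.
JVM_JOIN_TYPES = {"inner", "outer", "full", "fullouter",
                  "leftouter", "left", "rightouter", "right", "leftsemi"}

# R-style names (per the .local() error message) mapped to JVM names.
# This table is illustrative, not SparkR's actual implementation.
ALIASES = {
    "left_outer": "leftouter",
    "right_outer": "rightouter",
    "semijoin": "leftsemi",
}

def normalize_join_type(join_type):
    """Translate an R-style alias to its JVM name, or reject it."""
    canonical = ALIASES.get(join_type, join_type)
    if canonical not in JVM_JOIN_TYPES:
        raise ValueError("Unsupported join type '%s'" % join_type)
    return canonical
```

With such a shim, both "semijoin" and "leftsemi" would be accepted and forwarded to the JVM as "leftsemi", avoiding the situation where each layer rejects the other's spelling.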



> R semijoin leads to Java errors, R leftsemi leads to Spark errors
> -----------------------------------------------------------------
>
>                 Key: SPARK-10981
>                 URL: https://issues.apache.org/jira/browse/SPARK-10981
>             Project: Spark
>          Issue Type: Bug
>          Components: R
>    Affects Versions: 1.5.0
>         Environment: SparkR from RStudio on Macbook
>            Reporter: Monica Liu
>            Priority: Minor
>              Labels: easyfix, newbie
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
