[jira] [Commented] (SPARK-9318) Add `merge` as synonym for join

2015-10-06 Thread Hossein Falaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14945835#comment-14945835
 ] 

Hossein Falaki commented on SPARK-9318:
---

I agree with the issue being discussed. SparkR should have called this 
signature of join:
{code}
def join(right: DataFrame, usingColumns: Seq[String]): DataFrame
{code}

This version of DataFrame.join makes sure only a single join column is 
returned. Right now the join (and merge) behavior in SparkR is not what R users 
expect.
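
For readers unfamiliar with the R side of this expectation, here is a minimal base R sketch. It uses plain data.frames rather than SparkR DataFrames, and the data (`left`, `right`) is made up for illustration. It only shows that base R's merge() returns the join column once:

```r
# Base R only; no Spark session is needed for this illustration.
# `left` and `right` are hypothetical example data, not from the thread.
left  <- data.frame(id = c(1, 2, 3), score = c(10, 20, 30))
right <- data.frame(id = c(2, 3, 4), label = c("b", "c", "d"))

# merge() joins on the shared column "id" and returns that column once.
merged <- merge(left, right, by = "id")

print(names(merged))  # "id" "score" "label"
print(nrow(merged))   # 2 (ids 2 and 3 appear on both sides)
```

This is the behavior that the usingColumns variant of DataFrame.join is described above as matching.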

> Add `merge` as synonym for join
> -------------------------------
>
> Key: SPARK-9318
> URL: https://issues.apache.org/jira/browse/SPARK-9318
> Project: Spark
> Issue Type: Sub-task
> Components: SparkR
> Reporter: Shivaram Venkataraman
> Assignee: Hossein Falaki
> Fix For: 1.5.0




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9318) Add `merge` as synonym for join

2015-10-06 Thread Shivaram Venkataraman (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14945402#comment-14945402
 ] 

Shivaram Venkataraman commented on SPARK-9318:
--

[~Narine] Could you post general questions and issues with `join` to the user 
or dev mailing list? That way all the devs can respond.




[jira] [Commented] (SPARK-9318) Add `merge` as synonym for join

2015-10-05 Thread Narine Kokhlikyan (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14943784#comment-14943784
 ] 

Narine Kokhlikyan commented on SPARK-9318:
--

Hi all,

[~shivaram], [~falaki], 
I am working on the new signature for merge and have noticed that join in 
general has serious issues.
I took one of the examples from the base R merge documentation:
https://stat.ethz.ch/R-manual/R-devel/library/base/html/merge.html

x <- data.frame(k1 = c(NA,NA,3,4,5), k2 = c(1,NA,NA,4,5), data = 1:5)
y <- data.frame(k1 = c(NA,2,NA,4,5), k2 = c(NA,NA,3,4,5), data = 1:5)

I want to join these two data frames: res <- join(xdf,ydf)

res has the following structure:
DataFrame[k1:double, k2:double, data:int, k1:double, k2:double, data:int]

but when I do head(res) I get the following:
  k1 k2 data
1 NA NA    1
2  2 NA    2
3 NA  3    3
4  4  4    4
5  5  5    5
6 NA NA    1

This is not what I was expecting. The structure is inconsistent with the 
content/data I see with head.

I tried to put aliases for those columns which have the same names for both 
data frames with: 

ydfsel <- select(ydf, alias(ydf$k1,"k1.y"), alias(ydf$k2,"k2.y"), 
alias(ydf$data,"data.y"))
xdfsel <- select(xdf, alias(xdf$k1,"k1.x"), alias(xdf$k2,"k2.x"), 
alias(xdf$data,"data.x"))

This works, and join(xdfsel, ydfsel) also works,

but the following fails:
join(xdfsel,ydfsel,xdfsel$k1.x==ydfsel$k1.y)

Does this mean that I cannot refer to an aliased column?

Do you know what the issue is here?

Thanks,
Narine



 




[jira] [Commented] (SPARK-9318) Add `merge` as synonym for join

2015-10-05 Thread Deborah Siegel (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14943810#comment-14943810
 ] 

Deborah Siegel commented on SPARK-9318:
---

Narine, just want to offer that I haven't replicated that problem. 
 
x <- data.frame(k1 = c(NA,NA,3,4,5), k2 = c(1,NA,NA,4,5), data = 1:5)
y <- data.frame(k1 = c(NA,2,NA,4,5), k2 = c(NA,NA,3,4,5), data = 1:5)
xdf <- createDataFrame(sqlContext, x) 
ydf <- createDataFrame(sqlContext, y) 
res <- join(xdf,ydf)
head(res)
  k1 k2 data k1 k2 data
1 NA  1    1 NA NA    1
2 NA  1    1  2 NA    2
3 NA  1    1 NA  3    3
4 NA  1    1  4  4    4
5 NA  1    1  5  5    5
6 NA NA    2 NA NA    1

> printSchema(res)
root
 |-- k1: double (nullable = true)
 |-- k2: double (nullable = true)
 |-- data: integer (nullable = true)
 |-- k1: double (nullable = true)
 |-- k2: double (nullable = true)
 |-- data: integer (nullable = true)
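
The head() output above pairs the first row of x with every row of y, which is what a Cartesian product looks like. As a rough base R analogue (an illustration on plain data.frames, not SparkR code and not a statement about SparkR internals), merge() with by = NULL also produces a Cartesian product and keeps all six columns:

```r
# Same example data as in the thread, as plain base R data.frames.
x <- data.frame(k1 = c(NA, NA, 3, 4, 5), k2 = c(1, NA, NA, 4, 5), data = 1:5)
y <- data.frame(k1 = c(NA, 2, NA, 4, 5), k2 = c(NA, NA, 3, 4, 5), data = 1:5)

# by = NULL asks merge() for the Cartesian product of the two data frames.
cross <- merge(x, y, by = NULL)

print(dim(cross))    # 25 rows, 6 columns
print(names(cross))  # shared names are disambiguated with .x / .y suffixes
```

Row order in base R may differ from what Spark returns, so only the row and column counts are comparable.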




[jira] [Commented] (SPARK-9318) Add `merge` as synonym for join

2015-10-05 Thread Narine Kokhlikyan (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14943988#comment-14943988
 ] 

Narine Kokhlikyan commented on SPARK-9318:
--

printSchema shows the correct schema for me too. Only the head function returns 
an unexpected result.




[jira] [Commented] (SPARK-9318) Add `merge` as synonym for join

2015-10-05 Thread Narine Kokhlikyan (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14943986#comment-14943986
 ] 

Narine Kokhlikyan commented on SPARK-9318:
--

Hi [~dsiegel], thanks for checking it.
Was there a recent fix related to that?
Also, have you tried the aliases? Do they work for you?




[jira] [Commented] (SPARK-9318) Add `merge` as synonym for join

2015-10-05 Thread Deborah Siegel (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14944007#comment-14944007
 ] 

Deborah Siegel commented on SPARK-9318:
---

Not sure about the fix. I tried this on 1.5.0 and 1.5.1 and got the same results. 

Regarding the alias column, the issue is that "." in the schema is being 
converted to "_" behind the scenes. This happens automatically when 
createDataFrame is used. With alias, however, the name does not seem to be 
converted, while the select is looking for the converted name. 

This works:
ydfsel <- select(ydf, alias(ydf$k1,"k1_y"), alias(ydf$k2,"k2_y"), 
alias(ydf$data,"data_y"))
xdfsel <- select(xdf, alias(xdf$k1,"k1_x"), alias(xdf$k2,"k2_x"), 
alias(xdf$data,"data_x"))
res3 <- join(xdfsel,ydfsel,xdfsel$k1_x==ydfsel$k1_y)
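
As a small illustration of the renaming described above, here is a base R sketch that maps "." to "_" in a vector of column names. The helper `sanitize_names` is hypothetical and is not a SparkR function; it only mirrors the conversion described in this comment, not how SparkR implements it:

```r
# Hypothetical helper (not part of SparkR): replace every "." with "_".
# fixed = TRUE makes gsub() treat "." literally rather than as a regex wildcard.
sanitize_names <- function(nms) gsub(".", "_", nms, fixed = TRUE)

aliases <- c("k1.x", "k2.x", "data.x")
print(sanitize_names(aliases))  # "k1_x" "k2_x" "data_x"
```

Choosing aliases that already avoid "." (as in the working example above) sidesteps the mismatch.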






[jira] [Commented] (SPARK-9318) Add `merge` as synonym for join

2015-10-05 Thread Narine Kokhlikyan (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14944056#comment-14944056
 ] 

Narine Kokhlikyan commented on SPARK-9318:
--

I asked other people to try this, and they all see:
  k1 k2 data
1 NA NA    1
2  2 NA    2
3 NA  3    3
4  4  4    4
5  5  5    5
6 NA NA    1


We just ran:
x <- data.frame(k1 = c(NA,NA,3,4,5), k2 = c(1,NA,NA,4,5), data = 1:5)
y <- data.frame(k1 = c(NA,2,NA,4,5), k2 = c(NA,NA,3,4,5), data = 1:5)
xdf <- createDataFrame(sqlContext, x)
ydf <- createDataFrame(sqlContext, y)
res <- join(xdf,ydf)
head(res)

Can anyone else try this?

[~olarayej] [~shivaram]







[jira] [Commented] (SPARK-9318) Add `merge` as synonym for join

2015-09-28 Thread Shivaram Venkataraman (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14934484#comment-14934484
 ] 

Shivaram Venkataraman commented on SPARK-9318:
--

Sure. Feel free to send a PR and cc [~falaki] on it as well. 




[jira] [Commented] (SPARK-9318) Add `merge` as synonym for join

2015-09-28 Thread Narine Kokhlikyan (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14934376#comment-14934376
 ] 

Narine Kokhlikyan commented on SPARK-9318:
--

Hi guys, 
can we reopen this issue, Shivaram Venkataraman?

The reason is that merge should follow the signature of base R's merge: 
merge(x, y, by = intersect(names(x), names(y)),
  by.x = by, by.y = by, all = FALSE, all.x = all, all.y = all,
  sort = TRUE, suffixes = c(".x",".y"),
  incomparables = NULL, ...)

I'm working on this and will open a pull request soon.

Thanks,
Narine
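
To make the quoted signature concrete, here is a minimal base R sketch of the `by` and `suffixes` arguments. It runs on plain data.frames, not SparkR DataFrames, and uses the example data from the base R merge() documentation that appears elsewhere in this thread:

```r
# Example data from the base R merge() documentation.
x <- data.frame(k1 = c(NA, NA, 3, 4, 5), k2 = c(1, NA, NA, 4, 5), data = 1:5)
y <- data.frame(k1 = c(NA, 2, NA, 4, 5), k2 = c(NA, NA, 3, 4, 5), data = 1:5)

# Join on k1 only. The key column appears once; the other columns that
# exist on both sides are disambiguated with the given suffixes.
m <- merge(x, y, by = "k1", suffixes = c(".x", ".y"))

print(names(m))  # "k1" "k2.x" "data.x" "k2.y" "data.y"
print(nrow(m))   # 6, because merge() matches NA keys with each other by default
```

Passing incomparables = NA would stop NA keys from matching each other, which is one of the details a SparkR merge would need to decide on.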





[jira] [Commented] (SPARK-9318) Add `merge` as synonym for join

2015-07-30 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648387#comment-14648387
 ] 

Apache Spark commented on SPARK-9318:
-

User 'falaki' has created a pull request for this issue:
https://github.com/apache/spark/pull/7806
