[jira] [Commented] (SPARK-546) Support full outer join and multiple join in a single shuffle

2014-12-04 Thread Reynold Xin (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14234404#comment-14234404 ]

Reynold Xin commented on SPARK-546:
---

Actually, my experience implementing full join in a single shuffle is that it is
fairly complicated and very hard to maintain. Since it is doable entirely in
user code, and given that Spark SQL's SchemaRDD already supports it, I suggest
not pulling this into Spark core.
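To illustrate the "doable in user code" point for the multiple-join half of the ticket, here is a minimal sketch of a three-way inner join that groups all inputs by key in one co-grouping step and then expands each key's buckets. It assumes plain Scala collections standing in for RDDs, so it runs without a cluster. In Spark, the grouping step would correspond to a single `cogroup` call (which has an overload taking two other RDDs), i.e. one shuffle. The names `ThreeWayJoin`, `cogroup3`, and `join3` are hypothetical.

```scala
object ThreeWayJoin {
  // Bucket one input's values by key.
  private def byKey[K, V](kvs: Seq[(K, V)]): Map[K, Seq[V]] =
    kvs.groupBy(_._1).map { case (k, group) => k -> group.map(_._2) }

  // Co-group three inputs: for every key seen in any input, the values each input has for it.
  // In Spark this step would be one cogroup, i.e. a single shuffle.
  def cogroup3[K, A, B, C](a: Seq[(K, A)], b: Seq[(K, B)], c: Seq[(K, C)]): Map[K, (Seq[A], Seq[B], Seq[C])] = {
    val (ga, gb, gc) = (byKey(a), byKey(b), byKey(c))
    (ga.keySet ++ gb.keySet ++ gc.keySet).iterator.map { k =>
      k -> ((ga.getOrElse(k, Seq.empty), gb.getOrElse(k, Seq.empty), gc.getOrElse(k, Seq.empty)))
    }.toMap
  }

  // Inner join: a key missing from any input has an empty bucket, so the
  // for-comprehension yields nothing for it; otherwise emit every combination.
  def join3[K, A, B, C](a: Seq[(K, A)], b: Seq[(K, B)], c: Seq[(K, C)]): Seq[(K, (A, B, C))] =
    cogroup3(a, b, c).toSeq.flatMap { case (k, (xs, ys, zs)) =>
      for (x <- xs; y <- ys; z <- zs) yield (k, (x, y, z))
    }
}
```

Switching the expansion to emit `Option`s for empty buckets would give outer-join variants from the same single grouping step.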

> Support full outer join and multiple join in a single shuffle
> -
>
> Key: SPARK-546
> URL: https://issues.apache.org/jira/browse/SPARK-546
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core, Streaming
> Reporter: Reynold Xin
> Assignee: Aaron Staple
> Fix For: 1.2.0
>
>
> RDD[(K,V)] now supports left/right outer join but not full outer join.
> Also it'd be nice to provide a way for users to join multiple RDDs on the 
> same key in a single shuffle.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-546) Support full outer join and multiple join in a single shuffle

2014-12-04 Thread Thiago Souza (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14234383#comment-14234383 ]

Thiago Souza commented on SPARK-546:


What about #2? Did you file a new ticket?

I'm quite interested in this!




[jira] [Commented] (SPARK-546) Support full outer join and multiple join in a single shuffle

2014-09-25 Thread Aaron Staple (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14148052#comment-14148052 ]

Aaron Staple commented on SPARK-546:


Hi, I think there are two features requested in this ticket:

1) full outer join
2) an RDD function to join more than two RDDs in a single shuffle (e.g., a multiJoin function)

I’ve implemented #1 in my recent PR, but not #2. I’m happy to implement #2 as
well, though.

Would it make sense to reopen this ticket, or to file a new one?
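A sketch of what the proposed multiJoin (#2) might look like, generalized to any number of inputs and combined with full outer semantics (#1). The signature is hypothetical, not Aaron's implementation or a Spark API: it assumes all inputs share one value type and uses local Scala collections in place of RDDs. Each output row has one slot per input, holding `Some(value)`, or `None` when that input has no record for the key.

```scala
object MultiJoinSketch {
  // Hypothetical multiJoin: full outer join of any number of keyed inputs
  // sharing a value type. Slot i of each row comes from input i.
  def multiJoin[K, V](inputs: Seq[Seq[(K, V)]]): Seq[(K, Seq[Option[V]])] = {
    // One grouping step: for each input, key -> that input's values.
    val grouped: Seq[Map[K, Seq[V]]] =
      inputs.map(in => in.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2) })
    val keys = grouped.flatMap(_.keySet).distinct
    keys.flatMap { k =>
      // An input with no record for k contributes a single None slot;
      // otherwise it contributes one Some per value.
      val perInput: Seq[Seq[Option[V]]] = grouped.map { g =>
        g.get(k) match {
          case Some(vs) => vs.map(Some(_))
          case None     => Seq(None)
        }
      }
      // Cross product across inputs, extending each partial row one input at a time.
      perInput
        .foldLeft(Seq(Seq.empty[Option[V]])) { (rows, options) =>
          for (row <- rows; o <- options) yield row :+ o
        }
        .map(row => (k, row))
    }
  }
}
```

Requiring a shared value type keeps the signature simple for N inputs; heterogeneous value types would instead need fixed-arity overloads (as `cogroup` has) or an untyped row representation.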




[jira] [Commented] (SPARK-546) Support full outer join and multiple join in a single shuffle

2014-07-14 Thread Aaron (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14060763#comment-14060763 ]

Aaron commented on SPARK-546:
-

I created a PR for a full outer join implementation here:
https://github.com/apache/spark/pull/1395

If there is interest, I can also implement multiJoin.



[jira] [Commented] (SPARK-546) Support full outer join and multiple join in a single shuffle

2014-07-09 Thread sam (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14056355#comment-14056355 ]

sam commented on SPARK-546:
---

We use the pimp-my-library pattern to add this functionality. Here is essentially
our code:

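import scala.reflect.ClassTag
import org.apache.spark.HashPartitioner
import org.apache.spark.rdd.RDD
// On Spark releases before 1.3, also import org.apache.spark.SparkContext._ for the pair-RDD implicits.

object OuterJoinSyntax {
  // Enrichment ("pimp my library"): adds outerJoin to any RDD[(K, V1)].
  implicit class OuterJoinableRDD[K: ClassTag, V1: ClassTag](rdd: RDD[(K, V1)]) {
    // Full outer join built on a single cogroup, i.e. one shuffle.
    def outerJoin[V2](other: RDD[(K, V2)], numPartitions: Int): RDD[(K, (Option[V1], Option[V2]))] =
      rdd.cogroup(other, new HashPartitioner(numPartitions)).flatMapValues[(Option[V1], Option[V2])] {
        // Key only on the left side.
        case (v1s, v2s) if v2s.isEmpty => v1s.iterator.map(v1 => (Some(v1), None))
        // Key only on the right side.
        case (v1s, v2s) if v1s.isEmpty => v2s.iterator.map(v2 => (None, Some(v2)))
        // Key on both sides: every (left, right) combination.
        case (v1s, v2s) => v1s.iterator.flatMap(v1 => v2s.iterator.map(v2 => (Some(v1), Some(v2))))
      }
  }
}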

Hope it helps :) (Disclaimer: this code is still being tested.)
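Because the per-key logic inside the `flatMapValues` block does not depend on Spark, one option (an assumption about how it could be tested, not part of sam's code) is to factor it into a pure function and check the three cases (left only, right only, both sides) on local collections, without a SparkContext. The object and method names are hypothetical:

```scala
object OuterJoinExpansion {
  // Full-outer-join rows for one key, given every left (v1s) and right (v2s)
  // value that cogroup collected for that key.
  def expand[V1, V2](v1s: Iterable[V1], v2s: Iterable[V2]): Iterator[(Option[V1], Option[V2])] =
    if (v2s.isEmpty) v1s.iterator.map(v1 => (Some(v1), None))       // left only
    else if (v1s.isEmpty) v2s.iterator.map(v2 => (None, Some(v2)))  // right only
    else v1s.iterator.flatMap(v1 => v2s.iterator.map(v2 => (Some(v1), Some(v2)))) // both sides
}
```

The `flatMapValues` call could then simply be `.flatMapValues { case (v1s, v2s) => OuterJoinExpansion.expand(v1s, v2s) }`; for example, `expand(Seq(1), Seq("x", "y"))` yields `(Some(1), Some("x"))` and `(Some(1), Some("y"))`.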
