[jira] [Commented] (SPARK-546) Support full outer join and multiple join in a single shuffle
[ https://issues.apache.org/jira/browse/SPARK-546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14234404#comment-14234404 ]

Reynold Xin commented on SPARK-546:
-----------------------------------

Actually, my experience implementing full join in a single shuffle is that it is fairly complicated and very hard to maintain. Since it is doable entirely in user code, and given that SparkSQL's SchemaRDD already supports it, I suggest not pulling this into Spark core.

> Support full outer join and multiple join in a single shuffle
> -------------------------------------------------------------
>
>                 Key: SPARK-546
>                 URL: https://issues.apache.org/jira/browse/SPARK-546
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core, Streaming
>            Reporter: Reynold Xin
>            Assignee: Aaron Staple
>             Fix For: 1.2.0
>
> RDD[(K,V)] now supports left/right outer join but not full outer join.
> Also it'd be nice to provide a way for users to join multiple RDDs on the
> same key in a single shuffle.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-546) Support full outer join and multiple join in a single shuffle
[ https://issues.apache.org/jira/browse/SPARK-546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14234383#comment-14234383 ]

Thiago Souza commented on SPARK-546:
------------------------------------

What about #2? Did you file a new ticket? I'm quite interested in this!
[jira] [Commented] (SPARK-546) Support full outer join and multiple join in a single shuffle
[ https://issues.apache.org/jira/browse/SPARK-546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14148052#comment-14148052 ]

Aaron Staple commented on SPARK-546:
------------------------------------

Hi, I think there are two features requested in this ticket:

1) full outer join
2) an RDD function to join >2 RDDs in a single shuffle (e.g. a multiJoin function)

I’ve implemented #1 in my recent PR, but not #2. I’m happy to implement #2 as well, though. Would it make sense to reopen this ticket, or should I file a new one?
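To make feature #2 concrete, here is a minimal sketch of what a three-way inner join on a shared key computes. It uses plain Scala collections rather than RDDs, so nothing in it is Spark API; the names MultiJoinSketch and multiJoinLocal are hypothetical and do not come from the ticket or any PR. Each input is grouped by key once, which is only a local analogy for doing the work in one shuffle; how a real multiJoin would be built on RDDs is not specified in this thread.

```scala
// Local model only: Seqs stand in for RDDs, and groupBy stands in for the
// per-key grouping a single shuffle would produce. Names are hypothetical.
object MultiJoinSketch {
  def multiJoinLocal[K, A, B, C](
      xs: Seq[(K, A)],
      ys: Seq[(K, B)],
      zs: Seq[(K, C)]): Seq[(K, (A, B, C))] = {
    // Group every input by key exactly once.
    val gx = xs.groupBy(_._1)
    val gy = ys.groupBy(_._1)
    val gz = zs.groupBy(_._1)
    // Inner join: a key contributes rows only if all three inputs have it.
    for {
      k <- gx.keys.toSeq
      a <- gx(k).map(_._2)
      b <- gy.getOrElse(k, Seq.empty).map(_._2)
      c <- gz.getOrElse(k, Seq.empty).map(_._2)
    } yield (k, (a, b, c))
  }

  def main(args: Array[String]): Unit = {
    val users  = Seq(1 -> "ann", 2 -> "bob")
    val orders = Seq(1 -> 10, 1 -> 11)
    val flags  = Seq(1 -> true, 3 -> false)
    // Only key 1 appears in all three inputs.
    multiJoinLocal(users, orders, flags).foreach(println)
  }
}
```

Chaining two pairwise joins would give the same rows; the point of the request is to avoid regrouping the intermediate result a second time.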
[jira] [Commented] (SPARK-546) Support full outer join and multiple join in a single shuffle
[ https://issues.apache.org/jira/browse/SPARK-546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14060763#comment-14060763 ]

Aaron commented on SPARK-546:
-----------------------------

I created a PR for a full outer join implementation here:
https://github.com/apache/spark/pull/1395

If there is interest I can also implement multiJoin.
[jira] [Commented] (SPARK-546) Support full outer join and multiple join in a single shuffle
[ https://issues.apache.org/jira/browse/SPARK-546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14056355#comment-14056355 ]

sam commented on SPARK-546:
---------------------------

We use a pimp-my-library pattern to add this functionality. Basically here's our code:

    case class OuterJoinableRDD[K: ClassManifest, V1: ClassManifest](rdd: RDD[(K, V1)])
        extends RDDWrapper[(K, V1)] {

      def outerJoin[V2](other: RDD[(K, V2)], numPartitions: Int): RDD[(K, (Option[V1], Option[V2]))] =
        rdd.cogroup(other, new HashPartitioner(numPartitions)).flatMapValues {
          // key present only on the left
          case (v1s, Seq()) => v1s.iterator.map(v1 => (Some(v1), None))
          // key present only on the right
          case (Seq(), v2s) => v2s.iterator.map(v2 => (None, Some(v2)))
          // key present on both sides: emit every pairing
          case (v1s, v2s) => v1s.iterator.flatMap(v1 => v2s.iterator.map(v2 => (Some(v1), Some(v2))))
        }
    }

Hope it helps :)

(disclaimer - code in testing)
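The three cases in the snippet above can be checked without a cluster. The following is a minimal sketch on plain Scala collections, not Spark code: cogroupLocal and fullOuterJoinLocal are hypothetical helpers written for this illustration, with cogroupLocal standing in for what cogroup produces per key.

```scala
// Local model only: nothing below calls Spark. Helper names are hypothetical.
object FullOuterJoinSketch {
  // For every key seen on either side, collect the values from each side.
  def cogroupLocal[K, A, B](
      left: Seq[(K, A)],
      right: Seq[(K, B)]): Seq[(K, (Seq[A], Seq[B]))] = {
    val keys = (left.map(_._1) ++ right.map(_._1)).distinct
    keys.map { k =>
      (k, (left.filter(_._1 == k).map(_._2), right.filter(_._1 == k).map(_._2)))
    }
  }

  def fullOuterJoinLocal[K, A, B](
      left: Seq[(K, A)],
      right: Seq[(K, B)]): Seq[(K, (Option[A], Option[B]))] =
    cogroupLocal(left, right).flatMap { case (k, (ls, rs)) =>
      val pairs: Seq[(Option[A], Option[B])] =
        if (rs.isEmpty) ls.map(l => (Some(l), None))        // left only
        else if (ls.isEmpty) rs.map(r => (None, Some(r)))   // right only
        else for (l <- ls; r <- rs) yield (Some(l), Some(r)) // both sides
      pairs.map(p => (k, p))
    }

  def main(args: Array[String]): Unit = {
    val left  = Seq(1 -> "a", 2 -> "b")
    val right = Seq(2 -> "x", 3 -> "y")
    fullOuterJoinLocal(left, right).sortBy(_._1).foreach(println)
  }
}
```

With these inputs, key 1 comes out with only a left value, key 2 with both, and key 3 with only a right value, which matches the three pattern-match branches in the posted code.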