Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1147#discussion_r15676602
  
    --- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/joins.scala ---
    @@ -37,6 +37,135 @@ case object BuildLeft extends BuildSide
     @DeveloperApi
     case object BuildRight extends BuildSide
     
    +/**
    + * Constant values for binary join nodes.
    + */
    +object BinaryJoinNode {
    +  val SINGLE_NULL_LIST = Seq[Row](null)
    +  val EMPTY_NULL_LIST  = Seq[Row]()
    +}
    +
    +// TODO Should null join keys be considered equal? In Hive this is configurable.
    +
    +/**
    + * Outputs the tuples for each matched join group (rows sharing the same join key),
    + * based on the join type. Both input iterators must be repeatable.
    + */
    +trait BinaryRepeatableIteratorNode extends BinaryNode {
    +  self: Product =>
    +
    +  val leftNullRow = new GenericRow(left.output.length)
    +  val rightNullRow = new GenericRow(right.output.length)
    +
    +  val joinedRow = new JoinedRow()
    --- End diff --
    
    I'm worried about having a mutable structure that isn't explicitly 
allocated per partition.  @rxin is doing a lot of work to make us more 
efficient by broadcasting closures per job instead of serializing them per 
task, and I think this could break in subtle ways.
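    
    To illustrate the concern, here is a minimal, self-contained sketch in 
plain Scala (no Spark classes; `MutableJoinedRow` and `joinPartition` are 
hypothetical stand-ins for `JoinedRow` and the per-partition join logic). 
The point is that a reused mutable row should be allocated *inside* the 
per-partition function, so each task gets its own instance even when the 
closure itself is broadcast once per job:
    
    ```scala
    // Hypothetical stand-in for a mutable joined-row buffer that is reused
    // across output tuples. If one instance lived as a field of a closure
    // that is broadcast per job, concurrent tasks could share and clobber it.
    class MutableJoinedRow {
      private var left: Seq[Any] = Seq.empty
      private var right: Seq[Any] = Seq.empty
      def set(l: Seq[Any], r: Seq[Any]): MutableJoinedRow = {
        left = l; right = r; this
      }
      // Materialize a copy of the current contents.
      def values: Seq[Any] = left ++ right
    }
    
    object PerPartitionAllocation {
      // Allocate the mutable row inside the per-partition function, so every
      // invocation (i.e. every task) works on a fresh, private instance.
      def joinPartition(lefts: Seq[Seq[Any]], rights: Seq[Seq[Any]]): Seq[Seq[Any]] = {
        val joinedRow = new MutableJoinedRow() // fresh per partition, not per job
        for (l <- lefts; r <- rights) yield joinedRow.set(l, r).values
      }
    }
    ```
    
    This is only a sketch of the allocation pattern, not Spark's actual 
join implementation; the same idea applies to `leftNullRow`, `rightNullRow`, 
and `joinedRow` above.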


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---
