[ 
https://issues.apache.org/jira/browse/ARROW-15590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17530201#comment-17530201
 ] 

Weston Pace edited comment on ARROW-15590 at 4/29/22 7:16 PM:
--------------------------------------------------------------

The Substrait spec (the website) doesn't always match the .proto yet.  This is 
not a great thing but it's a work in progress.  Feel free to open some PRs 
against the site if you want.  In the meantime I find it easier to work with 
the proto:

{noformat}
message JoinRel {
  RelCommon common = 1;
  Rel left = 2;
  Rel right = 3;
  Expression expression = 4;
  Expression post_join_filter = 5;

  JoinType type = 6;
  ...
}
{noformat}

The {{post_join_filter}} is not on the site today and should match 
{{HashJoinNodeOptions::filter}}.

{{LeftInput}} and {{RightInput}} correspond to the inputs specified when adding 
a join to a plan and so they aren't in {{HashJoinNodeOptions}}:

{noformat}
MakeExecNode("hashjoin", plan.get(), {LeftInput, RightInput}, join_options));
{noformat}

You are correct that we do not handle expressions in general for the join 
condition.  So I think the best thing to do here initially is restrict the set 
of allowed plans.  If the expression is not a call then reject it.  If the 
expression is a call then it must be one of two functions, "equal" or 
"is_not_distinct_from".  In either case the function has two arguments.  Both 
arguments must be a {{FieldReference}}.  We can convert from a Substrait 
{{FieldReference}} to an Arrow {{FieldRef}} and so that will give you left keys 
and right keys.  There is an Arrow options {{HashJoinNodeOptions::key_cmp}}.  
If the Substrait function is "equal" then use {{JoinKeyCmp::Eq}}.  If the 
Substrait function is "is_not_distinct_from" then use {{JoinKeyCmp::Is}}.

With the above approach you will always have exactly one left key, one right 
key, and one join type.

Later (could be in this PR or a follow-up) we can also handle expressions that 
are an and'ed set of equality expressions:

{noformat}
and(equal(field(3),field(5)), equal(field(1),field(7)), equal(field(2), 
field(12)))
{noformat}

In this case the number of keys/join types you have would depend on the number 
of equality expressions in the and (3 in the above example).


was (Author: westonpace):
The Substrait spec (the website) doesn't always match the .proto yet.  This is 
not a great thing but it's a work in progress.  Feel free to open some PRs 
against the site if you want.  In the meantime I find it easier to work with 
the proto:

```
message JoinRel {
  RelCommon common = 1;
  Rel left = 2;
  Rel right = 3;
  Expression expression = 4;
  Expression post_join_filter = 5;

  JoinType type = 6;
  ...
}
```

The {{post_join_filter}} is not on the site today and should match 
{{HashJoinNodeOptions::filter}}.

{{LeftInput}} and {{RightInput}} correspond to the inputs specified when adding 
a join to a plan and so they aren't in {{HashJoinNodeOptions}}:

```
MakeExecNode("hashjoin", plan.get(), {LeftInput, RightInput}, join_options));
```

You are correct that we do not handle expressions in general for the join 
condition.  So I think the best thing to do here initially is restrict the set 
of allowed plans.  If the expression is not a call then reject it.  If the 
expression is a call then it must be one of two functions, "equal" or 
"is_not_distinct_from".  In either case the function has two arguments.  Both 
arguments must be a {{FieldReference}}.  We can convert from a Substrait 
{{FieldReference}} to an Arrow {{FieldRef}} and so that will give you left keys 
and right keys.  There is an Arrow options {{HashJoinNodeOptions::key_cmp}}.  
If the Substrait function is "equal" then use {{JoinKeyCmp::Eq}}.  If the 
Substrait function is "is_not_distinct_from" then use {{JoinKeyCmp::Is}}.

With the above approach you will always have exactly one left key, one right 
key, and one join type.

Later (could be in this PR or a follow-up) we can also handle expressions that 
are an and'ed set of equality expressions:

{noformat}
and(equal(field(3),field(5)), equal(field(1),field(7)), equal(field(2), 
field(12)))
{noformat}

In this case the number of keys/join types you have would depend on the number 
of equality expressions in the and (3 in the above example).

> [C++] Add support for joins to the Substrait consumer
> -----------------------------------------------------
>
>                 Key: ARROW-15590
>                 URL: https://issues.apache.org/jira/browse/ARROW-15590
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Weston Pace
>            Assignee: Vibhatha Lakmal Abeykoon
>            Priority: Major
>              Labels: substrait
>
> The streaming execution engine supports joins.  The Substrait consumer does 
> not currently consume joins.  We should add support for this.  We may want to 
> split this PR into subtasks as there are many different kinds of joins and we 
> may not support all of them immediately.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

Reply via email to