Re: why BroadcastHashJoinExec is not implemented with outputOrdering?

2018-06-29 Thread Marco Gaido
Yes, I'd say so.

2018-06-29 4:43 GMT+02:00 吴晓菊 :

> And it should be generic for HashJoin, not only for broadcast join, right?


Re: why BroadcastHashJoinExec is not implemented with outputOrdering?

2018-06-28 Thread 吴晓菊
And it should be generic for HashJoin, not only for broadcast join, right?


Chrysan Wu
吴晓菊
Phone:+86 17717640807


Re: why BroadcastHashJoinExec is not implemented with outputOrdering?

2018-06-28 Thread 吴晓菊
Sorry for the mistake. You are right: for some join types, the output
ordering of a broadcast join can be the ordering of the big table. I will
prepare a PR for you to review later. Thanks a lot!


Re: why BroadcastHashJoinExec is not implemented with outputOrdering?

2018-06-28 Thread Wenchen Fan
SortMergeJoin sorts its children by the join keys, but broadcast join does
not. I think the output ordering of broadcast join has nothing to do with
the join keys.


Re: why BroadcastHashJoinExec is not implemented with outputOrdering?

2018-06-28 Thread Marco Gaido
I think the outputOrdering would be that of the big table (if it has one),
regardless of whether it involves the join keys. Am I wrong?
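A toy Python model of this claim (not Spark's implementation; `broadcast_hash_join` and its parameters are hypothetical): the small side is hashed, the big side is scanned once in its input order, and every output row for a streamed row is emitted before the next one is read. If the big table is sorted on a non-key column, such as a timestamp, the output stays sorted on that column.

```python
from typing import Any, Dict, Iterable, Iterator, List, Tuple

Row = Tuple[Any, ...]


def broadcast_hash_join(streamed: Iterable[Row], build: Iterable[Row], sk: int, bk: int,
                        outer: bool = False, build_width: int = 0) -> Iterator[Row]:
    """Toy broadcast hash join: hash the small (build) side, then scan the big side once."""
    table: Dict[Any, List[Row]] = {}
    for row in build:
        table.setdefault(row[bk], []).append(row)
    # The streamed side is consumed strictly in input order, and all output for one
    # row is emitted before the next row is read, so its ordering survives.
    for row in streamed:
        matches = table.get(row[sk], [])
        for m in matches:
            yield row + m
        if outer and not matches:
            # Outer join on the streamed side: keep the unmatched row, null-padded.
            yield row + (None,) * build_width


# Big table sorted by timestamp (column 0); the join key is the user id (column 1).
events = [(1, "u2", "click"), (2, "u1", "view"), (3, "u3", "click"), (4, "u2", "buy")]
users = [("u1", "Alice"), ("u2", "Bob")]

inner = list(broadcast_hash_join(events, users, sk=1, bk=0))
print([r[0] for r in inner])  # [1, 2, 4]  still in timestamp order
print([r[1] for r in inner])  # ['u2', 'u1', 'u2']  not ordered by the join key

left_outer = list(broadcast_hash_join(events, users, sk=1, bk=0, outer=True, build_width=2))
print([r[0] for r in left_outer])  # [1, 2, 3, 4]
```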


Re: why BroadcastHashJoinExec is not implemented with outputOrdering?

2018-06-28 Thread 吴晓菊
Thanks for the reply.
Looking at SortMergeJoinExec, I think we can follow what SortMergeJoin
does: for some join types, if the children are ordered on the join keys,
we can report the ordered join keys as the output ordering.
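The rule described here can be sketched as a small Python function. It is a hypothetical helper, not SortMergeJoinExec itself; the per-join-type mapping is reasoned out from which key columns can contain nulls and may not match Spark exactly.

```python
from typing import List, Tuple


def smj_key_ordering(join_type: str, left_keys: List[str],
                     right_keys: List[str]) -> List[Tuple[str, ...]]:
    """Hypothetical helper: the join-key columns a sort-merge join could report as sorted.

    Each tuple is one sort position; columns in the same tuple are sorted identically.
    """
    if join_type == "inner":
        # Every output row matched, so left and right key columns hold equal values.
        return list(zip(left_keys, right_keys))
    if join_type in ("left_outer", "left_semi", "left_anti"):
        # Right key columns may be null (or absent), so only the left keys are ordered.
        return [(k,) for k in left_keys]
    if join_type == "right_outer":
        return [(k,) for k in right_keys]
    if join_type == "full_outer":
        # Null-padded rows appear on both sides, so neither key column is sorted.
        return []
    raise ValueError(f"unsupported join type: {join_type}")


print(smj_key_ordering("inner", ["l.id"], ["r.id"]))       # [('l.id', 'r.id')]
print(smj_key_ordering("left_outer", ["l.id"], ["r.id"]))  # [('l.id',)]
print(smj_key_ordering("full_outer", ["l.id"], ["r.id"]))  # []
```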


Re: why BroadcastHashJoinExec is not implemented with outputOrdering?

2018-06-28 Thread Wenchen Fan
SortMergeJoin only reports ordering of the join keys, not the output
ordering of any child.

It seems reasonable to me that broadcast join should respect the output
ordering of the children. Feel free to submit a PR to fix it, thanks!


Re: why BroadcastHashJoinExec is not implemented with outputOrdering?

2018-06-28 Thread 吴晓菊
Why can't we use the output ordering of the big table?



Re: why BroadcastHashJoinExec is not implemented with outputOrdering?

2018-06-28 Thread Marco Gaido
The easy answer is that SortMergeJoin ensures an outputOrdering, while
BroadcastHashJoin doesn't, i.e. after running a BroadcastHashJoin you
don't know what the order of the output will be, since nothing enforces
it.
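A toy Python sort-merge join (hypothetical names, inner join only, not Spark's code) shows where the guarantee comes from: both inputs are sorted on the join key before merging, so the output is ordered by the key whatever order the inputs arrived in, and any previous ordering of the big table, such as by timestamp, is lost.

```python
from typing import Any, List, Tuple

Row = Tuple[Any, ...]


def sort_merge_join(left: List[Row], right: List[Row], lk: int, rk: int) -> List[Row]:
    """Toy inner sort-merge join: sort both inputs on the join key, then merge them."""
    # This sort is what gives the output a guaranteed order (by join key).
    left = sorted(left, key=lambda r: r[lk])
    right = sorted(right, key=lambda r: r[rk])
    out: List[Row] = []
    i = j = 0
    while i < len(left) and j < len(right):
        a, b = left[i][lk], right[j][rk]
        if a < b:
            i += 1
        elif a > b:
            j += 1
        else:
            # Find the run of right rows with this key, then pair it with each matching left row.
            j_end = j
            while j_end < len(right) and right[j_end][rk] == a:
                j_end += 1
            while i < len(left) and left[i][lk] == a:
                out.extend(left[i] + r for r in right[j:j_end])
                i += 1
            j = j_end
    return out


# Big table sorted by timestamp (column 0); the join key is the user id (column 1).
events = [(1, "u2", "click"), (2, "u1", "view"), (3, "u3", "click"), (4, "u2", "buy")]
users = [("u1", "Alice"), ("u2", "Bob")]

result = sort_merge_join(events, users, lk=1, rk=0)
print([r[1] for r in result])  # ['u1', 'u2', 'u2']  sorted by join key
print([r[0] for r in result])  # [2, 1, 4]  the timestamp order is lost
```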

Hope this helps.
Thanks.
Marco


why BroadcastHashJoinExec is not implemented with outputOrdering?

2018-06-28 Thread 吴晓菊
We see that SortMergeJoinExec implements both outputPartitioning and
outputOrdering, while BroadcastHashJoinExec only implements
outputPartitioning. What is the reason for this design?
