Yes, I'd say so.

2018-06-29 4:43 GMT+02:00 吴晓菊 <chrysan...@gmail.com>:
> And it should be generic for HashJoin, not only broadcast join, right?
>
> Chrysan Wu
> 吴晓菊
> Phone:+86 17717640807
>
> 2018-06-29 10:42 GMT+08:00 吴晓菊 <chrysan...@gmail.com>:
>
>> Sorry for the mistake. You are right: the output ordering of broadcast
>> join can be the order of the big table in some types of join. I will
>> prepare a PR and let you review it later. Thanks a lot!
>>
>> Chrysan Wu
>> 吴晓菊
>> Phone:+86 17717640807
>>
>> 2018-06-29 0:00 GMT+08:00 Wenchen Fan <cloud0...@gmail.com>:
>>
>>> SortMergeJoin sorts its children by join key, but broadcast join does
>>> not. I think the output ordering of broadcast join has nothing to do
>>> with the join key.
>>>
>>> On Thu, Jun 28, 2018 at 11:28 PM Marco Gaido <marcogaid...@gmail.com>
>>> wrote:
>>>
>>>> I think the outputOrdering would be that of the big table (if any),
>>>> and it wouldn't matter whether it involves the join keys or not.
>>>> Am I wrong?
>>>>
>>>> 2018-06-28 17:01 GMT+02:00 吴晓菊 <chrysan...@gmail.com>:
>>>>
>>>>> Thanks for the reply.
>>>>> Looking into SortMergeJoinExec, I think we can follow what
>>>>> SortMergeJoin does: for some types of join, if the children are
>>>>> ordered on the join keys, we can report the ordered join keys as
>>>>> the output ordering.
>>>>>
>>>>> Chrysan Wu
>>>>> 吴晓菊
>>>>> Phone:+86 17717640807
>>>>>
>>>>> 2018-06-28 22:53 GMT+08:00 Wenchen Fan <cloud0...@gmail.com>:
>>>>>
>>>>>> SortMergeJoin only reports the ordering of the join keys, not the
>>>>>> output ordering of any child.
>>>>>>
>>>>>> It seems reasonable to me that broadcast join should respect the
>>>>>> output ordering of the children. Feel free to submit a PR to fix
>>>>>> it, thanks!
>>>>>>
>>>>>> On Thu, Jun 28, 2018 at 10:07 PM 吴晓菊 <chrysan...@gmail.com> wrote:
>>>>>>
>>>>>>> Why can't we use the output order of the big table?
>>>>>>>
>>>>>>> Chrysan Wu
>>>>>>> Phone:+86 17717640807
>>>>>>>
>>>>>>> 2018-06-28 21:48 GMT+08:00 Marco Gaido <marcogaid...@gmail.com>:
>>>>>>>
>>>>>>>> The easy answer to this is that SortMergeJoin ensures an
>>>>>>>> outputOrdering, while BroadcastHashJoin doesn't, i.e. after
>>>>>>>> running a BroadcastHashJoin you don't know what the order of the
>>>>>>>> output is going to be, since nothing enforces it.
>>>>>>>>
>>>>>>>> Hope this helps.
>>>>>>>> Thanks.
>>>>>>>> Marco
>>>>>>>>
>>>>>>>> 2018-06-28 15:46 GMT+02:00 吴晓菊 <chrysan...@gmail.com>:
>>>>>>>>
>>>>>>>>> We see that SortMergeJoinExec is implemented with
>>>>>>>>> outputPartitioning and outputOrdering, while
>>>>>>>>> BroadcastHashJoinExec is only implemented with
>>>>>>>>> outputPartitioning. Why is it designed this way?
>>>>>>>>>
>>>>>>>>> Chrysan Wu
>>>>>>>>> Phone:+86 17717640807