And it should be generic for HashJoin, not only broadcast join, right?
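
To make the point concrete, here is a minimal Python sketch of a toy inner hash join. It is not Spark code: `hash_join`, `is_sorted`, and the sample rows are made up for illustration. It assumes the join builds a hash table from the small side and streams the big side in its existing order, so the output keeps the streamed side's order whether or not that order involves the join key.

```python
from collections import defaultdict


def hash_join(streamed, build, key=lambda row: row[0]):
    """Toy inner hash join: hash the build side, then probe it row by row."""
    table = defaultdict(list)
    for row in build:
        table[key(row)].append(row)

    out = []
    for s in streamed:  # streamed rows are visited in their existing order
        for b in table.get(key(s), []):
            out.append(s + b)
    return out


def is_sorted(rows, column):
    """True if the given column is in non-decreasing order across rows."""
    values = [r[column] for r in rows]
    return values == sorted(values)


# Hypothetical data: the big table is ordered by column 1, not by the
# join key in column 0. The small table is in no particular order.
big = [(5, "a"), (2, "b"), (1, "c"), (2, "d")]
small = [(5, "x"), (2, "y"), (2, "z")]

joined = hash_join(big, small)

print(is_sorted(joined, 1))  # ordering of the big table's column 1
print(is_sorted(joined, 0))  # ordering of the join key
```

Nothing in the sketch depends on how the build side is distributed, so the same reasoning would presumably apply to any hash join that streams one side, at least for inner joins. Other join types would need to be checked case by case.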

Chrysan Wu
吴晓菊
Phone:+86 17717640807

2018-06-29 10:42 GMT+08:00 吴晓菊 <chrysan...@gmail.com>:

> Sorry for the mistake. You are right: the output ordering of a broadcast
> join can be the order of the big table for some join types. I will
> prepare a PR and let you review it later. Thanks a lot!
>
> 2018-06-29 0:00 GMT+08:00 Wenchen Fan <cloud0...@gmail.com>:
>
>> SortMergeJoin sorts its children by join key, but broadcast join does
>> not. I think the output ordering of broadcast join has nothing to do
>> with the join key.
>>
>> On Thu, Jun 28, 2018 at 11:28 PM Marco Gaido <marcogaid...@gmail.com>
>> wrote:
>>
>>> I think the outputOrdering would be that of the big table (if any),
>>> and it wouldn't matter whether it involves the join keys or not.
>>> Am I wrong?
>>>
>>> 2018-06-28 17:01 GMT+02:00 吴晓菊 <chrysan...@gmail.com>:
>>>
>>>> Thanks for the reply.
>>>> Looking into SortMergeJoinExec, I think we can follow what
>>>> SortMergeJoin does: for some join types, if the children are ordered
>>>> on the join keys, we can report the ordered join keys as the output
>>>> ordering.
>>>>
>>>> 2018-06-28 22:53 GMT+08:00 Wenchen Fan <cloud0...@gmail.com>:
>>>>
>>>>> SortMergeJoin only reports ordering of the join keys, not the output
>>>>> ordering of any child.
>>>>>
>>>>> It seems reasonable to me that broadcast join should respect the
>>>>> output ordering of the children. Feel free to submit a PR to fix it,
>>>>> thanks!
>>>>>
>>>>> On Thu, Jun 28, 2018 at 10:07 PM 吴晓菊 <chrysan...@gmail.com> wrote:
>>>>>
>>>>>> Why can't we use the output order of the big table?
>>>>>>
>>>>>> 2018-06-28 21:48 GMT+08:00 Marco Gaido <marcogaid...@gmail.com>:
>>>>>>
>>>>>>> The easy answer to this is that SortMergeJoin ensures an
>>>>>>> outputOrdering, while BroadcastHashJoin doesn't, i.e. after
>>>>>>> running a BroadcastHashJoin you don't know what the order of the
>>>>>>> output is going to be, since nothing enforces it.
>>>>>>>
>>>>>>> Hope this helps.
>>>>>>> Thanks.
>>>>>>> Marco
>>>>>>>
>>>>>>> 2018-06-28 15:46 GMT+02:00 吴晓菊 <chrysan...@gmail.com>:
>>>>>>>
>>>>>>>> We see that SortMergeJoinExec is implemented with
>>>>>>>> outputPartitioning and outputOrdering, while
>>>>>>>> BroadcastHashJoinExec is only implemented with
>>>>>>>> outputPartitioning. Why is it designed this way?