Hi Jacek,

I recall an issue with a similar concern [1] that I tried to answer; I hope it helps.
Besides, if you run the join in parallel, e.g. by calling the Acero API directly in C++ with a parallel source node, there is another level of uncertainty in the order of the output rows, depending on when each thread finishes. I think Acero is essentially a SQL-like query engine, so, though it is not explicitly documented, it follows the SQL convention on ordering: no order is guaranteed unless one is specified with `order by`.

[1] https://github.com/apache/arrow/issues/37542#issuecomment-1871692692

Thanks.

*Regards,*
*Rossi SUN*

On Tue, Apr 16, 2024 at 23:34, Weston Pace <weston.p...@gmail.com> wrote:

> > Can someone confirm it?
>
> I can confirm that the current join implementation will potentially
> reorder input. The larger the input the more likely the chance of
> reordering.
>
> > I think that ordering is only guaranteed if it has been sorted.
>
> Close enough probably. I think there is an implicit order (the order of
> the defined by the files in the dataset and the rows in those files, or the
> original order when the input is in memory) that will be respected if there
> are no joins or aggregates.
>
> On Tue, Apr 16, 2024 at 8:19 AM Aldrin <octalene....@pm.me> wrote:
>
>> I think that ordering is only guaranteed if it has been sorted.
>>
>> Sent from Proton Mail <https://proton.me/mail/home> for iOS
>>
>>
>> On Tue, Apr 16, 2024 at 08:12, Jacek Pliszka <jacek.plis...@gmail.com> wrote:
>>
>> Hi!
>>
>> I just hit a very strange behaviour.
>>
>> I am joining two tables with "left outer" join.
>>
>> Naively I would expect that the output rows will match the order of the
>> left table.
>>
>> But sometimes the order of rows is different ...
>>
>> Can someone confirm it?
>>
>> I would expect this would be mentioned in the docs.
>>
>> I am using 12.0.1 due to Python 3.7 dependency.
>>
>> Best Regards,
>>
>> Jacek Pliszka