Hi Jacek,

I recall an issue with a similar concern [1] that I tried to answer; I hope
it helps.

Besides, if you run the join in parallel, e.g. by calling the Acero API
directly in C++ with a parallel source node, there is another source of
uncertainty in the order of output rows: it depends on when each thread
finishes.
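To illustrate the thread-timing point, here is a minimal sketch in plain Python. It is not Acero's actual implementation. It is a hash left outer join whose probe side is split into batches handled by a thread pool. The function name `hash_left_join_parallel` and the `(key, value)` row layout are hypothetical. Results are collected in completion order, so the batch interleaving may change from run to run. Each output row carries its original left-row index, and sorting on that index afterwards restores the left-table order.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def hash_left_join_parallel(left, right, num_batches=4):
    """Hypothetical left outer join of `left` with `right`, both lists of
    (key, value) tuples. Output rows are (left_index, key, left_value,
    right_value), with right_value None when there is no match."""
    # Build phase: map each right key to all of its values.
    build = {}
    for key, value in right:
        build.setdefault(key, []).append(value)

    def probe(start, stop):
        # Probe one batch of left rows against the hash table.
        out = []
        for i in range(start, stop):
            key, lval = left[i]
            matches = build.get(key)
            if matches is None:
                out.append((i, key, lval, None))  # unmatched: null-pad right side
            else:
                for rval in matches:
                    out.append((i, key, lval, rval))
        return out

    size = max(1, -(-len(left) // num_batches))  # ceiling division
    results = []
    with ThreadPoolExecutor(max_workers=num_batches) as pool:
        futures = [pool.submit(probe, s, min(s + size, len(left)))
                   for s in range(0, len(left), size)]
        # as_completed yields futures as they finish, not in submission
        # order, so batches may land in `results` in a different order per run.
        for fut in as_completed(futures):
            results.extend(fut.result())
    return results


left = [(k % 5, f"L{k}") for k in range(20)]
right = [(0, "a"), (1, "b"), (1, "c"), (3, "d")]
rows = hash_left_join_parallel(left, right)
# The equivalent of an explicit ORDER BY on the left row index. sorted() is
# stable, so multiple matches for one left row keep their relative order.
ordered = sorted(rows, key=lambda r: r[0])
```

Without the final sort, `rows` is only guaranteed to contain the right multiset of rows, not any particular order.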

I think of Acero as a SQL-like query engine. So, although it is not
explicitly documented, it follows the SQL ordering convention: there is no
order guarantee unless one is specified with `order by`.
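The SQL convention can be shown with any SQL engine. Here is a minimal sketch using Python's built-in `sqlite3`, not Acero. The table names `lhs`/`rhs` are hypothetical, and so is the `pos` column, which records each left row's original position. Carrying that position through the join and ordering on it is the portable way to get "left table order" back. With pyarrow, the analogous workaround would presumably be to append such a position column to the left table before `Table.join` and call `sort_by` on it afterwards. I have not checked that against 12.0.1 specifically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE lhs (pos INTEGER, k INTEGER, lval TEXT);
    CREATE TABLE rhs (k INTEGER, rval TEXT);
""")
# `pos` is an explicit record of each left row's original position.
conn.executemany("INSERT INTO lhs VALUES (?, ?, ?)",
                 [(i, k, f"L{i}") for i, k in enumerate([3, 1, 2, 1, 0])])
conn.executemany("INSERT INTO rhs VALUES (?, ?)", [(1, "b"), (3, "d")])

# Without ORDER BY, SQL makes no promise about the row order of a join;
# ORDER BY l.pos is what actually guarantees the left table's order.
rows = conn.execute("""
    SELECT l.pos, l.k, l.lval, r.rval
    FROM lhs AS l LEFT OUTER JOIN rhs AS r ON l.k = r.k
    ORDER BY l.pos
""").fetchall()
conn.close()
```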

[1] https://github.com/apache/arrow/issues/37542#issuecomment-1871692692

Thanks.

*Regards,*
*Rossi SUN*


On Tue, Apr 16, 2024 at 23:34, Weston Pace <weston.p...@gmail.com> wrote:

> > Can someone confirm it?
>
> I can confirm that the current join implementation may reorder its input.
> The larger the input, the more likely reordering becomes.
>
> > I think that ordering is only guaranteed if it has been sorted.
>
> Close enough, probably. I think there is an implicit order (the order
> defined by the files in the dataset and the rows in those files, or the
> original order when the input is in memory) that will be respected if there
> are no joins or aggregates.
>
> On Tue, Apr 16, 2024 at 8:19 AM Aldrin <octalene....@pm.me> wrote:
>
>> I think that ordering is only guaranteed if it has been sorted.
>>
>> On Tue, Apr 16, 2024 at 08:12, Jacek Pliszka <jacek.plis...@gmail.com> wrote:
>>
>> Hi!
>>
>> I just ran into some very strange behaviour.
>>
>> I am joining two tables with a "left outer" join.
>>
>> Naively, I would expect the output rows to match the order of the
>> left table.
>>
>> But sometimes the order of rows is different...
>>
>> Can someone confirm it?
>>
>> I would expect this to be mentioned in the docs.
>>
>> I am using 12.0.1 because of a Python 3.7 dependency.
>>
>> Best Regards,
>>
>> Jacek Pliszka
