Re: rows reshuffled on join

2024-04-16 Thread Jacek Pliszka
Hi! See https://arrow.apache.org/docs/python/generated/pyarrow.dataset.Dataset.html#pyarrow.dataset.Dataset.join. However, in my case I want to stay within memory, so I found an ugly workaround: unifying the dictionaries and then building the final column with pa.DictionaryArray.from_arrays. BR, Jacek

Re: rows reshuffled on join

2024-04-16 Thread PASSWORD ADMINISTRATOR
Can we join on a "dataset" yet using pyarrow? What I mean is: my parquet file is larger than memory. Can I read it using the dataset API and join it with another dataset or an in-memory table? If yes, I couldn't find it in the documentation; can you please explain how to do that join? On Tue, Apr 16, 2024, 9:59

Re: rows reshuffled on join

2024-04-16 Thread Ruoxi Sun
Hi Jacek, I recall an issue with a similar concern [1] that I tried to answer; I hope it helps. Besides that, if you run the join in parallel, e.g. by calling the Acero API directly in C++ with a parallel source node, there is another level of uncertainty in the order of the output rows, depending

Re: rows reshuffled on join

2024-04-16 Thread Weston Pace
> Can someone confirm it?
I can confirm that the current join implementation may reorder the input. The larger the input, the more likely reordering becomes.
> I think that ordering is only guaranteed if it has been sorted.
Close enough, probably. I think there is an implicit

Re: rows reshuffled on join

2024-04-16 Thread Aldrin
I think that ordering is only guaranteed if it has been sorted.
On Tue, Apr 16, 2024 at 08:12, Jacek Pliszka jacek.plis...@gmail.com wrote:
> Hi! I just hit a very strange behaviour. I am joining two tables with "left outer" join. Naively I would expect that the

rows reshuffled on join

2024-04-16 Thread Jacek Pliszka
Hi! I just hit a very strange behaviour. I am joining two tables with a "left outer" join. Naively, I would expect the output rows to match the order of the left table, but sometimes the order of the rows is different. Can someone confirm this? I would expect this to be mentioned in the