OK, I figured it out.
You have to create a pyarrow.dataset.CsvFileFormat object first and generate
the write options from that instance:
csv_file_options = CsvFileFormat().make_write_options(**{"include_header": True})
Then pass file_options=csv_file_options to write_dataset().
The only issue I’ve seen is that when using
Hi!
https://arrow.apache.org/docs/python/generated/pyarrow.dataset.Dataset.html#pyarrow.dataset.Dataset.join
However, in my case I want to stay within memory, and I found an ugly
workaround: unifying the dictionaries
and then building the final column with pa.DictionaryArray.from_arrays.
BR,
Jacek
Can we join on a "dataset" yet using pyarrow? What I mean is: my parquet
file is larger than memory. Can I read it using the dataset API and join it
with another dataset or in-memory table? If so, I couldn't find it in the
documentation. Could you please guide me on how to do that join?
On Tue, Apr 16, 2024, 9:59
How do you pass a csv.WriteOptions() object to pyarrow.dataset.write_dataset()?
I tried passing file_options=pa.csv.WriteOptions(include_header=True) and
file_options={"include_header": True}.
Both attempts came back with an error: object has no attribute 'format'
CSV cookbook example:
Hi Jacek,
I recall an issue with a similar concern [1] that I tried to answer;
I hope it helps.
Besides, if you do the join in parallel, e.g. by directly calling the Acero API
in C++ with a parallel source node, there is another level of
uncertainty in the order of the output rows, depending
> Can someone confirm it?
I can confirm that the current join implementation can reorder its
input. The larger the input, the more likely reordering becomes.
> I think that ordering is only guaranteed if it has been sorted.
Close enough probably. I think there is an implicit
I think that ordering is only guaranteed if it has been sorted.
On Tue, Apr 16, 2024 at 08:12, Jacek Pliszka jacek.plis...@gmail.com wrote:
> Hi!
> I just hit a very strange behaviour.
> I am joining two tables with "left outer" join.
> Naively I would expect that the
Hi!
I just hit a very strange behaviour.
I am joining two tables with "left outer" join.
Naively I would expect that the output rows will match the order of the
left table.
But sometimes the order of rows is different ...
Can someone confirm it?
I would expect this would be mentioned in the