Re: rows reshuffled on join

2024-04-16 Thread PASSWORD ADMINISTRATOR
Can we join on a "dataset" yet using pyarrow? What I mean is: can I read my Parquet file, which is larger than memory, using the dataset API and join it with another dataset or an in-memory table? If yes, I couldn't find it in the documentation. Can you please guide me on how to do that join? On Tue, Apr 16, 2024, 9:59

Re: Equivalent of R arrow API in pyarrow for dataset group size

2023-08-31 Thread PASSWORD ADMINISTRATOR
Anyone?

On Sun, Aug 27, 2023 at 2:21 AM PASSWORD ADMINISTRATOR <ultimatepwdmas...@gmail.com> wrote:
> First time using a mailing list so bear with me.
>
> I am trying to run a simple query on full NYC taxi dataset (my local copy
> on HDD), which counts number of rows per gro

Equivalent of R arrow API in pyarrow for dataset group size

2023-08-26 Thread PASSWORD ADMINISTRATOR
First time using a mailing list, so bear with me. I am trying to run a simple query on the full NYC taxi dataset (my local copy on HDD) that counts the number of rows per group, i.e. group by X, then count(*). In R-arrow, this can be done using nyc_taxi = arrow::open_dataset('aria_nyc/',partitioning =