Re: Any efficient way of partitioning tables in memory?

2023-09-29 Thread Ian Cook
Hi Jacek, I don't think there is any great way to do this today. There is an open issue for this that describes a possible workaround: https://github.com/apache/arrow/issues/14882

Ian

Any efficient way of partitioning tables in memory?

2023-09-29 Thread Jacek Pliszka
Hi! I am looking for an efficient way of working with pieces of a Table. Let's say I have a table with 100M rows and a key column with 100 distinct values. I would like to be able to quickly get a subtable with just the rows for a given key value. Currently I run filter 100 times to generate …

Re: Optimizing read performance - wide data frames

2023-09-29 Thread Aldrin
How many rows are you including in a batch? You might want to try smaller row batches since your columns are so wide. The other thing you can try, instead of Parquet tuning, is testing files with progressively more columns. If the width of your tables is the problem, then you'll be able to see when …