Re: Issue filtering partitioned Parquet files on partition keys using PyArrow

2022-08-05 Thread Weston Pace
ing > > There is an entirely different approach that could be taken which wouldn't > speed up the discovery at all, but should speed up the filtering. In this > approach you could modify FileSystemDataset so that, instead of storing a > flat list of FileFragment objects, it stored a tree of FileFragment > obj

Re: Issue filtering partitioned Parquet files on partition keys using PyArrow

2022-08-04 Thread David Li
ring > > There is an entirely different approach that could be taken which wouldn't > speed up the discovery at all, but should speed up the filtering. In this > approach you could modify FileSystemDataset so that, instead of storing a > flat list of FileFragment objects, it stored a

Re: Issue filtering partitioned Parquet files on partition keys using PyArrow

2022-08-04 Thread Weston Pace
ts(predicate) could walk the tree (in DFS order), skipping entire nodes that fail the predicate. On Thu, Aug 4, 2022 at 9:54 AM Tomaz Maia Suller wrote: > Weston, I'm interested in following up. > > > ---------- > *From:* Weston Pace > *Sent:* Thursday, August 4
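
The tree-based filtering idea in the snippet above can be illustrated with a minimal sketch. It is not Arrow's implementation: it uses plain path strings in place of FileFragment objects, assumes hive-style `key=value` directory names, and the names `PartitionNode`, `build_tree`, `select_files` and `matches` are hypothetical, introduced only for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterator, List, Optional


@dataclass
class PartitionNode:
    """Hypothetical tree node for one directory level, e.g. year=2022."""
    key: Optional[str] = None      # None for the root node
    value: Optional[str] = None
    children: List["PartitionNode"] = field(default_factory=list)
    files: List[str] = field(default_factory=list)


def build_tree(paths: List[str]) -> PartitionNode:
    """Group hive-style paths (key=value/.../file) into a tree."""
    root = PartitionNode()
    for path in paths:
        node = root
        *dirs, _filename = path.split("/")
        for segment in dirs:
            key, _, value = segment.partition("=")
            child = next(
                (c for c in node.children if c.key == key and c.value == value),
                None,
            )
            if child is None:
                child = PartitionNode(key, value)
                node.children.append(child)
            node = child
        node.files.append(path)
    return root


def select_files(
    node: PartitionNode,
    predicate: Callable[[Dict[str, str]], bool],
    known: Optional[Dict[str, str]] = None,
) -> Iterator[str]:
    """Depth-first walk that skips a whole subtree once the predicate fails."""
    values = dict(known or {})
    if node.key is not None:
        values[node.key] = node.value
    if not predicate(values):
        return  # prune: nothing below this node is visited
    yield from node.files
    for child in node.children:
        yield from select_files(child, predicate, values)


def matches(values: Dict[str, str], wanted: Dict[str, str]) -> bool:
    """True unless a partition key seen so far contradicts the wanted value."""
    return all(values[k] == v for k, v in wanted.items() if k in values)
```

For example, `select_files(build_tree(paths), lambda v: matches(v, {"year": "2022"}))` never descends into a `year=2021` directory, so the cost of filtering depends on the number of tree nodes visited rather than on the total number of files.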

RE: Issue filtering partitioned Parquet files on partition keys using PyArrow

2022-08-04 Thread Tomaz Maia Suller
Weston, I'm interested in following up. From: Weston Pace Sent: Thursday, August 4, 2022 12:15 To: user@arrow.apache.org Subject: Re: Issue filtering partitioned Parquet files on partition keys using PyArrow You don't often get email from

Re: Issue filtering partitioned Parquet files on partition keys using PyArrow

2022-08-04 Thread Weston Pace
onds as I've said. > > I'm starting to think I should send this to the development mailing list > rather than the user one, since the obvious solution is specifying the > paths directly rather than trying to use the API. > ------ > *De:* Lee, D

RE: Issue filtering partitioned Parquet files on partition keys using PyArrow

2022-08-04 Thread Tomaz Maia Suller
is to the development mailing list rather than the user one, since the obvious solution is specifying the paths directly rather than trying to use the API. From: Lee, David Sent: Wednesday, August 3, 2022 19:49 To: user@arrow.apache.org Subject: RE: Issue fi

RE: Issue filtering partitioned Parquet files on partition keys using PyArrow

2022-08-03 Thread Lee, David
s (e.g. specifying the metadata, or the pieces property API). Feedback is very welcome. From: Tomaz Maia Suller Sent: Wednesday, August 3, 2022 2:54 PM To: user@arrow.apache.org Subject: Issue filtering partitioned Parquet files on partition keys using PyArrow External Email: Use caution with