Re: InputSplit support for parquet reads using Arrow

2023-06-07 Thread Weston Pace
I'm not familiar with the "parquet block size". However, you can use row groups to accomplish this task. You could write a single 10 GB file with 5 row groups. Then, when reading, the Arrow readers allow you to specify which row groups you would like to read.
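A minimal sketch of the row-group approach with pyarrow, assuming each host knows its own index and the total number of hosts. The helper names `row_groups_for_host` and `read_split` are hypothetical, and the round-robin assignment is only one possible way to divide row groups; it is not something the thread prescribes.

```python
def row_groups_for_host(num_row_groups, num_hosts, host_index):
    """Return the row-group indices one host should read (round-robin).

    Hypothetical helper: the assignment policy is an assumption.
    """
    if num_hosts <= 0 or not 0 <= host_index < num_hosts:
        raise ValueError("host_index must be in [0, num_hosts)")
    return list(range(host_index, num_row_groups, num_hosts))


def read_split(path, num_hosts, host_index):
    """Read only this host's row groups from a Parquet file as an Arrow table."""
    # Imported here so the assignment helper above works without pyarrow.
    import pyarrow.parquet as pq

    parquet_file = pq.ParquetFile(path)
    indices = row_groups_for_host(
        parquet_file.num_row_groups, num_hosts, host_index
    )
    if not indices:
        # More hosts than row groups: this host has nothing to read.
        return parquet_file.schema_arrow.empty_table()
    return parquet_file.read_row_groups(indices)
```

With a file written as 5 row groups and 5 hosts, host 2 would call `read_split(path, 5, 2)` and read only row group 2. On the writing side, the `row_group_size` argument of `pyarrow.parquet.write_table` sets the maximum number of rows per row group, so the split granularity is chosen when the file is written.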

InputSplit support for parquet reads using Arrow

2023-06-07 Thread Sanskar Modi
Hi everyone, we have a use case where we're writing a Parquet file to a remote server and we want to read this Parquet file using Arrow. But we want multiple hosts to read splits of the Parquet file based on the Parquet block size. Ex: If the Parquet file size is 10 GB, we want 5 hosts to read a 2

Re: Nested Types Question

2023-06-07 Thread Aldrin
It can be any type. You can find it mentioned around [1]. [1]: https://arrow.apache.org/docs/format/Columnar.html#variable-size-list-layout

Nested Types Question

2023-06-07 Thread James
The columnar format spec doesn’t specify (or if it does, then I missed it) what kind of types are allowed to be nested within a List. Is any type valid as a child array? List of structs? Maps?