I'm not familiar with the "parquet block size". However, you can use row
groups to accomplish this task. You could write a single 10 GB file with 5
row groups. Then, when reading, the Arrow readers allow you to specify
which row groups you would like to read.
On Wed, Jun 7, 2023 at 6:13 AM
Hi everyone,
We have a use case where we're writing a parquet file to a remote server
and we want to read this parquet file using arrow. But we want multiple
hosts to read splits of the parquet file based on parquet block size.
Ex: If the parquet file size is 10 GB, we want 5 hosts to read a 2
It can be any type. You can find it mentioned around [1].

[1]: https://arrow.apache.org/docs/format/Columnar.html#variable-size-list-layout

On Tue, Jun 6, 2023 at 23:10, James wrote:
The columnar format spec doesn’t specify (or if it does, then I missed it)
what kind of types are allowed to be nested within a List. Is any type
valid as a child array? List of structs? Maps?