Hi,

Going out on a limb here, but maybe storing individual values that are
hundreds of megabytes isn't really the best fit for Parquet files. Or at
least it isn't a common enough use case for shared/public files to warrant
complicating the format.

Given the requests and proposals of late, I wonder if there isn't good
reason for someone to come up with another file format designed
specifically to handle rows with tons of columns and/or very large values.

On Mon, May 4, 2026 at 7:17 PM Daniel Weeks <[email protected]>
wrote:

> Hey Parquet Devs,
>
> The core problem is writer memory pressure caused by wide schemas and
> asymmetric column sizes. Today a writer must buffer every column chunk in
> memory until a row group is complete, because each column chunk must land
> as a single contiguous byte range. For wide schemas, or schemas mixing
> small fixed-width columns with very large variable-length values, this can
> drive high memory usage even when individual pages are fully encoded,
> compressed, and ready to flush, or it can result in row groups being
> produced at inconsistent or inefficient boundaries.
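
For a rough sense of the asymmetry being described, here's a back-of-envelope
sketch (the schema shape and value sizes are made up purely for illustration):
size row groups for the small columns and writer memory blows up; size them to
bound memory and the small columns end up as tiny, inefficient chunks.

    # Hypothetical schema: 2,000 small INT64 columns plus one ~1 MB
    # variable-length column. Every column chunk is buffered until the
    # row group closes, so buffered memory per row group is roughly:

    small_cols = 2_000                # fixed-width columns, 8 bytes/value
    blob_bytes_per_value = 1_000_000  # one large variable-length column

    def buffered_bytes(rows_per_group: int) -> int:
        small = small_cols * rows_per_group * 8
        blob = rows_per_group * blob_bytes_per_value
        return small + blob

    # Row groups sized for the small columns (1M rows): ~1 TB buffered.
    print(buffered_bytes(1_000_000) / 1e9, "GB")

    # Row groups sized to bound memory (1K rows): ~1 GB buffered, but each
    # small column chunk is only ~8 KB, spread over 2,000 chunks.
    print(buffered_bytes(1_000) / 1e9, "GB")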


-- 
Andrew Bell
[email protected]
