Hi,

Going out on a limb here, but maybe storing individual values that are hundreds of megabytes isn't really the best fit for Parquet files. Or at least this isn't a common-enough use case for shared/public files to warrant a complicating change in the format.
Given the requests/proposals of late, I wonder if there isn't good reason for someone to come up with another file format made specifically to handle rows with tons of columns and/or very large values.

On Mon, May 4, 2026 at 7:17 PM Daniel Weeks <[email protected]> wrote:

> Hey Parquet Devs,
>
> The core problem is writer memory pressure caused by wide schemas and
> asymmetric column sizes. Today a writer must buffer every column chunk in
> memory until a row group is complete, because each column chunk must land
> as a single contiguous byte range. For wide schemas, or schemas mixing
> small fixed-width columns with very large variable-length values, this can
> drive high memory usage even when individual pages are fully encoded,
> compressed, and ready to flush, or it can result in row groups being
> produced at inconsistent or inefficient boundaries.

--
Andrew Bell
[email protected]
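For anyone skimming the thread, a minimal sketch of the buffering constraint Daniel describes, in plain Python with hypothetical names (this is not Parquet's or any library's actual writer code, just an illustration of why peak writer memory scales with the whole row group):

from typing import List


class ColumnChunkBuffer:
    """Holds encoded/compressed pages for one column until the row group closes."""

    def __init__(self, name: str):
        self.name = name
        self.pages: List[bytes] = []  # ready-to-flush pages, pinned in memory

    def add_page(self, page: bytes) -> None:
        self.pages.append(page)

    def size(self) -> int:
        return sum(len(p) for p in self.pages)


class RowGroupWriter:
    def __init__(self, column_names: List[str], sink):
        self.columns = [ColumnChunkBuffer(n) for n in column_names]
        self.sink = sink

    def buffered_bytes(self) -> int:
        # Peak memory is roughly the sum of every buffered chunk, dominated by
        # the largest variable-length columns even when the other columns are
        # tiny and their pages were finished long ago.
        return sum(c.size() for c in self.columns)

    def close_row_group(self) -> None:
        # Only at this point can each chunk be laid out as one contiguous byte
        # range in the file, so nothing above could be flushed any earlier.
        for col in self.columns:
            for page in col.pages:
                self.sink.write(page)
            col.pages.clear()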
