BTW, I did propose a new RANDOM_ACCESS_BYTE_ARRAY encoding (effectively
Arrow's representation) as part of the footer improvements [1], to allow
O(1) access to a particular column's metadata once the column is identified.
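To make the O(1) point concrete, here is a minimal Python sketch contrasting Parquet's PLAIN BYTE_ARRAY layout (a 4-byte little-endian length before each value) with an Arrow-style layout (an array of n+1 int32 offsets plus one contiguous data buffer). It assumes the proposed encoding follows Arrow's offsets-plus-data shape; the exact byte layout in the PR may differ, and all function names here are hypothetical.

```python
import struct
from typing import List, Tuple


def encode_plain(values: List[bytes]) -> bytes:
    """PLAIN BYTE_ARRAY style: each value is preceded by a 4-byte LE length."""
    out = bytearray()
    for v in values:
        out += struct.pack("<I", len(v))
        out += v
    return bytes(out)


def plain_get(buf: bytes, index: int) -> bytes:
    """O(n): every preceding length prefix must be read to find `index`."""
    pos = 0
    for _ in range(index):
        (length,) = struct.unpack_from("<I", buf, pos)
        pos += 4 + length
    (length,) = struct.unpack_from("<I", buf, pos)
    return buf[pos + 4 : pos + 4 + length]


def encode_offsets(values: List[bytes]) -> Tuple[bytes, bytes]:
    """Arrow-style layout (assumed): n+1 LE int32 offsets + one data buffer."""
    offsets = [0]
    data = bytearray()
    for v in values:
        data += v
        offsets.append(len(data))
    offsets_buf = struct.pack(f"<{len(offsets)}i", *offsets)
    return offsets_buf, bytes(data)


def offsets_get(offsets_buf: bytes, data: bytes, index: int) -> bytes:
    """O(1): two adjacent offsets locate value `index` directly."""
    start, end = struct.unpack_from("<2i", offsets_buf, 4 * index)
    return data[start:end]
```

Both layouts return the same value for a given index; the difference is that the offsets variant never touches the values before it, which is what makes it attractive for jumping straight to one column's metadata.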

[1] https://github.com/apache/parquet-format/pull/250

On Mon, May 27, 2024 at 11:16 PM Micah Kornfield wrote:

> As a follow-up to the "V3" discussions [1][2], I wanted to start a thread
> on improvements to encodings.
>
> There are several areas to pursue here:
> 1.  Curating a standard set of benchmarks and criteria for determining if
> a new encoding is worth adding.
> 2.  Developing new encodings
> 3.  Better strategies in implementations for selecting among existing encodings.
> 4.  Better support for encodings with point/indexed lookups.
> 5.  Benchmarking frameworks that allow assessing the trade-offs of encodings on
> storage systems with different latency/throughput characteristics.
>
> Realistically, given my current commitments, I don't think I have
> bandwidth to help with this track in the near term. If someone else would
> like to help drive this and make concrete proposals in these areas, it would
> be greatly appreciated.
>
> Thanks,
> Micah
>
>
> [1] https://lists.apache.org/thread/5jyhzkwyrjk9z52g0b49g31ygnz73gxo
> [2] https://docs.google.com/document/d/19hQLYcU5_r5nJB7GtnjfODLlSDiNS24GXAtKg9b0_ls/edit
>
