Hi Rok,

Happy to see you here :)
In my experience, it is more helpful to open a PR against the parquet-format repository and post it here.

Best,
Gang

On Wed, May 15, 2024 at 7:25 PM Rok Mihevc <[email protected]> wrote:

> Hi all,
>
> Arrow recently introduced FixedShapeTensor and VariableShapeTensor
> canonical extension types [1] that use FixedSizeList and StructArray(List,
> FixedSizeList) as storage, respectively. These are targeted at machine
> learning and scientific applications that deal with large datasets and
> would benefit from using Parquet as on-disk storage.
>
> However, FixedSizeList is currently stored as List in Parquet, which adds
> significant conversion overhead when reading and writing [2]. It would
> therefore be beneficial to introduce a FIXED_SIZE_LIST logical type.
>
> I would like to open a discussion on potentially adding a FIXED_SIZE_LIST
> type and prepare a proposal if the discussion supports it.
>
> Best,
> Rok
>
> [1] https://arrow.apache.org/docs/format/CanonicalExtensions.html#official-list
> [2] https://github.com/apache/arrow/issues/34510
