Hi all,

Arrow recently introduced the FixedShapeTensor and VariableShapeTensor
canonical extension types [1], which use FixedSizeList and StructArray(List,
FixedSizeList) as storage, respectively. These are targeted at machine
learning and scientific applications that deal with large datasets and
would benefit from using Parquet as on-disk storage.

However, FixedSizeList is currently stored as List in Parquet, which adds
significant conversion overhead when reading and writing [2]. It would
therefore be beneficial to introduce a FIXED_SIZE_LIST logical type.
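
As a rough illustration of where that overhead comes from, here is a
pure-Python sketch (not Arrow or Parquet code; all function names are
hypothetical). It assumes a one-level LIST column with no nulls and no empty
lists, so only repetition levels are needed. With List storage, the row
boundaries have to be encoded and decoded per value, while a fixed list size
lets the reader slice the flat values directly.

```python
# Illustrative sketch only: hypothetical helpers, not the Arrow/Parquet API.
# Assumes non-null, non-empty lists, so definition levels are ignored.

def list_repetition_levels(rows):
    """Per-value repetition levels: 0 starts a new row, 1 continues it."""
    levels = []
    for row in rows:
        for i, _ in enumerate(row):
            levels.append(0 if i == 0 else 1)
    return levels


def rebuild_from_levels(values, levels):
    """Reassemble rows by walking every value and its repetition level."""
    rows = []
    for value, level in zip(values, levels):
        if level == 0:
            rows.append([])
        rows[-1].append(value)
    return rows


def rebuild_fixed_size(values, list_size):
    """Reassemble rows from the flat values using only the known list size."""
    return [values[i:i + list_size] for i in range(0, len(values), list_size)]


rows = [[1, 2, 3], [4, 5, 6]]
values = [v for row in rows for v in row]   # flat values, as stored on disk
levels = list_repetition_levels(rows)       # extra per-value data for List
```

In this sketch, `rebuild_fixed_size(values, 3)` recovers the same rows as
`rebuild_from_levels(values, levels)` without reading any per-value levels.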

I would like to open a discussion on potentially adding a FIXED_SIZE_LIST
type, and to prepare a proposal if the discussion supports it.


Best,
Rok

[1]
https://arrow.apache.org/docs/format/CanonicalExtensions.html#official-list
[2] https://github.com/apache/arrow/issues/34510
