Hi Rok,

Happy to see you here :)

In my experience, it is more helpful to open a PR against the
parquet-format repository and then post it here.

Best,
Gang

On Wed, May 15, 2024 at 7:25 PM Rok Mihevc <[email protected]> wrote:

> Hi all,
>
> Arrow recently introduced FixedShapeTensor and VariableShapeTensor
> canonical extension types [1] that use FixedSizeList and StructArray(List,
> FixedSizeList) as storage, respectively. These are targeted at machine
> learning and scientific applications that deal with large datasets and
> would benefit from using Parquet as on-disk storage.
>
> However, FixedSizeList is currently stored as List in Parquet, which adds
> significant conversion overhead when reading and writing [2]. It would
> therefore be beneficial to introduce a FIXED_SIZE_LIST logical type.
>
> I would like to open a discussion on potentially adding a FIXED_SIZE_LIST
> type and to prepare a proposal if the discussion supports it.
>
>
> Best,
> Rok
>
> [1]
> https://arrow.apache.org/docs/format/CanonicalExtensions.html#official-list
> [2] https://github.com/apache/arrow/issues/34510
>
