Hi all,

I am starting a new voting thread with this email because the first voting thread [1] drew new comments and suggestions, and we wanted to take time to see how those evolved.

*I would like to propose we vote on adding the fixed shape tensor canonical extension type with the following specification:*

Fixed shape tensor
==================

* Extension name: ``arrow.fixed_shape_tensor``.

* The storage type of the extension: ``FixedSizeList`` where:

  * **value_type** is the data type of individual tensor elements.
  * **list_size** is the product of all the elements in tensor shape.

* Extension type parameters:

  * **value_type** = the Arrow data type of individual tensor elements.
  * **shape** = the physical shape of the contained tensors
    as an array.

  Optional parameters describing the logical layout:

  * **dim_names** = explicit names for the tensor dimensions
    as an array. Its length should equal the length of ``shape``,
    that is, the number of dimensions.

    ``dim_names`` can be used if the dimensions have well-known
    names and they map to the physical layout (row-major).

  * **permutation** = indices of the desired ordering of the
    original dimensions, defined as an array.

    The indices contain a permutation of the values [0, 1, .., N-1] where
    N is the number of dimensions. The permutation indicates which
    dimension of the logical layout corresponds to which dimension of the
    physical tensor (the i-th dimension of the logical view corresponds
    to the dimension with number ``permutation[i]`` of the physical tensor).

    Permutation can be useful in case the logical order of
    the tensor is a permutation of the physical order (row-major).

    When the logical and physical layouts are equal, the permutation is
    always [0, 1, .., N-1] and can therefore be left out.

* Description of the serialization:

  The metadata must be a valid JSON object including the shape of
  the contained tensors as an array with key **"shape"**, plus
  optional dimension names with key **"dim_names"** and ordering of the
  dimensions with key **"permutation"**.

  - Example: ``{ "shape": [2, 5]}``

  - Example with ``dim_names`` metadata for NCHW ordered data:

    ``{ "shape": [100, 200, 500], "dim_names": ["C", "H", "W"]}``

  - Example of permuted 3-dimensional tensor:

    ``{ "shape": [100, 200, 500], "permutation": [2, 0, 1]}``

    This is the physical layout shape; the shape of the logical
    layout would in this case be ``[500, 100, 200]``.

.. note::

  Elements in a fixed shape tensor extension array are stored
  in row-major/C-contiguous order.

The specification is submitted as a PR [2] to the Canonical Extension Types document under the format specifications directory [3].

There are also two implementations submitted to the Apache Arrow repository:

* C++ implementation of the proposed specification [4]
* Python example implementation of the proposed specification and usage (only illustrative) [5]

The vote will be open for at least 72 hours.

[ ] +1 Accept this proposal
[ ] +0
[ ] -1 Do not accept this proposal because...

Regards,
Alenka

[1]: https://lists.apache.org/thread/3cj0cr44hg3t2rn0kxly8td82yfob1nd
[2]: https://github.com/apache/arrow/pull/33925/files
[3]: https://github.com/apache/arrow/blob/main/docs/source/format/CanonicalExtensions.rst
[4]: https://github.com/apache/arrow/pull/8510/files
[5]: https://github.com/apache/arrow/pull/33948/files
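
For illustration only, and not part of the proposal or of either implementation: the rules for ``list_size``, ``permutation`` and the JSON metadata can be sketched in plain Python as below. The helper names (``list_size``, ``logical_shape``, ``serialize_metadata``) are hypothetical and do not correspond to any Arrow API; the sketch only assumes the semantics described in the specification text.

```python
import json
from math import prod


def list_size(shape):
    """Size of the FixedSizeList storage: product of all elements of the shape."""
    return prod(shape)


def logical_shape(shape, permutation=None):
    """Shape of the logical view.

    The i-th logical dimension corresponds to physical dimension
    permutation[i]; with no permutation the two layouts are equal.
    """
    if permutation is None:
        return list(shape)
    if sorted(permutation) != list(range(len(shape))):
        raise ValueError("permutation must contain each of 0..N-1 exactly once")
    return [shape[p] for p in permutation]


def serialize_metadata(shape, dim_names=None, permutation=None):
    """Build the JSON metadata string; optional keys appear only when given."""
    if dim_names is not None and len(dim_names) != len(shape):
        raise ValueError("dim_names must have one entry per dimension")
    # Reuse the permutation check from logical_shape before serializing.
    logical_shape(shape, permutation)
    metadata = {"shape": list(shape)}
    if dim_names is not None:
        metadata["dim_names"] = list(dim_names)
    if permutation is not None:
        metadata["permutation"] = list(permutation)
    return json.dumps(metadata)


# Values taken from the examples in the specification text.
print(list_size([2, 5]))
print(logical_shape([100, 200, 500], permutation=[2, 0, 1]))
print(serialize_metadata([100, 200, 500], dim_names=["C", "H", "W"]))
```

With the permuted example from the specification, ``logical_shape([100, 200, 500], [2, 0, 1])`` gives ``[500, 100, 200]``, matching the logical shape stated there.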