Hi all,

I am starting a new voting thread with this email, because the first voting
thread [1] prompted new comments and suggestions and we wanted to take time
to see how they evolved.

*I would like to propose we vote on adding the fixed shape tensor canonical
extension type with the following specification:*

Fixed shape tensor
==================

* Extension name: ``arrow.fixed_shape_tensor``.

* The storage type of the extension: ``FixedSizeList`` where:

  * **value_type** is the data type of individual tensor elements.
  * **list_size** is the product of all the elements in tensor shape.

* Extension type parameters:

  * **value_type** = the Arrow data type of individual tensor elements.
  * **shape** = the physical shape of the contained tensors
    as an array.

  Optional parameters describing the logical layout:

  * **dim_names** = explicit names of the tensor dimensions
    as an array. Its length should be equal to the length of the shape,
    that is, to the number of dimensions.

    ``dim_names`` can be used if the dimensions have well-known
    names and they map to the physical layout (row-major).

  * **permutation** = indices of the desired ordering of the
    original dimensions, defined as an array.

    The indices contain a permutation of the values [0, 1, .., N-1] where
    N is the number of dimensions. The permutation indicates which
    dimension of the logical layout corresponds to which dimension of the
    physical tensor (the i-th dimension of the logical view corresponds
    to the dimension with number ``permutation[i]`` of the physical tensor).

    Permutation can be useful in case the logical order of
    the tensor is a permutation of the physical order (row-major).

    When logical and physical layout are equal, the permutation will always
    be ([0, 1, .., N-1]) and can therefore be left out.

* Description of the serialization:

  The metadata must be a valid JSON object that includes the shape of
  the contained tensors as an array with key **"shape"**, plus optional
  dimension names with key **"dim_names"** and an optional ordering of the
  dimensions with key **"permutation"**.

  - Example: ``{ "shape": [2, 5]}``
  - Example with ``dim_names`` metadata for NCHW ordered data:

    ``{ "shape": [100, 200, 500], "dim_names": ["C", "H", "W"]}``

  - Example of permuted 3-dimensional tensor:

    ``{ "shape": [100, 200, 500], "permutation": [2, 0, 1]}``

    Here ``shape`` is the shape of the physical layout; the shape of the
    logical layout would in this case be ``[500, 100, 200]``.
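
As a rough illustration of the parameters and serialization described above,
the following Python sketch builds the JSON metadata and derives the storage
``list_size`` and the logical shape from ``shape`` and ``permutation``. The
helper names (``tensor_metadata``, ``storage_list_size``, ``logical_shape``)
are hypothetical and are not part of the proposal or of any Arrow
implementation; only the metadata keys and the permutation rule are taken
from the text above.

```python
import json
from math import prod


def tensor_metadata(shape, dim_names=None, permutation=None):
    """Serialize the extension parameters as a JSON object (hypothetical helper)."""
    ndim = len(shape)
    meta = {"shape": list(shape)}
    if dim_names is not None:
        # dim_names must name every dimension of the physical shape.
        if len(dim_names) != ndim:
            raise ValueError("dim_names must have one entry per dimension")
        meta["dim_names"] = list(dim_names)
    if permutation is not None:
        # permutation must contain each of 0..N-1 exactly once.
        if sorted(permutation) != list(range(ndim)):
            raise ValueError("permutation must be a permutation of [0, .., N-1]")
        meta["permutation"] = list(permutation)
    return json.dumps(meta)


def storage_list_size(shape):
    """list_size of the FixedSizeList storage: product of the shape elements."""
    return prod(shape)


def logical_shape(shape, permutation=None):
    """Logical dimension i corresponds to physical dimension permutation[i]."""
    if permutation is None:
        return list(shape)
    return [shape[p] for p in permutation]
```

With these helpers, ``logical_shape([100, 200, 500], [2, 0, 1])`` evaluates to
``[500, 100, 200]``, which agrees with the permuted example above, and
``storage_list_size([2, 5])`` is ``10``.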

.. note::

  Elements in a fixed shape tensor extension array are stored
  in row-major/C-contiguous order.
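
To make the row-major convention concrete, here is a minimal pure-Python
sketch of how a multi-dimensional index maps to a position inside one
fixed-size list of the storage array. The function names are hypothetical
and only illustrate the layout rule; a real implementation would normally
rely on a tensor library for this.

```python
def row_major_strides(shape):
    """Strides, in elements, for a row-major (C-contiguous) tensor."""
    strides = [1] * len(shape)
    # The last dimension varies fastest; each earlier stride is the
    # product of all later dimension sizes.
    for i in range(len(shape) - 2, -1, -1):
        strides[i] = strides[i + 1] * shape[i + 1]
    return strides


def flat_index(shape, index):
    """Offset of a physical multi-index within the flat list of values."""
    return sum(i * s for i, s in zip(index, row_major_strides(shape)))


# One tensor of shape [2, 5] stored as a fixed-size list of 10 values.
shape = [2, 5]
storage = list(range(10))

# Row 1, column 2 sits at offset 1 * 5 + 2 = 7 in the flat storage.
element = storage[flat_index(shape, (1, 2))]
```

For the ``[2, 5]`` example the strides are ``[5, 1]``, so consecutive values
in the list walk along the last dimension first.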

* The specification is submitted as a PR [2] to the Canonical Extension Types
  document under the format specifications directory [3].

There are also two implementations submitted to the Apache Arrow repository:

* C++ implementation of the proposed specification [4]
* Python example implementation of the proposed specification and usage
  (only illustrative) [5]


The vote will be open for at least 72 hours.

[ ] +1 Accept this proposal
[ ] +0
[ ] -1 Do not accept this proposal because...


Regards, Alenka

[1]: https://lists.apache.org/thread/3cj0cr44hg3t2rn0kxly8td82yfob1nd
[2]: https://github.com/apache/arrow/pull/33925/files
[3]: https://github.com/apache/arrow/blob/main/docs/source/format/CanonicalExtensions.rst
[4]: https://github.com/apache/arrow/pull/8510/files
[5]: https://github.com/apache/arrow/pull/33948/files
