jorisvandenbossche commented on code in PR #33925:
URL: https://github.com/apache/arrow/pull/33925#discussion_r1090419871


##########
docs/source/format/CanonicalExtensions.rst:
##########
@@ -72,4 +72,28 @@ same rules as laid out above, and provide backwards 
compatibility guarantees.
 Official List
 =============
 
-No canonical extension types have been standardized yet.
+Fixed size tensor
+=================
+
+* Extension name: `arrow.fixed_size_tensor`.
+
+* The storage type of the extension: ``List``.
+
+* Extension type parameters:
+
+  * **value_type** = pyarrow DataType of the tensor elements
+  * **shape** = shape of the contained tensors as a tuple
+  * **order** = string indicating the order of elements in memory;
+    ‘C’ for row major order and ‘F’ for column major order
+
+* Description of the serialization:
+
+  The metadata MUST be a valid JSON object including:
+
+  * shape of the contained tensors as a tuple with key “shape”,
+  * string defining the order of elements in memory with key “order”.
+
+  For example: `{ “shape”: (2, 5), “order”: ‘C’}`
+
+  Implementations MAY include implementation-specific metadata by using a
+  namespaced key. For example `{"package.name": {"my": "metadata"}}`

Review Comment:
   I forgot to ask earlier, but what is the benefit of explicitly allowing this 
_inside_ the serialized metadata? Because the extension metadata is part of the 
general custom key-value metadata of the Field (where the extension metadata is 
stored under the `"ARROW:extension:metadata"` key, 
https://arrow.apache.org/docs/dev/format/Columnar.html#extension-types). So 
users can still store custom implementation-specific metadata in the general 
Field metadata, if needed (of course, this isn't seen by the "deserialize" step 
to instantiate an ExtensionType from the serialized schema, so it would work 
differently)
   
   Are there concrete use cases for allowing this?
   
   



##########
docs/source/format/CanonicalExtensions.rst:
##########
@@ -72,4 +72,28 @@ same rules as laid out above, and provide backwards 
compatibility guarantees.
 Official List
 =============
 
-No canonical extension types have been standardized yet.
+Fixed size tensor
+=================
+
+* Extension name: `arrow.fixed_size_tensor`.
+
+* The storage type of the extension: ``List``.
+
+* Extension type parameters:
+
+  * **value_type** = pyarrow DataType of the tensor elements
+  * **shape** = shape of the contained tensors as a tuple
+  * **order** = string indicating the order of elements in memory;
+    ‘C’ for row major order and ‘F’ for column major order
+
+* Description of the serialization:
+
+  The metadata MUST be a valid JSON object including:
+
+  * shape of the contained tensors as a tuple with key “shape”,
+  * string defining the order of elements in memory with key “order”.
+
+  For example: `{ “shape”: (2, 5), “order”: ‘C’}`

Review Comment:
   ```suggestion
     For example: `{ "shape":  [2, 5], "order": "C" }`
   ```
   
   To make it formatted as valid JSON. Also, JSON doesn't know "tuple" but has 
an "array" (https://www.json.org/json-en.html), so you can update that in the 
text above.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to