ianmcook commented on code in PR #43849:
URL: https://github.com/apache/arrow/pull/43849#discussion_r1733591036
##########
docs/source/python/extending_types.rst:
##########
@@ -131,58 +131,82 @@ and serialization mechanism. The extension name and
serialized metadata
can potentially be recognized by other (non-Python) Arrow implementations
such as PySpark.
-For example, we could define a custom UUID type for 128-bit numbers which can
-be represented as ``FixedSizeBinary`` type with 16 bytes::
+For example, we could define a custom rational type for fractions which can
+be represented as a pair of integers::
- class UuidType(pa.ExtensionType):
+ import pyarrow as pa
+ import pyarrow.types as pt
+
+ class RationalType(pa.ExtensionType):
def __init__(self):
- super().__init__(pa.binary(16), "my_package.uuid")
- def __arrow_ext_serialize__(self):
- # Since we don't have a parameterized type, we don't need extra
- # metadata to be deserialized
- return b''
+ super().__init__(
+ pa.struct(
+ [
+ ("numer", pa.int32()),
+ ("denom", pa.int32()),
+ ],
+ ),
+ "my_package.rational",
+ )
+
+ def __arrow_ext_serialize__(self) -> bytes:
+ # No serialized metadata necessary
+ return b""
@classmethod
- def __arrow_ext_deserialize__(cls, storage_type, serialized):
+ def __arrow_ext_deserialize__(self, storage_type, serialized):
# Sanity checks, not required but illustrate the method signature.
- assert storage_type == pa.binary(16)
+ assert pt.is_struct(storage_type)
+ assert pt.is_int32(storage_type[0].type)
assert serialized == b''
- # Return an instance of this subclass given the serialized
- # metadata.
- return UuidType()
+
+ # return an instance of this subclass given the serialized
+ # metadata
+ return RationalType()
+
The special methods ``__arrow_ext_serialize__`` and
``__arrow_ext_deserialize__``
-define the serialization of an extension type instance. For non-parametric
-types such as the above, the serialization payload can be left empty.
Review Comment:
Oh, I see. I think this relates to what exactly parameterized / parametric
means, correct?
- If the storage type requires one or more parameters to initially specify,
does that qualify as "parameterized"?
- Or does "parameterized" mean that we need additional metadata (that is not
encoded in the underlying storage type) in order to reconstruct the instance?
I think you are defining it to mean the latter, correct?
I think that makes sense. But it would be good to explicitly say this in the
docs. See my other comment about that.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]