pitrou commented on code in PR #41823:
URL: https://github.com/apache/arrow/pull/41823#discussion_r1681287867


##########
docs/source/format/CanonicalExtensions.rst:
##########
@@ -283,6 +283,148 @@ UUID
   A specific UUID version is not required or guaranteed. This extension represents
   UUIDs as FixedSizeBinary(16) with big-endian notation and does not interpret the bytes in any way.
 
+Opaque
+=======
+
+Opaque represents a type or array that an Arrow-based system received from an
+external (often non-Arrow) system, which it cannot interpret or did not have
+support for in advance.  In this case, it can pass on Opaque to its clients to
+show that a field exists, but that it cannot interpret the field or data.
+
+Extension parameters:
+
+* Extension name: ``arrow.opaque``.
+
+* The storage type of this extension is any type.  If there is no underlying
+  data, the storage type should be Null.
+
+* Extension type parameters:
+
+  * **type_name** = the name of the unknown type in the external system.
+  * **vendor_name** = the name of the external system.
+
+* Description of the serialization:
+
+  A valid JSON object containing the parameters as fields.  In the future,
+  additional fields may be added, but all fields current and future are never
+  required to interpret the array.
+
+Rationale
+---------
+
+Arrow systems often wrap non-Arrow systems, and so they must be prepared to
+handle data types and data that don't have an equivalent Arrow type.  A client
+may still want to know of the existence of a field, or the types of other,
+supported fields.  So returning an error because of an unrecognized type in
+one column, or dropping unsupported fields/columns, are poor solutions.
+
+Of course, the Arrow system can use extension types.  But it will not have an
+extension type prepared for every possible type in advance; for example, the
+non-Arrow system may have its own extension mechanisms.  It could "make up" an
+extension type on the fly.  But this misleads clients who cannot tell if the
+type is truly supported or not by the intermediate Arrow application.
+
+The Opaque type can be used instead.  Because it explicitly means that the
+*intermediate* system does not support a type, it can be used to declare an
+unsupported field or column without silently losing data or erroring.  In
+other words: if an Arrow system encounters a non-Arrow type it was not
+prepared to handle, it can use Opaque to still pass the type on to a client.
+
+Applications **should not** make conventions around vendor_name and type_name.
+If there is a type that multiple systems want to support, they should create a
+formal extension type.  They *should not* try to agree on particular
+parameters of the Opaque type.  These parameters are meant for human end users
+to understand what type was not supported.  Of course, applications may
+interpret these fields regardless, but must be prepared for breakage (if for
+example the type becomes supported with a custom extension type in a later
+software revision).
+
+Opaque is not about file formats.  Considerations such as MIME types are
+irrelevant, and Opaque should not be thought of as a generic container for
+file format data (XML/JSON/etc.).
+
+Examples:
+
+* Consider a Flight SQL service that supports connecting external databases.

Review Comment:
   The examples are similarly long-winded. We could perhaps keep the gist of them, but without the overly long explanations? Example:
   
   > A Flight SQL service can be used to connect to an external database that supports types not representable in Arrow. When listing the types of table columns in such a database, the Flight SQL service would fail to return information about non-representable types. Using the Opaque type, however, it can still communicate information about those columns instead of ignoring them entirely.
   >
   > An example of the extension metadata, when connected to a well-known proprietary database, would be [etc.]



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

