emkornfield commented on code in PR #41823:
URL: https://github.com/apache/arrow/pull/41823#discussion_r1639304889
##########
docs/source/format/CanonicalExtensions.rst:
##########
@@ -283,6 +283,132 @@ UUID
 A specific UUID version is not required or guaranteed. This extension represents
 UUIDs as FixedSizeBinary(16) with big-endian notation and does not interpret the bytes in any way.
+Unknown
+=======
+
+Unknown represents a type or array that an Arrow-based system received from an
+external (often non-Arrow) system, which it cannot interpret itself or did not
+have support for in advance. In this case, it can pass on Unknown to its own
+clients to communicate that a field exists, but that it cannot interpret the
+field or data.
+
+Extension parameters:
+
+* Extension name: ``arrow.unknown``.
+
+* The storage type of this extension is any type. If there is no underlying
+  data, the storage type should be Null. If there is data, the storage type
+  should preferably be binary or fixed-size binary, but may be any type.
+
+* Extension type parameters:
+
+  * **type_name** = the name of the unknown type in the external system.
+  * **vendor_name** = the name of the external system.
+
+* Description of the serialization:
+
+  A valid JSON object containing the parameters as fields. In the future,
+  additional fields may be added, but all fields current and future are never
+  required to interpret the array.
+
+Examples:
+
+* Consider a Flight SQL service that supports connecting external databases.
+  Its clients may request the names and types of columns of tables in those
+  databases, but then there may be types that the Flight SQL service does not
+  recognize, due to lack of support or because those systems have their own
+  extensions or user-defined types.
+
+  The Flight SQL service can use the Unknown[Null] type to report that a
+  column exists with a particular name and type name in the external database.
+  This lets clients know that a column exists, but is not supported. Null is
+  used as the storage type here because only schemas are involved.
+
+  The client would presumably not be able to query such columns from the
+  Flight SQL service, but there may be other columns in the table that it
+  could query, or it could prepare a query that references the unknown column
+  in an expression and produces a result that *is* supported. The Unknown
+  type is a better experience than erroring or silently dropping columns from
+  the catalog.
+
+  An example of the extension metadata would be::
+
+    {"type_name": "varray", "vendor_name": "Oracle"}
+
+* The ADBC PostgreSQL driver may get bytes for a field whose type it does not
+  recognize. This is because of how PostgreSQL and its wire protocol work:
+  the driver will always get bytes for fields and must implement support for
+  all potential types to interpret those bytes. But the driver cannot know
+  about all types in advance, as there may be extensions (e.g. PostGIS for
+  geospatial functionality).
+
+  Because the driver still has the raw bytes, it can use Unknown[Binary] to
+  return those bytes to the application, which may be able to parse the data
+  itself. Unknown differentiates the column from an actual binary column.
+
+  An example of the extension metadata would be::
+
+    {"type_name": "geometry", "vendor_name": "PostGIS"}

Review Comment:
   Will it always be the case that type_name will be UTF-8 compatible?
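
To make the question concrete: the proposed serialization is a JSON object, and JSON text exchanged between systems is expected to be UTF-8 (RFC 8259), so a type name that arrives from an external system as raw bytes would have to be decoded to a Unicode string before it can be stored in `type_name`. The following is a minimal Python sketch of that round trip, not an implementation from the PR. The helper names `serialize_unknown_metadata` and `deserialize_unknown_metadata` are hypothetical and not part of any Arrow API, and the lossy `errors="replace"` policy is an assumption that the proposed text does not specify.

```python
import json


def serialize_unknown_metadata(type_name, vendor_name):
    """Build the JSON metadata for an arrow.unknown extension type.

    Hypothetical helper for illustration; not part of any Arrow library.
    """
    # Assumption: the external system may hand over the type name as raw bytes.
    # JSON strings are Unicode, so undecodable bytes are replaced with U+FFFD.
    if isinstance(type_name, bytes):
        type_name = type_name.decode("utf-8", errors="replace")
    payload = {"type_name": type_name, "vendor_name": vendor_name}
    # ensure_ascii=False keeps non-ASCII names readable; the output is UTF-8.
    return json.dumps(payload, ensure_ascii=False).encode("utf-8")


def deserialize_unknown_metadata(serialized):
    """Parse the metadata, ignoring any fields that may be added in the future."""
    fields = json.loads(serialized.decode("utf-8"))
    if not isinstance(fields, dict):
        raise ValueError("arrow.unknown metadata must be a JSON object")
    # No field is required to interpret the array, so missing ones become None.
    return fields.get("type_name"), fields.get("vendor_name")
```

A reader written this way tolerates both extra and missing fields, which matches the statement that no field is ever required to interpret the array. Whether a non-UTF-8 type name should be replaced, escaped, or rejected is exactly the open point raised above.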