westonpace commented on code in PR #41257:
URL: https://github.com/apache/arrow/pull/41257#discussion_r1589739730
##########
docs/source/format/CanonicalExtensions.rst:
##########
@@ -251,6 +251,27 @@ Variable shape tensor
Values inside each **data** tensor element are stored in
row-major/C-contiguous
order according to the corresponding **shape**.
+.. _json_extension:
+
+JSON
+====
+
+* Extension name: `arrow.json`.
+
+* The storage type of this extension is ``StringArray`` or
+ or ``LargeStringArray`` or ``StringViewArray``.
+ Only UTF-8 encoded JSON is supported.
Review Comment:
> By this you mean serializing binary data into UTF-8 string and including
> it as a String? What would be a downside to allowing that?
I think the argument was that we might want to support `BinaryArray` as a
storage type, since a JSON document could be encoded with a non-UTF-8 encoding
and thus should not be stored in a `StringArray`. However, I agree we probably
don't want to worry about this: the RFC (RFC 8259) is pretty clear that JSON
must be UTF-8, and users can always define their own extension type if they
need to.
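
To make the storage-type argument concrete, here is a minimal sketch using only
the Python standard library. It does not touch Arrow at all; the sample document
and the helper name `is_valid_utf8` are made up for illustration. It only shows
why a JSON document in a non-UTF-8 encoding would not fit a string-typed storage
that requires valid UTF-8, while the UTF-8 form does.

```python
import json

# Hypothetical sample document, chosen to include a non-ASCII character.
doc = {"name": "café", "n": 1}

# The same JSON text serialized with two different encodings.
text = json.dumps(doc, ensure_ascii=False)
utf8_bytes = text.encode("utf-8")
utf16_bytes = text.encode("utf-16")  # Python prepends a byte order mark here


def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# The UTF-8 bytes are valid string data and round-trip through a JSON parser.
utf8_ok = is_valid_utf8(utf8_bytes)
round_trip = json.loads(utf8_bytes.decode("utf-8"))

# The UTF-16 bytes are not valid UTF-8, so they would need binary storage.
utf16_ok = is_valid_utf8(utf16_bytes)
```

Under these assumptions, only the UTF-8 bytes pass the check, which is the case
the proposed string storage types cover. Anything else would need a binary
storage type, i.e. a user-defined extension type.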