westonpace commented on code in PR #41257:
URL: https://github.com/apache/arrow/pull/41257#discussion_r1589739730


##########
docs/source/format/CanonicalExtensions.rst:
##########
@@ -251,6 +251,27 @@ Variable shape tensor
    Values inside each **data** tensor element are stored in row-major/C-contiguous
    order according to the corresponding **shape**.
 
+.. _json_extension:
+
+JSON
+====
+
+* Extension name: `arrow.json`.
+
+* The storage type of this extension is ``StringArray`` or
+  or ``LargeStringArray`` or ``StringViewArray``.
+  Only UTF-8 encoded JSON is supported.

Review Comment:
   > By this you mean serializing binary data into UTF-8 string and including it as a String? What would be a downside to allowing that?
   
   I think the argument was that we might want to support `BinaryArray` as a storage type, since a JSON document could be encoded with a non-UTF-8 encoding and thus should not be stored in a `StringArray`. However, I agree we probably don't want to worry about this: the RFC is pretty clear that JSON must be UTF-8, and users can always define their own extension type if they need to.
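
   To make the storage-type concern concrete, here is a rough sketch using only the Python standard library (no Arrow API involved). The helper name `is_utf8_json` is made up for illustration. It checks whether raw bytes could live in a UTF-8 string column and still parse as JSON:

```python
import json


def is_utf8_json(raw: bytes) -> bool:
    """Return True if `raw` is valid UTF-8 text that also parses as JSON.

    Hypothetical helper for illustration only; not part of any Arrow API.
    """
    try:
        text = raw.decode("utf-8")  # strict decoding: rejects non-UTF-8 bytes
        json.loads(text)
    except (UnicodeDecodeError, json.JSONDecodeError):
        return False
    return True


doc = {"name": "café"}
utf8_bytes = json.dumps(doc, ensure_ascii=False).encode("utf-8")
utf16_bytes = json.dumps(doc, ensure_ascii=False).encode("utf-16")

print(is_utf8_json(utf8_bytes))   # True: fits a UTF-8 string storage type
print(is_utf8_json(utf16_bytes))  # False: would need binary storage instead
```

   The UTF-16 bytes are still a JSON document in some sense, but they are not valid UTF-8, so they could only be carried by a binary storage type. That is the case the extension deliberately leaves out.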



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
