This seems like a reasonable definition to me.  Since there hasn't been
much feedback, I think the next step would be to put up a PR with an
implementation plus this description.  If there isn't further feedback
on this, once the PR is up we can try to hold a vote (which might bring
up some more feedback, but hopefully wouldn't cause too much
implementation churn).

Thanks,
Micah

On Thu, Nov 17, 2022 at 3:58 PM Pradeep Gollakota
<pgollak...@google.com.invalid> wrote:

> Hi folks!
>
> I put together this specification for canonicalizing the JSON type in
> Arrow.
>
> ## Introduction
> JSON is a widely used text-based data interchange format. There are many
> use cases where a user has a column whose contents are a JSON encoded
> string. BigQuery's [JSON Type][1] and Parquet’s [JSON Logical Type][2] are
> two such examples.
>
> The JSON specification is defined in [RFC-8259][3]. However, many of the
> most popular parsers support non-standard extensions. Examples of
> non-standard extensions to JSON include comments, unquoted keys, and
> trailing commas.
>
> ## Extension Specification
> * The name of the extension is `arrow.json`
> * The storage type of the extension is `utf8`
> * The extension type has no parameters
> * The metadata MUST be either empty or a valid JSON object
>     - There is no canonical metadata
>     - Implementations MAY include implementation-specific metadata by using
>       a namespaced key. For example `{"google.bigquery": {"my": "metadata"}}`
> * Implementations...
>     - MUST produce valid UTF-8 encoded text
>     - SHOULD produce valid standard JSON
>     - MAY produce valid non-standard JSON
>     - MUST support parsing standard JSON
>     - MAY support parsing non-standard JSON
>     - SHOULD pass through contents that they do not understand
>
> ## Forward compatibility
> In the future we might allow this logical type to annotate a byte storage
> type with a different text encoding.  Implementations consuming JSON
> logical types should verify the storage type rather than assume `utf8`.
>
>     [1]: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#json_type
>     [2]: https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#json
>     [3]: https://datatracker.ietf.org/doc/html/rfc8259
>
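
For illustration only, the metadata and parsing rules in the quoted
specification could be checked roughly as follows. This is a minimal
sketch using only the Python standard library, not an Arrow
implementation: the function names `parse_extension_metadata` and
`is_standard_json` are hypothetical, and Python's `json` module is only an
approximation of a strict RFC 8259 parser (its one well-known extension,
NaN/Infinity, is rejected explicitly here).

```python
import json

EXTENSION_NAME = "arrow.json"  # name given in the proposal
STORAGE_TYPE = "utf8"          # storage type given in the proposal


def parse_extension_metadata(serialized: str) -> dict:
    """Return the extension metadata as a dict; empty metadata maps to {}.

    Raises ValueError if the metadata is neither empty nor a JSON object.
    """
    if serialized == "":
        return {}
    value = json.loads(serialized)  # JSONDecodeError is a ValueError subclass
    if not isinstance(value, dict):
        raise ValueError("arrow.json metadata must be empty or a JSON object")
    return value


def _reject_constant(name: str):
    # Python's json module accepts NaN, Infinity and -Infinity by default;
    # RFC 8259 does not allow them, so treat them as non-standard.
    raise ValueError(f"non-standard JSON constant: {name}")


def is_standard_json(data: bytes) -> bool:
    """True if data is valid UTF-8 and parses as JSON without NaN/Infinity."""
    try:
        text = data.decode("utf-8")
        json.loads(text, parse_constant=_reject_constant)
    except ValueError:  # covers UnicodeDecodeError and JSONDecodeError
        return False
    return True
```

A consumer that only supports standard JSON could use `is_standard_json`
to decide whether to parse a value or pass it through unchanged, in line
with the "SHOULD pass through contents that they do not understand" rule.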
