[ https://issues.apache.org/jira/browse/AVRO-672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12918322#action_12918322 ]
Ron Bodkin commented on AVRO-672:
---------------------------------
The use case I'm most interested in supporting is converting JSON data to a
previously defined Avro schema, either as a batch file conversion or in
memory (for use with map-reduce).
The newer patch emits its output in its own standard schema. Converting to a
previously defined (custom) schema seems to be a separate problem that would
require code like what I wrote in my patch. It would also be nice to be able to
read a value like "1" into a double or long field, even though it is parsed as
a JSON integer node.
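A minimal Java sketch of that kind of lenient numeric handling follows. It assumes the JSON parser hands back integer literals as `Integer` or `Long` and decimals as `Double`; `NumericCoercion`, `AvroNumericType`, and `coerce` are hypothetical names for illustration, not part of Avro's API. Integers widen to `long`, `float`, or `double`, while a decimal headed for an integral field is rejected rather than silently truncated.

```java
public class NumericCoercion {
    /** Avro numeric primitive types a target field may declare. */
    enum AvroNumericType { INT, LONG, FLOAT, DOUBLE }

    /**
     * Hypothetical helper: converts a number produced by a JSON parser into the
     * boxed Java type used for the declared Avro field type.
     * Assumes the parser yields Integer/Long for integers and Double for decimals.
     */
    static Object coerce(Number value, AvroNumericType target) {
        boolean integral = value instanceof Integer || value instanceof Long
                || value instanceof Short || value instanceof Byte;
        switch (target) {
            case INT:
                // Accept only whole numbers that fit in 32 bits.
                if (!integral || value.longValue() != value.intValue()) {
                    throw new IllegalArgumentException(value + " does not fit an int field");
                }
                return value.intValue();
            case LONG:
                // Refuse to truncate a decimal such as 1.5.
                if (!integral) {
                    throw new IllegalArgumentException(value + " does not fit a long field");
                }
                return value.longValue();
            case FLOAT:
                return value.floatValue(); // may round large integers
            case DOUBLE:
                return value.doubleValue(); // the JSON integer 1 becomes 1.0
            default:
                throw new AssertionError("unhandled type " + target);
        }
    }

    public static void main(String[] args) {
        // A JSON parser typically returns the literal 1 as an Integer.
        System.out.println(coerce(1, AvroNumericType.DOUBLE)); // 1.0
        System.out.println(coerce(1, AvroNumericType.LONG));   // 1
    }
}
```

Rejecting lossy narrowing (a decimal into `int`/`long`, or an out-of-range value into `int`) keeps the leniency one-directional: only conversions that cannot lose information are accepted silently.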
I have also found it valuable to transform names that contain invalid
characters, since lots of valid JSON uses identifiers that don't conform to the
Avro identifier grammar. That would be easy to add to this patch (although the
regexp I used before was far too slow, so I have a newer version that's
efficient).
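For illustration, here is a regex-free sanitizer in Java; it is a sketch, not the newer version mentioned above, which isn't attached here. It follows the Avro specification's name rule: a name starts with `[A-Za-z_]` and continues with `[A-Za-z0-9_]`. `AvroNames.sanitize` is a hypothetical helper, and prefixing a leading digit with an underscore (instead of replacing it) is one choice among several.

```java
public class AvroNames {
    /**
     * Hypothetical helper: rewrites a JSON key into a legal Avro name.
     * Avro names must start with [A-Za-z_] and continue with [A-Za-z0-9_].
     * Single pass over the chars, no regex. Works per UTF-16 char, so a
     * character outside the BMP becomes two underscores.
     */
    static String sanitize(String key) {
        if (key.isEmpty()) {
            return "_"; // an empty key has no legal spelling; use a placeholder
        }
        StringBuilder out = new StringBuilder(key.length() + 1);
        for (int i = 0; i < key.length(); i++) {
            char c = key.charAt(i);
            boolean letter = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == '_';
            boolean digit = c >= '0' && c <= '9';
            if (i == 0 && digit) {
                out.append('_').append(c); // keep the digit, but a name can't start with one
            } else if (letter || digit) {
                out.append(c);
            } else {
                out.append('_'); // any other character is replaced
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(sanitize("user-id"));   // user_id
        System.out.println(sanitize("2nd place")); // _2nd_place
        System.out.println(sanitize("$ref"));      // _ref
    }
}
```

Note that distinct JSON keys such as `a-b` and `a.b` both map to `a_b`, so a real converter would also need to detect such collisions.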
To read JSON text and create in-memory objects that conform to a given
schema, I think we'd need hints about the type of data that arrays contain
(e.g., in generated code, or in runtime annotations when using a reflective
style). I already ran into this while trying to get the reflection reader to
work with specific data (on AVRO-669).
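As a sketch of what such a runtime hint could look like in Java: a field declared as `List<Long>` keeps its type argument in reflection metadata, but a raw `List` field (or a list object at runtime, after erasure) says nothing about its elements. `ArrayOf` below is a hypothetical annotation, not something Avro's reflect API is known to provide; the lookup prefers the hint and falls back to the declared type argument.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.List;

public class ArrayElementHints {
    /** Hypothetical runtime hint naming the element type of a collection field. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface ArrayOf {
        Class<?> value();
    }

    /** Example record: a typed list, a hinted raw list, and an unhinted raw list. */
    static class Sample {
        List<Long> typed;

        @SuppressWarnings("rawtypes")
        @ArrayOf(Double.class)
        List hinted;

        @SuppressWarnings("rawtypes")
        List unknown;
    }

    /** Returns the element class of a List field, or null if it cannot be determined. */
    static Class<?> elementType(Field field) {
        ArrayOf hint = field.getAnnotation(ArrayOf.class);
        if (hint != null) {
            return hint.value(); // an explicit hint wins
        }
        Type generic = field.getGenericType();
        if (generic instanceof ParameterizedType) {
            Type arg = ((ParameterizedType) generic).getActualTypeArguments()[0];
            if (arg instanceof Class) {
                return (Class<?>) arg; // declared as List<SomeClass>
            }
        }
        return null; // raw or wildcard type: nothing to go on
    }

    public static void main(String[] args) throws NoSuchFieldException {
        for (String name : new String[] {"typed", "hinted", "unknown"}) {
            Field f = Sample.class.getDeclaredField(name);
            // typed -> class java.lang.Long, hinted -> class java.lang.Double, unknown -> null
            System.out.println(name + " -> " + elementType(f));
        }
    }
}
```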
> Convert JSON Text Input to Avro Tool
> ------------------------------------
>
> Key: AVRO-672
> URL: https://issues.apache.org/jira/browse/AVRO-672
> Project: Avro
> Issue Type: New Feature
> Reporter: Ron Bodkin
> Attachments: AVRO-672.patch, AVRO-672.patch
>
>
> The attached patch reads a JSON-formatted text file and converts it to a
> conforming Avro text file, emitting one record per line. For example, it can
> read this input file:
> {"intval":12}
> {"intval":-73,"strval":"hello, there!!"}
> with this schema:
> { "type":"record", "name":"TestRecord", "fields": [
> {"name":"intval","type":"int"}, {"name":"strval","type":["string", "null"]}]}
> producing valid Avro. This differs from the DataFileWriteTool, which would
> expect its input in the following internal encoding:
> {"intval":12,"strval":null}
> {"intval":-73,"strval":{"string":"hello, there!!"}}
> In general, the internal encodings used by Avro aren't natural for JSON text
> found in the wild. This utility also changes characters that are invalid in
> Avro identifiers into underscores, again to tolerate JSON that wasn't
> designed to be read by Avro.
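To illustrate the gap between the two input formats quoted above, here is a minimal Java sketch that rewrites a record as a JSON parser might return it (a `Map`) into the shape of Avro's JSON encoding: a missing optional field becomes `null`, and a non-null union value is wrapped as `{"typeName": value}`. `FieldSpec`, `matches`, and `toAvroJson` are hypothetical; the schema is described by hand rather than parsed with Avro's `Schema` class, only a few primitive types are handled, and picking the first matching union branch is a simplification.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class NaturalToAvroJson {
    /** Hypothetical field descriptor: a name plus its type, or its union branches in order. */
    static final class FieldSpec {
        final String name;
        final List<String> types; // one entry = plain type; several = union

        FieldSpec(String name, List<String> types) {
            this.name = name;
            this.types = types;
        }
    }

    /** Does a parsed JSON value fit this Avro primitive type? (Small subset only.) */
    static boolean matches(String avroType, Object v) {
        switch (avroType) {
            case "null":    return v == null;
            case "string":  return v instanceof String;
            case "boolean": return v instanceof Boolean;
            case "int":     return v instanceof Integer;
            case "long":    return v instanceof Integer || v instanceof Long;
            case "double":  return v instanceof Number;
            default:        return false;
        }
    }

    static Map<String, Object> toAvroJson(Map<String, Object> natural, List<FieldSpec> fields) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (FieldSpec f : fields) {
            Object v = natural.get(f.name); // an absent key reads as null
            String branch = null;
            for (String t : f.types) {
                if (matches(t, v)) {
                    branch = t; // first matching branch wins
                    break;
                }
            }
            if (branch == null) {
                throw new IllegalArgumentException("field '" + f.name + "' cannot hold " + v);
            }
            if (f.types.size() == 1 || branch.equals("null")) {
                out.put(f.name, v); // plain field, or the null branch of a union
            } else {
                out.put(f.name, Collections.singletonMap(branch, v)); // like {"string": "..."}
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<FieldSpec> schema = Arrays.asList(
                new FieldSpec("intval", Arrays.asList("int")),
                new FieldSpec("strval", Arrays.asList("string", "null")));

        Map<String, Object> first = new HashMap<>();
        first.put("intval", 12);
        Map<String, Object> second = new HashMap<>();
        second.put("intval", -73);
        second.put("strval", "hello, there!!");

        // Printed with Java's Map.toString, mirroring the two encoded records above.
        System.out.println(toAvroJson(first, schema));  // {intval=12, strval=null}
        System.out.println(toAvroJson(second, schema)); // {intval=-73, strval={string=hello, there!!}}
    }
}
```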
--
This message is automatically generated by JIRA.