[ 
https://issues.apache.org/jira/browse/AVRO-672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12918322#action_12918322
 ] 

Ron Bodkin commented on AVRO-672:
---------------------------------

The use case I'm most interested in supporting is converting from JSON data to 
a previously-defined Avro schema, either in a batch file conversion, or in 
memory (for use with map-reduce). 

The newer patch emits its output in a standard schema of its own. Converting 
to a previously defined (custom) schema seems to be a different problem, one 
that would require code like what I wrote in my patch. It would also be nice 
to be able to read a value like "1" into a double or a long field, even though 
it would be parsed as a JSON integer node.
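
To make the idea concrete, here is a minimal Python sketch of schema-driven 
coercion. It is not the code from either patch, and the function name coerce 
is hypothetical. It assumes the schema is given as parsed JSON, handles only 
records, unions, and a few primitive types, and assumes a missing field is 
treated as null. It uses the schema from the issue description.

```python
import json

def coerce(value, schema):
    """Coerce a parsed JSON value to the type an Avro-style schema asks for."""
    if isinstance(schema, list):
        # Union: take the first branch that accepts the value.
        for branch in schema:
            try:
                return coerce(value, branch)
            except (TypeError, ValueError):
                continue
        raise TypeError(f"{value!r} matches no branch of {schema!r}")
    if isinstance(schema, dict):
        if schema["type"] == "record":
            if not isinstance(value, dict):
                raise TypeError(f"expected a JSON object, got {value!r}")
            # Assumption: a field missing from the input is treated as null.
            return {f["name"]: coerce(value.get(f["name"]), f["type"])
                    for f in schema["fields"]}
        schema = schema["type"]
    if schema == "null":
        if value is None:
            return None
        raise TypeError(f"expected null, got {value!r}")
    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
    if schema in ("int", "long"):
        if not is_number:
            raise TypeError(f"expected a number, got {value!r}")
        if isinstance(value, float) and not value.is_integer():
            raise ValueError(f"{value!r} is not a whole number")
        return int(value)
    if schema in ("float", "double"):
        if not is_number:
            raise TypeError(f"expected a number, got {value!r}")
        return float(value)  # a JSON integer such as 1 becomes 1.0
    if schema == "string":
        if not isinstance(value, str):
            raise TypeError(f"expected a string, got {value!r}")
        return value
    raise TypeError(f"unsupported schema in this sketch: {schema!r}")

SCHEMA = json.loads("""
{ "type":"record", "name":"TestRecord", "fields": [
  {"name":"intval","type":"int"}, {"name":"strval","type":["string", "null"]}]}
""")

for line in ['{"intval":12}', '{"intval":-73,"strval":"hello, there!!"}']:
    print(coerce(json.loads(line), SCHEMA))
```

With a field declared as double or long, the same function accepts a JSON 
integer node and returns a value of the declared type.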

I have also found it valuable to transform names that contain invalid 
characters, since there is a lot of valid JSON whose identifiers don't conform 
to the Avro identifier grammar. That would be pretty easy to add to this patch 
(the regexp I used before was far too slow, so I have a newer version that is 
efficient).
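
As an illustration only (this is not the version from my patch), a name 
transformation can be done in a single pass without a regular expression. The 
Python sketch below assumes the Avro rule that a name starts with a letter or 
underscore and continues with letters, digits, or underscores. The function 
name sanitize_name is hypothetical, and mapping an empty name to "_" is an 
assumption.

```python
import string

# Characters Avro allows at the start of a name, and in the rest of it.
_START = frozenset(string.ascii_letters + "_")
_REST = _START | frozenset(string.digits)

def sanitize_name(name):
    """Replace each character that is invalid in an Avro name with '_'."""
    if not name:
        return "_"  # assumption: an empty JSON key becomes a single underscore
    chars = [c if c in _REST else "_" for c in name]
    if chars[0] not in _START:
        chars[0] = "_"  # a leading digit is invalid in first position
    return "".join(chars)

for key in ["user-id", "9lives", "a.b c", "ok_name1"]:
    print(key, "->", sanitize_name(key))
```

Note that two different JSON keys can map to the same Avro name (for example 
"a-b" and "a.b"), so a real tool would need to decide how to handle 
collisions.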

To allow reading in JSON text and creating in-memory objects that conform to 
a previously defined schema, I think it would be necessary to have hints for 
the type of data that arrays contain (e.g., in generated code, or in runtime 
annotations if using a reflective style). I already ran into this while trying 
to get the reflection reader to work with specific data (on AVRO-669).


> Convert JSON Text Input to Avro Tool
> ------------------------------------
>
>                 Key: AVRO-672
>                 URL: https://issues.apache.org/jira/browse/AVRO-672
>             Project: Avro
>          Issue Type: New Feature
>            Reporter: Ron Bodkin
>         Attachments: AVRO-672.patch, AVRO-672.patch
>
>
> The attached patch allows reading a JSON-formatted text file in, converting 
> to a conforming Avro text file, emitting one record per line, e.g., it can 
> read this input file:
> {"intval":12}
> {"intval":-73,"strval":"hello, there!!"}
> with this schema:
> { "type":"record", "name":"TestRecord", "fields": [ 
> {"name":"intval","type":"int"}, {"name":"strval","type":["string", "null"]}]}
> returning valid Avro. This is different from the DataFileWriteTool, which 
> would read in the following internal encoding:
> {"intval":12,"strval":null}
> {"intval":-73,"strval":{"string":"hello, there!!"}}
> In general, the internal encodings used by Avro aren't natural when reading 
> in JSON text that appears in the wild. Likewise, this utility allows changing 
> invalid Avro identifier characters into an underscore, again to tolerate JSON 
> that wasn't designed to be readable by Avro.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
