[ https://issues.apache.org/jira/browse/AVRO-672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12918348#action_12918348 ]
Doug Cutting commented on AVRO-672:
-----------------------------------
> I like the idea of having tools that manipulate "traditional" data formats
> into avro records, including guessing at the schema.
Do you think Ron's patch here is a good example of this that we should commit?
I worry that such tools might do 90% of what each application wants and require
constant tweaking. And each tweak might break other users. So a tool has to
either have lots of flexibility or be lossless. But perhaps I'm just
paranoid...
> Convert JSON Text Input to Avro Tool
> ------------------------------------
>
> Key: AVRO-672
> URL: https://issues.apache.org/jira/browse/AVRO-672
> Project: Avro
> Issue Type: New Feature
> Reporter: Ron Bodkin
> Attachments: AVRO-672.patch, AVRO-672.patch
>
>
> The attached patch reads a JSON-formatted text file and converts it to a
> conforming Avro text file, emitting one record per line. For example, it can
> read this input file:
> {"intval":12}
> {"intval":-73,"strval":"hello, there!!"}
> with this schema:
> { "type":"record", "name":"TestRecord", "fields": [
> {"name":"intval","type":"int"}, {"name":"strval","type":["string", "null"]}]}
> returning valid Avro. This differs from the DataFileWriteTool, which
> would expect the input in the following internal encoding:
> {"intval":12,"strval":null}
> {"intval":-73,"strval":{"string":"hello, there!!"}}
> In general, the internal encodings used by Avro aren't natural when reading
> in JSON text that appears in the wild. Likewise, this utility allows changing
> invalid Avro identifier characters into an underscore, again to tolerate JSON
> that wasn't designed to be readable by Avro.
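
For illustration only, here is a minimal Python sketch of the conversion the description outlines. It is not the code in AVRO-672.patch. The helper names (sanitize_name, to_avro_json) are hypothetical, only records, unions and a few primitive types are handled, and the rule of prefixing an underscore to a name that starts with a digit is an assumption of this sketch.

```python
import json
import re

# Type checks for the primitive schemas this sketch supports.
# bool is excluded from the numeric checks because it subclasses int in Python.
PRIMITIVE_CHECKS = {
    "null": lambda v: v is None,
    "boolean": lambda v: isinstance(v, bool),
    "int": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "long": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "float": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "double": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "string": lambda v: isinstance(v, str),
}


def sanitize_name(name):
    """Replace characters that are invalid in an Avro name with '_'."""
    cleaned = re.sub(r"[^A-Za-z0-9_]", "_", name)
    # Assumption of this sketch: a leading digit (or an empty name) gets a '_' prefix.
    if not cleaned or cleaned[0].isdigit():
        cleaned = "_" + cleaned
    return cleaned


def matches(value, schema):
    """Return True if a plain JSON value could belong to the given schema."""
    if isinstance(schema, str):
        check = PRIMITIVE_CHECKS.get(schema)
        return check is not None and check(value)
    if isinstance(schema, dict) and schema.get("type") == "record":
        return isinstance(value, dict)
    return False


def to_avro_json(value, schema):
    """Convert a 'natural' JSON value into Avro's JSON encoding."""
    if isinstance(schema, list):  # union: wrap the value in {branch: value}
        for branch in schema:
            if matches(value, branch):
                if branch == "null":
                    return None  # null is written bare, without a wrapper
                name = branch if isinstance(branch, str) else branch["name"]
                return {name: to_avro_json(value, branch)}
        raise ValueError("no union branch matches %r" % (value,))
    if isinstance(schema, dict) and schema.get("type") == "record":
        cleaned = {sanitize_name(k): v for k, v in value.items()}
        # A missing field is treated as None, so it only fits a nullable type.
        return {
            f["name"]: to_avro_json(cleaned.get(f["name"]), f["type"])
            for f in schema["fields"]
        }
    if not matches(value, schema):
        raise ValueError("%r does not match schema %r" % (value, schema))
    return value


SCHEMA = json.loads(
    '{"type":"record","name":"TestRecord","fields":['
    '{"name":"intval","type":"int"},'
    '{"name":"strval","type":["string","null"]}]}'
)

if __name__ == "__main__":
    for line in ['{"intval":12}', '{"intval":-73,"strval":"hello, there!!"}']:
        print(json.dumps(to_avro_json(json.loads(line), SCHEMA)))
```

Run against the two input lines from the description, this produces the same records as the internal encoding shown above, with the missing strval written as null and the present one wrapped as {"string": ...}.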