Re: json2avro
> I wrote a little C tool using Avro-C to convert JSON to Avro and thought maybe someone here might find it useful.
>
> https://github.com/grisha/json2avro
>
> The purpose is to be useful in converting messy legacy JSON in which some elements might be missing or of the wrong type. Even though there is no schema resolution per se here, json2avro will attempt to use the default specified in the schema if the corresponding JSON element is missing, and will try the types specified in a union until one succeeds.
>
> json2avro lets you pick from the null, snappy, deflate and lzma codecs, specify a custom block size, and optionally skip over JSON lines that it is unable to parse. I'm also thinking of adding a target max file size so that it would automatically split output into multiple files.

Very cool! Kind of the reverse of avrocat or avropipe. We could clean it up and add it as another C command-line tool if you like.

> It uses Jansson as the JSON parser, which is conveniently bundled with Avro-C. (One thing that I'm not clear on is that Jansson cannot handle nulls; not sure if this is a Jansson-specific limitation or something inherent to JSON.)

Can you elaborate on this? Jansson should support null JSON values (it's the keyword null, not the string value "null"). And the Avro C bindings should use that for Avro null values.

cheers
–doug
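A minimal sketch of the fallback behaviour described in the announcement: use the schema default when an element is missing, then try the union branches in order until one accepts the value. This is not json2avro's actual code. The names `branch_t`, `branch_accepts` and `resolve_union` are hypothetical, and values are modelled as plain text tokens rather than Jansson or Avro-C objects.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the branch types of an Avro union. */
typedef enum { BRANCH_NULL, BRANCH_LONG, BRANCH_DOUBLE, BRANCH_STRING } branch_t;

/* Returns 1 if `text` can be read in full as the given branch type. */
int branch_accepts(branch_t branch, const char *text)
{
    char *end = NULL;

    switch (branch) {
    case BRANCH_NULL:
        return strcmp(text, "null") == 0;
    case BRANCH_LONG:
        errno = 0;
        (void)strtol(text, &end, 10);
        /* Require at least one digit and no trailing characters. */
        return errno == 0 && end != text && *end == '\0';
    case BRANCH_DOUBLE:
        errno = 0;
        (void)strtod(text, &end);
        return errno == 0 && end != text && *end == '\0';
    case BRANCH_STRING:
        return 1; /* any text is a valid string */
    }
    return 0;
}

/*
 * Tries each branch in declaration order and returns the index of the
 * first one that accepts the value, or -1 if none does.
 * A NULL `text` models a missing element; in that case `default_text`
 * (the schema default, possibly NULL) is used instead.
 */
int resolve_union(const branch_t *branches, size_t n_branches,
                  const char *text, const char *default_text)
{
    if (text == NULL)
        text = default_text;
    if (text == NULL)
        return -1; /* missing element and no default */

    for (size_t i = 0; i < n_branches; i++) {
        if (branch_accepts(branches[i], text))
            return (int)i;
    }
    return -1;
}
```

With the branches declared as null, long, double, a value of `1.5` is rejected by the long branch (parsing stops at the dot) and lands on the double branch, so branch order matters in a scheme like this.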
Re: json2avro
On Tue, 25 Jun 2013, Douglas Creager wrote:

> > json2avro lets you pick from the null, snappy, deflate and lzma codecs, specify a custom block size, and optionally skip over JSON lines that it is unable to parse. I'm also thinking of adding a target max file size so that it would automatically split output into multiple files.
>
> Very cool! Kind of the reverse of avrocat or avropipe. We could clean it up and add it as another C command-line tool if you like.

Sure, I'm all for it!

> > It uses Jansson as the JSON parser, which is conveniently bundled with Avro-C. (One thing that I'm not clear on is that Jansson cannot handle nulls; not sure if this is a Jansson-specific limitation or something inherent to JSON.)
>
> Can you elaborate on this? Jansson should support null JSON values (it's the keyword null, not the string value "null"). And the Avro C bindings should use that for Avro null values.

Sorry I wasn't clear. Jansson uses null-terminated strings. The docs state:

  "Normal null terminated C strings are used, so JSON strings may not contain embedded null characters."

I've tested it and indeed they cannot: Jansson cannot parse a string like "abc\u0000def".

Grisha
Re: json2avro
> Sorry I wasn't clear. Jansson uses null-terminated strings. The docs state:
>
>   "Normal null terminated C strings are used, so JSON strings may not contain embedded null characters."
>
> I've tested it and indeed they cannot: Jansson cannot parse a string like "abc\u0000def".

Oh yes, NUL bytes in a string, gotcha. You're right, that's my least favorite thing about Jansson. There's an open issue with a patch [1], but there hasn't been any movement on it in a year, unfortunately.

[1] https://github.com/akheron/jansson/pull/63
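To illustrate why embedded NUL bytes are awkward for any API built on NUL-terminated C strings, here is a small sketch in plain C. It does not use Jansson; the helper names `visible_length` and `has_embedded_nul` are made up for this example.

```c
#include <stddef.h>
#include <string.h>

/* Length as seen by functions that rely on NUL termination. */
size_t visible_length(const char *buf)
{
    return strlen(buf);
}

/* Returns 1 if any of the first `len` bytes of `buf` is a NUL byte. */
int has_embedded_nul(const char *buf, size_t len)
{
    return memchr(buf, '\0', len) != NULL;
}
```

For a 7-byte buffer holding `abc`, a NUL byte, then `def`, `visible_length` reports 3: everything after the NUL is invisible unless the length is carried separately from the pointer.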
Re: Avro Schema to SQL
Might be worth looking at Sqoop's source.

On 6/19/13 02:31 AM, Avinash Dongre wrote:

> Is there any known tool or framework available to convert an Avro schema into SQL? If not, how do I iterate over the schema to find out what records and enums are there? I can think of how to achieve this with a simple schema, but I am not able to figure out a way for nested schemas.
>
> Thanks
> Avinash
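For the nested case asked about here, one common approach is a recursive walk over the schema tree. The sketch below assumes a toy in-memory representation (`schema_t`, `schema_counts_t` and `count_named_types` are hypothetical names, not the Avro-C API or anything taken from Sqoop) and only counts records and enums. A real converter would emit SQL at each record instead of counting.

```c
#include <stddef.h>

/* Hypothetical schema node kinds; T_CONTAINER stands for array, map or union. */
typedef enum { T_RECORD, T_ENUM, T_CONTAINER, T_PRIMITIVE } schema_type_t;

/* A toy schema node: records and containers have children, others have none. */
typedef struct schema {
    schema_type_t type;
    const char *name;
    const struct schema *const *children;
    size_t n_children;
} schema_t;

typedef struct {
    size_t records;
    size_t enums;
} schema_counts_t;

/* Depth-first walk: tally this node, then recurse into every child. */
void count_named_types(const schema_t *s, schema_counts_t *out)
{
    if (s->type == T_RECORD)
        out->records++;
    else if (s->type == T_ENUM)
        out->enums++;

    for (size_t i = 0; i < s->n_children; i++)
        count_named_types(s->children[i], out);
}
```

The same function handles any nesting depth because each record's fields are visited with the same logic as the top-level schema. A recursive (self-referencing) schema would need a visited set, which this sketch leaves out.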