Re: json2avro

2013-06-25 Thread Douglas Creager
 I wrote a little C tool using Avro-C to convert JSON to Avro and thought
 maybe someone here might find it useful.
 
 https://github.com/grisha/json2avro
 
 The purpose is to be useful in converting messy legacy JSON in which
 some elements might be missing or of wrong type. Even though there is no
 schema resolution per se here, json2avro will attempt to use the default
 specified in the schema if the corresponding JSON element is missing and
 will attempt to try the types specified in a union until one succeeds.
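
A rough sketch of the fallback behaviour described above, written in Python rather than C for brevity. This is not json2avro's actual code: `coerce` and `_primitive_ok` are hypothetical names, and only records, unions and primitive types are handled. A missing field takes the schema default, and union branches are tried in order until one accepts the value.

```python
import json

_MISSING = object()  # sentinel: distinguishes "absent" from an explicit null


def _primitive_ok(value, name):
    """Check a decoded JSON value against an Avro primitive type name."""
    if name == "null":
        return value is None
    if name == "boolean":
        return isinstance(value, bool)
    if name in ("int", "long"):
        return isinstance(value, int) and not isinstance(value, bool)
    if name in ("float", "double"):
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if name == "string":
        return isinstance(value, str)
    return False  # other types are out of scope for this sketch


def coerce(value, schema):
    """Return value shaped to schema, or raise ValueError."""
    if isinstance(schema, list):  # union: try each branch in order
        for branch in schema:
            try:
                return coerce(value, branch)
            except ValueError:
                continue
        raise ValueError(f"no union branch accepts {value!r}")
    if isinstance(schema, dict):
        kind = schema["type"]
        if kind != "record":
            return coerce(value, kind)  # e.g. {"type": "string"}
        if not isinstance(value, dict):
            raise ValueError("expected a JSON object for a record")
        out = {}
        for field in schema["fields"]:
            item = value.get(field["name"], _MISSING)
            if item is _MISSING:
                if "default" not in field:
                    raise ValueError(f"missing field {field['name']!r}")
                out[field["name"]] = field["default"]  # fall back to default
            else:
                out[field["name"]] = coerce(item, field["type"])
        return out
    if _primitive_ok(value, schema):
        return value
    raise ValueError(f"{value!r} is not a valid {schema}")


schema = {
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "age", "type": ["null", "int"], "default": None},
        {"name": "score", "type": ["int", "string"]},
    ],
}
result = coerce(json.loads('{"name": "a", "score": "7"}'), schema)
print(result)
```

Here `age` is absent and so takes its default, while `score` fails the `int` branch and is accepted by `string`.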
 
 json2avro lets you pick from null, snappy, deflate and lzma codecs,
 specify a custom block size, and optionally skip over JSON lines that it
 is unable to parse. I'm also thinking of adding a target max file size
 so that it would automatically split output into multiple files.
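
The "skip lines that cannot be parsed" option could look roughly like this. It is a sketch in Python, not the tool's own code: `parse_lines` and `skip_bad` are made-up names, and it assumes one JSON document per input line.

```python
import json


def parse_lines(lines, skip_bad=True):
    """Parse one JSON document per line; return (records, skipped_count)."""
    records, skipped = [], 0
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError as err:
            if not skip_bad:
                raise ValueError(f"line {number}: {err}") from err
            skipped += 1  # count the bad line and carry on
    return records, skipped


sample = ['{"id": 1}', '{"id": 2', "", '{"id": 3}']
records, skipped = parse_lines(sample)
print(records, skipped)
```

With `skip_bad=False` the first malformed line raises instead, which is closer to the strict default one would want for clean input.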

Very cool!  Kind of the reverse of avrocat or avropipe.  We could clean
it up and add it as another C command-line tool if you like.

 It uses Jansson as the JSON parser which is conveniently bundled with
 Avro-C. (One thing that I'm not clear on is that Jansson cannot handle
 nulls; I'm not sure whether this is a Jansson-specific limitation or
 something inherent to JSON.)

Can you elaborate on this?  Jansson should support null JSON values
(it's the keyword null, not the string value null).  And the Avro C
bindings should use that for Avro null values.

cheers
–doug





Re: json2avro

2013-06-25 Thread Gregory (Grisha) Trubetskoy


On Tue, 25 Jun 2013, Douglas Creager wrote:


json2avro lets you pick from null, snappy, deflate and lzma codecs,
specify a custom block size, and optionally skip over JSON lines that it
is unable to parse. I'm also thinking of adding a target max file size
so that it would automatically split output into multiple files.


Very cool!  Kind of the reverse of avrocat or avropipe.  We could clean
it up and add it as another C command-line tool if you like.


Sure, I'm all for it!


It uses Jansson as the JSON parser which is conveniently bundled with
Avro-C. (One thing that I'm not clear on is that Jansson cannot handle
nulls; I'm not sure whether this is a Jansson-specific limitation or
something inherent to JSON.)


Can you elaborate on this?  Jansson should support null JSON values
(it's the keyword null, not the string value null).  And the Avro C
bindings should use that for Avro null values.


Sorry, I wasn't clear. Jansson uses null-terminated strings. The docs state:
"Normal null terminated C strings are used, so JSON strings may not
contain embedded null characters." I've tested it and indeed they cannot:
Jansson cannot parse a string like "abc\u0000def".


Grisha
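
On whether this is Jansson-specific or inherent to JSON: the JSON grammar does allow an escaped U+0000 inside a string, so the restriction comes from holding strings as NUL-terminated C strings. A small illustration in Python (chosen only for brevity; `as_c_string` is a made-up helper that mimics what a NUL-terminated reader would see, it is not a Jansson call):

```python
import json

# A JSON document containing one string: "abc", an escaped U+0000, then "def".
text = '"abc\\u0000def"'
decoded = json.loads(text)

# Python strings carry an explicit length, so the NUL survives decoding.
print(len(decoded))


def as_c_string(s):
    """Return what a NUL-terminated reader would see: text before the first NUL."""
    return s.split("\x00", 1)[0]


truncated = as_c_string(decoded)
print(repr(truncated))
```

A parser that stores the length alongside the bytes keeps all seven characters. One that relies on the terminator either has to reject such input or silently loses everything after the NUL.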


Re: json2avro

2013-06-25 Thread Douglas Creager
 Sorry, I wasn't clear. Jansson uses null-terminated strings. The docs
 state: "Normal null terminated C strings are used, so JSON strings may
 not contain embedded null characters." I've tested it and indeed they
 cannot: Jansson cannot parse a string like "abc\u0000def".

Oh yes, NUL bytes in a string, gotcha.  You're right, that's my least
favorite thing about Jansson.  There's an open issue with a patch [1],
but there hasn't been any movement on it in a year, unfortunately.

[1] https://github.com/akheron/jansson/pull/63






Re: Avro Schema to SQL

2013-06-25 Thread Mason

Might be worth looking at Sqoop's source.

On 6/19/13 02:31 AM, Avinash Dongre wrote:

Is there a known tool/framework available to convert an Avro schema into SQL?
If not, how do I iterate over the schema to find out what records and
enums are there? I can think of how to achieve this with a simple
schema, but I am not able to figure out a way for nested schemas.




Thanks
Avinash
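
For the nested-schema part of the question, one possible approach is a recursive walk over the schema's JSON form. The sketch below is Python with only the standard library. `walk` and the sample `Order` schema are made up for illustration, and mapping the results to SQL tables is left out. It descends through record fields, unions, arrays and maps, and collects every record and enum it meets.

```python
import json


def walk(schema, found=None):
    """Collect (kind, name) for every record and enum in an Avro schema."""
    if found is None:
        found = []
    if isinstance(schema, list):  # union: visit each branch
        for branch in schema:
            walk(branch, found)
    elif isinstance(schema, dict):
        kind = schema.get("type")
        if kind == "record":
            found.append(("record", schema["name"]))
            for field in schema["fields"]:
                walk(field["type"], found)
        elif kind == "enum":
            found.append(("enum", schema["name"]))
        elif kind == "array":
            walk(schema["items"], found)
        elif kind == "map":
            walk(schema["values"], found)
        else:
            walk(kind, found)  # e.g. {"type": "string"} or a nested type
    # Plain strings (primitives or references to named types) are leaves,
    # which also keeps recursive schemas from looping forever.
    return found


schema = json.loads("""
{"type": "record", "name": "Order", "fields": [
  {"name": "id", "type": "long"},
  {"name": "status", "type": {"type": "enum", "name": "Status",
                              "symbols": ["NEW", "PAID"]}},
  {"name": "lines", "type": {"type": "array", "items":
    {"type": "record", "name": "Line", "fields": [
      {"name": "sku", "type": "string"},
      {"name": "note", "type": ["null", "string"]}]}}}
]}
""")
found = walk(schema)
print(found)
```

Each nested record found this way is a candidate for its own table, with the enclosing record supplying the foreign key. That mapping is a design choice rather than anything Avro prescribes.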