Re: AvroStorage pig adapter

Scott Carey Fri, 14 May 2010 11:52:20 -0700

On May 14, 2010, at 12:17 AM, Thiruvalluvan M. G. wrote:

> 
> What you are looking for is the inner record called Element within the
> AvroTuple. I named the outer record AvroTuple because I wrote IDLs for
> Protocol Buffers and Thrift and wanted the class names to be unambiguous.
> 
> The tuple should actually be an array rather than a record. But since arrays
> cannot be named in Avro, I wrapped the array with a record. Please note
> wrapping objects by records in Avro does not cost anything in the binary
> format.


It doesn't cost anything in serialized size, but currently it costs a lot in
performance.

> I use the same technique to represent more than one type by a single
> Avro type. For instance Pig's string and Pig's BigCharArray are both
> represented by Avro string. I use a record to distinguish between them.
> 
> Does it solve your problem?

Yes, it helps a lot.  One question remains: how can I construct a recursive
schema programmatically?
I have a couple of options for the Pig Tuple Avro schema: write it in JSON and
put that in the source code, or construct it programmatically.
I'm currently constructing a schema programmatically, specific to the Pig
schema being serialized, which is straightforward until I hit the map type and
recursion.
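
To make the recursion question concrete, here is a minimal sketch of the pattern I would expect to need: create the named record as an empty shell, refer to that shell from the array and map types, and only then attach the record's fields. The Node class and its helpers are hypothetical stand-ins written for this illustration, not Avro's Schema API, and the field name "value" is an assumption. If I read the Java API correctly, Schema.createRecord(name, doc, namespace, isError) followed by setFields(...) would play the role of steps 1 and 3, but I have not verified that against the map case.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy schema model for illustration only; this is NOT Avro's Schema API. */
final class RecursiveSchemaSketch {

    /** Hypothetical schema node: a record, array, map, union, or primitive. */
    static final class Node {
        final String kind;                                  // "record", "array", "map", "union", "string"
        final String name;                                  // set for records only
        final List<String> fieldNames = new ArrayList<>();  // parallel to children, records only
        final List<Node> children = new ArrayList<>();      // field types, element type, or branches

        Node(String kind, String name) { this.kind = kind; this.name = name; }

        static Node of(String kind, Node... kids) {
            Node n = new Node(kind, null);
            for (Node k : kids) n.children.add(k);
            return n;
        }

        Node field(String fieldName, Node type) {
            fieldNames.add(fieldName);
            children.add(type);
            return this;
        }
    }

    /** Builds record Element { union { string, array<Element>, map<Element> } value; }. */
    static Node buildElement() {
        Node element = new Node("record", "Element");   // step 1: named shell, no fields yet
        Node value = Node.of("union",
                Node.of("string"),
                Node.of("array", element),              // step 2: refer back to the shell
                Node.of("map", element));
        element.field("value", value);                  // step 3: attach the fields, closing the cycle
        return element;
    }

    /** Renders Avro-style JSON; a record already emitted is referenced by name. */
    static String toJson(Node n, Set<String> seen) {
        switch (n.kind) {
            case "record": {
                if (!seen.add(n.name)) return "\"" + n.name + "\"";
                StringBuilder sb = new StringBuilder();
                sb.append("{\"type\":\"record\",\"name\":\"").append(n.name).append("\",\"fields\":[");
                for (int i = 0; i < n.children.size(); i++) {
                    if (i > 0) sb.append(',');
                    sb.append("{\"name\":\"").append(n.fieldNames.get(i)).append("\",\"type\":")
                      .append(toJson(n.children.get(i), seen)).append('}');
                }
                return sb.append("]}").toString();
            }
            case "array":
                return "{\"type\":\"array\",\"items\":" + toJson(n.children.get(0), seen) + "}";
            case "map":
                return "{\"type\":\"map\",\"values\":" + toJson(n.children.get(0), seen) + "}";
            case "union": {
                StringBuilder sb = new StringBuilder("[");
                for (int i = 0; i < n.children.size(); i++) {
                    if (i > 0) sb.append(',');
                    sb.append(toJson(n.children.get(i), seen));
                }
                return sb.append(']').toString();
            }
            default:
                return "\"" + n.kind + "\"";             // primitive type name
        }
    }

    public static void main(String[] args) {
        System.out.println(toJson(buildElement(), new HashSet<>()));
    }
}
```

The rendered JSON mentions Element by name inside the array and map definitions, which is also what the hand-written JSON option would have to do.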

Thanks,

-Scott
