On May 14, 2010, at 12:17 AM, Thiruvalluvan M. G. wrote:
>
> What you are looking for is the inner record called Element within the
> AvroTuple. I named the outer record AvroTuple because I wrote IDLs for
> Protocol Buffers and Thrift and wanted the class names to be unambiguous.
>
> The tuple should actually be an array rather than a record. But since arrays
> cannot be named in Avro, I wrapped the array with a record. Please note
> wrapping objects by records in Avro does not cost anything in the binary
> format.

It doesn't cost anything in serialized size, but it currently costs a lot in performance.

> I use the same technique to represent more than one type by a single
> Avro type. For instance Pig's string and Pig's BigCharArray are both
> represented by Avro string. I use a record to distinguish between them.
>
> Does it solve your problem?

Yes, it helps a lot. One question remains: how can I construct a recursive schema programmatically? I have a couple of options for the Pig Tuple Avro schema: write it in JSON and put that in the source code, or construct it programmatically. I'm currently constructing a schema programmatically, specific to the Pig schema being serialized, which is straightforward until I hit the map type and recursion.

Thanks,
-Scott
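
For illustration, here is a minimal sketch of how the recursion can be expressed when the schema is built as JSON. In an Avro JSON schema, a named record may be referenced by its name inside its own definition, so the recursive branches are just name strings. The record names AvroTuple and Element come from the thread; the field names `elements` and `value`, the helper `build_tuple_schema`, and the particular union branches are assumptions made for this example and are not the actual IDL discussed above.

```python
import json


def build_tuple_schema():
    """Build a recursive Avro schema as plain Python data (hypothetical layout)."""
    # Union of the values one tuple element may hold. "AvroTuple" and
    # "Element" appear here only as names; an Avro schema parser resolves
    # them to the enclosing record definitions, which gives the recursion.
    element_value = [
        "null",
        "int",
        "long",
        "string",
        {"type": "map", "values": "Element"},  # map values recurse by name
        "AvroTuple",                           # nested tuple, by name
    ]
    element = {
        "type": "record",
        "name": "Element",
        "fields": [{"name": "value", "type": element_value}],
    }
    # Arrays cannot be named, so the array is wrapped in a named record.
    return {
        "type": "record",
        "name": "AvroTuple",
        "fields": [
            {"name": "elements", "type": {"type": "array", "items": element}},
        ],
    }


# JSON text that could be handed to an Avro schema parser.
schema_json = json.dumps(build_tuple_schema(), indent=2)
```

The same idea should carry over to building the schema through the Java API: create the named record first, then attach fields that refer back to it (in Avro's Java API this is what `Schema.createRecord` followed by `setFields` allows).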
