How does Avro mark (string) field delimition?

2012-01-23 Thread Andrew Kenworthy
I have looked at the Avro 1.6.0 code and am not sure how Avro distinguishes between field boundaries when reading null values. The BinaryEncoder class (which is where I land when debugging my code) has an empty method for writeNull: how does the parser then distinguish between adjacent

Re: How does Avro mark (string) field delimition?

2012-01-23 Thread Andrew Kenworthy
I don't have a specific use case that is problematic, but was trying to understand how it all works internally. Following your comment about indexes I looked in GenericDatumWriter and sure enough the union is tagged so we know which part of the union was written: case UNION:         int index
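To illustrate the point about union tagging, here is a minimal, hand-rolled Python sketch of how a nullable string field (a union of "null" and "string") can be laid out in Avro's binary encoding: the branch index is written as a zigzag varint, null itself contributes zero bytes, and a string is a length followed by UTF-8 bytes. The function names are hypothetical and chosen for illustration; this is not the code in BinaryEncoder or GenericDatumWriter.

```python
# Illustrative sketch only; helper names are assumptions, not Avro's API.

def encode_long(n: int) -> bytes:
    """Zigzag plus variable-length encoding, as used for Avro int/long."""
    z = (n << 1) ^ (n >> 63)  # zigzag maps small magnitudes to small unsigned values
    out = bytearray()
    while z & ~0x7F:
        out.append((z & 0x7F) | 0x80)  # high bit set: more bytes follow
        z >>= 7
    out.append(z)
    return bytes(out)

def decode_long(buf: bytes, pos: int):
    """Return (value, new_pos) for a zigzag varint starting at pos."""
    z, shift = 0, 0
    while True:
        b = buf[pos]
        pos += 1
        z |= (b & 0x7F) << shift
        if not (b & 0x80):
            break
        shift += 7
    return (z >> 1) ^ -(z & 1), pos

def encode_nullable_string(value) -> bytes:
    """Encode a value of the union ["null", "string"]."""
    if value is None:
        return encode_long(0)          # branch 0 = null; null adds no bytes
    data = value.encode("utf-8")
    return encode_long(1) + encode_long(len(data)) + data  # branch 1 = string

def decode_nullable_string(buf: bytes, pos: int):
    """Return (value, new_pos); the branch index tells the reader what follows."""
    index, pos = decode_long(buf, pos)
    if index == 0:
        return None, pos
    length, pos = decode_long(buf, pos)
    return buf[pos:pos + length].decode("utf-8"), pos + length

# Adjacent nulls stay distinguishable because each field carries its own tag.
fields = [None, "ab", None, None, "c"]
encoded = b"".join(encode_nullable_string(v) for v in fields)

pos, decoded = 0, []
for _ in fields:
    value, pos = decode_nullable_string(encoded, pos)
    decoded.append(value)
```

In this sketch the boundary between two adjacent null fields is carried by the one-byte branch index of each field, not by any bytes written for null itself.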

using avro schemas to select columns (abusing versioning?)

2012-01-23 Thread Koert Kuipers
we are working on a very sparse table with say 500 columns where we do batch uploads that typically only contain a subset of the columns (say 100), and we run multiple map-reduce queries on subsets of the columns (typically less than 50 columns go into a single map-reduce job). my question is the

Re: using avro schemas to select columns (abusing versioning?)

2012-01-23 Thread Doug Cutting
On 01/23/2012 02:18 PM, Koert Kuipers wrote: is this considered abuse of avro's versioning capabilities? Not at all. Using a subset of the fields in Avro is called projection and can provide significant performance improvements. Doug
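As a rough illustration of why projection can be cheaper, the following self-contained Python sketch reads a record written with a full writer schema while materializing only a requested subset of fields. Unwanted strings are skipped by advancing past their length prefix without decoding them. The schema representation, field names, and function names are all hypothetical; real Avro does this through schema resolution between a writer schema and a reader schema, which this sketch does not reproduce.

```python
# Illustrative sketch only; schema layout and helper names are assumptions.

def encode_long(n: int) -> bytes:
    """Zigzag plus variable-length encoding, as used for Avro int/long."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z & ~0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)

def decode_long(buf: bytes, pos: int):
    """Return (value, new_pos) for a zigzag varint starting at pos."""
    z, shift = 0, 0
    while True:
        b = buf[pos]
        pos += 1
        z |= (b & 0x7F) << shift
        if not (b & 0x80):
            break
        shift += 7
    return (z >> 1) ^ -(z & 1), pos

# Hypothetical writer schema: ordered (field name, type) pairs.
WRITER_SCHEMA = [("id", "long"), ("name", "string"), ("score", "long")]

def write_record(record: dict, schema) -> bytes:
    """Write fields in schema order; no field names or delimiters are stored."""
    out = bytearray()
    for name, ftype in schema:
        if ftype == "long":
            out += encode_long(record[name])
        else:  # "string": length prefix, then UTF-8 bytes
            data = record[name].encode("utf-8")
            out += encode_long(len(data)) + data
    return bytes(out)

def read_projected(buf: bytes, writer_schema, wanted: set) -> dict:
    """Walk the writer's field order, keeping only fields named in wanted."""
    pos, result = 0, {}
    for name, ftype in writer_schema:
        if ftype == "long":
            value, pos = decode_long(buf, pos)  # varints must be scanned to find their end
            if name in wanted:
                result[name] = value
        else:
            length, pos = decode_long(buf, pos)
            if name in wanted:
                result[name] = buf[pos:pos + length].decode("utf-8")
            pos += length  # skipped strings are never decoded
    return result

buf = write_record({"id": 7, "name": "koert", "score": -3}, WRITER_SCHEMA)
subset = read_projected(buf, WRITER_SCHEMA, {"id", "score"})
```

The reader still needs the writer's schema to know the field order and types, since the data itself carries no field names; the saving comes from not decoding or allocating values for fields outside the projection.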

AVRO-981 - Removed snappy as requirement

2012-01-23 Thread Russell Jurney
https://issues.apache.org/jira/browse/AVRO-981 I took Joe Crobak's advice and removed snappy as a dependency in the python client for avro. With the patch in AVRO-981 applied, Avro installs, builds and functions on Mac OS X. -- Russell Jurney twitter.com/rjurney russell.jur...@gmail.com