How does Avro mark (string) field delimitation?

2012-01-23 Thread Andrew Kenworthy
I have looked at the Avro 1.6.0 code and am not sure how Avro marks 
field boundaries when reading null values.

The BinaryEncoder class (which is where I land when debugging my code) has an 
empty method for writeNull: how does the parser then distinguish between 
adjacent nullable fields when reading that data?

Thanks in advance,

Andrew

Re: How does Avro mark (string) field delimitation?

2012-01-23 Thread Andrew Kenworthy
I don't have a specific use case that is problematic; I was just trying to 
understand how it all works internally. Following your comment about indexes, I 
looked in GenericDatumWriter, and sure enough the union is tagged so we know 
which branch of the union was written:

case UNION:
        // Resolve which branch of the union this datum matches,
        // write that branch's index, then write the value itself.
        int index = data.resolveUnion(schema, datum);
        out.writeIndex(index);
        write(schema.getTypes().get(index), datum, out);
        break;
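
To see this on the wire, here is a minimal sketch (the record and field
names are my own invention) showing that a record with two adjacent null
fields encodes to exactly two bytes, one union-branch index per field,
while writeNull itself contributes nothing:

import java.io.ByteArrayOutputStream;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.EncoderFactory;

public class AdjacentNulls {
  public static void main(String[] args) throws Exception {
    // Two adjacent nullable (union) fields.
    Schema schema = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"R\",\"fields\":["
        + "{\"name\":\"a\",\"type\":[\"null\",\"string\"]},"
        + "{\"name\":\"b\",\"type\":[\"null\",\"string\"]}]}");

    GenericRecord r = new GenericData.Record(schema);
    r.put("a", null);
    r.put("b", null);

    ByteArrayOutputStream out = new ByteArrayOutputStream();
    BinaryEncoder enc = EncoderFactory.get().binaryEncoder(out, null);
    new GenericDatumWriter<GenericRecord>(schema).write(r, enc);
    enc.flush();

    // Each field contributes one byte: the zig-zag encoded branch index
    // (0 = the "null" branch). writeNull() adds nothing.
    System.out.println(out.toByteArray().length); // prints 2
  }
}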

That's the bit I was missing! Thanks for the input.

Andrew

 From: Harsh J ha...@cloudera.com
To: user@avro.apache.org; Andrew Kenworthy adwkenwor...@yahoo.com 
Sent: Monday, January 23, 2012 4:04 PM
Subject: Re: How does Avro mark (string) field delimitation?
 
The read part is empty as well when the decoder is asked to read a
'null' type. For null-carrying unions, I believe an index is written
out, so if the index resolves to the null branch, the same (empty) read
logic applies again.
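
As a quick decode-side sketch (the two input bytes below assume a record
of two adjacent ["null","string"] fields, both null):

import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.DecoderFactory;

public class ReadNulls {
  public static void main(String[] args) throws Exception {
    // A record of two nullable fields, both null: just two
    // branch-index bytes, nothing for the null values themselves.
    byte[] bytes = new byte[] {0x00, 0x00};
    BinaryDecoder in = DecoderFactory.get().binaryDecoder(bytes, null);
    System.out.println(in.readIndex()); // 0 -> "null" branch of field 1
    in.readNull();                      // consumes no bytes
    System.out.println(in.readIndex()); // 0 -> "null" branch of field 2
    in.readNull();
  }
}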

It therefore does not matter if there are two nulls adjacent to one
another. How do you imagine this ends up being a problem? What trouble
are you running into?

On Mon, Jan 23, 2012 at 8:08 PM, Andrew Kenworthy
adwkenwor...@yahoo.com wrote:
 I have looked at the Avro 1.6.0 code and am not sure how Avro marks
 field boundaries when reading null values.

 The BinaryEncoder class (which is where I land when debugging my code) has
 an empty method for writeNull: how does the parser then distinguish between
 adjacent nullable fields when reading that data?

 Thanks in advance,

 Andrew

-- 
Harsh J
Customer Ops. Engineer, Cloudera

using avro schemas to select columns (abusing versioning?)

2012-01-23 Thread Koert Kuipers
we are working on a very sparse table with, say, 500 columns, where we do
batch uploads that typically contain only a subset of the columns (say
100), and we run multiple map-reduce queries on subsets of the columns
(typically fewer than 50 columns go into a single map-reduce job).

my question is the following: if i use avro, do i ever actually need to
use the full schema of the table?

if i understand avro correctly, the batch uploads could simply add
avro files with a schema reflecting only the columns present in the file
(as opposed to first inserting many nulls into the data and then saving it
with the full schema).

the queries could likewise read with a schema reflecting only the columns
the query needs (as opposed to reading with the full 500-column schema and
then picking out the relevant columns).

as long as i provide defaults of null in the query schemas, i think this
would work! correct? is this considered abuse of avro's versioning
capabilities?
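
for concreteness, a minimal sketch of what i mean for the upload side
(the schema, field names, and file name here are just made up for
illustration):

import java.io.File;
import org.apache.avro.Schema;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

public class BatchUpload {
  public static void main(String[] args) throws Exception {
    // Only the columns present in this batch -- no padding the other
    // ~400 columns with nulls.
    Schema batchSchema = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"Row\",\"fields\":["
        + "{\"name\":\"user_id\",\"type\":[\"null\",\"long\"],\"default\":null},"
        + "{\"name\":\"clicks\",\"type\":[\"null\",\"int\"],\"default\":null}]}");

    DataFileWriter<GenericRecord> writer = new DataFileWriter<GenericRecord>(
        new GenericDatumWriter<GenericRecord>(batchSchema));
    writer.create(batchSchema, new File("upload.avro"));

    GenericRecord row = new GenericData.Record(batchSchema);
    row.put("user_id", 42L);
    row.put("clicks", 7);
    writer.append(row);
    writer.close();
  }
}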


Re: using avro schemas to select columns (abusing versioning?)

2012-01-23 Thread Doug Cutting
On 01/23/2012 02:18 PM, Koert Kuipers wrote:
 is this considered abuse of avro's versioning capabilities?

Not at all.  Using a subset of the fields in Avro is called projection
and can provide significant performance improvements.
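
A minimal sketch of projection with the generic API (the reader schema and
file name below are illustrative; the record name must resolve against the
writer's, and the null defaults cover columns absent from a given file):

import java.io.File;
import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;

public class Projection {
  public static void main(String[] args) throws Exception {
    // Reader schema listing only the columns this job needs; fields
    // missing from a given file resolve to their null defaults.
    Schema readerSchema = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"Row\",\"fields\":["
        + "{\"name\":\"user_id\",\"type\":[\"null\",\"long\"],\"default\":null},"
        + "{\"name\":\"revenue\",\"type\":[\"null\",\"double\"],\"default\":null}]}");

    GenericDatumReader<GenericRecord> datumReader =
        new GenericDatumReader<GenericRecord>();
    // The writer schema is read from the file header automatically.
    datumReader.setExpected(readerSchema);

    DataFileReader<GenericRecord> fileReader =
        new DataFileReader<GenericRecord>(new File("upload.avro"), datumReader);
    for (GenericRecord rec : fileReader) {
      System.out.println(rec.get("user_id") + "\t" + rec.get("revenue"));
    }
    fileReader.close();
  }
}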

Doug


AVRO-981 - Removed snappy as a requirement

2012-01-23 Thread Russell Jurney
https://issues.apache.org/jira/browse/AVRO-981

I took Joe Crobak's advice and removed snappy as a dependency in the Python
client for Avro. With the patch in AVRO-981 applied, Avro installs, builds,
and functions on Mac OS X.

-- 
Russell Jurney
twitter.com/rjurney
russell.jur...@gmail.com
datasyndrome.com