Avro schema and data read with it.

2014-12-17 Thread ๏̯͡๏
I have data that is persisted in Avro format. Each record has a certain
schema, and it contains 10 fields when it is persisted.

When I read the same record(s) from another process, I specify a schema
with a subset of the fields (5).

Will only 5 columns be read from disk?
or
Will all the columns be read, with 5 discarded later?
or
Are all the columns read, but only five are accessible because the schema used
to read contains only five columns?

Please suggest.

Regards,
Deepak


Re: Avro schema and data read with it.

2014-12-17 Thread Doug Cutting
Avro skips over fields that were present in the writer's schema but
are no longer present in the reader's schema.  Skipping is
substantially faster than reading for most types.  For types whose
size is fixed or length-prefixed, like string, bytes, fixed, double and
float, the file pointer can be incremented past skipped values.  For
skipped structures like records, maps and arrays, no memory is allocated
and no stores are made.  Avro data files are not in a columnar format,
however, so the I/O and decompression of skipped fields are not generally
avoided.

Doug
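
To illustrate the skipping described above, here is a minimal sketch in
plain Python. It is not the Avro library's implementation: it hand-rolls
just enough of Avro's binary encoding (zigzag varint longs, length-prefixed
strings, 8-byte little-endian doubles) to show the idea. The record layout
(id, name, comment, score) and all function names are hypothetical, chosen
only for this example.

```python
import io
import struct

# Hypothetical writer schema: field order as it appears in the encoded data.
WRITER_FIELDS = [("id", "long"), ("name", "string"),
                 ("comment", "string"), ("score", "double")]


def write_long(buf, n):
    """Encode n as an Avro long: zigzag, then base-128 varint."""
    n = (n << 1) ^ (n >> 63)
    while n & ~0x7F:
        buf.write(bytes([(n & 0x7F) | 0x80]))
        n >>= 7
    buf.write(bytes([n]))


def read_long(buf):
    """Decode a zigzag varint long."""
    shift, result = 0, 0
    while True:
        b = buf.read(1)[0]
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            break
        shift += 7
    return (result >> 1) ^ -(result & 1)


def read_string(buf):
    # Reading allocates a bytes object and decodes it.
    return buf.read(read_long(buf)).decode("utf-8")


def read_double(buf):
    return struct.unpack("<d", buf.read(8))[0]


def skip_string(buf):
    # Skipping reads only the length prefix, then moves the pointer.
    buf.seek(read_long(buf), io.SEEK_CUR)


def skip_double(buf):
    # Fixed width: move the pointer 8 bytes, nothing is decoded.
    buf.seek(8, io.SEEK_CUR)


READERS = {"long": read_long, "string": read_string, "double": read_double}
# A varint has no length prefix, so skipping a long still walks its bytes.
SKIPPERS = {"long": read_long, "string": skip_string, "double": skip_double}


def encode_record(buf, rec):
    """Write one record with all four writer-schema fields."""
    write_long(buf, rec["id"])
    for key in ("name", "comment"):
        data = rec[key].encode("utf-8")
        write_long(buf, len(data))
        buf.write(data)
    buf.write(struct.pack("<d", rec["score"]))


def read_record(buf, wanted):
    """Walk the writer's field order; decode wanted fields, skip the rest."""
    record = {}
    for name, typ in WRITER_FIELDS:
        if name in wanted:
            record[name] = READERS[typ](buf)
        else:
            SKIPPERS[typ](buf)
    return record


buf = io.BytesIO()
encode_record(buf, {"id": 1, "name": "alice", "comment": "first row", "score": 0.5})
encode_record(buf, {"id": 2, "name": "bob", "comment": "second row", "score": 1.5})
buf.seek(0)

rows = [read_record(buf, {"id", "score"}) for _ in range(2)]
print(rows)  # [{'id': 1, 'score': 0.5}, {'id': 2, 'score': 1.5}]
```

Note that the reader still has to pass over every byte of each record to
reach the fields it wants: after the loop, buf.tell() equals the full
length of the encoded data. That is the row-oriented behaviour described
above. The skipped strings are never materialized, but their bytes were
still loaded (and, in a real data file, decompressed) with the rest of the
block.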
