Steinar Knutsen wrote:
Avro supports skip information, but it
is somewhat inefficient to skip across a block of an array, a record or a
map, if any of these contain a variable length object. The headers only
contain the number of objects contained, not the length in bytes.

Arrays and maps can optionally encode their length in bytes. If the item count is negative, its absolute value is the actual count, and the count is immediately followed by the size of the block in bytes. Java's BlockingBinaryEncoder implements this. All decoders must implement it.
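As a minimal sketch of just the wire format (not the Avro library itself; the class and method names here are illustrative), a sized array block can be written and then skipped without decoding any item. Longs are zigzag varints, a negative count signals that a byte size follows, and a zero count terminates the array:

```java
import java.io.ByteArrayOutputStream;

public class BlockedArraySketch {
    // Avro writes longs zigzag-encoded, then as a little-endian base-128 varint.
    static void writeLong(ByteArrayOutputStream out, long n) {
        long z = (n << 1) ^ (n >> 63); // zigzag: small magnitudes stay small
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
    }

    // Reads one zigzag varint long from buf, advancing pos[0].
    static long readLong(byte[] buf, int[] pos) {
        long z = 0;
        int shift = 0, b;
        do {
            b = buf[pos[0]++] & 0xFF;
            z |= (long) (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return (z >>> 1) ^ -(z & 1); // undo zigzag
    }

    // Encodes a long[] as one sized array block: a negative count signals that
    // the byte size of the block's items follows, then come the items, then
    // the zero count that terminates the array.
    static byte[] encodeSizedBlock(long[] items) {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        for (long v : items) writeLong(body, v);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeLong(out, -items.length);       // negative: a size in bytes follows
        writeLong(out, body.size());         // lets a reader skip without decoding
        out.write(body.toByteArray(), 0, body.size());
        writeLong(out, 0);                   // end of array
        return out.toByteArray();
    }

    // Skips the whole array without decoding any item, using the byte sizes.
    static int skipArray(byte[] buf, int start) {
        int[] pos = {start};
        long count;
        while ((count = readLong(buf, pos)) != 0) {
            if (count < 0) {
                pos[0] += (int) readLong(buf, pos); // jump over the items
            } else {
                throw new IllegalStateException("unsized block: must decode items to skip");
            }
        }
        return pos[0];
    }
}
```

The point of the extra size field is visible in skipArray: with it, skipping a variable-length array is a single pointer bump instead of a decode of every element.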

http://hadoop.apache.org/avro/docs/current/api/java/org/apache/avro/io/BlockingBinaryEncoder.html

It does not emit a size for every array and map, only for those whose encoded size exceeds a threshold, so the overhead of adding the size is limited. It also splits arrays and maps that are too large to buffer as a whole into a sequence of blocks.
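The splitting behavior can be sketched the same way. This is an assumption-laden toy, not BlockingBinaryEncoder's actual buffering logic: it just closes a sized block whenever the next item would overflow a fixed buffer, which is the essence of emitting a large array as a sequence of blocks:

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

public class BlockSplitSketch {
    // Zigzag varint long, as in the Avro binary encoding.
    static void writeLong(ByteArrayOutputStream out, long n) {
        long z = (n << 1) ^ (n >> 63);
        while ((z & ~0x7FL) != 0) { out.write((int) ((z & 0x7F) | 0x80)); z >>>= 7; }
        out.write((int) z);
    }

    static long readLong(byte[] buf, int[] pos) {
        long z = 0;
        int shift = 0, b;
        do { b = buf[pos[0]++] & 0xFF; z |= (long) (b & 0x7F) << shift; shift += 7; }
        while ((b & 0x80) != 0);
        return (z >>> 1) ^ -(z & 1);
    }

    // Splits an array's items into sized blocks holding at most maxBlockBytes
    // of encoded item data each, mimicking an encoder that flushes whatever
    // currently fits in its buffer. maxBlockBytes is a stand-in for the
    // encoder's buffer size, not a real Avro parameter.
    static byte[] encodeSplit(long[] items, int maxBlockBytes) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < items.length) {
            ByteArrayOutputStream block = new ByteArrayOutputStream();
            int count = 0;
            while (i < items.length) {
                ByteArrayOutputStream one = new ByteArrayOutputStream();
                writeLong(one, items[i]);
                if (count > 0 && block.size() + one.size() > maxBlockBytes) break;
                block.write(one.toByteArray(), 0, one.size());
                count++;
                i++;
            }
            writeLong(out, -count);           // sized block
            writeLong(out, block.size());
            out.write(block.toByteArray(), 0, block.size());
        }
        writeLong(out, 0);                    // end of array
        return out.toByteArray();
    }

    // Walks the blocks, skipping items by byte size, and returns each count.
    static long[] blockCounts(byte[] buf) {
        List<Long> counts = new ArrayList<>();
        int[] pos = {0};
        long c;
        while ((c = readLong(buf, pos)) != 0) {
            counts.add(-c);
            pos[0] += (int) readLong(buf, pos);
        }
        long[] r = new long[counts.size()];
        for (int j = 0; j < r.length; j++) r[j] = counts.get(j);
        return r;
    }
}
```

A reader never needs the whole array in memory: each block announces its own item count and byte size, so it can be consumed or skipped independently.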

Doug
