[ 
https://issues.apache.org/jira/browse/AVRO-160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12795161#action_12795161
 ] 

Doug Cutting commented on AVRO-160:
-----------------------------------

Jeff> The most recent patch seems to write both the length of the block in 
number of entries as well as bytes.

Yes.  I've vacillated on that.  The existing code does not use the byte count, 
but I suspect when we add compression codecs the length will be useful.  If we 
wish to support a pluggable codec API, we could either make it stream-based or 
buffer-based.  If we have block lengths written, then the codec API can be a 
simple buffer-based API like 'byte[] compress(byte[]); byte[] 
decompress(byte[])'.  But if we don't have block lengths written, then the 
contract for codec plugins is more complex, so I'm leaning towards writing them.  This 
would make it really easy to add, e.g., a FastLZ codec (AVRO-135).
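
To make the buffer-based option concrete, here is a minimal sketch of a codec 
contract built around 'byte[] compress(byte[])' and 'byte[] decompress(byte[])', 
plus a block writer that records both the entry count and the byte length.  
The names Codec, DeflateCodec, writeBlock and readBlock are hypothetical and are 
not taken from the patch, and the fixed-width longs are an assumption made for 
brevity, not the encoding the file format uses.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

class BlockCodecSketch {

  /** Hypothetical buffer-based contract: a whole block in, a whole block out. */
  interface Codec {
    byte[] compress(byte[] data) throws IOException;
    byte[] decompress(byte[] data) throws IOException;
  }

  /** Example plugin backed by java.util.zip; any other codec would fit the same shape. */
  static class DeflateCodec implements Codec {
    public byte[] compress(byte[] data) {
      Deflater deflater = new Deflater();
      deflater.setInput(data);
      deflater.finish();
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      byte[] buf = new byte[4096];
      while (!deflater.finished()) {
        out.write(buf, 0, deflater.deflate(buf));
      }
      deflater.end();
      return out.toByteArray();
    }

    public byte[] decompress(byte[] data) throws IOException {
      Inflater inflater = new Inflater();
      inflater.setInput(data);
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      byte[] buf = new byte[4096];
      try {
        while (!inflater.finished()) {
          int n = inflater.inflate(buf);
          // No progress and nothing left to read means the block is damaged.
          if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
            throw new IOException("truncated or corrupt block");
          }
          out.write(buf, 0, n);
        }
      } catch (DataFormatException e) {
        throw new IOException(e);
      } finally {
        inflater.end();
      }
      return out.toByteArray();
    }
  }

  /** Writes entry count, then compressed byte length, then the compressed bytes. */
  static void writeBlock(DataOutputStream out, long entryCount, byte[] raw, Codec codec)
      throws IOException {
    byte[] payload = codec.compress(raw);
    out.writeLong(entryCount);      // assumption: fixed-width long for simplicity
    out.writeLong(payload.length);  // the byte count under discussion
    out.write(payload);
  }

  /** Reads one block; the stored byte length tells us exactly how much to buffer. */
  static byte[] readBlock(DataInputStream in, Codec codec) throws IOException {
    in.readLong();                         // entry count, unused in this sketch
    byte[] payload = new byte[(int) in.readLong()];
    in.readFully(payload);
    return codec.decompress(payload);
  }

  public static void main(String[] args) throws IOException {
    Codec codec = new DeflateCodec();
    byte[] raw = "entry entry entry".getBytes("UTF-8");
    ByteArrayOutputStream file = new ByteArrayOutputStream();
    writeBlock(new DataOutputStream(file), 3L, raw, codec);
    byte[] back = readBlock(
        new DataInputStream(new ByteArrayInputStream(file.toByteArray())), codec);
    System.out.println("round trip ok: " + Arrays.equals(raw, back));
  }
}
```

Because the reader knows the compressed length up front, the codec never has 
to find the end of its own data in the stream, which is what keeps the plugin 
contract this small.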

> file format should be friendly to streaming
> -------------------------------------------
>
>                 Key: AVRO-160
>                 URL: https://issues.apache.org/jira/browse/AVRO-160
>             Project: Avro
>          Issue Type: Improvement
>          Components: spec
>            Reporter: Doug Cutting
>            Assignee: Doug Cutting
>         Attachments: AVRO-160-python.patch, AVRO-160.patch, AVRO-160.patch, 
> AVRO-160.patch
>
>
> It should be possible to stream through an Avro data file without seeking to 
> the end.
> Currently the interpretation is that schemas written to the file apply to all 
> entries before them.  If this were changed so that they instead apply to all 
> entries that follow, and the initial schema is written at the start of the 
> file, then streaming could be supported.
> Note that the only change permitted to a schema as a file is written is, 
> if it is a union, to add new branches at the end of that union.  If it is not 
> a union, no changes may be made.  So it is still the case that the final 
> schema in a file can read every entry in the file and thus may be used to 
> randomly access the file.
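
The schema-change rule in the description can be sketched as a small check.  
This is only an illustration under simplifying assumptions: a schema is 
modelled as a list of branch names (a non-union has exactly one), and the 
class and method names are hypothetical, not part of Avro.

```java
import java.util.Arrays;
import java.util.List;

class SchemaAppendCheck {

  /**
   * Returns true if 'after' is a permitted rewrite of 'before':
   * a union may only gain branches at its end; a non-union may not change.
   */
  static boolean isPermittedChange(boolean isUnion, List<String> before, List<String> after) {
    if (!isUnion) {
      return before.equals(after);
    }
    // The old branches must survive, in order, as a prefix of the new union.
    return after.size() >= before.size()
        && after.subList(0, before.size()).equals(before);
  }

  public static void main(String[] args) {
    List<String> v1 = Arrays.asList("null", "string");
    List<String> v2 = Arrays.asList("null", "string", "long");
    System.out.println(isPermittedChange(true, v1, v2));
    System.out.println(isPermittedChange(true, v2, v1));
  }
}
```

Since existing branches keep their positions, data written under an earlier 
union still resolves against the final one, which is why the final schema 
can read every entry.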

