[ 
https://issues.apache.org/jira/browse/AVRO-160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12795529#action_12795529
 ] 

Andrew Purtell commented on AVRO-160:
-------------------------------------

Some quick comments from over on HBASE-2055:

- I see that SYNC_INTERVAL is a constant. Should it be configurable? We want 
64k; others might want something different.

- Looking at the most recent patch (2009-12-30 10:35 PM), DataFileWriter will 
hold up to SYNC_INTERVAL bytes in a buffer before writing out the block via 
writeBlock(). We want to hsync after each group of related commits in our 
write-ahead log, whether or not SYNC_INTERVAL has been reached, while still 
having the stream marked with a sync marker at each SYNC_INTERVAL. Some kind of 
flush method that forces writeBlock() would work.

- What happens if the first block is not available but others are? It makes 
sense to me not to support changing the schema mid-file, but does it make sense 
to put the schema in multiple places, like super blocks in ext3?
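
To make the first two points concrete, here is a rough sketch of the behaviour 
being asked for. It is not the Avro API: BlockWriter, its method names, and the 
fixed-width block framing are all hypothetical and simplified for illustration. 
It only shows a sync interval passed in as a parameter and a flush() that forces 
the buffered block out early, with a sync marker still ending every block.

```python
import io
import os


class BlockWriter:
    """Hypothetical stand-in for DataFileWriter; not the real Avro class."""

    def __init__(self, out, sync_interval=64 * 1024, sync_marker=None):
        self.out = out
        # Configurable per writer instead of a compile-time constant.
        self.sync_interval = sync_interval
        self.sync_marker = sync_marker or os.urandom(16)
        self.buffer = bytearray()
        self.count = 0

    def append(self, record: bytes):
        # Simplified framing (assumption): 4-byte length prefix per record.
        self.buffer += len(record).to_bytes(4, "big") + record
        self.count += 1
        if len(self.buffer) >= self.sync_interval:
            self.write_block()

    def write_block(self):
        if self.count == 0:
            return  # nothing buffered, so no empty block is emitted
        self.out.write(self.count.to_bytes(4, "big"))
        self.out.write(len(self.buffer).to_bytes(4, "big"))
        self.out.write(self.buffer)
        self.out.write(self.sync_marker)  # every block ends with the marker
        self.buffer.clear()
        self.count = 0

    def flush(self):
        # Force the partial block out even if sync_interval was not reached.
        self.write_block()
        self.out.flush()
        # A write-ahead log would follow this with an hsync-style call on
        # the underlying stream; that step is outside this sketch.


out = io.BytesIO()
writer = BlockWriter(out, sync_interval=64 * 1024)
writer.append(b"commit-1")
writer.append(b"commit-2")
writer.flush()  # the group of related commits is now in the stream
```

With something like this, a caller flushes after each commit group, and blocks 
that fill up in between are still written automatically at the interval.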


> file format should be friendly to streaming
> -------------------------------------------
>
>                 Key: AVRO-160
>                 URL: https://issues.apache.org/jira/browse/AVRO-160
>             Project: Avro
>          Issue Type: Improvement
>          Components: spec
>            Reporter: Doug Cutting
>            Assignee: Doug Cutting
>         Attachments: AVRO-160-python.patch, AVRO-160.patch, AVRO-160.patch, 
> AVRO-160.patch, AVRO-160.patch
>
>
> It should be possible to stream through an Avro data file without seeking to 
> the end.
> Currently the interpretation is that schemas written to the file apply to all 
> entries before them.  If this were changed so that they instead apply to all 
> entries that follow, and the initial schema is written at the start of the 
> file, then streaming could be supported.
> Note that the only change permitted to a schema as a file is written is, if 
> it is a union, to add new branches at the end of that union.  If it is not 
> a union, no changes may be made.  So it is still the case that the final 
> schema in a file can read every entry in the file and thus may be used to 
> randomly access the file.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
