[ https://issues.apache.org/jira/browse/AVRO-160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12795529#action_12795529 ]
Andrew Purtell commented on AVRO-160:
-------------------------------------

Some quick comments from over on HBASE-2055:

- I see that SYNC_INTERVAL is a constant. Should it be configurable? We want 64k; others might want something different.

- Looking at the most recent patch (2009-12-30 10:35 PM), DataFileWriter will hold up to SYNC_INTERVAL bytes in a buffer before writing out the block via writeBlock(). We want to hsync after a group of related commits in our write-ahead log whether or not SYNC_INTERVAL has been reached, but we also want the stream marked with a sync marker at each SYNC_INTERVAL. Some kind of flush method that forces writeBlock() would work.

- What happens if the first block is not available but others are? It makes sense to me not to support changing the schema mid-file, but does it make sense to put the schema in multiple places, like the redundant superblock copies in ext3?

> file format should be friendly to streaming
> -------------------------------------------
>
>                 Key: AVRO-160
>                 URL: https://issues.apache.org/jira/browse/AVRO-160
>             Project: Avro
>          Issue Type: Improvement
>          Components: spec
>            Reporter: Doug Cutting
>            Assignee: Doug Cutting
>         Attachments: AVRO-160-python.patch, AVRO-160.patch, AVRO-160.patch,
> AVRO-160.patch, AVRO-160.patch
>
>
> It should be possible to stream through an Avro data file without seeking to
> the end.
> Currently the interpretation is that schemas written to the file apply to all
> entries before them. If this were changed so that they instead apply to all
> entries that follow, and the initial schema is written at the start of the
> file, then streaming could be supported.
> Note that the only change permitted to a schema as a file is written is, if
> it is a union, to add new branches at the end of that union. If it is not
> a union, no changes may be made. So it is still the case that the final
> schema in a file can read every entry in the file and thus may be used to
> randomly access the file.
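The flush behavior requested in the comment, combined with the streaming layout described in the issue (schema written once at the start, blocks read front to back), can be sketched as below. This is a minimal illustration under stated assumptions: the container layout (magic bytes `SKF1`, big-endian length prefixes, a random 16-byte sync marker after the header and after every block) is hypothetical and is not Avro's actual data file format, and `BlockWriter`, `sync_interval`, `flush()` and `read_stream` are illustrative names, not Avro APIs.

```python
import io
import os
import struct

# Hypothetical container layout for illustration only; this is NOT Avro's
# actual data file format, and none of these names are Avro APIs.
MAGIC = b"SKF1"
MARKER_SIZE = 16


class BlockWriter:
    """Buffers records and writes them as blocks, each followed by a sync marker."""

    def __init__(self, out, schema: bytes, sync_interval: int = 64 * 1024):
        self.out = out
        self.sync_interval = sync_interval  # configurable, not a constant
        self.sync_marker = os.urandom(MARKER_SIZE)  # random marker per file
        self.buffer = []
        self.buffered_bytes = 0
        # The schema is written once, at the start, so readers can stream.
        out.write(MAGIC)
        out.write(struct.pack(">I", len(schema)))
        out.write(schema)
        out.write(self.sync_marker)

    def append(self, record: bytes) -> None:
        self.buffer.append(record)
        self.buffered_bytes += len(record)
        # Emit a block once at least sync_interval bytes are buffered.
        if self.buffered_bytes >= self.sync_interval:
            self._write_block()

    def flush(self) -> None:
        # Force out a (possibly short) block, e.g. before a write-ahead-log hsync.
        self._write_block()
        self.out.flush()

    def _write_block(self) -> None:
        if not self.buffer:
            return
        payload = b"".join(struct.pack(">I", len(r)) + r for r in self.buffer)
        # Block header: record count and payload size in bytes.
        self.out.write(struct.pack(">II", len(self.buffer), len(payload)))
        self.out.write(payload)
        self.out.write(self.sync_marker)
        self.buffer = []
        self.buffered_bytes = 0


def read_stream(inp):
    """Reads the schema and all records front to back, never seeking to the end."""
    if inp.read(len(MAGIC)) != MAGIC:
        raise ValueError("not a sketch container file")
    (schema_len,) = struct.unpack(">I", inp.read(4))
    schema = inp.read(schema_len)
    sync_marker = inp.read(MARKER_SIZE)
    records = []
    while True:
        header = inp.read(8)
        if not header:  # clean end of stream
            break
        count, size = struct.unpack(">II", header)
        payload = io.BytesIO(inp.read(size))
        for _ in range(count):
            (n,) = struct.unpack(">I", payload.read(4))
            records.append(payload.read(n))
        if inp.read(MARKER_SIZE) != sync_marker:
            raise ValueError("sync marker mismatch")
    return schema, records


if __name__ == "__main__":
    buf = io.BytesIO()
    writer = BlockWriter(buf, b'"bytes"', sync_interval=10)
    for rec in [b"abc", b"defgh", b"ij", b"k"]:
        writer.append(rec)
    writer.flush()  # b"k" is written even though sync_interval was not reached
    buf.seek(0)
    print(read_stream(buf))
```

Because flush() may emit a short block, sync markers in this sketch are not spaced exactly sync_interval bytes apart; a reader locates block boundaries from the block headers and markers rather than from fixed offsets, which is what lets a log writer sync at commit boundaries without giving up markers.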