[ https://issues.apache.org/jira/browse/AVRO-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12806072#action_12806072 ]
Scott Carey commented on AVRO-380:
----------------------------------

bq. I would prefer we use longs instead of ints for both the count and the block size to better future-proof the format. 2GB isn't as big as it used to be and it's only getting smaller. I don't think there are significant performance benefits to using int over long here.

Sure. I had assumed that we would never support blocks larger than 2GB, because the file format is not designed for large blocks (the size and count have to be known in advance for append-only writes). But it is trivial to make it use a long instead just in case, and as you say, it's not a performance concern.
The code will break, however, if the size ever gets larger than an int, and it is not possible for the count to get larger than the size. Is it worth unit testing those corner conditions? I don't think it is. I'll add javadoc that makes it clear that very large blocks are not recommended.

> Avro Container File format change: add block size to block descriptor
> ----------------------------------------------------------------------
>
>                 Key: AVRO-380
>                 URL: https://issues.apache.org/jira/browse/AVRO-380
>             Project: Avro
>          Issue Type: Improvement
>          Components: doc, java, spec
>    Affects Versions: 1.3.0
>            Reporter: Scott Carey
>             Fix For: 1.3.0
>
>         Attachments: AVRO-380.patch
>
>
> The new file format in AVRO-160 limits a few use cases that I have found to be important.
> A block currently contains a count of the number of records, the block data, and a sync marker.
> This change would add the block size, in bytes, alongside the number of records.
> This allows efficient access to a block's data without the need to decode the data into individual Datums, which is useful for various use cases.

--
This message is automatically generated by JIRA.
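To make the proposed block layout concrete, here is a minimal Java sketch that writes and reads one block as record count, block size in bytes, raw serialized data, and sync marker, with both count and size stored as longs as suggested in the quoted comment. It assumes Avro's zig-zag varint encoding for longs and a 16-byte sync marker; the class and method names (`BlockDescriptorSketch`, `writeBlock`, `readRawBlock`, `RawBlock`) are hypothetical and are not Avro's `DataFileWriter`/`DataFileReader` API. Because a Java `byte[]` is int-indexed, the reader rejects sizes above `Integer.MAX_VALUE`, which is the "size larger than an int" corner case mentioned above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Arrays;

/** Hypothetical sketch of the proposed block descriptor: count, size, data, sync. */
final class BlockDescriptorSketch {
  // Assumption: a 16-byte sync marker, as used by Avro container files.
  static final int SYNC_SIZE = 16;

  /** A block's record count plus its still-serialized bytes. */
  static final class RawBlock {
    final long count;
    final byte[] data;

    RawBlock(long count, byte[] data) {
      this.count = count;
      this.data = data;
    }
  }

  // Zig-zag maps signed to unsigned, then base-128 varint, low 7-bit group first.
  static void writeLong(OutputStream out, long n) throws IOException {
    long z = (n << 1) ^ (n >> 63);
    while ((z & ~0x7FL) != 0) {
      out.write((int) ((z & 0x7F) | 0x80));
      z >>>= 7;
    }
    out.write((int) z);
  }

  // Inverse of writeLong: read varint groups, then undo the zig-zag mapping.
  static long readLong(InputStream in) throws IOException {
    long z = 0;
    int shift = 0;
    int b;
    do {
      if (shift > 63) {
        throw new IOException("malformed varint");
      }
      b = in.read();
      if (b < 0) {
        throw new IOException("unexpected end of stream");
      }
      z |= (long) (b & 0x7F) << shift;
      shift += 7;
    } while ((b & 0x80) != 0);
    return (z >>> 1) ^ -(z & 1);
  }

  // Writes count and size (both longs) ahead of the data, then the sync marker.
  static void writeBlock(OutputStream out, long count, byte[] data, byte[] sync)
      throws IOException {
    writeLong(out, count);
    writeLong(out, data.length);
    out.write(data);
    out.write(sync);
  }

  // Reads one block's bytes without decoding them into individual datums.
  static RawBlock readRawBlock(InputStream in, byte[] expectedSync) throws IOException {
    long count = readLong(in);
    long size = readLong(in);
    // The size is a long on disk, but a Java byte[] cannot exceed Integer.MAX_VALUE.
    if (size < 0 || size > Integer.MAX_VALUE) {
      throw new IOException("unsupported block size: " + size);
    }
    byte[] data = readFully(in, (int) size);
    byte[] sync = readFully(in, SYNC_SIZE);
    if (!Arrays.equals(sync, expectedSync)) {
      throw new IOException("sync marker mismatch");
    }
    return new RawBlock(count, data);
  }

  // Loops because InputStream.read may return fewer bytes than requested.
  static byte[] readFully(InputStream in, int n) throws IOException {
    byte[] buf = new byte[n];
    int off = 0;
    while (off < n) {
      int r = in.read(buf, off, n - off);
      if (r < 0) {
        throw new IOException("unexpected end of stream");
      }
      off += r;
    }
    return buf;
  }
}
```

Note that `readRawBlock` never interprets the block's contents: the stored size is what lets a reader copy or skip a whole block in one step, which is the use case the issue describes.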