[ 
https://issues.apache.org/jira/browse/AVRO-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12806072#action_12806072
 ] 

Scott Carey commented on AVRO-380:
----------------------------------

bq. I would prefer we use longs instead of ints for both the count and the 
block size to better future-proof the format. 2GB isn't as big as it used to be 
and it's only getting smaller. I don't think there are significant performance 
benefits to using int over long here. 

Sure. I had assumed that we would never support blocks larger than 2GB because 
the file format is not designed for large blocks (since the size and count have 
to be known in advance for append-only writes).  But it is trivial to make it 
use a long instead just in case, and as you say it's not a performance concern.  

The code will break, however, if the size ever gets larger than an int, and it 
is not possible for the count to get larger than the size.  Is it worth unit 
testing those corner conditions?  I don't think it is.  I'll add javadoc that 
makes it clear that very large blocks are not recommended.
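
As an illustration of the int limit mentioned above, here is a minimal sketch 
of a guard that narrows a block size stored as a long to an int buffer length. 
The class and method names (BlockSizeCheck, toBufferLength) are hypothetical 
and are not taken from the AVRO-380 patch; the sketch only assumes that a 
reader buffers a whole block in a Java byte array, which cannot hold more than 
Integer.MAX_VALUE bytes.

```java
// Hypothetical helper, not part of the Avro code base.
final class BlockSizeCheck {

    /**
     * Narrows a block size read from the file as a long to an int that can be
     * used as a byte[] length. Rejects values a Java array cannot represent.
     */
    static int toBufferLength(long blockSize) {
        if (blockSize < 0 || blockSize > Integer.MAX_VALUE) {
            throw new IllegalArgumentException(
                "Block size out of supported range: " + blockSize);
        }
        return (int) blockSize;
    }
}
```

With a guard like this, the on-disk field can stay a long while an oversized 
block fails with a clear error instead of a silent overflow.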


> Avro Container File format change:  add block size to block descriptor
> ----------------------------------------------------------------------
>
>                 Key: AVRO-380
>                 URL: https://issues.apache.org/jira/browse/AVRO-380
>             Project: Avro
>          Issue Type: Improvement
>          Components: doc, java, spec
>    Affects Versions: 1.3.0
>            Reporter: Scott Carey
>             Fix For: 1.3.0
>
>         Attachments: AVRO-380.patch
>
>
> The new file format in AVRO-160 limits a few use cases that I have found to 
> be important.
> A block currently contains a count of the number of records, the block data, 
> and a sync marker.  
> This change would add the block size, in bytes, alongside the number of 
> records.   
> This allows efficient access to a block's data without the need to decode the 
> data into individual Datums, which is useful for various use cases.  
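
To illustrate the description above, here is a rough sketch of how a reader 
could step over a whole block using the byte size, without decoding any 
records. It rests on several assumptions that are not stated in this message: 
that the block is laid out as record count, block size, data, sync marker; 
that both numbers are written as zig-zag variable-length longs; and that the 
sync marker is 16 bytes. The names BlockSkipSketch, readLong and skipBlock are 
hypothetical and are not the API of the attached patch.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch, not the AVRO-380 patch.
final class BlockSkipSketch {

    // Assumption: the sync marker that ends each block is 16 bytes long.
    static final int SYNC_SIZE = 16;

    /** Reads one zig-zag, variable-length encoded long from the stream. */
    static long readLong(InputStream in) throws IOException {
        long raw = 0;
        int shift = 0;
        while (true) {
            int b = in.read();
            if (b < 0) {
                throw new EOFException("Stream ended inside a varint");
            }
            raw |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                break; // high bit clear: last byte of this number
            }
            shift += 7;
            if (shift > 63) {
                throw new IOException("Varint is too long for a 64-bit value");
            }
        }
        // Undo the zig-zag mapping to recover the signed value.
        return (raw >>> 1) ^ -(raw & 1);
    }

    /**
     * Skips one block (data plus sync marker) and returns its record count.
     * Assumed layout: count, size in bytes, data, sync marker.
     */
    static long skipBlock(InputStream in) throws IOException {
        long count = readLong(in);
        long size = readLong(in);
        if (size < 0) {
            throw new IOException("Negative block size: " + size);
        }
        long remaining = size + SYNC_SIZE;
        while (remaining > 0) {
            // skip() may skip fewer bytes than requested, so loop.
            long skipped = in.skip(remaining);
            if (skipped <= 0) {
                // Fall back to a single read to tell "no progress" from EOF.
                if (in.read() < 0) {
                    throw new EOFException("Stream ended inside a block");
                }
                skipped = 1;
            }
            remaining -= skipped;
        }
        return count;
    }
}
```

Calling skipBlock repeatedly would let a tool count records or copy raw blocks 
without a schema-aware decoder, which is the kind of use case the description 
refers to.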

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
