[ 
https://issues.apache.org/jira/browse/AVRO-806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13448281#comment-13448281
 ] 

Jakob Homan commented on AVRO-806:
----------------------------------

Yes, that's all reasonable.  My concern is just enforcing a 1:1:1 relationship 
between row groups, blocks and files.  RCFile's very small recommended row group
size (4 MB, I believe) certainly doesn't make sense from an IO perspective.  But
if our only ability to increase parallelism on trevni files is to decrease the 
size of row groups (and correspondingly increase the number of files), this may 
be a problem.  Enforcing a 1:1:1 relationship in the file isn't required;
one could still have row groups large enough to make the IO worthwhile (and still
split on block boundaries), but have several of them within a single trevni
file.  This could certainly be supported as an option.
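
To make the trade-off concrete, here is a rough back-of-the-envelope sketch.
It is not trevni code: the function names, the 64 GB dataset, the 64 MB row
group size and the 16 row groups per file are all made up for illustration,
and it assumes each block-aligned row group can be processed as one split.

```python
import math

MB = 1024 * 1024


def layout_one_to_one(total_bytes, row_group_bytes):
    """Hypothetical 1:1:1 layout: one row group per block per file."""
    files = math.ceil(total_bytes / row_group_bytes)
    # Parallelism (splits) is tied to the file count.
    return {"files": files, "row_groups": files, "splits": files}


def layout_multi_group(total_bytes, row_group_bytes, groups_per_file):
    """Hypothetical layout with several block-aligned row groups per file."""
    row_groups = math.ceil(total_bytes / row_group_bytes)
    files = math.ceil(row_groups / groups_per_file)
    # Assumption: each row group is an independent split.
    return {"files": files, "row_groups": row_groups, "splits": row_groups}


if __name__ == "__main__":
    total = 64 * 1024 * MB  # illustrative 64 GB dataset
    print(layout_one_to_one(total, 64 * MB))
    print(layout_one_to_one(total, 32 * MB))  # more splits, but more files
    print(layout_multi_group(total, 64 * MB, groups_per_file=16))
```

Under these assumptions, the 1:1:1 layout can only gain parallelism by
shrinking row groups and multiplying files, while the multi-group layout keeps
the same number of splits with far fewer files.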

Either way, this is looking good.  
                
> add a column-major codec for data files
> ---------------------------------------
>
>                 Key: AVRO-806
>                 URL: https://issues.apache.org/jira/browse/AVRO-806
>             Project: Avro
>          Issue Type: New Feature
>          Components: java, spec
>            Reporter: Doug Cutting
>            Assignee: Doug Cutting
>         Attachments: AVRO-806.patch, AVRO-806-v2.patch, avro-file-columnar.pdf
>
>
> Define a codec that, when a data file's schema is a record schema, writes 
> blocks within the file in column-major order.  This would permit better 
> compression and also permit efficient skipping of fields that are not of 
> interest.

