[ https://issues.apache.org/jira/browse/AVRO-806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13448281#comment-13448281 ]
Jakob Homan commented on AVRO-806:
----------------------------------

Yes, that's all reasonable. My concern is just with enforcing a 1:1:1 relationship between row groups, blocks and files. RCFile's tiny recommended row group size (4 MB, I believe) certainly doesn't make sense from an IO perspective. But if our only way to increase parallelism on Trevni files is to decrease the size of row groups (and correspondingly increase the number of files), this may be a problem.

It's not required to enforce a 1:1:1 relationship in the file; one could still have row groups large enough to be worth the IO (and still split on block boundaries), but have several of them within a single Trevni file. This could certainly be supported as an option.

Either way, this is looking good.

> add a column-major codec for data files
> ---------------------------------------
>
>                 Key: AVRO-806
>                 URL: https://issues.apache.org/jira/browse/AVRO-806
>             Project: Avro
>          Issue Type: New Feature
>          Components: java, spec
>            Reporter: Doug Cutting
>            Assignee: Doug Cutting
>         Attachments: AVRO-806.patch, AVRO-806-v2.patch, avro-file-columnar.pdf
>
>
> Define a codec that, when a data file's schema is a record schema, writes
> blocks within the file in column-major order. This would permit better
> compression and also permit efficient skipping of fields that are not of
> interest.
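To make the parallelism concern concrete, here is a minimal sketch of split planning when a single file may hold several row groups. It assumes splits may only start and end on row-group boundaries and that each file is described by a list of row-group sizes in bytes. `Split`, `plan_splits`, the file names and the target split size are all hypothetical illustrations, not part of the Avro or Trevni APIs.

```python
from dataclasses import dataclass
from typing import Dict, List

MiB = 1 << 20


@dataclass
class Split:
    file: str          # hypothetical file name
    start_group: int   # index of the first row group in this split
    end_group: int     # exclusive index of the last row group
    size_bytes: int    # total bytes covered by the split


def plan_splits(files: Dict[str, List[int]], target_split_bytes: int) -> List[Split]:
    """Hypothetical planner: pack consecutive row groups of each file into
    splits of roughly target_split_bytes, never cutting a row group in half."""
    splits: List[Split] = []
    for name, group_sizes in files.items():
        start, acc = 0, 0
        for i, size in enumerate(group_sizes):
            acc += size
            if acc >= target_split_bytes:
                # Close the split at this row-group boundary.
                splits.append(Split(name, start, i + 1, acc))
                start, acc = i + 1, 0
        if start < len(group_sizes):
            # Leftover row groups at the end of the file form a final split.
            splits.append(Split(name, start, len(group_sizes), acc))
    return splits


# 1:1:1 layout: one 1 GiB row group per file gives a single split.
print(len(plan_splits({"part-0.trv": [1024 * MiB]}, 128 * MiB)))
# Same data as eight 128 MiB row groups in one file gives eight splits.
print(len(plan_splits({"part-0.trv": [128 * MiB] * 8}, 128 * MiB)))
```

Under these assumptions, keeping row groups large but allowing several per file raises the split count without raising the file count, which is the option suggested in the comment.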