[ https://issues.apache.org/jira/browse/AVRO-806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449730#comment-13449730 ]

Doug Cutting commented on AVRO-806:
-----------------------------------

Also, Jakob, I may have overstated things a bit above.  With a Trevni file one 
can have a higher degree of parallelism than one processor per file.  A Trevni 
file can be efficiently split into row ranges, so a file with 1M rows could be 
processed as 10 tasks, each processing 100k rows.  Values are chunked into 
~64KB compressed blocks, and only the blocks that overlap a task's row range 
need to be decompressed and processed.
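
For concreteness, here is a minimal sketch of the overlap test a task could 
apply to decide which compressed value blocks to read.  This is not Trevni's 
actual API; BlockRange and blocksForTask are made up for illustration, 
standing in for the per-block row counts the file format records:

{code}
import java.util.ArrayList;
import java.util.List;

// A block of values covering a contiguous range of rows.
class BlockRange {
    final long firstRow;   // first row stored in this block
    final long rowCount;   // number of rows stored in this block

    BlockRange(long firstRow, long rowCount) {
        this.firstRow = firstRow;
        this.rowCount = rowCount;
    }
}

public class RowRangeSplit {
    // Return the indexes of the blocks overlapping [taskStart, taskEnd).
    // Only these blocks need to be decompressed; all others are skipped.
    static List<Integer> blocksForTask(List<BlockRange> blocks,
                                       long taskStart, long taskEnd) {
        List<Integer> needed = new ArrayList<>();
        for (int i = 0; i < blocks.size(); i++) {
            BlockRange b = blocks.get(i);
            long blockEnd = b.firstRow + b.rowCount;
            if (b.firstRow < taskEnd && blockEnd > taskStart) {
                needed.add(i);
            }
        }
        return needed;
    }

    public static void main(String[] args) {
        // A 1M-row file whose blocks each happen to hold 25k rows.
        List<BlockRange> blocks = new ArrayList<>();
        for (long row = 0; row < 1_000_000; row += 25_000) {
            blocks.add(new BlockRange(row, 25_000));
        }
        // A task covering rows [300000, 400000) touches 4 of the 40 blocks.
        System.out.println(blocksForTask(blocks, 300_000, 400_000));
    }
}
{code}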
                
> add a column-major codec for data files
> ---------------------------------------
>
>                 Key: AVRO-806
>                 URL: https://issues.apache.org/jira/browse/AVRO-806
>             Project: Avro
>          Issue Type: New Feature
>          Components: java, spec
>            Reporter: Doug Cutting
>            Assignee: Doug Cutting
>         Attachments: AVRO-806.patch, AVRO-806.patch, AVRO-806-v2.patch, 
> avro-file-columnar.pdf
>
>
> Define a codec that, when a data file's schema is a record schema, writes 
> blocks within the file in column-major order.  This would permit better 
> compression and also permit efficient skipping of fields that are not of 
> interest.
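
A rough illustration of the layout difference described above (the User record 
and writer methods here are hypothetical, not Avro's API): in row-major order 
each record's fields are interleaved, while in column-major order all values 
of one field are contiguous, which compresses better and lets a reader skip 
columns it does not need:

{code}
import java.util.List;

public class ColumnMajorSketch {
    record User(long id, String name) {}  // stand-in for an Avro record schema

    // Row-major: each record's fields are interleaved in file order.
    static void writeRowMajor(List<User> block, StringBuilder out) {
        for (User u : block) {
            out.append(u.id()).append(',').append(u.name()).append('\n');
        }
    }

    // Column-major: all values of one field are written contiguously, so
    // similar values compress better, and a reader that only needs "id"
    // can skip past the "name" column without decoding it.
    static void writeColumnMajor(List<User> block, StringBuilder out) {
        for (User u : block) out.append(u.id()).append('\n');
        for (User u : block) out.append(u.name()).append('\n');
    }

    public static void main(String[] args) {
        List<User> block = List.of(new User(1, "ann"), new User(2, "bob"));
        StringBuilder row = new StringBuilder(), col = new StringBuilder();
        writeRowMajor(block, row);      // 1,ann / 2,bob
        writeColumnMajor(block, col);   // 1 / 2 / ann / bob
        System.out.print(row + "--\n" + col);
    }
}
{code}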

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
