[ 
https://issues.apache.org/jira/browse/AVRO-806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13267317#comment-13267317
 ] 

alex gemini commented on AVRO-806:
----------------------------------

1. The file header should reserve some extra space in case we later add a
column or append values to the end of a block.
2. Trevni sets a different codec for each column. I think there is a difference
between compression/decompression and encoding/decoding. For example,
run-length encoding achieves a high compression ratio if the values are sorted
first. We could change the order of the data within a minor block (say 64 KB)
while still guaranteeing that the whole block (say a 128 MB HDFS block) looks
the same as before, except for the changed data order. So using a different
codec for each column may not give better performance or a better compression
ratio than a single codec for all columns. I think different columns need
different encodings instead (for example run-length encoding, which we have
not implemented yet).
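
To illustrate the run-length point, here is a minimal Python sketch. It is not
Avro or Trevni code: the function names, the sample column, and the minor block
size of 4 values are all assumptions made for the example. It counts how many
runs a column needs when each minor block is encoded as-is versus sorted first.

```python
from itertools import groupby


def run_length_encode(values):
    """Collapse consecutive equal values into (value, count) pairs."""
    return [(value, sum(1 for _ in group)) for value, group in groupby(values)]


def run_length_decode(runs):
    """Expand (value, count) pairs back into a flat list."""
    return [value for value, count in runs for _ in range(count)]


def encode_in_minor_blocks(values, minor_block_size, sort_within_block):
    """Run-length encode each minor block, optionally sorting it first.

    minor_block_size is counted in values here (a hypothetical stand-in
    for a byte budget such as 64 KB).
    """
    blocks = []
    for start in range(0, len(values), minor_block_size):
        chunk = values[start:start + minor_block_size]
        if sort_within_block:
            chunk = sorted(chunk)
        blocks.append(run_length_encode(chunk))
    return blocks


# Hypothetical low-cardinality column, e.g. a country code field.
column = ["us", "de", "us", "fr", "de", "us", "fr", "de"]

unsorted_blocks = encode_in_minor_blocks(column, 4, sort_within_block=False)
sorted_blocks = encode_in_minor_blocks(column, 4, sort_within_block=True)

unsorted_runs = sum(len(block) for block in unsorted_blocks)
sorted_runs = sum(len(block) for block in sorted_blocks)
print(unsorted_runs, sorted_runs)
```

Sorting changes the order of values only inside each minor block, so every
block still holds the same set of values. In a real columnar file the same
permutation would have to be applied to every column in the minor block so
that the rows stay aligned; this sketch looks at a single column only.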

                
> add a column-major codec for data files
> ---------------------------------------
>
>                 Key: AVRO-806
>                 URL: https://issues.apache.org/jira/browse/AVRO-806
>             Project: Avro
>          Issue Type: New Feature
>          Components: java, spec
>            Reporter: Doug Cutting
>            Assignee: Doug Cutting
>             Fix For: 1.7.0
>
>         Attachments: AVRO-806-v2.patch, AVRO-806.patch, avro-file-columnar.pdf
>
>
> Define a codec that, when a data file's schema is a record schema, writes 
> blocks within the file in column-major order.  This would permit better 
> compression and also permit efficient skipping of fields that are not of 
> interest.


        
