[ https://issues.apache.org/jira/browse/AVRO-806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13267465#comment-13267465 ]
alex gemini commented on AVRO-806:
----------------------------------

Compression always treats everything as raw bytes; encoding and decoding apply only to certain patterns. The encoder and decoder should be separate. For example, with dictionary encoding (for an email column or an address column), we would decode only when we need the exact value. If we just need to count the total number of rows, the application can tell the file format how to handle the column. For simplicity, we always decode it first.

> add a column-major codec for data files
> ---------------------------------------
>
> Key: AVRO-806
> URL: https://issues.apache.org/jira/browse/AVRO-806
> Project: Avro
> Issue Type: New Feature
> Components: java, spec
> Reporter: Doug Cutting
> Assignee: Doug Cutting
> Fix For: 1.7.0
>
> Attachments: AVRO-806-v2.patch, AVRO-806.patch, avro-file-columnar.pdf
>
>
> Define a codec that, when a data file's schema is a record schema, writes
> blocks within the file in column-major order. This would permit better
> compression and also permit efficient skipping of fields that are not of
> interest.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira