[ https://issues.apache.org/jira/browse/HIVE-2604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13188338#comment-13188338 ]
Phabricator commented on HIVE-2604:
-----------------------------------

krishnakumar has commented on the revision "HIVE-2604 [jira] Add UberCompressor Serde/Codec to contrib which allows per-column compression strategies".

INLINE COMMENTS

contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/UberCompressionCodec.java:33
  This is itself an implementation of the CompressionCodec interface. The only important parts of the class are the createInputStream/createOutputStream methods; the dummyCompressor is needed only to conform to the interface.

contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/UberCompressionInputStream.java:70
  Will add comments. The method is called readFromCompressor because it reads from the input reader created off a type-specific compressor. I can rename it to readFromInputReader? If you mean the copying annotated by the FIXME, yes, it can be avoided by having an output stream over an existing buffer. I did not find a ready-made class for that, so I will create one.

contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/UberCompressionInputStream.java:101
  This is the second case (in the JIRA description), where the user specifies a custom serde+codec to be used for compressing a specific column, so we need to deserialize and reserialize here.

contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/UberCompressorUtils.java:38
  I needed a simple read/write on an output stream. WritableUtils implements a more complicated mechanism which favors smaller values.

contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/dsalg/Tuple.java:1
  Data structures and algorithms!

contrib/src/test/queries/clientpositive/ubercompressor.q:4
  The configs are modelled on the existing config for compression, so I guess that means all output tables will be compressed using the same config? The codec and its child classes do not have access to the table/partition, right? How would we populate the metastore from the codec implementation classes?
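The "output stream over an existing buffer" mentioned for UberCompressionInputStream.java:70 could look roughly like the sketch below. This is a hypothetical illustration only (the class name ExistingBufferOutputStream is not from the patch): a java.io.OutputStream subclass that writes directly into a caller-supplied byte array, so the FIXME'd copy out of a ByteArrayOutputStream can be avoided.

```java
import java.io.IOException;
import java.io.OutputStream;

/**
 * Minimal sketch of an OutputStream that writes into an existing,
 * caller-supplied byte array instead of an internal growable buffer.
 * Hypothetical class; not part of the HIVE-2604 patch.
 */
public class ExistingBufferOutputStream extends OutputStream {
    private final byte[] buf;   // caller-owned destination buffer
    private int pos;            // next write position within buf

    public ExistingBufferOutputStream(byte[] buf, int offset) {
        this.buf = buf;
        this.pos = offset;
    }

    @Override
    public void write(int b) throws IOException {
        if (pos >= buf.length) {
            throw new IOException("buffer overflow at position " + pos);
        }
        buf[pos++] = (byte) b;
    }

    @Override
    public void write(byte[] src, int off, int len) throws IOException {
        if (pos + len > buf.length) {
            throw new IOException("buffer overflow: need " + len + " more bytes");
        }
        System.arraycopy(src, off, buf, pos, len);
        pos += len;
    }

    /** Total bytes written so far, measured from the start of buf. */
    public int size() {
        return pos;
    }

    public static void main(String[] args) throws IOException {
        byte[] backing = new byte[16];
        ExistingBufferOutputStream out = new ExistingBufferOutputStream(backing, 0);
        out.write(new byte[] {1, 2, 3}, 0, 3);
        out.write(4);
        System.out.println(out.size());   // bytes land directly in backing[]
    }
}
```

Because decompressed data lands directly in the destination array, there is no second arraycopy when handing the bytes to the reader.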
REVISION DETAIL
  https://reviews.facebook.net/D1011

> Add UberCompressor Serde/Codec to contrib which allows per-column compression
> strategies
> ----------------------------------------------------------------------------
>
> Key: HIVE-2604
> URL: https://issues.apache.org/jira/browse/HIVE-2604
> Project: Hive
> Issue Type: Sub-task
> Components: Contrib
> Reporter: Krishna Kumar
> Assignee: Krishna Kumar
> Attachments: HIVE-2604.D1011.1.patch, HIVE-2604.v0.patch, HIVE-2604.v1.patch, HIVE-2604.v2.patch
>
> The strategies supported are:
> 1. using a specified codec on the column
> 2. using a specific codec on the column which is serialized via a specific serde
> 3. using a specific "TypeSpecificCompressor" instance

--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira