[
https://issues.apache.org/jira/browse/HIVE-2604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13188338#comment-13188338
]
Phabricator commented on HIVE-2604:
-----------------------------------
krishnakumar has commented on the revision "HIVE-2604 [jira] Add UberCompressor
Serde/Codec to contrib which allows per-column compression strategies".
INLINE COMMENTS
contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/UberCompressionCodec.java:33
This class is itself an implementation of the CompressionCodec interface. The
only important parts of the class are the createInputStream/createOutputStream
methods. The dummyCompressor is needed only for conforming to the interface.
contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/UberCompressionInputStream.java:70
Will add comments.
The method is called readFromCompressor as it is reading from the input reader
created off a type-specific compressor. Shall I rename it to readFromInputReader?
If you mean the copying annotated by the FIXME, yes, it can be avoided by
having an output stream over an existing buffer. I did not find a ready-made
class for that, so I will create one.
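The class proposed above could look like the following sketch. It is hypothetical (the name ExistingBufferOutputStream and its exact API are illustrative): an OutputStream that writes directly into a caller-supplied byte[], unlike java.io.ByteArrayOutputStream, which owns and grows its own internal buffer and therefore forces a toByteArray() copy, i.e. the copy the FIXME flags.

```java
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical sketch of an OutputStream over a caller-supplied buffer.
// Writing lands directly in the existing byte[], avoiding the extra copy
// that ByteArrayOutputStream.toByteArray() would incur.
class ExistingBufferOutputStream extends OutputStream {
    private final byte[] buf;
    private int pos;

    ExistingBufferOutputStream(byte[] buf, int offset) {
        this.buf = buf;
        this.pos = offset;
    }

    @Override
    public void write(int b) throws IOException {
        if (pos >= buf.length) {
            // Fixed-size buffer: fail rather than silently grow or drop bytes.
            throw new IOException("buffer full at position " + pos);
        }
        buf[pos++] = (byte) b;
    }

    // Next write position, i.e. offset plus the number of bytes written so far.
    int position() {
        return pos;
    }
}
```

A decompressor can then be pointed at the destination buffer directly instead of decompressing into a temporary array and copying.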
contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/UberCompressionInputStream.java:101
This is the second case (in the JIRA description), where the user specifies a
custom serde+codec to be used for compressing a specific column, so we need to
deserialize and reserialize here.
contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/UberCompressorUtils.java:38
I needed a simple read/write on an output stream. WritableUtils implements a
more complicated variable-length encoding which favors smaller values.
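The contrast above can be shown with a fixed-width round trip; the class name FixedWidthInt is illustrative, not from the patch. A plain writeInt always costs exactly 4 bytes, whereas Hadoop's WritableUtils.writeVInt would spend 1 to 5 bytes depending on the magnitude, encoding small values in a single byte.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Illustrative sketch: a simple fixed-width int read/write on a stream,
// as opposed to WritableUtils' variable-length (1-5 byte) encoding.
class FixedWidthInt {
    static void write(DataOutputStream out, int v) throws IOException {
        out.writeInt(v); // always exactly 4 bytes, regardless of value
    }

    static int read(DataInputStream in) throws IOException {
        return in.readInt();
    }
}
```

The fixed-width form trades a few bytes for simplicity and a predictable record size, which is the trade-off the comment alludes to.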
contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/dsalg/Tuple.java:1
data structures and algorithms!
contrib/src/test/queries/clientpositive/ubercompressor.q:4
The configs are modelled on the existing configs for compression, so I guess
that means all output tables will be compressed using the same config?
The codec and its child classes do not have access to the table/partition,
right? How would we populate the metastore from the codec implementation
classes?
REVISION DETAIL
https://reviews.facebook.net/D1011
> Add UberCompressor Serde/Codec to contrib which allows per-column compression
> strategies
> ----------------------------------------------------------------------------------------
>
> Key: HIVE-2604
> URL: https://issues.apache.org/jira/browse/HIVE-2604
> Project: Hive
> Issue Type: Sub-task
> Components: Contrib
> Reporter: Krishna Kumar
> Assignee: Krishna Kumar
> Attachments: HIVE-2604.D1011.1.patch, HIVE-2604.v0.patch,
> HIVE-2604.v1.patch, HIVE-2604.v2.patch
>
>
> The strategies supported are
> 1. using a specified codec on the column
> 2. using a specific codec on the column which is serialized via a specific
> serde
> 3. using a specific "TypeSpecificCompressor" instance
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira