[ https://issues.apache.org/jira/browse/HIVE-2604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13188338#comment-13188338 ]

Phabricator commented on HIVE-2604:
-----------------------------------

krishnakumar has commented on the revision "HIVE-2604 [jira] Add UberCompressor 
Serde/Codec to contrib which allows per-column compression strategies".

INLINE COMMENTS
  
contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/UberCompressionCodec.java:33
 This, itself, is an implementation of the CompressionCodec interface. The only 
important parts of the class are the createInputStream/createOutputStream 
methods. The dummyCompressor is needed only for conforming to the interface.
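
  Structurally, the shape is roughly as below (a pure-Java sketch, not code 
from the patch; the real class implements Hadoop's 
org.apache.hadoop.io.compress.CompressionCodec, and the names here are 
illustrative):

```java
import java.io.FilterInputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Illustrative sketch only: the real class implements Hadoop's
// CompressionCodec, but only the stream-creation methods carry logic;
// the do-nothing "dummyCompressor" exists solely to satisfy the interface.
class MiniUberCodec {
    private final Object dummyCompressor = new Object(); // interface filler only

    OutputStream createOutputStream(OutputStream out) throws IOException {
        return new FilterOutputStream(out); // real impl: per-column compression
    }

    InputStream createInputStream(InputStream in) throws IOException {
        return new FilterInputStream(in) {}; // real impl: per-column decompression
    }
}
```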
  
contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/UberCompressionInputStream.java:70
 Will add comments.

  The method is called readFromCompressor as it is reading from the input reader 
created off a type-specific compressor. Shall I rename it to readFromInputReader?

  If you mean the copying annotated by the FIXME, yes, it can be avoided by 
having an output stream backed by an existing buffer. I did not find a 
ready-made class for that, so I will create one.
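
  A minimal sketch of such a class, assuming plain Java (the name and shape 
are hypothetical; whatever lands in the patch may differ):

```java
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical sketch: an OutputStream that writes directly into a
// caller-supplied byte[] without copying. Unlike ByteArrayOutputStream,
// which allocates and grows its own internal buffer (forcing a copy out),
// this writes in place into an existing buffer.
class ExistingBufferOutputStream extends OutputStream {
    private final byte[] buf;
    private int pos;

    ExistingBufferOutputStream(byte[] buf, int offset) {
        this.buf = buf;
        this.pos = offset;
    }

    @Override
    public void write(int b) throws IOException {
        if (pos >= buf.length) {
            throw new IOException("buffer full");
        }
        buf[pos++] = (byte) b;
    }

    int position() { return pos; } // bytes written so far, plus initial offset
}
```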
  
contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/UberCompressionInputStream.java:101
 This is the second case (in the JIRA description) where the user specifies a 
custom serde+codec to be used for compressing a specific column, so we need to 
deserialize and reserialize here.
  
contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/UberCompressorUtils.java:38
 I needed a simple read/write on an OutputStream. WritableUtils implements a more 
complicated variable-length mechanism which favors smaller values.
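
  For contrast, a plain fixed-width write looks like the sketch below 
(illustrative, not code from the patch); Hadoop's WritableUtils.writeVInt 
would instead spend as little as one byte on small values, at the cost of a 
more complicated encoding:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Illustrative: a simple fixed-width int write always emits exactly 4 bytes,
// with no branching on the magnitude of the value.
class SimpleIntWrite {
    static byte[] writeFixedInt(int v) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(v); // big-endian, always 4 bytes
        out.flush();
        return bytes.toByteArray();
    }
}
```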
  
contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/dsalg/Tuple.java:1
 data structures and algorithms!
contrib/src/test/queries/clientpositive/ubercompressor.q:4
 The configs are modelled on the existing configs for compression, so I guess 
that means that all output tables will be compressed using the same config?

  The codec and its child classes do not have access to table/partition, right? 
How would we populate the metastore from codec implementation classes?
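
  For context, the existing compression configs that the .q file's settings 
are modelled on look roughly like this (standard Hive/Hadoop keys; the 
ubercompressor-specific settings in the actual test are not reproduced here):

```sql
-- Standard Hive output-compression settings, for illustration only.
set hive.exec.compress.output=true;
set mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
```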

REVISION DETAIL
  https://reviews.facebook.net/D1011

                
> Add UberCompressor Serde/Codec to contrib which allows per-column compression 
> strategies
> ----------------------------------------------------------------------------------------
>
>                 Key: HIVE-2604
>                 URL: https://issues.apache.org/jira/browse/HIVE-2604
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Contrib
>            Reporter: Krishna Kumar
>            Assignee: Krishna Kumar
>         Attachments: HIVE-2604.D1011.1.patch, HIVE-2604.v0.patch, 
> HIVE-2604.v1.patch, HIVE-2604.v2.patch
>
>
> The strategies supported are
> 1. using a specified codec on the column
> 2. using a specific codec on the column which is serialized via a specific 
> serde
> 3. using a specific "TypeSpecificCompressor" instance

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
