[jira] [Commented] (HIVE-2604) Add UberCompressor Serde/Codec to contrib which allows per-column compression strategies
[ https://issues.apache.org/jira/browse/HIVE-2604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13450345#comment-13450345 ]

Uma Maheswara Rao G commented on HIVE-2604:
---

Hi Yongqiang,
Any reason for holding this off from commit?

> Add UberCompressor Serde/Codec to contrib which allows per-column compression strategies
>
>                 Key: HIVE-2604
>                 URL: https://issues.apache.org/jira/browse/HIVE-2604
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Contrib
>            Reporter: Krishna Kumar
>            Assignee: Krishna Kumar
>         Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2604.D1011.1.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2604.D1011.2.patch, HIVE-2604.v0.patch, HIVE-2604.v1.patch, HIVE-2604.v2.patch
>
> The strategies supported are
> 1. using a specified codec on the column
> 2. using a specific codec on the column which is serialized via a specific serde
> 3. using a specific "TypeSpecificCompressor" instance

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2604) Add UberCompressor Serde/Codec to contrib which allows per-column compression strategies
[ https://issues.apache.org/jira/browse/HIVE-2604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205731#comment-13205731 ]

He Yongqiang commented on HIVE-2604:

+1, will commit after tests pass
[jira] [Commented] (HIVE-2604) Add UberCompressor Serde/Codec to contrib which allows per-column compression strategies
[ https://issues.apache.org/jira/browse/HIVE-2604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13188338#comment-13188338 ]

Phabricator commented on HIVE-2604:
---

krishnakumar has commented on the revision "HIVE-2604 [jira] Add UberCompressor Serde/Codec to contrib which allows per-column compression strategies".

INLINE COMMENTS
  contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/UberCompressionCodec.java:33 This, itself, is an implementation of the CompressionCodec interface. The only important parts of the class are the createInputStream/createOutputStream methods. The dummyCompressor is needed to conform to the interface.
  contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/UberCompressionInputStream.java:70 Will add comments. The method is called readFromCompressor because it reads from the InputReader created off a type-specific compressor. I can rename it to readFromInputReader? If you mean the copying annotated by the FIXME: yes, it can be avoided by having an output stream on an existing buffer. I did not find a ready-made class for that, so I will create one.
  contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/UberCompressionInputStream.java:101 This is the second case (in the jira description), where the user specifies a custom serde+codec to be used for compressing a specific column. So we need to deserialize and reserialize here.
  contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/UberCompressorUtils.java:38 I needed a simple read/write on an output stream. WritableUtils implements a more complicated mechanism which prefers smaller values.
  contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/dsalg/Tuple.java:1 "dsalg" stands for data structures and algorithms!
  contrib/src/test/queries/clientpositive/ubercompressor.q:4 The configs are modelled on the existing config for compression, so I guess that means that all output tables will be compressed using the same config? The codec and its child classes do not have access to the table/partition, right? How would we populate the metastore from codec implementation classes?

REVISION DETAIL
  https://reviews.facebook.net/D1011
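The UberCompressorUtils.java:38 reply above contrasts a simple fixed-width read/write with a variable-length scheme that favors small values. The trade-off can be sketched with java.io alone. The class and method names below are hypothetical, and the variable-length format is a generic 7-bits-per-byte encoding chosen for illustration; it is neither the encoding WritableUtils uses nor the code in the patch.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical illustration only; not part of the HIVE-2604 patch.
final class IntEncodingSketch {

    /** Fixed-width: always 4 bytes, big-endian, via DataOutputStream.writeInt. */
    static byte[] fixedWidth(int value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(value);
        } catch (IOException e) {
            // An in-memory stream does not fail; rethrow unchecked just in case.
            throw new IllegalStateException(e);
        }
        return bytes.toByteArray();
    }

    /**
     * Variable-length: 7 payload bits per byte, high bit set when more bytes follow.
     * Small non-negative values take fewer bytes; negative values take 5.
     */
    static byte[] variableLength(int value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        int remaining = value;
        while ((remaining & ~0x7F) != 0) {
            bytes.write((remaining & 0x7F) | 0x80); // low 7 bits plus continuation flag
            remaining >>>= 7;                       // unsigned shift so the loop terminates
        }
        bytes.write(remaining);                     // final byte, continuation flag clear
        return bytes.toByteArray();
    }
}
```

With this sketch, the value 5 takes 4 bytes fixed-width and 1 byte variable-length, while -1 takes 4 and 5 bytes respectively. The fixed-width form is simpler to read back because the length never depends on the value.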
[jira] [Commented] (HIVE-2604) Add UberCompressor Serde/Codec to contrib which allows per-column compression strategies
[ https://issues.apache.org/jira/browse/HIVE-2604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185756#comment-13185756 ]

Phabricator commented on HIVE-2604:
---

heyongqiang has commented on the revision "HIVE-2604 [jira] Add UberCompressor Serde/Codec to contrib which allows per-column compression strategies".

INLINE COMMENTS
  contrib/src/test/queries/clientpositive/ubercompressor.q:4 Setting a bunch of compression configs here is fine for a single insert. But how about multi-insert queries? Can you put these configs on the table/partition object? That will also make things easier to debug. (If you want to do this in a follow-up, please open a follow-up jira.)
  contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/dsalg/Tuple.java:1 What does the package name "dsalg" mean?
  contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/UberCompressorUtils.java:38 Just curious, can WritableUtils be used here?
  contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/UberCompressionCodec.java:33 How is this class used? Can it be defined as an interface? The DummyCompressor inside it is not doing anything.
  contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/UberCompressionInputStream.java:70 Can you add more comments here? If I understand correctly, it is doing the read and the decompression here. But the method is named readFromCompressor. Should it be readFromDecompressor()? There is also some byte transfer and copying involved here. Can that be avoided?
  contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/UberCompressionInputStream.java:101 Why is the serde involved here? It is deserializing and then serializing again here...

REVISION DETAIL
  https://reviews.facebook.net/D1011
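On the question of avoiding the byte copying in UberCompressionInputStream, the option raised in this review thread is an output stream that writes into an existing buffer. A minimal sketch of such a class follows. The name FixedBufferOutputStream and its constructor are hypothetical, and this is not the class from the patch. It assumes the caller knows an upper bound on the bytes to be written.

```java
import java.io.OutputStream;
import java.nio.BufferOverflowException;

// Hypothetical helper: an OutputStream that writes straight into a
// caller-supplied byte array, so no intermediate copy is needed.
final class FixedBufferOutputStream extends OutputStream {
    private final byte[] buffer;
    private final int limit;   // exclusive end index of the writable region
    private int position;      // next index to write

    FixedBufferOutputStream(byte[] buffer, int offset, int length) {
        if (offset < 0 || length < 0 || offset > buffer.length - length) {
            throw new IndexOutOfBoundsException("offset/length outside buffer");
        }
        this.buffer = buffer;
        this.position = offset;
        this.limit = offset + length;
    }

    @Override
    public void write(int b) {
        if (position >= limit) {
            throw new BufferOverflowException();
        }
        buffer[position++] = (byte) b;
    }

    @Override
    public void write(byte[] src, int off, int len) {
        if (off < 0 || len < 0 || off > src.length - len) {
            throw new IndexOutOfBoundsException("off/len outside source array");
        }
        if (len > limit - position) {
            throw new BufferOverflowException();
        }
        System.arraycopy(src, off, buffer, position, len);
        position += len;
    }

    /** Index in the target buffer just past the last byte written. */
    int position() {
        return position;
    }
}
```

Unlike java.io.ByteArrayOutputStream, which owns a growing internal array and hands back a copy from toByteArray(), this sketch never allocates. The cost is that it fails with an overflow once the reserved region is full.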
[jira] [Commented] (HIVE-2604) Add UberCompressor Serde/Codec to contrib which allows per-column compression strategies
[ https://issues.apache.org/jira/browse/HIVE-2604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185348#comment-13185348 ]

He Yongqiang commented on HIVE-2604:

looking.
[jira] [Commented] (HIVE-2604) Add UberCompressor Serde/Codec to contrib which allows per-column compression strategies
[ https://issues.apache.org/jira/browse/HIVE-2604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171514#comment-13171514 ]

jirapos...@reviews.apache.org commented on HIVE-2604:
---

This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3075/

(Updated 2011-12-17 10:41:45.367761)

Review request for hive and Yongqiang He.

Changes
---
Closed the two gaps - support for arbitrary types, and stats

Summary
---
Add UberCompressor Serde/Codec to contrib which allows per-column compression strategies
- gaps
  - supports only certain complex types
  - stats

This addresses bug HIVE-2604.
    https://issues.apache.org/jira/browse/HIVE-2604

Diffs (updated)
---
  contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/InputReader.java PRE-CREATION
  contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/OutputWriter.java PRE-CREATION
  contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/TypeSpecificCompressor.java PRE-CREATION
  contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/UberCompressionCodec.java PRE-CREATION
  contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/UberCompressionInputStream.java PRE-CREATION
  contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/UberCompressionOutputStream.java PRE-CREATION
  contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/UberCompressorColumnConfig.java PRE-CREATION
  contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/UberCompressorConfig.java PRE-CREATION
  contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/UberCompressorSerde.java PRE-CREATION
  contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/UberCompressorSerdeField.java PRE-CREATION
  contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/UberCompressorUtils.java PRE-CREATION
  contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/compressors/DummyIntegerCompressor.java PRE-CREATION
  contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/dsalg/Tuple.java PRE-CREATION
  contrib/src/test/queries/clientpositive/ubercompressor.q PRE-CREATION
  contrib/src/test/results/clientpositive/ubercompressor.q.out PRE-CREATION

Diff: https://reviews.apache.org/r/3075/diff

Testing
---
test added

Thanks,
Krishna
[jira] [Commented] (HIVE-2604) Add UberCompressor Serde/Codec to contrib which allows per-column compression strategies
[ https://issues.apache.org/jira/browse/HIVE-2604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13165815#comment-13165815 ]

Krishna Kumar commented on HIVE-2604:

:) I used the word Uber not in the sense of 'super' but in the sense given by [Wikipedia def] 'Über also translates to over, above, meta, but mainly in compound words.' That is, it highlights the fact that this is not a 'real' compressor but a wrapper around other existing compressors/codecs. Anyway, no hang-ups; I can bulk rename if necessary.
[jira] [Commented] (HIVE-2604) Add UberCompressor Serde/Codec to contrib which allows per-column compression strategies
[ https://issues.apache.org/jira/browse/HIVE-2604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13165283#comment-13165283 ]

Edward Capriolo commented on HIVE-2604:

I think this is a +1 idea, but I am -1 on the name; 'Uber' has to go. The name should describe what the class does, e.g. ColumnarCompressor or PerColumnCompressor. The problem is that everything in Hive is uber cool anyway, so every class would have to be named as such.
[jira] [Commented] (HIVE-2604) Add UberCompressor Serde/Codec to contrib which allows per-column compression strategies
[ https://issues.apache.org/jira/browse/HIVE-2604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13165125#comment-13165125 ]

jirapos...@reviews.apache.org commented on HIVE-2604:
---

This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3075/

Review request for hive.

Summary
---
Add UberCompressor Serde/Codec to contrib which allows per-column compression strategies
- gaps
  - supports only certain complex types
  - stats

This addresses bug HIVE-2604.
    https://issues.apache.org/jira/browse/HIVE-2604

Diffs
---
  contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/compressors/DummyIntegerCompressor.java PRE-CREATION
  contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/dsalg/Tuple.java PRE-CREATION
  contrib/src/test/queries/clientpositive/ubercompressor.q PRE-CREATION
  contrib/src/test/results/clientpositive/ubercompressor.q.out PRE-CREATION
  contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/UberCompressorUtils.java PRE-CREATION
  contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/UberCompressorColumnConfig.java PRE-CREATION
  contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/UberCompressorConfig.java PRE-CREATION
  contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/UberCompressorSerde.java PRE-CREATION
  contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/UberCompressionOutputStream.java PRE-CREATION
  contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/UberCompressionInputStream.java PRE-CREATION
  contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/InputReader.java PRE-CREATION
  contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/OutputWriter.java PRE-CREATION
  contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/TypeSpecificCompressor.java PRE-CREATION
  contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/UberCompressionCodec.java PRE-CREATION

Diff: https://reviews.apache.org/r/3075/diff

Testing
---
test added

Thanks,
Krishna
[jira] [Commented] (HIVE-2604) Add UberCompressor Serde/Codec to contrib which allows per-column compression strategies
[ https://issues.apache.org/jira/browse/HIVE-2604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13162709#comment-13162709 ]

Krishna Kumar commented on HIVE-2604:

The current implementation works as follows:

- Adds a serde, UberCompressorSerde, which is used to convert the cell values to bytes
- Adds a codec, UberCompressionCodec, which uses user-specified config to compress each block of column values through one of three possible mechanisms
  - Config for the column: "codec:" - Apply a CompressionCodec on the UberCompressorSerde-serialized bytestream
  - Config for the column: "codec:," - Re-serialize the bytestream through serdename and then apply codecname on it
  - Config for the column: "compressor:" - Compress the cell values by sending them through the type-specific compressor
  - [As a future enhancement, the config, say if it is "dynamic", can let the codec decide the mechanism based on the current block stats/previously seen blocks]

The idea is to maintain the ability to use a serde/codec combination (as we do now) for any columns which are not 'interesting' and to use type-specific compressors only for special columns.

The type-specific compressor is also an extension point only; no implementation is attached to this jira. I have attached one sample compressor to HIVE-2623, and many others are possible.
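The three per-column mechanisms in the comment above amount to a dispatch on the column's config string. A minimal sketch of that dispatch follows, under loudly stated assumptions: the comment shows only the prefixes "codec:" and "compressor:" and a comma in the second form, so the idea that class names follow the prefix, and that the serde name comes before the codec name, is assumed here. All class, enum, and method names are hypothetical; this is not the patch's UberCompressorColumnConfig.

```java
// Hypothetical sketch; the real config grammar in the patch may differ.
final class ColumnStrategySketch {

    enum Kind { CODEC, SERDE_THEN_CODEC, TYPE_SPECIFIC_COMPRESSOR }

    static final class Strategy {
        final Kind kind;
        final String serdeName; // null unless kind is SERDE_THEN_CODEC
        final String name;      // codec or compressor class name

        Strategy(Kind kind, String serdeName, String name) {
            this.kind = kind;
            this.serdeName = serdeName;
            this.name = name;
        }
    }

    /** Parses one column's config string into one of the three mechanisms. */
    static Strategy parse(String columnConfig) {
        int colon = columnConfig.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("missing prefix: " + columnConfig);
        }
        String prefix = columnConfig.substring(0, colon).trim();
        String value = columnConfig.substring(colon + 1).trim();
        if (value.isEmpty()) {
            throw new IllegalArgumentException("missing name: " + columnConfig);
        }
        switch (prefix) {
            case "compressor":
                // Mechanism 3: hand cell values to a type-specific compressor.
                return new Strategy(Kind.TYPE_SPECIFIC_COMPRESSOR, null, value);
            case "codec": {
                int comma = value.indexOf(',');
                if (comma < 0) {
                    // Mechanism 1: apply the codec to the already serialized bytes.
                    return new Strategy(Kind.CODEC, null, value);
                }
                // Mechanism 2 (assumed order: serde first, codec second).
                return new Strategy(Kind.SERDE_THEN_CODEC,
                        value.substring(0, comma).trim(),
                        value.substring(comma + 1).trim());
            }
            default:
                throw new IllegalArgumentException("unknown prefix: " + prefix);
        }
    }
}
```

For example, under these assumptions `ColumnStrategySketch.parse("codec:org.apache.hadoop.io.compress.GzipCodec")` selects the plain-codec mechanism. A string with a serde name, a comma, and a codec name after "codec:" selects the re-serialize-then-compress mechanism.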
[jira] [Commented] (HIVE-2604) Add UberCompressor Serde/Codec to contrib which allows per-column compression strategies
[ https://issues.apache.org/jira/browse/HIVE-2604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156758#comment-13156758 ]

Krishna Kumar commented on HIVE-2604:

Sure. I'll add some of the compressors in a day or two.
[jira] [Commented] (HIVE-2604) Add UberCompressor Serde/Codec to contrib which allows per-column compression strategies
[ https://issues.apache.org/jira/browse/HIVE-2604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156311#comment-13156311 ]

He Yongqiang commented on HIVE-2604:

Can you give some examples of such compressors, so we can also try them?