[jira] [Comment Edited] (HIVE-10438) Architecture for ResultSet Compression via external plugin

Xuefu Zhang (JIRA) Sun, 28 Jun 2015 10:13:51 -0700

    [ 
https://issues.apache.org/jira/browse/HIVE-10438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14604704#comment-14604704
 ]


Xuefu Zhang edited comment on HIVE-10438 at 6/28/15 5:12 PM:
-------------------------------------------------------------

Here are some of my high-level thoughts:

1. I don't think Hive needs to support multiple compressors at the same time. 
This is very unlikely in a real production scenario, though different users 
might choose different compression technologies (e.g. Snappy vs. LZO). For 
simplicity, we should start with just one. Thus, we need two flags on the 
server side: #1, enable/disable compression; #2, the class name (or some 
other identifier) of the compressor.

2. The JDBC client should be able to specify whether to use result set 
compression. This can be done via a hiveconf variable specified in the 
<hiveConfs> section of the JDBC connection string, shown below:
{code}
jdbc:hive2://<host>:<port>/<dbName>;<sessionConfs>?<hiveConfs>#<hiveVars>
{code}
Such a variable could be named "hive.client.use.resultset.compression".

3. When updating the patch, please choose "update" patch instead of "add file" 
so that it is easy to see the diffs between the patches.

4. A default implementation, such as one based on Snappy, would be nice.

5. Please add some test cases that use the default implementation and verify 
the results.
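
To make points 1, 2 and 5 a bit more concrete, here is a rough sketch of how the pieces might fit together. It is only an illustration under several assumptions, not the design in the attached patch: the ResultSetCompressor interface, the DeflateCompressor class, and the two server-side property names are hypothetical, and java.util.zip Deflate stands in for Snappy so that the example has no external dependencies. Only the client variable name "hive.client.use.resultset.compression" and the connection string layout come from the comment above.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

final class ResultSetCompressionSketch {
    // Hypothetical server-side flag names (assumptions, not existing Hive properties).
    static final String SERVER_ENABLED = "hive.server2.resultset.compression.enabled";
    static final String SERVER_COMPRESSOR_CLASS = "hive.server2.resultset.compressor.class";
    // Client-side variable name suggested in the comment.
    static final String CLIENT_USE = "hive.client.use.resultset.compression";

    /** Hypothetical plugin interface that a compressor would implement. */
    interface ResultSetCompressor {
        byte[] compress(byte[] raw);
        byte[] decompress(byte[] compressed);
    }

    /** Stand-in default implementation using Deflate instead of Snappy. */
    static final class DeflateCompressor implements ResultSetCompressor {
        public DeflateCompressor() { }

        @Override public byte[] compress(byte[] raw) {
            Deflater deflater = new Deflater();
            deflater.setInput(raw);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!deflater.finished()) {
                out.write(buf, 0, deflater.deflate(buf));
            }
            deflater.end();
            return out.toByteArray();
        }

        @Override public byte[] decompress(byte[] compressed) {
            Inflater inflater = new Inflater();
            inflater.setInput(compressed);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            try {
                while (!inflater.finished()) {
                    int n = inflater.inflate(buf);
                    // Stop on truncated input rather than looping forever.
                    if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) break;
                    out.write(buf, 0, n);
                }
            } catch (DataFormatException e) {
                throw new IllegalStateException("corrupt compressed data", e);
            } finally {
                inflater.end();
            }
            return out.toByteArray();
        }
    }

    /** Extracts the <hiveConfs> section (between '?' and '#') as key/value pairs. */
    static Map<String, String> parseHiveConfs(String url) {
        Map<String, String> confs = new LinkedHashMap<>();
        int q = url.indexOf('?');
        if (q < 0) return confs;
        int hash = url.indexOf('#', q);
        String section = hash < 0 ? url.substring(q + 1) : url.substring(q + 1, hash);
        for (String pair : section.split(";")) {
            int eq = pair.indexOf('=');
            if (eq > 0) confs.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
        }
        return confs;
    }

    /** Returns the single configured compressor, or null if either side has compression off. */
    static ResultSetCompressor selectCompressor(Map<String, String> serverConf,
                                                Map<String, String> clientConfs) {
        boolean serverOn = Boolean.parseBoolean(serverConf.getOrDefault(SERVER_ENABLED, "false"));
        boolean clientOn = Boolean.parseBoolean(clientConfs.getOrDefault(CLIENT_USE, "false"));
        String className = serverConf.get(SERVER_COMPRESSOR_CLASS);
        if (!serverOn || !clientOn || className == null) return null;
        try {
            // Load the plugin by class name, as flag #2 suggests.
            return (ResultSetCompressor) Class.forName(className)
                    .getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot load compressor " + className, e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> server = new HashMap<>();
        server.put(SERVER_ENABLED, "true");
        server.put(SERVER_COMPRESSOR_CLASS, DeflateCompressor.class.getName());
        Map<String, String> client = parseHiveConfs(
                "jdbc:hive2://localhost:10000/default?" + CLIENT_USE + "=true");
        ResultSetCompressor c = selectCompressor(server, client);
        byte[] rows = "row1,row2,row3".getBytes(StandardCharsets.UTF_8);
        // Round-trip check of the kind a test case for point 5 might perform.
        System.out.println(Arrays.equals(rows, c.decompress(c.compress(rows))));
    }
}
```

Returning null when either side has compression turned off keeps the uncompressed path as the default, so clients that do not set the variable would be unaffected. A real test case would presumably run a query through HiveServer2 and compare the result sets rather than raw bytes.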


> Architecture for  ResultSet Compression via external plugin
> -----------------------------------------------------------
>
>                 Key: HIVE-10438
>                 URL: https://issues.apache.org/jira/browse/HIVE-10438
>             Project: Hive
>          Issue Type: New Feature
>          Components: Hive, Thrift API
>    Affects Versions: 1.2.0
>            Reporter: Rohit Dholakia
>            Assignee: Rohit Dholakia
>              Labels: patch
>         Attachments: HIVE-10438-1.patch, HIVE-10438.patch, 
> Proposal-rscompressor.pdf, README.txt, 
> Results_Snappy_protobuf_TBinary_TCompact.pdf, hs2ResultSetCompressor.zip, 
> hs2driver-master.zip
>
>
> This JIRA proposes an architecture for enabling ResultSet compression which 
> uses an external plugin. 
> The patch has three aspects to it: 
> 0. An architecture for enabling ResultSet compression with external plugins
> 1. An example plugin to demonstrate end-to-end functionality 
> 2. A container to allow everyone to write and test ResultSet compressors with 
> a query submitter (https://github.com/xiaom/hs2driver) 
> Also attaching a design document explaining the changes, a document with 
> experimental results, and a PDF explaining how to set up the Docker 
> container to observe end-to-end functionality of ResultSet compression. 
> Review board link: https://reviews.apache.org/r/35792/ 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
