[ 
https://issues.apache.org/jira/browse/HIVE-819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756881#action_12756881
 ] 

He Yongqiang commented on HIVE-819:
-----------------------------------

>>Can you briefly summarize the current approach of how decompression is done 
>>and your proposal for lazy decompression? Also, more comments in the 
>>code would be very helpful.
np. Currently decompression is eager. The list of needed columns is passed to 
the reader, which skips the unneeded columns, reads only the needed columns 
into memory, and decompresses them immediately as they are read.
Lazy decompression does not decompress the needed columns up front. The reader 
just holds the compressed bytes in memory and passes a callback object to 
BytesRefWritable. The patch adds an interface, LazyDecompressionCallback, and 
RCFile's reader implements it as LazyDecompressionCallbackImpl. 
LazyDecompressionCallback is used to construct the BytesRefWritable. When 
BytesRefWritable.getData() etc. is called (that is the entry point between 
ColumnSerde, ColumnStruct and BytesRefWritable) to convert the underlying 
bytes to objects, the callback method is invoked and decompression happens.
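
To make the callback mechanism concrete, here is a minimal, self-contained 
sketch. It is not the code from the patch: only the name 
LazyDecompressionCallback comes from the description above, and its 
decompress() signature is an assumption. LazyBytesRef is a simplified 
stand-in for BytesRefWritable, GzipColumnCallback is a hypothetical 
replacement for LazyDecompressionCallbackImpl, and gzip is used only so the 
example runs without Hadoop codecs.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Assumed shape of the callback; the real interface in the patch may differ.
interface LazyDecompressionCallback {
    byte[] decompress() throws IOException;
}

// Simplified stand-in for BytesRefWritable (hypothetical, for illustration).
class LazyBytesRef {
    private final LazyDecompressionCallback callback;
    private byte[] data;      // stays null until the first getData() call
    int decompressCount = 0;  // exposed only so the sketch can be checked

    LazyBytesRef(LazyDecompressionCallback callback) {
        this.callback = callback;
    }

    // Decompression happens here, on first access, and the result is cached.
    byte[] getData() throws IOException {
        if (data == null) {
            data = callback.decompress();
            decompressCount++;
        }
        return data;
    }

    boolean isDecompressed() {
        return data != null;
    }
}

// Hypothetical callback that keeps the compressed bytes of one column.
class GzipColumnCallback implements LazyDecompressionCallback {
    private final byte[] compressed;

    GzipColumnCallback(byte[] compressed) {
        this.compressed = compressed;
    }

    @Override
    public byte[] decompress() throws IOException {
        try (InputStream in =
                 new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        }
    }

    // Helper that produces compressed column bytes for the demo.
    static byte[] gzip(String s) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(s.getBytes(StandardCharsets.UTF_8));
        }
        return bos.toByteArray();
    }
}

public class LazyDecompressionDemo {
    public static void main(String[] args) throws IOException {
        byte[] compressed = GzipColumnCallback.gzip("hello");
        LazyBytesRef col = new LazyBytesRef(new GzipColumnCallback(compressed));

        // Nothing has been decompressed yet.
        System.out.println(col.isDecompressed());

        // The first access triggers the callback.
        String value = new String(col.getData(), StandardCharsets.UTF_8);
        System.out.println(value + " " + col.isDecompressed());
    }
}
```

In this sketch a column whose getData() is never called (for example, because 
the filter column rejected every row in the block) is never decompressed, 
which is the saving lazy decompression aims for.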

>>Is the 4 sec performance regression with the query predicate duration > 
>>8 consistent or intermittent?
Intermittent. I ran the test several more times after the comments. 
>>If the latter, what method of timing are you using?
I just submit a simple Hive select query in local mode and use the query's 
finish time.

> Add lazy decompress ability to RCFile
> -------------------------------------
>
>                 Key: HIVE-819
>                 URL: https://issues.apache.org/jira/browse/HIVE-819
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Query Processor, Serializers/Deserializers
>            Reporter: He Yongqiang
>            Assignee: He Yongqiang
>             Fix For: 0.5.0
>
>         Attachments: hive-819-2009-9-12.patch
>
>
> This is especially useful for filter scans. 
> For example, for the query 'select a, b, c from table_rc_lazydecompress where 
> a>1;' we only need to decompress the block data of columns b and c when at 
> least one row's column 'a' in that block satisfies the filter condition.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
