[jira] [Commented] (HIVE-9660) store end offset of compressed data for RG in RowIndex in ORC

Sergey Shelukhin (JIRA) Mon, 25 Apr 2016 15:02:26 -0700

    [ https://issues.apache.org/jira/browse/HIVE-9660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15257127#comment-15257127 ]


Sergey Shelukhin commented on HIVE-9660:
----------------------------------------

That is pretty much it. There are more detailed descriptions in the
comments. There are two complex bits. The first is the integer writers, which
have their own separate caches: when accounting for a CB (compressed buffer),
one needs to be aware that even though some RGs (row groups) might be fully
written, their values could still be in the integer writer's literals array
(or a similar place) and not in this CB.
The second is the string writer, which is logically simple (we save index
entries as before, only this time we have to make sure when writing stuff out
that we maintain a correct set of active RGs for those CB callbacks), but a
little bit involved code-wise.
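To illustrate the integer-writer caveat, here is a minimal Java sketch, not Hive's actual ORC writer code. It assumes a hypothetical writer whose encoder caches up to maxLiterals values before emitting them as one run, a fixed cost per run of headerBytes + bytesPerValue * count uncompressed bytes, and CBs of cbSize uncompressed bytes each. All names (RgEndTracker, write, close, and so on) are hypothetical. The point it shows is that an RG that is complete by row count can only have its end CB recorded once the run holding its last value is flushed.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/**
 * Hypothetical sketch (not Hive's API): tracks, per row group (RG), the index
 * of the last compressed buffer (CB) holding its data, when an integer encoder
 * caches literals before writing them out. Byte costs are assumptions.
 */
final class RgEndTracker {
    private final int rowsPerRg;
    private final int maxLiterals;
    private final int cbSize;
    private final int bytesPerValue;
    private final int headerBytes;

    private final List<Long> literals = new ArrayList<>(); // encoder's cache
    private long valuesSeen = 0;    // values handed to the writer
    private long valuesFlushed = 0; // values already emitted into the stream
    private long streamBytes = 0;   // uncompressed bytes emitted so far
    // Value counts at which completed RGs end, waiting for their tail to flush.
    private final Deque<Long> pendingRgEnds = new ArrayDeque<>();
    // Result: for each RG, the index of the last CB that holds any of its data.
    private final List<Integer> rgEndCb = new ArrayList<>();

    RgEndTracker(int rowsPerRg, int maxLiterals, int cbSize,
                 int bytesPerValue, int headerBytes) {
        this.rowsPerRg = rowsPerRg;
        this.maxLiterals = maxLiterals;
        this.cbSize = cbSize;
        this.bytesPerValue = bytesPerValue;
        this.headerBytes = headerBytes;
    }

    void write(long value) {
        literals.add(value);
        valuesSeen++;
        if (valuesSeen % rowsPerRg == 0) {
            // The RG is complete by row count, but its last values may still
            // sit in the literals cache, so its end CB is not known yet.
            pendingRgEnds.addLast(valuesSeen);
        }
        if (literals.size() == maxLiterals) {
            flushLiterals();
        }
    }

    private void flushLiterals() {
        if (literals.isEmpty()) {
            return;
        }
        streamBytes += headerBytes + (long) bytesPerValue * literals.size();
        valuesFlushed += literals.size();
        literals.clear();
        // In this sketch a run is decoded as a unit, so every RG whose last
        // value was in this run ends in the CB holding the run's last byte.
        int lastCb = (int) ((streamBytes - 1) / cbSize);
        while (!pendingRgEnds.isEmpty() && pendingRgEnds.peekFirst() <= valuesFlushed) {
            pendingRgEnds.removeFirst();
            rgEndCb.add(lastCb);
        }
    }

    /** Flushes the cache (closing a trailing partial RG) and returns end CBs. */
    List<Integer> close() {
        if (valuesSeen % rowsPerRg != 0) {
            pendingRgEnds.addLast(valuesSeen);
        }
        flushLiterals();
        return rgEndCb;
    }
}
```

With rowsPerRg = 3, maxLiterals = 4, cbSize = 10, bytesPerValue = 2 and headerBytes = 1, writing the values 0 through 6 and calling close() returns [0, 1, 1]. RG 1 is complete after the sixth value, but its end CB only becomes known when close() flushes the cache.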

I'll look at the test failures. I think the last patch was supposed to pass all
the tests before the rebase, so it's probably some stupid error.

> store end offset of compressed data for RG in RowIndex in ORC
> -------------------------------------------------------------
>
>                 Key: HIVE-9660
>                 URL: https://issues.apache.org/jira/browse/HIVE-9660
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>         Attachments: HIVE-9660.01.patch, HIVE-9660.02.patch, 
> HIVE-9660.03.patch, HIVE-9660.04.patch, HIVE-9660.05.patch, 
> HIVE-9660.06.patch, HIVE-9660.07.patch, HIVE-9660.07.patch, 
> HIVE-9660.08.patch, HIVE-9660.09.patch, HIVE-9660.10.patch, 
> HIVE-9660.10.patch, HIVE-9660.patch, HIVE-9660.patch
>
>
> Right now the end offset is estimated, which in some cases results in tons of 
> extra data being read.
> We can add a separate array to RowIndex (positions_v2?) that stores the number of
> compressed buffers for each RG, or the end offset, or something, to remove this
> estimation magic.
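To show why the estimate can over-read, here is a minimal Java sketch, not Hive's reader code. It assumes rgStart[i] is the compressed offset of the CB where RG i begins (from the row index positions) and rgEnd[i] is a stored, exclusive end offset of the last CB holding RG i's data, which is the kind of value this issue proposes to store. The fixed "slop" used for the estimate is a hypothetical stand-in for a worst-case padding: without a stored end, RG i may spill into the CB starting at rgStart[i + 1], whose compressed length is not known up front. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: plan which byte ranges [start, end) of one compressed
 * column stream to read for the selected RGs. Assumes rgStart is
 * non-decreasing, so ranges can be merged in a single pass.
 */
final class RgReadPlanner {

    /** Exact plan from stored per-RG end offsets, merging overlapping ranges. */
    static List<long[]> exactRanges(long[] rgStart, long[] rgEnd, boolean[] selected) {
        List<long[]> ranges = new ArrayList<>();
        for (int i = 0; i < selected.length; i++) {
            if (!selected[i]) {
                continue;
            }
            long[] last = ranges.isEmpty() ? null : ranges.get(ranges.size() - 1);
            if (last != null && rgStart[i] <= last[1]) {
                // Overlapping or adjacent: extend the previous range.
                last[1] = Math.max(last[1], rgEnd[i]);
            } else {
                ranges.add(new long[] {rgStart[i], rgEnd[i]});
            }
        }
        return ranges;
    }

    /**
     * Estimated plan without stored ends: take the next RG's start (or the
     * stream end) and pad it by a worst-case slop, capped at the stream end.
     */
    static List<long[]> estimatedRanges(long[] rgStart, long streamLength,
                                        long slop, boolean[] selected) {
        long[] estEnd = new long[rgStart.length];
        for (int i = 0; i < rgStart.length; i++) {
            long next = (i + 1 < rgStart.length) ? rgStart[i + 1] : streamLength;
            estEnd[i] = Math.min(streamLength, next + slop);
        }
        return exactRanges(rgStart, estEnd, selected);
    }

    /** Total number of bytes covered by the planned ranges. */
    static long totalBytes(List<long[]> ranges) {
        long total = 0;
        for (long[] r : ranges) {
            total += r[1] - r[0];
        }
        return total;
    }
}
```

For starts {0, 100, 250, 400}, stored ends {130, 260, 420, 500}, a 500-byte stream, and only RGs 0 and 2 selected, the exact plan reads 300 bytes in two ranges. The estimate with a 200-byte slop merges into a single read of all 500 bytes.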



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
