[ 
https://issues.apache.org/jira/browse/IMPALA-9512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17628048#comment-17628048
 ] 

ASF subversion and git services commented on IMPALA-9512:
---------------------------------------------------------

Commit a983a347a77af74e1a9bd6156d12a020d6b4df6d in impala's branch 
refs/heads/master from Csaba Ringhofer
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=a983a347a ]

IMPALA-11682: Add tests for minor compacted insert only ACID tables

Only test changes. Minor compacted delta dirs are supported in
Impala since IMPALA-9512, but at that time Hive supported minor
compaction only on full ACID tables. Since that time Hive added
support for minor compacting insert only/MM tables (HIVE-22610).

Change-Id: I7159283f3658f2119d38bd3393729535edd0a76f
Reviewed-on: http://gerrit.cloudera.org:8080/19164
Reviewed-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>


> Milestone 2: Validate each row against the valid write id list
> --------------------------------------------------------------
>
>                 Key: IMPALA-9512
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9512
>             Project: IMPALA
>          Issue Type: Sub-task
>            Reporter: Zoltán Borók-Nagy
>            Assignee: Zoltán Borók-Nagy
>            Priority: Major
>              Labels: impala-acid
>             Fix For: Impala 4.0.0
>
>
> Minor compactions can compact several delta directories into a single delta 
> directory. The current directory filtering algorithm needs to be modified to 
> handle minor compacted directories and prefer those over plain delta 
> directories.
> On top of that, in minor compacted directories we need to filter out rows we 
> cannot see. E.g. we can have the following delta directory:
> {noformat}
> full_acid/delta_0000001_0000010_0000/0000 # minWriteId: 1
>                                           # maxWriteId: 10
> {noformat}
> This delta dir contains rows with write ids between 1 and 10, but we may 
> only be allowed to see write ids less than 5. Therefore we need to check the 
> ACID write id column (named originalTransaction) for each row to decide 
> whether the row is valid or not.
> There are several ways to optimize this. E.g. based on the min/max write ids 
> of the delta directory, and the validWriteIdList, we can decide whether we 
> need to validate the rows at all. Or, when we reach the high watermark (that 
> tells us the max valid write id) we can stop the scanner since rows are 
> ordered based on record ID.
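
The row validation and the two optimizations described in the issue can be sketched roughly as follows. This is an illustrative Python model only, not Impala's scanner code: ValidWriteIds, RangeCheck, parse_delta_range and visible_rows are hypothetical names, the valid write id list is reduced to a high watermark plus a set of invalid (open or aborted) ids, and rows are assumed to arrive ordered by write id, as the description implies.

```python
import re
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, Iterable, Iterator, Tuple


class RangeCheck(Enum):
    NONE = "none"  # no write id in the range is visible: skip the directory
    SOME = "some"  # mixed: every row has to be validated
    ALL = "all"    # every write id is visible: no per-row validation needed


@dataclass(frozen=True)
class ValidWriteIds:
    """Simplified, hypothetical stand-in for a valid write id list."""
    high_watermark: int
    invalid: FrozenSet[int] = frozenset()  # open/aborted ids <= high_watermark

    def is_valid(self, write_id: int) -> bool:
        return write_id <= self.high_watermark and write_id not in self.invalid

    def check_range(self, min_id: int, max_id: int) -> RangeCheck:
        if min_id > self.high_watermark:
            return RangeCheck.NONE
        capped_max = min(max_id, self.high_watermark)
        invalid_in_range = sum(1 for w in self.invalid if min_id <= w <= capped_max)
        visible = (capped_max - min_id + 1) - invalid_in_range
        if visible == 0:
            return RangeCheck.NONE
        if visible == max_id - min_id + 1:
            return RangeCheck.ALL
        return RangeCheck.SOME


# Matches the min/max write ids in names like delta_0000001_0000010_0000.
_DELTA_RE = re.compile(r"(?:^|/)delta_(\d+)_(\d+)")


def parse_delta_range(path: str) -> Tuple[int, int]:
    """Extract (minWriteId, maxWriteId) from a delta directory path."""
    match = _DELTA_RE.search(path)
    if match is None:
        raise ValueError(f"not a delta directory: {path}")
    return int(match.group(1)), int(match.group(2))


def visible_rows(
    path: str,
    rows: Iterable[Tuple[int, str]],
    valid: ValidWriteIds,
) -> Iterator[Tuple[int, str]]:
    """Yield the (write_id, payload) rows that the reader is allowed to see.

    Assumes rows are ordered by write id (the originalTransaction column).
    """
    min_id, max_id = parse_delta_range(path)
    verdict = valid.check_range(min_id, max_id)
    if verdict is RangeCheck.NONE:
        return  # optimization 1a: nothing in this directory is visible
    for write_id, payload in rows:
        if verdict is RangeCheck.ALL:
            yield write_id, payload  # optimization 1b: skip per-row checks
            continue
        if write_id > valid.high_watermark:
            break  # optimization 2: past the high watermark, stop scanning
        if valid.is_valid(write_id):
            yield write_id, payload


if __name__ == "__main__":
    # "Only allowed to see write ids less than 5", with write id 2 aborted.
    valid = ValidWriteIds(high_watermark=4, invalid=frozenset({2}))
    rows = [(1, "a"), (2, "b"), (3, "c"), (5, "d"), (10, "e")]
    path = "full_acid/delta_0000001_0000010_0000/0000"
    print(list(visible_rows(path, rows, valid)))
```

With the example directory from the description, the range check reports a mixed directory, so rows are validated individually and the loop stops at the first row whose write id exceeds the high watermark.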


