[ https://issues.apache.org/jira/browse/IMPALA-9512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17117593#comment-17117593 ]
Zoltán Borók-Nagy commented on IMPALA-9512:
-------------------------------------------

Thanks [~stakiar]. I uploaded a CR for review: [https://gerrit.cloudera.org/#/c/15988/]

I skip this test on every filesystem other than HDFS because it simulates Hive Streaming V2, and it only works on HDFS currently (since it needs APPEND).

> Milestone 2: Validate each row against the valid write id list
> --------------------------------------------------------------
>
>                 Key: IMPALA-9512
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9512
>             Project: IMPALA
>          Issue Type: Sub-task
>            Reporter: Zoltán Borók-Nagy
>            Assignee: Zoltán Borók-Nagy
>            Priority: Major
>              Labels: impala-acid
>             Fix For: Impala 4.0
>
>
> Minor compactions can compact several delta directories into a single delta
> directory. The current directory filtering algorithm needs to be modified to
> handle minor compacted directories and prefer those over plain delta
> directories.
> On top of that, in minor compacted directories we need to filter out rows we
> cannot see. E.g. we can have the following delta directory:
> {noformat}
> full_acid/delta_0000001_0000010_0000/0000 # minWriteId: 1
>                                           # maxWriteId: 10
> {noformat}
> So this delta dir contains rows with write ids between 1 and 10. But maybe we
> are only allowed to see write ids less than 5. Therefore we need to check the
> ACID write id column (named originalTransaction) for each row to decide
> whether this row is valid or not.
> There are several ways to optimize this. E.g. based on the min/max write ids
> of the delta directory, and the validWriteIdList, we can decide whether we
> need to validate the rows at all. Or, when we reach the high watermark (that
> tells us the max valid write id) we can stop the scanner since rows are
> ordered based on record ID.
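
For illustration only, the row filtering and the two optimizations described in the issue could be sketched as below. This is not Impala's implementation. The names ValidWriteIds, needs_row_validation and visible_rows are hypothetical, and the valid write id list is modeled in a simplified way as a high watermark plus a set of invalid write ids below it. Rows are assumed to be (originalTransaction, payload) tuples sorted by write id.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Iterator, Tuple

Row = Tuple[int, str]  # (originalTransaction, payload); hypothetical row shape


@dataclass(frozen=True)
class ValidWriteIds:
    """Simplified stand-in for a valid write id list (an assumption)."""
    high_watermark: int                       # max write id we may see
    invalid_ids: FrozenSet[int] = frozenset() # ids below it we may not see

    def is_valid(self, write_id: int) -> bool:
        return (write_id <= self.high_watermark
                and write_id not in self.invalid_ids)


def needs_row_validation(min_write_id: int, max_write_id: int,
                         valid: ValidWriteIds) -> bool:
    """Decide from the delta dir's min/max write ids whether rows must be checked."""
    if max_write_id > valid.high_watermark:
        return True
    return any(min_write_id <= w <= max_write_id for w in valid.invalid_ids)


def visible_rows(rows: Iterable[Row], min_write_id: int, max_write_id: int,
                 valid: ValidWriteIds) -> Iterator[Row]:
    """Yield only the rows whose write id is valid for this reader."""
    if not needs_row_validation(min_write_id, max_write_id, valid):
        # Every write id in the directory is valid, so skip per-row checks.
        yield from rows
        return
    for row in rows:
        write_id = row[0]
        if write_id > valid.high_watermark:
            # Rows are sorted by write id, so nothing later can be valid.
            break
        if valid.is_valid(write_id):
            yield row


# The example from the issue: delta_0000001_0000010 holds write ids 1..10,
# but the reader may only see write ids less than 5.
rows = [(w, f"row{w}") for w in range(1, 11)]
valid = ValidWriteIds(high_watermark=4)
result = list(visible_rows(rows, 1, 10, valid))
```

With these inputs, result holds only the rows with write ids 1 through 4, and the loop stops at the first row with write id 5 instead of scanning the rest of the directory.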