Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/15818 )
Change subject: IMPALA-9512: Full ACID Milestone 2: Validate rows against the valid write id list
......................................................................

IMPALA-9512: Full ACID Milestone 2: Validate rows against the valid write id list

Minor compactions can compact several delta directories into a single
delta directory. The current directory filtering algorithm had to be
modified to handle minor compacted directories and prefer those over
plain delta directories. This happens in the Frontend, mostly in
AcidUtils.java.

Hive Streaming Ingestion writes similar delta directories, but they
might contain rows Impala cannot see based on its valid write id list.
E.g. we can have the following delta directory:

  full_acid/delta_0000001_0000010/0000 # minWriteId: 1
                                       # maxWriteId: 10

This delta dir contains rows with write ids between 1 and 10. But maybe
we are only allowed to see write ids less than 5. Therefore we need to
check the ACID write id column (named originalTransaction) to determine
which rows are valid.

Delta directories written by Hive Streaming don't have a visibility txn
id, so we can recognize them based on the directory name. If there's
a visibilityTxnId and it is committed => every row is valid:

  full_acid/delta_0000001_0000010_v01234 # has visibilityTxnId
                                         # every row is valid

If there's no visibilityTxnId then it was created via Hive Streaming,
therefore we need to validate rows. Fortunately Hive Streaming writes
rows with different write ids into different ORC stripes, therefore we
don't need to validate the write id per row. If we had statistics,
we could validate per stripe, but since Hive Streaming doesn't write
statistics we validate the write id per ORC row batch (an alternative
could be to do a 2-pass read, first we'd read a single value from each
stripe's 'currentTransaction' field, then we'd read the stripe if the
write id is valid).

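The two decisions described above (recognizing a Hive Streaming delta
directory by its name, then checking a row batch against the valid write
id list) can be sketched roughly as follows. This is an illustrative
Python sketch, not the code in this change: the names DeltaDir,
parse_delta_dir, ValidWriteIds and batch_is_visible are hypothetical, the
regular expression only covers the two directory name forms shown above,
the valid write id list is reduced to a high watermark plus a set of
excluded ids, and the check that the visibility txn is committed is
left out.

```python
import re
from dataclasses import dataclass
from typing import FrozenSet, Optional, Sequence

# Hypothetical pattern: matches only the two name forms from the commit
# message, e.g. delta_0000001_0000010 and delta_0000001_0000010_v01234.
_DELTA_RE = re.compile(r"^delta_(\d+)_(\d+)(?:_v(\d+))?$")


@dataclass(frozen=True)
class DeltaDir:
    min_write_id: int
    max_write_id: int
    visibility_txn_id: Optional[int]  # None when the name has no _v suffix

    @property
    def needs_row_validation(self) -> bool:
        # No visibilityTxnId: assume the directory came from Hive Streaming,
        # so its rows must be checked against the valid write id list.
        # (Assumption: the "is the visibility txn committed" check is omitted.)
        return self.visibility_txn_id is None


def parse_delta_dir(name: str) -> DeltaDir:
    """Parse a delta directory name into its write id range and txn id."""
    m = _DELTA_RE.match(name)
    if m is None:
        raise ValueError(f"not a delta directory name: {name!r}")
    visibility = int(m.group(3)) if m.group(3) is not None else None
    return DeltaDir(int(m.group(1)), int(m.group(2)), visibility)


@dataclass(frozen=True)
class ValidWriteIds:
    """Simplified model: ids up to high_watermark are visible unless excluded."""
    high_watermark: int
    invalid: FrozenSet[int] = frozenset()

    def is_valid(self, write_id: int) -> bool:
        return write_id <= self.high_watermark and write_id not in self.invalid


def batch_is_visible(batch_write_ids: Sequence[int], valid: ValidWriteIds) -> bool:
    """Validate a whole row batch by looking at its first write id.

    Assumes every row in the batch carries the same write id, which relies
    on Hive Streaming placing different write ids in different ORC stripes.
    """
    return valid.is_valid(batch_write_ids[0])
```

With the example from the text, a reader allowed to see write ids less
than 5 would use ValidWriteIds(high_watermark=4): a batch whose rows all
carry write id 3 is kept, and a batch with write id 7 is skipped.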
Testing
 * the frontend logic is tested in AcidUtilsTest
 * the backend row validation is tested in test_acid_row_validation

Change-Id: I5ed74585a2d73ebbcee763b0545be4412926299d
Reviewed-on: http://gerrit.cloudera.org:8080/15818
Reviewed-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
---
M be/src/exec/CMakeLists.txt
A be/src/exec/acid-metadata-utils-test.cc
A be/src/exec/acid-metadata-utils.cc
A be/src/exec/acid-metadata-utils.h
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/hdfs-orc-scanner.h
M be/src/exec/orc-column-readers.cc
M be/src/exec/orc-column-readers.h
M be/src/exec/orc-metadata-utils.cc
M be/src/exec/orc-metadata-utils.h
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M common/thrift/CatalogObjects.thrift
M common/thrift/CatalogService.thrift
M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/main/java/org/apache/impala/analysis/StmtMetadataLoader.java
M fe/src/main/java/org/apache/impala/catalog/DataSourceTable.java
M fe/src/main/java/org/apache/impala/catalog/FeTable.java
M fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java
M fe/src/main/java/org/apache/impala/catalog/local/MetaProvider.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/test/java/org/apache/impala/analysis/StmtMetadataLoaderTest.java
M fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java
M fe/src/test/java/org/apache/impala/util/AcidUtilsTest.java
M testdata/bin/generate-schema-statements.py
M testdata/data/README
A testdata/data/streaming.orc
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M testdata/workloads/functional-query/queries/QueryTest/acid-negative.test
A testdata/workloads/functional-query/queries/QueryTest/acid-row-validation-0.test
A testdata/workloads/functional-query/queries/QueryTest/acid-row-validation-1.test
A testdata/workloads/functional-query/queries/QueryTest/acid-row-validation-2.test
M testdata/workloads/functional-query/queries/QueryTest/acid.test
M testdata/workloads/functional-query/queries/QueryTest/full-acid-rowid.test
A tests/query_test/test_acid_row_validation.py
A tests/util/acid_txn.py
43 files changed, 1,425 insertions(+), 189 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/15818
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I5ed74585a2d73ebbcee763b0545be4412926299d
Gerrit-Change-Number: 15818
Gerrit-PatchSet: 16
Gerrit-Owner: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>