Hello Norbert Luksa, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/16001 to look at the new patch set (#8). Change subject: IMPALA-9515: Full ACID Milestone 3: Read support for "original files" ...................................................................... IMPALA-9515: Full ACID Milestone 3: Read support for "original files" "Original files" are files that don't have full ACID schema. We can see such files if we upgrade a non-ACID table to full ACID. Also, the LOAD DATA statement can load non-ACID files into full ACID tables. So such files don't store special ACID columns, that means we need to auto-generate their values. These are (operation, originalTransaction, bucket, rowid, and currentTransaction). With the exception of 'rowid', all of them can be calculated based on the file path, so I add their values to the scanner's template tuple. 'rowid' is the ordinal number of the row inside a bucket inside a directory. For now Impala only allows one file per bucket per directory. Therefore we can generate row ids for each file independently. Multiple files in a single bucket in a directory can only be present if the table was non-transactional earlier and we upgraded it to full ACID table. After the first compaction we should only see one original file per bucket per directory. In HdfsOrcScanner we calculate the first row id for our split then the OrcStructReader fills the rowid slot with the proper values. Testing: * added e2e tests to check if the generated values are correct * added e2e test to reject tables that have multiple files per bucket * added unit tests to the new auxiliary functions Change-Id: I176497ef9873ed7589bd3dee07d048a42dfad953 --- M be/src/exec/acid-metadata-utils-test.cc M be/src/exec/acid-metadata-utils.cc M be/src/exec/acid-metadata-utils.h M be/src/exec/hdfs-orc-scanner.cc M be/src/exec/hdfs-orc-scanner.h M be/src/exec/orc-column-readers.cc M be/src/exec/orc-column-readers.h M be/src/exec/orc-metadata-utils.cc M be/src/exec/orc-metadata-utils.h M testdata/data/README A testdata/data/alltypes_non_acid.orc M testdata/datasets/functional/functional_schema_template.sql M testdata/datasets/functional/schema_constraints.csv M testdata/workloads/functional-query/queries/QueryTest/acid-negative.test A testdata/workloads/functional-query/queries/QueryTest/full-acid-original-file.test M tests/query_test/test_acid.py 16 files changed, 713 insertions(+), 80 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/01/16001/8 -- To view, visit http://gerrit.cloudera.org:8080/16001 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I176497ef9873ed7589bd3dee07d048a42dfad953 Gerrit-Change-Number: 16001 Gerrit-PatchSet: 8 Gerrit-Owner: Zoltan Borok-Nagy <borokna...@cloudera.com> Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Norbert Luksa <norbert.lu...@cloudera.com>