Hello Aman Sinha, Quanlong Huang, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/16082 to look at the new patch set (#13). Change subject: IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified tables (primitive types) ...................................................................... IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified tables (primitive types) Hive ACID supports row-level DELETE and UPDATE operations on a table. It achieves it via assigning a unique row-id for each row, and maintaining two sets of files in a table. The first set is in the base/delta directories, they contain the INSERTed rows. The second set of files are in the delete-delta directories, they contain the DELETEd rows. (UPDATE operations are implemented via DELETE+INSERT.) In the filesystem it looks like e.g.: * full_acid/delta_0000001_0000001_0000/0000_0 * full_acid/delta_0000002_0000002_0000/0000_0 * full_acid/delete_delta_0000003_0000003_0000/0000_0 During scanning we need to return INSERTed rows minus DELETEd rows. This patch implements it by creating an ANTI JOIN between the INSERT and DELETE sets. It is a planner-only modification. Every HDFS SCAN that scans full ACID tables (that also have deleted rows) are converted to two HDFS SCANs, one for the INSERT deltas, and one for the DELETE deltas. Then a LEFT ANTI HASH JOIN with BROADCAST distribution mode is created above them. Later we can add support for other distribution modes if the performance requires it. E.g. if we have too many deleted rows then probably we are better off with PARTITIONED distribution mode. We could estimate the number of deleted rows by sampling the delete delta files. The current patch only works for primitive types. I.e. we cannot select nested data if the table has deleted rows. Testing: * added planner test * added e2e tests Change-Id: I15c8feabf40be1658f3dd46883f5a1b2aa5d0659 --- M common/thrift/CatalogObjects.thrift M common/thrift/CatalogService.thrift M fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java M fe/src/main/java/org/apache/impala/catalog/FeFsPartition.java M fe/src/main/java/org/apache/impala/catalog/FeFsTable.java M fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java M fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java M fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java M fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java M fe/src/main/java/org/apache/impala/catalog/local/MetaProvider.java M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java M fe/src/main/java/org/apache/impala/planner/JoinNode.java M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java M fe/src/main/java/org/apache/impala/util/AcidUtils.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M fe/src/test/java/org/apache/impala/util/AcidUtilsTest.java M testdata/datasets/functional/functional_schema_template.sql M testdata/datasets/functional/schema_constraints.csv A testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test M testdata/workloads/functional-query/queries/QueryTest/acid-negative.test A testdata/workloads/functional-query/queries/QueryTest/full-acid-scans.test M tests/custom_cluster/test_local_catalog.py M tests/query_test/test_acid.py 27 files changed, 1,710 insertions(+), 151 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/82/16082/13 -- To view, visit http://gerrit.cloudera.org:8080/16082 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I15c8feabf40be1658f3dd46883f5a1b2aa5d0659 Gerrit-Change-Number: 16082 Gerrit-PatchSet: 13 Gerrit-Owner: Zoltan Borok-Nagy <borokna...@cloudera.com> Gerrit-Reviewer: Aman Sinha <amsi...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>