Hello Aman Sinha, Gabor Kaszab, Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit
    http://gerrit.cloudera.org:8080/16228

to look at the new patch set (#7).

Change subject: IMPALA-9859: Full ACID Milestone 4: Part 2 Reading modified tables (complex types)
......................................................................

IMPALA-9859: Full ACID Milestone 4: Part 2 Reading modified tables (complex types)

This implements scanning full ACID tables that contain complex types.
It uses the same technique as for primitive types, i.e. we add a LEFT
ANTI JOIN on top of the HDFS scan node in order to subtract the deleted
rows from the inserted rows.

However, there were some types of queries where we couldn't do that:
the queries that scan the nested collection items directly, e.g.:

  SELECT item FROM complextypestbl.int_array;

The above query only creates a single tuple descriptor, the one that
holds the collection items. Since this tuple descriptor is not at the
table level, we cannot add slot references to the hidden ACID columns,
which are at the top level of the table schema.

To resolve this I added a statement rewriter that rewrites the above
statement to the following:

  SELECT item FROM complextypestbl $a$1, $a$1.int_array;

Now this example has two tuple descriptors, one for the table level and
one for the collection items, so we can add the ACID slot refs to the
table-level tuple descriptor.

The rewrite is implemented by the new AcidRewriter class.

Performance

I executed the following query with num_nodes=1 on a non-transactional
table (without the rewrite) and on an ACID table (with the rewrite):

  select count(*) from customer_nested.c_orders.o_lineitems;

Without the rewrite:

Fetched 1 row(s) in 0.41s
+--------------+--------+-------+----------+----------+-------+------------+----------+---------------+---------------------------------------------------+
| Operator     | #Hosts | #Inst | Avg Time | Max Time | #Rows | Est. #Rows | Peak Mem | Est. Peak Mem | Detail                                            |
+--------------+--------+-------+----------+----------+-------+------------+----------+---------------+---------------------------------------------------+
| F00:ROOT     | 1      | 1     | 13.61us  | 13.61us  |       |            | 0 B      | 0 B           |                                                   |
| 01:AGGREGATE | 1      | 1     | 3.68ms   | 3.68ms   | 1     | 1          | 16.00 KB | 10.00 MB      | FINALIZE                                          |
| 00:SCAN HDFS | 1      | 1     | 280.47ms | 280.47ms | 6.00M | 15.00M     | 56.98 MB | 8.00 MB       | tpch_nested_orc_def.customer.c_orders.o_lineitems |
+--------------+--------+-------+----------+----------+-------+------------+----------+---------------+---------------------------------------------------+

With the rewrite:

Fetched 1 row(s) in 0.42s
+---------------------------+--------+-------+----------+----------+---------+------------+----------+---------------+---------------------------------------+
| Operator                  | #Hosts | #Inst | Avg Time | Max Time | #Rows   | Est. #Rows | Peak Mem | Est. Peak Mem | Detail                                |
+---------------------------+--------+-------+----------+----------+---------+------------+----------+---------------+---------------------------------------+
| F00:ROOT                  | 1      | 1     | 25.16us  | 25.16us  |         |            | 0 B      | 0 B           |                                       |
| 05:AGGREGATE              | 1      | 1     | 3.44ms   | 3.44ms   | 1       | 1          | 63.00 KB | 10.00 MB      | FINALIZE                              |
| 01:SUBPLAN                | 1      | 1     | 16.52ms  | 16.52ms  | 6.00M   | 125.92M    | 47.00 KB | 0 B           |                                       |
| |--04:NESTED LOOP JOIN    | 1      | 1     | 188.47ms | 188.47ms | 0       | 10         | 24.00 KB | 12 B          | CROSS JOIN                            |
| |  |--02:SINGULAR ROW SRC | 1      | 1     | 0ns      | 0ns      | 0       | 1          | 0 B      | 0 B           |                                       |
| |  03:UNNEST              | 1      | 1     | 25.37ms  | 25.37ms  | 0       | 10         | 0 B      | 0 B           | $a$1.c_orders.o_lineitems o_lineitems |
| 00:SCAN HDFS              | 1      | 1     | 96.26ms  | 96.26ms  | 100.00K | 12.59M     | 38.19 MB | 72.00 MB      | default.customer_nested $a$1          |
+---------------------------+--------+-------+----------+----------+---------+------------+----------+---------------+---------------------------------------+

So the overhead is very small: the query took 0.42s with the rewrite
versus 0.41s without it.
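
The LEFT ANTI JOIN described above can be modelled in a few lines. The
Java sketch below is a hypothetical in-memory illustration, not Impala
planner or backend code: the RowId and AcidRow records and the antiJoin
method are made up, and it assumes that inserted rows and delete-delta
records are matched on the ACID row identifier (originalTransaction,
bucket, rowId). It shows the hash-based form of the join: the deleted
row ids are hashed once, then the inserted rows are probed and only
those without a match survive. Records require Java 16 or later.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class AcidAntiJoinSketch {
  // Hypothetical stand-in for the hidden ACID row identifier columns
  // (originalTransaction, bucket, rowId); an assumption for illustration only.
  public record RowId(long originalTransaction, int bucket, long rowId) {}

  // Hypothetical inserted row: its ACID row identifier plus one payload column.
  public record AcidRow(RowId id, String value) {}

  // Left anti join: returns the inserted rows whose row id has no match
  // among the deleted row ids.
  public static List<AcidRow> antiJoin(List<AcidRow> inserted, List<RowId> deleted) {
    // Build side: hash the deleted row ids once.
    Set<RowId> deletedIds = new HashSet<>(deleted);
    // Probe side: keep only the inserted rows that do not find a match.
    return inserted.stream()
        .filter(row -> !deletedIds.contains(row.id()))
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<AcidRow> inserted = List.of(
        new AcidRow(new RowId(1, 0, 0), "a"),
        new AcidRow(new RowId(1, 0, 1), "b"),
        new AcidRow(new RowId(2, 0, 0), "c"));
    // One delete-delta record that targets the row inserted as (1, 0, 1).
    List<RowId> deleted = List.of(new RowId(1, 0, 1));
    List<String> survivors = antiJoin(inserted, deleted).stream()
        .map(AcidRow::value)
        .collect(Collectors.toList());
    System.out.println(survivors); // [a, c]
  }
}
```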
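
The shape of the AcidRewriter transformation can be shown with plain
strings. The following Java sketch is hypothetical: the class and method
names are made up, and the real AcidRewriter rewrites the analyzed
TableRef objects of the statement rather than SQL text. It assumes the
caller already knows where the table name ends and the collection path
begins, and that alias numbering restarts for each statement, which is
consistent with both examples above using $a$1.

```java
public class AcidPathRewriteSketch {
  // Hypothetical per-statement counter for generated aliases: $a$1, $a$2, ...
  private int nextAliasId = 1;

  /**
   * Turns a direct collection reference such as "complextypestbl.int_array"
   * into a table-level reference plus a collection reference relative to it,
   * e.g. "complextypestbl $a$1, $a$1.int_array". The table-level reference
   * gets its own tuple descriptor, which is where the hidden ACID columns live.
   * The caller must say where the table name ends and the collection path begins.
   */
  public String rewrite(String tableName, String collectionPath) {
    String alias = "$a$" + nextAliasId++;
    return tableName + " " + alias + ", " + alias + "." + collectionPath;
  }

  public static void main(String[] args) {
    // One rewriter per statement, so each statement starts again at $a$1.
    System.out.println(
        new AcidPathRewriteSketch().rewrite("complextypestbl", "int_array"));
    // prints: complextypestbl $a$1, $a$1.int_array
    System.out.println(
        new AcidPathRewriteSketch().rewrite("customer_nested", "c_orders.o_lineitems"));
    // prints: customer_nested $a$1, $a$1.c_orders.o_lineitems
  }
}
```

Keeping the collection path relative to the new alias is what brings the
table-level tuple descriptor into the plan; in the profile above, the
03:UNNEST of $a$1.c_orders.o_lineitems corresponds to that relative
reference, and 00:SCAN HDFS scans default.customer_nested $a$1.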

Testing
 * Added planner tests to PlannerTest/acid-scans.test
 * Added E2E query tests to QueryTest/full-acid-complex-type-scans.test
 * Added E2E tests for rowid generation to QueryTest/full-acid-rowid.test

Change-Id: I8b2c6cd3d87c452c5b96a913b14c90ada78d4c6f
---
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/FromClause.java
M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
M fe/src/main/java/org/apache/impala/analysis/TableRef.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test
M testdata/workloads/functional-query/queries/QueryTest/acid-negative.test
A testdata/workloads/functional-query/queries/QueryTest/full-acid-complex-type-scans.test
M testdata/workloads/functional-query/queries/QueryTest/full-acid-rowid.test
M tests/query_test/test_acid.py
13 files changed, 923 insertions(+), 48 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/28/16228/7
--
To view, visit http://gerrit.cloudera.org:8080/16228

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8b2c6cd3d87c452c5b96a913b14c90ada78d4c6f
Gerrit-Change-Number: 16228
Gerrit-PatchSet: 7
Gerrit-Owner: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <amsi...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <gaborkas...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>