Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/16228 )

Change subject: IMPALA-9859: Full ACID Milestone 4: Part 2 Reading modified tables (complex types)
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16228/3/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
File fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java:

http://gerrit.cloudera.org:8080/#/c/16228/3/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@1508
PS3, Line 1508:    * SELECT item FROM complextypestbl $a$1, $a$1.int_array;
> Thanks Zoltan. Just to clarify further...the plans I see in our test suite
It will involve only one subplan, as such queries are not affected by this rewrite.

To be precise, the above query raises an error for me (because o.item is not scalar), but the following query has the following plan:

[localhost:21000] functional_orc_def> explain SELECT t.id, o.item from functional_orc_def.complextypestbl t left outer join t.int_array o;

PLAN-ROOT SINK
|
05:EXCHANGE [UNPARTITIONED]
|
01:SUBPLAN
|  row-size=24B cardinality=2.57K
|
|--04:NESTED LOOP JOIN [RIGHT OUTER JOIN]
|  |  row-size=24B cardinality=1
|  |
|  |--02:SINGULAR ROW SRC
|  |     row-size=20B cardinality=1
|  |
|  03:UNNEST [t.int_array o]
|     row-size=0B cardinality=10
|
00:SCAN HDFS [functional_orc_def.complextypestbl t]
   HDFS partitions=1/1 files=2 size=4.04KB
   row-size=20B cardinality=2.57K

You get the same plan if you run it on a non-transactional table.

However, if the query were like this:

SELECT item from functional_orc_def.complextypestbl.int_array;

Then you'll get the following plan for a non-transactional table:

PLAN-ROOT SINK
|
01:EXCHANGE [UNPARTITIONED]
|
00:SCAN HDFS [functional_parquet.complextypestbl.int_array]
   HDFS partitions=1/1 files=2 size=6.92KB
   row-size=4B cardinality=44.00K

And the following for a full ACID table:

PLAN-ROOT SINK
|
05:EXCHANGE [UNPARTITIONED]
|
01:SUBPLAN
|  row-size=16B cardinality=25.68K
|
|--04:NESTED LOOP JOIN [CROSS JOIN]
|  |  row-size=16B cardinality=10
|  |
|  |--02:SINGULAR ROW SRC
|  |     row-size=12B cardinality=1
|  |
|  03:UNNEST [$a$1.int_array int_array]
|     row-size=0B cardinality=10
|
00:SCAN HDFS [functional_orc_def.complextypestbl $a$1]
   HDFS partitions=1/1 files=2 size=4.04KB
   predicates: !empty($a$1.int_array)
   row-size=12B cardinality=2.57K

We cannot really avoid the extra subplan without making significant changes to the backend, because the HDFS SCAN node in the non-transactional plan only has a single tuple descriptor, the one for the collection item. To return correct results from that plan, we'd need to smarten up the BE scanner significantly. It would need to
* automatically read the ACID fields
* open and read all the relevant delete delta files
* only return rows that are not deleted

Instead of doing that, with this rewrite we can just create a plan that does everything for us. With the rewrite, the scan node will have two tuple descriptors: one at the table level and one for the collection items. Then in SingleNodePlanner we'll just add the ACID field slot refs to the table-level tuple; the rest (the subplan) is added automatically.
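
To illustrate why the table-level tuple matters, here is a minimal Python sketch of what the rewritten plan computes. It is only a model, not Impala code: the names RowId, TableRow, scan_table and unnest_items are hypothetical, the (original_transaction, bucket, row_id) triple is assumed to be the Hive ACID row identifier, and the in-memory set of deleted ids is a stand-in for whatever the real plan does with the delete delta files.

```python
from typing import Iterable, Iterator, List, NamedTuple, Set


class RowId(NamedTuple):
    """Assumed ACID row identifier of a table-level row (hypothetical model)."""
    original_transaction: int
    bucket: int
    row_id: int


class TableRow(NamedTuple):
    """Table-level tuple: the ACID fields plus the collection column."""
    acid_id: RowId
    int_array: List[int]


def scan_table(rows: Iterable[TableRow], deleted: Set[RowId]) -> Iterator[TableRow]:
    """Models the table-level scan: skip deleted rows, then apply !empty(int_array)."""
    for row in rows:
        if row.acid_id in deleted:
            continue  # deletes are tracked per table row, not per array item
        if not row.int_array:
            continue  # corresponds to the !empty($a$1.int_array) predicate
        yield row


def unnest_items(rows: Iterable[TableRow], deleted: Set[RowId]) -> Iterator[int]:
    """Models the subplan: one output row per array item of each surviving row."""
    for row in scan_table(rows, deleted):
        for item in row.int_array:
            yield item


rows = [
    TableRow(RowId(1, 0, 0), [1, 2, 3]),
    TableRow(RowId(1, 0, 1), []),
    TableRow(RowId(1, 0, 2), [4, 5]),
]
deleted = {RowId(1, 0, 2)}
result = list(unnest_items(rows, deleted))
```

In this model, a scan that only materializes collection items has no acid_id to compare against the deleted set, which is the reason the rewrite introduces the table-level alias and lets the subplan do the unnesting.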

--
To view, visit http://gerrit.cloudera.org:8080/16228
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8b2c6cd3d87c452c5b96a913b14c90ada78d4c6f
Gerrit-Change-Number: 16228
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <amsi...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <gaborkas...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Comment-Date: Fri, 07 Aug 2020 12:54:52 +0000
Gerrit-HasComments: Yes