[Impala-ASF-CR] IMPALA-9859: Full ACID Milestone 4: Part 2 Reading modified tables (complex types)

Zoltan Borok-Nagy (Code Review) Fri, 07 Aug 2020 05:55:28 -0700

Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16228 )


Change subject: IMPALA-9859: Full ACID Milestone 4: Part 2 Reading modified 
tables (complex types)
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16228/3/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
File fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java:

http://gerrit.cloudera.org:8080/#/c/16228/3/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@1508
PS3, Line 1508:    *   SELECT item FROM complextypestbl $a$1, $a$1.int_array;
> Thanks Zoltan.  Just to clarify further...the plans I see in our test suite
It will involve only one subplan, as such queries are not affected by this 
rewrite.

To be precise, the query above raises an error for me (because o.item is not 
scalar), but this variant produces the following plan:

 [localhost:21000] functional_orc_def> explain SELECT t.id, o.item
    from functional_orc_def.complextypestbl t left outer join t.int_array o;
 PLAN-ROOT SINK
 |
 05:EXCHANGE [UNPARTITIONED]
 |
 01:SUBPLAN
 |  row-size=24B cardinality=2.57K
 |
 |--04:NESTED LOOP JOIN [RIGHT OUTER JOIN]
 |  |  row-size=24B cardinality=1
 |  |
 |  |--02:SINGULAR ROW SRC
 |  |     row-size=20B cardinality=1
 |  |
 |  03:UNNEST [t.int_array o]
 |     row-size=0B cardinality=10
 |
 00:SCAN HDFS [functional_orc_def.complextypestbl t]
    HDFS partitions=1/1 files=2 size=4.04KB
    row-size=20B cardinality=2.57K

You get the same plan if you run it on a non-transactional table.

However, if the query were written like this:

 SELECT item from functional_orc_def.complextypestbl.int_array;

Then you'll get the following plan for a non-transactional table:

 PLAN-ROOT SINK
 |
 01:EXCHANGE [UNPARTITIONED]
 |
 00:SCAN HDFS [functional_parquet.complextypestbl.int_array]
    HDFS partitions=1/1 files=2 size=6.92KB
    row-size=4B cardinality=44.00K

And the following for a full ACID table:

 PLAN-ROOT SINK
 |
 05:EXCHANGE [UNPARTITIONED]
 |
 01:SUBPLAN
 |  row-size=16B cardinality=25.68K
 |
 |--04:NESTED LOOP JOIN [CROSS JOIN]
 |  |  row-size=16B cardinality=10
 |  |
 |  |--02:SINGULAR ROW SRC
 |  |     row-size=12B cardinality=1
 |  |
 |  03:UNNEST [$a$1.int_array int_array]
 |     row-size=0B cardinality=10
 |
 00:SCAN HDFS [functional_orc_def.complextypestbl $a$1]
    HDFS partitions=1/1 files=2 size=4.04KB
    predicates: !empty($a$1.int_array)
    row-size=12B cardinality=2.57K

But we cannot really avoid this without significant backend changes, because 
the HDFS SCAN node in the non-transactional plan has only a single tuple 
descriptor, the one for the collection item. To return correct results, the BE 
scanner would have to become much smarter. It would need to:
* automatically read the ACID fields
* open and read all the relevant delete delta files
* only return rows that are not deleted
Instead of doing that, the rewrite lets us create a plan that does everything 
for us. With the rewrite, the scan node has two tuple descriptors: one at the 
table level and one for the collection items. Then in SingleNodePlanner we 
just add the ACID field slot refs to the table-level tuple; the rest (the 
subplan) is added automatically.
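
To make the shape of the full ACID plan above more concrete, here is a small 
Python sketch of how its nodes combine. It only models the data flow (scan 
with the !empty predicate, then a per-row subplan that cross joins a singular 
row source with the unnested collection). The function names and the 
dict-based rows are hypothetical, and nothing here reflects how Impala 
actually executes these nodes.

```python
from typing import Dict, Iterable, Iterator, List


def scan(table: Iterable[Dict]) -> Iterator[Dict]:
    # 00:SCAN HDFS with "predicates: !empty($a$1.int_array)":
    # emit table-level rows whose collection is non-empty.
    for row in table:
        if row["int_array"]:
            yield row


def subplan(row: Dict) -> Iterator[Dict]:
    # 02:SINGULAR ROW SRC emits the current table-level row exactly once.
    singular = [row]
    # 03:UNNEST emits one tuple per item of the row's collection.
    unnested = [{"item": item} for item in row["int_array"]]
    # 04:NESTED LOOP JOIN [CROSS JOIN] pairs the row with each of its items.
    for outer in singular:
        for inner in unnested:
            yield {**outer, **inner}


def select_item(table: Iterable[Dict]) -> List[int]:
    # 01:SUBPLAN runs the subplan once per row produced by the scan.
    return [joined["item"] for row in scan(table) for joined in subplan(row)]
```

Because each joined row still carries the table-level fields, anything 
attached to the table-level tuple (such as the ACID fields) stays available 
next to every unnested item.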



--
To view, visit http://gerrit.cloudera.org:8080/16228
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8b2c6cd3d87c452c5b96a913b14c90ada78d4c6f
Gerrit-Change-Number: 16228
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <amsi...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <gaborkas...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Comment-Date: Fri, 07 Aug 2020 12:54:52 +0000
Gerrit-HasComments: Yes
