Tamas Mate has uploaded a new patch set (#15). ( 
http://gerrit.cloudera.org:8080/20010 )

Change subject: IMPALA-11996: Scanner change for Iceberg metadata querying
......................................................................

IMPALA-11996: Scanner change for Iceberg metadata querying

This commit adds a scan node for querying Iceberg metadata tables. The
scan node creates a Java scanner object that creates and scans the
metadata table. The scanner uses the Iceberg API to scan the table after
that the scan node fetches the rows one by one and materialises them
into RowBatches. The Iceberg row reader on the backend does the
translation between Iceberg and Impala types.

There is only one fragment created to query the Iceberg metadata table
which is supposed to be executed on the coordinator node that already
has the Iceberg table loaded. This way there is no need for further
table loading on the executor side.

This change will not cover nested column types, these slots are set to
NULL, it will be done in IMPALA-12205.

Testing:
 - Added e2e tests for querying metadata tables
 - Updated planner tests

Performance testing:
Created a table and inserted ~5500 rows one by one, this generated
~270000 ALL_MANIFESTS metadata table records. This table is quite wide
and has a String column as well.

I only mention count(*) test on ALL_MANIFESTS, because every row is
materialized in every scenario currently:
  - Cold cache: 15.76s
    - IcebergApiScanTime: 124.407ms
    - MaterializeTupleTime: 8s368ms
  - Warm cache: 7.56s
    - IcebergApiScanTime: 3.646ms
    - MaterializeTupleTime: 7s477ms

Change-Id: I0e943cecd77f5ef7af7cd07e2b596f2c5b4331e7
---
M be/CMakeLists.txt
M be/src/exec/CMakeLists.txt
M be/src/exec/exec-node.cc
A be/src/exec/iceberg-metadata/CMakeLists.txt
A be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.cc
A be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.h
A be/src/exec/iceberg-metadata/iceberg-row-reader.cc
A be/src/exec/iceberg-metadata/iceberg-row-reader.h
M be/src/scheduling/scheduler.cc
M be/src/service/frontend.cc
M be/src/service/frontend.h
M be/src/service/impalad-main.cc
M be/src/util/jni-util.cc
M be/src/util/jni-util.h
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/IcebergMetadataTableRef.java
M fe/src/main/java/org/apache/impala/analysis/Path.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergMetadataTable.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/IcebergMetadataScanNode.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
A fe/src/main/java/org/apache/impala/util/IcebergMetadataScanner.java
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-metadata-table-scan.test
M 
testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test
M tests/authorization/test_ranger.py
M tests/query_test/test_iceberg.py
32 files changed, 1,419 insertions(+), 167 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/10/20010/15
--
To view, visit http://gerrit.cloudera.org:8080/20010
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0e943cecd77f5ef7af7cd07e2b596f2c5b4331e7
Gerrit-Change-Number: 20010
Gerrit-PatchSet: 15
Gerrit-Owner: Tamas Mate <tma...@apache.org>
Gerrit-Reviewer: Anonymous Coward <lipeng...@apache.org>
Gerrit-Reviewer: Gabor Kaszab <gaborkas...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <g.furnst...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pro...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tma...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>

Reply via email to