[Impala-ASF-CR] IMPALA-12726: Simulate large-scale query in TpcdsCpuCostPlannerTest

2024-02-04 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20922 )

Change subject: IMPALA-12726: Simulate large-scale query in 
TpcdsCpuCostPlannerTest
..


Patch Set 7:

Thank you Wenzhe and Quanlong!


--
To view, visit http://gerrit.cloudera.org:8080/20922
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaffddd70c2da8376ca6c40f65606bbac46c34de7
Gerrit-Change-Number: 20922
Gerrit-PatchSet: 7
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Sun, 04 Feb 2024 22:38:01 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12726: Simulate large-scale query in TpcdsCpuCostPlannerTest

2024-02-04 Thread Riza Suminto (Code Review)
Riza Suminto has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/20922 )

Change subject: IMPALA-12726: Simulate large-scale query in 
TpcdsCpuCostPlannerTest
..

IMPALA-12726: Simulate large-scale query in TpcdsCpuCostPlannerTest

Querying large-scale databases is a good way to test Impala.
However, it is impractical to do on a single-node development machine.

Frontend testing does not run the test query in the backend executor and
can benefit from simulated large-scale test cases. This patch attempts
to do it by instrumenting the CatalogD metadata loading code to scale
tpcds_partitioned_parquet_snap by injecting column stats from a 3TB
TPC-DS dataset in TpcdsCpuCostPlannerTest.

The large-scale column stats are expressed in stats-3TB.json, taken by
running "SHOW COLUMN STATS" and "DESCRIBE FORMATTED" queries on a 3TB
dataset loaded using impala-tpcds-kit. It is parsed and then
piggy-backed through RuntimeEnv. Code that populates stats
metadata (caller of FeCatalogUtils.getRowCount(),
FeCatalogUtils.getTotalSize(), and FeCatalogUtils.injectColumnStats())
are instrumented to populate stats from RuntimeEnv instead of Metastore.
Scaled-up tables are invalidated before a test run to reload them with
new high-scale stats. This patch also adds a scan range limit injection
to force ScanNode over a single file table to act as if it scans a
multi-files table.
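
As a rough illustration of the sideloading idea described above, the
sketch below keeps per-table stats in an in-memory registry and prefers
them over the Metastore value when an entry exists. The names
SideloadRegistrySketch, TableStatsSketch, and rowCountFor are
hypothetical and are not the patch's actual classes or methods; the
numbers in the usage note are made up.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for sideloaded table stats (not the real SideloadTableStats).
final class TableStatsSketch {
  final long numRows;
  final long totalSize;

  TableStatsSketch(long numRows, long totalSize) {
    // Reject obviously invalid input up front.
    if (numRows < 0 || totalSize < 0) {
      throw new IllegalArgumentException("stats must be non-negative");
    }
    this.numRows = numRows;
    this.totalSize = totalSize;
  }
}

// Hypothetical registry playing the role that RuntimeEnv plays in the description.
final class SideloadRegistrySketch {
  private final Map<String, TableStatsSketch> statsByTable = new HashMap<>();

  void put(String tableName, TableStatsSketch stats) {
    statsByTable.put(tableName, stats);
  }

  // Returns the sideloaded row count if one is registered for the table,
  // otherwise falls back to the value read from the Metastore.
  long rowCountFor(String tableName, long metastoreRowCount) {
    TableStatsSketch stats = statsByTable.get(tableName);
    return stats != null ? stats.numRows : metastoreRowCount;
  }
}
```

With an entry registered for "store_sales", rowCountFor("store_sales", n)
returns the registered row count; for any other table it returns n
unchanged.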

tpcds_partitioned_schema_template.sql is modified to match column names
and types from impala-tpcds-kit. The test files under
PlannerTest/tpcds_cpu_cost/ are replaced with queries that are
specifically generated to run against the 3TB scale factor of the TPC-DS
dataset 
(https://github.com/cloudera/impala-tpcds-kit/blob/separate_queries_per_scale_factor/queries/sf3000/).

All query plans match with query plans obtained through real query runs
in a large cluster except for a few mismatches due to the hard limit on
the number of files at a table. Below are 3 queries out of 103 that
still do not have a matching shape and the reasons.
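+-----+----------------------------------------------+
|  Q  | Reason                                       |
+-----+----------------------------------------------+
| 10a | different num files in customer_demographics |
| 34  | different num files in customer              |
| 69  | different num files in customer              |
+-----+----------------------------------------------+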

Testing:
- Scale tables of tpcds_partitioned_parquet_snap in
  TpcdsCpuCostPlannerTest to simulate 3TB TPC-DS. The number of
  executors is raised from 3 to 10, and REPLICA_PREFERENCE=REMOTE is
  set to ignore data locality.
- Pass core tests.

Change-Id: Iaffddd70c2da8376ca6c40f65606bbac46c34de7
Reviewed-on: http://gerrit.cloudera.org:8080/20922
Reviewed-by: Wenzhe Zhou 
Reviewed-by: Quanlong Huang 
Tested-by: Impala Public Jenkins 
---
M fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
A fe/src/main/java/org/apache/impala/catalog/SideloadTableStats.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java
M fe/src/main/java/org/apache/impala/common/RuntimeEnv.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
M fe/src/test/java/org/apache/impala/planner/TpcdsCpuCostPlannerTest.java
A fe/src/test/java/org/apache/impala/testutil/StatsJsonParser.java
M testdata/datasets/tpcds_partitioned/tpcds_partitioned_schema_template.sql
A 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/stats-3TB.json
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q01.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q02.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q03.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q04.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q05.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q06.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q07.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q08.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q09.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q10a.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q11.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q12.test
M 
testda

[Impala-ASF-CR] IMPALA-12726: Simulate large-scale query in TpcdsCpuCostPlannerTest

2024-02-04 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20922 )

Change subject: IMPALA-12726: Simulate large-scale query in 
TpcdsCpuCostPlannerTest
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20922/6/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/20922/6/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1758
PS6, Line 1758: 
analyzer.getQueryOptions().getReplica_preference().equals(
  : TReplicaPreference.REMOTE) ?
  : analyzer.numExecutorsForPlanning() :
> This is test only (planner_testcase_mode==true). HDFS minicluster only have
Ack. I missed the check at L1744.




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaffddd70c2da8376ca6c40f65606bbac46c34de7
Gerrit-Change-Number: 20922
Gerrit-PatchSet: 6
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Sun, 04 Feb 2024 22:13:48 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12726: Simulate large-scale query in TpcdsCpuCostPlannerTest

2024-02-04 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20922 )

Change subject: IMPALA-12726: Simulate large-scale query in 
TpcdsCpuCostPlannerTest
..


Patch Set 6: Verified+1



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaffddd70c2da8376ca6c40f65606bbac46c34de7
Gerrit-Change-Number: 20922
Gerrit-PatchSet: 6
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Sun, 04 Feb 2024 21:05:31 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12726: Simulate large-scale query in TpcdsCpuCostPlannerTest

2024-02-04 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20922 )

Change subject: IMPALA-12726: Simulate large-scale query in 
TpcdsCpuCostPlannerTest
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20922/6/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/20922/6/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1758
PS6, Line 1758: 
analyzer.getQueryOptions().getReplica_preference().equals(
  : TReplicaPreference.REMOTE) ?
  : analyzer.numExecutorsForPlanning() :
> Is this only required by the test or does it also fix some bugs?
This is test only (planner_testcase_mode==true). The HDFS minicluster only has 3 
datanodes. Without this change, even if we declare more executors in 
PlannerTest, it will only plan ScanNodes on 3 executors (due to lines 1750-1756).
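
The effect described in this reply can be modeled with a small sketch.
It is a simplified illustration, not the code in HdfsScanNode: the enum,
the method name numNodesForScan, and the assumption that the non-REMOTE
branch is capped by the number of datanodes are all hypothetical.

```java
// Hypothetical subset of replica preferences; the real Thrift enum is TReplicaPreference.
enum ReplicaPreferenceSketch { DISK_LOCAL, REMOTE }

final class ScanPlanningSketch {
  // Estimates how many nodes a scan is planned on.
  // With REMOTE, data locality is ignored, so every executor declared for
  // planning can be used. Otherwise the count is assumed to be capped by
  // the number of datanodes that hold the data.
  static int numNodesForScan(
      ReplicaPreferenceSketch pref, int numExecutorsForPlanning, int numDataNodes) {
    if (pref == ReplicaPreferenceSketch.REMOTE) {
      return numExecutorsForPlanning;
    }
    return Math.min(numDataNodes, numExecutorsForPlanning);
  }
}
```

Under this model, declaring 10 executors on a 3-datanode minicluster
yields 3 scan nodes unless REPLICA_PREFERENCE=REMOTE is set, in which
case all 10 are used.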




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaffddd70c2da8376ca6c40f65606bbac46c34de7
Gerrit-Change-Number: 20922
Gerrit-PatchSet: 6
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Sun, 04 Feb 2024 16:32:06 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12726: Simulate large-scale query in TpcdsCpuCostPlannerTest

2024-02-04 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20922 )

Change subject: IMPALA-12726: Simulate large-scale query in 
TpcdsCpuCostPlannerTest
..


Patch Set 6:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10236/ 
DRY_RUN=true



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaffddd70c2da8376ca6c40f65606bbac46c34de7
Gerrit-Change-Number: 20922
Gerrit-PatchSet: 6
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Sun, 04 Feb 2024 16:35:57 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12726: Simulate large-scale query in TpcdsCpuCostPlannerTest

2024-02-04 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20922 )

Change subject: IMPALA-12726: Simulate large-scale query in 
TpcdsCpuCostPlannerTest
..


Patch Set 6: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20922/6/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/20922/6/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1758
PS6, Line 1758: 
analyzer.getQueryOptions().getReplica_preference().equals(
  : TReplicaPreference.REMOTE) ?
  : analyzer.numExecutorsForPlanning() :
Is this only required by the test or does it also fix some bugs?




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaffddd70c2da8376ca6c40f65606bbac46c34de7
Gerrit-Change-Number: 20922
Gerrit-PatchSet: 6
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Sun, 04 Feb 2024 13:13:16 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12726: Simulate large-scale query in TpcdsCpuCostPlannerTest

2024-02-01 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20922 )

Change subject: IMPALA-12726: Simulate large-scale query in 
TpcdsCpuCostPlannerTest
..


Patch Set 6: Code-Review+1



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaffddd70c2da8376ca6c40f65606bbac46c34de7
Gerrit-Change-Number: 20922
Gerrit-PatchSet: 6
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Thu, 01 Feb 2024 18:34:07 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12726: Simulate large-scale query in TpcdsCpuCostPlannerTest

2024-02-01 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20922 )

Change subject: IMPALA-12726: Simulate large-scale query in 
TpcdsCpuCostPlannerTest
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20922/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20922/5//COMMIT_MSG@32
PS5, Line 32: ar
> nit: are
Done




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaffddd70c2da8376ca6c40f65606bbac46c34de7
Gerrit-Change-Number: 20922
Gerrit-PatchSet: 6
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Thu, 01 Feb 2024 18:33:18 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12726: Simulate large-scale query in TpcdsCpuCostPlannerTest

2024-02-01 Thread Riza Suminto (Code Review)
Hello Quanlong Huang, Abhishek Rawat, Wenzhe Zhou, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20922

to look at the new patch set (#6).

Change subject: IMPALA-12726: Simulate large-scale query in 
TpcdsCpuCostPlannerTest
..

IMPALA-12726: Simulate large-scale query in TpcdsCpuCostPlannerTest

Querying large-scale databases is a good way to test Impala.
However, it is impractical to do on a single-node development machine.

Frontend testing does not run the test query in the backend executor and
can benefit from simulated large-scale test cases. This patch attempts
to do it by instrumenting the CatalogD metadata loading code to scale
tpcds_partitioned_parquet_snap by injecting column stats from a 3TB
TPC-DS dataset in TpcdsCpuCostPlannerTest.

The large-scale column stats are expressed in stats-3TB.json, taken by
running "SHOW COLUMN STATS" and "DESCRIBE FORMATTED" queries on a 3TB
dataset loaded using impala-tpcds-kit. It is parsed and then
piggy-backed through RuntimeEnv. Code that populates stats
metadata (caller of FeCatalogUtils.getRowCount(),
FeCatalogUtils.getTotalSize(), and FeCatalogUtils.injectColumnStats())
are instrumented to populate stats from RuntimeEnv instead of Metastore.
Scaled-up tables are invalidated before a test run to reload them with
new high-scale stats. This patch also adds a scan range limit injection
to force ScanNode over a single file table to act as if it scans a
multi-files table.

tpcds_partitioned_schema_template.sql is modified to match column names
and types from impala-tpcds-kit. The test files under
PlannerTest/tpcds_cpu_cost/ are replaced with queries that are
specifically generated to run against the 3TB scale factor of the TPC-DS
dataset 
(https://github.com/cloudera/impala-tpcds-kit/blob/separate_queries_per_scale_factor/queries/sf3000/).

All query plans match with query plans obtained through real query runs
in a large cluster except for a few mismatches due to the hard limit on
the number of files at a table. Below are 3 queries out of 103 that
still do not have a matching shape and the reasons.
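+-----+----------------------------------------------+
|  Q  | Reason                                       |
+-----+----------------------------------------------+
| 10a | different num files in customer_demographics |
| 34  | different num files in customer              |
| 69  | different num files in customer              |
+-----+----------------------------------------------+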

Testing:
- Scale tables of tpcds_partitioned_parquet_snap in
  TpcdsCpuCostPlannerTest to simulate 3TB TPC-DS. The number of
  executors is raised from 3 to 10, and REPLICA_PREFERENCE=REMOTE is
  set to ignore data locality.
- Pass core tests.

Change-Id: Iaffddd70c2da8376ca6c40f65606bbac46c34de7
---
M fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
A fe/src/main/java/org/apache/impala/catalog/SideloadTableStats.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java
M fe/src/main/java/org/apache/impala/common/RuntimeEnv.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
M fe/src/test/java/org/apache/impala/planner/TpcdsCpuCostPlannerTest.java
A fe/src/test/java/org/apache/impala/testutil/StatsJsonParser.java
M testdata/datasets/tpcds_partitioned/tpcds_partitioned_schema_template.sql
A 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/stats-3TB.json
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q01.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q02.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q03.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q04.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q05.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q06.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q07.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q08.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q09.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q10a.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q11.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q12.test
M 
testdata/workloads/functional-planner/querie

[Impala-ASF-CR] IMPALA-12726: Simulate large-scale query in TpcdsCpuCostPlannerTest

2024-02-01 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20922 )

Change subject: IMPALA-12726: Simulate large-scale query in 
TpcdsCpuCostPlannerTest
..


Patch Set 5: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20922/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20922/5//COMMIT_MSG@32
PS5, Line 32: is
nit: are




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaffddd70c2da8376ca6c40f65606bbac46c34de7
Gerrit-Change-Number: 20922
Gerrit-PatchSet: 5
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Thu, 01 Feb 2024 18:23:31 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12726: Simulate large-scale query in TpcdsCpuCostPlannerTest

2024-02-01 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20922 )

Change subject: IMPALA-12726: Simulate large-scale query in 
TpcdsCpuCostPlannerTest
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20922/4/fe/src/main/java/org/apache/impala/common/RuntimeEnv.java
File fe/src/main/java/org/apache/impala/common/RuntimeEnv.java:

http://gerrit.cloudera.org:8080/#/c/20922/4/fe/src/main/java/org/apache/impala/common/RuntimeEnv.java@39
PS4, Line 39:
:   // Map of > that is used to simula
> Looking around org/apache/impala/common, this package is pretty liberal in
Done




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaffddd70c2da8376ca6c40f65606bbac46c34de7
Gerrit-Change-Number: 20922
Gerrit-PatchSet: 5
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Thu, 01 Feb 2024 18:05:29 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12726: Simulate large-scale query in TpcdsCpuCostPlannerTest

2024-02-01 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20922 )

Change subject: IMPALA-12726: Simulate large-scale query in 
TpcdsCpuCostPlannerTest
..


Patch Set 5:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/15135/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaffddd70c2da8376ca6c40f65606bbac46c34de7
Gerrit-Change-Number: 20922
Gerrit-PatchSet: 5
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Thu, 01 Feb 2024 18:05:31 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12726: Simulate large-scale query in TpcdsCpuCostPlannerTest

2024-02-01 Thread Riza Suminto (Code Review)
Hello Quanlong Huang, Abhishek Rawat, Wenzhe Zhou, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20922

to look at the new patch set (#5).

Change subject: IMPALA-12726: Simulate large-scale query in 
TpcdsCpuCostPlannerTest
..

IMPALA-12726: Simulate large-scale query in TpcdsCpuCostPlannerTest

Querying large-scale databases is a good way to test Impala.
However, it is impractical to do on a single-node development machine.

Frontend testing does not run the test query in the backend executor and
can benefit from simulated large-scale test cases. This patch attempts
to do it by instrumenting the CatalogD metadata loading code to scale
tpcds_partitioned_parquet_snap by injecting column stats from a 3TB
TPC-DS dataset in TpcdsCpuCostPlannerTest.

The large-scale column stats are expressed in stats-3TB.json, taken by
running "SHOW COLUMN STATS" and "DESCRIBE FORMATTED" queries on a 3TB
dataset loaded using impala-tpcds-kit. It is parsed and then
piggy-backed through RuntimeEnv. Code that populates stats
metadata (caller of FeCatalogUtils.getRowCount(),
FeCatalogUtils.getTotalSize(), and FeCatalogUtils.injectColumnStats())
are instrumented to populate stats from RuntimeEnv instead of Metastore.
Scaled-up tables are invalidated before a test run to reload them with
new high-scale stats. This patch also adds a scan range limit injection
to force ScanNode over a single file table to act as if it scans a
multi-files table.

tpcds_partitioned_schema_template.sql is modified to match column names
and types from impala-tpcds-kit. The test files under
PlannerTest/tpcds_cpu_cost/ is replaced with queries that are
specifically generated to run against the 3TB scale factor of the TPC-DS
dataset 
(https://github.com/cloudera/impala-tpcds-kit/blob/separate_queries_per_scale_factor/queries/sf3000/).

All query plans match with query plans obtained through real query runs
in a large cluster except for a few mismatches due to the hard limit on
the number of files at a table. Below are 3 queries out of 103 that
still do not have a matching shape and the reasons.
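+-----+----------------------------------------------+
|  Q  | Reason                                       |
+-----+----------------------------------------------+
| 10a | different num files in customer_demographics |
| 34  | different num files in customer              |
| 69  | different num files in customer              |
+-----+----------------------------------------------+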

Testing:
- Scale tables of tpcds_partitioned_parquet_snap in
  TpcdsCpuCostPlannerTest to simulate 3TB TPC-DS. The number of
  executors is raised from 3 to 10, and REPLICA_PREFERENCE=REMOTE is
  set to ignore data locality.
- Pass core tests.

Change-Id: Iaffddd70c2da8376ca6c40f65606bbac46c34de7
---
M fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
A fe/src/main/java/org/apache/impala/catalog/SideloadTableStats.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java
M fe/src/main/java/org/apache/impala/common/RuntimeEnv.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
M fe/src/test/java/org/apache/impala/planner/TpcdsCpuCostPlannerTest.java
A fe/src/test/java/org/apache/impala/testutil/StatsJsonParser.java
M testdata/datasets/tpcds_partitioned/tpcds_partitioned_schema_template.sql
A 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/stats-3TB.json
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q01.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q02.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q03.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q04.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q05.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q06.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q07.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q08.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q09.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q10a.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q11.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q12.test
M 
testdata/workloads/functional-planner/queries

[Impala-ASF-CR] IMPALA-12726: Simulate large-scale query in TpcdsCpuCostPlannerTest

2024-02-01 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20922 )

Change subject: IMPALA-12726: Simulate large-scale query in 
TpcdsCpuCostPlannerTest
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20922/4/fe/src/main/java/org/apache/impala/common/RuntimeEnv.java
File fe/src/main/java/org/apache/impala/common/RuntimeEnv.java:

http://gerrit.cloudera.org:8080/#/c/20922/4/fe/src/main/java/org/apache/impala/common/RuntimeEnv.java@39
PS4, Line 39: The value element is stored as Object to avoid referrencing
:   // SideloadTableStats class in org.apache.impala.common package.
Looking around org/apache/impala/common, this package is pretty liberal in 
doing imports.
There are imports from impala.analysis, impala.catalog, impala.thrift, and 
impala.util in common package. Maybe it is OK to import SideloadTableStats 
directly here.




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaffddd70c2da8376ca6c40f65606bbac46c34de7
Gerrit-Change-Number: 20922
Gerrit-PatchSet: 4
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Thu, 01 Feb 2024 17:28:44 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12726: Simulate large-scale query in TpcdsCpuCostPlannerTest

2024-01-31 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20922 )

Change subject: IMPALA-12726: Simulate large-scale query in 
TpcdsCpuCostPlannerTest
..


Patch Set 4:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/15131/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaffddd70c2da8376ca6c40f65606bbac46c34de7
Gerrit-Change-Number: 20922
Gerrit-PatchSet: 4
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Wed, 31 Jan 2024 23:07:49 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12726: Simulate large-scale query in TpcdsCpuCostPlannerTest

2024-01-31 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20922 )

Change subject: IMPALA-12726: Simulate large-scale query in 
TpcdsCpuCostPlannerTest
..


Patch Set 4:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/20922/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20922/2//COMMIT_MSG@24
PS2, Line 24: ar
> nit: are
Done


http://gerrit.cloudera.org:8080/#/c/20922/2//COMMIT_MSG@28
PS2, Line 28: mult
> nit: multi
Done


http://gerrit.cloudera.org:8080/#/c/20922/2/fe/src/main/java/org/apache/impala/catalog/SideloadTableStats.java
File fe/src/main/java/org/apache/impala/catalog/SideloadTableStats.java:

http://gerrit.cloudera.org:8080/#/c/20922/2/fe/src/main/java/org/apache/impala/catalog/SideloadTableStats.java@26
PS2, Line 26:
> nit: Could you add description for the new class?
Done


http://gerrit.cloudera.org:8080/#/c/20922/2/fe/src/main/java/org/apache/impala/catalog/SideloadTableStats.java@40
PS2, Line 40:   public SideloadTableStats(String tableName, long numRows, long 
totalSize) {
> add Preconditions check for the input parameters
Done


http://gerrit.cloudera.org:8080/#/c/20922/3/fe/src/main/java/org/apache/impala/common/RuntimeEnv.java
File fe/src/main/java/org/apache/impala/common/RuntimeEnv.java:

http://gerrit.cloudera.org:8080/#/c/20922/3/fe/src/main/java/org/apache/impala/common/RuntimeEnv.java@21
PS3, Line 21:
> nit: remove empty line
The empty line in between is added automatically by clang-format.
It appears to come from the Chromium Java style:
https://chromium.googlesource.com/chromium/src/+/HEAD/styleguide/java/java.md#Import-Order

I chose to remove the StringUtils import instead.


http://gerrit.cloudera.org:8080/#/c/20922/3/fe/src/test/java/org/apache/impala/testutil/StatsJsonParser.java
File fe/src/test/java/org/apache/impala/testutil/StatsJsonParser.java:

http://gerrit.cloudera.org:8080/#/c/20922/3/fe/src/test/java/org/apache/impala/testutil/StatsJsonParser.java@90
PS3, Line 90: colType.substring(0, colType.indexOf("(")
> if colType contains "(", does it contain ")"?
Yes. An example is "DECIMAL(7,2)". In that case, only "DECIMAL" is taken.
Any invalid type input will be caught by the default handler at line 153.
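
To illustrate the answer above, here is a small sketch of stripping the parameter list from a type name. The class and method names are hypothetical, and the upper-casing and the guard for names without "(" are assumptions, not necessarily what StatsJsonParser does. Because only the text before "(" is kept, a missing ")" does not affect the result.

```java
import java.util.Locale;

// Hypothetical helper; not the actual StatsJsonParser code.
final class ColTypeNames {
  // Returns the type name without any "(...)" parameter list,
  // e.g. "DECIMAL(7,2)" -> "DECIMAL". Names without "(" pass through.
  static String baseTypeName(String colType) {
    String trimmed = colType.trim();
    int paren = trimmed.indexOf('(');
    // indexOf returns -1 when "(" is absent, so guard before calling substring.
    String base = paren < 0 ? trimmed : trimmed.substring(0, paren);
    return base.trim().toUpperCase(Locale.ROOT);
  }

  public static void main(String[] args) {
    System.out.println(baseTypeName("DECIMAL(7,2)"));
    System.out.println(baseTypeName("bigint"));
  }
}
```

Whatever the helper returns would then be matched against the known type names, with unknown names rejected by a default branch as described in the reply.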



--
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaffddd70c2da8376ca6c40f65606bbac46c34de7
Gerrit-Change-Number: 20922
Gerrit-PatchSet: 4
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Wed, 31 Jan 2024 22:44:06 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12726: Simulate large-scale query in TpcdsCpuCostPlannerTest

2024-01-31 Thread Riza Suminto (Code Review)
Hello Quanlong Huang, Abhishek Rawat, Wenzhe Zhou, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20922

to look at the new patch set (#4).

Change subject: IMPALA-12726: Simulate large-scale query in 
TpcdsCpuCostPlannerTest
..

IMPALA-12726: Simulate large-scale query in TpcdsCpuCostPlannerTest

Querying against large-scale databases is a good way to test Impala.
However, it is impractical to do on a single-node development machine.

Frontend testing does not run the test query in the backend executor and
can benefit from simulated large-scale test cases. This patch attempts
to do it by instrumenting the CatalogD metadata loading code to scale
tpcds_partitioned_parquet_snap by injecting column stats from a 3TB
TPC-DS dataset in TpcdsCpuCostPlannerTest.

The large-scale column stats are expressed in stats-3TB.json, taken by
running "SHOW COLUMN STATS" and "DESCRIBE FORMATTED" queries on a 3TB
dataset loaded using impala-tpcds-kit. It is parsed and then
piggy-backed through RuntimeEnv. Code that populates stats
metadata (caller of FeCatalogUtils.getRowCount(),
FeCatalogUtils.getTotalSize(), and FeCatalogUtils.injectColumnStats())
are instrumented to populate stats from RuntimeEnv instead of Metastore.
Scaled-up tables are invalidated before a test run to reload them with
new high-scale stats. This patch also adds a scan range limit injection
to force a ScanNode over a single-file table to act as if it scans a
multi-file table.
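
The lookup described above, where sideloaded stats take precedence over Metastore values, could look roughly like the sketch below. All names here (SideloadRegistrySketch, sideload, rowCountFor, reset) are hypothetical and the numbers are made up. The real patch routes this through RuntimeEnv and FeCatalogUtils, whose signatures are not shown in this thread.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical registry; a simplified stand-in for stats carried by RuntimeEnv.
final class SideloadRegistrySketch {
  private final Map<String, Long> sideloadedRowCounts = new HashMap<>();

  // Registers an injected row count for a table (test-only in this sketch).
  void sideload(String tableName, long numRows) {
    sideloadedRowCounts.put(tableName, numRows);
  }

  // Drops all injected stats so later lookups see Metastore values again.
  void reset() {
    sideloadedRowCounts.clear();
  }

  // Returns the injected row count if present, else the Metastore value.
  long rowCountFor(String tableName, long metastoreRowCount) {
    Long injected = sideloadedRowCounts.get(tableName);
    return injected != null ? injected : metastoreRowCount;
  }

  public static void main(String[] args) {
    SideloadRegistrySketch registry = new SideloadRegistrySketch();
    registry.sideload("store_sales", 5_000_000L); // illustrative value only
    System.out.println(registry.rowCountFor("store_sales", 100L));
    System.out.println(registry.rowCountFor("date_dim", 200L));
  }
}
```

In this sketch, tables without an injected entry keep their Metastore stats, which is why the scaled-up tables need to be invalidated and reloaded before the injected values become visible.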

tpcds_partitioned_schema_template.sql is modified to match column names
and types from impala-tpcds-kit. The test files under
PlannerTest/tpcds_cpu_cost/ are replaced with queries that are
specifically generated to run against the 3TB scale factor of the TPC-DS
dataset 
(https://github.com/cloudera/impala-tpcds-kit/blob/separate_queries_per_scale_factor/queries/sf3000/).

All query plans match the query plans obtained through real query runs
in a large cluster except for a few mismatches due to the hard limit on
the number of files in a table. Below are the 3 queries out of 103 that
still do not have a matching shape, and the reasons.
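+-----+----------------------------------------------+
|  Q  | Reason                                       |
+-----+----------------------------------------------+
| 10a | different num files in customer_demographics |
| 34  | different num files in customer              |
| 69  | different num files in customer              |
+-----+----------------------------------------------+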

Testing:
- Scale tables of tpcds_partitioned_parquet_snap in
  TpcdsCpuCostPlannerTest to simulate 3TB TPC-DS. The number of
  executors is raised from 3 to 10, and REPLICA_PREFERENCE=REMOTE is set
  to ignore data locality.
- Pass core tests.

Change-Id: Iaffddd70c2da8376ca6c40f65606bbac46c34de7
---
M fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
A fe/src/main/java/org/apache/impala/catalog/SideloadTableStats.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java
M fe/src/main/java/org/apache/impala/common/RuntimeEnv.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
M fe/src/test/java/org/apache/impala/planner/TpcdsCpuCostPlannerTest.java
A fe/src/test/java/org/apache/impala/testutil/StatsJsonParser.java
M testdata/datasets/tpcds_partitioned/tpcds_partitioned_schema_template.sql
A testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/stats-3TB.json
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q01.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q02.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q03.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q04.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q05.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q06.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q07.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q08.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q09.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q10a.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q11.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q12.test
M testdata/workloads/functional-planner/queries

[Impala-ASF-CR] IMPALA-12726: Simulate large-scale query in TpcdsCpuCostPlannerTest

2024-01-31 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20922 )

Change subject: IMPALA-12726: Simulate large-scale query in 
TpcdsCpuCostPlannerTest
..


Patch Set 3:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/20922/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20922/2//COMMIT_MSG@24
PS2, Line 24: is
nit: are


http://gerrit.cloudera.org:8080/#/c/20922/2//COMMIT_MSG@28
PS2, Line 28: muti
nit: multi


http://gerrit.cloudera.org:8080/#/c/20922/2/fe/src/main/java/org/apache/impala/catalog/SideloadTableStats.java
File fe/src/main/java/org/apache/impala/catalog/SideloadTableStats.java:

http://gerrit.cloudera.org:8080/#/c/20922/2/fe/src/main/java/org/apache/impala/catalog/SideloadTableStats.java@26
PS2, Line 26:
nit: Could you add description for the new class?


http://gerrit.cloudera.org:8080/#/c/20922/2/fe/src/main/java/org/apache/impala/catalog/SideloadTableStats.java@40
PS2, Line 40:   public void addColumnStats(String colName, ColumnStatisticsData 
colStats) {
add Preconditions check for the input parameters


http://gerrit.cloudera.org:8080/#/c/20922/3/fe/src/main/java/org/apache/impala/common/RuntimeEnv.java
File fe/src/main/java/org/apache/impala/common/RuntimeEnv.java:

http://gerrit.cloudera.org:8080/#/c/20922/3/fe/src/main/java/org/apache/impala/common/RuntimeEnv.java@21
PS3, Line 21:
nit: remove empty line


http://gerrit.cloudera.org:8080/#/c/20922/3/fe/src/test/java/org/apache/impala/testutil/StatsJsonParser.java
File fe/src/test/java/org/apache/impala/testutil/StatsJsonParser.java:

http://gerrit.cloudera.org:8080/#/c/20922/3/fe/src/test/java/org/apache/impala/testutil/StatsJsonParser.java@90
PS3, Line 90: colType.substring(0, colType.indexOf("(")
if colType contains "(", does it contain ")"?



--
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaffddd70c2da8376ca6c40f65606bbac46c34de7
Gerrit-Change-Number: 20922
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Wed, 31 Jan 2024 21:17:53 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12726: Simulate large-scale query in TpcdsCpuCostPlannerTest

2024-01-31 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20922 )

Change subject: IMPALA-12726: Simulate large-scale query in 
TpcdsCpuCostPlannerTest
..


Patch Set 3:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/15130/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaffddd70c2da8376ca6c40f65606bbac46c34de7
Gerrit-Change-Number: 20922
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Wed, 31 Jan 2024 20:10:14 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12726: Simulate large-scale query in TpcdsCpuCostPlannerTest

2024-01-31 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20922 )

Change subject: IMPALA-12726: Simulate large-scale query in 
TpcdsCpuCostPlannerTest
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20922/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20922/2//COMMIT_MSG@33
PS2, Line 33: ale factor of the TPC-DS
: dataset 
(https://github.com/cloudera/impala-tpcds-kit/blob/separate_querie
> David mention that the test SQL can be different depending on the scale fac
Done



--
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaffddd70c2da8376ca6c40f65606bbac46c34de7
Gerrit-Change-Number: 20922
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Wed, 31 Jan 2024 19:49:38 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12726: Simulate large-scale query in TpcdsCpuCostPlannerTest

2024-01-31 Thread Riza Suminto (Code Review)
Hello Quanlong Huang, Abhishek Rawat, Wenzhe Zhou, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20922

to look at the new patch set (#3).

Change subject: IMPALA-12726: Simulate large-scale query in 
TpcdsCpuCostPlannerTest
..

IMPALA-12726: Simulate large-scale query in TpcdsCpuCostPlannerTest

Querying against large-scale databases is a good way to test Impala.
However, it is impractical to do on a single-node development machine.

Frontend testing does not run the test query in the backend executor and
can benefit from simulated large-scale test cases. This patch attempts
to do it by instrumenting the CatalogD metadata loading code to scale
tpcds_partitioned_parquet_snap by injecting column stats from a 3TB
TPC-DS dataset in TpcdsCpuCostPlannerTest.

The large-scale column stats are expressed in stats-3TB.json, taken by
running "SHOW COLUMN STATS" and "DESCRIBE FORMATTED" queries on a 3TB
dataset loaded using impala-tpcds-kit. It is parsed and then
piggy-backed through RuntimeEnv. Code that populates stats
metadata (caller of FeCatalogUtils.getRowCount(),
FeCatalogUtils.getTotalSize(), and FeCatalogUtils.injectColumnStats())
is instrumented to populate stats from RuntimeEnv instead of Metastore.
Scaled-up tables are invalidated before a test run to reload them with
new high-scale stats. This patch also adds a scan range limit injection
to force ScanNode over a single file table to act as if it scans a
muti-files table.

tpcds_partitioned_schema_template.sql is modified to match column names
and types from impala-tpcds-kit. The test files under
PlannerTest/tpcds_cpu_cost/ are replaced with queries that are
specifically generated to run against the 3TB scale factor of the TPC-DS
dataset 
(https://github.com/cloudera/impala-tpcds-kit/blob/separate_queries_per_scale_factor/queries/sf3000/).

All query plans match the query plans obtained through real query runs
in a large cluster except for a few mismatches due to the hard limit on
the number of files in a table. Below are the 3 queries out of 103 that
still do not have a matching shape, and the reasons.
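+-----+----------------------------------------------+
|  Q  | Reason                                       |
+-----+----------------------------------------------+
| 10a | different num files in customer_demographics |
| 34  | different num files in customer              |
| 69  | different num files in customer              |
+-----+----------------------------------------------+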

Testing:
- Scale tables of tpcds_partitioned_parquet_snap in
  TpcdsCpuCostPlannerTest to simulate 3TB TPC-DS. The number of
  executors is raised from 3 to 10, and REPLICA_PREFERENCE=REMOTE is set
  to ignore data locality.
- Pass core tests.

Change-Id: Iaffddd70c2da8376ca6c40f65606bbac46c34de7
---
M fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
A fe/src/main/java/org/apache/impala/catalog/SideloadTableStats.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java
M fe/src/main/java/org/apache/impala/common/RuntimeEnv.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
M fe/src/test/java/org/apache/impala/planner/TpcdsCpuCostPlannerTest.java
A fe/src/test/java/org/apache/impala/testutil/StatsJsonParser.java
M testdata/datasets/tpcds_partitioned/tpcds_partitioned_schema_template.sql
A testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/stats-3TB.json
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q01.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q02.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q03.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q04.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q05.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q06.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q07.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q08.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q09.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q10a.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q11.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q12.test
M testdata/workloads/functional-planner/queries/Plan

[Impala-ASF-CR] IMPALA-12726: Simulate large-scale query in TpcdsCpuCostPlannerTest

2024-01-30 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20922 )

Change subject: IMPALA-12726: Simulate large-scale query in 
TpcdsCpuCostPlannerTest
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20922/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20922/2//COMMIT_MSG@33
PS2, Line 33: 3TB TPC-DS
: dataset 
(https://github.com/cloudera/impala-tpcds-kit/tree/master/queries)
David mentioned that the test SQL can differ depending on the scale factor 
it is intended to run against.
This set of test SQL is better suited to the 3TB scale:
https://github.com/cloudera/impala-tpcds-kit/blob/separate_queries_per_scale_factor/queries/sf3000/



--
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaffddd70c2da8376ca6c40f65606bbac46c34de7
Gerrit-Change-Number: 20922
Gerrit-PatchSet: 2
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Wed, 31 Jan 2024 00:15:10 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12726: Simulate large-scale query in TpcdsCpuCostPlannerTest

2024-01-29 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20922 )

Change subject: IMPALA-12726: Simulate large-scale query in 
TpcdsCpuCostPlannerTest
..


Patch Set 2:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/15087/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaffddd70c2da8376ca6c40f65606bbac46c34de7
Gerrit-Change-Number: 20922
Gerrit-PatchSet: 2
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Mon, 29 Jan 2024 16:58:48 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12726: Simulate large-scale query in TpcdsCpuCostPlannerTest

2024-01-29 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20922 )

Change subject: IMPALA-12726: Simulate large-scale query in 
TpcdsCpuCostPlannerTest
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/20922/1/fe/src/main/java/org/apache/impala/common/RuntimeEnv.java
File fe/src/main/java/org/apache/impala/common/RuntimeEnv.java:

http://gerrit.cloudera.org:8080/#/c/20922/1/fe/src/main/java/org/apache/impala/common/RuntimeEnv.java@100
PS1, Line 100:
> nit: "tables with their"
Done


http://gerrit.cloudera.org:8080/#/c/20922/1/fe/src/test/java/org/apache/impala/planner/TpcdsCpuCostPlannerTest.java
File fe/src/test/java/org/apache/impala/planner/TpcdsCpuCostPlannerTest.java:

http://gerrit.cloudera.org:8080/#/c/20922/1/fe/src/test/java/org/apache/impala/planner/TpcdsCpuCostPlannerTest.java@53
PS1, Line 53: / Granular scan limit that will injected into individual ScanNode 
of tables.
:   private static Map<
> Looks like some dim tables are also scaled but not linearly. I'll check TPC
PS2 fully injects all stats based on the actual dataset instead of just scaling them.



--
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaffddd70c2da8376ca6c40f65606bbac46c34de7
Gerrit-Change-Number: 20922
Gerrit-PatchSet: 2
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Mon, 29 Jan 2024 16:36:58 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12726: Simulate large-scale query in TpcdsCpuCostPlannerTest

2024-01-29 Thread Riza Suminto (Code Review)
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20922

to look at the new patch set (#2).

Change subject: IMPALA-12726: Simulate large-scale query in 
TpcdsCpuCostPlannerTest
..

IMPALA-12726: Simulate large-scale query in TpcdsCpuCostPlannerTest

Querying against large-scale databases is a good way to test Impala.
However, it is impractical to do on a single-node development machine.

Frontend testing does not run the test query in the backend executor and
can benefit from simulated large-scale test cases. This patch attempts
to do it by instrumenting the CatalogD metadata loading code to scale
tpcds_partitioned_parquet_snap by injecting column stats from a 3TB
TPC-DS dataset in TpcdsCpuCostPlannerTest.

The large-scale column stats are expressed in stats-3TB.json, taken by
running "SHOW COLUMN STATS" and "DESCRIBE FORMATTED" queries on a 3TB
dataset loaded using impala-tpcds-kit. It is parsed and then
piggy-backed through RuntimeEnv. Code that populates stats
metadata (caller of FeCatalogUtils.getRowCount(),
FeCatalogUtils.getTotalSize(), and FeCatalogUtils.injectColumnStats())
is instrumented to populate stats from RuntimeEnv instead of Metastore.
Scaled-up tables are invalidated before a test run to reload them with
new high-scale stats. This patch also adds a scan range limit injection
to force ScanNode over a single file table to act as if it scans a
muti-files table.

tpcds_partitioned_schema_template.sql is modified to match column names
and types from impala-tpcds-kit. After this patch, the test files under
PlannerTest/tpcds_cpu_cost/ have query plan shapes that match the
actual impala-tpcds-kit queries run against the 3TB TPC-DS
dataset (https://github.com/cloudera/impala-tpcds-kit/tree/master/queries),
except for a few mismatches due to different SQL and the hard limit on
the number of files.

Below are 16 queries out of 103 that still do not have a matching shape
and the reasons.
+-----+----------------------------------------------+
|  Q  | Reason                                       |
+-----+----------------------------------------------+
| 6   | extra limit 1                                |
| 10a | different num files in customer_demographics |
| 23b | different frequent_ss_items CTE              |
| 22  | extra warehouse table                        |
| 27  | different predicate for store table          |
| 34  | extra limit 10                               |
| 36  | different predicate for store table          |
| 53  | missing avg_quarterly_sales                  |
| 66  | different SQL                                |
| 68  | different predicate for data_dim table       |
| 69  | different num files in customer              |
| 73  | different order by, extra limit 1000         |
| 74  | different num files in customer              |
| 84  | missing customer_demographics table          |
| 96  | missing limit 100                            |
| 98  | extra limit 1000                             |
+-----+----------------------------------------------+

Testing:
- Scale tables of tpcds_partitioned_parquet_snap in
  TpcdsCpuCostPlannerTest to simulate 3TB TPC-DS. The number of
  executors is raised from 3 to 10, and REPLICA_PREFERENCE=REMOTE is set
  to ignore data locality.
- Pass core tests.

Change-Id: Iaffddd70c2da8376ca6c40f65606bbac46c34de7
---
M fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
A fe/src/main/java/org/apache/impala/catalog/SideloadTableStats.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java
M fe/src/main/java/org/apache/impala/common/RuntimeEnv.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
M fe/src/test/java/org/apache/impala/planner/TpcdsCpuCostPlannerTest.java
A fe/src/test/java/org/apache/impala/testutil/StatsJsonParser.java
M testdata/datasets/tpcds_partitioned/tpcds_partitioned_schema_template.sql
A testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/stats-3TB.json
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q01.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q02.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q03.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q04.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q05.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q06.test
M testdata/workloads/functional-planner/queries

[Impala-ASF-CR] IMPALA-12726: Simulate large scale query in TpcdsCpuCostPlannerTest

2024-01-18 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20922 )

Change subject: IMPALA-12726: Simulate large scale query in 
TpcdsCpuCostPlannerTest
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/20922/1/fe/src/main/java/org/apache/impala/common/RuntimeEnv.java
File fe/src/main/java/org/apache/impala/common/RuntimeEnv.java:

http://gerrit.cloudera.org:8080/#/c/20922/1/fe/src/main/java/org/apache/impala/common/RuntimeEnv.java@100
PS1, Line 100: the table with its
nit: "tables with their"


http://gerrit.cloudera.org:8080/#/c/20922/1/fe/src/test/java/org/apache/impala/planner/TpcdsCpuCostPlannerTest.java
File fe/src/test/java/org/apache/impala/planner/TpcdsCpuCostPlannerTest.java:

http://gerrit.cloudera.org:8080/#/c/20922/1/fe/src/test/java/org/apache/impala/planner/TpcdsCpuCostPlannerTest.java@53
PS1, Line 53:  // Insert 1000x metadata scale to RuntimeEnv for each fact 
tables.
: int scale = 1000;
Looks like some dim tables are also scaled but not linearly. I'll check TPC-DS 
spec.



--
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaffddd70c2da8376ca6c40f65606bbac46c34de7
Gerrit-Change-Number: 20922
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Thu, 18 Jan 2024 19:37:42 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12726: Simulate large scale query in TpcdsCpuCostPlannerTest

2024-01-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20922 )

Change subject: IMPALA-12726: Simulate large scale query in 
TpcdsCpuCostPlannerTest
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/14992/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaffddd70c2da8376ca6c40f65606bbac46c34de7
Gerrit-Change-Number: 20922
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 18 Jan 2024 19:09:44 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12726: Simulate large scale query in TpcdsCpuCostPlannerTest

2024-01-18 Thread Riza Suminto (Code Review)
Riza Suminto has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/20922 )


Change subject: IMPALA-12726: Simulate large scale query in 
TpcdsCpuCostPlannerTest
..

IMPALA-12726: Simulate large scale query in TpcdsCpuCostPlannerTest

Querying against a large-scale database is a good way to test Impala.
However, it is impractical to do on a single-node development machine.

Frontend testing does not actually run the test query in the backend
executor and can benefit from simulated large-scale test cases. This
patch attempts to do so by instrumenting the CatalogD metadata loading
code to multiply partition numRows, table numRows, numNull, numTrues,
and numFalses by 1000 in TpcdsCpuCostPlannerTest. The scaling factor is
supplied through RuntimeEnv. Code that populates stats metadata (callers
of FeCatalogUtils.getRowCount() and FeCatalogUtils.injectColumnStats())
is instrumented to check this scaling factor to decide whether to
multiply the stats for a particular table. Tables that are scaled
up must also be invalidated so that they are reloaded with the new
scaled stats.
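
The per-table scaling described above can be sketched as follows. The names (StatsScalingSketch, setScale, scaledStat) are hypothetical, and treating negative values as "unknown" that pass through unscaled is an assumption of this sketch, not something stated in the commit message.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of a per-table stats multiplier.
final class StatsScalingSketch {
  private final Map<String, Long> scaleByTable = new HashMap<>();

  // Registers a multiplier for one table; tables without one are left unscaled.
  void setScale(String tableName, long factor) {
    if (factor < 1) throw new IllegalArgumentException("factor must be >= 1");
    scaleByTable.put(tableName, factor);
  }

  // Multiplies a stat (e.g. a row count) by the table's factor, if any.
  long scaledStat(String tableName, long stat) {
    Long factor = scaleByTable.get(tableName);
    // Assumption: negative values mean "unknown" and pass through unchanged.
    if (factor == null || stat < 0) return stat;
    long f = factor.longValue();
    // multiplyExact throws ArithmeticException on overflow instead of wrapping.
    return Math.multiplyExact(stat, f);
  }

  public static void main(String[] args) {
    StatsScalingSketch scaling = new StatsScalingSketch();
    scaling.setScale("store_sales", 1000L);
    System.out.println(scaling.scaledStat("store_sales", 3L));
    System.out.println(scaling.scaledStat("date_dim", 3L));
  }
}
```

A single uniform factor like this is the simplification that later patch sets move away from, since not every table grows linearly with the scale factor.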

Total byte sizes are not scaled up in this patch because they do not
affect the query plan unless stats extrapolation is used.

Testing:
- Scale the fact tables of tpcds_partitioned_parquet_snap in
  TpcdsCpuCostPlannerTest by 1000x to simulate 1TB TPC-DS. The number of
  executors is raised from 3 to 10, and REPLICA_PREFERENCE is set to
  REMOTE to ignore data locality.
- Compare with the alternative method, where instrumentation is done
  during stats collection (COMPUTE STATS), and confirm that the resulting
  query plans are the same as with this patch.
- Pass FE tests.

Change-Id: Iaffddd70c2da8376ca6c40f65606bbac46c34de7
---
M fe/src/main/java/org/apache/impala/catalog/Column.java
M fe/src/main/java/org/apache/impala/catalog/ColumnStats.java
M fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/TableLoader.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java
M fe/src/main/java/org/apache/impala/common/RuntimeEnv.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
M fe/src/test/java/org/apache/impala/planner/TpcdsCpuCostPlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q01.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q02.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q03.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q04.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q05.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q06.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q07.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q08.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q09.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q10a.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q11.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q12.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q13.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q14a.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q14b.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q15.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q16.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q17.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q18.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q19.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q20.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q21.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q22.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q23a.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q23b.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q24