Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/12221 )
Change subject: IMPALA-5872: Testcase builder for query planner
......................................................................

IMPALA-5872: Testcase builder for query planner

Implements a new testcase builder for simulating query plans
from one cluster on a different cluster/minicluster with a
different number of nodes. The testcase is collected from one
cluster and can be replayed on any other cluster. It includes
all the information that is needed to replay the query plan
exactly as in the source cluster.

Also adds a stand-alone tool (PlannerTestCaseLoader) that can
replay the testcase without having to start an actual cluster
or a dev minicluster. This is done to make testcase debugging
simpler.

Motivation:
----------
- Make query planner issues easily reproducible
- Improve user experience while collecting query diagnostics
- Make it easy to test new planner features by testing them on
  customer use cases collected from much larger clusters.

Commands:
--------
-- Collect testcase for a query stmt (outputs the testcase file path).
impala-shell> COPY TESTCASE TO <hdfs dirpath> <query stmt>

-- Load the testcase metadata in a target cluster (dumps the query stmt)
impala-shell> COPY TESTCASE FROM <hdfs testcase file path>

-- Replay the query plan
impala-shell> SET PLANNER_DEBUG_MODE=true
impala-shell> EXPLAIN <query stmt>

How it works:
------------
- During export on the source cluster, the command dumps all the
  thrift states of referenced objects in the query into a gzipped
  binary file.
- During replay on a target cluster, it adds these objects to the
  catalog cache by faking them as DDLs.
- The planner also fakes the number of hosts by using the scan range
  information from the target cluster.

Caveats:
------
- Tested to work with HDFS tables. Tables based on other filesystems
  like HBase/Kudu may not work as desired.
- The tool does not collect actual data files for the tables. Only the
  metadata state is dumped.
- Currently only imports databases/tables/views. We can extend it to
  work for UDFs etc.
- It only works for QueryStmts (select/union queries).
- On a Sentry-enabled cluster, the role running the query requires
  VIEW_METADATA privilege on every db/table/view referenced in the
  query statement.
- Once the metadata dump is loaded on a target cluster, the state is
  volatile. Hence it cannot survive a cluster restart / invalidate
  metadata.
- Loading a testcase requires setting the query option
  (SET PLANNER_DEBUG_MODE=true) so that the planner knows to fake the
  number of hosts. Otherwise it takes into account the local cluster
  topology.
- Cross-version compatibility of testcases needs some thought. For
  example, creating a testcase from Impala version 3.2 and trying to
  replay it on Impala version 3.5. This could be problematic if we
  don't keep the underlying thrift structures backward compatible.

Change-Id: Iec83eeb2dc5136768b70ed581fb8d3ed0335cb52
Reviewed-on: http://gerrit.cloudera.org:8080/12221
Reviewed-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
---
M be/src/service/client-request-state.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/backend-gflag-util.cc
M bin/rat_exclude_files.txt
M common/thrift/BackendGflags.thrift
M common/thrift/CatalogService.thrift
M common/thrift/Frontend.thrift
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/JniCatalog.thrift
M common/thrift/Types.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
A fe/src/main/java/org/apache/impala/analysis/CopyTestCaseStmt.java
M fe/src/main/java/org/apache/impala/analysis/HdfsUri.java
M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
M fe/src/main/java/org/apache/impala/catalog/Catalog.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/FeDb.java
M fe/src/main/java/org/apache/impala/catalog/FeTable.java
M fe/src/main/java/org/apache/impala/catalog/MetaStoreClientPool.java
M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
M fe/src/main/java/org/apache/impala/common/JniUtil.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java
M fe/src/test/java/org/apache/impala/analysis/AuthorizationStmtTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
A fe/src/test/java/org/apache/impala/planner/TestCaseLoaderTest.java
M fe/src/test/java/org/apache/impala/testutil/CatalogServiceTestCatalog.java
A fe/src/test/java/org/apache/impala/testutil/EmbeddedMetastoreClientPool.java
M fe/src/test/java/org/apache/impala/testutil/ImpaladTestCatalog.java
A fe/src/test/java/org/apache/impala/testutil/PlannerTestCaseLoader.java
M testdata/bin/create-load-data.sh
A testdata/bin/create-tpcds-testcase-files.sh
A testdata/workloads/tpcds/queries/raw/tpcds-query1.sql
A testdata/workloads/tpcds/queries/raw/tpcds-query11.sql
A testdata/workloads/tpcds/queries/raw/tpcds-query12.sql
A testdata/workloads/tpcds/queries/raw/tpcds-query13.sql
A testdata/workloads/tpcds/queries/raw/tpcds-query15.sql
A testdata/workloads/tpcds/queries/raw/tpcds-query16.sql
A testdata/workloads/tpcds/queries/raw/tpcds-query17.sql
A testdata/workloads/tpcds/queries/raw/tpcds-query19.sql
A testdata/workloads/tpcds/queries/raw/tpcds-query2.sql
A testdata/workloads/tpcds/queries/raw/tpcds-query20.sql
A testdata/workloads/tpcds/queries/raw/tpcds-query21.sql
A testdata/workloads/tpcds/queries/raw/tpcds-query25.sql
A testdata/workloads/tpcds/queries/raw/tpcds-query26.sql
A testdata/workloads/tpcds/queries/raw/tpcds-query28.sql
A testdata/workloads/tpcds/queries/raw/tpcds-query29.sql
A testdata/workloads/tpcds/queries/raw/tpcds-query3.sql
A testdata/workloads/tpcds/queries/raw/tpcds-query30.sql
A testdata/workloads/tpcds/queries/raw/tpcds-query31.sql
A testdata/workloads/tpcds/queries/raw/tpcds-query32.sql
A testdata/workloads/tpcds/queries/raw/tpcds-query33.sql
A testdata/workloads/tpcds/queries/raw/tpcds-query34.sql
A testdata/workloads/tpcds/queries/raw/tpcds-query37.sql
A testdata/workloads/tpcds/queries/raw/tpcds-query39.sql
A testdata/workloads/tpcds/queries/raw/tpcds-query4.sql
A testdata/workloads/tpcds/queries/raw/tpcds-query40.sql
A testdata/workloads/tpcds/queries/raw/tpcds-query42.sql
A testdata/workloads/tpcds/queries/raw/tpcds-query43.sql
A testdata/workloads/tpcds/queries/raw/tpcds-query46.sql
A testdata/workloads/tpcds/queries/raw/tpcds-query47.sql
A testdata/workloads/tpcds/queries/raw/tpcds-query48.sql
A testdata/workloads/tpcds/queries/raw/tpcds-query49.sql
A testdata/workloads/tpcds/queries/raw/tpcds-query50.sql
A testdata/workloads/tpcds/queries/raw/tpcds-query51.sql
A testdata/workloads/tpcds/queries/raw/tpcds-query52.sql
A testdata/workloads/tpcds/queries/raw/tpcds-query53.sql
A testdata/workloads/tpcds/queries/raw/tpcds-query55.sql
A testdata/workloads/tpcds/queries/raw/tpcds-query56.sql
A testdata/workloads/tpcds/queries/raw/tpcds-query57.sql
A testdata/workloads/tpcds/queries/raw/tpcds-query58.sql
A testdata/workloads/tpcds/queries/raw/tpcds-query59.sql
A testdata/workloads/tpcds/queries/raw/tpcds-query6.sql
A testdata/workloads/tpcds/queries/raw/tpcds-query60.sql
A testdata/workloads/tpcds/queries/raw/tpcds-query61.sql
A testdata/workloads/tpcds/queries/raw/tpcds-query62.sql
A testdata/workloads/tpcds/queries/raw/tpcds-query63.sql
A testdata/workloads/tpcds/queries/raw/tpcds-query64.sql
A testdata/workloads/tpcds/queries/raw/tpcds-query65.sql
A testdata/workloads/tpcds/queries/raw/tpcds-query66.sql
A testdata/workloads/tpcds/queries/raw/tpcds-query68.sql
A testdata/workloads/tpcds/queries/raw/tpcds-query69.sql
A testdata/workloads/tpcds/queries/raw/tpcds-query7.sql
A testdata/workloads/tpcds/queries/raw/tpcds-query71.sql
A testdata/workloads/tpcds/queries/raw/tpcds-query72.sql
A testdata/workloads/tpcds/queries/raw/tpcds-query73.sql
A testdata/workloads/tpcds/queries/raw/tpcds-query74.sql
A testdata/workloads/tpcds/queries/raw/tpcds-query75.sql
A testdata/workloads/tpcds/queries/raw/tpcds-query76.sql
A testdata/workloads/tpcds/queries/raw/tpcds-query78.sql
A testdata/workloads/tpcds/queries/raw/tpcds-query79.sql
A testdata/workloads/tpcds/queries/raw/tpcds-query81.sql
A testdata/workloads/tpcds/queries/raw/tpcds-query82.sql
A testdata/workloads/tpcds/queries/raw/tpcds-query83.sql
A testdata/workloads/tpcds/queries/raw/tpcds-query84.sql
A testdata/workloads/tpcds/queries/raw/tpcds-query88.sql
A testdata/workloads/tpcds/queries/raw/tpcds-query89.sql
A testdata/workloads/tpcds/queries/raw/tpcds-query90.sql
A testdata/workloads/tpcds/queries/raw/tpcds-query91.sql
A testdata/workloads/tpcds/queries/raw/tpcds-query92.sql
A testdata/workloads/tpcds/queries/raw/tpcds-query94.sql
A testdata/workloads/tpcds/queries/raw/tpcds-query95.sql
A testdata/workloads/tpcds/queries/raw/tpcds-query96.sql
A testdata/workloads/tpcds/queries/raw/tpcds-query97.sql
A testdata/workloads/tpcds/queries/raw/tpcds-query98.sql
A testdata/workloads/tpcds/queries/raw/tpcds-query99.sql
116 files changed, 4,487 insertions(+), 130 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/12221
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Iec83eeb2dc5136768b70ed581fb8d3ed0335cb52
Gerrit-Change-Number: 12221
Gerrit-PatchSet: 11
Gerrit-Owner: Bharath Vissapragada <bhara...@cloudera.com>
Gerrit-Reviewer: Balazs Jeszenszky <jes...@gmail.com>
Gerrit-Reviewer: Bharath Vissapragada <bhara...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fwij...@cloudera.com>
Gerrit-Reviewer: Greg Rahn <gr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Paul Rogers <prog...@cloudera.com>