[Impala-ASF-CR] IMPALA-12024: Add catalog profile for createTable
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/20368 ) Change subject: IMPALA-12024: Add catalog profile for createTable .. IMPALA-12024: Add catalog profile for createTable DDL/DMLs might spend a long time in the execution in catalogd. Currently, the profiles just have a counter of CatalogOpExecTimer. We need more items to show the details of how they are executed in catalogd. As the first step, this patch adds the profile timeline for how the createTable DDL is executed in Catalog Server. Also added more events in the existing "Query Timeline" for when the CatalogDdlRequest finished and when the catalog updates are applied. To implement this, a 'profile' field is added in TDdlExecResponse to carry the execution counters and timeline in catalogd. Currently, we just uses the timeline of it. We can add more counters in the future. Several methods add a timeline parameter to mark the progress in them. Timeline events are added after each RPC finished. Here is an example when HMS is hanging for 26s in a CTAS. I used gdb to attach to HMS as the JIRA description mentioned. In the timeline, we can see the time is spent in the first HMS RPC that fetching the current HMS event id: Catalog Server Operation: 26s560ms - Got metastoreDdlLock: 163.977us (163.977us) - Got Metastore client: 166.339us (2.362us) - Got current Metastore event id 8355270: 26s494ms (26s494ms) - Created table in Metastore: 26s558ms (63.507ms) - Fetched event batch from Metastore: 26s559ms (1.155ms) - Created table in catalog cache: 26s560ms (1.164ms) - DDL finished: 26s560ms (65.538us) Query Compilation: 164.257ms - Metadata of all 1 tables cached: 10.535ms (10.535ms) - Analysis finished: 118.324ms (107.788ms) - Authorization finished (noop): 118.489ms (164.626us) - Value transfer graph computed: 118.830ms (341.792us) - Single node plan created: 150.150ms (31.319ms) - Runtime filters computed: 150.254ms (103.529us) - Distributed plan created: 151.832ms (1.578ms) - Planning finished: 164.257ms (12.425ms) Query Timeline: 27s304ms - Query submitted: 129.658us (129.658us) - Planning finished: 170.224ms (170.095ms) - CatalogDdlRequest finished: 26s731ms (26s561ms) - Applied catalog updates from DDL: 26s740ms (8.752ms) - Submit for admission: 26s740ms (22.233us) - Completed admission: 26s740ms (286.295us) - Ready to start on 3 backends: 26s740ms (155.916us) - All 3 execution backends (3 fragment instances) started: 26s751ms (10.864ms) - Last row fetched: 26s920ms (168.226ms) - Released admission control resources: 26s920ms (27.635us) - DML data written: 26s920ms (126.369us) - Applied catalog updates from DDL: 26s985ms (65.354ms) - DML Metastore update finished: 26s985ms (30.343us) - Rows available: 26s985ms (27.050us) - Unregister query: 27s304ms (318.661ms) An example of creating a Kudu table: Catalog Server Operation: 1s730ms - Got Metastore client: 113.403us (113.403us) - Got current Metastore event id 8355276: 974.500us (861.097us) - Got Kudu client: 212.123ms (211.148ms) - Got kuduDdlLock: 212.128ms (4.680us) - Checked table existence in Kudu: 850.786ms (638.658ms) - Created table in Kudu: 1s623ms (772.379ms) - Got metastoreDdlLock: 1s623ms (397.305us) - Got Metastore client: 1s623ms (7.813us) - Checked table existence in Metastore: 1s648ms (25.154ms) - Created table in Metastore: 1s725ms (76.348ms) - Fetched event batch from Metastore: 1s728ms (3.004ms) - Created table in catalog cache: 1s730ms (2.024ms) - DDL finished: 1s730ms (84.448us) An example of creating an Iceberg table: Catalog Server Operation: 1s573ms - Got Metastore client: 141.799us (141.799us) - Checked table existence in Metastore: 2.957ms (2.815ms) - Got current Metastore event id 8355277: 3.669ms (712.475us) - Created table using Iceberg Catalog HIVE_CATALOG: 1s379ms (1s375ms) - Fetched event batch from Metastore: 1s381ms (2.188ms) - Created table in catalog cache: 1s382ms (1.556ms) - Set Iceberg table owner in Metastore: 1s573ms (190.262ms) - DDL finished: 1s573ms (59.176us) Tests: - Add e2e tests to verify the DDL timeline events exist in profile Change-Id: I3ebf591625e71391a5b23f56ddca8f0ae97b1efa Reviewed-on: http://gerrit.cloudera.org:8080/20368 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- M be/src/service/client-request-state.cc M be/src/service/impala-server.cc M be/src/service/impala-server.h M common/thrift/CatalogService.thrift M fe/src/main/java/org/apache/impala/catalog/Catalog.java M fe/src/main/java/org/apache/impala/service/Catalog
[Impala-ASF-CR] IMPALA-12024: Add catalog profile for createTable
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20368 ) Change subject: IMPALA-12024: Add catalog profile for createTable .. Patch Set 6: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/20368 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3ebf591625e71391a5b23f56ddca8f0ae97b1efa Gerrit-Change-Number: 20368 Gerrit-PatchSet: 6 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Wed, 30 Aug 2023 04:28:43 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12024: Add catalog profile for createTable
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20368 ) Change subject: IMPALA-12024: Add catalog profile for createTable .. Patch Set 6: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/9649/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/20368 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3ebf591625e71391a5b23f56ddca8f0ae97b1efa Gerrit-Change-Number: 20368 Gerrit-PatchSet: 6 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Wed, 30 Aug 2023 00:02:43 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12024: Add catalog profile for createTable
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20368 ) Change subject: IMPALA-12024: Add catalog profile for createTable .. Patch Set 6: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/20368 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3ebf591625e71391a5b23f56ddca8f0ae97b1efa Gerrit-Change-Number: 20368 Gerrit-PatchSet: 6 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Wed, 30 Aug 2023 00:02:43 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12024: Add catalog profile for createTable
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/20368 ) Change subject: IMPALA-12024: Add catalog profile for createTable .. Patch Set 5: Code-Review+2 (1 comment) This looks good to me http://gerrit.cloudera.org:8080/#/c/20368/5/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/20368/5/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3738 PS5, Line 3738: Preconditions.checkState(tableName.isFullyQualified()); > TableName.fromThrift() always return a non-null result: That makes sense -- To view, visit http://gerrit.cloudera.org:8080/20368 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3ebf591625e71391a5b23f56ddca8f0ae97b1efa Gerrit-Change-Number: 20368 Gerrit-PatchSet: 5 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Tue, 29 Aug 2023 23:44:59 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12024: Add catalog profile for createTable
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/20368 ) Change subject: IMPALA-12024: Add catalog profile for createTable .. Patch Set 5: (1 comment) > Patch Set 5: > > (1 comment) > > This is very useful! Do we have plans to extend this to drop table, create > database, drop database, etc? Yeah, this is the first step. I will extend this to all DDLs after this patch is merged. http://gerrit.cloudera.org:8080/#/c/20368/5/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/20368/5/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3738 PS5, Line 3738: Preconditions.checkState(tableName.isFullyQualified()); > Nit: Why did we remove the tableName != null part? TableName.fromThrift() always return a non-null result: https://github.com/apache/impala/blob/0c8fc997ef7df09b675180a7baa1482852d60b11/fe/src/main/java/org/apache/impala/analysis/TableName.java#L150 public static TableName fromThrift(TTableName tableName) { return new TableName(tableName.getDb_name(), tableName.getTable_name()); } -- To view, visit http://gerrit.cloudera.org:8080/20368 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3ebf591625e71391a5b23f56ddca8f0ae97b1efa Gerrit-Change-Number: 20368 Gerrit-PatchSet: 5 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Tue, 29 Aug 2023 23:42:22 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12024: Add catalog profile for createTable
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/20368 ) Change subject: IMPALA-12024: Add catalog profile for createTable .. Patch Set 5: (1 comment) This is very useful! Do we have plans to extend this to drop table, create database, drop database, etc? http://gerrit.cloudera.org:8080/#/c/20368/5/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/20368/5/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3738 PS5, Line 3738: Preconditions.checkState(tableName.isFullyQualified()); Nit: Why did we remove the tableName != null part? -- To view, visit http://gerrit.cloudera.org:8080/20368 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3ebf591625e71391a5b23f56ddca8f0ae97b1efa Gerrit-Change-Number: 20368 Gerrit-PatchSet: 5 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Tue, 29 Aug 2023 17:56:38 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12024: Add catalog profile for createTable
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20368 ) Change subject: IMPALA-12024: Add catalog profile for createTable .. Patch Set 5: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/13820/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20368 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3ebf591625e71391a5b23f56ddca8f0ae97b1efa Gerrit-Change-Number: 20368 Gerrit-PatchSet: 5 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Wed, 23 Aug 2023 08:29:44 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12024: Add catalog profile for createTable
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/20368 ) Change subject: IMPALA-12024: Add catalog profile for createTable .. Patch Set 5: Added tests for creating kudu/iceberg tables -- To view, visit http://gerrit.cloudera.org:8080/20368 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3ebf591625e71391a5b23f56ddca8f0ae97b1efa Gerrit-Change-Number: 20368 Gerrit-PatchSet: 5 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Wed, 23 Aug 2023 08:08:14 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12024: Add catalog profile for createTable
Hello Riza Suminto, Joe McDonnell, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20368 to look at the new patch set (#5). Change subject: IMPALA-12024: Add catalog profile for createTable .. IMPALA-12024: Add catalog profile for createTable DDL/DMLs might spend a long time in the execution in catalogd. Currently, the profiles just have a counter of CatalogOpExecTimer. We need more items to show the details of how they are executed in catalogd. As the first step, this patch adds the profile timeline for how the createTable DDL is executed in Catalog Server. Also added more events in the existing "Query Timeline" for when the CatalogDdlRequest finished and when the catalog updates are applied. To implement this, a 'profile' field is added in TDdlExecResponse to carry the execution counters and timeline in catalogd. Currently, we just uses the timeline of it. We can add more counters in the future. Several methods add a timeline parameter to mark the progress in them. Timeline events are added after each RPC finished. Here is an example when HMS is hanging for 26s in a CTAS. I used gdb to attach to HMS as the JIRA description mentioned. In the timeline, we can see the time is spent in the first HMS RPC that fetching the current HMS event id: Catalog Server Operation: 26s560ms - Got metastoreDdlLock: 163.977us (163.977us) - Got Metastore client: 166.339us (2.362us) - Got current Metastore event id 8355270: 26s494ms (26s494ms) - Created table in Metastore: 26s558ms (63.507ms) - Fetched event batch from Metastore: 26s559ms (1.155ms) - Created table in catalog cache: 26s560ms (1.164ms) - DDL finished: 26s560ms (65.538us) Query Compilation: 164.257ms - Metadata of all 1 tables cached: 10.535ms (10.535ms) - Analysis finished: 118.324ms (107.788ms) - Authorization finished (noop): 118.489ms (164.626us) - Value transfer graph computed: 118.830ms (341.792us) - Single node plan created: 150.150ms (31.319ms) - Runtime filters computed: 150.254ms (103.529us) - Distributed plan created: 151.832ms (1.578ms) - Planning finished: 164.257ms (12.425ms) Query Timeline: 27s304ms - Query submitted: 129.658us (129.658us) - Planning finished: 170.224ms (170.095ms) - CatalogDdlRequest finished: 26s731ms (26s561ms) - Applied catalog updates from DDL: 26s740ms (8.752ms) - Submit for admission: 26s740ms (22.233us) - Completed admission: 26s740ms (286.295us) - Ready to start on 3 backends: 26s740ms (155.916us) - All 3 execution backends (3 fragment instances) started: 26s751ms (10.864ms) - Last row fetched: 26s920ms (168.226ms) - Released admission control resources: 26s920ms (27.635us) - DML data written: 26s920ms (126.369us) - Applied catalog updates from DDL: 26s985ms (65.354ms) - DML Metastore update finished: 26s985ms (30.343us) - Rows available: 26s985ms (27.050us) - Unregister query: 27s304ms (318.661ms) An example of creating a Kudu table: Catalog Server Operation: 1s730ms - Got Metastore client: 113.403us (113.403us) - Got current Metastore event id 8355276: 974.500us (861.097us) - Got Kudu client: 212.123ms (211.148ms) - Got kuduDdlLock: 212.128ms (4.680us) - Checked table existence in Kudu: 850.786ms (638.658ms) - Created table in Kudu: 1s623ms (772.379ms) - Got metastoreDdlLock: 1s623ms (397.305us) - Got Metastore client: 1s623ms (7.813us) - Checked table existence in Metastore: 1s648ms (25.154ms) - Created table in Metastore: 1s725ms (76.348ms) - Fetched event batch from Metastore: 1s728ms (3.004ms) - Created table in catalog cache: 1s730ms (2.024ms) - DDL finished: 1s730ms (84.448us) An example of creating an Iceberg table: Catalog Server Operation: 1s573ms - Got Metastore client: 141.799us (141.799us) - Checked table existence in Metastore: 2.957ms (2.815ms) - Got current Metastore event id 8355277: 3.669ms (712.475us) - Created table using Iceberg Catalog HIVE_CATALOG: 1s379ms (1s375ms) - Fetched event batch from Metastore: 1s381ms (2.188ms) - Created table in catalog cache: 1s382ms (1.556ms) - Set Iceberg table owner in Metastore: 1s573ms (190.262ms) - DDL finished: 1s573ms (59.176us) Tests: - Add e2e tests to verify the DDL timeline events exist in profile Change-Id: I3ebf591625e71391a5b23f56ddca8f0ae97b1efa --- M be/src/service/client-request-state.cc M be/src/service/impala-server.cc M be/src/service/impala-server.h M common/thrift/CatalogService.thrift M fe/src/main/java/org/apache/impala/catalog/Catalog.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apach
[Impala-ASF-CR] IMPALA-12024: Add catalog profile for createTable
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/20368 ) Change subject: IMPALA-12024: Add catalog profile for createTable .. Patch Set 4: (1 comment) Thanks for the review! http://gerrit.cloudera.org:8080/#/c/20368/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20368/4//COMMIT_MSG@33 PS4, Line 33: Created table in Metastore > nit: curious, why no existence check in Metastore before this? We can add that step in a separated patch. Currently we rely on the createTable HMS RPC to fail. The error message is not quite user-friendly. I disable event-processor and create a table in Hive, then create it again in Impala: Query: create table hive_tbl(id int) ERROR: ImpalaRuntimeException: Error making 'createTable' RPC to Hive Metastore: CAUSED BY: AlreadyExistsException: Table hive.default.hive_tbl already exists When creating the same table name in kudu format: Query: create table hive_tbl (id int, name string, primary key(id)) partition by hash(id) partitions 3 stored as kudu +---+ | summary | +---+ | Table already exists. | +---+ Fetched 1 row(s) in 1.63s We can also improve the error message to "Table already exists in Metastore". Filed a newbie JIRA: IMPALA-12396 -- To view, visit http://gerrit.cloudera.org:8080/20368 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3ebf591625e71391a5b23f56ddca8f0ae97b1efa Gerrit-Change-Number: 20368 Gerrit-PatchSet: 4 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Wed, 23 Aug 2023 02:42:15 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12024: Add catalog profile for createTable
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/20368 ) Change subject: IMPALA-12024: Add catalog profile for createTable .. Patch Set 4: Code-Review+1 (1 comment) Thank you for addressing my comments! http://gerrit.cloudera.org:8080/#/c/20368/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20368/4//COMMIT_MSG@33 PS4, Line 33: Created table in Metastore nit: curious, why no existence check in Metastore before this? -- To view, visit http://gerrit.cloudera.org:8080/20368 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3ebf591625e71391a5b23f56ddca8f0ae97b1efa Gerrit-Change-Number: 20368 Gerrit-PatchSet: 4 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Tue, 22 Aug 2023 20:19:19 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12024: Add catalog profile for createTable
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20368 ) Change subject: IMPALA-12024: Add catalog profile for createTable .. Patch Set 4: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/13797/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20368 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3ebf591625e71391a5b23f56ddca8f0ae97b1efa Gerrit-Change-Number: 20368 Gerrit-PatchSet: 4 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Tue, 22 Aug 2023 03:19:15 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12024: Add catalog profile for createTable
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/20368 ) Change subject: IMPALA-12024: Add catalog profile for createTable .. Patch Set 3: (12 comments) > Patch Set 3: > > (12 comments) > > I think it is important to make the event label self explanatory and as clear > as possible. Otherwise, an explanation should be documented somewhere for > each label. > > Some general comments: > - Replace HMS with Metastore for consistency. > - Consider turning some duplicated event label into string constant. > - Clarify the source of event. Is it Metastore vs CatalogD vs Filesystem vs > other service (like Kudu master). Thanks for the suggestions! Refactored the patch to use string constants. I was intended to write the labels differently by strings to get rid of multiple hits when searching the code. But it makes sense to normalize them as constants so we don't need to document them. Also clarified the sources. BTW, use "Kudu" instead of "Kudu master" since we treate Kudu as a system and can't distinguish what are created in Kudu master and what in Kudu tablet servers. I plan to add more tests but need to figure out a clean way of verifying the EventSequence. Found an existing bug of the JSON profile by the way: IMPALA-12391 http://gerrit.cloudera.org:8080/#/c/20368/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/20368/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3523 PS3, Line 3523: catalogTimeline.markEvent("Created Metastore table"); : } else { : catalogTimeline.markEvent("Found table exists"); > Maybe reorganize this to 2 things: existence check and new table creation: Done http://gerrit.cloudera.org:8080/#/c/20368/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3552 PS3, Line 3552: Created catalog table > nit: "Cached table in catalogd"? Done http://gerrit.cloudera.org:8080/#/c/20368/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3610 PS3, Line 3610: Created Metastore table > nit: "Created table in Metastore"? Done http://gerrit.cloudera.org:8080/#/c/20368/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3627 PS3, Line 3627: HMS > nit: "Fetched table from Metastore"? Done http://gerrit.cloudera.org:8080/#/c/20368/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3656 PS3, Line 3656: Created catalog table > nit: "Cached table in catalog"? Done http://gerrit.cloudera.org:8080/#/c/20368/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3702 PS3, Line 3702: HMS > nit: Metastore Yeah, this one is different than other fetches since it just use the createTime and not caching other things. http://gerrit.cloudera.org:8080/#/c/20368/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3786 PS3, Line 3786: Created iceberg table > nit: created in where? HMS, filesystem, or CatalogD cache? It depends on what iceberg catalog is used. Hard to say it's just HMS or filesystem.. This label generally means the invocation on iceberg createTable API finished. http://gerrit.cloudera.org:8080/#/c/20368/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3827 PS3, Line 3827: Created Metastore table > nit: dulicated String. Change to constant? Done http://gerrit.cloudera.org:8080/#/c/20368/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3834 PS3, Line 3834: Found table exists > nit: dulicated String. Change to constant? Done http://gerrit.cloudera.org:8080/#/c/20368/3/fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/20368/3/fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java@102 PS3, Line 102: Checked table exists > nit: Checked table existence Done http://gerrit.cloudera.org:8080/#/c/20368/3/fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java@342 PS3, Line 342: Checked table exists > nit: Checked table existence Done http://gerrit.cloudera.org:8080/#/c/20368/3/fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java@369 PS3, Line 369: cols > nit: columns Done -- To view, visit http://gerrit.cloudera.org:8080/20368 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3ebf591625e71391a5b23f56ddca8f0ae97b1efa Gerrit-Change-Number: 20368 Gerrit-PatchSet: 3 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Tue, 22 Aug 2023 02:54:37 + Gerrit-Ha
[Impala-ASF-CR] IMPALA-12024: Add catalog profile for createTable
Hello Riza Suminto, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20368 to look at the new patch set (#4). Change subject: IMPALA-12024: Add catalog profile for createTable .. IMPALA-12024: Add catalog profile for createTable DDL/DMLs might spend a long time in the execution in catalogd. Currently, the profiles just have a counter of CatalogOpExecTimer. We need more items to show the details of how they are executed in catalogd. As the first step, this patch adds the profile timeline for how the createTable DDL is executed in Catalog Server. Also added more events in the existing "Query Timeline" for when the CatalogDdlRequest finished and when the catalog updates are applied. To implement this, a 'profile' field is added in TDdlExecResponse to carry the execution counters and timeline in catalogd. Currently, we just uses the timeline of it. We can add more counters in the future. Several methods add a timeline parameter to mark the progress in them. Timeline events are added after each RPC finished. Here is an example when HMS is hanging for 26s in a CTAS. I used gdb to attach to HMS as the JIRA description mentioned. In the timeline, we can see the time is spent in the first HMS RPC that fetching the current HMS event id: Catalog Server Operation: 26s560ms - Got metastoreDdlLock: 163.977us (163.977us) - Got Metastore client: 166.339us (2.362us) - Got current Metastore event id 8355270: 26s494ms (26s494ms) - Created table in Metastore: 26s558ms (63.507ms) - Fetched event batch from Metastore: 26s559ms (1.155ms) - Created table in catalog cache: 26s560ms (1.164ms) - DDL finished: 26s560ms (65.538us) Query Compilation: 164.257ms - Metadata of all 1 tables cached: 10.535ms (10.535ms) - Analysis finished: 118.324ms (107.788ms) - Authorization finished (noop): 118.489ms (164.626us) - Value transfer graph computed: 118.830ms (341.792us) - Single node plan created: 150.150ms (31.319ms) - Runtime filters computed: 150.254ms (103.529us) - Distributed plan created: 151.832ms (1.578ms) - Planning finished: 164.257ms (12.425ms) Query Timeline: 27s304ms - Query submitted: 129.658us (129.658us) - Planning finished: 170.224ms (170.095ms) - CatalogDdlRequest finished: 26s731ms (26s561ms) - Applied catalog updates from DDL: 26s740ms (8.752ms) - Submit for admission: 26s740ms (22.233us) - Completed admission: 26s740ms (286.295us) - Ready to start on 3 backends: 26s740ms (155.916us) - All 3 execution backends (3 fragment instances) started: 26s751ms (10.864ms) - Last row fetched: 26s920ms (168.226ms) - Released admission control resources: 26s920ms (27.635us) - DML data written: 26s920ms (126.369us) - Applied catalog updates from DDL: 26s985ms (65.354ms) - DML Metastore update finished: 26s985ms (30.343us) - Rows available: 26s985ms (27.050us) - Unregister query: 27s304ms (318.661ms) An example of creating a Kudu table: Catalog Server Operation: 1s730ms - Got Metastore client: 113.403us (113.403us) - Got current Metastore event id 8355276: 974.500us (861.097us) - Got Kudu client: 212.123ms (211.148ms) - Got kuduDdlLock: 212.128ms (4.680us) - Checked table existence in Kudu: 850.786ms (638.658ms) - Created table in Kudu: 1s623ms (772.379ms) - Got metastoreDdlLock: 1s623ms (397.305us) - Got Metastore client: 1s623ms (7.813us) - Checked table existence in Metastore: 1s648ms (25.154ms) - Created table in Metastore: 1s725ms (76.348ms) - Fetched event batch from Metastore: 1s728ms (3.004ms) - Created table in catalog cache: 1s730ms (2.024ms) - DDL finished: 1s730ms (84.448us) An example of creating an Iceberg table: Catalog Server Operation: 1s573ms - Got Metastore client: 141.799us (141.799us) - Checked table existence in Metastore: 2.957ms (2.815ms) - Got current Metastore event id 8355277: 3.669ms (712.475us) - Created table using Iceberg Catalog HIVE_CATALOG: 1s379ms (1s375ms) - Fetched event batch from Metastore: 1s381ms (2.188ms) - Created table in catalog cache: 1s382ms (1.556ms) - Set Iceberg table owner in Metastore: 1s573ms (190.262ms) - DDL finished: 1s573ms (59.176us) Tests: - Add e2e tests to verify the DDL timeline events exist in profile Change-Id: I3ebf591625e71391a5b23f56ddca8f0ae97b1efa --- M be/src/service/client-request-state.cc M be/src/service/impala-server.cc M be/src/service/impala-server.h M common/thrift/CatalogService.thrift M fe/src/main/java/org/apache/impala/catalog/Catalog.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/servic
[Impala-ASF-CR] IMPALA-12024: Add catalog profile for createTable
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/20368 ) Change subject: IMPALA-12024: Add catalog profile for createTable .. Patch Set 3: (12 comments) I think it is important to make the event label self explanatory and as clear as possible. Otherwise, an explanation should be documented somewhere for each label. Some general comments: - Replace HMS with Metastore for consistency. - Consider turning some duplicated event label into string constant. - Clarify the source of event. Is it Metastore vs CatalogD vs Filesystem vs other service (like Kudu master). http://gerrit.cloudera.org:8080/#/c/20368/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/20368/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3523 PS3, Line 3523: catalogTimeline.markEvent("Created Metastore table"); : } else { : catalogTimeline.markEvent("Found table exists"); Maybe reorganize this to 2 things: existence check and new table creation: - Checked table existence in Metastore: xx ms - Created table in Metastore: xx ms The first one is always printed. The second one only show up in new table creation. http://gerrit.cloudera.org:8080/#/c/20368/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3552 PS3, Line 3552: Created catalog table nit: "Cached table in catalogd"? http://gerrit.cloudera.org:8080/#/c/20368/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3610 PS3, Line 3610: Created Metastore table nit: "Created table in Metastore"? This is referred twice in chis class. Should this be a constant instead? http://gerrit.cloudera.org:8080/#/c/20368/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3627 PS3, Line 3627: HMS nit: "Fetched table from Metastore"? http://gerrit.cloudera.org:8080/#/c/20368/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3656 PS3, Line 3656: Created catalog table nit: "Cached table in catalog"? This is referred twice in chis class. Should this be a constant instead? http://gerrit.cloudera.org:8080/#/c/20368/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3702 PS3, Line 3702: HMS nit: Metastore Is this label intentionally made different from the other "HMS table fetched" at line 3627? http://gerrit.cloudera.org:8080/#/c/20368/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3786 PS3, Line 3786: Created iceberg table nit: created in where? HMS, filesystem, or CatalogD cache? Why iceberg has its own specific label? http://gerrit.cloudera.org:8080/#/c/20368/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3827 PS3, Line 3827: Created Metastore table nit: dulicated String. Change to constant? http://gerrit.cloudera.org:8080/#/c/20368/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3834 PS3, Line 3834: Found table exists nit: dulicated String. Change to constant? Found in CatalogD or Metastore? http://gerrit.cloudera.org:8080/#/c/20368/3/fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/20368/3/fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java@102 PS3, Line 102: Checked table exists nit: Checked table existence Exist in where? Kudu master, HMS, or CatalogD? http://gerrit.cloudera.org:8080/#/c/20368/3/fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java@342 PS3, Line 342: Checked table exists nit: Checked table existence http://gerrit.cloudera.org:8080/#/c/20368/3/fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java@369 PS3, Line 369: cols nit: columns -- To view, visit http://gerrit.cloudera.org:8080/20368 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3ebf591625e71391a5b23f56ddca8f0ae97b1efa Gerrit-Change-Number: 20368 Gerrit-PatchSet: 3 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Mon, 21 Aug 2023 17:18:27 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12024: Add catalog profile for createTable
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20368 ) Change subject: IMPALA-12024: Add catalog profile for createTable .. Patch Set 3: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/13789/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20368 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3ebf591625e71391a5b23f56ddca8f0ae97b1efa Gerrit-Change-Number: 20368 Gerrit-PatchSet: 3 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Mon, 21 Aug 2023 14:11:12 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12024: Add catalog profile for createTable
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/20368 ) Change subject: IMPALA-12024: Add catalog profile for createTable .. Patch Set 3: (3 comments) http://gerrit.cloudera.org:8080/#/c/20368/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/20368/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@455 PS2, Line 455: createTable(ddlRequest.getCreate_table_params(), response, catalogTimeline, > line too long (94 > 90) Done http://gerrit.cloudera.org:8080/#/c/20368/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2072 PS2, Line 2072: private List getNextMetastoreEventsIfEnabled( > line too long (96 > 90) Done http://gerrit.cloudera.org:8080/#/c/20368/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@4047 PS2, Line 4047: createKuduTable(tbl, createTableParams, wantMinimalResult, response, > line too long (92 > 90) Done -- To view, visit http://gerrit.cloudera.org:8080/20368 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3ebf591625e71391a5b23f56ddca8f0ae97b1efa Gerrit-Change-Number: 20368 Gerrit-PatchSet: 3 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Mon, 21 Aug 2023 13:43:47 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12024: Add catalog profile for createTable
Hello Riza Suminto, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20368 to look at the new patch set (#3). Change subject: IMPALA-12024: Add catalog profile for createTable .. IMPALA-12024: Add catalog profile for createTable DDL/DMLs might spend a long time in the execution in catalogd. Currently, the profiles just have a counter of CatalogOpExecTimer. We need more items to show the details of how they are executed in catalogd. As the first step, this patch adds the profile timeline for how the createTable DDL is executed in Catalog Server. Also added more events in the existing "Query Timeline" for when the CatalogDdlRequest finished and when the catalog updates are applied. To implement this, a 'profile' field is added in TDdlExecResponse to carry the execution counters and timeline in catalogd. Currently, we just uses the timeline of it. We can add more counters in the future. Several methods add a timeline parameter to mark the progress in them. Timeline events are added after each RPC finished. Here is an example when HMS is hanging for 26s in a CTAS. I used gdb to attach to HMS as the JIRA description mentioned. In the timeline, we can see the time is spent in the first HMS RPC that fetching the current HMS event id: Catalog Server Operation: 26s560ms - Got metastoreDdlLock: 163.977us (163.977us) - Got Metastore client: 166.339us (2.362us) - Got current Metastore event id: 26s494ms (26s494ms) - Created Metastore table: 26s558ms (63.507ms) - Fetched Metastore event batch: 26s559ms (1.155ms) - Created catalog table: 26s560ms (1.164ms) - DDL finished: 26s560ms (65.538us) Query Compilation: 164.257ms - Metadata of all 1 tables cached: 10.535ms (10.535ms) - Analysis finished: 118.324ms (107.788ms) - Authorization finished (noop): 118.489ms (164.626us) - Value transfer graph computed: 118.830ms (341.792us) - Single node plan created: 150.150ms (31.319ms) - Runtime filters computed: 150.254ms (103.529us) - Distributed plan created: 151.832ms (1.578ms) - Planning finished: 164.257ms (12.425ms) Query Timeline: 27s304ms - Query submitted: 129.658us (129.658us) - Planning finished: 170.224ms (170.095ms) - CatalogDdlRequest finished: 26s731ms (26s561ms) - Applied catalog updates from DDL: 26s740ms (8.752ms) - Submit for admission: 26s740ms (22.233us) - Completed admission: 26s740ms (286.295us) - Ready to start on 3 backends: 26s740ms (155.916us) - All 3 execution backends (3 fragment instances) started: 26s751ms (10.864ms) - Last row fetched: 26s920ms (168.226ms) - Released admission control resources: 26s920ms (27.635us) - DML data written: 26s920ms (126.369us) - Applied catalog updates from DDL: 26s985ms (65.354ms) - DML Metastore update finished: 26s985ms (30.343us) - Rows available: 26s985ms (27.050us) - Unregister query: 27s304ms (318.661ms) Tests: - Add e2e tests to verify the DDL timeline events exist in profile Change-Id: I3ebf591625e71391a5b23f56ddca8f0ae97b1efa --- M be/src/service/client-request-state.cc M be/src/service/impala-server.cc M be/src/service/impala-server.h M common/thrift/CatalogService.thrift M fe/src/main/java/org/apache/impala/catalog/Catalog.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/Frontend.java M fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java M tests/query_test/test_observability.py 9 files changed, 193 insertions(+), 73 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/20368/3 -- To view, visit http://gerrit.cloudera.org:8080/20368 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3ebf591625e71391a5b23f56ddca8f0ae97b1efa Gerrit-Change-Number: 20368 Gerrit-PatchSet: 3 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto
[Impala-ASF-CR] IMPALA-12024: Add catalog profile for createTable
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20368 ) Change subject: IMPALA-12024: Add catalog profile for createTable .. Patch Set 2: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/13784/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20368 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3ebf591625e71391a5b23f56ddca8f0ae97b1efa Gerrit-Change-Number: 20368 Gerrit-PatchSet: 2 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Sun, 20 Aug 2023 14:12:00 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12024: Add catalog profile for createTable
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/20368 ) Change subject: IMPALA-12024: Add catalog profile for createTable .. Patch Set 1: (3 comments) > Patch Set 1: > > (3 comments) > > It looks OK to me. > > However, there are so many new event labels introduced here and it feels very > verbose. Some label can appear in one condition and missing in other > condition. I wonder if it can be reduced into smaller set that can be > displayed more consistently. > > Any thoughts on changing some of the minor one into LOG or counter? Or > perhaps write it to coordinator log instead of query profile? I understand > this patch is meant to gather CatalogD timing info in one place. Yeah, a lot of event labels are introduced in this patch. I try to add event label for each external RPC and lock acquisition since they are usually the bottleneck of catalog operations. For switching some events to be counters or logs, I think there are some cons. For counters: * We already have lots of counters in the profile. I need to search them. * Counters always exist in the profile even if the code branch doesn't run. * Users don't know which step happens first if we just provide counters. Timeline events have the order which also helps users/beginers to learn Impala. For logs: * We need to manually compute the elapsed time between two steps unless we explicitly log them. * We need to collect catalogd and coordinator logs for these details events when the profile can't explain the issue. Logs are easier to be rolled out. So it'd be nice if we can provide the execution details in one place. Users just need to provide us the profile. BTW, current DDL profiles are usually short (except CTAS) so I think it's ok to add detailed timeline in them :) http://gerrit.cloudera.org:8080/#/c/20368/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20368/1//COMMIT_MSG@31 PS1, Line 31: HMS > nit: maybe spell it out to "Metastore"? Done. Changed "HMS" to "Metastore" in this timeline. http://gerrit.cloudera.org:8080/#/c/20368/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/20368/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@400 PS1, Line 400: CatalogOpExecution > nit: Is this the same as "Catalog Server Operation"? This string is not used since I hard-coded the name in ClientRequestState::ExecDdlRequestImpl(). Changed the code to use this and change this to "Catalog Server Operation". http://gerrit.cloudera.org:8080/#/c/20368/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2255 PS1, Line 2255: timeline > nit: how about using more specific name like catalogEvents or catalogTimeli Done -- To view, visit http://gerrit.cloudera.org:8080/20368 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3ebf591625e71391a5b23f56ddca8f0ae97b1efa Gerrit-Change-Number: 20368 Gerrit-PatchSet: 1 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Sun, 20 Aug 2023 13:47:30 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12024: Add catalog profile for createTable
Hello Riza Suminto, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20368 to look at the new patch set (#2). Change subject: IMPALA-12024: Add catalog profile for createTable .. IMPALA-12024: Add catalog profile for createTable DDL/DMLs might spend a long time in the execution in catalogd. Currently, the profiles just have a counter of CatalogOpExecTimer. We need more items to show the details of how they are executed in catalogd. As the first step, this patch adds the profile timeline for how the createTable DDL is executed in Catalog Server. Also added more events in the existing "Query Timeline" for when the CatalogDdlRequest finished and when the catalog updates are applied. To implement this, a 'profile' field is added in TDdlExecResponse to carry the execution counters and timeline in catalogd. Currently, we just uses the timeline of it. We can add more counters in the future. Several methods add a timeline parameter to mark the progress in them. Timeline events are added after each RPC finished. Here is an example when HMS is hanging for 26s in a CTAS. I used gdb to attach to HMS as the JIRA description mentioned. In the timeline, we can see the time is spent in the first HMS RPC that fetching the current HMS event id: Catalog Server Operation: 26s560ms - Got metastoreDdlLock: 163.977us (163.977us) - Got Metastore client: 166.339us (2.362us) - Got current Metastore event id: 26s494ms (26s494ms) - Created Metastore table: 26s558ms (63.507ms) - Fetched Metastore event batch: 26s559ms (1.155ms) - Created catalog table: 26s560ms (1.164ms) - DDL finished: 26s560ms (65.538us) Query Compilation: 164.257ms - Metadata of all 1 tables cached: 10.535ms (10.535ms) - Analysis finished: 118.324ms (107.788ms) - Authorization finished (noop): 118.489ms (164.626us) - Value transfer graph computed: 118.830ms (341.792us) - Single node plan created: 150.150ms (31.319ms) - Runtime filters computed: 150.254ms (103.529us) - Distributed plan created: 151.832ms (1.578ms) - Planning finished: 164.257ms (12.425ms) Query Timeline: 27s304ms - Query submitted: 129.658us (129.658us) - Planning finished: 170.224ms (170.095ms) - CatalogDdlRequest finished: 26s731ms (26s561ms) - Applied catalog updates from DDL: 26s740ms (8.752ms) - Submit for admission: 26s740ms (22.233us) - Completed admission: 26s740ms (286.295us) - Ready to start on 3 backends: 26s740ms (155.916us) - All 3 execution backends (3 fragment instances) started: 26s751ms (10.864ms) - Last row fetched: 26s920ms (168.226ms) - Released admission control resources: 26s920ms (27.635us) - DML data written: 26s920ms (126.369us) - Applied catalog updates from DDL: 26s985ms (65.354ms) - DML Metastore update finished: 26s985ms (30.343us) - Rows available: 26s985ms (27.050us) - Unregister query: 27s304ms (318.661ms) Tests: - Add e2e tests to verify the DDL timeline events exist in profile Change-Id: I3ebf591625e71391a5b23f56ddca8f0ae97b1efa --- M be/src/service/client-request-state.cc M be/src/service/impala-server.cc M be/src/service/impala-server.h M common/thrift/CatalogService.thrift M fe/src/main/java/org/apache/impala/catalog/Catalog.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/Frontend.java M fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java M tests/query_test/test_observability.py 9 files changed, 191 insertions(+), 72 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/20368/2 -- To view, visit http://gerrit.cloudera.org:8080/20368 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3ebf591625e71391a5b23f56ddca8f0ae97b1efa Gerrit-Change-Number: 20368 Gerrit-PatchSet: 2 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto
[Impala-ASF-CR] IMPALA-12024: Add catalog profile for createTable
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20368 ) Change subject: IMPALA-12024: Add catalog profile for createTable .. Patch Set 2: (3 comments) http://gerrit.cloudera.org:8080/#/c/20368/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/20368/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@455 PS2, Line 455: createTable(ddlRequest.getCreate_table_params(), response, catalogTimeline, syncDdl, line too long (94 > 90) http://gerrit.cloudera.org:8080/#/c/20368/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2072 PS2, Line 2072: private List getNextMetastoreEventsIfEnabled(EventSequence catalogTimeline, line too long (96 > 90) http://gerrit.cloudera.org:8080/#/c/20368/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@4047 PS2, Line 4047: createKuduTable(tbl, createTableParams, wantMinimalResult, response, catalogTimeline); line too long (92 > 90) -- To view, visit http://gerrit.cloudera.org:8080/20368 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3ebf591625e71391a5b23f56ddca8f0ae97b1efa Gerrit-Change-Number: 20368 Gerrit-PatchSet: 2 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Sun, 20 Aug 2023 13:47:48 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12024: Add catalog profile for createTable
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/20368 ) Change subject: IMPALA-12024: Add catalog profile for createTable .. Patch Set 1: (3 comments) It looks OK to me. However, there are so many new event labels introduced here and it feels very verbose. Some label can appear in one condition and missing in other condition. I wonder if it can be reduced into smaller set that can be displayed more consistently. Any thoughts on changing some of the minor one into LOG or counter? Or perhaps write it to coordinator log instead of query profile? I understand this patch is meant to gather CatalogD timing info in one place. http://gerrit.cloudera.org:8080/#/c/20368/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20368/1//COMMIT_MSG@31 PS1, Line 31: HMS nit: maybe spell it out to "Metastore"? http://gerrit.cloudera.org:8080/#/c/20368/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/20368/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@400 PS1, Line 400: CatalogOpExecution nit: Is this the same as "Catalog Server Operation"? http://gerrit.cloudera.org:8080/#/c/20368/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2255 PS1, Line 2255: timeline nit: how about using more specific name like catalogEvents or catalogTimeline? Same comment for the other places. -- To view, visit http://gerrit.cloudera.org:8080/20368 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3ebf591625e71391a5b23f56ddca8f0ae97b1efa Gerrit-Change-Number: 20368 Gerrit-PatchSet: 1 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Thu, 17 Aug 2023 23:45:56 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12024: Add catalog profile for createTable
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20368 ) Change subject: IMPALA-12024: Add catalog profile for createTable .. Patch Set 1: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/20368 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3ebf591625e71391a5b23f56ddca8f0ae97b1efa Gerrit-Change-Number: 20368 Gerrit-PatchSet: 1 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 16 Aug 2023 12:00:11 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12024: Add catalog profile for createTable
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20368 ) Change subject: IMPALA-12024: Add catalog profile for createTable .. Patch Set 1: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/9602/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/20368 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3ebf591625e71391a5b23f56ddca8f0ae97b1efa Gerrit-Change-Number: 20368 Gerrit-PatchSet: 1 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 16 Aug 2023 07:52:27 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12024: Add catalog profile for createTable
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20368 ) Change subject: IMPALA-12024: Add catalog profile for createTable .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/13762/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20368 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3ebf591625e71391a5b23f56ddca8f0ae97b1efa Gerrit-Change-Number: 20368 Gerrit-PatchSet: 1 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 16 Aug 2023 07:37:59 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12024: Add catalog profile for createTable
Quanlong Huang has uploaded this change for review. ( http://gerrit.cloudera.org:8080/20368 Change subject: IMPALA-12024: Add catalog profile for createTable .. IMPALA-12024: Add catalog profile for createTable DDL/DMLs might spend a long time in the execution in catalogd. Currently, the profiles just have a counter of CatalogOpExecTimer. We need more items to show the details of how they are executed in catalogd. As the first step, this patch adds the profile timeline for how the createTable DDL is executed in Catalog Server. Also added more events in the existing "Query Timeline" for when the CatalogDdlRequest finished and when the catalog updates are applied. To implement this, a 'profile' field is added in TDdlExecResponse to carry the execution counters and timeline in catalogd. Currently, we just uses the timeline of it. We can add more counters in the future. Several methods add a timeline parameter to mark the progress in them. Timeline events are added after each RPC finished. Here is an example when HMS is hanging for 26s in a CTAS. I used gdb to attach to HMS as the JIRA description mentioned. In the timeline, we can see the time is spent in the first HMS RPC that fetching the current HMS event id: Catalog Server Operation: 26s560ms - Got metastoreDdlLock: 163.977us (163.977us) - Got HMS client: 166.339us (2.362us) - Got current HMS event id: 26s494ms (26s494ms) - Created HMS table: 26s558ms (63.507ms) - Fetched HMS event batch: 26s559ms (1.155ms) - Created catalog table: 26s560ms (1.164ms) - DDL finished: 26s560ms (65.538us) Query Compilation: 164.257ms - Metadata of all 1 tables cached: 10.535ms (10.535ms) - Analysis finished: 118.324ms (107.788ms) - Authorization finished (noop): 118.489ms (164.626us) - Value transfer graph computed: 118.830ms (341.792us) - Single node plan created: 150.150ms (31.319ms) - Runtime filters computed: 150.254ms (103.529us) - Distributed plan created: 151.832ms (1.578ms) - Planning finished: 164.257ms (12.425ms) Query Timeline: 27s304ms - Query submitted: 129.658us (129.658us) - Planning finished: 170.224ms (170.095ms) - CatalogDdlRequest finished: 26s731ms (26s561ms) - Applied catalog updates from DDL: 26s740ms (8.752ms) - Submit for admission: 26s740ms (22.233us) - Completed admission: 26s740ms (286.295us) - Ready to start on 3 backends: 26s740ms (155.916us) - All 3 execution backends (3 fragment instances) started: 26s751ms (10.864ms) - Last row fetched: 26s920ms (168.226ms) - Released admission control resources: 26s920ms (27.635us) - DML data written: 26s920ms (126.369us) - Applied catalog updates from DDL: 26s985ms (65.354ms) - DML Metastore update finished: 26s985ms (30.343us) - Rows available: 26s985ms (27.050us) - Unregister query: 27s304ms (318.661ms) Tests: - Add e2e tests to verify the DDL timeline events exist in profile Change-Id: I3ebf591625e71391a5b23f56ddca8f0ae97b1efa --- M be/src/service/client-request-state.cc M be/src/service/impala-server.cc M be/src/service/impala-server.h M common/thrift/CatalogService.thrift M fe/src/main/java/org/apache/impala/catalog/Catalog.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/Frontend.java M fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java M tests/query_test/test_observability.py 9 files changed, 182 insertions(+), 68 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/20368/1 -- To view, visit http://gerrit.cloudera.org:8080/20368 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I3ebf591625e71391a5b23f56ddca8f0ae97b1efa Gerrit-Change-Number: 20368 Gerrit-PatchSet: 1 Gerrit-Owner: Quanlong Huang