[Impala-ASF-CR] IMPALA-10317: Add query option that limits huge joins at runtime
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/16706 ) Change subject: IMPALA-10317: Add query option that limits huge joins at runtime .. IMPALA-10317: Add query option that limits huge joins at runtime This patch adds support for limiting the rows produced by a join node such that runaway join queries can be prevented. The limit is specified by a query option. Queries exceeding that limit get terminated. The checking runs periodically, so the actual rows produced may go somewhat over the limit. JOIN_ROWS_PRODUCED_LIMIT is exposed as an advanced query option. Rows produced Query profile is updated to include query wide and per backend metrics for RowsReturned. Example from " set JOIN_ROWS_PRODUCED_LIMIT = 1000; select count(*) from tpch_parquet.lineitem l1 cross join (select * from tpch_parquet.lineitem l2 limit 5) l3;": NESTED_LOOP_JOIN_NODE (id=2): - InactiveTotalTime: 107.534ms - PeakMemoryUsage: 16.00 KB (16384) - ProbeRows: 1.02K (1024) - ProbeTime: 0.000ns - RowsReturned: 10.00M (10002025) - RowsReturnedRate: 749.58 K/sec - TotalTime: 13s337ms Testing: Added tests for JOIN_ROWS_PRODUCED_LIMIT Change-Id: Idbca7e053b61b4e31b066edcfb3b0398fa859d02 Reviewed-on: http://gerrit.cloudera.org:8080/16706 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- M be/src/runtime/coordinator-backend-state.cc M be/src/runtime/coordinator.h M be/src/runtime/fragment-instance-state.cc M be/src/runtime/fragment-instance-state.h M be/src/runtime/query-state.cc M be/src/service/impala-server.cc M be/src/service/query-options-test.cc M be/src/service/query-options.cc M be/src/service/query-options.h M common/protobuf/control_service.proto M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M common/thrift/generate_error_codes.py M testdata/workloads/functional-query/queries/QueryTest/query-resource-limits.test 14 files changed, 167 insertions(+), 5 deletions(-) Approvals: Impala Public Jenkins: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/16706 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: Idbca7e053b61b4e31b066edcfb3b0398fa859d02 Gerrit-Change-Number: 16706 Gerrit-PatchSet: 8 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-10317: Add query option that limits huge joins at runtime
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16706 ) Change subject: IMPALA-10317: Add query option that limits huge joins at runtime .. Patch Set 7: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/16706 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idbca7e053b61b4e31b066edcfb3b0398fa859d02 Gerrit-Change-Number: 16706 Gerrit-PatchSet: 7 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 22 Dec 2020 06:10:38 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10317: Add query option that limits huge joins at runtime
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16706 ) Change subject: IMPALA-10317: Add query option that limits huge joins at runtime .. Patch Set 7: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/16706 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idbca7e053b61b4e31b066edcfb3b0398fa859d02 Gerrit-Change-Number: 16706 Gerrit-PatchSet: 7 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 22 Dec 2020 00:31:24 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10317: Add query option that limits huge joins at runtime
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16706 ) Change subject: IMPALA-10317: Add query option that limits huge joins at runtime .. Patch Set 7: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6799/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/16706 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idbca7e053b61b4e31b066edcfb3b0398fa859d02 Gerrit-Change-Number: 16706 Gerrit-PatchSet: 7 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 22 Dec 2020 00:31:25 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10317: Add query option that limits huge joins at runtime
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/16706 ) Change subject: IMPALA-10317: Add query option that limits huge joins at runtime .. Patch Set 6: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/16706 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idbca7e053b61b4e31b066edcfb3b0398fa859d02 Gerrit-Change-Number: 16706 Gerrit-PatchSet: 6 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 22 Dec 2020 00:27:59 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10317: Add query option that limits huge joins at runtime
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16706 ) Change subject: IMPALA-10317: Add query option that limits huge joins at runtime .. Patch Set 6: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7885/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16706 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idbca7e053b61b4e31b066edcfb3b0398fa859d02 Gerrit-Change-Number: 16706 Gerrit-PatchSet: 6 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Mon, 21 Dec 2020 15:06:06 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10317: Add query option that limits huge joins at runtime
Fucun Chu has posted comments on this change. ( http://gerrit.cloudera.org:8080/16706 ) Change subject: IMPALA-10317: Add query option that limits huge joins at runtime .. Patch Set 6: (3 comments) http://gerrit.cloudera.org:8080/#/c/16706/5//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16706/5//COMMIT_MSG@9 PS5, Line 9: a join no > You may want to mention why it is useful even though it may be somewhat obv Done http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/runtime/coordinator.h File be/src/runtime/coordinator.h: http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/runtime/coordinator.h@230 PS3, Line 230: /// Total num rows produced by each join node. The key is join node id. > Good ! Done http://gerrit.cloudera.org:8080/#/c/16706/5/common/protobuf/control_service.proto File common/protobuf/control_service.proto: http://gerrit.cloudera.org:8080/#/c/16706/5/common/protobuf/control_service.proto@261 PS5, Line 261: map per_join_rows_produced = 16; > Mark this as 'optional' field ? Maps cannot be repeated, optional, or required. from: https://developers.google.com/protocol-buffers/docs/proto#maps -- To view, visit http://gerrit.cloudera.org:8080/16706 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idbca7e053b61b4e31b066edcfb3b0398fa859d02 Gerrit-Change-Number: 16706 Gerrit-PatchSet: 6 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Mon, 21 Dec 2020 14:44:58 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10317: Add query option that limits huge joins at runtime
Fucun Chu has uploaded a new patch set (#6). ( http://gerrit.cloudera.org:8080/16706 ) Change subject: IMPALA-10317: Add query option that limits huge joins at runtime .. IMPALA-10317: Add query option that limits huge joins at runtime This patch adds support for limiting the rows produced by a join node such that runaway join queries can be prevented. The limit is specified by a query option. Queries exceeding that limit get terminated. The checking runs periodically, so the actual rows produced may go somewhat over the limit. JOIN_ROWS_PRODUCED_LIMIT is exposed as an advanced query option. Rows produced Query profile is updated to include query wide and per backend metrics for RowsReturned. Example from " set JOIN_ROWS_PRODUCED_LIMIT = 1000; select count(*) from tpch_parquet.lineitem l1 cross join (select * from tpch_parquet.lineitem l2 limit 5) l3;": NESTED_LOOP_JOIN_NODE (id=2): - InactiveTotalTime: 107.534ms - PeakMemoryUsage: 16.00 KB (16384) - ProbeRows: 1.02K (1024) - ProbeTime: 0.000ns - RowsReturned: 10.00M (10002025) - RowsReturnedRate: 749.58 K/sec - TotalTime: 13s337ms Testing: Added tests for JOIN_ROWS_PRODUCED_LIMIT Change-Id: Idbca7e053b61b4e31b066edcfb3b0398fa859d02 --- M be/src/runtime/coordinator-backend-state.cc M be/src/runtime/coordinator.h M be/src/runtime/fragment-instance-state.cc M be/src/runtime/fragment-instance-state.h M be/src/runtime/query-state.cc M be/src/service/impala-server.cc M be/src/service/query-options-test.cc M be/src/service/query-options.cc M be/src/service/query-options.h M common/protobuf/control_service.proto M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M common/thrift/generate_error_codes.py M testdata/workloads/functional-query/queries/QueryTest/query-resource-limits.test 14 files changed, 167 insertions(+), 5 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/16706/6 -- To view, visit http://gerrit.cloudera.org:8080/16706 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Idbca7e053b61b4e31b066edcfb3b0398fa859d02 Gerrit-Change-Number: 16706 Gerrit-PatchSet: 6 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-10317: Add query option that limits huge joins at runtime
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/16706 ) Change subject: IMPALA-10317: Add query option that limits huge joins at runtime .. Patch Set 5: (3 comments) Looking good. Just couple of minor comments. http://gerrit.cloudera.org:8080/#/c/16706/5//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16706/5//COMMIT_MSG@9 PS5, Line 9: Join node You may want to mention why it is useful even though it may be somewhat obvious. Something like '..produced by a join node such that runaway join queries can be prevented.' http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/runtime/coordinator.h File be/src/runtime/coordinator.h: http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/runtime/coordinator.h@230 PS3, Line 230: /// Total num rows produced by each join node. The key is join node id. > The key has been changed to node id Good ! http://gerrit.cloudera.org:8080/#/c/16706/5/common/protobuf/control_service.proto File common/protobuf/control_service.proto: http://gerrit.cloudera.org:8080/#/c/16706/5/common/protobuf/control_service.proto@261 PS5, Line 261: map per_join_rows_produced = 16; Mark this as 'optional' field ? -- To view, visit http://gerrit.cloudera.org:8080/16706 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idbca7e053b61b4e31b066edcfb3b0398fa859d02 Gerrit-Change-Number: 16706 Gerrit-PatchSet: 5 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Mon, 21 Dec 2020 07:53:47 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10317: Add query option that limits huge joins at runtime
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16706 ) Change subject: IMPALA-10317: Add query option that limits huge joins at runtime .. Patch Set 5: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7882/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16706 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idbca7e053b61b4e31b066edcfb3b0398fa859d02 Gerrit-Change-Number: 16706 Gerrit-PatchSet: 5 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Sun, 20 Dec 2020 04:49:55 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10317: Add query option that limits huge joins at runtime
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16706 ) Change subject: IMPALA-10317: Add query option that limits huge joins at runtime .. Patch Set 4: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7881/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16706 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idbca7e053b61b4e31b066edcfb3b0398fa859d02 Gerrit-Change-Number: 16706 Gerrit-PatchSet: 4 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Sun, 20 Dec 2020 04:33:24 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10317: Add query option that limits huge joins at runtime
Fucun Chu has posted comments on this change. ( http://gerrit.cloudera.org:8080/16706 ) Change subject: IMPALA-10317: Add query option that limits huge joins at runtime .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/16706/4/common/thrift/generate_error_codes.py File common/thrift/generate_error_codes.py: http://gerrit.cloudera.org:8080/#/c/16706/4/common/thrift/generate_error_codes.py@468 PS4, Line 468: > flake8: E501 line too long (95 > 90 characters) Done -- To view, visit http://gerrit.cloudera.org:8080/16706 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idbca7e053b61b4e31b066edcfb3b0398fa859d02 Gerrit-Change-Number: 16706 Gerrit-PatchSet: 5 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Sun, 20 Dec 2020 04:26:31 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10317: Add query option that limits huge joins at runtime
Fucun Chu has uploaded a new patch set (#5). ( http://gerrit.cloudera.org:8080/16706 ) Change subject: IMPALA-10317: Add query option that limits huge joins at runtime .. IMPALA-10317: Add query option that limits huge joins at runtime This patch adds support for limiting the rows produced by Join node. The limit is specified by a query option. Queries exceeding that limit get terminated. The checking runs periodically, so the actual rows produced may go somewhat over the limit. JOIN_ROWS_PRODUCED_LIMIT is exposed as an advanced query option. Rows produced Query profile is updated to include query wide and per backend metrics for RowsReturned. Example from " set JOIN_ROWS_PRODUCED_LIMIT = 1000; select count(*) from tpch_parquet.lineitem l1 cross join (select * from tpch_parquet.lineitem l2 limit 5) l3;": NESTED_LOOP_JOIN_NODE (id=2): - InactiveTotalTime: 107.534ms - PeakMemoryUsage: 16.00 KB (16384) - ProbeRows: 1.02K (1024) - ProbeTime: 0.000ns - RowsReturned: 10.00M (10002025) - RowsReturnedRate: 749.58 K/sec - TotalTime: 13s337ms Testing: Added tests for JOIN_ROWS_PRODUCED_LIMIT Change-Id: Idbca7e053b61b4e31b066edcfb3b0398fa859d02 --- M be/src/runtime/coordinator-backend-state.cc M be/src/runtime/coordinator.h M be/src/runtime/fragment-instance-state.cc M be/src/runtime/fragment-instance-state.h M be/src/runtime/query-state.cc M be/src/service/impala-server.cc M be/src/service/query-options-test.cc M be/src/service/query-options.cc M be/src/service/query-options.h M common/protobuf/control_service.proto M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M common/thrift/generate_error_codes.py M testdata/workloads/functional-query/queries/QueryTest/query-resource-limits.test 14 files changed, 167 insertions(+), 5 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/16706/5 -- To view, visit http://gerrit.cloudera.org:8080/16706 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Idbca7e053b61b4e31b066edcfb3b0398fa859d02 Gerrit-Change-Number: 16706 Gerrit-PatchSet: 5 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-10317: Add query option that limits huge joins at runtime
Fucun Chu has posted comments on this change. ( http://gerrit.cloudera.org:8080/16706 ) Change subject: IMPALA-10317: Add query option that limits huge joins at runtime .. Patch Set 4: (12 comments) Thanks for the review! Addressed the comments. http://gerrit.cloudera.org:8080/#/c/16706/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16706/3//COMMIT_MSG@11 PS3, Line 11: exceed > nit: 'exceeding' instead of exceed Done http://gerrit.cloudera.org:8080/#/c/16706/3//COMMIT_MSG@18 PS3, Line 18: backend metrics for RowsReturned. Example from " > For this example, can you also add the query option ? I assume it was set t Done http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/runtime/coordinator.h File be/src/runtime/coordinator.h: http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/runtime/coordinator.h@230 PS3, Line 230: /// Total num rows produced by each join node. The key is join node id. > Pls indicate what the key of the map is. I am also wondering whether the st The key has been changed to node id http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/runtime/coordinator.h@231 PS3, Line 231: join_rows_produced; > This could more simply be named 'per_join_rows_produced' (see similar comm Done http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/runtime/fragment-instance-state.cc File be/src/runtime/fragment-instance-state.cc: http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/runtime/fragment-instance-state.cc@331 PS3, Line 331: // row count stats for a join node > nit: this comment is a bit confusing. Suggest rephrasing to 'row count sta Done http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/runtime/fragment-instance-state.cc@335 PS3, Line 335: rfind(nested_loop_type > A bit odd that for hash_type you are passing a second argument of 0 while n To implement the startsWith function, the two variable judgments are the same. Done http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/runtime/fragment-instance-state.cc@336 PS3, Line 336: per_join_rows_produced[node->metadata().plan_node_id] = rows_counter->value(); > Is there not a unique numeric id that can be used here instead of the node Done http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/service/impala-server.cc File be/src/service/impala-server.cc: http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/service/impala-server.cc@2688 PS3, Line 2688: Status err = Status::Expected(TErrorCode::JOIN_ROWS_PRODUCED_LIMIT_EXCEEDED, > it would be useful to know which join node (with the node id) in a complex Done http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/service/query-options.cc File be/src/service/query-options.cc: http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/service/query-options.cc@991 PS3, Line 991: case TImpalaQueryOptions::USE_DOP_FOR_COSTING: { > I think the prefix 'NUM' can be removed and simplify the naming. JOIN_ROWS Done http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/service/query-options.cc@996 PS3, Line 996: StringParser::ParseResult result; > 'rows returned limit' doesn't match the name of the query option. Should th Done http://gerrit.cloudera.org:8080/#/c/16706/3/testdata/workloads/functional-query/queries/QueryTest/query-resource-limits.test File testdata/workloads/functional-query/queries/QueryTest/query-resource-limits.test: http://gerrit.cloudera.org:8080/#/c/16706/3/testdata/workloads/functional-query/queries/QueryTest/query-resource-limits.test@71 PS3, Line 71: QUERY > Suggest adding a test with a mix of joins ..i.e couple of hash joins and a Done http://gerrit.cloudera.org:8080/#/c/16706/3/testdata/workloads/functional-query/queries/QueryTest/query-resource-limits.test@87 PS3, Line 87: in > nit: remove trailing whitespace Done -- To view, visit http://gerrit.cloudera.org:8080/16706 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idbca7e053b61b4e31b066edcfb3b0398fa859d02 Gerrit-Change-Number: 16706 Gerrit-PatchSet: 4 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Sun, 20 Dec 2020 04:12:02 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10317: Add query option that limits huge joins at runtime
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16706 ) Change subject: IMPALA-10317: Add query option that limits huge joins at runtime .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/16706/4/common/thrift/generate_error_codes.py File common/thrift/generate_error_codes.py: http://gerrit.cloudera.org:8080/#/c/16706/4/common/thrift/generate_error_codes.py@468 PS4, Line 468: $ flake8: E501 line too long (95 > 90 characters) -- To view, visit http://gerrit.cloudera.org:8080/16706 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idbca7e053b61b4e31b066edcfb3b0398fa859d02 Gerrit-Change-Number: 16706 Gerrit-PatchSet: 4 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Sun, 20 Dec 2020 04:12:03 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10317: Add query option that limits huge joins at runtime
Fucun Chu has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/16706 ) Change subject: IMPALA-10317: Add query option that limits huge joins at runtime .. IMPALA-10317: Add query option that limits huge joins at runtime This patch adds support for limiting the rows produced by Join node. The limit is specified by a query option. Queries exceeding that limit get terminated. The checking runs periodically, so the actual rows produced may go somewhat over the limit. JOIN_ROWS_PRODUCED_LIMIT is exposed as an advanced query option. Rows produced Query profile is updated to include query wide and per backend metrics for RowsReturned. Example from " set JOIN_ROWS_PRODUCED_LIMIT = 1000; select count(*) from tpch_parquet.lineitem l1 cross join (select * from tpch_parquet.lineitem l2 limit 5) l3;": NESTED_LOOP_JOIN_NODE (id=2): - InactiveTotalTime: 107.534ms - PeakMemoryUsage: 16.00 KB (16384) - ProbeRows: 1.02K (1024) - ProbeTime: 0.000ns - RowsReturned: 10.00M (10002025) - RowsReturnedRate: 749.58 K/sec - TotalTime: 13s337ms Testing: Added tests for JOIN_ROWS_PRODUCED_LIMIT Change-Id: Idbca7e053b61b4e31b066edcfb3b0398fa859d02 --- M be/src/runtime/coordinator-backend-state.cc M be/src/runtime/coordinator.h M be/src/runtime/fragment-instance-state.cc M be/src/runtime/fragment-instance-state.h M be/src/runtime/query-state.cc M be/src/service/impala-server.cc M be/src/service/query-options-test.cc M be/src/service/query-options.cc M be/src/service/query-options.h M common/protobuf/control_service.proto M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M common/thrift/generate_error_codes.py M testdata/workloads/functional-query/queries/QueryTest/query-resource-limits.test 14 files changed, 166 insertions(+), 5 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/16706/4 -- To view, visit http://gerrit.cloudera.org:8080/16706 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Idbca7e053b61b4e31b066edcfb3b0398fa859d02 Gerrit-Change-Number: 16706 Gerrit-PatchSet: 4 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-10317: Add query option that limits huge joins at runtime
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/16706 ) Change subject: IMPALA-10317: Add query option that limits huge joins at runtime .. Patch Set 3: (12 comments) http://gerrit.cloudera.org:8080/#/c/16706/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16706/3//COMMIT_MSG@11 PS3, Line 11: exceed nit: 'exceeding' instead of exceed http://gerrit.cloudera.org:8080/#/c/16706/3//COMMIT_MSG@18 PS3, Line 18: backend metrics for RowsReturned. Example from "select count(*) from For this example, can you also add the query option ? I assume it was set to 10M ? http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/runtime/coordinator.h File be/src/runtime/coordinator.h: http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/runtime/coordinator.h@230 PS3, Line 230: /// Total num rows produced by each join node Pls indicate what the key of the map is. I am also wondering whether the string (which is the node name) is the only way to identify the node. Is there no numeric id that you can use ? http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/runtime/coordinator.h@231 PS3, Line 231: per_num_join_rows_produced This could more simply be named 'per_join_rows_produced' (see similar comment elsewhere on the option name) http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/runtime/fragment-instance-state.cc File be/src/runtime/fragment-instance-state.cc: http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/runtime/fragment-instance-state.cc@331 PS3, Line 331: // stats join node nit: this comment is a bit confusing. Suggest rephrasing to 'row count stats for a join node' http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/runtime/fragment-instance-state.cc@335 PS3, Line 335: rfind(nested_loop_type A bit odd that for hash_type you are passing a second argument of 0 while not doing the same for nested_loop_type ? If there's a reason for that it would be good to add a comment. http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/runtime/fragment-instance-state.cc@336 PS3, Line 336: per_num_join_rows_produced[node->name()] = rows_counter->value(); Is there not a unique numeric id that can be used here instead of the node name ? (related comment in coordinator.h) http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/service/impala-server.cc File be/src/service/impala-server.cc: http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/service/impala-server.cc@2688 PS3, Line 2688: Status err = Status::Expected(TErrorCode::JOIN_ROWS_PRODUCED_LIMIT_EXCEEDED, it would be useful to know which join node (with the node id) in a complex plan exceeded this limit. Is it possible to print that info in the error message ? http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/service/query-options.cc File be/src/service/query-options.cc: http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/service/query-options.cc@991 PS3, Line 991: case TImpalaQueryOptions::NUM_JOIN_ROWS_PRODUCED_LIMIT: { I think the prefix 'NUM' can be removed and simplify the naming. JOIN_ROWS_PRODUCED_LIMIT gives sufficient information that it is about the number of rows. Similar simplification of the variable names can be done in other parts of the code. http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/service/query-options.cc@996 PS3, Line 996: return Status(Substitute("Invalid rows returned limit: '$0'. " 'rows returned limit' doesn't match the name of the query option. Should this be 'Invalid join rows produced limit'? http://gerrit.cloudera.org:8080/#/c/16706/3/testdata/workloads/functional-query/queries/QueryTest/query-resource-limits.test File testdata/workloads/functional-query/queries/QueryTest/query-resource-limits.test: http://gerrit.cloudera.org:8080/#/c/16706/3/testdata/workloads/functional-query/queries/QueryTest/query-resource-limits.test@71 PS3, Line 71: QUERY Suggest adding a test with a mix of joins ..i.e couple of hash joins and a nested loop join where one or more of these might exceed the threshold. http://gerrit.cloudera.org:8080/#/c/16706/3/testdata/workloads/functional-query/queries/QueryTest/query-resource-limits.test@87 PS3, Line 87: in nit: remove trailing whitespace -- To view, visit http://gerrit.cloudera.org:8080/16706 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idbca7e053b61b4e31b066edcfb3b0398fa859d02 Gerrit-Change-Number: 16706 Gerrit-PatchSet: 3 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 16 Dec 2020 08:11:26 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10317: Add query option that limits huge joins at runtime
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16706 ) Change subject: IMPALA-10317: Add query option that limits huge joins at runtime .. Patch Set 3: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7747/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16706 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idbca7e053b61b4e31b066edcfb3b0398fa859d02 Gerrit-Change-Number: 16706 Gerrit-PatchSet: 3 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Mon, 30 Nov 2020 01:43:14 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10317: Add query option that limits huge joins at runtime
Fucun Chu has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/16706 ) Change subject: IMPALA-10317: Add query option that limits huge joins at runtime .. IMPALA-10317: Add query option that limits huge joins at runtime This patch adds support for limiting the rows produced by Join node. The limit is specified by a query option. Queries exceed that limit get terminated. The checking runs periodically, so the actual rows produced may go somewhat over the limit. NUM_JOIN_ROWS_PRODUCED_LIMIT is exposed as an advanced query option. Rows produced Query profile is updated to include query wide and per backend metrics for RowsReturned. Example from "select count(*) from tpch_parquet.lineitem l1 cross join (select * from tpch_parquet.lineitem l2 limit 5) l3;": NESTED_LOOP_JOIN_NODE (id=2): - InactiveTotalTime: 107.534ms - PeakMemoryUsage: 16.00 KB (16384) - ProbeRows: 1.02K (1024) - ProbeTime: 0.000ns - RowsReturned: 10.00M (10002025) - RowsReturnedRate: 749.58 K/sec - TotalTime: 13s337ms Testing: Added tests for NUM_JOIN_ROWS_PRODUCED_LIMIT Change-Id: Idbca7e053b61b4e31b066edcfb3b0398fa859d02 --- M be/src/runtime/coordinator-backend-state.cc M be/src/runtime/coordinator.h M be/src/runtime/fragment-instance-state.cc M be/src/runtime/fragment-instance-state.h M be/src/runtime/query-state.cc M be/src/service/impala-server.cc M be/src/service/query-options.cc M be/src/service/query-options.h M common/protobuf/control_service.proto M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M common/thrift/generate_error_codes.py M testdata/workloads/functional-query/queries/QueryTest/query-resource-limits.test 13 files changed, 125 insertions(+), 5 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/16706/3 -- To view, visit http://gerrit.cloudera.org:8080/16706 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Idbca7e053b61b4e31b066edcfb3b0398fa859d02 Gerrit-Change-Number: 16706 Gerrit-PatchSet: 3 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins