[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16474 ) Change subject: IMPALA-10178 Run-time profile shall report skews .. Patch Set 45: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/16474 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 Gerrit-Change-Number: 16474 Gerrit-PatchSet: 45 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 20 Oct 2020 23:30:49 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/16474 )

Change subject: IMPALA-10178 Run-time profile shall report skews
..

IMPALA-10178 Run-time profile shall report skews

This fix addresses the current limitation in runtime profile that skews existing in certain operators such as the rows read counter (RowsRead) in the scan operators are not reported. A skew condition exists when the number of rows processed at each operator instance is not about the same and can be detected through coefficient of variation (CoV). A high CoV (say > 1.0) usually implies the existence of skew.

With the fix, such skew is detected for the following counters
  1. RowsRead in HDFS_SCAN_NODE and KUDU_SCAN_NODE
  2. ProbeRows and BuildRows in HASH_JOIN_NODE
  3. RowsReturned in GroupingAggregator, EXCHANGE and SORT_NODE

and reported as follows:
  1. In execution profile, a new skew summary that lists the names of the operators with skews;
  2. In the averaged profile for the corresponding operator, the list of values of the counter across all fragment instances in the backend processes;
  3. Skew detection formula: CoV > limit and mean > 5,000
  4. A new query option 'report_skew_limit'
       < 0: disable skew reporting
       >= 0: enable skew reporting and supply the CoV limit

Examples of skews reported for a hash join and an hdfs scan.

In execution profile:
  ...
  ...
  skew(s) found at: HASH_JOIN_NODE (id=4), HDFS_SCAN_NODE (id=0)
  Per Node Peak Memory Usage: ...
  ...
  ...

In averaged profiles:
  HDFS_SCAN_NODE (id=2): ...
    Skew details: RowsRead ([2004992,1724693,2001351], CoV=0.07, mean=1910345)

Testing:
  1. Added test_skew_reporting_in_runtime_profile in test_observability.py to verify that the skews are reported.
  2. Ran Core tests successfully.

Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 Reviewed-on: http://gerrit.cloudera.org:8080/16474 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- M be/src/runtime/coordinator.cc M be/src/service/query-options.cc M be/src/service/query-options.h M be/src/util/runtime-profile-counters.h M be/src/util/runtime-profile.cc M be/src/util/runtime-profile.h M be/src/util/stat-util.h M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M tests/query_test/test_hash_join_timer.py M tests/query_test/test_observability.py 11 files changed, 249 insertions(+), 12 deletions(-) Approvals: Impala Public Jenkins: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/16474 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 Gerrit-Change-Number: 16474 Gerrit-PatchSet: 46 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong
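
The skew detection rule stated in the commit message (CoV > limit and mean > 5,000, with a negative 'report_skew_limit' disabling the report) can be sketched in a few lines of C++. This is only an illustration, not the code from the patch: SkewStats, ComputeSkewStats and IsSkewed are hypothetical names, and the use of the population standard deviation is an assumption (it happens to reproduce the CoV=0.07 shown in the example above). Only the constant name ROW_AVERAGE_LIMIT and its value are taken from the review thread.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Minimum mean row count before skew is considered. The name and the value
// 5000 come from the review thread; everything else here is illustrative.
static const int ROW_AVERAGE_LIMIT = 5000;

// Hypothetical result type for this sketch.
struct SkewStats {
  double mean = 0.0;
  double cov = 0.0;  // stddev / mean (population stddev: an assumption)
};

// Computes the mean and coefficient of variation of per-instance counter
// values. Returns zeros for empty input or a zero mean.
SkewStats ComputeSkewStats(const std::vector<int64_t>& values) {
  SkewStats stats;
  if (values.empty()) return stats;
  double sum = 0.0;
  for (int64_t v : values) {
    assert(v >= 0 && "row counters are expected to be non-negative");
    sum += static_cast<double>(v);
  }
  stats.mean = sum / values.size();
  if (stats.mean == 0.0) return stats;
  double squared_dev = 0.0;
  for (int64_t v : values) {
    const double d = static_cast<double>(v) - stats.mean;
    squared_dev += d * d;
  }
  stats.cov = std::sqrt(squared_dev / values.size()) / stats.mean;
  return stats;
}

// Hypothetical helper: applies "CoV > limit and mean > 5,000".
// A negative limit disables skew reporting, as described for the option.
bool IsSkewed(const std::vector<int64_t>& values, double report_skew_limit) {
  if (report_skew_limit < 0) return false;
  const SkewStats stats = ComputeSkewStats(values);
  return stats.cov > report_skew_limit && stats.mean > ROW_AVERAGE_LIMIT;
}
```

For the RowsRead values in the example, [2004992,1724693,2001351], this sketch yields a mean of about 1910345 and a CoV of about 0.07, so the scan would only be flagged when the limit is set below that value.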
[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16474 ) Change subject: IMPALA-10178 Run-time profile shall report skews .. Patch Set 45: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/16474 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 Gerrit-Change-Number: 16474 Gerrit-PatchSet: 45 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 20 Oct 2020 18:13:31 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16474 ) Change subject: IMPALA-10178 Run-time profile shall report skews .. Patch Set 45: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6590/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/16474 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 Gerrit-Change-Number: 16474 Gerrit-PatchSet: 45 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 20 Oct 2020 18:13:32 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews
Sahil Takiar has posted comments on this change. ( http://gerrit.cloudera.org:8080/16474 ) Change subject: IMPALA-10178 Run-time profile shall report skews .. Patch Set 44: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/16474 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 Gerrit-Change-Number: 16474 Gerrit-PatchSet: 44 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 20 Oct 2020 18:13:16 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16474 ) Change subject: IMPALA-10178 Run-time profile shall report skews .. Patch Set 44: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7490/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16474 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 Gerrit-Change-Number: 16474 Gerrit-PatchSet: 44 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 20 Oct 2020 03:40:47 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews
Qifan Chen has uploaded a new patch set (#44). ( http://gerrit.cloudera.org:8080/16474 ) Change subject: IMPALA-10178 Run-time profile shall report skews ..
Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 --- M be/src/runtime/coordinator.cc M be/src/service/query-options.cc M be/src/service/query-options.h M be/src/util/runtime-profile-counters.h M be/src/util/runtime-profile.cc M be/src/util/runtime-profile.h M be/src/util/stat-util.h M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M tests/query_test/test_hash_join_timer.py M tests/query_test/test_observability.py 11 files changed, 249 insertions(+), 12 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/16474/44 -- To view, visit http://gerrit.cloudera.org:8080/16474 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 Gerrit-Change-Number: 16474 Gerrit-PatchSet: 44 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16474 ) Change subject: IMPALA-10178 Run-time profile shall report skews .. Patch Set 43: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/7487/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/16474 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 Gerrit-Change-Number: 16474 Gerrit-PatchSet: 43 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 19 Oct 2020 23:56:12 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews
Qifan Chen has uploaded a new patch set (#43). ( http://gerrit.cloudera.org:8080/16474 ) Change subject: IMPALA-10178 Run-time profile shall report skews ..
Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 --- M be/src/runtime/coordinator.cc M be/src/service/query-options.cc M be/src/service/query-options.h M be/src/util/runtime-profile-counters.h M be/src/util/runtime-profile.cc M be/src/util/runtime-profile.h M be/src/util/stat-util.h M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M tests/query_test/test_hash_join_timer.py M tests/query_test/test_observability.py 11 files changed, 249 insertions(+), 12 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/16474/43 -- To view, visit http://gerrit.cloudera.org:8080/16474 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 Gerrit-Change-Number: 16474 Gerrit-PatchSet: 43 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews
Sahil Takiar has posted comments on this change. ( http://gerrit.cloudera.org:8080/16474 )

Change subject: IMPALA-10178 Run-time profile shall report skews
..

Patch Set 42: Code-Review+2

(1 comment)

minor nit

http://gerrit.cloudera.org:8080/#/c/16474/42/be/src/util/runtime-profile-counters.h
File be/src/util/runtime-profile-counters.h:

http://gerrit.cloudera.org:8080/#/c/16474/42/be/src/util/runtime-profile-counters.h@498
PS42, Line 498: const int ROW_AVERAGE_LIMIT=5000;
should be static + has formatting issues. should be something like:

  static const int ROW_AVERAGE_LIMIT = 5000;

-- To view, visit http://gerrit.cloudera.org:8080/16474 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 Gerrit-Change-Number: 16474 Gerrit-PatchSet: 42 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 19 Oct 2020 18:21:37 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16474 ) Change subject: IMPALA-10178 Run-time profile shall report skews .. Patch Set 42: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7440/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16474 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 Gerrit-Change-Number: 16474 Gerrit-PatchSet: 42 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 13 Oct 2020 16:07:11 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews
Qifan Chen has uploaded a new patch set (#42). ( http://gerrit.cloudera.org:8080/16474 ) Change subject: IMPALA-10178 Run-time profile shall report skews ..
Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 --- M be/src/runtime/coordinator.cc M be/src/service/query-options.cc M be/src/service/query-options.h M be/src/util/runtime-profile-counters.h M be/src/util/runtime-profile.cc M be/src/util/runtime-profile.h M be/src/util/stat-util.h M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M tests/query_test/test_hash_join_timer.py M tests/query_test/test_observability.py 11 files changed, 249 insertions(+), 12 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/16474/42 -- To view, visit http://gerrit.cloudera.org:8080/16474 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 Gerrit-Change-Number: 16474 Gerrit-PatchSet: 42 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16474 )

Change subject: IMPALA-10178 Run-time profile shall report skews
..

Patch Set 41: Code-Review+1

(2 comments)

I had a couple of comments in addition to Sahil's.

http://gerrit.cloudera.org:8080/#/c/16474/41/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/16474/41/be/src/runtime/coordinator.cc@1222
PS41, Line 1222: float
This is a double in the thrift definition

http://gerrit.cloudera.org:8080/#/c/16474/41/be/src/util/runtime-profile.cc
File be/src/util/runtime-profile.cc:

http://gerrit.cloudera.org:8080/#/c/16474/41/be/src/util/runtime-profile.cc@1926
PS41, Line 1926: 5000
Can you make this a named constant?

-- To view, visit http://gerrit.cloudera.org:8080/16474 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 Gerrit-Change-Number: 16474 Gerrit-PatchSet: 41 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 12 Oct 2020 18:22:08 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews
Sahil Takiar has posted comments on this change. ( http://gerrit.cloudera.org:8080/16474 )

Change subject: IMPALA-10178 Run-time profile shall report skews
..

Patch Set 41: Code-Review+1

(6 comments)

mostly nits, otherwise approach LGTM. @Tim if you want to take another look as well. I think the patch has changed significantly over the past few weeks.

http://gerrit.cloudera.org:8080/#/c/16474/41/be/src/service/query-options.cc
File be/src/service/query-options.cc:

http://gerrit.cloudera.org:8080/#/c/16474/41/be/src/service/query-options.cc@998
PS41, Line 998: }
: if (set_query_options_mask != NULL) {
:   DCHECK_LT(option, set_query_options_mask->size());
:   set_query_options_mask->set(option);
: }
unnecessary change

http://gerrit.cloudera.org:8080/#/c/16474/41/be/src/util/runtime-profile-counters.h
File be/src/util/runtime-profile-counters.h:

http://gerrit.cloudera.org:8080/#/c/16474/41/be/src/util/runtime-profile-counters.h@488
PS41, Line 488: bool EvaluateSkewWithCoV(double threshold, std::stringstream* details);
nit: document return value

http://gerrit.cloudera.org:8080/#/c/16474/41/be/src/util/runtime-profile.h
File be/src/util/runtime-profile.h:

http://gerrit.cloudera.org:8080/#/c/16474/41/be/src/util/runtime-profile.h@203
PS41, Line 203: recurssively
nit: typo

http://gerrit.cloudera.org:8080/#/c/16474/41/be/src/util/runtime-profile.cc
File be/src/util/runtime-profile.cc:

http://gerrit.cloudera.org:8080/#/c/16474/41/be/src/util/runtime-profile.cc@675
PS41, Line 675: Each spec is a pair of a profile name prefix and a list of
: // counter names.
would be nice to mention that all counters *have* to be averaged counters

http://gerrit.cloudera.org:8080/#/c/16474/41/be/src/util/runtime-profile.cc@677
PS41, Line 677: static const unordered_map<string, vector<string>> skew_profile_specs = {
: {"KUDU_SCAN_NODE", {"RowsRead"}}, {"HDFS_SCAN_NODE", {"RowsRead"}},
: {"HASH_JOIN_NODE", {"ProbeRows", "BuildRows"}},
: {"GroupingAggregator", {"RowsReturned"}}, {"EXCHANGE_NODE", {"RowsReturned"}},
: {"SORT_NODE", {"RowsReturned"}}};
nit: would be nice to define a struct that encapsulates the entries in the map. something like

  struct SkewProfileSpec {
    string node_name;
    vector<string> counter_names;
  };

http://gerrit.cloudera.org:8080/#/c/16474/41/be/src/util/runtime-profile.cc@1960
PS41, Line 1960: if (remove_last_comma) {
:   ss.seekp(-1, std::ios_base::end);
: }
you can probably simplify the string manipulation logic by using something like boost::algorithm::join - https://stackoverflow.com/questions/1833447/a-good-example-for-boostalgorithmjoin

-- To view, visit http://gerrit.cloudera.org:8080/16474 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 Gerrit-Change-Number: 16474 Gerrit-PatchSet: 41 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 12 Oct 2020 17:42:07 + Gerrit-HasComments: Yes
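
The two structural suggestions in the review above (a struct for the spec entries, and joining names with a separator instead of trimming a trailing comma) might look roughly like the following. This is a hedged sketch, not the code that was merged: kSkewProfileSpecs, FindCounters and Join are hypothetical names, the field types are assumed to be std::string and std::vector<std::string>, and Join is a standard-library stand-in for boost::algorithm::join so that the example has no Boost dependency.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Spec entry along the lines suggested in the review (field types assumed).
struct SkewProfileSpec {
  std::string node_name;                   // profile name prefix
  std::vector<std::string> counter_names;  // counters checked for skew
};

// Entries copied from the table quoted in the review comment.
static const std::vector<SkewProfileSpec> kSkewProfileSpecs = {
    {"KUDU_SCAN_NODE", {"RowsRead"}},
    {"HDFS_SCAN_NODE", {"RowsRead"}},
    {"HASH_JOIN_NODE", {"ProbeRows", "BuildRows"}},
    {"GroupingAggregator", {"RowsReturned"}},
    {"EXCHANGE_NODE", {"RowsReturned"}},
    {"SORT_NODE", {"RowsReturned"}},
};

// Hypothetical lookup: returns the counters of the first spec whose node_name
// is a prefix of profile_name, or nullptr when no spec matches.
const std::vector<std::string>* FindCounters(const std::string& profile_name) {
  for (const SkewProfileSpec& spec : kSkewProfileSpecs) {
    if (profile_name.compare(0, spec.node_name.size(), spec.node_name) == 0) {
      return &spec.counter_names;
    }
  }
  return nullptr;
}

// Stand-in for boost::algorithm::join: the separator is written only between
// elements, so there is no trailing comma to remove afterwards.
std::string Join(const std::vector<std::string>& parts, const std::string& sep) {
  std::string out;
  for (std::size_t i = 0; i < parts.size(); ++i) {
    if (i > 0) out += sep;
    out += parts[i];
  }
  return out;
}
```

Collecting the names of skewed operators in a vector and calling Join(names, ", ") once would produce a summary such as "HASH_JOIN_NODE (id=4), HDFS_SCAN_NODE (id=0)" without any stream repositioning.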
[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16474 ) Change subject: IMPALA-10178 Run-time profile shall report skews .. Patch Set 41: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7419/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16474 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 Gerrit-Change-Number: 16474 Gerrit-PatchSet: 41 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sun, 11 Oct 2020 14:55:55 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews
Qifan Chen has uploaded a new patch set (#41). ( http://gerrit.cloudera.org:8080/16474 )

Change subject: IMPALA-10178 Run-time profile shall report skews
..

IMPALA-10178 Run-time profile shall report skews

This fix addresses the current limitation in runtime profile that skews existing in certain operators such as the rows read counter (RowsRead) in the scan operators are not reported. A skew condition exists when the number of rows processed at each operator instance is not about the same and can be detected through coefficient of variation (CoV). A high CoV (say > 1.0) usually implies the existence of skew.

With the fix, such skew is detected for the following counters
  1. RowsRead in HDFS_SCAN_NODE and KUDU_SCAN_NODE
  2. ProbeRows and BuildRows in HASH_JOIN_NODE
  3. RowsReturned in GroupingAggregator, EXCHANGE and SORT_NODE

and reported as follows:
  1. In execution profile, a new skew summary that lists the names of the operators with skews;
  2. In the averaged profile for the corresponding operator, the list of values of the counter across all fragment instances in the backend processes;
  3. Skew detection formula: CoV > limit and mean > 5,000
  4. A new query option 'report_skew_limit'
       < 0: disable skew reporting
       >= 0: enable skew reporting and supply the CoV limit

Examples of skews reported for a hash join and an hdfs scan.

In execution profile:
  ...
  ...
  skew(s) found at: HASH_JOIN_NODE (id=4), HDFS_SCAN_NODE (id=0)
  Per Node Peak Memory Usage: ...
  ...
  ...

In averaged profiles:
  HDFS_SCAN_NODE (id=2): ...
    Skew details: RowsRead ([2004992,1724693,2001351], stddev/mean=0.07, mean=1910345)

Testing:
  1. Added test_skew_reporting_in_runtime_profile in test_observability.py to verify that the skews are reported.
  2. Ran Core tests successfully.

Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 --- M be/src/runtime/coordinator.cc M be/src/service/query-options.cc M be/src/service/query-options.h M be/src/util/runtime-profile-counters.h M be/src/util/runtime-profile.cc M be/src/util/runtime-profile.h M be/src/util/stat-util.h M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M tests/query_test/test_hash_join_timer.py M tests/query_test/test_observability.py 11 files changed, 243 insertions(+), 17 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/16474/41 -- To view, visit http://gerrit.cloudera.org:8080/16474 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 Gerrit-Change-Number: 16474 Gerrit-PatchSet: 41 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16474 ) Change subject: IMPALA-10178 Run-time profile shall report skews .. Patch Set 40: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/7418/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/16474 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 Gerrit-Change-Number: 16474 Gerrit-PatchSet: 40 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sun, 11 Oct 2020 14:29:09 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews
Qifan Chen has uploaded a new patch set (#40). ( http://gerrit.cloudera.org:8080/16474 ) Change subject: IMPALA-10178 Run-time profile shall report skews ..
Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 --- M be/src/runtime/coordinator.cc M be/src/service/query-options.cc M be/src/service/query-options.h M be/src/util/runtime-profile-counters.h M be/src/util/runtime-profile.cc M be/src/util/runtime-profile.h M be/src/util/stat-util.h M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M tests/query_test/test_hash_join_timer.py M tests/query_test/test_observability.py 11 files changed, 241 insertions(+), 17 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/16474/40 -- To view, visit http://gerrit.cloudera.org:8080/16474 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 Gerrit-Change-Number: 16474 Gerrit-PatchSet: 40 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16474 ) Change subject: IMPALA-10178 Run-time profile shall report skews .. Patch Set 38: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7407/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16474 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 Gerrit-Change-Number: 16474 Gerrit-PatchSet: 38 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 10 Oct 2020 02:08:53 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews
Qifan Chen has uploaded a new patch set (#38). ( http://gerrit.cloudera.org:8080/16474 ) Change subject: IMPALA-10178 Run-time profile shall report skews .. IMPALA-10178 Run-time profile shall report skews This fix addresses the current limitation in runtime profile that skews existing in certain operators such as the rows read counter (RowsRead) in the scan operators are not reported. A skew condition exists when the number of rows processed at each operator instance is not about the same and can be detected through coefficient of variation (CoV). A high CoV (say > 1.0) usually implies the existence of skew. With the fix, such skew is detected for the following counters 1. RowsRead in HDFS_SCAN_NODE and KUDU_SCAN_NODE 2. ProbeRows and BuildRows in HASH_JOIN_NODE 3. RowsReturned in GroupingAggregator, EXCHANGE and SORT_NODE and reported as follows: 1. In execution profile, a new skew summary that lists the names of the operators with skews; 2. In the averaged profile for the corresponding operator, the list of values of the counter across all fragment instances in the backend processes; 3. Skew detection formula: CoV > limit and mean > 5,000 4. A new query option 'report_skew_limit' < 0: disable skew reporting >= 0: enable skew reporting and supply the CoV limit Examples of skews reported for a hash join and an hdfs scan. In execution profile: ... ... skew(s) found at: HASH_JOIN_NODE (id=4), HDFS_SCAN_NODE (id=0) Per Node Peak Memory Usage: ... ... ... In averaged profiles: HDFS_SCAN_NODE (id=2): ... Skew details: RowsRead ([2004992,1724693,2001351], stddev/mean=0.07, mean=1910345) Testing: 1. Added test_skew_reporting_in_runtime_profile in test_observability.py to verify that the skews are reported. 2. Ran Core tests successfully. 
Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 --- M be/src/runtime/coordinator.cc M be/src/service/query-options.cc M be/src/service/query-options.h M be/src/util/runtime-profile-counters.h M be/src/util/runtime-profile.cc M be/src/util/runtime-profile.h M be/src/util/stat-util.h M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M tests/query_test/test_hash_join_timer.py M tests/query_test/test_observability.py 11 files changed, 243 insertions(+), 17 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/16474/38 -- To view, visit http://gerrit.cloudera.org:8080/16474 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 Gerrit-Change-Number: 16474 Gerrit-PatchSet: 38 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong
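The detection rule in the patch set 38 commit message ("CoV > limit and mean > 5,000", with a negative 'report_skew_limit' disabling reporting) can be sketched as below. This is an illustration only, not the code in runtime-profile.cc: the function names are hypothetical, and the use of the population standard deviation is an assumption, chosen because it reproduces the stddev/mean=0.07 figure quoted in the commit message.

```python
import math

MIN_MEAN = 5000  # mean threshold quoted in the commit message ("mean > 5,000")


def coefficient_of_variation(values):
    """Return (CoV, mean), where CoV = population stddev / mean."""
    mean = sum(values) / len(values)
    if mean == 0:
        return 0.0, mean
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return math.sqrt(variance) / mean, mean


def has_skew(values, report_skew_limit):
    """Apply 'CoV > limit and mean > 5,000'.

    A negative limit disables reporting, as described for the
    'report_skew_limit' query option. Hypothetical helper, not Impala code.
    """
    if report_skew_limit < 0 or not values:
        return False
    cov, mean = coefficient_of_variation(values)
    return cov > report_skew_limit and mean > MIN_MEAN


# RowsRead values from the HDFS_SCAN_NODE example in the commit message.
rows_read = [2004992, 1724693, 2001351]
cov, mean = coefficient_of_variation(rows_read)
print(f"CoV={cov:.2f}, mean={mean:.0f}")  # prints "CoV=0.07, mean=1910345"
```

With these values, a limit of 0.05 would flag the scan node, while a limit of 1.0 (the "high CoV" figure mentioned in the commit message) would not.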
[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16474 ) Change subject: IMPALA-10178 Run-time profile shall report skews .. Patch Set 37: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/7405/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/16474 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 Gerrit-Change-Number: 16474 Gerrit-PatchSet: 37 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 10 Oct 2020 00:48:43 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews
Qifan Chen has uploaded a new patch set (#37). ( http://gerrit.cloudera.org:8080/16474 ) Change subject: IMPALA-10178 Run-time profile shall report skews .. IMPALA-10178 Run-time profile shall report skews This fix addresses the current limitation in runtime profile that skews existing in certain operators such as the rows read counter (RowsRead) in the scan operators are not reported. A skew condition exists when the number of rows processed at each operator instance is not about the same and can be detected through coefficient of variation (CoV). A high CoV (say > 1.0) usually implies the existence of skew. With the fix, such skew is detected for the following counters 1. RowsRead in HDFS_SCAN_NODE and KUDU_SCAN_NODE 2. ProbeRows and BuildRows in HASH_JOIN_NODE 3. RowsReturned in GroupingAggregator, EXCHANGE and SORT_NODE and reported as follows: 1. In execution profile, a new skew summary that lists the names of the operators with skews; 2. In the averaged profile for the corresponding operator, the list of values of the counter across all fragment instances in the backend processes; 3. Skew detection formula: CoV > limit and mean > 5,000 4. A new query option 'report_skew_limit' < 0: disable skew reporting >= 0: enable skew reporting and supply the CoV limit Examples of skews reported for a hash join and an hdfs scan. In execution profile: ... ... skew(s) found at: HASH_JOIN_NODE (id=4), HDFS_SCAN_NODE (id=0) Per Node Peak Memory Usage: ... ... ... In averaged profiles: HDFS_SCAN_NODE (id=2): ... Skew details: RowsRead ([2004992,1724693,2001351], stddev/mean=0.07, mean=1910345) Testing: 1. Added test_skew_reporting_in_runtime_profile in test_observability.py to verify that the skews are reported. 2. Ran Core tests successfully. 
Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 --- M be/src/runtime/coordinator.cc M be/src/service/query-options.cc M be/src/service/query-options.h M be/src/util/runtime-profile-counters.h M be/src/util/runtime-profile.cc M be/src/util/runtime-profile.h M be/src/util/stat-util.h M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M tests/query_test/test_hash_join_timer.py M tests/query_test/test_observability.py 11 files changed, 241 insertions(+), 17 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/16474/37 -- To view, visit http://gerrit.cloudera.org:8080/16474 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 Gerrit-Change-Number: 16474 Gerrit-PatchSet: 37 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16474 ) Change subject: IMPALA-10178 Run-time profile shall report skews .. Patch Set 35: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7346/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16474 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 Gerrit-Change-Number: 16474 Gerrit-PatchSet: 35 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 02 Oct 2020 18:36:11 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews
Qifan Chen has uploaded a new patch set (#35). ( http://gerrit.cloudera.org:8080/16474 ) Change subject: IMPALA-10178 Run-time profile shall report skews .. IMPALA-10178 Run-time profile shall report skews This fix addresses the current limitation in runtime profile that skews existing in certain operators such as the rows read counter (RowsRead) in the scan operators are not reported. A skew condition exists when the number of rows processed at each operator instance is not about the same and can be detected through coefficient of variation (CoV). A high CoV (say > 1.0) usually implies the existence of skew. With the fix, such skew is detected for the following counters 1. RowsRead in HDFS_SCAN_NODE and KUDU_SCAN_NODE 2. ProbeRows and BuildRows in HASH_JOIN_NODE 3. RowsReturned in GroupingAggregator, EXCHANGE and SORT_NODE and reported as follows: 1. In execution profile, a new skew summary that lists the names of the operators with skews; 2. In the averaged profile for the corresponding operator, the list of values of the counter across all fragment instances in the backend processes; 3. Skew detection formula: CoV > limit and mean > 5,000 4. A new query option 'report_skew_limit' < 0: disable skew reporting >= 0: enable skew reporting and supply the CoV limit Examples of skews reported for a hash join and an hdfs scan. In execution profile: ... ... skew(s) found at: HASH_JOIN_NODE (id=4), HDFS_SCAN_NODE (id=0) Per Node Peak Memory Usage: ... ... ... In averaged profiles: HDFS_SCAN_NODE (id=2): ... Skew details: RowsRead ([2004992,1724693,2001351], stddev/mean=0.07, mean=1910345) Testing: 1. Added test_skew_reporting_in_runtime_profile in test_observability.py to verify that the skews are reported. 2. Ran Core tests successfully. 
Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 --- M be/src/runtime/coordinator.cc M be/src/service/query-options.cc M be/src/service/query-options.h M be/src/util/runtime-profile-counters.h M be/src/util/runtime-profile.cc M be/src/util/runtime-profile.h M be/src/util/stat-util.h M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M tests/query_test/test_hash_join_timer.py M tests/query_test/test_observability.py 11 files changed, 243 insertions(+), 17 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/16474/35 -- To view, visit http://gerrit.cloudera.org:8080/16474 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 Gerrit-Change-Number: 16474 Gerrit-PatchSet: 35 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews
Sahil Takiar has posted comments on this change. ( http://gerrit.cloudera.org:8080/16474 ) Change subject: IMPALA-10178 Run-time profile shall report skews .. Patch Set 31: (1 comment) http://gerrit.cloudera.org:8080/#/c/16474/31/be/src/util/runtime-profile.cc File be/src/util/runtime-profile.cc: http://gerrit.cloudera.org:8080/#/c/16474/31/be/src/util/runtime-profile.cc@1949 PS31, Line 1949: if (stddev > 5) { > Oh, that is very interesting result. Can you send me the data? If it's useful for future work, why not add it later on? Or at least don't expose it in the runtime profile. My concern is that when aggressive mode is enabled, most queries will report skew, and then customers will start complaining and asking why their queries are skewed, even though the problem is actually not that serious or is just expected behavior. -- To view, visit http://gerrit.cloudera.org:8080/16474 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 Gerrit-Change-Number: 16474 Gerrit-PatchSet: 31 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 01 Oct 2020 18:02:58 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/16474 ) Change subject: IMPALA-10178 Run-time profile shall report skews .. Patch Set 34: (1 comment) http://gerrit.cloudera.org:8080/#/c/16474/31/be/src/util/runtime-profile.cc File be/src/util/runtime-profile.cc: http://gerrit.cloudera.org:8080/#/c/16474/31/be/src/util/runtime-profile.cc@1949 PS31, Line 1949: template > Still not seeing the use of having an aggressive mode like this. I applied Oh, that is a very interesting result. Can you send me the data? The purpose of the aggressive mode is to report all possible skews (excluding scans), which should be useful for skew-busting work in the future. -- To view, visit http://gerrit.cloudera.org:8080/16474 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 Gerrit-Change-Number: 16474 Gerrit-PatchSet: 34 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 01 Oct 2020 17:49:39 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews
Sahil Takiar has posted comments on this change. ( http://gerrit.cloudera.org:8080/16474 ) Change subject: IMPALA-10178 Run-time profile shall report skews .. Patch Set 34: > Uploaded patch set 34: Patch Set 33 was rebased. Whoops, sorry, ignore this. I was testing this locally and it looks like I accidentally rebased it. -- To view, visit http://gerrit.cloudera.org:8080/16474 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 Gerrit-Change-Number: 16474 Gerrit-PatchSet: 34 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 01 Oct 2020 17:42:57 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews
Sahil Takiar has posted comments on this change. ( http://gerrit.cloudera.org:8080/16474 ) Change subject: IMPALA-10178 Run-time profile shall report skews .. Patch Set 34: (1 comment) http://gerrit.cloudera.org:8080/#/c/16474/31/be/src/util/runtime-profile.cc File be/src/util/runtime-profile.cc: http://gerrit.cloudera.org:8080/#/c/16474/31/be/src/util/runtime-profile.cc@1949 PS31, Line 1949: template > Done. Scan nodes are excluded from aggressive reporting. Still not seeing the use of having an aggressive mode like this. I applied the most recent version of the patch locally and I tested this out a bit. I ran about 75 TPC-DS queries against the mini-cluster (so a 1 GB dataset) using Parquet. About 5 queries report skew with the default skew threshold (which seems reasonable), but 68 queries report skew in the aggressive mode. -- To view, visit http://gerrit.cloudera.org:8080/16474 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 Gerrit-Change-Number: 16474 Gerrit-PatchSet: 34 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 01 Oct 2020 17:42:36 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16474 ) Change subject: IMPALA-10178 Run-time profile shall report skews .. Patch Set 33: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7316/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16474 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 Gerrit-Change-Number: 16474 Gerrit-PatchSet: 33 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 30 Sep 2020 00:37:12 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16474 ) Change subject: IMPALA-10178 Run-time profile shall report skews .. Patch Set 32: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7315/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16474 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 Gerrit-Change-Number: 16474 Gerrit-PatchSet: 32 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 30 Sep 2020 00:33:37 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews
Qifan Chen has uploaded a new patch set (#33). ( http://gerrit.cloudera.org:8080/16474 ) Change subject: IMPALA-10178 Run-time profile shall report skews .. IMPALA-10178 Run-time profile shall report skews This fix addresses the current limitation in runtime profile that skews existing in certain operators such as the rows read counter (RowsRead) in the scan operators are not reported. A skew condition exists when the number of rows processed at each operator instance is not about the same and can be detected through standard deviation (stddev). A high stddev (say > 5) usually implies the existence of skew. With the fix, such skew is detected for the following counters 1. RowsRead in HDFS_SCAN_NODE and KUDU_SCAN_NODE 2. ProbeRows and BuildRows in HASH_JOIN_NODE 3. RowsReturned in GroupingAggregator, EXCHANGE and SORT_NODE and reported as follows: 1. In execution profile, a new skew summary that lists the names of the operators with skews; 2. In the averaged profile for the corresponding operator, the list of values of the counter across all fragment instances in the backend processes, and the skew detection formula; 3. Skew detection formula: a: stddev/mean > 0.05 and mean > 1,000,000, for conservative reporting b: stddev > 5, for aggressive reporting on non-scan nodes 4. A new query option 'report_skew' = -1, 0: disable skew reporting = 1: enable conservative skew reporting (default) = 2: enable aggressive skew reporting Examples of skews reported for a hash join and an hdfs scan. In execution profile: ... ... skew(s) found at: HASH_JOIN_NODE (id=4), HDFS_SCAN_NODE (id=0) Per Node Peak Memory Usage: ... Per Node Bytes Read: ... Per Node User Time: ... Per Node System Time: ... ... In averaged profiles(conservative reporting): HDFS_SCAN_NODE (id=2): ... Skew details: RowsRead ([2004992,1724693,2001351], stddev/mean=0.07, mean=1910345) In averaged profiles (aggressive reporting): HASH_JOIN_NODE (id=4): ... 
Skew details: ProbeRows ([16904,17750,19197], stddev=946.77) Testing: 1. Added test_aggressive_skew_reporting_in_runtime_profile and test_conservative_skew_reporting_in_runtime_profile in test_observability.py to verify that the skews are reported. 2. Ran Core tests successfully. Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 --- M be/src/runtime/coordinator.cc M be/src/service/query-options.cc M be/src/service/query-options.h M be/src/util/runtime-profile-counters.h M be/src/util/runtime-profile.cc M be/src/util/runtime-profile.h M be/src/util/stat-util.h M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M tests/query_test/test_hash_join_timer.py M tests/query_test/test_observability.py 11 files changed, 296 insertions(+), 17 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/16474/33 -- To view, visit http://gerrit.cloudera.org:8080/16474 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 Gerrit-Change-Number: 16474 Gerrit-PatchSet: 33 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong
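The two reporting modes described in the patch set 33 commit message ('report_skew' = 1 for conservative, 2 for aggressive on non-scan nodes) can be sketched as follows. This is a rough illustration under stated assumptions, not the patch itself: the function names are hypothetical, the population standard deviation is assumed because it matches the quoted stddev=946.77, and having scan nodes fall back to the conservative rule in aggressive mode is a guess, since the commit message only says that the aggressive rule applies to non-scan nodes.

```python
import math


def population_stddev(values):
    """Return (population stddev, mean) of a list of counter values."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return math.sqrt(variance), mean


def detect_skew(values, report_skew, is_scan_node=False):
    """Return a 'Skew details'-style string, or None if no skew is reported.

    Hypothetical helper. report_skew: -1 or 0 disables reporting,
    1 is conservative, 2 is aggressive.
    """
    if report_skew not in (1, 2) or not values:
        return None
    stddev, mean = population_stddev(values)
    # Aggressive rule: stddev > 5, for non-scan nodes only.
    if report_skew == 2 and not is_scan_node:
        return f"stddev={stddev:.2f}" if stddev > 5 else None
    # Conservative rule: stddev/mean > 0.05 and mean > 1,000,000.
    # Assumption: scan nodes in aggressive mode also use this rule.
    if mean > 1_000_000 and stddev / mean > 0.05:
        return f"stddev/mean={stddev / mean:.2f}, mean={mean:.0f}"
    return None


probe_rows = [16904, 17750, 19197]       # HASH_JOIN_NODE example
rows_read = [2004992, 1724693, 2001351]  # HDFS_SCAN_NODE example

print(detect_skew(probe_rows, 2))                     # stddev=946.77
print(detect_skew(probe_rows, 1))                     # None
print(detect_skew(rows_read, 1, is_scan_node=True))   # stddev/mean=0.07, mean=1910345
```

The hash join example is reported only in aggressive mode: its counters differ by a few thousand rows, which easily exceeds an absolute stddev of 5 but is far below the conservative mean threshold.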
[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews
Qifan Chen has uploaded a new patch set (#32). ( http://gerrit.cloudera.org:8080/16474 ) Change subject: IMPALA-10178 Run-time profile shall report skews .. IMPALA-10178 Run-time profile shall report skews This fix addresses the current limitation in runtime profile that skews existing in certain operators such as the rows read counter (RowsRead) in the scan operators are not reported. A skew condition exists when the number of rows processed at each operator instance is not about the same and can be detected through standard deviation (stddev). A high stddev (say > 5) usually implies the existence of skew. With the fix, such skew is detected for the following counters 1. RowsRead in HDFS_SCAN_NODE and KUDU_SCAN_NODE 2. ProbeRows and BuildRows in HASH_JOIN_NODE 3. RowsReturned in GroupingAggregator, EXCHANGE and SORT_NODE and reported as follows: 1. In execution profile, a new skew summary that lists the names of the operators with skews; 2. In the averaged profile for the corresponding operator, the list of values of the counter across all fragment instances in the backend processes, and the skew detection formula; 3. Skew detection formula: a: stddev/mean > 0.05 and mean > 1,000,000, for conservative reporting b: stddev > 5, for aggressive reporting on non-scan nodes 4. A new query option 'report_skew' = -1, 0: disable skew reporting = 1: enable conservative skew reporting (default) = 2: enable aggressive skew reporting Examples of skews reported for a hash join and an hdfs scan. In execution profile: ... ... skew(s) found at: HASH_JOIN_NODE (id=4), HDFS_SCAN_NODE (id=0) Per Node Peak Memory Usage: ... Per Node Bytes Read: ... Per Node User Time: ... Per Node System Time: ... ... In averaged profiles(conservative reporting): HDFS_SCAN_NODE (id=2): ... Skew details: RowsRead ([2004992,1724693,2001351], stddev/mean=0.07, mean=1910345) In averaged profiles (aggressive reporting): HASH_JOIN_NODE (id=4): ... 
Skew details: ProbeRows ([16904,17750,19197], stddev=946.77) Testing: 1. Added test_aggressive_skew_reporting_in_runtime_profile and test_conservative_skew_reporting_in_runtime_profile in test_observability.py to verify that the skews are reported. 2. Ran Core tests successfully. Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 --- M be/src/runtime/coordinator.cc M be/src/service/query-options.cc M be/src/service/query-options.h M be/src/util/runtime-profile-counters.h M be/src/util/runtime-profile.cc M be/src/util/runtime-profile.h M be/src/util/stat-util.h M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M tests/query_test/test_hash_join_timer.py M tests/query_test/test_observability.py 11 files changed, 296 insertions(+), 17 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/16474/32 -- To view, visit http://gerrit.cloudera.org:8080/16474 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 Gerrit-Change-Number: 16474 Gerrit-PatchSet: 32 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews
Sahil Takiar has posted comments on this change. ( http://gerrit.cloudera.org:8080/16474 ) Change subject: IMPALA-10178 Run-time profile shall report skews .. Patch Set 31: (2 comments) http://gerrit.cloudera.org:8080/#/c/16474/21/be/src/util/runtime-profile.cc File be/src/util/runtime-profile.cc: http://gerrit.cloudera.org:8080/#/c/16474/21/be/src/util/runtime-profile.cc@1928 PS21, Line 1928: double stddev = 0.0; > To report severe skews only for impala, maybe we can use CV (instead of > stddev) as a threshold. Say cv > 5% && mean over 1 million. Makes sense to me. http://gerrit.cloudera.org:8080/#/c/16474/31/be/src/util/runtime-profile.cc File be/src/util/runtime-profile.cc: http://gerrit.cloudera.org:8080/#/c/16474/31/be/src/util/runtime-profile.cc@1949 PS31, Line 1949: if (stddev > 5) { I'm still not sure how useful this is, even if it is moved to an "aggressive" option. I can see it still leading to a lot of false positives. Any chance that when writing this you actually meant to calculate the z-score (the number of standard deviations by which a value is above or below the mean)? I've seen references where outlier detection algorithms check if the z-score is greater than 3 (or in this case 5). -- To view, visit http://gerrit.cloudera.org:8080/16474 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 Gerrit-Change-Number: 16474 Gerrit-PatchSet: 31 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 29 Sep 2020 20:02:32 + Gerrit-HasComments: Yes
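For comparison, the z-score check raised in the review comment above (flag a value whose distance from the mean exceeds some number of standard deviations) could look roughly like this. It is a sketch of the general technique only, not something the patch implements: the function names and the default threshold of 3 are illustrative, and the population standard deviation is an assumption.

```python
import math


def z_scores(values):
    """Return the z-score of each value: (value - mean) / population stddev."""
    mean = sum(values) / len(values)
    stddev = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if stddev == 0:
        return [0.0] * len(values)
    return [(v - mean) / stddev for v in values]


def outliers(values, threshold=3.0):
    """Return the values whose absolute z-score exceeds the threshold."""
    return [v for v, z in zip(values, z_scores(values)) if abs(z) > threshold]


# ProbeRows values from the aggressive-reporting example (stddev=946.77).
probe_rows = [16904, 17750, 19197]
print([round(z, 2) for z in z_scores(probe_rows)])
print(outliers(probe_rows))  # prints "[]"

# Twenty balanced instances plus one heavily loaded instance.
lopsided = [1000] * 20 + [100000]
print(outliers(lopsided))  # prints "[100000]"
```

One caveat with this approach: with the population standard deviation, no z-score among n values can exceed sqrt(n - 1) in magnitude, so a threshold of 3 can only trigger when there are more than ten instances. The three-instance examples in this thread could therefore never be flagged by it.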
[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16474 ) Change subject: IMPALA-10178 Run-time profile shall report skews .. Patch Set 31: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7300/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16474 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 Gerrit-Change-Number: 16474 Gerrit-PatchSet: 31 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 28 Sep 2020 17:00:02 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews
Qifan Chen has uploaded a new patch set (#31). ( http://gerrit.cloudera.org:8080/16474 ) Change subject: IMPALA-10178 Run-time profile shall report skews .. IMPALA-10178 Run-time profile shall report skews This fix addresses the current limitation in runtime profile that skews existing in certain operators such as the rows read counter (RowsRead) in the scan operators are not reported. A skew condition exists when the number of rows processed at each operator instance is not about the same and can be detected through standard deviation (stddev). A high stddev (say > 5) usually implies the existence of skew. With the fix, such skew is detected for the following counters 1. RowsRead in HDFS_SCAN_NODE and KUDU_SCAN_NODE 2. ProbeRows and BuildRows in HASH_JOIN_NODE 3. RowsReturned in GroupingAggregator, EXCHANGE and SORT_NODE and reported as follows: 1. In execution profile, a new skew summary that lists the names of the operators with skews; 2. In the averaged profile for the corresponding operator, the list of values of the counter across all fragment instances in the backend processes, and the skew detection formula; 3. Skew detection formula: a: stddev/mean > 0.05 and mean > 1,000,000, for conservative reporting b: stddev > 5, for aggressive reporting 4. A new query option 'report_skew' = -1, 0: disable skew reporting = 1: enable conservative skew reporting (default) = 2: enable aggressive skew reporting Examples of skews reported for a hash join and an hdfs scan. In execution profile: ... ... skew(s) found at: HASH_JOIN_NODE (id=4), HDFS_SCAN_NODE (id=0) Per Node Peak Memory Usage: ... Per Node Bytes Read: ... Per Node User Time: ... Per Node System Time: ... ... In averaged profiles(conservative reporting): HDFS_SCAN_NODE (id=2): ... Skew details: RowsRead ([2004992,1724693,2001351], stddev/mean=0.07, mean=1910345) In averaged profiles (aggressive reporting): HASH_JOIN_NODE (id=4): ... 
Skew details: ProbeRows ([16904,17750,19197], stddev=946.77) HDFS_SCAN_NODE (id=0): ... Skew details: RowsRead ([913887,917913,1048604], stddev=62578.85) Testing: 1. Added test_aggressive_skew_reporting_in_runtime_profile and test_conservative_skew_reporting_in_runtime_profile in test_observability.py to verify that the skews are reported. 2. Ran Core tests successfully. Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 --- M be/src/runtime/coordinator.cc M be/src/service/query-options.cc M be/src/service/query-options.h M be/src/util/runtime-profile-counters.h M be/src/util/runtime-profile.cc M be/src/util/runtime-profile.h M be/src/util/stat-util.h M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M tests/query_test/test_hash_join_timer.py M tests/query_test/test_observability.py 11 files changed, 295 insertions(+), 17 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/16474/31 -- To view, visit http://gerrit.cloudera.org:8080/16474 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 Gerrit-Change-Number: 16474 Gerrit-PatchSet: 31 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16474 ) Change subject: IMPALA-10178 Run-time profile shall report skews .. Patch Set 30: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7297/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16474 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 Gerrit-Change-Number: 16474 Gerrit-PatchSet: 30 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 28 Sep 2020 14:27:01 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews
Qifan Chen has uploaded a new patch set (#30). ( http://gerrit.cloudera.org:8080/16474 ) Change subject: IMPALA-10178 Run-time profile shall report skews .. IMPALA-10178 Run-time profile shall report skews This fix addresses the current limitation in runtime profile that skews existing in certain operators such as the rows read counter (RowsRead) in the scan operators are not reported. A skew condition exists when the number of rows processed at each operator instance is not about the same and can be detected through standard deviation (stddev). A high stddev (say > 5) usually implies the existence of skew. With the fix, such skew is detected for the following counters 1. RowsRead in HDFS_SCAN_NODE and KUDU_SCAN_NODE 2. ProbeRows and BuildRows in HASH_JOIN_NODE 3. RowsReturned in GroupingAggregator, EXCHANGE and SORT_NODE and reported as follows: 1. In execution profile, a new skew summary that lists the names of the operators with skews; 2. In the averaged profile for the corresponding operator, the list of values of the counter across all fragment instances in the backend processes, and the skew detection formula; 3. Skew detection formula: a: stddev/mean > 0.05 and mean > 1,000,000, for conservative reporting b: stddev > 5, for aggressive reporting 4. A new query option 'report_skew' = -1, 0: disable skew reporting = 1: enable conservative skew reporting (default) = 2: enable aggressive skew reporting Examples of skews reported for a hash join and an hdfs scan. In execution profile: ... ... skew(s) found at: HASH_JOIN_NODE (id=4), HDFS_SCAN_NODE (id=0) Per Node Peak Memory Usage: ... Per Node Bytes Read: ... Per Node User Time: ... Per Node System Time: ... ... In averaged profiles(conservative reporting): HDFS_SCAN_NODE (id=2): ... Skew details: RowsRead ([2004992,1724693,2001351], stddev/mean=0.07, mean=1910345) In averaged profiles (aggressive reporting): HASH_JOIN_NODE (id=4): ... 
Skew details: ProbeRows ([16904,17750,19197], stddev=946.77) HDFS_SCAN_NODE (id=0): ... Skew details: RowsRead ([913887,917913,1048604], stddev=62578.85) Testing: 1. Added test_aggressive_skew_reporting_in_runtime_profile and test_conservative_skew_reporting_in_runtime_profile in test_observability.py to verify that the skews are reported. 2. Ran Core tests successfully. Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 --- M be/src/runtime/coordinator.cc M be/src/service/query-options.cc M be/src/service/query-options.h M be/src/util/runtime-profile-counters.h M be/src/util/runtime-profile.cc M be/src/util/runtime-profile.h M be/src/util/stat-util.h M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M tests/query_test/test_hash_join_timer.py M tests/query_test/test_observability.py 11 files changed, 290 insertions(+), 17 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/16474/30 -- To view, visit http://gerrit.cloudera.org:8080/16474 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 Gerrit-Change-Number: 16474 Gerrit-PatchSet: 30 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong
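The "Skew details" entries quoted in the commit message above can be reproduced with a small sketch. It is written in Python for brevity rather than in the C++ of be/src/util/runtime-profile.cc, and the helper name format_skew_details and its mode argument are hypothetical, not part of Impala. It assumes the population standard deviation is used, which is consistent with the figures in the examples.

```python
import statistics


def format_skew_details(counter, values, mode):
    """Build a 'Skew details' entry shaped like the commit-message examples.

    Hypothetical helper. 'mode' mirrors the proposed report_skew values
    (1 = conservative, 2 = aggressive). Assumes population stddev.
    """
    listed = "[" + ",".join(str(v) for v in values) + "]"
    mean = sum(values) / len(values)
    stddev = statistics.pstdev(values)
    if mode == 1:
        # Conservative entries show the ratio stddev/mean and the mean.
        ratio = stddev / mean if mean else 0.0
        return f"{counter} ({listed}, stddev/mean={ratio:.2f}, mean={mean:.0f})"
    if mode == 2:
        # Aggressive entries show the raw stddev only.
        return f"{counter} ({listed}, stddev={stddev:.2f})"
    # Assumption: other modes produce no entry, so the sketch rejects them.
    raise ValueError("mode must be 1 (conservative) or 2 (aggressive)")


print(format_skew_details("RowsRead", [2004992, 1724693, 2001351], 1))
print(format_skew_details("ProbeRows", [16904, 17750, 19197], 2))
```

The two calls use the counter values listed in the commit message, so the printed entries can be compared directly with the examples there.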
[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16474 ) Change subject: IMPALA-10178 Run-time profile shall report skews .. Patch Set 29: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7293/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16474 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 Gerrit-Change-Number: 16474 Gerrit-PatchSet: 29 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sun, 27 Sep 2020 21:36:08 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16474 ) Change subject: IMPALA-10178 Run-time profile shall report skews .. Patch Set 28: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7292/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16474 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 Gerrit-Change-Number: 16474 Gerrit-PatchSet: 28 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sun, 27 Sep 2020 21:35:37 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16474 ) Change subject: IMPALA-10178 Run-time profile shall report skews .. Patch Set 27: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7291/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16474 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 Gerrit-Change-Number: 16474 Gerrit-PatchSet: 27 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sun, 27 Sep 2020 21:30:41 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16474 ) Change subject: IMPALA-10178 Run-time profile shall report skews .. Patch Set 26: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7290/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16474 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 Gerrit-Change-Number: 16474 Gerrit-PatchSet: 26 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sun, 27 Sep 2020 21:18:35 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16474 ) Change subject: IMPALA-10178 Run-time profile shall report skews .. Patch Set 29: (1 comment) http://gerrit.cloudera.org:8080/#/c/16474/29/be/src/util/runtime-profile-counters.h File be/src/util/runtime-profile-counters.h: http://gerrit.cloudera.org:8080/#/c/16474/29/be/src/util/runtime-profile-counters.h@416 PS29, Line 416: /// Input argument 'option': line has trailing whitespace -- To view, visit http://gerrit.cloudera.org:8080/16474 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 Gerrit-Change-Number: 16474 Gerrit-PatchSet: 29 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sun, 27 Sep 2020 21:16:05 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews
Qifan Chen has uploaded a new patch set (#29). ( http://gerrit.cloudera.org:8080/16474 ) Change subject: IMPALA-10178 Run-time profile shall report skews .. IMPALA-10178 Run-time profile shall report skews This fix addresses the current limitation in runtime profile that skews existing in certain operators such as the rows read counter (RowsRead) in the scan operators are not reported. A skew condition exists when the number of rows processed at each operator instance is not about the same and can be detected through standard deviation (stddev). A high stddev (say > 5) usually implies the existence of skew. With the fix, such skew is detected for the following counters 1. RowsRead in HDFS_SCAN_NODE and KUDU_SCAN_NODE 2. ProbeRows and BuildRows in HASH_JOIN_NODE 3. RowsReturned in GroupingAggregator, EXCHANGE and SORT_NODE and reported as follows: 1. In execution profile, a new skew summary that lists the names of the operators with skews; 2. In the averaged profile for the corresponding operator, the list of values of the counter across all fragment instances in the backend processes, and the skew detection formula; 3. Skew detection formula: a: stddev/mean > 0.05 and mean > 1,000,000, for conservative reporting b: stddev > 5, for aggressive reporting 4. A new query option 'report_skew' = -1, 0: disable skew reporting = 1: enable conservative skew reporting (default) = 2: enable aggressive skew reporting Examples of skews reported for a hash join and an hdfs scan. In execution profile: ... ... skew(s) found at: HASH_JOIN_NODE (id=4), HDFS_SCAN_NODE (id=0) Per Node Peak Memory Usage: ... Per Node Bytes Read: ... Per Node User Time: ... Per Node System Time: ... ... In averaged profiles(conservative reporting): HDFS_SCAN_NODE (id=2): ... Skew details: RowsRead ([2004992,1724693,2001351], stddev/mean=0.07, mean=1910345) In averaged profiles (aggressive reporting): HASH_JOIN_NODE (id=4): ... 
Skew details: ProbeRows ([16904,17750,19197], stddev=946.77) HDFS_SCAN_NODE (id=0): ... Skew details: RowsRead ([913887,917913,1048604], stddev=62578.85) Testing: 1. Added test_aggressive_skew_reporting_in_runtime_profile and test_conservative_skew_reporting_in_runtime_profile in test_observability.py to verify that the skews are reported. 2. Ran Core tests successfully. Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 --- M be/src/runtime/coordinator.cc M be/src/service/query-options.cc M be/src/service/query-options.h M be/src/util/runtime-profile-counters.h M be/src/util/runtime-profile.cc M be/src/util/runtime-profile.h M be/src/util/stat-util.h M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M tests/query_test/test_hash_join_timer.py M tests/query_test/test_observability.py 11 files changed, 290 insertions(+), 17 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/16474/29 -- To view, visit http://gerrit.cloudera.org:8080/16474 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 Gerrit-Change-Number: 16474 Gerrit-PatchSet: 29 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16474 ) Change subject: IMPALA-10178 Run-time profile shall report skews .. Patch Set 28: (1 comment) http://gerrit.cloudera.org:8080/#/c/16474/28/be/src/util/runtime-profile-counters.h File be/src/util/runtime-profile-counters.h: http://gerrit.cloudera.org:8080/#/c/16474/28/be/src/util/runtime-profile-counters.h@416 PS28, Line 416: /// Input argument 'option': line has trailing whitespace -- To view, visit http://gerrit.cloudera.org:8080/16474 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 Gerrit-Change-Number: 16474 Gerrit-PatchSet: 28 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sun, 27 Sep 2020 21:14:39 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews
Qifan Chen has uploaded a new patch set (#28). ( http://gerrit.cloudera.org:8080/16474 ) Change subject: IMPALA-10178 Run-time profile shall report skews .. IMPALA-10178 Run-time profile shall report skews This fix addresses the current limitation in runtime profile that skews existing in certain operators such as the rows read counter (RowsRead) in the scan operators are not reported. A skew condition exists when the number of rows processed at each operator instance is not about the same and can be detected through standard deviation (stddev). A high stddev (say > 5) usually implies the existence of skew. With the fix, such skew is detected for the following counters 1. RowsRead in HDFS_SCAN_NODE and KUDU_SCAN_NODE 2. ProbeRows and BuildRows in HASH_JOIN_NODE 3. RowsReturned in GroupingAggregator, EXCHANGE and SORT_NODE and reported as follows: 1. In execution profile, a new skew summary that lists the names of the operators with skews; 2. In the averaged profile for the corresponding operator, the list of values of the counter across all fragment instances in the backend processes, and the skew detection formula; 3. Skew detection formula: a: stddev/mean > 0.05 and mean > 1,000,000, for conservative reporting b: stddev > 5, for aggressive reporting 4. A new query option 'report_skew' = -1, 0: disable skew reporting = 1: enable conservative skew reporting (default) = 2: enable aggressive skew reporting Examples of skews reported for a hash join and an hdfs scan. In execution profile: ... ... skew(s) found at: HASH_JOIN_NODE (id=4), HDFS_SCAN_NODE (id=0) Per Node Peak Memory Usage: ... Per Node Bytes Read: ... Per Node User Time: ... Per Node System Time: ... ... In averaged profiles(conservative reporting): HDFS_SCAN_NODE (id=2): ... Skew details: RowsRead ([2004992,1724693,2001351], stddev/mean=0.07, mean=1910345) In averaged profiles (aggressive reporting): HASH_JOIN_NODE (id=4): ... Skew details: ProbeRows ([16904,17750,19197], stddev=946.77) ... ... 
HDFS_SCAN_NODE (id=0): ... Skew details: RowsRead ([913887,917913,1048604], stddev=62578.85) Testing: 1. Added test_aggressive_skew_reporting_in_runtime_profile and test_conservative_skew_reporting_in_runtime_profile in test_observability.py to verify that the skews are reported. 2. Ran Core tests successfully. Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 --- M be/src/runtime/coordinator.cc M be/src/service/query-options.cc M be/src/service/query-options.h M be/src/util/runtime-profile-counters.h M be/src/util/runtime-profile.cc M be/src/util/runtime-profile.h M be/src/util/stat-util.h M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M tests/query_test/test_hash_join_timer.py M tests/query_test/test_observability.py 11 files changed, 290 insertions(+), 17 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/16474/28 -- To view, visit http://gerrit.cloudera.org:8080/16474 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 Gerrit-Change-Number: 16474 Gerrit-PatchSet: 28 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews
Qifan Chen has uploaded a new patch set (#27). ( http://gerrit.cloudera.org:8080/16474 ) Change subject: IMPALA-10178 Run-time profile shall report skews .. IMPALA-10178 Run-time profile shall report skews This fix addresses the current limitation in runtime profile that skews existing in certain operators such as the rows read counter (RowsRead) in the scan operators are not reported. A skew condition exists when the number of rows processed at each operator instance is not about the same and can be detected through standard deviation (stddev). A high stddev (say > 5) usually implies the existence of skew. With the fix, such skew is detected for the following counters 1. RowsRead in HDFS_SCAN_NODE and KUDU_SCAN_NODE 2. ProbeRows and BuildRows in HASH_JOIN_NODE 3. RowsReturned in GroupingAggregator, EXCHANGE and SORT_NODE and reported as follows: 1. In execution profile, a new skew summary that lists the names of the operators with skews; 2. In the averaged profile for the corresponding operator, the list of values of the counter across all fragment instances in the backend processes, and the skew detection formula; 3. Skew detection formula: a: stddev/mean > 0.05 and mean > 1,000,000, for conservative reporting b: stddev > 5, for aggressive reporting 4. A new query option 'report_skew' = -1, 0: disable skew reporting = 1: enable conservative skew report (default) = 2: enable aggressive skew report Examples of skews reported for a hash join and an hdfs scan. In execution profile: ... ... skew(s) found at: HASH_JOIN_NODE (id=4), HDFS_SCAN_NODE (id=0) Per Node Peak Memory Usage: ... Per Node Bytes Read: ... Per Node User Time: ... Per Node System Time: ... ... In averaged profiles(conservative reporting): HDFS_SCAN_NODE (id=2): ... Skew details: RowsRead ([2004992,1724693,2001351], stddev/mean=0.07, mean=1910345) In averaged profiles (aggressive reporting): HASH_JOIN_NODE (id=4): ... Skew details: ProbeRows ([16904,17750,19197], stddev=946.77) ... ... 
HDFS_SCAN_NODE (id=0): ... Skew details: RowsRead ([913887,917913,1048604], stddev=62578.85) Testing: 1. Added test_aggressive_skew_reporting_in_runtime_profile and test_conservative_skew_reporting_in_runtime_profile in test_observability.py to verify that the skews are reported. 2. Ran Core tests successfully. Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 --- M be/src/runtime/coordinator.cc M be/src/service/query-options.cc M be/src/service/query-options.h M be/src/util/runtime-profile-counters.h M be/src/util/runtime-profile.cc M be/src/util/runtime-profile.h M be/src/util/stat-util.h M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M tests/query_test/test_hash_join_timer.py M tests/query_test/test_observability.py 11 files changed, 290 insertions(+), 17 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/16474/27 -- To view, visit http://gerrit.cloudera.org:8080/16474 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 Gerrit-Change-Number: 16474 Gerrit-PatchSet: 27 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16474 ) Change subject: IMPALA-10178 Run-time profile shall report skews .. Patch Set 27: (1 comment) http://gerrit.cloudera.org:8080/#/c/16474/27/be/src/util/runtime-profile-counters.h File be/src/util/runtime-profile-counters.h: http://gerrit.cloudera.org:8080/#/c/16474/27/be/src/util/runtime-profile-counters.h@416 PS27, Line 416: /// Input argument 'option': line has trailing whitespace -- To view, visit http://gerrit.cloudera.org:8080/16474 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 Gerrit-Change-Number: 16474 Gerrit-PatchSet: 27 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sun, 27 Sep 2020 21:13:31 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews
Qifan Chen has uploaded a new patch set (#26). ( http://gerrit.cloudera.org:8080/16474 ) Change subject: IMPALA-10178 Run-time profile shall report skews .. IMPALA-10178 Run-time profile shall report skews This fix addresses the current limitation in runtime profile that skews existing in certain operators such as the rows read counter (RowsRead) in the scan operators are not reported. A skew condition exists when the number of rows processed at each operator instance is not about the same and can be detected through standard deviation (stddev). A high stddev (say > 5) usually implies the existence of skew. With the fix, such skew is detected for the following counters 1. RowsRead in HDFS_SCAN_NODE and KUDU_SCAN_NODE 2. ProbeRows and BuildRows in HASH_JOIN_NODE 3. RowsReturned in GroupingAggregator, EXCHANGE and SORT_NODE and reported as follows: 1. In execution profile, a new skew summary that lists the names of the operators with skews; 2. In the averaged profile for the corresponding operator, the list of values of the counter across all fragment instances in the backend processes, and the skew detection formula; 3. Skew detection formula: a: stddev/mean > 0.05 and mean > 1,000,000, for conservative reporting b: stddev > 5, for aggressive reporting 4. A new query option 'report_skew' = -1, 0: disable skew reporting = 1: report skew conservatively (default) = 2: report skew aggressively Examples of skews reported for a hash join and an hdfs scan. In execution profile: ... ... skew(s) found at: HASH_JOIN_NODE (id=4), HDFS_SCAN_NODE (id=0) Per Node Peak Memory Usage: ... Per Node Bytes Read: ... Per Node User Time: ... Per Node System Time: ... ... In averaged profiles: HASH_JOIN_NODE (id=4): ... Skew details: ProbeRows ([16904,17750,19197], stddev=946.77) ... ... HDFS_SCAN_NODE (id=0): ... Skew details: RowsRead ([913887,917913,1048604], stddev=62578.85) Testing: 1. 
Added test_aggressive_skew_reporting_in_runtime_profile and test_conservative_skew_reporting_in_runtime_profile in test_observability.py to verify that the skews are reported. 2. Ran Core tests successfully. Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 --- M be/src/runtime/coordinator.cc M be/src/service/query-options.cc M be/src/service/query-options.h M be/src/util/runtime-profile-counters.h M be/src/util/runtime-profile.cc M be/src/util/runtime-profile.h M be/src/util/stat-util.h M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M tests/query_test/test_hash_join_timer.py M tests/query_test/test_observability.py 11 files changed, 285 insertions(+), 17 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/16474/26 -- To view, visit http://gerrit.cloudera.org:8080/16474 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 Gerrit-Change-Number: 16474 Gerrit-PatchSet: 26 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16474 ) Change subject: IMPALA-10178 Run-time profile shall report skews .. Patch Set 25: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7289/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16474 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 Gerrit-Change-Number: 16474 Gerrit-PatchSet: 25 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sun, 27 Sep 2020 18:48:06 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/16474 ) Change subject: IMPALA-10178 Run-time profile shall report skews .. Patch Set 25: Added a new query option 'report_skew' to specify the skew reporting mode. -1, 0: disable skew reporting entirely; 1 (default): conservative reporting, using the formula stddev/mean > 0.05 && mean > 1,000,000; 2: aggressive reporting, using the formula stddev > 5. -- To view, visit http://gerrit.cloudera.org:8080/16474 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 Gerrit-Change-Number: 16474 Gerrit-PatchSet: 25 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sun, 27 Sep 2020 18:28:53 + Gerrit-HasComments: No
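The three reporting modes described in the comment above can be sketched as a single predicate. This is a Python illustration only, not the C++ in the patch: the function name has_skew and the constant names are hypothetical, the population standard deviation is an assumption, and rejecting values other than -1, 0, 1 and 2 is also an assumption, since the comment does not say how they are handled.

```python
import statistics

# Thresholds taken from the comment; the constant names are made up here.
CONSERVATIVE_RATIO_LIMIT = 0.05
CONSERVATIVE_MIN_MEAN = 1_000_000
AGGRESSIVE_STDDEV_LIMIT = 5.0


def has_skew(values, report_skew=1):
    """Return True if the per-instance counter values count as skewed."""
    if report_skew in (-1, 0) or not values:
        return False  # reporting disabled, or nothing to inspect
    mean = sum(values) / len(values)
    stddev = statistics.pstdev(values)  # assumption: population stddev
    if report_skew == 1:
        # Conservative: large counters with a relative spread above 5%.
        # The mean check runs first, so the division never sees mean == 0.
        return mean > CONSERVATIVE_MIN_MEAN and stddev / mean > CONSERVATIVE_RATIO_LIMIT
    if report_skew == 2:
        # Aggressive: any absolute spread above 5 rows.
        return stddev > AGGRESSIVE_STDDEV_LIMIT
    raise ValueError("unsupported report_skew value (assumed invalid)")


probe_rows = [16904, 17750, 19197]  # ProbeRows example from the commit message
print(has_skew(probe_rows, report_skew=1))
print(has_skew(probe_rows, report_skew=2))
```

With the ProbeRows example from the commit message, the mean is far below 1,000,000, so only the aggressive mode flags it. That is the practical difference between the two settings.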
[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews
Qifan Chen has uploaded a new patch set (#25). ( http://gerrit.cloudera.org:8080/16474 ) Change subject: IMPALA-10178 Run-time profile shall report skews .. IMPALA-10178 Run-time profile shall report skews This fix addresses the current limitation in runtime profile that skews existing in certain operators such as the rows read counter (RowsRead) in the scan operators are not reported. A skew condition exists when the number of rows processed at each operator instance is not about the same and can be detected through standard deviation (stddev). A high stddev (say > 5) usually implies the existence of skew. With the fix, such skew is detected for the following counters 1. RowsRead in HDFS_SCAN_NODE and KUDU_SCAN_NODE profile 2. ProbeRows and BuildRows in HASH_JOIN_NODE profile 3. RowsReturned in GroupingAggregator, EXCHANGE and SORT_NODE profile and reported as follows: 1. In a new skew summary in execution profile that lists the names of the operators with skews; 2. In each corresponding operator in the averaged profile for a fragment, the name of the counter, the list of values of the counter across all fragment instances in the backend processes, and the stddev value. Examples of skews reported for a hash join and an hdfs scan. In execution profile: ... ... skew(s) found at: HASH_JOIN_NODE (id=4), HDFS_SCAN_NODE (id=0) Per Node Peak Memory Usage: ... Per Node Bytes Read: ... Per Node User Time: ... Per Node System Time: ... ... In averaged profiles: HASH_JOIN_NODE (id=4): ... Skew details: ProbeRows ([16904,17750,19197], stddev=946.77) ... ... HDFS_SCAN_NODE (id=0): ... Skew details: RowsRead ([913887,917913,1048604], stddev=62578.85) The reporting is controlled by a new query option 'report_skew' as follows. -1, 0: disabled 1: report skew conservatively, iff stddev/mean > 0.05 and mean > 1,000,000 2: report skew aggressively, iff stddev > 5. Testing: 1. 
Added a new test test_skew_reporting_in_runtime_profile in test_observability.py to verify that the skews are reported. 2. Ran Core tests successfully. Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 --- M be/src/runtime/coordinator.cc M be/src/service/query-options.cc M be/src/service/query-options.h M be/src/util/runtime-profile-counters.h M be/src/util/runtime-profile.cc M be/src/util/runtime-profile.h M be/src/util/stat-util.h M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M tests/query_test/test_hash_join_timer.py M tests/query_test/test_observability.py 11 files changed, 260 insertions(+), 17 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/16474/25 -- To view, visit http://gerrit.cloudera.org:8080/16474 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 Gerrit-Change-Number: 16474 Gerrit-PatchSet: 25 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16474 ) Change subject: IMPALA-10178 Run-time profile shall report skews .. Patch Set 24: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7286/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16474 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 Gerrit-Change-Number: 16474 Gerrit-PatchSet: 24 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 26 Sep 2020 00:20:17 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews
Qifan Chen has uploaded a new patch set (#24). ( http://gerrit.cloudera.org:8080/16474 ) Change subject: IMPALA-10178 Run-time profile shall report skews .. IMPALA-10178 Run-time profile shall report skews This fix addresses the current limitation in runtime profile that skews existing in certain operators such as the rows read counter (RowsRead) in the scan operators are not reported. A skew condition exists when the number of rows processed at each operator instance is not about the same and can be detected through standard deviation (stddev). A high stddev (say > 5) usually implies the existence of skew. With the fix, such skew is detected for the following counters 1. RowsRead in HDFS_SCAN_NODE and KUDU_SCAN_NODE profile 2. ProbeRows and BuildRows in HASH_JOIN_NODE profile 3. RowsReturned in GroupingAggregator, EXCHANGE and SORT_NODE profile and reported as follows: 1. In a new skew summary in execution profile that lists the names of the operators with skews; 2. In each corresponding operator in the averaged profile for a fragment, the name of the counter, the list of values of the counter across all fragment instances in the backend processes, and the stddev value. Examples of skews reported for a hash join and an hdfs scan. In execution profile: ... ... skew(s) found at: HASH_JOIN_NODE (id=4), HDFS_SCAN_NODE (id=0) Per Node Peak Memory Usage: ... Per Node Bytes Read: ... Per Node User Time: ... Per Node System Time: ... ... In averaged profiles: HASH_JOIN_NODE (id=4): ... Skew details: ProbeRows ([16904,17750,19197], stddev=946.77) ... ... HDFS_SCAN_NODE (id=0): ... Skew details: RowsRead ([913887,917913,1048604], stddev=62578.85) Testing: 1. Added a new test test_skew_reporting_in_runtime_profile in test_observability.py to verify that the skews are reported. 2. Ran Core tests successfully. 
Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 --- M be/src/runtime/coordinator.cc M be/src/util/runtime-profile-counters.h M be/src/util/runtime-profile.cc M be/src/util/runtime-profile.h M be/src/util/stat-util.h M tests/query_test/test_hash_join_timer.py M tests/query_test/test_observability.py 7 files changed, 177 insertions(+), 10 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/16474/24 -- To view, visit http://gerrit.cloudera.org:8080/16474 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 Gerrit-Change-Number: 16474 Gerrit-PatchSet: 24 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/16474 ) Change subject: IMPALA-10178 Run-time profile shall report skews .. Patch Set 22: (1 comment) http://gerrit.cloudera.org:8080/#/c/16474/21/be/src/util/runtime-profile.cc File be/src/util/runtime-profile.cc: http://gerrit.cloudera.org:8080/#/c/16474/21/be/src/util/runtime-profile.cc@1928 PS21, Line 1928: vector values; > > In the past, stddev with a threshold of 5 served the purpose well. Here are some examples from a profile, QueryIDa841807823cbdc837375aa60.txt, involving tables on the order of millions of rows and a degree of parallelism (DoP) of 212. Hash join 08 (left child: 145.00M rows): fragment instances=212, stddev=11421.5, mean=683975, stddev/mean=0.0166. HDFS scan 18: fragment instances=209, stddev=918947, mean=4.45696e+06, stddev/mean=0.206. Hash exchange 38: fragment instances=209, stddev=13692.9, mean=1.52542e+06, stddev/mean=0.0089. Here stddev/mean is the coefficient of variation (CV), also known as relative standard deviation (RSD). It shows the extent of variability in relation to the mean of the population. In our case, the lower the CV, the better. When all values are the same and >= 1, CV is 0 because stddev is 0. Looking at the three examples above, the HDFS scan at node 18 has a CV of about 20%. That is a skew case in my opinion. The skew in the other two is much smaller. The intention of reporting skew is to reveal processing imbalance. Translating skewness into performance loss has to be done separately. In the case of filtering, my theory is that if the matching values are distributed evenly and the scanners are applied evenly, the rows read should be about the same. To report only severe skews in Impala, maybe we can use CV (instead of stddev) as the threshold, say CV > 5% && mean over 1 million.
-- To view, visit http://gerrit.cloudera.org:8080/16474 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 Gerrit-Change-Number: 16474 Gerrit-PatchSet: 22 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 25 Sep 2020 17:36:28 + Gerrit-HasComments: Yes
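The CV-based check proposed in the comment above can be sketched in a few lines of Python. This is only an illustration, not the code under review in runtime-profile.cc: the function names and the `cov_limit` and `min_mean` parameters are hypothetical, and the sample values are the RowsRead counters quoted in the commit message of this change. It assumes a population (not sample) standard deviation, as stated in the stat-util.h comment quoted elsewhere in this thread.

```python
import math


def mean_and_stddev(values):
    """Return (mean, population standard deviation) of a non-empty sequence."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    return mean, math.sqrt(variance)


def coefficient_of_variation(values):
    """Return stddev / mean, or 0.0 when the mean is not positive."""
    mean, stddev = mean_and_stddev(values)
    return stddev / mean if mean > 0 else 0.0


def has_skew(values, cov_limit, min_mean):
    """Hypothetical check: flag skew only when both the CV and the mean are large."""
    mean, _ = mean_and_stddev(values)
    return mean > min_mean and coefficient_of_variation(values) > cov_limit


# RowsRead values per fragment instance, taken from the commit message example.
rows_read = [913887, 917913, 1048604]
mean, stddev = mean_and_stddev(rows_read)
print("mean=%.0f stddev=%.2f CV=%.3f" % (mean, stddev, coefficient_of_variation(rows_read)))
print(has_skew(rows_read, cov_limit=0.05, min_mean=1000000))
```

With the thresholds suggested in the comment (CV > 5% and mean over 1 million), this example would not be flagged, because its mean is slightly below 1 million even though its CV is above 5%. That illustrates how the two conditions interact.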
[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews
Sahil Takiar has posted comments on this change. ( http://gerrit.cloudera.org:8080/16474 ) Change subject: IMPALA-10178 Run-time profile shall report skews .. Patch Set 21: (1 comment) http://gerrit.cloudera.org:8080/#/c/16474/21/be/src/util/runtime-profile.cc File be/src/util/runtime-profile.cc: http://gerrit.cloudera.org:8080/#/c/16474/21/be/src/util/runtime-profile.cc@1928 PS21, Line 1928: if (stddev > 5.0) { > In the past, stddev with a threshold of 5 served the purpose well. I would be interested in seeing what evidence we have to support this. It might be worth running this logic against some larger runtime profiles and seeing what comes out. With Parquet encodings, Parquet filter pushdown, page skipping, runtime filters, etc., I wouldn't expect the number of rows read by each scan node instance to be that close together. I just want to make sure that the skew flag doesn't start popping up on every runtime profile we get, in which case folks will start to ignore it. Another benchmark might be to see how many TPC-DS or TPC-H profiles the skew flag pops up on. Maybe 30 GB would be enough scale, not sure. -- To view, visit http://gerrit.cloudera.org:8080/16474 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 Gerrit-Change-Number: 16474 Gerrit-PatchSet: 21 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 24 Sep 2020 20:32:43 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16474 ) Change subject: IMPALA-10178 Run-time profile shall report skews .. Patch Set 22: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7278/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16474 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 Gerrit-Change-Number: 16474 Gerrit-PatchSet: 22 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 24 Sep 2020 19:57:55 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews
Qifan Chen has uploaded a new patch set (#22). ( http://gerrit.cloudera.org:8080/16474 ) Change subject: IMPALA-10178 Run-time profile shall report skews .. IMPALA-10178 Run-time profile shall report skews This fix addresses the current limitation in runtime profile that skews existing in certain operators such as the rows read counter (RowsRead) in the scan operators are not reported. A skew condition exists when the number of rows processed at each operator instance is not about the same and can be detected through standard deviation (stddev). A high stddev (say > 5) usually implies the existence of skew. With the fix, such skew is detected for the following counters 1. RowsRead in HDFS_SCAN_NODE and KUDU_SCAN_NODE profile 2. ProbeRows and BuildRows in HASH_JOIN_NODE profile 3. RowsReturned in GroupingAggregator, EXCHANGE and SORT_NODE profile and reported as follows: 1. In a new skew summary in execution profile that lists the names of the operators with skews; 2. In each corresponding operator in the averaged profile for a fragment, the name of the counter, the list of values of the counter across all fragment instances in the backend processes, and the stddev value. Examples of skews reported for a hash join and an hdfs scan. In execution profile: ... ... skew(s) found at: HASH_JOIN_NODE (id=4), HDFS_SCAN_NODE (id=0) Per Node Peak Memory Usage: ... Per Node Bytes Read: ... Per Node User Time: ... Per Node System Time: ... ... In averaged profiles: HASH_JOIN_NODE (id=4): ... Skew details: ProbeRows ([16904,17750,19197], stddev=946.77) ... ... HDFS_SCAN_NODE (id=0): ... Skew details: RowsRead ([913887,917913,1048604], stddev=62578.85) Testing: 1. Added a new test test_skew_reporting_in_runtime_profile in test_observability.py to verify that the skews are reported. 2. Ran Core tests successfully. 
Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 --- M be/src/runtime/coordinator.cc M be/src/util/runtime-profile-counters.h M be/src/util/runtime-profile.cc M be/src/util/runtime-profile.h M be/src/util/stat-util.h M tests/query_test/test_hash_join_timer.py M tests/query_test/test_observability.py 7 files changed, 182 insertions(+), 10 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/16474/22 -- To view, visit http://gerrit.cloudera.org:8080/16474 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 Gerrit-Change-Number: 16474 Gerrit-PatchSet: 22 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong
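For readers who want to check the "Skew details" lines in a text profile, here is a minimal parsing sketch in Python. It targets only the stddev-based format shown in this patch set's commit message. The regular expression and the `parse_skew_details` helper are hypothetical and are not taken from test_observability.py.

```python
import re

# Matches e.g. "Skew details: ProbeRows ([16904,17750,19197], stddev=946.77)".
# Optional spaces after the commas in the value list are tolerated.
SKEW_DETAILS_RE = re.compile(
    r"Skew details: (?P<counter>\w+) "
    r"\(\[(?P<values>[\d, ]+)\], stddev=(?P<stddev>[\d.]+)\)"
)


def parse_skew_details(line):
    """Return (counter_name, values, stddev), or None if the line does not match."""
    match = SKEW_DETAILS_RE.search(line)
    if match is None:
        return None
    values = [int(v) for v in match.group("values").split(",")]
    return match.group("counter"), values, float(match.group("stddev"))


print(parse_skew_details(
    "Skew details: RowsRead ([913887,917913,1048604], stddev=62578.85)"))
```

A check built on this helper could, for instance, assert that the counter name is one of the counters listed in the commit message and that the value list has one entry per fragment instance.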
[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/16474 ) Change subject: IMPALA-10178 Run-time profile shall report skews .. Patch Set 21: (3 comments) Thanks for the review! http://gerrit.cloudera.org:8080/#/c/16474/21//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16474/21//COMMIT_MSG@28 PS21, Line 28: 2. In each corresponding operator in the averaged profile, the name : of the counter, the list of values of the counter across the : impalad backend processes, and the stddev value. > I'm a bit confused as to whether this just detects skew across all fragment An averaged profile is created per fragment, summarizing all data from all fragment instances for that fragment. Regardless of how the fragment instances are distributed across the nodes, the skew is computed for each fragment. Reworded. http://gerrit.cloudera.org:8080/#/c/16474/21/be/src/util/runtime-profile.h File be/src/util/runtime-profile.h: http://gerrit.cloudera.org:8080/#/c/16474/21/be/src/util/runtime-profile.h@202 PS21, Line 202: // Generate a string enumerating profiles rooted at this. : std::string DebugString(int indent = 0); > where is this used? Removed. http://gerrit.cloudera.org:8080/#/c/16474/21/be/src/util/runtime-profile.cc File be/src/util/runtime-profile.cc: http://gerrit.cloudera.org:8080/#/c/16474/21/be/src/util/runtime-profile.cc@1928 PS21, Line 1928: if (stddev > 5.0) { > how well does this work as the number of rows processed by a counter increa Yes, a stddev of 5 may not be a big deal with respect to a very large row count. However, it still captures the variation, and a large stddev implies a large variation, which should be reduced somehow toward a stddev of 0. In the past, stddev with a threshold of 5 served the purpose well. 
-- To view, visit http://gerrit.cloudera.org:8080/16474 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 Gerrit-Change-Number: 16474 Gerrit-PatchSet: 21 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 24 Sep 2020 19:34:16 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews
Sahil Takiar has posted comments on this change. ( http://gerrit.cloudera.org:8080/16474 ) Change subject: IMPALA-10178 Run-time profile shall report skews .. Patch Set 21: (3 comments) http://gerrit.cloudera.org:8080/#/c/16474/21//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16474/21//COMMIT_MSG@28 PS21, Line 28: 2. In each corresponding operator in the averaged profile, the name : of the counter, the list of values of the counter across the : impalad backend processes, and the stddev value. I'm a bit confused as to whether this just detects skew across all fragment instances on a single node, or whether it detects skew across all fragment instances across all nodes? http://gerrit.cloudera.org:8080/#/c/16474/21/be/src/util/runtime-profile.h File be/src/util/runtime-profile.h: http://gerrit.cloudera.org:8080/#/c/16474/21/be/src/util/runtime-profile.h@202 PS21, Line 202: // Generate a string enumerating profiles rooted at this. : std::string DebugString(int indent = 0); where is this used? http://gerrit.cloudera.org:8080/#/c/16474/21/be/src/util/runtime-profile.cc File be/src/util/runtime-profile.cc: http://gerrit.cloudera.org:8080/#/c/16474/21/be/src/util/runtime-profile.cc@1928 PS21, Line 1928: if (stddev > 5.0) { How well does this work as the number of rows processed by a counter increases? E.g. if there are nodes processing billions of rows, a std-dev of more than 5 doesn't seem that statistically significant. I'm not entirely sure how it works, but single_node_perf_benchmark.py uses various tests to check whether a difference in runtime profile counters is statistically significant. See report_benchmark_results.py, which refers to things like "ttest t-value" and the "Mann-Whitney Z-value". I'm not a stats expert, but simply hardcoding the threshold to 5 seems odd. 
-- To view, visit http://gerrit.cloudera.org:8080/16474 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 Gerrit-Change-Number: 16474 Gerrit-PatchSet: 21 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 24 Sep 2020 17:59:59 + Gerrit-HasComments: Yes
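The scale concern raised in this review comment can be illustrated with a short Python sketch. The counter values below are made up for illustration only. The sketch compares a fixed absolute stddev threshold with the relative measure stddev/mean (coefficient of variation) when the same proportional spread occurs at two very different row counts.

```python
import statistics


def stddev_and_cov(values):
    """Return (population stddev, stddev / mean) for positive counter values."""
    mean = statistics.fmean(values)
    stddev = statistics.pstdev(values)
    return stddev, stddev / mean


# Hypothetical per-instance row counts with the same relative spread (0.5%).
small = [995, 1000, 1005]
large = [v * 1000000 for v in small]

small_stddev, small_cov = stddev_and_cov(small)
large_stddev, large_cov = stddev_and_cov(large)

# An absolute threshold (stddev > 5) treats the two cases differently,
# while the relative measure is the same for both.
print(small_stddev > 5.0, large_stddev > 5.0)
print(round(small_cov, 6), round(large_cov, 6))
```

Because stddev scales linearly with the values while stddev/mean does not, a fixed cutoff such as 5 would flag nearly every operator that processes a large number of rows, which is the behavior the reviewer is worried about.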
[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16474 ) Change subject: IMPALA-10178 Run-time profile shall report skews .. Patch Set 21: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7275/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16474 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 Gerrit-Change-Number: 16474 Gerrit-PatchSet: 21 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 24 Sep 2020 17:49:30 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews
Qifan Chen has uploaded a new patch set (#21). ( http://gerrit.cloudera.org:8080/16474 ) Change subject: IMPALA-10178 Run-time profile shall report skews .. IMPALA-10178 Run-time profile shall report skews This fix addresses the current limitation in runtime profile that skews existing in certain operators such as the rows read counter (RowsRead) in the scan operators are not reported. A skew condition exists when the number of rows processed at each operator instance is not about the same and can be detected through standard deviation (stddev). A high stddev (say > 5) usually implies the existence of skew. With the fix, such skew is detected for the following counters 1. RowsRead in HDFS_SCAN_NODE and KUDU_SCAN_NODE profile 2. ProbeRows and BuildRows in HASH_JOIN_NODE profile 3. RowsReturned in GroupingAggregator, EXCHANGE and SORT_NODE profile and reported as follows: 1. In a new skew summary in execution profile that lists the names of the operators with skews; 2. In each corresponding operator in the averaged profile, the name of the counter, the list of values of the counter across the impalad backend processes, and the stddev value. Examples of skews reported for a hash join and an hdfs scan. ... ... Execution Profile ... ... ... ... skew(s) found at: HASH_JOIN_NODE (id=4), HDFS_SCAN_NODE (id=0) Per Node Peak Memory Usage: ... Per Node Bytes Read: ... Per Node User Time: ... Per Node System Time: ... ... HASH_JOIN_NODE (id=4): ... Skew details: ProbeRows ([16904,17750,19197], stddev=946.77) ... ... HDFS_SCAN_NODE (id=0): ... Skew details: RowsRead ([913887,917913,1048604], stddev=62578.85) Testing: 1. Added a new test test_skew_reporting_in_runtime_profile in test_observability.py to verify that the skews are reported. 2. Ran Core tests successfully. 
Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 --- M be/src/runtime/coordinator.cc M be/src/util/runtime-profile-counters.h M be/src/util/runtime-profile.cc M be/src/util/runtime-profile.h M be/src/util/stat-util.h M tests/query_test/test_hash_join_timer.py M tests/query_test/test_observability.py 7 files changed, 201 insertions(+), 10 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/16474/21 -- To view, visit http://gerrit.cloudera.org:8080/16474 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 Gerrit-Change-Number: 16474 Gerrit-PatchSet: 21 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16474 ) Change subject: IMPALA-10178 Run-time profile shall report skews .. Patch Set 20: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7262/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16474 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 Gerrit-Change-Number: 16474 Gerrit-PatchSet: 20 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 24 Sep 2020 00:14:15 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews
Qifan Chen has uploaded a new patch set (#20). ( http://gerrit.cloudera.org:8080/16474 ) Change subject: IMPALA-10178 Run-time profile shall report skews .. IMPALA-10178 Run-time profile shall report skews This fix addresses the current limitation in runtime profile that skews existing in certain operators such as the rows read counter (RowsRead) in the scan operators are not reported. A skew condition exists when the number of rows processed at each operator instance is not about the same and can be detected through standard deviation (stddev). A high stddev (say > 5) usually implies the existence of skew. With the fix, such skew is detected for the following counters 1. RowsRead in HDFS_SCAN_NODE and KUDU_SCAN_NODE profile 2. ProbeRows and BuildRows in HASH_JOIN_NODE profile 3. RowsReturned in GroupingAggregator, EXCHANGE and SORT_NODE profile and reported as follows: 1. In a new skew summary in execution profile that lists the names of the operators with skews; 2. In each corresponding operator in the averaged profile, the name of the counter, the list of values of the counter across the impalad backend processes, and the stddev value. Examples of skews reported for a hash join and an hdfs scan. ... ... Execution Profile ... ... ... ... skew(s) found at: HASH_JOIN_NODE (id=4), HDFS_SCAN_NODE (id=0) Per Node Peak Memory Usage: ... Per Node Bytes Read: ... Per Node User Time: ... Per Node System Time: ... ... HASH_JOIN_NODE (id=4): ... Skew details: ProbeRows ([16904,17750,19197], stddev=946.77) ... ... HDFS_SCAN_NODE (id=0): ... Skew details: RowsRead ([913887,917913,1048604], stddev=62578.85) Testing: 1. Added a new test test_skew_reporting_in_runtime_profile in test_observability.py to verify that the skews are reported. 2. Ran Core tests successfully. 
Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 --- M be/src/runtime/coordinator.cc M be/src/util/runtime-profile-counters.h M be/src/util/runtime-profile.cc M be/src/util/runtime-profile.h M be/src/util/stat-util.h M tests/query_test/test_hash_join_timer.py M tests/query_test/test_observability.py 7 files changed, 201 insertions(+), 10 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/16474/20 -- To view, visit http://gerrit.cloudera.org:8080/16474 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 Gerrit-Change-Number: 16474 Gerrit-PatchSet: 20 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16474 ) Change subject: IMPALA-10178 Run-time profile shall report skews .. Patch Set 19: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7258/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16474 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 Gerrit-Change-Number: 16474 Gerrit-PatchSet: 19 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 23 Sep 2020 21:28:24 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16474 ) Change subject: IMPALA-10178 Run-time profile shall report skews .. Patch Set 18: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7257/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16474 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 Gerrit-Change-Number: 16474 Gerrit-PatchSet: 18 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 23 Sep 2020 21:16:35 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16474 ) Change subject: IMPALA-10178 Run-time profile shall report skews .. Patch Set 19: (2 comments) http://gerrit.cloudera.org:8080/#/c/16474/19/tests/query_test/test_observability.py File tests/query_test/test_observability.py: http://gerrit.cloudera.org:8080/#/c/16474/19/tests/query_test/test_observability.py@806 PS19, Line 806: = flake8: E225 missing whitespace around operator http://gerrit.cloudera.org:8080/#/c/16474/19/tests/query_test/test_observability.py@814 PS19, Line 814: flake8: E221 multiple spaces before operator -- To view, visit http://gerrit.cloudera.org:8080/16474 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 Gerrit-Change-Number: 16474 Gerrit-PatchSet: 19 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 23 Sep 2020 21:08:54 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews
Qifan Chen has uploaded a new patch set (#19). ( http://gerrit.cloudera.org:8080/16474 ) Change subject: IMPALA-10178 Run-time profile shall report skews .. IMPALA-10178 Run-time profile shall report skews This fix addresses the current limitation in runtime profile that skews existing in certain operators such as the rows read counter (RowsRead) in the scan operators are not reported. A skew condition exists when the number of rows processed at each operator instance is not about the same and can be detected through standard deviation (stddev). A high stddev (say > 5) usually implies the existence of skew. With the fix, such skew is detected for the following counters 1. RowsRead in HDFS_SCAN_NODE and KUDU_SCAN_NODE profile 2. ProbeRows and BuildRows in HASH_JOIN_NODE profile 3. RowsReturned in GroupingAggregator, EXCHANGE and SORT_NODE profile and reported as follows: 1. In a new skew summary in execution profile that lists the names of the operators with skews; 2. In each corresponding operator in the averaged profile, the name of the counter, the list of values of the counter across the impalad backend processes, and the stddev value. Examples of skews reported for a hash join and an hdfs scan. ... ... Execution Profile ... ... ... ... skew(s) found at: HASH_JOIN_NODE (id=4), HDFS_SCAN_NODE (id=0) Per Node Peak Memory Usage: ... Per Node Bytes Read: ... Per Node User Time: ... Per Node System Time: ... ... HASH_JOIN_NODE (id=4): ... Skew details: ProbeRows ([16904,17750,19197], stddev=946.77) ... ... HDFS_SCAN_NODE (id=0): ... Skew details: RowsRead ([913887,917913,1048604], stddev=62578.85) Testing: 1. Added a new test test_skew_reporting_in_runtime_profile in test_observability.py to verify that the skews are reported. 2. Ran Core tests successfully. 
Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 --- M be/src/runtime/coordinator.cc M be/src/util/runtime-profile-counters.h M be/src/util/runtime-profile.cc M be/src/util/runtime-profile.h M be/src/util/stat-util.h M tests/query_test/test_hash_join_timer.py M tests/query_test/test_observability.py 7 files changed, 195 insertions(+), 10 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/16474/19 -- To view, visit http://gerrit.cloudera.org:8080/16474 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 Gerrit-Change-Number: 16474 Gerrit-PatchSet: 19 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16474 ) Change subject: IMPALA-10178 Run-time profile shall report skews .. Patch Set 18: (4 comments) http://gerrit.cloudera.org:8080/#/c/16474/18/be/src/util/stat-util.h File be/src/util/stat-util.h: http://gerrit.cloudera.org:8080/#/c/16474/18/be/src/util/stat-util.h@45 PS18, Line 45: /// Computes the mean and the standard deviation (population) from an array of line has trailing whitespace http://gerrit.cloudera.org:8080/#/c/16474/18/tests/query_test/test_hash_join_timer.py File tests/query_test/test_hash_join_timer.py: http://gerrit.cloudera.org:8080/#/c/16474/18/tests/query_test/test_hash_join_timer.py@141 PS18, Line 141: ; flake8: E703 statement ends with a semicolon http://gerrit.cloudera.org:8080/#/c/16474/18/tests/query_test/test_observability.py File tests/query_test/test_observability.py: http://gerrit.cloudera.org:8080/#/c/16474/18/tests/query_test/test_observability.py@806 PS18, Line 806: = flake8: E225 missing whitespace around operator http://gerrit.cloudera.org:8080/#/c/16474/18/tests/query_test/test_observability.py@814 PS18, Line 814: flake8: E221 multiple spaces before operator -- To view, visit http://gerrit.cloudera.org:8080/16474 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 Gerrit-Change-Number: 16474 Gerrit-PatchSet: 18 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 23 Sep 2020 20:55:15 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews
Qifan Chen has uploaded a new patch set (#18). ( http://gerrit.cloudera.org:8080/16474 ) Change subject: IMPALA-10178 Run-time profile shall report skews .. IMPALA-10178 Run-time profile shall report skews This fix addresses the current limitation in runtime profile that skews existing in certain operators such as the rows read counter (RowsRead) in the scan operators are not reported. A skew condition exists when the number of rows processed at each operator instance is not about the same and can be detected through standard deviation (stddev). A high stddev (say > 5) usually implies the existence of skew. With the fix, such skew is detected for the following counters 1. RowsRead in HDFS_SCAN_NODE and KUDU_SCAN_NODE profile 2. ProbeRows and BuildRows in HASH_JOIN_NODE profile 3. RowsReturned in GroupingAggregator, EXCHANGE and SORT_NODE profile and reported as follows: 1. In a new skew summary in execution profile that lists the names of the operators with skews; 2. In each corresponding operator in the averaged profile, the name of the counter, the list of values of the counter across the impalad backend processes, and the stddev value. Examples of skews reported for a hash join and an hdfs scan. ... ... Execution Profile ... ... ... ... skew(s) found at: HASH_JOIN_NODE (id=4), HDFS_SCAN_NODE (id=0) Per Node Peak Memory Usage: ... Per Node Bytes Read: ... Per Node User Time: ... Per Node System Time: ... ... HASH_JOIN_NODE (id=4): ... Skew details: ProbeRows ([16904,17750,19197], stddev=946.77) ... ... HDFS_SCAN_NODE (id=0): ... Skew details: RowsRead ([913887,917913,1048604], stddev=62578.85) Testing: 1. Added a new test test_skew_reporting_in_runtime_profile in test_observability.py to verify that the skews are reported. 2. Ran Core tests successfully. 
Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 --- M be/src/runtime/coordinator.cc M be/src/util/runtime-profile-counters.h M be/src/util/runtime-profile.cc M be/src/util/runtime-profile.h M be/src/util/stat-util.h M tests/query_test/test_hash_join_timer.py M tests/query_test/test_observability.py 7 files changed, 195 insertions(+), 10 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/16474/18 -- To view, visit http://gerrit.cloudera.org:8080/16474 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 Gerrit-Change-Number: 16474 Gerrit-PatchSet: 18 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16474 ) Change subject: IMPALA-10178 Run-time profile shall report skews .. Patch Set 16: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7252/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16474 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 Gerrit-Change-Number: 16474 Gerrit-PatchSet: 16 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 23 Sep 2020 18:10:08 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16474 ) Change subject: IMPALA-10178 Run-time profile shall report skews .. Patch Set 16: (8 comments) http://gerrit.cloudera.org:8080/#/c/16474/16/be/src/util/runtime-profile-counters.h File be/src/util/runtime-profile-counters.h: http://gerrit.cloudera.org:8080/#/c/16474/16/be/src/util/runtime-profile-counters.h@414 PS16, Line 414: /// all valid raw values backing this average counter. line has trailing whitespace http://gerrit.cloudera.org:8080/#/c/16474/16/be/src/util/runtime-profile-counters.h@415 PS16, Line 415: /// line has trailing whitespace http://gerrit.cloudera.org:8080/#/c/16474/16/be/src/util/runtime-profile-counters.h@417 PS16, Line 417: /// all valid raw values and the population stddev in the form of: line has trailing whitespace http://gerrit.cloudera.org:8080/#/c/16474/16/be/src/util/runtime-profile-counters.h@419 PS16, Line 419: /// line has trailing whitespace http://gerrit.cloudera.org:8080/#/c/16474/16/be/src/util/stat-util.h File be/src/util/stat-util.h: http://gerrit.cloudera.org:8080/#/c/16474/16/be/src/util/stat-util.h@45 PS16, Line 45: /// Computes the mean and the standard deviation (population) from an array of line has trailing whitespace http://gerrit.cloudera.org:8080/#/c/16474/16/tests/query_test/test_observability.py File tests/query_test/test_observability.py: http://gerrit.cloudera.org:8080/#/c/16474/16/tests/query_test/test_observability.py@801 PS16, Line 801: # flake8: E265 block comment should start with '# ' http://gerrit.cloudera.org:8080/#/c/16474/16/tests/query_test/test_observability.py@807 PS16, Line 807: = flake8: E225 missing whitespace around operator http://gerrit.cloudera.org:8080/#/c/16474/16/tests/query_test/test_observability.py@817 PS16, Line 817: flake8: E221 multiple spaces before operator -- To view, visit http://gerrit.cloudera.org:8080/16474 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings 
Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 Gerrit-Change-Number: 16474 Gerrit-PatchSet: 16 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 23 Sep 2020 17:53:46 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews
Qifan Chen has uploaded a new patch set (#16). ( http://gerrit.cloudera.org:8080/16474 )

Change subject: IMPALA-10178 Run-time profile shall report skews
..

IMPALA-10178 Run-time profile shall report skews

This fix addresses the current limitation in runtime profile that skews existing in certain operators such as the rows read counter (RowsRead) in the scan operators are not reported. A skew condition exists when the number of rows processed at each operator instance is not about the same and can be detected through standard deviation (stddev). A high stddev (say > 5) usually implies the existence of skew.

With the fix and in an average fragment profile, such skew is detected for the following counters

1. RowsRead in HDFS_SCAN_NODE and KUDU_SCAN_NODE profile
2. ProbeRows and BuildRows in HASH_JOIN_NODE profile
3. RowsReturned in GroupingAggregator, EXCHANGE and SORT_NODE profile

and reported as follows:

1. In the skew summary section which lists the names of the operators with skews;
2. In each corresponding operator, the name of the counters and the corresponding stddev values.

Examples of skews reported for a hash join and an hdfs scan.

Averaged Fragment F00:(Total: 1s075ms, non-child: 26.919ms, ...
...
...
num instances: 3
skew(s) found at: HASH_JOIN_NODE (id=4), HDFS_SCAN_NODE (id=0)
HASH_JOIN_NODE (id=4):(Total: 1s204ms, non-child: 2.166ms, ...
Skew details: ProbeRows ([16904, 17750, 19197], stddev=946.77)
...
...
HDFS_SCAN_NODE (id=0):(Total: 1s032ms, non-child: 1s032ms, ...
Skew details: RowsRead ([913887, 917913, 1048604], stddev=62578.85)

Testing:
1. Added a new test test_skew_reporting_in_runtime_profile to test_observability.py to verify that the skews are reported.
2. Ran Core tests successfully.
Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 --- M be/src/runtime/coordinator-backend-state.cc M be/src/util/runtime-profile-counters.h M be/src/util/runtime-profile.cc M be/src/util/runtime-profile.h M be/src/util/stat-util.h M tests/query_test/test_observability.py 6 files changed, 195 insertions(+), 11 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/16474/16 -- To view, visit http://gerrit.cloudera.org:8080/16474 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 Gerrit-Change-Number: 16474 Gerrit-PatchSet: 16 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong
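The stddev-based check described in the patch set 16 commit message can be sketched roughly as follows. This is a minimal Python illustration, not the patch's C++ code: the names `population_stddev` and `skew_details` and the `threshold` parameter are hypothetical, and the default of 5 only mirrors the "say > 5" wording in the commit message. The returned string imitates the "Skew details" format from the example profile.

```python
import math


def population_stddev(values):
    """Population standard deviation (divides by N, not N - 1)."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


def skew_details(counter_name, values, threshold=5.0):
    """Return a 'Skew details'-style string, or None if no skew is detected.

    Hypothetical helper: 'threshold' mirrors the "say > 5" remark in the
    commit message and is not taken from the actual patch.
    """
    if not values:
        return None
    stddev = population_stddev(values)
    if stddev <= threshold:
        return None
    listed = ", ".join(str(v) for v in values)
    return f"{counter_name} ([{listed}], stddev={stddev:.2f})"


# Per-instance counter values taken from the example profile above.
probe = skew_details("ProbeRows", [16904, 17750, 19197])
scan = skew_details("RowsRead", [913887, 917913, 1048604])
even = skew_details("RowsRead", [1000, 1000, 1000])
print(probe)
print(scan)
print(even)
```

A raw stddev grows with the magnitude of the counter, so a fixed limit also flags mild imbalance on large counters. The merged version of this change reports the coefficient of variation (CoV), which normalizes the stddev by the mean.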
[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/16474 )

Change subject: IMPALA-10178 Run-time profile shall report skews
..

Patch Set 15:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/16474/15//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16474/15//COMMIT_MSG@22
PS15, Line 22: 3. RowsReturned in GroupingAggregator profile
> It would be good to add this for sort operations too. We have SortDataSize
Good point. Added the following: {"EXCHANGE_NODE", "RowsReturned"} and {"SORT_NODE", "RowsReturned"}. Since sort does not drop tuples, I guess RowsReturned should be OK.

http://gerrit.cloudera.org:8080/#/c/16474/15//COMMIT_MSG@36
PS15, Line 36: skew(s) found at: HASH_JOIN_NODE (id=4), HDFS_SCAN_NODE (id=0)
> I thought a bit about whether using the info string was the right approach
Yeah. The skew summary follows the current model by adding some extra info strings to the aggregated profile.

http://gerrit.cloudera.org:8080/#/c/16474/15/be/src/util/runtime-profile-counters.h
File be/src/util/runtime-profile-counters.h:

http://gerrit.cloudera.org:8080/#/c/16474/15/be/src/util/runtime-profile-counters.h@415
PS15, Line 415: o.
> can you explicitly say that it's returned in 'details'.
Done

http://gerrit.cloudera.org:8080/#/c/16474/15/be/src/util/runtime-profile-counters.h@416
PS15, Line 416:
> nit: convention is to use pointer for output args.
Done

http://gerrit.cloudera.org:8080/#/c/16474/15/be/src/util/runtime-profile.cc
File be/src/util/runtime-profile.cc:

http://gerrit.cloudera.org:8080/#/c/16474/15/be/src/util/runtime-profile.cc@1969
PS15, Line 1969: int num_valid_values = NumValidValues();
> I don't think this quite works since valid values could be added concurrent
Done

http://gerrit.cloudera.org:8080/#/c/16474/15/be/src/util/stat-util.h
File be/src/util/stat-util.h:

http://gerrit.cloudera.org:8080/#/c/16474/15/be/src/util/stat-util.h@28
PS15, Line 28: /// Computes standard deviation given mean
> Is this the population standard deviation or the sample standard deviation?
Added some comments to clarify that it is the population version that is computed. Also added 'P' to the function name.

http://gerrit.cloudera.org:8080/#/c/16474/15/be/src/util/stat-util.h@28
PS15, Line 28: /// Computes standard deviation given mean
> I guess this documentation was already missing but would be good to fix
Done

--
To view, visit http://gerrit.cloudera.org:8080/16474
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 Gerrit-Change-Number: 16474 Gerrit-PatchSet: 15 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 23 Sep 2020 17:51:29 + Gerrit-HasComments: Yes
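To make the population-versus-sample distinction from the stat-util.h comments concrete, here is a small Python sketch using the standard library's `statistics` module. It is only an illustration and not the code in stat-util.h; the input values are the ProbeRows figures from the example profile in this thread.

```python
import statistics

# Per-instance ProbeRows values from the example profile in this thread.
values = [16904, 17750, 19197]

# Population version: divide the sum of squared deviations by N.
population = statistics.pstdev(values)

# Sample version: divide by N - 1 (Bessel's correction), so it is larger.
sample = statistics.stdev(values)

print(f"population stddev = {population:.2f}")
print(f"sample stddev     = {sample:.2f}")
```

The population figure rounds to the 946.77 shown in the example "Skew details" line, which is consistent with the reply above that the population version is the one computed.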
[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16474 ) Change subject: IMPALA-10178 Run-time profile shall report skews .. Patch Set 15: (1 comment) http://gerrit.cloudera.org:8080/#/c/16474/15/be/src/util/stat-util.h File be/src/util/stat-util.h: http://gerrit.cloudera.org:8080/#/c/16474/15/be/src/util/stat-util.h@28 PS15, Line 28: /// Computes standard deviation given mean > Is this the population standard deviation or the sample standard deviation? I guess this documentation was already missing but would be good to fix -- To view, visit http://gerrit.cloudera.org:8080/16474 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 Gerrit-Change-Number: 16474 Gerrit-PatchSet: 15 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 23 Sep 2020 03:49:55 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16474 )

Change subject: IMPALA-10178 Run-time profile shall report skews
..

Patch Set 15:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/16474/15//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16474/15//COMMIT_MSG@22
PS15, Line 22: 3. RowsReturned in GroupingAggregator profile
It would be good to add this for sort operations too. We have SortDataSize already, which would be an OK metric, or I guess we could add a count of the rows in the sorter. Could be a follow-on patch but might be good to include here.

It would also be good to include RowsReturned for exchanges, since that could be another source of skew.

http://gerrit.cloudera.org:8080/#/c/16474/15//COMMIT_MSG@36
PS15, Line 36: skew(s) found at: HASH_JOIN_NODE (id=4), HDFS_SCAN_NODE (id=0)
I thought a bit about whether using the info string was the right approach (as opposed to adding it to the thrift in a more structured way) and I think this makes sense, since all the tools can already handle info strings.

http://gerrit.cloudera.org:8080/#/c/16474/15/be/src/util/runtime-profile-counters.h
File be/src/util/runtime-profile-counters.h:

http://gerrit.cloudera.org:8080/#/c/16474/15/be/src/util/runtime-profile-counters.h@415
PS15, Line 415: o.
can you explicitly say that it's returned in 'details'. When does it return true/false?

http://gerrit.cloudera.org:8080/#/c/16474/15/be/src/util/runtime-profile-counters.h@416
PS15, Line 416:
nit: convention is to use pointer for output args.

http://gerrit.cloudera.org:8080/#/c/16474/15/be/src/util/runtime-profile.cc
File be/src/util/runtime-profile.cc:

http://gerrit.cloudera.org:8080/#/c/16474/15/be/src/util/runtime-profile.cc@1969
PS15, Line 1969: int num_valid_values = NumValidValues();
I don't think this quite works since valid values could be added concurrently while this method executes. I think you instead need to do a single pass over the array and prepend the ", " if it's not the first.

http://gerrit.cloudera.org:8080/#/c/16474/15/be/src/util/stat-util.h
File be/src/util/stat-util.h:

http://gerrit.cloudera.org:8080/#/c/16474/15/be/src/util/stat-util.h@28
PS15, Line 28: /// Computes standard deviation given mean
Is this the population standard deviation or the sample standard deviation? Would be good to document in the comment because it has caused confusion in the past when it's ambiguous.

I don't know which is the right one to use in this context and it probably doesn't matter for the skew threshold. So ok to punt on that.

--
To view, visit http://gerrit.cloudera.org:8080/16474
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 Gerrit-Change-Number: 16474 Gerrit-PatchSet: 15 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 23 Sep 2020 03:40:36 + Gerrit-HasComments: Yes
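The single-pass suggestion in the runtime-profile.cc comment above (emit the ", " separator before every entry except the first, rather than relying on a count taken beforehand) can be sketched as follows. This is a Python illustration under assumptions, not the C++ in the patch: the function name `format_valid_values` and the use of `None` to mark a slot without a valid value are hypothetical, and the sketch does not model concurrency.

```python
def format_valid_values(slots):
    """Format the valid entries of 'slots' as "[a, b, c]" in one pass.

    Hypothetical convention: None marks a slot with no valid value yet.
    """
    parts = []
    first = True
    for value in slots:
        if value is None:
            continue  # skip slots that have no valid value
        if not first:
            parts.append(", ")  # separator goes before every entry but the first
        parts.append(str(value))
        first = False
    return "[" + "".join(parts) + "]"


print(format_valid_values([16904, None, 17750, 19197]))
```

The separator decision depends only on what this pass has already emitted, so it stays correct even if a count obtained before the loop would have been stale by the time the loop runs.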
[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16474 ) Change subject: IMPALA-10178 Run-time profile shall report skews .. Patch Set 15: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7244/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16474 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 Gerrit-Change-Number: 16474 Gerrit-PatchSet: 15 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 22 Sep 2020 21:22:38 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews
Qifan Chen has uploaded a new patch set (#15). ( http://gerrit.cloudera.org:8080/16474 )

Change subject: IMPALA-10178 Run-time profile shall report skews
..

IMPALA-10178 Run-time profile shall report skews

This fix addresses the current limitation in runtime profile that skews existing in certain operators such as the rows read counter (RowsRead) in the scan operators are not reported. A skew condition exists when the number of rows processed at each operator instance is not about the same and can be detected through standard deviation (stddev). A high stddev (say > 5) usually implies the existence of skew.

With the fix and in the average fragment profile, such skew is detected for the following counters

1. RowsRead in HDFS_SCAN_NODE and KUDU_SCAN_NODE profile
2. ProbeRows and BuildRows in HASH_JOIN_NODE profile
3. RowsReturned in GroupingAggregator profile

and reported as follows:

1. In the skew summary section which lists the names of the operators with skews;
2. In each corresponding operator, the name of the counters and the corresponding stddev values.

Examples of skews reported for a hash join and an hdfs scan.

Averaged Fragment F00:(Total: 1s075ms, non-child: 26.919ms, ...
...
...
num instances: 3
skew(s) found at: HASH_JOIN_NODE (id=4), HDFS_SCAN_NODE (id=0)
HASH_JOIN_NODE (id=4):(Total: 1s204ms, non-child: 2.166ms, ...
Skew details: ProbeRows ([16904, 17750, 19197], stddev=946.77)
...
...
HDFS_SCAN_NODE (id=0):(Total: 1s032ms, non-child: 1s032ms, ...
Skew details: RowsRead ([913887, 917913, 1048604], stddev=62578.85)

Testing:
1. Added a new test test_skew_reporting_in_runtime_profile to test_observability.py to verify that the skews are reported.
2. Ran Core tests successfully.
Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 --- M be/src/runtime/coordinator-backend-state.cc M be/src/util/runtime-profile-counters.h M be/src/util/runtime-profile.cc M be/src/util/runtime-profile.h M be/src/util/stat-util.h M tests/query_test/test_observability.py 6 files changed, 193 insertions(+), 9 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/16474/15 -- To view, visit http://gerrit.cloudera.org:8080/16474 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 Gerrit-Change-Number: 16474 Gerrit-PatchSet: 15 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16474 ) Change subject: IMPALA-10178 Run-time profile shall report skews .. Patch Set 14: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7239/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16474 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 Gerrit-Change-Number: 16474 Gerrit-PatchSet: 14 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 22 Sep 2020 18:21:36 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews
Qifan Chen has uploaded a new patch set (#14). ( http://gerrit.cloudera.org:8080/16474 )

Change subject: IMPALA-10178 Run-time profile shall report skews
..

IMPALA-10178 Run-time profile shall report skews

This fix addresses the current limitation in runtime profile that skews existing in certain operators such as the rows read counter (RowsRead) in the scan operators are not reported. A skew condition exists when the number of rows processed at each operator instance is not about the same and can be detected through standard deviation (stddev). A high stddev (say > 5) usually implies the existence of skew.

With the fix and in the average fragment profile, such skew is detected for the following counters

1. RowsRead in HDFS_SCAN_NODE profile
2. ProbeRows and BuildRows in HASH_JOIN_NODE profile
3. RowsReturned in GroupingAggregator profile

and reported as follows:

1. In the skew summary section which lists the names of the operators with skews;
2. In each corresponding operator, the name of the counters and the corresponding stddev values.

Examples of skews reported for a hash join and an hdfs scan.

Averaged Fragment F00:(Total: 1s075ms, non-child: 26.919ms, ...
...
...
num instances: 3
skew(s) found at: HASH_JOIN_NODE (id=4), HDFS_SCAN_NODE (id=0)
HASH_JOIN_NODE (id=4):(Total: 1s204ms, non-child: 2.166ms, ...
Skew details: ProbeRows ([16904, 17750, 19197], stddev=946.77)
...
...
HDFS_SCAN_NODE (id=0):(Total: 1s032ms, non-child: 1s032ms, ...
Skew details: RowsRead ([913887, 917913, 1048604], stddev=62578.85)

Testing:
1. Added a new test test_skew_reporting_in_runtime_profile to test_observability.py to verify that the skews are reported.
2. Ran Core tests successfully.
Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 --- M be/src/runtime/coordinator-backend-state.cc M be/src/util/runtime-profile-counters.h M be/src/util/runtime-profile.cc M be/src/util/runtime-profile.h M be/src/util/stat-util.h M tests/query_test/test_observability.py 6 files changed, 192 insertions(+), 9 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/16474/14 -- To view, visit http://gerrit.cloudera.org:8080/16474 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 Gerrit-Change-Number: 16474 Gerrit-PatchSet: 14 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16474 ) Change subject: IMPALA-10178 Run-time profile shall report skews .. Patch Set 10: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7228/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16474 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 Gerrit-Change-Number: 16474 Gerrit-PatchSet: 10 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 21 Sep 2020 21:49:00 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews
Qifan Chen has uploaded a new patch set (#10). ( http://gerrit.cloudera.org:8080/16474 )

Change subject: IMPALA-10178 Run-time profile shall report skews
..

IMPALA-10178 Run-time profile shall report skews

This fix addresses the current limitation in runtime profile that skews existing in certain operators such as the rows read counter (RowsRead) in the scan operators are not reported. A skew condition exists when the number of rows processed at each operator instance is not about the same and can be detected through standard deviation (stddev). A high stddev (say > 5) usually implies the existence of skew.

With the fix and in the average fragment profile, such skew is detected for the following counters

1. RowsRead in HDFS_SCAN_NODE profile
2. ProbeRows and BuildRows in HASH_JOIN_NODE profile
3. RowsReturned in GroupingAggregator profile

and reported as follows:

1. In the skew summary section which lists the names of the operators with skews;
2. In each corresponding operator, the name of the counters and the corresponding stddev values.

Examples of skews reported for a hash join and an hdfs scan.

Averaged Fragment F00:(Total: 1s075ms, non-child: 26.919ms, ...
...
...
num instances: 3
skew(s) found at: HASH_JOIN_NODE (id=4), HDFS_SCAN_NODE (id=0)
HASH_JOIN_NODE (id=4):(Total: 1s204ms, non-child: 2.166ms, ...
Skew details: ProbeRows ([16904, 17750, 19197], stddev=946.77)
...
...
HDFS_SCAN_NODE (id=0):(Total: 1s032ms, non-child: 1s032ms, ...
Skew details: RowsRead ([913887, 917913, 1048604], stddev=62578.85)

TODO:
1. Add unit tests;
2. Run core tests.
Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 --- M be/src/runtime/coordinator-backend-state.cc M be/src/util/CMakeLists.txt A be/src/util/runtime-profile-counters.cc M be/src/util/runtime-profile-counters.h M be/src/util/runtime-profile.cc M be/src/util/runtime-profile.h M be/src/util/stat-util.h 7 files changed, 149 insertions(+), 8 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/16474/10 -- To view, visit http://gerrit.cloudera.org:8080/16474 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 Gerrit-Change-Number: 16474 Gerrit-PatchSet: 10 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16474 ) Change subject: IMPALA-10178 Run-time profile shall report skews .. Patch Set 9: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7225/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16474 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 Gerrit-Change-Number: 16474 Gerrit-PatchSet: 9 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 21 Sep 2020 19:33:20 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews
Qifan Chen has uploaded a new patch set (#9). ( http://gerrit.cloudera.org:8080/16474 )

Change subject: IMPALA-10178 Run-time profile shall report skews
..

IMPALA-10178 Run-time profile shall report skews

This fix addresses the current limitation in runtime profile that skews existing in certain operators such as the rows read counter (RowsRead) in the scan operators are not reported. A skew condition exists when the number of rows processed at each operator instance is not about the same and can be detected through standard deviation (stddev). A high stddev (say > 5) usually implies the existence of skew.

With the fix and in the average fragment profile, such skew is reported as follows:

1. In the skew summary section which lists the names of the operators with skews;
2. In each corresponding operator, the name of the counters and the corresponding stddev values.

Examples of skews reported for a hash join and an hdfs scan.

Averaged Fragment F00:(Total: 1s075ms, non-child: 26.919ms, ...
...
...
num instances: 3
skew(s) found at: HASH_JOIN_NODE (id=4), HDFS_SCAN_NODE (id=0)
HASH_JOIN_NODE (id=4):(Total: 1s204ms, non-child: 2.166ms, ...
Skew details: ProbeRows (values=[16904, 17750, 19197], stddev=946.770828)
...
...
HDFS_SCAN_NODE (id=0):(Total: 1s032ms, non-child: 1s032ms, ...
Skew details: RowsRead (values=[913887, 917913, 1048604], stddev=62578.853590)

TODO:
1. Add unit tests;
2. Run core tests.
Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 --- M be/src/runtime/coordinator-backend-state.cc M be/src/util/CMakeLists.txt A be/src/util/runtime-profile-counters.cc M be/src/util/runtime-profile-counters.h M be/src/util/runtime-profile.cc M be/src/util/runtime-profile.h M be/src/util/stat-util.h 7 files changed, 141 insertions(+), 8 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/16474/9 -- To view, visit http://gerrit.cloudera.org:8080/16474 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1 Gerrit-Change-Number: 16474 Gerrit-PatchSet: 9 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong