[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews

2020-10-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16474 )

Change subject: IMPALA-10178 Run-time profile shall report skews
..


Patch Set 45: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/16474
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
Gerrit-Change-Number: 16474
Gerrit-PatchSet: 45
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 20 Oct 2020 23:30:49 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews

2020-10-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/16474 )

Change subject: IMPALA-10178 Run-time profile shall report skews
..

IMPALA-10178 Run-time profile shall report skews

This change addresses a limitation of the runtime profile: skew in
certain operator counters, such as the rows-read counter (RowsRead)
in the scan operators, is not reported. Skew exists when the number
of rows processed differs substantially across the instances of an
operator. It can be detected through the coefficient of variation
(CoV), i.e. the standard deviation divided by the mean. A high CoV
(say > 1.0) usually implies skew.

With the fix, such skew is detected for the following counters:
  1. RowsRead in HDFS_SCAN_NODE and KUDU_SCAN_NODE
  2. ProbeRows and BuildRows in HASH_JOIN_NODE
  3. RowsReturned in GroupingAggregator, EXCHANGE and SORT_NODE

and reported as follows:
  1. In execution profile, a new skew summary that lists the names
     of the operators with skews;
  2. In the averaged profile for the corresponding operator, the list
     of values of the counter across all fragment instances in the
     backend processes;
  3. Skew detection formula: CoV > limit and mean > 5,000
  4. A new query option 'report_skew_limit'
       < 0: disable skew reporting
       >= 0: enable skew reporting and supply the CoV limit
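
The detection rule in items 3 and 4 can be sketched roughly as follows. This is an illustration only, not the code in the patch: ComputeSkewStats, IsSkewed, SkewStats and kRowAverageLimit are hypothetical names, and the use of the population standard deviation is an assumption (it happens to reproduce the CoV=0.07 shown in the example further down).

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical helper types and names; not taken from the Impala sources.
struct SkewStats {
  double mean = 0.0;
  double cov = 0.0;  // coefficient of variation = stddev / mean
};

// Minimum mean required before skew is reported ("mean > 5,000").
constexpr double kRowAverageLimit = 5000.0;

// Computes the mean and the CoV of the per-instance counter values.
// Assumption: population standard deviation (divide by N, not N - 1).
inline SkewStats ComputeSkewStats(const std::vector<int64_t>& values) {
  SkewStats stats;
  if (values.empty()) return stats;
  double sum = 0.0;
  for (int64_t v : values) sum += static_cast<double>(v);
  stats.mean = sum / values.size();
  double squared_diffs = 0.0;
  for (int64_t v : values) {
    const double d = static_cast<double>(v) - stats.mean;
    squared_diffs += d * d;
  }
  const double stddev = std::sqrt(squared_diffs / values.size());
  stats.cov = stats.mean > 0.0 ? stddev / stats.mean : 0.0;
  return stats;
}

// report_skew_limit < 0 disables reporting; otherwise it is the CoV limit.
inline bool IsSkewed(const std::vector<int64_t>& values, double report_skew_limit) {
  if (report_skew_limit < 0) return false;
  const SkewStats stats = ComputeSkewStats(values);
  return stats.cov > report_skew_limit && stats.mean > kRowAverageLimit;
}
```

With a limit of 1.0, one instance processing 100000 rows while three others process none would be flagged, whereas the same shape with only 10 rows would not, because its mean is below 5,000.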

Examples of skews reported for a hash join and an hdfs scan.

In execution profile:

  ... ...
  skew(s) found at: HASH_JOIN_NODE (id=4), HDFS_SCAN_NODE (id=0)

  Per Node Peak Memory Usage: ...
  ... ...

In averaged profiles:

  HDFS_SCAN_NODE (id=2): ...
  Skew details: RowsRead ([2004992,1724693,2001351],
  CoV=0.07, mean=1910345)
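
For reference, a "Skew details" entry like the one above could be produced along these lines. This is a sketch under assumptions: FormatSkewDetails is a hypothetical name, the population standard deviation is assumed, and the mean is truncated to an integer to match the example; the patch itself may format the entry differently.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical formatter; the layout copies the example in the commit message.
inline std::string FormatSkewDetails(const std::string& counter,
                                     const std::vector<int64_t>& values) {
  double sum = 0.0;
  for (int64_t v : values) sum += static_cast<double>(v);
  const double mean = values.empty() ? 0.0 : sum / values.size();

  double squared_diffs = 0.0;
  for (int64_t v : values) {
    const double d = static_cast<double>(v) - mean;
    squared_diffs += d * d;
  }
  // Assumption: population standard deviation (divide by N).
  const double stddev =
      values.empty() ? 0.0 : std::sqrt(squared_diffs / values.size());
  const double cov = mean > 0.0 ? stddev / mean : 0.0;

  std::ostringstream ss;
  ss << counter << " ([";
  for (std::size_t i = 0; i < values.size(); ++i) {
    if (i > 0) ss << ",";  // separator between values, none after the last
    ss << values[i];
  }
  ss << "], CoV=" << std::fixed << std::setprecision(2) << cov
     << ", mean=" << static_cast<int64_t>(mean) << ")";
  return ss.str();
}
```

For the three RowsRead values above, the mean is 5731036 / 3, about 1910345, and the CoV is about 0.069, which rounds to the 0.07 shown.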

Testing:
1. Added test_skew_reporting_in_runtime_profile in
   test_observability.py to verify that the skews are reported.
2. Ran Core tests successfully.

Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
Reviewed-on: http://gerrit.cloudera.org:8080/16474
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
---
M be/src/runtime/coordinator.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/runtime-profile-counters.h
M be/src/util/runtime-profile.cc
M be/src/util/runtime-profile.h
M be/src/util/stat-util.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M tests/query_test/test_hash_join_timer.py
M tests/query_test/test_observability.py
11 files changed, 249 insertions(+), 12 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
Gerrit-Change-Number: 16474
Gerrit-PatchSet: 46
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews

2020-10-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16474 )

Change subject: IMPALA-10178 Run-time profile shall report skews
..


Patch Set 45: Code-Review+2



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
Gerrit-Change-Number: 16474
Gerrit-PatchSet: 45
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 20 Oct 2020 18:13:31 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews

2020-10-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16474 )

Change subject: IMPALA-10178 Run-time profile shall report skews
..


Patch Set 45:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6590/ 
DRY_RUN=false



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
Gerrit-Change-Number: 16474
Gerrit-PatchSet: 45
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 20 Oct 2020 18:13:32 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews

2020-10-20 Thread Sahil Takiar (Code Review)
Sahil Takiar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16474 )

Change subject: IMPALA-10178 Run-time profile shall report skews
..


Patch Set 44: Code-Review+2



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
Gerrit-Change-Number: 16474
Gerrit-PatchSet: 44
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 20 Oct 2020 18:13:16 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews

2020-10-19 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16474 )

Change subject: IMPALA-10178 Run-time profile shall report skews
..


Patch Set 44:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7490/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
Gerrit-Change-Number: 16474
Gerrit-PatchSet: 44
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 20 Oct 2020 03:40:47 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews

2020-10-19 Thread Qifan Chen (Code Review)
Qifan Chen has uploaded a new patch set (#44). ( 
http://gerrit.cloudera.org:8080/16474 )

Change subject: IMPALA-10178 Run-time profile shall report skews
..


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/16474/44

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
Gerrit-Change-Number: 16474
Gerrit-PatchSet: 44
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews

2020-10-19 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16474 )

Change subject: IMPALA-10178 Run-time profile shall report skews
..


Patch Set 43:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/7487/ : Initial code 
review checks failed. See linked job for details on the failure.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
Gerrit-Change-Number: 16474
Gerrit-PatchSet: 43
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 19 Oct 2020 23:56:12 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews

2020-10-19 Thread Qifan Chen (Code Review)
Qifan Chen has uploaded a new patch set (#43). ( 
http://gerrit.cloudera.org:8080/16474 )

Change subject: IMPALA-10178 Run-time profile shall report skews
..


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/16474/43

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
Gerrit-Change-Number: 16474
Gerrit-PatchSet: 43
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews

2020-10-19 Thread Sahil Takiar (Code Review)
Sahil Takiar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16474 )

Change subject: IMPALA-10178 Run-time profile shall report skews
..


Patch Set 42: Code-Review+2

(1 comment)

minor nit

http://gerrit.cloudera.org:8080/#/c/16474/42/be/src/util/runtime-profile-counters.h
File be/src/util/runtime-profile-counters.h:

http://gerrit.cloudera.org:8080/#/c/16474/42/be/src/util/runtime-profile-counters.h@498
PS42, Line 498:   const int ROW_AVERAGE_LIMIT=5000;
should be static + has formatting issues. should be something like:

 static const int ROW_AVERAGE_LIMIT = 5000;




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
Gerrit-Change-Number: 16474
Gerrit-PatchSet: 42
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 19 Oct 2020 18:21:37 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews

2020-10-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16474 )

Change subject: IMPALA-10178 Run-time profile shall report skews
..


Patch Set 42:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7440/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
Gerrit-Change-Number: 16474
Gerrit-PatchSet: 42
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 13 Oct 2020 16:07:11 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews

2020-10-13 Thread Qifan Chen (Code Review)
Qifan Chen has uploaded a new patch set (#42). ( 
http://gerrit.cloudera.org:8080/16474 )

Change subject: IMPALA-10178 Run-time profile shall report skews
..


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/16474/42

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
Gerrit-Change-Number: 16474
Gerrit-PatchSet: 42
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews

2020-10-12 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16474 )

Change subject: IMPALA-10178 Run-time profile shall report skews
..


Patch Set 41: Code-Review+1

(2 comments)

I had a couple of comments in addition to Sahil's.

http://gerrit.cloudera.org:8080/#/c/16474/41/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/16474/41/be/src/runtime/coordinator.cc@1222
PS41, Line 1222: float
This is a double in the thrift definition


http://gerrit.cloudera.org:8080/#/c/16474/41/be/src/util/runtime-profile.cc
File be/src/util/runtime-profile.cc:

http://gerrit.cloudera.org:8080/#/c/16474/41/be/src/util/runtime-profile.cc@1926
PS41, Line 1926: 5000
Can you make this a named constant?
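
A minimal sketch of what such a named constant might look like. SkewCheck and MeanIsLargeEnough are hypothetical names used only for illustration; the constant name and value follow the ROW_AVERAGE_LIMIT suggestion made elsewhere in this thread and the "mean > 5,000" rule in the commit message.

```cpp
#include <cassert>

// Hypothetical class standing in for the code under review.
class SkewCheck {
 public:
  // Named threshold replacing the bare literal 5000.
  static const int ROW_AVERAGE_LIMIT = 5000;

  // True when the mean is large enough for skew to be worth reporting.
  static bool MeanIsLargeEnough(double mean) { return mean > ROW_AVERAGE_LIMIT; }
};
```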




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
Gerrit-Change-Number: 16474
Gerrit-PatchSet: 41
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 12 Oct 2020 18:22:08 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews

2020-10-12 Thread Sahil Takiar (Code Review)
Sahil Takiar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16474 )

Change subject: IMPALA-10178 Run-time profile shall report skews
..


Patch Set 41: Code-Review+1

(6 comments)

mostly nits, otherwise approach LGTM.

@Tim if you want to take another look as well. I think the patch has changed 
significantly over the past few weeks.

http://gerrit.cloudera.org:8080/#/c/16474/41/be/src/service/query-options.cc
File be/src/service/query-options.cc:

http://gerrit.cloudera.org:8080/#/c/16474/41/be/src/service/query-options.cc@998
PS41, Line 998:   }
  :   if (set_query_options_mask != NULL) {
  : DCHECK_LT(option, set_query_options_mask->size());
  : set_query_options_mask->set(option);
  :   }
unnecessary change


http://gerrit.cloudera.org:8080/#/c/16474/41/be/src/util/runtime-profile-counters.h
File be/src/util/runtime-profile-counters.h:

http://gerrit.cloudera.org:8080/#/c/16474/41/be/src/util/runtime-profile-counters.h@488
PS41, Line 488:   bool EvaluateSkewWithCoV(double threshold, std::stringstream* details);
nit: document return value


http://gerrit.cloudera.org:8080/#/c/16474/41/be/src/util/runtime-profile.h
File be/src/util/runtime-profile.h:

http://gerrit.cloudera.org:8080/#/c/16474/41/be/src/util/runtime-profile.h@203
PS41, Line 203: recurssively
nit: typo


http://gerrit.cloudera.org:8080/#/c/16474/41/be/src/util/runtime-profile.cc
File be/src/util/runtime-profile.cc:

http://gerrit.cloudera.org:8080/#/c/16474/41/be/src/util/runtime-profile.cc@675
PS41, Line 675: Each spec is a pair of a profile name prefix and a list of
  :   // counter names.
would be nice to mention that all counters *have* to be averaged counters


http://gerrit.cloudera.org:8080/#/c/16474/41/be/src/util/runtime-profile.cc@677
PS41, Line 677:   static const unordered_map<string, vector<string>> skew_profile_specs = {
  :   {"KUDU_SCAN_NODE", {"RowsRead"}}, {"HDFS_SCAN_NODE", {"RowsRead"}},
  :   {"HASH_JOIN_NODE", {"ProbeRows", "BuildRows"}},
  :   {"GroupingAggregator", {"RowsReturned"}}, {"EXCHANGE_NODE", {"RowsReturned"}},
  :   {"SORT_NODE", {"RowsReturned"}}};
nit: would be nice to define a struct that encapsulates the entries in the map. 
something like

 struct SkewProfileSpec {
   string node_name;
   vector<string> counter_names;
 };
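
One way the suggested struct might be used, as a sketch only. SkewProfileSpecs and FindSkewSpec are hypothetical helpers, and the prefix match is an assumption based on the code comment quoted above ("a pair of a profile name prefix and a list of counter names"); the node and counter names come from the quoted map.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct SkewProfileSpec {
  std::string node_name;                   // profile name prefix
  std::vector<std::string> counter_names;  // counters checked for skew
};

// Hypothetical accessor holding the specs listed in the patch under review.
inline const std::vector<SkewProfileSpec>& SkewProfileSpecs() {
  static const std::vector<SkewProfileSpec> specs = {
      {"KUDU_SCAN_NODE", {"RowsRead"}},
      {"HDFS_SCAN_NODE", {"RowsRead"}},
      {"HASH_JOIN_NODE", {"ProbeRows", "BuildRows"}},
      {"GroupingAggregator", {"RowsReturned"}},
      {"EXCHANGE_NODE", {"RowsReturned"}},
      {"SORT_NODE", {"RowsReturned"}},
  };
  return specs;
}

// Returns the spec whose node_name is a prefix of profile_name, or nullptr
// when no spec applies to that profile.
inline const SkewProfileSpec* FindSkewSpec(const std::string& profile_name) {
  for (const SkewProfileSpec& spec : SkewProfileSpecs()) {
    if (profile_name.compare(0, spec.node_name.size(), spec.node_name) == 0) {
      return &spec;
    }
  }
  return nullptr;
}
```

Compared with a map keyed by string, the named fields make each entry self-describing, at the cost of a linear scan over a handful of specs.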


http://gerrit.cloudera.org:8080/#/c/16474/41/be/src/util/runtime-profile.cc@1960
PS41, Line 1960:   if (remove_last_comma) {
   : ss.seekp(-1, std::ios_base::end);
   :   }
you can probably simplify the string manipulation logic by using something like 
boost::algorithm::join - 
https://stackoverflow.com/questions/1833447/a-good-example-for-boostalgorithmjoin
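
To illustrate the idea, here is a standard-library stand-in that behaves like boost::algorithm::join (from <boost/algorithm/string/join.hpp>) for a vector of strings. Join is a hypothetical helper written for this sketch; with Boost available, the call would be boost::algorithm::join(parts, sep) instead. Writing the separator only between elements avoids having to remove a trailing comma afterwards.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Concatenates parts, inserting sep between consecutive elements only.
inline std::string Join(const std::vector<std::string>& parts,
                        const std::string& sep) {
  std::ostringstream ss;
  for (std::size_t i = 0; i < parts.size(); ++i) {
    if (i > 0) ss << sep;
    ss << parts[i];
  }
  return ss.str();
}
```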




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
Gerrit-Change-Number: 16474
Gerrit-PatchSet: 41
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 12 Oct 2020 17:42:07 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews

2020-10-11 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16474 )

Change subject: IMPALA-10178 Run-time profile shall report skews
..


Patch Set 41:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7419/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
Gerrit-Change-Number: 16474
Gerrit-PatchSet: 41
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Sun, 11 Oct 2020 14:55:55 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews

2020-10-11 Thread Qifan Chen (Code Review)
Qifan Chen has uploaded a new patch set (#41). ( 
http://gerrit.cloudera.org:8080/16474 )

Change subject: IMPALA-10178 Run-time profile shall report skews
..

IMPALA-10178 Run-time profile shall report skews

This change addresses a limitation of the runtime profile: skew in
certain operator counters, such as the rows-read counter (RowsRead)
in the scan operators, is not reported. Skew exists when the number
of rows processed differs substantially across the instances of an
operator. It can be detected through the coefficient of variation
(CoV), i.e. the standard deviation divided by the mean. A high CoV
(say > 1.0) usually implies skew.

With the fix, such skew is detected for the following counters:
  1. RowsRead in HDFS_SCAN_NODE and KUDU_SCAN_NODE
  2. ProbeRows and BuildRows in HASH_JOIN_NODE
  3. RowsReturned in GroupingAggregator, EXCHANGE and SORT_NODE

and reported as follows:
  1. In execution profile, a new skew summary that lists the names
     of the operators with skews;
  2. In the averaged profile for the corresponding operator, the list
     of values of the counter across all fragment instances in the
     backend processes;
  3. Skew detection formula: CoV > limit and mean > 5,000
  4. A new query option 'report_skew_limit'
       < 0: disable skew reporting
       >= 0: enable skew reporting and supply the CoV limit

Examples of skews reported for a hash join and an hdfs scan.

In execution profile:

  ... ...
  skew(s) found at: HASH_JOIN_NODE (id=4), HDFS_SCAN_NODE (id=0)

  Per Node Peak Memory Usage: ...
  ... ...

In averaged profiles:

  HDFS_SCAN_NODE (id=2): ...
  Skew details: RowsRead ([2004992,1724693,2001351],
  stddev/mean=0.07, mean=1910345)

Testing:
1. Added test_skew_reporting_in_runtime_profile in
   test_observability.py to verify that the skews are reported.
2. Ran Core tests successfully.

Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
---
M be/src/runtime/coordinator.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/runtime-profile-counters.h
M be/src/util/runtime-profile.cc
M be/src/util/runtime-profile.h
M be/src/util/stat-util.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M tests/query_test/test_hash_join_timer.py
M tests/query_test/test_observability.py
11 files changed, 243 insertions(+), 17 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/16474/41

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
Gerrit-Change-Number: 16474
Gerrit-PatchSet: 41
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews

2020-10-11 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16474 )

Change subject: IMPALA-10178 Run-time profile shall report skews
..


Patch Set 40:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/7418/ : Initial code 
review checks failed. See linked job for details on the failure.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
Gerrit-Change-Number: 16474
Gerrit-PatchSet: 40
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Sun, 11 Oct 2020 14:29:09 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews

2020-10-11 Thread Qifan Chen (Code Review)
Qifan Chen has uploaded a new patch set (#40). ( 
http://gerrit.cloudera.org:8080/16474 )

Change subject: IMPALA-10178 Run-time profile shall report skews
..

Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
---
M be/src/runtime/coordinator.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/runtime-profile-counters.h
M be/src/util/runtime-profile.cc
M be/src/util/runtime-profile.h
M be/src/util/stat-util.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M tests/query_test/test_hash_join_timer.py
M tests/query_test/test_observability.py
11 files changed, 241 insertions(+), 17 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/16474/40
--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
Gerrit-Change-Number: 16474
Gerrit-PatchSet: 40
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews

2020-10-09 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16474 )

Change subject: IMPALA-10178 Run-time profile shall report skews
..


Patch Set 38:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7407/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
Gerrit-Change-Number: 16474
Gerrit-PatchSet: 38
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Sat, 10 Oct 2020 02:08:53 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews

2020-10-09 Thread Qifan Chen (Code Review)
Qifan Chen has uploaded a new patch set (#38). ( 
http://gerrit.cloudera.org:8080/16474 )

Change subject: IMPALA-10178 Run-time profile shall report skews
..

IMPALA-10178 Run-time profile shall report skews

This fix addresses the current limitation in runtime profile that
skews existing in certain operators such as the rows read counter
(RowsRead) in the scan operators are not reported. A skew condition
exists when the number of rows processed at each operator instance
is not about the same and can be detected through coefficient of variation
(CoV). A high CoV (say > 1.0) usually implies the existence of
skew.

With the fix, such skew is detected for the following counters
  1. RowsRead in HDFS_SCAN_NODE and KUDU_SCAN_NODE
  2. ProbeRows and BuildRows in HASH_JOIN_NODE
  3. RowsReturned in GroupingAggregator, EXCHANGE and SORT_NODE

and reported as follows:
  1. In execution profile, a new skew summary that lists the names
     of the operators with skews;
  2. In the averaged profile for the corresponding operator, the list
     of values of the counter across all fragment instances in the
     backend processes;
  3. Skew detection formula: CoV > limit and mean > 5,000
  4. A new query option 'report_skew_limit'
     < 0: disable skew reporting
     >= 0: enable skew reporting and supply the CoV limit

Examples of skews reported for a hash join and an hdfs scan.

In execution profile:

  ... ...
  skew(s) found at: HASH_JOIN_NODE (id=4), HDFS_SCAN_NODE (id=0)

  Per Node Peak Memory Usage: ...
  ... ...

In averaged profiles:

  HDFS_SCAN_NODE (id=2): ...
  Skew details: RowsRead ([2004992,1724693,2001351],
  stddev/mean=0.07, mean=1910345)

Testing:
1. Added test_skew_reporting_in_runtime_profile in
   test_observability.py to verify that the skews are reported.
2. Ran Core tests successfully.
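
As a rough illustration of how a check on the profile text might look, the sketch below extracts the operators named in the skew summary line. It is not the test added to test_observability.py: the helper `parse_skew_summary` is hypothetical, and the line format is assumed to match the example shown in this commit message.

```python
import re

# Prefix and operator format assumed from the example in the commit message.
SUMMARY_PREFIX = "skew(s) found at:"
NODE_PATTERN = re.compile(r"(\w+) \(id=(\d+)\)")


def parse_skew_summary(profile_text):
    """Return [(operator_name, node_id), ...] from the first summary line.

    Returns an empty list when the profile has no skew summary.
    """
    for line in profile_text.splitlines():
        line = line.strip()
        if line.startswith(SUMMARY_PREFIX):
            rest = line[len(SUMMARY_PREFIX):]
            return [(name, int(node_id))
                    for name, node_id in NODE_PATTERN.findall(rest)]
    return []


# Profile fragment copied from the example in the commit message.
sample_profile = """
  ... ...
  skew(s) found at: HASH_JOIN_NODE (id=4), HDFS_SCAN_NODE (id=0)

  Per Node Peak Memory Usage: ...
"""
skewed = parse_skew_summary(sample_profile)
```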

Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
---
M be/src/runtime/coordinator.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/runtime-profile-counters.h
M be/src/util/runtime-profile.cc
M be/src/util/runtime-profile.h
M be/src/util/stat-util.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M tests/query_test/test_hash_join_timer.py
M tests/query_test/test_observability.py
11 files changed, 243 insertions(+), 17 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/16474/38
--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
Gerrit-Change-Number: 16474
Gerrit-PatchSet: 38
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews

2020-10-09 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16474 )

Change subject: IMPALA-10178 Run-time profile shall report skews
..


Patch Set 37:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/7405/ : Initial code 
review checks failed. See linked job for details on the failure.


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
Gerrit-Change-Number: 16474
Gerrit-PatchSet: 37
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Sat, 10 Oct 2020 00:48:43 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews

2020-10-09 Thread Qifan Chen (Code Review)
Qifan Chen has uploaded a new patch set (#37). ( 
http://gerrit.cloudera.org:8080/16474 )

Change subject: IMPALA-10178 Run-time profile shall report skews
..

IMPALA-10178 Run-time profile shall report skews

This fix addresses the current limitation in runtime profile that
skews existing in certain operators such as the rows read counter
(RowsRead) in the scan operators are not reported. A skew condition
exists when the number of rows processed at each operator instance
is not about the same and can be detected through coefficient of variation
(CoV). A high CoV (say > 1.0) usually implies the existence of
skew.

With the fix, such skew is detected for the following counters
  1. RowsRead in HDFS_SCAN_NODE and KUDU_SCAN_NODE
  2. ProbeRows and BuildRows in HASH_JOIN_NODE
  3. RowsReturned in GroupingAggregator, EXCHANGE and SORT_NODE

and reported as follows:
  1. In execution profile, a new skew summary that lists the names
     of the operators with skews;
  2. In the averaged profile for the corresponding operator, the list
     of values of the counter across all fragment instances in the
     backend processes;
  3. Skew detection formula: CoV > limit and mean > 5,000
  4. A new query option 'report_skew_limit'
     < 0: disable skew reporting
     >= 0: enable skew reporting and supply the CoV limit

Examples of skews reported for a hash join and an hdfs scan.

In execution profile:

  ... ...
  skew(s) found at: HASH_JOIN_NODE (id=4), HDFS_SCAN_NODE (id=0)

  Per Node Peak Memory Usage: ...
  ... ...

In averaged profiles:

  HDFS_SCAN_NODE (id=2): ...
  Skew details: RowsRead ([2004992,1724693,2001351],
  stddev/mean=0.07, mean=1910345)

Testing:
1. Added test_skew_reporting_in_runtime_profile in
   test_observability.py to verify that the skews are reported.
2. Ran Core tests successfully.

Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
---
M be/src/runtime/coordinator.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/runtime-profile-counters.h
M be/src/util/runtime-profile.cc
M be/src/util/runtime-profile.h
M be/src/util/stat-util.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M tests/query_test/test_hash_join_timer.py
M tests/query_test/test_observability.py
11 files changed, 241 insertions(+), 17 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/16474/37
--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
Gerrit-Change-Number: 16474
Gerrit-PatchSet: 37
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews

2020-10-02 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16474 )

Change subject: IMPALA-10178 Run-time profile shall report skews
..


Patch Set 35:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7346/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
Gerrit-Change-Number: 16474
Gerrit-PatchSet: 35
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 02 Oct 2020 18:36:11 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews

2020-10-02 Thread Qifan Chen (Code Review)
Qifan Chen has uploaded a new patch set (#35). ( 
http://gerrit.cloudera.org:8080/16474 )

Change subject: IMPALA-10178 Run-time profile shall report skews
..

IMPALA-10178 Run-time profile shall report skews

This fix addresses the current limitation in runtime profile that
skews existing in certain operators such as the rows read counter
(RowsRead) in the scan operators are not reported. A skew condition
exists when the number of rows processed at each operator instance
is not about the same and can be detected through coefficient of variation
(CoV). A high CoV (say > 1.0) usually implies the existence of
skew.

With the fix, such skew is detected for the following counters
  1. RowsRead in HDFS_SCAN_NODE and KUDU_SCAN_NODE
  2. ProbeRows and BuildRows in HASH_JOIN_NODE
  3. RowsReturned in GroupingAggregator, EXCHANGE and SORT_NODE

and reported as follows:
  1. In execution profile, a new skew summary that lists the names
     of the operators with skews;
  2. In the averaged profile for the corresponding operator, the list
     of values of the counter across all fragment instances in the
     backend processes;
  3. Skew detection formula: CoV > limit and mean > 5,000
  4. A new query option 'report_skew_limit'
     < 0: disable skew reporting
     >= 0: enable skew reporting and supply the CoV limit

Examples of skews reported for a hash join and an hdfs scan.

In execution profile:

  ... ...
  skew(s) found at: HASH_JOIN_NODE (id=4), HDFS_SCAN_NODE (id=0)

  Per Node Peak Memory Usage: ...
  ... ...

In averaged profiles:

  HDFS_SCAN_NODE (id=2): ...
  Skew details: RowsRead ([2004992,1724693,2001351],
  stddev/mean=0.07, mean=1910345)

Testing:
1. Added test_skew_reporting_in_runtime_profile in
   test_observability.py to verify that the skews are reported.
2. Ran Core tests successfully.

Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
---
M be/src/runtime/coordinator.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/runtime-profile-counters.h
M be/src/util/runtime-profile.cc
M be/src/util/runtime-profile.h
M be/src/util/stat-util.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M tests/query_test/test_hash_join_timer.py
M tests/query_test/test_observability.py
11 files changed, 243 insertions(+), 17 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/16474/35
--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
Gerrit-Change-Number: 16474
Gerrit-PatchSet: 35
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews

2020-10-01 Thread Sahil Takiar (Code Review)
Sahil Takiar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16474 )

Change subject: IMPALA-10178 Run-time profile shall report skews
..


Patch Set 31:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16474/31/be/src/util/runtime-profile.cc
File be/src/util/runtime-profile.cc:

http://gerrit.cloudera.org:8080/#/c/16474/31/be/src/util/runtime-profile.cc@1949
PS31, Line 1949:   if (stddev > 5) {
> Oh, that is very interesting result. Can you send me the data?
If it's useful for future work, why not add it later on? Or at least don't 
expose it in the runtime profile.

My concern is that when aggressive mode is enabled, most queries will report 
skew, and customers will then start asking why their queries are skewed, even 
though the problem is not actually that serious or is just expected behavior.



--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
Gerrit-Change-Number: 16474
Gerrit-PatchSet: 31
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 01 Oct 2020 18:02:58 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews

2020-10-01 Thread Qifan Chen (Code Review)
Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16474 )

Change subject: IMPALA-10178 Run-time profile shall report skews
..


Patch Set 34:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16474/31/be/src/util/runtime-profile.cc
File be/src/util/runtime-profile.cc:

http://gerrit.cloudera.org:8080/#/c/16474/31/be/src/util/runtime-profile.cc@1949
PS31, Line 1949: template 
> Still not seeing the use of having an aggressive mode like this. I applied
Oh, that is a very interesting result. Can you send me the data?

The purpose of the aggressive mode is to report all possible skews (excluding 
scans), which could be useful for skew-busting work in the future.



--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
Gerrit-Change-Number: 16474
Gerrit-PatchSet: 34
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 01 Oct 2020 17:49:39 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews

2020-10-01 Thread Sahil Takiar (Code Review)
Sahil Takiar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16474 )

Change subject: IMPALA-10178 Run-time profile shall report skews
..


Patch Set 34:

> Uploaded patch set 34: Patch Set 33 was rebased.

Whoops, sorry, ignore this. I was testing this locally and it looks like I 
accidentally rebased it.


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
Gerrit-Change-Number: 16474
Gerrit-PatchSet: 34
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 01 Oct 2020 17:42:57 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews

2020-10-01 Thread Sahil Takiar (Code Review)
Sahil Takiar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16474 )

Change subject: IMPALA-10178 Run-time profile shall report skews
..


Patch Set 34:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16474/31/be/src/util/runtime-profile.cc
File be/src/util/runtime-profile.cc:

http://gerrit.cloudera.org:8080/#/c/16474/31/be/src/util/runtime-profile.cc@1949
PS31, Line 1949: template 
> Done. Scan nodes are excluded from aggressive reporting.
Still not seeing the use of having an aggressive mode like this. I applied the 
most recent version of the patch locally and I tested this out a bit. I ran 
about 75 TPC-DS queries against the mini-cluster (so a 1 GB dataset) using 
Parquet. About 5 queries report skew with the default skew threshold (which 
seems reasonable), but 68 queries report skew in the aggressive mode.



--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
Gerrit-Change-Number: 16474
Gerrit-PatchSet: 34
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 01 Oct 2020 17:42:36 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews

2020-09-29 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16474 )

Change subject: IMPALA-10178 Run-time profile shall report skews
..


Patch Set 33:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7316/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
Gerrit-Change-Number: 16474
Gerrit-PatchSet: 33
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 30 Sep 2020 00:37:12 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews

2020-09-29 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16474 )

Change subject: IMPALA-10178 Run-time profile shall report skews
..


Patch Set 32:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7315/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
Gerrit-Change-Number: 16474
Gerrit-PatchSet: 32
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 30 Sep 2020 00:33:37 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews

2020-09-29 Thread Qifan Chen (Code Review)
Qifan Chen has uploaded a new patch set (#33). ( 
http://gerrit.cloudera.org:8080/16474 )

Change subject: IMPALA-10178 Run-time profile shall report skews
..

IMPALA-10178 Run-time profile shall report skews

This fix addresses the current limitation in runtime profile that
skews existing in certain operators such as the rows read counter
(RowsRead) in the scan operators are not reported. A skew condition
exists when the number of rows processed at each operator instance
is not about the same and can be detected through standard deviation
(stddev). A high stddev (say > 5) usually implies the existence of
skew.

With the fix, such skew is detected for the following counters
  1. RowsRead in HDFS_SCAN_NODE and KUDU_SCAN_NODE
  2. ProbeRows and BuildRows in HASH_JOIN_NODE
  3. RowsReturned in GroupingAggregator, EXCHANGE and SORT_NODE

and reported as follows:
  1. In execution profile, a new skew summary that lists the names
     of the operators with skews;
  2. In the averaged profile for the corresponding operator, the list
     of values of the counter across all fragment instances in the
     backend processes, and the skew detection formula;
  3. Skew detection formula:
     a: stddev/mean > 0.05 and mean > 1,000,000, for conservative
        reporting
     b: stddev > 5, for aggressive reporting on non-scan nodes
  4. A new query option 'report_skew'
     = -1, 0: disable skew reporting
     = 1: enable conservative skew reporting (default)
     = 2: enable aggressive skew reporting

Examples of skews reported for a hash join and an hdfs scan.

In execution profile:

  ... ...
  skew(s) found at: HASH_JOIN_NODE (id=4), HDFS_SCAN_NODE (id=0)
  Per Node Peak Memory Usage: ...
  Per Node Bytes Read: ...
  Per Node User Time: ...
  Per Node System Time:
  ... ...

In averaged profiles (conservative reporting):

  HDFS_SCAN_NODE (id=2): ...
  Skew details: RowsRead ([2004992,1724693,2001351],
  stddev/mean=0.07, mean=1910345)

In averaged profiles (aggressive reporting):

  HASH_JOIN_NODE (id=4): ...
  Skew details: ProbeRows ([16904,17750,19197],
   stddev=946.77)
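
To make the difference between the two rules in this patch set concrete, the sketch below applies both to the counter values quoted above. It is a minimal illustration under stated assumptions, not the patch's implementation: the function names are hypothetical, the population standard deviation is assumed (it reproduces the stddev=946.77 figure), and the `nearly_even` list is made-up data.

```python
import statistics


def conservative_skew(values):
    """Rule (a): stddev/mean > 0.05 and mean > 1,000,000."""
    mean = statistics.mean(values)
    return mean > 1_000_000 and statistics.pstdev(values) / mean > 0.05


def aggressive_skew(values):
    """Rule (b): stddev > 5 (applied to non-scan nodes in this patch set)."""
    return statistics.pstdev(values) > 5


# ProbeRows from the HASH_JOIN_NODE example above.
probe_rows = [16904, 17750, 19197]
# RowsRead from the HDFS_SCAN_NODE example above.
rows_read = [2004992, 1724693, 2001351]
# Made-up values that differ by only 0.01% of the mean.
nearly_even = [100000, 100010, 100020]

probe_stddev = statistics.pstdev(probe_rows)
```

Because rule (b) compares an absolute standard deviation against 5, even the `nearly_even` values, which are almost perfectly balanced, satisfy it, while rule (a) does not flag them.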

Testing:
1. Added test_aggressive_skew_reporting_in_runtime_profile and
   test_conservative_skew_reporting_in_runtime_profile in
   test_observability.py to verify that the skews are reported.
2. Ran Core tests successfully.

Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
---
M be/src/runtime/coordinator.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/runtime-profile-counters.h
M be/src/util/runtime-profile.cc
M be/src/util/runtime-profile.h
M be/src/util/stat-util.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M tests/query_test/test_hash_join_timer.py
M tests/query_test/test_observability.py
11 files changed, 296 insertions(+), 17 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/16474/33
--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
Gerrit-Change-Number: 16474
Gerrit-PatchSet: 33
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews

2020-09-29 Thread Qifan Chen (Code Review)
Qifan Chen has uploaded a new patch set (#32). ( 
http://gerrit.cloudera.org:8080/16474 )

Change subject: IMPALA-10178 Run-time profile shall report skews
..

IMPALA-10178 Run-time profile shall report skews

This fix addresses the current limitation in runtime profile that
skews existing in certain operators such as the rows read counter
(RowsRead) in the scan operators are not reported. A skew condition
exists when the number of rows processed at each operator instance
is not about the same and can be detected through standard deviation
(stddev). A high stddev (say > 5) usually implies the existence of
skew.

With the fix, such skew is detected for the following counters
  1. RowsRead in HDFS_SCAN_NODE and KUDU_SCAN_NODE
  2. ProbeRows and BuildRows in HASH_JOIN_NODE
  3. RowsReturned in GroupingAggregator, EXCHANGE and SORT_NODE

and reported as follows:
  1. In execution profile, a new skew summary that lists the names
     of the operators with skews;
  2. In the averaged profile for the corresponding operator, the list
     of values of the counter across all fragment instances in the
     backend processes, and the skew detection formula;
  3. Skew detection formula:
     a: stddev/mean > 0.05 and mean > 1,000,000, for conservative reporting
     b: stddev > 5, for aggressive reporting on non-scan nodes
  4. A new query option 'report_skew'
     = -1, 0: disable skew reporting
     = 1: enable conservative skew reporting (default)
     = 2: enable aggressive skew reporting

Examples of skews reported for a hash join and an hdfs scan.

In execution profile:

  ... ...
  skew(s) found at: HASH_JOIN_NODE (id=4), HDFS_SCAN_NODE (id=0)
  Per Node Peak Memory Usage: ...
  Per Node Bytes Read: ...
  Per Node User Time: ...
  Per Node System Time:
  ... ...

In averaged profiles (conservative reporting):

  HDFS_SCAN_NODE (id=2): ...
  Skew details: RowsRead ([2004992,1724693,2001351],
  stddev/mean=0.07, mean=1910345)

In averaged profiles (aggressive reporting):

  HASH_JOIN_NODE (id=4): ...
  Skew details: ProbeRows ([16904,17750,19197],
   stddev=946.77)

Testing:
1. Added test_aggressive_skew_reporting_in_runtime_profile and
   test_conservative_skew_reporting_in_runtime_profile in
   test_observability.py to verify that the skews are reported.
2. Ran Core tests successfully.

Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
---
M be/src/runtime/coordinator.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/runtime-profile-counters.h
M be/src/util/runtime-profile.cc
M be/src/util/runtime-profile.h
M be/src/util/stat-util.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M tests/query_test/test_hash_join_timer.py
M tests/query_test/test_observability.py
11 files changed, 296 insertions(+), 17 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/16474/32
--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
Gerrit-Change-Number: 16474
Gerrit-PatchSet: 32
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews

2020-09-29 Thread Sahil Takiar (Code Review)
Sahil Takiar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16474 )

Change subject: IMPALA-10178 Run-time profile shall report skews
..


Patch Set 31:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/16474/21/be/src/util/runtime-profile.cc
File be/src/util/runtime-profile.cc:

http://gerrit.cloudera.org:8080/#/c/16474/21/be/src/util/runtime-profile.cc@1928
PS21, Line 1928: double stddev = 0.0;
> To report severe skews only for impala, maybe we can use CV (instead of 
> stddev) as a threshold. Say
> cv > 5% && mean over 1 million.

Makes sense to me.


http://gerrit.cloudera.org:8080/#/c/16474/31/be/src/util/runtime-profile.cc
File be/src/util/runtime-profile.cc:

http://gerrit.cloudera.org:8080/#/c/16474/31/be/src/util/runtime-profile.cc@1949
PS31, Line 1949:   if (stddev > 5) {
I'm still not sure how useful this is, even if it is moved to an "aggressive" 
option. I can see it still leading to a lot of false positives.

Any chance that when writing this you actually meant to calculate the z-score 
(the number of standard deviations by which a value is above or below the 
mean)? I've seen references where outlier detection algorithms check if the 
z-score is greater than 3 (or in this case 5).
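
For reference, the z-score check mentioned above could look roughly like the sketch below. It is not part of the patch: the helper names are hypothetical, the population standard deviation is an assumption, the first list reuses the RowsRead values from the commit message example, and the twelve-instance list is made-up data.

```python
import statistics


def z_scores(values):
    """Return (value - mean) / stddev for each value, using population stddev."""
    mean = statistics.mean(values)
    stddev = statistics.pstdev(values)
    if stddev == 0:
        # All values are identical, so nothing deviates from the mean.
        return [0.0 for _ in values]
    return [(v - mean) / stddev for v in values]


def outliers(values, threshold=3.0):
    """Return the values whose absolute z-score exceeds the threshold."""
    return [v for v, z in zip(values, z_scores(values)) if abs(z) > threshold]


# RowsRead values from the commit message example (three instances).
rows = [2004992, 1724693, 2001351]
scores = z_scores(rows)

# Made-up case: eleven instances with 1,000 rows and one with 100,000.
many_instances = [1000] * 11 + [100000]
```

For the three-instance example the largest absolute z-score is about 1.41, so a threshold of 3 would not flag it; in the made-up twelve-instance case the heavy instance has a z-score of about 3.32 and is reported.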



--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
Gerrit-Change-Number: 16474
Gerrit-PatchSet: 31
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 29 Sep 2020 20:02:32 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews

2020-09-28 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16474 )

Change subject: IMPALA-10178 Run-time profile shall report skews
..


Patch Set 31:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7300/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
Gerrit-Change-Number: 16474
Gerrit-PatchSet: 31
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 28 Sep 2020 17:00:02 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews

2020-09-28 Thread Qifan Chen (Code Review)
Qifan Chen has uploaded a new patch set (#31). ( 
http://gerrit.cloudera.org:8080/16474 )

Change subject: IMPALA-10178 Run-time profile shall report skews
..

IMPALA-10178 Run-time profile shall report skews

This fix addresses the current limitation in runtime profile that
skews existing in certain operators such as the rows read counter
(RowsRead) in the scan operators are not reported. A skew condition
exists when the number of rows processed at each operator instance
is not about the same and can be detected through standard deviation
(stddev). A high stddev (say > 5) usually implies the existence of
skew.

With the fix, such skew is detected for the following counters
  1. RowsRead in HDFS_SCAN_NODE and KUDU_SCAN_NODE
  2. ProbeRows and BuildRows in HASH_JOIN_NODE
  3. RowsReturned in GroupingAggregator, EXCHANGE and SORT_NODE

and reported as follows:
  1. In execution profile, a new skew summary that lists the names
     of the operators with skews;
  2. In the averaged profile for the corresponding operator, the list
     of values of the counter across all fragment instances in the
     backend processes, and the skew detection formula;
  3. Skew detection formula:
     a: stddev/mean > 0.05 and mean > 1,000,000, for conservative reporting
     b: stddev > 5, for aggressive reporting
  4. A new query option 'report_skew'
     = -1, 0: disable skew reporting
     = 1: enable conservative skew reporting (default)
     = 2: enable aggressive skew reporting

Examples of skews reported for a hash join and an hdfs scan.

In execution profile:

  ... ...
  skew(s) found at: HASH_JOIN_NODE (id=4), HDFS_SCAN_NODE (id=0)
  Per Node Peak Memory Usage: ...
  Per Node Bytes Read: ...
  Per Node User Time: ...
  Per Node System Time:
  ... ...

In averaged profiles (conservative reporting):

  HDFS_SCAN_NODE (id=2): ...
  Skew details: RowsRead ([2004992,1724693,2001351],
  stddev/mean=0.07, mean=1910345)

In averaged profiles (aggressive reporting):

  HASH_JOIN_NODE (id=4): ...
  Skew details: ProbeRows ([16904,17750,19197],
   stddev=946.77)

  HDFS_SCAN_NODE (id=0): ...
  Skew details: RowsRead ([913887,917913,1048604],
  stddev=62578.85)

Testing:
1. Added test_aggressive_skew_reporting_in_runtime_profile and
   test_conservative_skew_reporting_in_runtime_profile in
   test_observability.py to verify that the skews are reported.
2. Ran Core tests successfully.

Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
---
M be/src/runtime/coordinator.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/runtime-profile-counters.h
M be/src/util/runtime-profile.cc
M be/src/util/runtime-profile.h
M be/src/util/stat-util.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M tests/query_test/test_hash_join_timer.py
M tests/query_test/test_observability.py
11 files changed, 295 insertions(+), 17 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/16474/31
--
To view, visit http://gerrit.cloudera.org:8080/16474
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
Gerrit-Change-Number: 16474
Gerrit-PatchSet: 31
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews

2020-09-28 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16474 )

Change subject: IMPALA-10178 Run-time profile shall report skews
..


Patch Set 30:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7297/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
Gerrit-Change-Number: 16474
Gerrit-PatchSet: 30
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 28 Sep 2020 14:27:01 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews

2020-09-28 Thread Qifan Chen (Code Review)
Qifan Chen has uploaded a new patch set (#30). ( 
http://gerrit.cloudera.org:8080/16474 )

Change subject: IMPALA-10178 Run-time profile shall report skews
..

IMPALA-10178 Run-time profile shall report skews

This fix addresses a limitation in the runtime profile: skew in certain
operator counters, such as the rows read counter (RowsRead) in the scan
operators, is not reported. Skew exists when the number of rows
processed by each operator instance is not about the same, and it can
be detected through the standard deviation (stddev). A high stddev
(say > 5) usually implies the existence of skew.

With the fix, such skew is detected for the following counters
  1. RowsRead in HDFS_SCAN_NODE and KUDU_SCAN_NODE
  2. ProbeRows and BuildRows in HASH_JOIN_NODE
  3. RowsReturned in GroupingAggregator, EXCHANGE and SORT_NODE

and reported as follows:
  1. In execution profile, a new skew summary that lists the names
     of the operators with skews;
  2. In the averaged profile for the corresponding operator, the list
     of values of the counter across all fragment instances in the
     backend processes, and the skew detection formula;
  3. Skew detection formula:
     a: stddev/mean > 0.05 and mean > 1,000,000, for conservative reporting
     b: stddev > 5, for aggressive reporting
  4. A new query option 'report_skew'
     = -1, 0: disable skew reporting
     = 1: enable conservative skew reporting (default)
     = 2: enable aggressive skew reporting

Examples of skews reported for a hash join and an hdfs scan.

In execution profile:

  ... ...
  skew(s) found at: HASH_JOIN_NODE (id=4), HDFS_SCAN_NODE (id=0)
  Per Node Peak Memory Usage: ...
  Per Node Bytes Read: ...
  Per Node User Time: ...
  Per Node System Time:
  ... ...
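
A reader or a test could pull the flagged operators out of the summary
line shown above with a small parser. The sketch below is illustrative
only: the helper name parse_skew_summary and the regular expression are
assumptions based solely on the sample line in this message, not on the
tests added by the change.

```python
import re

# Assumed line format, taken from the sample above:
#   skew(s) found at: HASH_JOIN_NODE (id=4), HDFS_SCAN_NODE (id=0)
_SUMMARY_PREFIX = "skew(s) found at:"
_OPERATOR_RE = re.compile(r"([A-Za-z_]+) \(id=(\d+)\)")


def parse_skew_summary(profile_text):
    """Return a list of (operator_name, node_id) from the skew summary line.

    Returns an empty list when the profile text has no summary line.
    """
    for line in profile_text.splitlines():
        stripped = line.strip()
        if stripped.startswith(_SUMMARY_PREFIX):
            rest = stripped[len(_SUMMARY_PREFIX):]
            return [(name, int(node_id))
                    for name, node_id in _OPERATOR_RE.findall(rest)]
    return []
```

Applied to the sample above, the helper yields the pairs
("HASH_JOIN_NODE", 4) and ("HDFS_SCAN_NODE", 0).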

In averaged profiles (conservative reporting):

  HDFS_SCAN_NODE (id=2): ...
  Skew details: RowsRead ([2004992,1724693,2001351],
  stddev/mean=0.07, mean=1910345)

In averaged profiles (aggressive reporting):

  HASH_JOIN_NODE (id=4): ...
  Skew details: ProbeRows ([16904,17750,19197],
   stddev=946.77)

  HDFS_SCAN_NODE (id=0): ...
  Skew details: RowsRead ([913887,917913,1048604],
  stddev=62578.85)

Testing:
1. Added test_aggressive_skew_reporting_in_runtime_profile and
   test_conservative_skew_reporting_in_runtime_profile in
   test_observability.py to verify that the skews are reported.
2. Ran Core tests successfully.

Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
---
M be/src/runtime/coordinator.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/runtime-profile-counters.h
M be/src/util/runtime-profile.cc
M be/src/util/runtime-profile.h
M be/src/util/stat-util.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M tests/query_test/test_hash_join_timer.py
M tests/query_test/test_observability.py
11 files changed, 290 insertions(+), 17 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/16474/30

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
Gerrit-Change-Number: 16474
Gerrit-PatchSet: 30
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews

2020-09-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16474 )

Change subject: IMPALA-10178 Run-time profile shall report skews
..


Patch Set 29:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7293/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
Gerrit-Change-Number: 16474
Gerrit-PatchSet: 29
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Sun, 27 Sep 2020 21:36:08 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews

2020-09-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16474 )

Change subject: IMPALA-10178 Run-time profile shall report skews
..


Patch Set 28:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7292/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
Gerrit-Change-Number: 16474
Gerrit-PatchSet: 28
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Sun, 27 Sep 2020 21:35:37 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews

2020-09-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16474 )

Change subject: IMPALA-10178 Run-time profile shall report skews
..


Patch Set 27:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7291/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
Gerrit-Change-Number: 16474
Gerrit-PatchSet: 27
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Sun, 27 Sep 2020 21:30:41 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews

2020-09-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16474 )

Change subject: IMPALA-10178 Run-time profile shall report skews
..


Patch Set 26:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7290/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
Gerrit-Change-Number: 16474
Gerrit-PatchSet: 26
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Sun, 27 Sep 2020 21:18:35 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews

2020-09-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16474 )

Change subject: IMPALA-10178 Run-time profile shall report skews
..


Patch Set 29:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16474/29/be/src/util/runtime-profile-counters.h
File be/src/util/runtime-profile-counters.h:

http://gerrit.cloudera.org:8080/#/c/16474/29/be/src/util/runtime-profile-counters.h@416
PS29, Line 416:   ///  Input argument 'option':
line has trailing whitespace




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
Gerrit-Change-Number: 16474
Gerrit-PatchSet: 29
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Sun, 27 Sep 2020 21:16:05 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews

2020-09-27 Thread Qifan Chen (Code Review)
Qifan Chen has uploaded a new patch set (#29). ( 
http://gerrit.cloudera.org:8080/16474 )

Change subject: IMPALA-10178 Run-time profile shall report skews
..

IMPALA-10178 Run-time profile shall report skews

This fix addresses a limitation in the runtime profile: skew in certain
operator counters, such as the rows read counter (RowsRead) in the scan
operators, is not reported. Skew exists when the number of rows
processed by each operator instance is not about the same, and it can
be detected through the standard deviation (stddev). A high stddev
(say > 5) usually implies the existence of skew.

With the fix, such skew is detected for the following counters
  1. RowsRead in HDFS_SCAN_NODE and KUDU_SCAN_NODE
  2. ProbeRows and BuildRows in HASH_JOIN_NODE
  3. RowsReturned in GroupingAggregator, EXCHANGE and SORT_NODE

and reported as follows:
  1. In execution profile, a new skew summary that lists the names
     of the operators with skews;
  2. In the averaged profile for the corresponding operator, the list
     of values of the counter across all fragment instances in the
     backend processes, and the skew detection formula;
  3. Skew detection formula:
     a: stddev/mean > 0.05 and mean > 1,000,000, for conservative reporting
     b: stddev > 5, for aggressive reporting
  4. A new query option 'report_skew'
     = -1, 0: disable skew reporting
     = 1: enable conservative skew reporting (default)
     = 2: enable aggressive skew reporting

Examples of skews reported for a hash join and an hdfs scan.

In execution profile:

  ... ...
  skew(s) found at: HASH_JOIN_NODE (id=4), HDFS_SCAN_NODE (id=0)
  Per Node Peak Memory Usage: ...
  Per Node Bytes Read: ...
  Per Node User Time: ...
  Per Node System Time:
  ... ...

In averaged profiles (conservative reporting):

  HDFS_SCAN_NODE (id=2): ...
  Skew details: RowsRead ([2004992,1724693,2001351],
  stddev/mean=0.07, mean=1910345)

In averaged profiles (aggressive reporting):

  HASH_JOIN_NODE (id=4): ...
  Skew details: ProbeRows ([16904,17750,19197],
   stddev=946.77)

  HDFS_SCAN_NODE (id=0): ...
  Skew details: RowsRead ([913887,917913,1048604],
  stddev=62578.85)
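
The figures in the samples above can be reproduced by hand. The sketch
below is illustrative only: skew_details is a hypothetical helper, the
population standard deviation is an assumption (it matches the quoted
numbers), and the output is kept on one line rather than wrapped as in
the profile.

```python
import statistics


def skew_details(counter_name, values, conservative=False):
    """Format a 'Skew details' line like the samples in the commit message.

    Assumptions: population standard deviation, two decimal places, and
    the mean truncated to an integer in the conservative form.
    """
    mean = statistics.fmean(values)
    stddev = statistics.pstdev(values)
    listed = ",".join(str(v) for v in values)
    if conservative:
        stats = f"stddev/mean={stddev / mean:.2f}, mean={int(mean)}"
    else:
        stats = f"stddev={stddev:.2f}"
    return f"Skew details: {counter_name} ([{listed}], {stats})"
```

For the ProbeRows sample, skew_details("ProbeRows", [16904, 17750, 19197])
gives "Skew details: ProbeRows ([16904,17750,19197], stddev=946.77)",
which agrees with the value shown for HASH_JOIN_NODE (id=4).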

Testing:
1. Added test_aggressive_skew_reporting_in_runtime_profile and
   test_conservative_skew_reporting_in_runtime_profile in
   test_observability.py to verify that the skews are reported.
2. Ran Core tests successfully.

Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
---
M be/src/runtime/coordinator.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/runtime-profile-counters.h
M be/src/util/runtime-profile.cc
M be/src/util/runtime-profile.h
M be/src/util/stat-util.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M tests/query_test/test_hash_join_timer.py
M tests/query_test/test_observability.py
11 files changed, 290 insertions(+), 17 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/16474/29

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
Gerrit-Change-Number: 16474
Gerrit-PatchSet: 29
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews

2020-09-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16474 )

Change subject: IMPALA-10178 Run-time profile shall report skews
..


Patch Set 28:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16474/28/be/src/util/runtime-profile-counters.h
File be/src/util/runtime-profile-counters.h:

http://gerrit.cloudera.org:8080/#/c/16474/28/be/src/util/runtime-profile-counters.h@416
PS28, Line 416:   ///  Input argument 'option':
line has trailing whitespace




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
Gerrit-Change-Number: 16474
Gerrit-PatchSet: 28
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Sun, 27 Sep 2020 21:14:39 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews

2020-09-27 Thread Qifan Chen (Code Review)
Qifan Chen has uploaded a new patch set (#28). ( 
http://gerrit.cloudera.org:8080/16474 )

Change subject: IMPALA-10178 Run-time profile shall report skews
..

IMPALA-10178 Run-time profile shall report skews

This fix addresses a limitation in the runtime profile: skew in certain
operator counters, such as the rows read counter (RowsRead) in the scan
operators, is not reported. Skew exists when the number of rows
processed by each operator instance is not about the same, and it can
be detected through the standard deviation (stddev). A high stddev
(say > 5) usually implies the existence of skew.

With the fix, such skew is detected for the following counters
  1. RowsRead in HDFS_SCAN_NODE and KUDU_SCAN_NODE
  2. ProbeRows and BuildRows in HASH_JOIN_NODE
  3. RowsReturned in GroupingAggregator, EXCHANGE and SORT_NODE

and reported as follows:
  1. In execution profile, a new skew summary that lists the names
     of the operators with skews;
  2. In the averaged profile for the corresponding operator, the list
     of values of the counter across all fragment instances in the
     backend processes, and the skew detection formula;
  3. Skew detection formula:
     a: stddev/mean > 0.05 and mean > 1,000,000, for conservative reporting
     b: stddev > 5, for aggressive reporting
  4. A new query option 'report_skew'
     = -1, 0: disable skew reporting
     = 1: enable conservative skew reporting (default)
     = 2: enable aggressive skew reporting

Examples of skews reported for a hash join and an hdfs scan.

In execution profile:

  ... ...
  skew(s) found at: HASH_JOIN_NODE (id=4), HDFS_SCAN_NODE (id=0)
  Per Node Peak Memory Usage: ...
  Per Node Bytes Read: ...
  Per Node User Time: ...
  Per Node System Time:
  ... ...

In averaged profiles (conservative reporting):

  HDFS_SCAN_NODE (id=2): ...

  Skew details: RowsRead ([2004992,1724693,2001351],
  stddev/mean=0.07, mean=1910345)

In averaged profiles (aggressive reporting):

  HASH_JOIN_NODE (id=4): ...
  Skew details: ProbeRows ([16904,17750,19197],
   stddev=946.77)
... ...

  HDFS_SCAN_NODE (id=0): ...
  Skew details: RowsRead ([913887,917913,1048604],
  stddev=62578.85)

Testing:
1. Added test_aggressive_skew_reporting_in_runtime_profile and
   test_conservative_skew_reporting_in_runtime_profile in
   test_observability.py to verify that the skews are reported.
2. Ran Core tests successfully.

Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
---
M be/src/runtime/coordinator.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/runtime-profile-counters.h
M be/src/util/runtime-profile.cc
M be/src/util/runtime-profile.h
M be/src/util/stat-util.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M tests/query_test/test_hash_join_timer.py
M tests/query_test/test_observability.py
11 files changed, 290 insertions(+), 17 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/16474/28

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
Gerrit-Change-Number: 16474
Gerrit-PatchSet: 28
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews

2020-09-27 Thread Qifan Chen (Code Review)
Qifan Chen has uploaded a new patch set (#27). ( 
http://gerrit.cloudera.org:8080/16474 )

Change subject: IMPALA-10178 Run-time profile shall report skews
..

IMPALA-10178 Run-time profile shall report skews

This fix addresses a limitation in the runtime profile: skew in certain
operator counters, such as the rows read counter (RowsRead) in the scan
operators, is not reported. Skew exists when the number of rows
processed by each operator instance is not about the same, and it can
be detected through the standard deviation (stddev). A high stddev
(say > 5) usually implies the existence of skew.

With the fix, such skew is detected for the following counters
  1. RowsRead in HDFS_SCAN_NODE and KUDU_SCAN_NODE
  2. ProbeRows and BuildRows in HASH_JOIN_NODE
  3. RowsReturned in GroupingAggregator, EXCHANGE and SORT_NODE

and reported as follows:
  1. In execution profile, a new skew summary that lists the names
     of the operators with skews;
  2. In the averaged profile for the corresponding operator, the list
     of values of the counter across all fragment instances in the
     backend processes, and the skew detection formula;
  3. Skew detection formula:
     a: stddev/mean > 0.05 and mean > 1,000,000, for conservative reporting
     b: stddev > 5, for aggressive reporting
  4. A new query option 'report_skew'
     = -1, 0: disable skew reporting
     = 1: enable conservative skew report (default)
     = 2: enable aggressive skew report

Examples of skews reported for a hash join and an hdfs scan.

In execution profile:

  ... ...
  skew(s) found at: HASH_JOIN_NODE (id=4), HDFS_SCAN_NODE (id=0)
  Per Node Peak Memory Usage: ...
  Per Node Bytes Read: ...
  Per Node User Time: ...
  Per Node System Time:
  ... ...

In averaged profiles (conservative reporting):

  HDFS_SCAN_NODE (id=2): ...

  Skew details: RowsRead ([2004992,1724693,2001351],
  stddev/mean=0.07, mean=1910345)

In averaged profiles (aggressive reporting):

  HASH_JOIN_NODE (id=4): ...
  Skew details: ProbeRows ([16904,17750,19197],
   stddev=946.77)
... ...

  HDFS_SCAN_NODE (id=0): ...
  Skew details: RowsRead ([913887,917913,1048604],
  stddev=62578.85)

Testing:
1. Added test_aggressive_skew_reporting_in_runtime_profile and
   test_conservative_skew_reporting_in_runtime_profile in
   test_observability.py to verify that the skews are reported.
2. Ran Core tests successfully.

Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
---
M be/src/runtime/coordinator.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/runtime-profile-counters.h
M be/src/util/runtime-profile.cc
M be/src/util/runtime-profile.h
M be/src/util/stat-util.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M tests/query_test/test_hash_join_timer.py
M tests/query_test/test_observability.py
11 files changed, 290 insertions(+), 17 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/16474/27

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
Gerrit-Change-Number: 16474
Gerrit-PatchSet: 27
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews

2020-09-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16474 )

Change subject: IMPALA-10178 Run-time profile shall report skews
..


Patch Set 27:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16474/27/be/src/util/runtime-profile-counters.h
File be/src/util/runtime-profile-counters.h:

http://gerrit.cloudera.org:8080/#/c/16474/27/be/src/util/runtime-profile-counters.h@416
PS27, Line 416:   ///  Input argument 'option':
line has trailing whitespace




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
Gerrit-Change-Number: 16474
Gerrit-PatchSet: 27
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Sun, 27 Sep 2020 21:13:31 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews

2020-09-27 Thread Qifan Chen (Code Review)
Qifan Chen has uploaded a new patch set (#26). ( 
http://gerrit.cloudera.org:8080/16474 )

Change subject: IMPALA-10178 Run-time profile shall report skews
..

IMPALA-10178 Run-time profile shall report skews

This fix addresses a limitation in the runtime profile: skew in certain
operator counters, such as the rows read counter (RowsRead) in the scan
operators, is not reported. Skew exists when the number of rows
processed by each operator instance is not about the same, and it can
be detected through the standard deviation (stddev). A high stddev
(say > 5) usually implies the existence of skew.

With the fix, such skew is detected for the following counters
  1. RowsRead in HDFS_SCAN_NODE and KUDU_SCAN_NODE
  2. ProbeRows and BuildRows in HASH_JOIN_NODE
  3. RowsReturned in GroupingAggregator, EXCHANGE and SORT_NODE

and reported as follows:
  1. In execution profile, a new skew summary that lists the names
     of the operators with skews;
  2. In the averaged profile for the corresponding operator, the list
     of values of the counter across all fragment instances in the
     backend processes, and the skew detection formula;
  3. Skew detection formula:
     a: stddev/mean > 0.05 and mean > 1,000,000, for conservative reporting
     b: stddev > 5, for aggressive reporting
  4. A new query option 'report_skew'
     = -1, 0: disable skew reporting
     = 1: report skew conservatively (default)
     = 2: report skew aggressively

Examples of skews reported for a hash join and an hdfs scan.

In execution profile:

  ... ...
  skew(s) found at: HASH_JOIN_NODE (id=4), HDFS_SCAN_NODE (id=0)
  Per Node Peak Memory Usage: ...
  Per Node Bytes Read: ...
  Per Node User Time: ...
  Per Node System Time:
  ... ...

In averaged profiles:

  HASH_JOIN_NODE (id=4): ...
  Skew details: ProbeRows ([16904,17750,19197],
   stddev=946.77)
... ...

  HDFS_SCAN_NODE (id=0): ...
  Skew details: RowsRead ([913887,917913,1048604],
  stddev=62578.85)

Testing:
1. Added test_aggressive_skew_reporting_in_runtime_profile and
   test_conservative_skew_reporting_in_runtime_profile in
   test_observability.py to verify that the skews are reported.
2. Ran Core tests successfully.

Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
---
M be/src/runtime/coordinator.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/runtime-profile-counters.h
M be/src/util/runtime-profile.cc
M be/src/util/runtime-profile.h
M be/src/util/stat-util.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M tests/query_test/test_hash_join_timer.py
M tests/query_test/test_observability.py
11 files changed, 285 insertions(+), 17 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/16474/26

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
Gerrit-Change-Number: 16474
Gerrit-PatchSet: 26
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews

2020-09-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16474 )

Change subject: IMPALA-10178 Run-time profile shall report skews
..


Patch Set 25:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7289/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
Gerrit-Change-Number: 16474
Gerrit-PatchSet: 25
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Sun, 27 Sep 2020 18:48:06 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews

2020-09-27 Thread Qifan Chen (Code Review)
Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16474 )

Change subject: IMPALA-10178 Run-time profile shall report skews
..


Patch Set 25:

Added a new query option 'report_skew' to specify the skew reporting mode:

-1, 0: disable skew reporting entirely
1: conservative (default): report when stddev/mean > 0.05 && mean > 1,000,000
2: aggressive: report when stddev > 5



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
Gerrit-Change-Number: 16474
Gerrit-PatchSet: 25
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Sun, 27 Sep 2020 18:28:53 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews

2020-09-27 Thread Qifan Chen (Code Review)
Qifan Chen has uploaded a new patch set (#25). ( 
http://gerrit.cloudera.org:8080/16474 )

Change subject: IMPALA-10178 Run-time profile shall report skews
..

IMPALA-10178 Run-time profile shall report skews

This fix addresses a limitation in the runtime profile: skew in certain
operator counters, such as the rows read counter (RowsRead) in the scan
operators, is not reported. Skew exists when the number of rows
processed by each operator instance is not about the same, and it can
be detected through the standard deviation (stddev). A high stddev
(say > 5) usually implies the existence of skew.

With the fix, such skew is detected for the following counters

  1. RowsRead in HDFS_SCAN_NODE and KUDU_SCAN_NODE profile
  2. ProbeRows and BuildRows in HASH_JOIN_NODE profile
  3. RowsReturned in GroupingAggregator, EXCHANGE and SORT_NODE
     profile

and reported as follows:

  1. In a new skew summary in execution profile that lists the names
     of the operators with skews;
  2. In each corresponding operator in the averaged profile for a
     fragment, the name of the counter, the list of values of the
     counter across all fragment instances in the backend processes,
     and the stddev value.

Examples of skews reported for a hash join and an hdfs scan.

In execution profile:

  ... ...
  skew(s) found at: HASH_JOIN_NODE (id=4), HDFS_SCAN_NODE (id=0)
  Per Node Peak Memory Usage: ...
  Per Node Bytes Read: ...
  Per Node User Time: ...
  Per Node System Time:
  ... ...

In averaged profiles:

  HASH_JOIN_NODE (id=4): ...
  Skew details: ProbeRows ([16904,17750,19197],
   stddev=946.77)
... ...

  HDFS_SCAN_NODE (id=0): ...
  Skew details: RowsRead ([913887,917913,1048604],
  stddev=62578.85)

The reporting is controlled by a new query option 'report_skew' as
follows.
  -1, 0: disabled
  1: report skew conservatively, iff stddev/mean > 0.05 and mean > 1,000,000
  2: report skew aggressively, iff stddev > 5.

Testing:
1. Added a new test test_skew_reporting_in_runtime_profile in
   test_observability.py to verify that the skews are reported.
2. Ran Core tests successfully.

Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
---
M be/src/runtime/coordinator.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/runtime-profile-counters.h
M be/src/util/runtime-profile.cc
M be/src/util/runtime-profile.h
M be/src/util/stat-util.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M tests/query_test/test_hash_join_timer.py
M tests/query_test/test_observability.py
11 files changed, 260 insertions(+), 17 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/16474/25
--
To view, visit http://gerrit.cloudera.org:8080/16474
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
Gerrit-Change-Number: 16474
Gerrit-PatchSet: 25
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews

2020-09-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16474 )

Change subject: IMPALA-10178 Run-time profile shall report skews
..


Patch Set 24:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7286/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16474
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
Gerrit-Change-Number: 16474
Gerrit-PatchSet: 24
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Sat, 26 Sep 2020 00:20:17 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews

2020-09-25 Thread Qifan Chen (Code Review)
Qifan Chen has uploaded a new patch set (#24). ( 
http://gerrit.cloudera.org:8080/16474 )

Change subject: IMPALA-10178 Run-time profile shall report skews
..

IMPALA-10178 Run-time profile shall report skews

This fix addresses the current limitation in runtime profile that
skews existing in certain operators such as the rows read counter
(RowsRead) in the scan operators are not reported. A skew condition
exists when the number of rows processed at each operator instance
is not about the same and can be detected through standard deviation
(stddev). A high stddev (say > 5) usually implies the existence of
skew.

With the fix, such skew is detected for the following counters

  1. RowsRead in HDFS_SCAN_NODE and KUDU_SCAN_NODE profile
  2. ProbeRows and BuildRows in HASH_JOIN_NODE profile
  3. RowsReturned in GroupingAggregator, EXCHANGE and SORT_NODE
 profile

and reported as follows:

  1. In a new skew summary in execution profile that lists the names
 of the operators with skews;
  2. In each corresponding operator in the averaged profile for a
 fragment, the name of the counter, the list of values of the
 counter across all fragment instances in the backend processes,
 and the stddev value.

Examples of skews reported for a hash join and an hdfs scan.

In execution profile:

  ... ...
  skew(s) found at: HASH_JOIN_NODE (id=4), HDFS_SCAN_NODE (id=0)
  Per Node Peak Memory Usage: ...
  Per Node Bytes Read: ...
  Per Node User Time: ...
  Per Node System Time:
  ... ...

In averaged profiles:

  HASH_JOIN_NODE (id=4): ...
  Skew details: ProbeRows ([16904,17750,19197],
   stddev=946.77)
... ...

  HDFS_SCAN_NODE (id=0): ...
  Skew details: RowsRead ([913887,917913,1048604],
  stddev=62578.85)

Testing:
1. Added a new test test_skew_reporting_in_runtime_profile in
   test_observability.py to verify that the skews are reported.
2. Ran Core tests successfully.

Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
---
M be/src/runtime/coordinator.cc
M be/src/util/runtime-profile-counters.h
M be/src/util/runtime-profile.cc
M be/src/util/runtime-profile.h
M be/src/util/stat-util.h
M tests/query_test/test_hash_join_timer.py
M tests/query_test/test_observability.py
7 files changed, 177 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/16474/24
--
To view, visit http://gerrit.cloudera.org:8080/16474
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
Gerrit-Change-Number: 16474
Gerrit-PatchSet: 24
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews

2020-09-25 Thread Qifan Chen (Code Review)
Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16474 )

Change subject: IMPALA-10178 Run-time profile shall report skews
..


Patch Set 22:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16474/21/be/src/util/runtime-profile.cc
File be/src/util/runtime-profile.cc:

http://gerrit.cloudera.org:8080/#/c/16474/21/be/src/util/runtime-profile.cc@1928
PS21, Line 1928:   vector values;
> > In the past, stddev with a threshold of 5 served the purpose well.
Here are some examples from a profile,
QueryIDa841807823cbdc837375aa60.txt, involving tables on the order of
millions of rows and a DoP of 212.

Hash join 08
left child: 145.00M rows
frag instances: 212
stddev: 11421.5
mean: 683975
stddev/mean: 0.0166

Hdfs_scan 18
fragment instances=209
stddev = 918947
mean = 4.45696e+06
stddev / mean = 0.206

hash exchange 38
fragment instances=209
stddev=13692.9
mean = 1.52542e+06
Stddev/mean = 0.0089

Here stddev/mean is called the coefficient of variation (CV), also known as
the relative standard deviation (RSD). It shows the extent of variability in
relation to the mean of the population. In our case, the smaller the CV, the
better. When all values are the same and >= 1, the CV is 0 because the stddev
is 0.

If we look at the three examples above, we can see that the hdfs scan at node
18 has a CV of about 20%. That is a skew case in my opinion. The skews in the
other two are much smaller.

The intention of reporting skew is to reveal processing imbalance. Translating
the skewness into a performance loss has to be done separately. In the case of
filtering, my theory is that if the matching values are distributed evenly and
the scanners are applied evenly, the rows read should be about the same.

To report only severe skews in Impala, maybe we can use the CV (instead of
stddev) as the threshold, say:
cv > 5% && mean > 1 million.
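
For illustration, the proposed rule can be applied to the three examples quoted in this comment. This is a Python sketch only; the function names and the tuple layout are made up for the example, and the thresholds are the ones suggested above.

```python
# (operator, stddev, mean) copied from the three examples in this comment.
examples = [
    ("Hash join 08", 11421.5, 683975),
    ("Hdfs_scan 18", 918947, 4.45696e+06),
    ("hash exchange 38", 13692.9, 1.52542e+06),
]


def coefficient_of_variation(stddev, mean):
    """CV, also called relative standard deviation: stddev / mean."""
    return stddev / mean


def is_severe_skew(stddev, mean, cv_threshold=0.05, min_mean=1_000_000):
    """Proposed rule: cv > 5% and mean over 1 million."""
    return mean > min_mean and coefficient_of_variation(stddev, mean) > cv_threshold


for name, stddev, mean in examples:
    cv = coefficient_of_variation(stddev, mean)
    print(f"{name}: cv={cv:.4f} severe={is_severe_skew(stddev, mean)}")
```

Only the hdfs scan at node 18 passes both conditions. The hash join fails on both the CV and the mean, and the hash exchange fails on the CV.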



--
To view, visit http://gerrit.cloudera.org:8080/16474
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
Gerrit-Change-Number: 16474
Gerrit-PatchSet: 22
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 25 Sep 2020 17:36:28 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews

2020-09-24 Thread Sahil Takiar (Code Review)
Sahil Takiar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16474 )

Change subject: IMPALA-10178 Run-time profile shall report skews
..


Patch Set 21:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16474/21/be/src/util/runtime-profile.cc
File be/src/util/runtime-profile.cc:

http://gerrit.cloudera.org:8080/#/c/16474/21/be/src/util/runtime-profile.cc@1928
PS21, Line 1928:   if (stddev > 5.0) {
> In the past, stddev with a threshold of 5 served the purpose well.

I would be interested in seeing what evidence we have to support this. It
might be worth running this logic against some larger runtime profiles to see
what comes out.

With Parquet encodings, Parquet filter pushdown, page skipping, runtime
filters, etc., I wouldn't expect the numbers of rows read by scan nodes to be
that close together.

I just want to make sure that the skew flag doesn't start popping up on every
runtime profile we get, in which case folks will start to ignore it.

Another benchmark might be to see how many TPC-DS or TPC-H profiles the skew
flag pops up on. Maybe 30 GB would be enough scale, not sure.



--
To view, visit http://gerrit.cloudera.org:8080/16474
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
Gerrit-Change-Number: 16474
Gerrit-PatchSet: 21
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 24 Sep 2020 20:32:43 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews

2020-09-24 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16474 )

Change subject: IMPALA-10178 Run-time profile shall report skews
..


Patch Set 22:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7278/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16474
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
Gerrit-Change-Number: 16474
Gerrit-PatchSet: 22
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 24 Sep 2020 19:57:55 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews

2020-09-24 Thread Qifan Chen (Code Review)
Qifan Chen has uploaded a new patch set (#22). ( 
http://gerrit.cloudera.org:8080/16474 )

Change subject: IMPALA-10178 Run-time profile shall report skews
..

IMPALA-10178 Run-time profile shall report skews

This fix addresses the current limitation in runtime profile that
skews existing in certain operators such as the rows read counter
(RowsRead) in the scan operators are not reported. A skew condition
exists when the number of rows processed at each operator instance
is not about the same and can be detected through standard deviation
(stddev). A high stddev (say > 5) usually implies the existence of
skew.

With the fix, such skew is detected for the following counters

  1. RowsRead in HDFS_SCAN_NODE and KUDU_SCAN_NODE profile
  2. ProbeRows and BuildRows in HASH_JOIN_NODE profile
  3. RowsReturned in GroupingAggregator, EXCHANGE and SORT_NODE
 profile

and reported as follows:

  1. In a new skew summary in execution profile that lists the names
 of the operators with skews;
  2. In each corresponding operator in the averaged profile for a
 fragment, the name of the counter, the list of values of the
 counter across all fragment instances in the backend processes,
 and the stddev value.

Examples of skews reported for a hash join and an hdfs scan.

In execution profile:

  ... ...
  skew(s) found at: HASH_JOIN_NODE (id=4), HDFS_SCAN_NODE (id=0)
  Per Node Peak Memory Usage: ...
  Per Node Bytes Read: ...
  Per Node User Time: ...
  Per Node System Time:
  ... ...

In averaged profiles:

  HASH_JOIN_NODE (id=4): ...
  Skew details: ProbeRows ([16904,17750,19197],
   stddev=946.77)
... ...

  HDFS_SCAN_NODE (id=0): ...
  Skew details: RowsRead ([913887,917913,1048604],
  stddev=62578.85)

Testing:
1. Added a new test test_skew_reporting_in_runtime_profile in
   test_observability.py to verify that the skews are reported.
2. Ran Core tests successfully.

Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
---
M be/src/runtime/coordinator.cc
M be/src/util/runtime-profile-counters.h
M be/src/util/runtime-profile.cc
M be/src/util/runtime-profile.h
M be/src/util/stat-util.h
M tests/query_test/test_hash_join_timer.py
M tests/query_test/test_observability.py
7 files changed, 182 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/16474/22
--
To view, visit http://gerrit.cloudera.org:8080/16474
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
Gerrit-Change-Number: 16474
Gerrit-PatchSet: 22
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews

2020-09-24 Thread Qifan Chen (Code Review)
Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16474 )

Change subject: IMPALA-10178 Run-time profile shall report skews
..


Patch Set 21:

(3 comments)

Thanks for the review!

http://gerrit.cloudera.org:8080/#/c/16474/21//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16474/21//COMMIT_MSG@28
PS21, Line 28:   2. In each corresponding operator in the averaged profile, the name
 :  of the counter, the list of values of the counter across the
 :  impalad backend processes, and the stddev value.
> I'm a bit confused as to whether this just detects skew across all fragment
An averaged profile is created per fragment, summarizing the data from all
fragment instances of that fragment. Regardless of how the fragment instances
are distributed across the nodes, the skew is computed for each fragment.

Reworded.
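
A small Python sketch of that grouping, for illustration only. The fragment names, host names and counter values are hypothetical, and the use of the population standard deviation is an assumption; the point is that the host is ignored and only the fragment determines which values are compared.

```python
from collections import defaultdict
import statistics

# Hypothetical per-instance counter values: (fragment, host, RowsRead).
instances = [
    ("F00", "host-1", 100),
    ("F00", "host-1", 110),
    ("F00", "host-2", 300),
    ("F01", "host-1", 50),
    ("F01", "host-2", 50),
]


def stddev_per_fragment(rows):
    """Group counter values by fragment, ignoring the host, then take the stddev."""
    by_fragment = defaultdict(list)
    for fragment, _host, value in rows:
        by_fragment[fragment].append(value)
    return {name: statistics.pstdev(vals) for name, vals in by_fragment.items()}


for fragment, stddev in stddev_per_fragment(instances).items():
    print(fragment, round(stddev, 2))
```

Here F00 shows a spread across its three instances even though two of them run on the same host, while F01 has a stddev of 0.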


http://gerrit.cloudera.org:8080/#/c/16474/21/be/src/util/runtime-profile.h
File be/src/util/runtime-profile.h:

http://gerrit.cloudera.org:8080/#/c/16474/21/be/src/util/runtime-profile.h@202
PS21, Line 202:   // Generate a string enumerating profiles rooted at this.
  :   std::string DebugString(int indent = 0);
> where is this used?
Removed.


http://gerrit.cloudera.org:8080/#/c/16474/21/be/src/util/runtime-profile.cc
File be/src/util/runtime-profile.cc:

http://gerrit.cloudera.org:8080/#/c/16474/21/be/src/util/runtime-profile.cc@1928
PS21, Line 1928:   if (stddev > 5.0) {
> how well does this work as the number of rows processed by a counter increa
Yes, a stddev of 5 may not be a big deal with respect to a very large row
count. However, it still captures the variation, and a large stddev implies a
large variation that should somehow be reduced toward a stddev of 0.

In the past, stddev with a threshold of 5 served the purpose well.



--
To view, visit http://gerrit.cloudera.org:8080/16474
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
Gerrit-Change-Number: 16474
Gerrit-PatchSet: 21
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 24 Sep 2020 19:34:16 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews

2020-09-24 Thread Sahil Takiar (Code Review)
Sahil Takiar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16474 )

Change subject: IMPALA-10178 Run-time profile shall report skews
..


Patch Set 21:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/16474/21//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16474/21//COMMIT_MSG@28
PS21, Line 28:   2. In each corresponding operator in the averaged profile, the name
 :  of the counter, the list of values of the counter across the
 :  impalad backend processes, and the stddev value.
I'm a bit confused as to whether this detects skew only across the fragment
instances on a single node, or across all fragment instances on all nodes.


http://gerrit.cloudera.org:8080/#/c/16474/21/be/src/util/runtime-profile.h
File be/src/util/runtime-profile.h:

http://gerrit.cloudera.org:8080/#/c/16474/21/be/src/util/runtime-profile.h@202
PS21, Line 202:   // Generate a string enumerating profiles rooted at this.
  :   std::string DebugString(int indent = 0);
where is this used?


http://gerrit.cloudera.org:8080/#/c/16474/21/be/src/util/runtime-profile.cc
File be/src/util/runtime-profile.cc:

http://gerrit.cloudera.org:8080/#/c/16474/21/be/src/util/runtime-profile.cc@1928
PS21, Line 1928:   if (stddev > 5.0) {
How well does this work as the number of rows processed by a counter
increases? E.g., if there are nodes processing billions of rows, a std-dev of
more than 5 doesn't seem that statistically significant.

I'm not entirely sure how it works, but single_node_perf_benchmark.py uses
various tests to check whether a difference in runtime profile counters is
statistically significant. See report_benchmark_results.py, which refers to
things like "ttest t-value" and the "Mann-Whitney Z-value".

I'm not a stats expert, but simply hardcoding the threshold to 5 seems odd.
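
To make the concern concrete, here is a small Python sketch with made-up row counts (not taken from any profile). It assumes the population standard deviation and shows that a fixed absolute threshold treats a tiny counter and a billion-row counter the same way, while the ratio stddev/mean separates them.

```python
import statistics

# Hypothetical per-instance row counts with the same absolute spread.
small_counts = [10, 20, 30]
large_counts = [1_000_000_000 + c for c in small_counts]


def exceeds_fixed_threshold(values, threshold=5.0):
    """The hardcoded check under discussion: stddev > 5."""
    return statistics.pstdev(values) > threshold


def relative_spread(values):
    """stddev divided by mean, so the result does not depend on scale."""
    return statistics.pstdev(values) / statistics.mean(values)


for label, counts in (("small", small_counts), ("large", large_counts)):
    print(label, exceeds_fixed_threshold(counts), relative_spread(counts))
```

Both lists exceed the fixed threshold, but the relative spread of the billion-row list is many orders of magnitude smaller than that of the small list.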



--
To view, visit http://gerrit.cloudera.org:8080/16474
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
Gerrit-Change-Number: 16474
Gerrit-PatchSet: 21
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 24 Sep 2020 17:59:59 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews

2020-09-24 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16474 )

Change subject: IMPALA-10178 Run-time profile shall report skews
..


Patch Set 21:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7275/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16474
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
Gerrit-Change-Number: 16474
Gerrit-PatchSet: 21
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 24 Sep 2020 17:49:30 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews

2020-09-24 Thread Qifan Chen (Code Review)
Qifan Chen has uploaded a new patch set (#21). ( 
http://gerrit.cloudera.org:8080/16474 )

Change subject: IMPALA-10178 Run-time profile shall report skews
..

IMPALA-10178 Run-time profile shall report skews

This fix addresses the current limitation in runtime profile that
skews existing in certain operators such as the rows read counter
(RowsRead) in the scan operators are not reported. A skew condition
exists when the number of rows processed at each operator instance
is not about the same and can be detected through standard deviation
(stddev). A high stddev (say > 5) usually implies the existence of
skew.

With the fix, such skew is detected for the following counters

  1. RowsRead in HDFS_SCAN_NODE and KUDU_SCAN_NODE profile
  2. ProbeRows and BuildRows in HASH_JOIN_NODE profile
  3. RowsReturned in GroupingAggregator, EXCHANGE and SORT_NODE
 profile

and reported as follows:

  1. In a new skew summary in execution profile that lists the names
 of the operators with skews;
  2. In each corresponding operator in the averaged profile, the name
 of the counter, the list of values of the counter across the
 impalad backend processes, and the stddev value.

Examples of skews reported for a hash join and an hdfs scan.

  ... ...
  Execution Profile ... ...
  ... ...
  skew(s) found at: HASH_JOIN_NODE (id=4), HDFS_SCAN_NODE (id=0)
  Per Node Peak Memory Usage: ...
  Per Node Bytes Read: ...
  Per Node User Time: ...
  Per Node System Time:
  ... ...

  HASH_JOIN_NODE (id=4): ...
  Skew details: ProbeRows ([16904,17750,19197],
   stddev=946.77)
... ...

  HDFS_SCAN_NODE (id=0): ...
  Skew details: RowsRead ([913887,917913,1048604],
  stddev=62578.85)

Testing:
1. Added a new test test_skew_reporting_in_runtime_profile in
   test_observability.py to verify that the skews are reported.
2. Ran Core tests successfully.

Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
---
M be/src/runtime/coordinator.cc
M be/src/util/runtime-profile-counters.h
M be/src/util/runtime-profile.cc
M be/src/util/runtime-profile.h
M be/src/util/stat-util.h
M tests/query_test/test_hash_join_timer.py
M tests/query_test/test_observability.py
7 files changed, 201 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/16474/21
--
To view, visit http://gerrit.cloudera.org:8080/16474
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
Gerrit-Change-Number: 16474
Gerrit-PatchSet: 21
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews

2020-09-23 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16474 )

Change subject: IMPALA-10178 Run-time profile shall report skews
..


Patch Set 20:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7262/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16474
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
Gerrit-Change-Number: 16474
Gerrit-PatchSet: 20
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 24 Sep 2020 00:14:15 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews

2020-09-23 Thread Qifan Chen (Code Review)
Qifan Chen has uploaded a new patch set (#20). ( 
http://gerrit.cloudera.org:8080/16474 )

Change subject: IMPALA-10178 Run-time profile shall report skews
..

IMPALA-10178 Run-time profile shall report skews

This fix addresses the current limitation in runtime profile that
skews existing in certain operators such as the rows read counter
(RowsRead) in the scan operators are not reported. A skew condition
exists when the number of rows processed at each operator instance
is not about the same and can be detected through standard deviation
(stddev). A high stddev (say > 5) usually implies the existence of
skew.

With the fix, such skew is detected for the following counters

  1. RowsRead in HDFS_SCAN_NODE and KUDU_SCAN_NODE profile
  2. ProbeRows and BuildRows in HASH_JOIN_NODE profile
  3. RowsReturned in GroupingAggregator, EXCHANGE and SORT_NODE
 profile

and reported as follows:

  1. In a new skew summary in execution profile that lists the names
 of the operators with skews;
  2. In each corresponding operator in the averaged profile, the name
 of the counter, the list of values of the counter across the
 impalad backend processes, and the stddev value.

Examples of skews reported for a hash join and an hdfs scan.

  ... ...
  Execution Profile ... ...
  ... ...
  skew(s) found at: HASH_JOIN_NODE (id=4), HDFS_SCAN_NODE (id=0)
  Per Node Peak Memory Usage: ...
  Per Node Bytes Read: ...
  Per Node User Time: ...
  Per Node System Time:
  ... ...

  HASH_JOIN_NODE (id=4): ...
  Skew details: ProbeRows ([16904,17750,19197],
   stddev=946.77)
... ...

  HDFS_SCAN_NODE (id=0): ...
  Skew details: RowsRead ([913887,917913,1048604],
  stddev=62578.85)

Testing:
1. Added a new test test_skew_reporting_in_runtime_profile in
   test_observability.py to verify that the skews are reported.
2. Ran Core tests successfully.

Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
---
M be/src/runtime/coordinator.cc
M be/src/util/runtime-profile-counters.h
M be/src/util/runtime-profile.cc
M be/src/util/runtime-profile.h
M be/src/util/stat-util.h
M tests/query_test/test_hash_join_timer.py
M tests/query_test/test_observability.py
7 files changed, 201 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/16474/20
--
To view, visit http://gerrit.cloudera.org:8080/16474
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
Gerrit-Change-Number: 16474
Gerrit-PatchSet: 20
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews

2020-09-23 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16474 )

Change subject: IMPALA-10178 Run-time profile shall report skews
..


Patch Set 19:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7258/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16474
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
Gerrit-Change-Number: 16474
Gerrit-PatchSet: 19
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 23 Sep 2020 21:28:24 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews

2020-09-23 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16474 )

Change subject: IMPALA-10178 Run-time profile shall report skews
..


Patch Set 18:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7257/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16474
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
Gerrit-Change-Number: 16474
Gerrit-PatchSet: 18
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 23 Sep 2020 21:16:35 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews

2020-09-23 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16474 )

Change subject: IMPALA-10178 Run-time profile shall report skews
..


Patch Set 19:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/16474/19/tests/query_test/test_observability.py
File tests/query_test/test_observability.py:

http://gerrit.cloudera.org:8080/#/c/16474/19/tests/query_test/test_observability.py@806
PS19, Line 806: =
flake8: E225 missing whitespace around operator


http://gerrit.cloudera.org:8080/#/c/16474/19/tests/query_test/test_observability.py@814
PS19, Line 814:
flake8: E221 multiple spaces before operator



--
To view, visit http://gerrit.cloudera.org:8080/16474
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
Gerrit-Change-Number: 16474
Gerrit-PatchSet: 19
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 23 Sep 2020 21:08:54 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews

2020-09-23 Thread Qifan Chen (Code Review)
Qifan Chen has uploaded a new patch set (#19). ( 
http://gerrit.cloudera.org:8080/16474 )

Change subject: IMPALA-10178 Run-time profile shall report skews
..

IMPALA-10178 Run-time profile shall report skews

This fix addresses the current limitation in runtime profile that
skews existing in certain operators such as the rows read counter
(RowsRead) in the scan operators are not reported. A skew condition
exists when the number of rows processed at each operator instance
is not about the same and can be detected through standard deviation
(stddev). A high stddev (say > 5) usually implies the existence of
skew.

With the fix, such skew is detected for the following counters

  1. RowsRead in HDFS_SCAN_NODE and KUDU_SCAN_NODE profile
  2. ProbeRows and BuildRows in HASH_JOIN_NODE profile
  3. RowsReturned in GroupingAggregator, EXCHANGE and SORT_NODE
 profile

and reported as follows:

  1. In a new skew summary in execution profile that lists the names
 of the operators with skews;
  2. In each corresponding operator in the averaged profile, the name
 of the counter, the list of values of the counter across the
 impalad backend processes, and the stddev value.

Examples of skews reported for a hash join and an hdfs scan.

  ... ...
  Execution Profile ... ...
  ... ...
  skew(s) found at: HASH_JOIN_NODE (id=4), HDFS_SCAN_NODE (id=0)
  Per Node Peak Memory Usage: ...
  Per Node Bytes Read: ...
  Per Node User Time: ...
  Per Node System Time:
  ... ...

  HASH_JOIN_NODE (id=4): ...
  Skew details: ProbeRows ([16904,17750,19197],
   stddev=946.77)
... ...

  HDFS_SCAN_NODE (id=0): ...
  Skew details: RowsRead ([913887,917913,1048604],
  stddev=62578.85)

Testing:
1. Added a new test test_skew_reporting_in_runtime_profile in
   test_observability.py to verify that the skews are reported.
2. Ran Core tests successfully.

Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
---
M be/src/runtime/coordinator.cc
M be/src/util/runtime-profile-counters.h
M be/src/util/runtime-profile.cc
M be/src/util/runtime-profile.h
M be/src/util/stat-util.h
M tests/query_test/test_hash_join_timer.py
M tests/query_test/test_observability.py
7 files changed, 195 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/16474/19
--
To view, visit http://gerrit.cloudera.org:8080/16474
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
Gerrit-Change-Number: 16474
Gerrit-PatchSet: 19
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews

2020-09-23 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16474 )

Change subject: IMPALA-10178 Run-time profile shall report skews
..


Patch Set 18:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/16474/18/be/src/util/stat-util.h
File be/src/util/stat-util.h:

http://gerrit.cloudera.org:8080/#/c/16474/18/be/src/util/stat-util.h@45
PS18, Line 45:   /// Computes the mean and the standard deviation (population) from an array of
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/16474/18/tests/query_test/test_hash_join_timer.py
File tests/query_test/test_hash_join_timer.py:

http://gerrit.cloudera.org:8080/#/c/16474/18/tests/query_test/test_hash_join_timer.py@141
PS18, Line 141: ;
flake8: E703 statement ends with a semicolon


http://gerrit.cloudera.org:8080/#/c/16474/18/tests/query_test/test_observability.py
File tests/query_test/test_observability.py:

http://gerrit.cloudera.org:8080/#/c/16474/18/tests/query_test/test_observability.py@806
PS18, Line 806: =
flake8: E225 missing whitespace around operator


http://gerrit.cloudera.org:8080/#/c/16474/18/tests/query_test/test_observability.py@814
PS18, Line 814:
flake8: E221 multiple spaces before operator




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
Gerrit-Change-Number: 16474
Gerrit-PatchSet: 18
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 23 Sep 2020 20:55:15 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews

2020-09-23 Thread Qifan Chen (Code Review)
Qifan Chen has uploaded a new patch set (#18). ( 
http://gerrit.cloudera.org:8080/16474 )

Change subject: IMPALA-10178 Run-time profile shall report skews
..

IMPALA-10178 Run-time profile shall report skews

This fix addresses the current limitation in runtime profile that
skews existing in certain operators such as the rows read counter
(RowsRead) in the scan operators are not reported. A skew condition
exists when the number of rows processed at each operator instance
is not about the same and can be detected through standard deviation
(stddev). A high stddev (say > 5) usually implies the existence of
skew.

With the fix, such skew is detected for the following counters

  1. RowsRead in HDFS_SCAN_NODE and KUDU_SCAN_NODE profile
  2. ProbeRows and BuildRows in HASH_JOIN_NODE profile
  3. RowsReturned in GroupingAggregator, EXCHANGE and SORT_NODE
 profile

and reported as follows:

  1. In a new skew summary in execution profile that lists the names
 of the operators with skews;
  2. In each corresponding operator in the averaged profile, the name
 of the counter, the list of values of the counter across the
 impalad backend processes, and the stddev value.

Examples of skews reported for a hash join and an hdfs scan.

  ... ...
  Execution Profile ... ...
  ... ...
  skew(s) found at: HASH_JOIN_NODE (id=4), HDFS_SCAN_NODE (id=0)
  Per Node Peak Memory Usage: ...
  Per Node Bytes Read: ...
  Per Node User Time: ...
  Per Node System Time:
  ... ...

  HASH_JOIN_NODE (id=4): ...
  Skew details: ProbeRows ([16904,17750,19197],
   stddev=946.77)
... ...

  HDFS_SCAN_NODE (id=0): ...
  Skew details: RowsRead ([913887,917913,1048604],
  stddev=62578.85)
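
A test or a tool that consumes the text profile could extract the two kinds of skew lines shown above with regular expressions. This is only a sketch: the function names and patterns are hypothetical, they are derived solely from the sample output in this commit message, and the wrapped "stddev=" continuation is assumed to be an artifact of the message layout (the pattern tolerates either form).

```python
import re

# Patterns inferred from the sample profile text; not taken from the patch.
SUMMARY_RE = re.compile(r"skew\(s\) found at: (.+)")
DETAILS_RE = re.compile(
    r"Skew details: (\w+) \(\[([\d, ]+)\],\s*stddev=([\d.]+)\)")


def parse_skew_summary(line):
    """Return the operator labels listed in a skew summary line."""
    match = SUMMARY_RE.search(line)
    if match is None:
        return []
    return re.findall(r"\w+ \(id=\d+\)", match.group(1))


def parse_skew_details(text):
    """Return (counter name, per-instance values, stddev), or None."""
    match = DETAILS_RE.search(text)
    if match is None:
        return None
    values = [int(v) for v in match.group(2).split(",")]
    return match.group(1), values, float(match.group(3))


print(parse_skew_summary(
    "skew(s) found at: HASH_JOIN_NODE (id=4), HDFS_SCAN_NODE (id=0)"))
print(parse_skew_details(
    "Skew details: ProbeRows ([16904,17750,19197], stddev=946.77)"))
```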

Testing:
1. Added a new test test_skew_reporting_in_runtime_profile in
   test_observability.py to verify that the skews are reported.
2. Ran Core tests successfully.

Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
---
M be/src/runtime/coordinator.cc
M be/src/util/runtime-profile-counters.h
M be/src/util/runtime-profile.cc
M be/src/util/runtime-profile.h
M be/src/util/stat-util.h
M tests/query_test/test_hash_join_timer.py
M tests/query_test/test_observability.py
7 files changed, 195 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/16474/18

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
Gerrit-Change-Number: 16474
Gerrit-PatchSet: 18
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews

2020-09-23 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16474 )

Change subject: IMPALA-10178 Run-time profile shall report skews
..


Patch Set 16:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7252/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
Gerrit-Change-Number: 16474
Gerrit-PatchSet: 16
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 23 Sep 2020 18:10:08 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews

2020-09-23 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16474 )

Change subject: IMPALA-10178 Run-time profile shall report skews
..


Patch Set 16:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/16474/16/be/src/util/runtime-profile-counters.h
File be/src/util/runtime-profile-counters.h:

http://gerrit.cloudera.org:8080/#/c/16474/16/be/src/util/runtime-profile-counters.h@414
PS16, Line 414:   /// all valid raw values backing this average counter.
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/16474/16/be/src/util/runtime-profile-counters.h@415
PS16, Line 415:   ///
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/16474/16/be/src/util/runtime-profile-counters.h@417
PS16, Line 417:   /// all valid raw values and the population stddev in the form of:
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/16474/16/be/src/util/runtime-profile-counters.h@419
PS16, Line 419:   ///
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/16474/16/be/src/util/stat-util.h
File be/src/util/stat-util.h:

http://gerrit.cloudera.org:8080/#/c/16474/16/be/src/util/stat-util.h@45
PS16, Line 45:   /// Computes the mean and the standard deviation (population) from an array of
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/16474/16/tests/query_test/test_observability.py
File tests/query_test/test_observability.py:

http://gerrit.cloudera.org:8080/#/c/16474/16/tests/query_test/test_observability.py@801
PS16, Line 801: #
flake8: E265 block comment should start with '# '


http://gerrit.cloudera.org:8080/#/c/16474/16/tests/query_test/test_observability.py@807
PS16, Line 807: =
flake8: E225 missing whitespace around operator


http://gerrit.cloudera.org:8080/#/c/16474/16/tests/query_test/test_observability.py@817
PS16, Line 817:
flake8: E221 multiple spaces before operator




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
Gerrit-Change-Number: 16474
Gerrit-PatchSet: 16
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 23 Sep 2020 17:53:46 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews

2020-09-23 Thread Qifan Chen (Code Review)
Qifan Chen has uploaded a new patch set (#16). ( 
http://gerrit.cloudera.org:8080/16474 )

Change subject: IMPALA-10178 Run-time profile shall report skews
..

IMPALA-10178 Run-time profile shall report skews

This fix addresses the current limitation in runtime profile that
skews existing in certain operators such as the rows read counter
(RowsRead) in the scan operators are not reported. A skew condition
exists when the number of rows processed at each operator instance
is not about the same and can be detected through standard deviation
(stddev). A high stddev (say > 5) usually implies the existence of
skew.

With the fix and in an average fragment profile, such skew is
detected for the following counters

  1. RowsRead in HDFS_SCAN_NODE and KUDU_SCAN_NODE profile
  2. ProbeRows and BuildRows in HASH_JOIN_NODE profile
  3. RowsReturned in GroupingAggregator, EXCHANGE and SORT_NODE
 profile

and reported as follows:

  1. In the skew summary section which lists the names of the
 operators with skews;
  2. In each corresponding operator, the name of the counters
 and the corresponding stddev values.

Examples of skews reported for a hash join and an hdfs scan.

  Averaged Fragment F00:(Total: 1s075ms, non-child: 26.919ms, ...
... ...
num instances: 3
skew(s) found at: HASH_JOIN_NODE (id=4), HDFS_SCAN_NODE (id=0)

  HASH_JOIN_NODE (id=4):(Total: 1s204ms, non-child: 2.166ms, ...
  Skew details: ProbeRows ([16904, 17750, 19197],
   stddev=946.77)
... ...

  HDFS_SCAN_NODE (id=0):(Total: 1s032ms, non-child: 1s032ms, ...
  Skew details: RowsRead ([913887, 917913, 1048604],
  stddev=62578.85)

Testing:
1. Added a new test test_skew_reporting_in_runtime_profile to
   test_observability.py to verify that the skews are reported.
2. Ran Core tests successfully.

Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
---
M be/src/runtime/coordinator-backend-state.cc
M be/src/util/runtime-profile-counters.h
M be/src/util/runtime-profile.cc
M be/src/util/runtime-profile.h
M be/src/util/stat-util.h
M tests/query_test/test_observability.py
6 files changed, 195 insertions(+), 11 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/16474/16

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
Gerrit-Change-Number: 16474
Gerrit-PatchSet: 16
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews

2020-09-23 Thread Qifan Chen (Code Review)
Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16474 )

Change subject: IMPALA-10178 Run-time profile shall report skews
..


Patch Set 15:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/16474/15//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16474/15//COMMIT_MSG@22
PS15, Line 22:   3. RowsReturned in GroupingAggregator profile
> It would be good to add this for sort operations too. We have SortDataSize
Good point.

Added the following:
{"EXCHANGE_NODE", "RowsReturned"} and {"SORT_NODE", "RowsReturned"}.

Since sort does not drop tuples, I guess RowsReturned should be OK.
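
For illustration, the operator-to-counter pairs named in this thread can be written as a lookup table. This is a hypothetical Python sketch, not the data structure used in the patch; the table name, the helper, and the prefix-matching rule are all assumptions.

```python
# Operator/counter pairs listed in the commit message and in the reply above.
SKEW_COUNTERS = {
    "HDFS_SCAN_NODE": ["RowsRead"],
    "KUDU_SCAN_NODE": ["RowsRead"],
    "HASH_JOIN_NODE": ["ProbeRows", "BuildRows"],
    "GroupingAggregator": ["RowsReturned"],
    "EXCHANGE_NODE": ["RowsReturned"],
    "SORT_NODE": ["RowsReturned"],
}


def counters_to_check(profile_name):
    """Return the counters to examine for a profile such as
    'HASH_JOIN_NODE (id=4)'. Matching by name prefix is an assumption."""
    for prefix, counters in SKEW_COUNTERS.items():
        if profile_name.startswith(prefix):
            return counters
    return []


print(counters_to_check("SORT_NODE (id=2)"))  # ['RowsReturned']
```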


http://gerrit.cloudera.org:8080/#/c/16474/15//COMMIT_MSG@36
PS15, Line 36: skew(s) found at: HASH_JOIN_NODE (id=4), HDFS_SCAN_NODE (id=0)
> I thought a bit about whether using the info string was the right approach
Yeah. The skew summary follows the current model by adding some extra info 
strings to the aggregated profile.


http://gerrit.cloudera.org:8080/#/c/16474/15/be/src/util/runtime-profile-counters.h
File be/src/util/runtime-profile-counters.h:

http://gerrit.cloudera.org:8080/#/c/16474/15/be/src/util/runtime-profile-counters.h@415
PS15, Line 415: o.
> can you explicitly say that it's returned in 'details'.
Done


http://gerrit.cloudera.org:8080/#/c/16474/15/be/src/util/runtime-profile-counters.h@416
PS15, Line 416:
> nit: convention is to use pointer for output args.
Done


http://gerrit.cloudera.org:8080/#/c/16474/15/be/src/util/runtime-profile.cc
File be/src/util/runtime-profile.cc:

http://gerrit.cloudera.org:8080/#/c/16474/15/be/src/util/runtime-profile.cc@1969
PS15, Line 1969:   int num_valid_values = NumValidValues();
> I don't think this quite works since valid values could be added concurrent
Done


http://gerrit.cloudera.org:8080/#/c/16474/15/be/src/util/stat-util.h
File be/src/util/stat-util.h:

http://gerrit.cloudera.org:8080/#/c/16474/15/be/src/util/stat-util.h@28
PS15, Line 28:   /// Computes standard deviation given mean
> Is this the population standard deviation or the sample standard deviation?
Added comments to clarify that the population version is computed.
Also added 'P' to the function name.


http://gerrit.cloudera.org:8080/#/c/16474/15/be/src/util/stat-util.h@28
PS15, Line 28:   /// Computes standard deviation given mean
> I guess this documentation was already missing but would be good to fix
Done




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
Gerrit-Change-Number: 16474
Gerrit-PatchSet: 15
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 23 Sep 2020 17:51:29 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews

2020-09-22 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16474 )

Change subject: IMPALA-10178 Run-time profile shall report skews
..


Patch Set 15:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16474/15/be/src/util/stat-util.h
File be/src/util/stat-util.h:

http://gerrit.cloudera.org:8080/#/c/16474/15/be/src/util/stat-util.h@28
PS15, Line 28:   /// Computes standard deviation given mean
> Is this the population standard deviation or the sample standard deviation?
I guess this documentation was already missing but would be good to fix




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
Gerrit-Change-Number: 16474
Gerrit-PatchSet: 15
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 23 Sep 2020 03:49:55 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews

2020-09-22 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16474 )

Change subject: IMPALA-10178 Run-time profile shall report skews
..


Patch Set 15:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/16474/15//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16474/15//COMMIT_MSG@22
PS15, Line 22:   3. RowsReturned in GroupingAggregator profile
It would be good to add this for sort operations too. We have SortDataSize 
already, which would be an OK metric, or I guess we could add a count of the 
rows in the sorter. Could be a follow-on patch but might be good to include 
here.

It would also be good to include RowsReturned for exchanges, since that could 
be another source of skew.


http://gerrit.cloudera.org:8080/#/c/16474/15//COMMIT_MSG@36
PS15, Line 36: skew(s) found at: HASH_JOIN_NODE (id=4), HDFS_SCAN_NODE (id=0)
I thought a bit about whether using the info string was the right approach (as
opposed to adding it to the thrift in a more structured way) and I think this
makes sense, since all the tools can already handle info strings.


http://gerrit.cloudera.org:8080/#/c/16474/15/be/src/util/runtime-profile-counters.h
File be/src/util/runtime-profile-counters.h:

http://gerrit.cloudera.org:8080/#/c/16474/15/be/src/util/runtime-profile-counters.h@415
PS15, Line 415: o.
can you explicitly say that it's returned in 'details'.

When does it return true/false?


http://gerrit.cloudera.org:8080/#/c/16474/15/be/src/util/runtime-profile-counters.h@416
PS15, Line 416:
nit: convention is to use pointer for output args.


http://gerrit.cloudera.org:8080/#/c/16474/15/be/src/util/runtime-profile.cc
File be/src/util/runtime-profile.cc:

http://gerrit.cloudera.org:8080/#/c/16474/15/be/src/util/runtime-profile.cc@1969
PS15, Line 1969:   int num_valid_values = NumValidValues();
I don't think this quite works, since valid values could be added concurrently
while this method executes. I think you instead need to do a single pass over
the array and prepend the ", " if it's not the first.
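
The single-pass approach suggested here can be sketched as follows. This is a Python stand-in for the C++ method under review, with a hypothetical function name and a hypothetical list of validity flags. The separator decision depends only on what has already been emitted, so no up-front count of valid values is needed.

```python
def format_valid_values(values, has_value):
    """Join the valid entries as '[a, b, c]' in one pass over the array."""
    out = []
    for value, valid in zip(values, has_value):
        if not valid:
            continue
        if out:
            # Not the first emitted value, so prepend the separator.
            out.append(", ")
        out.append(str(value))
    return "[" + "".join(out) + "]"


print(format_valid_values([16904, 0, 17750, 19197],
                          [True, False, True, True]))
```

Because the loop never relies on a count taken before it started, a value that becomes valid while the loop is running cannot leave a dangling or missing separator.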


http://gerrit.cloudera.org:8080/#/c/16474/15/be/src/util/stat-util.h
File be/src/util/stat-util.h:

http://gerrit.cloudera.org:8080/#/c/16474/15/be/src/util/stat-util.h@28
PS15, Line 28:   /// Computes standard deviation given mean
Is this the population standard deviation or the sample standard deviation?
It would be good to document this in the comment, because it has caused
confusion in the past when it was ambiguous.

I don't know which is the right one to use in this context and it probably 
doesn't matter for the skew threshold. So ok to punt on that.
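
To make the distinction concrete, here is a small sketch using Python's statistics module with the ProbeRows values from the commit message. The population form divides the sum of squared deviations by n, the sample form by n - 1, so the two differ by a factor of sqrt(n / (n - 1)). The variable names are illustrative only.

```python
import math
import statistics

values = [16904, 17750, 19197]  # ProbeRows example from the commit message
n = len(values)

population = statistics.pstdev(values)  # divides by n
sample = statistics.stdev(values)       # divides by n - 1

print(f"population={population:.2f} sample={sample:.2f}")
print(f"ratio={sample / population:.4f} "
      f"expected={math.sqrt(n / (n - 1)):.4f}")
```

With only three instances the sample figure is about 22 percent larger, and the gap shrinks as the number of instances grows.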




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
Gerrit-Change-Number: 16474
Gerrit-PatchSet: 15
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 23 Sep 2020 03:40:36 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews

2020-09-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16474 )

Change subject: IMPALA-10178 Run-time profile shall report skews
..


Patch Set 15:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7244/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
Gerrit-Change-Number: 16474
Gerrit-PatchSet: 15
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 22 Sep 2020 21:22:38 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews

2020-09-22 Thread Qifan Chen (Code Review)
Qifan Chen has uploaded a new patch set (#15). ( 
http://gerrit.cloudera.org:8080/16474 )

Change subject: IMPALA-10178 Run-time profile shall report skews
..

IMPALA-10178 Run-time profile shall report skews

This fix addresses the current limitation in runtime profile that
skews existing in certain operators such as the rows read counter
(RowsRead) in the scan operators are not reported. A skew condition
exists when the number of rows processed at each operator instance
is not about the same and can be detected through standard deviation
(stddev). A high stddev (say > 5) usually implies the existence of
skew.

With the fix and in the average fragment profile, such skew is
detected for the following counters

  1. RowsRead in HDFS_SCAN_NODE and KUDU_SCAN_NODE profile
  2. ProbeRows and BuildRows in HASH_JOIN_NODE profile
  3. RowsReturned in GroupingAggregator profile

and reported as follows:

  1. In the skew summary section which lists the names of the
 operators with skews;
  2. In each corresponding operator, the name of the counters
 and the corresponding stddev values.

Examples of skews reported for a hash join and an hdfs scan.

  Averaged Fragment F00:(Total: 1s075ms, non-child: 26.919ms, ...
... ...
num instances: 3
skew(s) found at: HASH_JOIN_NODE (id=4), HDFS_SCAN_NODE (id=0)

  HASH_JOIN_NODE (id=4):(Total: 1s204ms, non-child: 2.166ms, ...
  Skew details: ProbeRows ([16904, 17750, 19197],
   stddev=946.77)
... ...

  HDFS_SCAN_NODE (id=0):(Total: 1s032ms, non-child: 1s032ms, ...
  Skew details: RowsRead ([913887, 917913, 1048604],
  stddev=62578.85)

Testing:
1. Added a new test test_skew_reporting_in_runtime_profile to
   test_observability.py to verify that the skews are reported.
2. Ran Core tests successfully.

Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
---
M be/src/runtime/coordinator-backend-state.cc
M be/src/util/runtime-profile-counters.h
M be/src/util/runtime-profile.cc
M be/src/util/runtime-profile.h
M be/src/util/stat-util.h
M tests/query_test/test_observability.py
6 files changed, 193 insertions(+), 9 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/16474/15

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
Gerrit-Change-Number: 16474
Gerrit-PatchSet: 15
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews

2020-09-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16474 )

Change subject: IMPALA-10178 Run-time profile shall report skews
..


Patch Set 14:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7239/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
Gerrit-Change-Number: 16474
Gerrit-PatchSet: 14
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 22 Sep 2020 18:21:36 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews

2020-09-22 Thread Qifan Chen (Code Review)
Qifan Chen has uploaded a new patch set (#14). ( 
http://gerrit.cloudera.org:8080/16474 )

Change subject: IMPALA-10178 Run-time profile shall report skews
..

IMPALA-10178 Run-time profile shall report skews

This fix addresses the current limitation in runtime profile that
skews existing in certain operators such as the rows read counter
(RowsRead) in the scan operators are not reported. A skew condition
exists when the number of rows processed at each operator instance
is not about the same and can be detected through standard deviation
(stddev). A high stddev (say > 5) usually implies the existence of
skew.

With the fix and in the average fragment profile, such skew is
detected for the following counters

  1. RowsRead in HDFS_SCAN_NODE profile
  2. ProbeRows and BuildRows in HASH_JOIN_NODE profile
  3. RowsReturned in GroupingAggregator profile

and reported as follows:

  1. In the skew summary section which lists the names of the
 operators with skews;
  2. In each corresponding operator, the name of the counters
 and the corresponding stddev values.

Examples of skews reported for a hash join and an hdfs scan.

  Averaged Fragment F00:(Total: 1s075ms, non-child: 26.919ms, ...
... ...
num instances: 3
skew(s) found at: HASH_JOIN_NODE (id=4), HDFS_SCAN_NODE (id=0)

  HASH_JOIN_NODE (id=4):(Total: 1s204ms, non-child: 2.166ms, ...
  Skew details: ProbeRows ([16904, 17750, 19197],
   stddev=946.77)
... ...

  HDFS_SCAN_NODE (id=0):(Total: 1s032ms, non-child: 1s032ms, ...
  Skew details: RowsRead ([913887, 917913, 1048604],
  stddev=62578.85)

Testing:
1. Added a new test test_skew_reporting_in_runtime_profile to
   test_observability.py to verify that the skews are reported.
2. Ran Core tests successfully.

Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
---
M be/src/runtime/coordinator-backend-state.cc
M be/src/util/runtime-profile-counters.h
M be/src/util/runtime-profile.cc
M be/src/util/runtime-profile.h
M be/src/util/stat-util.h
M tests/query_test/test_observability.py
6 files changed, 192 insertions(+), 9 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/16474/14

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
Gerrit-Change-Number: 16474
Gerrit-PatchSet: 14
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews

2020-09-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16474 )

Change subject: IMPALA-10178 Run-time profile shall report skews
..


Patch Set 10:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7228/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
Gerrit-Change-Number: 16474
Gerrit-PatchSet: 10
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 21 Sep 2020 21:49:00 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews

2020-09-21 Thread Qifan Chen (Code Review)
Qifan Chen has uploaded a new patch set (#10). ( 
http://gerrit.cloudera.org:8080/16474 )

Change subject: IMPALA-10178 Run-time profile shall report skews
..

IMPALA-10178 Run-time profile shall report skews

This fix addresses the current limitation in runtime profile that
skews existing in certain operators such as the rows read counter
(RowsRead) in the scan operators are not reported. A skew condition
exists when the number of rows processed at each operator instance
is not about the same and can be detected through standard deviation
(stddev). A high stddev (say > 5) usually implies the existence of
skew.

With the fix and in the average fragment profile, such skew is
detected for the following counters

  1. RowsRead in HDFS_SCAN_NODE profile
  2. ProbeRows and BuildRows in HASH_JOIN_NODE profile
  3. RowsReturned in GroupingAggregator profile

and reported as follows:

  1. In the skew summary section which lists the names of the
 operators with skews;
  2. In each corresponding operator, the name of the counters
 and the corresponding stddev values.

Examples of skews reported for a hash join and an hdfs scan.

Averaged Fragment F00:(Total: 1s075ms, non-child: 26.919ms, ...
  ... ...
  num instances: 3
  skew(s) found at: HASH_JOIN_NODE (id=4), HDFS_SCAN_NODE (id=0)

HASH_JOIN_NODE (id=4):(Total: 1s204ms, non-child: 2.166ms, ...
Skew details: ProbeRows ([16904, 17750, 19197],
 stddev=946.77)
  ... ...

HDFS_SCAN_NODE (id=0):(Total: 1s032ms, non-child: 1s032ms, ...
Skew details: RowsRead ([913887, 917913, 1048604],
stddev=62578.85)

TODO:
1. Add unit tests;
2. Run core tests.

Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
---
M be/src/runtime/coordinator-backend-state.cc
M be/src/util/CMakeLists.txt
A be/src/util/runtime-profile-counters.cc
M be/src/util/runtime-profile-counters.h
M be/src/util/runtime-profile.cc
M be/src/util/runtime-profile.h
M be/src/util/stat-util.h
7 files changed, 149 insertions(+), 8 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/16474/10

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
Gerrit-Change-Number: 16474
Gerrit-PatchSet: 10
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews

2020-09-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16474 )

Change subject: IMPALA-10178 Run-time profile shall report skews
..


Patch Set 9:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7225/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
Gerrit-Change-Number: 16474
Gerrit-PatchSet: 9
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 21 Sep 2020 19:33:20 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10178 Run-time profile shall report skews

2020-09-21 Thread Qifan Chen (Code Review)
Qifan Chen has uploaded a new patch set (#9). ( 
http://gerrit.cloudera.org:8080/16474 )

Change subject: IMPALA-10178 Run-time profile shall report skews
..

IMPALA-10178 Run-time profile shall report skews

This fix addresses the current limitation in runtime profile that
skews existing in certain operators such as the rows read counter
(RowsRead) in the scan operators are not reported. A skew condition
exists when the number of rows processed at each operator instance
is not about the same and can be detected through standard deviation
(stddev). A high stddev (say > 5) usually implies the existence of
skew.

With the fix and in the average fragment profile, such skew is
reported as follows:
  1. In the skew summary section which lists the names of the
 operators with skews;
  2. In each corresponding operator, the name of the counters
 and the corresponding stddev values.

Examples of skews reported for a hash join and an hdfs scan.

Averaged Fragment F00:(Total: 1s075ms, non-child: 26.919ms, ...
  ... ...
  num instances: 3
  skew(s) found at: HASH_JOIN_NODE (id=4), HDFS_SCAN_NODE (id=0)

HASH_JOIN_NODE (id=4):(Total: 1s204ms, non-child: 2.166ms, ...
Skew details: ProbeRows (values=[16904, 17750, 19197],
 stddev=946.770828)
  ... ...

HDFS_SCAN_NODE (id=0):(Total: 1s032ms, non-child: 1s032ms, ...
Skew details: RowsRead (values=[913887, 917913, 1048604],
stddev=62578.853590)

TODO:
1. Add unit tests;
2. Run core tests.

Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
---
M be/src/runtime/coordinator-backend-state.cc
M be/src/util/CMakeLists.txt
A be/src/util/runtime-profile-counters.cc
M be/src/util/runtime-profile-counters.h
M be/src/util/runtime-profile.cc
M be/src/util/runtime-profile.h
M be/src/util/stat-util.h
7 files changed, 141 insertions(+), 8 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/16474/9

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
Gerrit-Change-Number: 16474
Gerrit-PatchSet: 9
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong