Sahil Takiar has posted comments on this change. (http://gerrit.cloudera.org:8080/16474)

Change subject: IMPALA-10178 Run-time profile shall report skews
......................................................................


Patch Set 31:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/16474/21/be/src/util/runtime-profile.cc
File be/src/util/runtime-profile.cc:

http://gerrit.cloudera.org:8080/#/c/16474/21/be/src/util/runtime-profile.cc@1928
PS21, Line 1928:     double stddev = 0.0;
> To report only severe skews in Impala, maybe we can use the CV (coefficient
> of variation) instead of the stddev as a threshold, say:
> cv > 5% && mean over 1 million.

Makes sense to me.
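For reference, the proposed rule might look roughly like the C++ sketch below. It assumes the check sees one counter's per-instance values as a `std::vector<double>` and uses the population standard deviation. `SkewStats`, `ComputeSkewStats`, and `IsSevereSkew` are hypothetical names, not code from this patch. Because the CV is stddev / mean, it is unit-free, and the mean floor keeps small counters from being reported.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical summary of one counter's values across fragment instances.
struct SkewStats {
  double mean = 0.0;
  double stddev = 0.0;  // population standard deviation
  double cv = 0.0;      // coefficient of variation: stddev / mean
};

// Computes mean, population stddev, and CV for a non-empty set of values.
SkewStats ComputeSkewStats(const std::vector<double>& values) {
  assert(!values.empty());
  const double n = static_cast<double>(values.size());
  SkewStats s;
  for (double v : values) s.mean += v;
  s.mean /= n;
  double sum_sq = 0.0;
  for (double v : values) sum_sq += (v - s.mean) * (v - s.mean);
  s.stddev = std::sqrt(sum_sq / n);
  // The CV is undefined for a zero mean; treat that case as "no skew".
  s.cv = s.mean != 0.0 ? s.stddev / s.mean : 0.0;
  return s;
}

// Thresholds taken from the proposal (CV above 5%, mean above one million in
// whatever unit the counter reports); the parameter names are hypothetical.
bool IsSevereSkew(const SkewStats& s, double cv_threshold = 0.05,
                  double min_mean = 1e6) {
  return s.cv > cv_threshold && s.mean > min_mean;
}
```

With these defaults, {1000000, 1000010, 999990} is not reported (CV of roughly 0.0008%), while {1000000, 1000000, 4000000} is (CV of roughly 71%).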


http://gerrit.cloudera.org:8080/#/c/16474/31/be/src/util/runtime-profile.cc
File be/src/util/runtime-profile.cc:

http://gerrit.cloudera.org:8080/#/c/16474/31/be/src/util/runtime-profile.cc@1949
PS31, Line 1949:   if (stddev > 5) {
I'm still not sure how useful this is, even if it is moved to an "aggressive"
option; I can see it still leading to a lot of false positives.
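To make the false-positive concern concrete: an absolute threshold like `stddev > 5` depends on the counter's unit and magnitude. In the hypothetical C++ sketch below (assuming the population standard deviation; `PopulationStddev` and `AbsoluteStddevCheck` are illustrative names, not code from the patch), three instances reporting 999,990, 1,000,000, and 1,000,010 have a stddev of about 8.16 and trip the check, even though none differs from the mean by more than 0.001%.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Population standard deviation of a non-empty set of values.
double PopulationStddev(const std::vector<double>& values) {
  assert(!values.empty());
  const double n = static_cast<double>(values.size());
  double mean = 0.0;
  for (double v : values) mean += v;
  mean /= n;
  double sum_sq = 0.0;
  for (double v : values) sum_sq += (v - mean) * (v - mean);
  return std::sqrt(sum_sq / n);
}

// Same shape as the check under review: an absolute stddev threshold,
// expressed in whatever unit the counter happens to use.
bool AbsoluteStddevCheck(const std::vector<double>& values,
                         double threshold = 5.0) {
  return PopulationStddev(values) > threshold;
}
```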

Is there any chance that, when writing this, you actually meant to calculate the
z-score (the number of standard deviations by which a value lies above or below
the mean)? I've seen references where outlier detection algorithms check whether
the absolute z-score is greater than 3 (or, in this case, 5).
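A minimal C++ sketch of a z-score check, assuming one counter's per-instance values arrive as a `std::vector<double>` and using the population standard deviation. `FindZScoreOutliers` is a hypothetical name, not part of runtime-profile.cc. It flags each instance whose |z| = |v - mean| / stddev exceeds the threshold.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Returns the indices of values whose absolute z-score exceeds z_threshold.
// When all values are equal (stddev == 0), no value deviates from the mean,
// so the result is empty.
std::vector<std::size_t> FindZScoreOutliers(const std::vector<double>& values,
                                            double z_threshold) {
  assert(!values.empty());
  const double n = static_cast<double>(values.size());
  double mean = 0.0;
  for (double v : values) mean += v;
  mean /= n;
  double sum_sq = 0.0;
  for (double v : values) sum_sq += (v - mean) * (v - mean);
  const double stddev = std::sqrt(sum_sq / n);  // population stddev
  std::vector<std::size_t> outliers;
  if (stddev == 0.0) return outliers;
  for (std::size_t i = 0; i < values.size(); ++i) {
    const double z = (values[i] - mean) / stddev;
    if (std::fabs(z) > z_threshold) outliers.push_back(i);
  }
  return outliers;
}
```

One caveat worth keeping in mind: with the population stddev over n values, no single value can have |z| greater than sqrt(n - 1) (Samuelson's inequality). A threshold of 5 therefore cannot fire with fewer than 27 instances, and a threshold of 3 needs at least 11. For example, 26 zeros plus a single 1 gives that 1 a z-score of sqrt(26), about 5.1.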



--
To view, visit http://gerrit.cloudera.org:8080/16474

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
Gerrit-Change-Number: 16474
Gerrit-PatchSet: 31
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <stak...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Comment-Date: Tue, 29 Sep 2020 20:02:32 +0000
Gerrit-HasComments: Yes
