Sahil Takiar has posted comments on this change. ( http://gerrit.cloudera.org:8080/16474 )

Change subject: IMPALA-10178 Run-time profile shall report skews
......................................................................


Patch Set 31:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/16474/21/be/src/util/runtime-profile.cc
File be/src/util/runtime-profile.cc:

http://gerrit.cloudera.org:8080/#/c/16474/21/be/src/util/runtime-profile.cc@1928
PS21, Line 1928:   double stddev = 0.0;
> To report severe skews only for impala, maybe we can use CV (instead of
> stddev) as a threshold. Say cv > 5% && mean over 1 million.
Makes sense to me.


http://gerrit.cloudera.org:8080/#/c/16474/31/be/src/util/runtime-profile.cc
File be/src/util/runtime-profile.cc:

http://gerrit.cloudera.org:8080/#/c/16474/31/be/src/util/runtime-profile.cc@1949
PS31, Line 1949:     if (stddev > 5) {
I'm still not sure how useful this is, even if it is moved to an "aggressive" option. I can see it still leading to a lot of false positives.

Any chance that when writing this you actually meant to calculate the z-score (the number of standard deviations by which a value is above or below the mean)? I've seen references where outlier detection algorithms check whether the z-score is greater than 3 (or in this case 5).


--
To view, visit http://gerrit.cloudera.org:8080/16474
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91041f2856eef8293ea78f1721f97469062589a1
Gerrit-Change-Number: 16474
Gerrit-PatchSet: 31
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <stak...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Comment-Date: Tue, 29 Sep 2020 20:02:32 +0000
Gerrit-HasComments: Yes
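
For reference, the two checks discussed in the comments above (a CV threshold gated on the mean, and a per-value z-score) could be sketched roughly as follows. This is only an illustration and not the code in the patch: `ComputeStats`, `IsSkewedByCv`, and `MaxAbsZScore` are hypothetical names, the 5% and 1 million defaults are taken from the quoted suggestion, and the use of population (rather than sample) standard deviation is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical helper type; not part of runtime-profile.cc.
struct Stats {
  double mean;
  double stddev;  // population standard deviation (divides by N)
};

// Computes mean and population stddev. Returns zeros for empty input.
Stats ComputeStats(const std::vector<int64_t>& values) {
  if (values.empty()) return {0.0, 0.0};
  double sum = 0.0;
  for (int64_t v : values) sum += static_cast<double>(v);
  const double mean = sum / static_cast<double>(values.size());
  double sq_sum = 0.0;
  for (int64_t v : values) {
    const double d = static_cast<double>(v) - mean;
    sq_sum += d * d;
  }
  return {mean, std::sqrt(sq_sum / static_cast<double>(values.size()))};
}

// CV-based check: coefficient of variation (stddev / mean) above a threshold,
// reported only when the mean is large enough to matter.
// Defaults (5%, 1 million) follow the suggestion quoted in the review.
bool IsSkewedByCv(const std::vector<int64_t>& values,
                  double cv_threshold = 0.05, double min_mean = 1e6) {
  assert(cv_threshold > 0.0);
  const Stats s = ComputeStats(values);
  if (s.mean <= min_mean) return false;  // also avoids dividing by zero
  return s.stddev / s.mean > cv_threshold;
}

// z-score based check: largest |value - mean| / stddev over all values.
// A caller would compare the result against a cutoff such as 3.
double MaxAbsZScore(const std::vector<int64_t>& values) {
  const Stats s = ComputeStats(values);
  if (s.stddev == 0.0) return 0.0;  // all values equal, or empty input
  double max_z = 0.0;
  for (int64_t v : values) {
    max_z = std::max(max_z, std::fabs(static_cast<double>(v) - s.mean) / s.stddev);
  }
  return max_z;
}
```

One point worth keeping in mind with the z-score variant: when the population standard deviation is used, no value among n can be more than sqrt(n - 1) standard deviations from the mean, so a cutoff of 3 can only fire with more than 10 values, and a cutoff of 5 only with more than 26.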