[ https://issues.apache.org/jira/browse/HADOOP-18426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17601142#comment-17601142 ]
ASF GitHub Bot commented on HADOOP-18426:
-----------------------------------------

Hexiaoqiao commented on PR #4844:
URL: https://github.com/apache/hadoop/pull/4844#issuecomment-1238935726

   Committed to trunk. Thanks @xkrogen and @zhangshuyan0 for your work!

> Improve the accuracy of MutableStat mean
> ----------------------------------------
>
>                 Key: HADOOP-18426
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18426
>             Project: Hadoop Common
>          Issue Type: Bug
>            Reporter: Shuyan Zhang
>            Assignee: Shuyan Zhang
>            Priority: Major
>              Labels: pull-request-available
>
> The current MutableStat mean calculation is prone to losing accuracy
> because the cumulative sum of samples can grow too large.
> Storing large integers in the double type loses precision. For example,
> 9223372036854775707 and 9223372036854775708 are both stored as the same
> double, approximately 9223372036854776000. Therefore, we should avoid
> computing the average from a cumulative total sum, and instead update the
> average every time we sample. In short, processing each sample on its own
> improves the accuracy of the mean.
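
To make the description above concrete, here is a minimal Java sketch of the two approaches it contrasts. It is not the code from PR #4844: the class name MeanSketch and all of its method names are hypothetical, chosen only for illustration. The per-sample variant uses the standard running-mean update mean_n = mean_(n-1) + (x_n - mean_(n-1)) / n, which keeps the stored value at the magnitude of the samples instead of letting it grow with the sample count.

```java
/** Illustrative sketch only; names are hypothetical, not taken from Hadoop. */
final class MeanSketch {

    /** Returns true when two distinct longs convert to the same double value. */
    static boolean collapseToSameDouble(long a, long b) {
        return a != b && (double) a == (double) b;
    }

    /** Cumulative-sum approach: keep a double total and divide at the end. */
    static double meanFromTotal(long[] samples) {
        double total = 0.0;
        for (long x : samples) {
            // Once total exceeds 2^53, a double can no longer represent
            // every integer, so small samples may be rounded away.
            total += x;
        }
        return samples.length == 0 ? 0.0 : total / samples.length;
    }

    /** Per-sample approach: update the mean itself on every sample. */
    static double meanPerSample(long[] samples) {
        double mean = 0.0;
        long n = 0;
        for (long x : samples) {
            n++;
            // mean_n = mean_(n-1) + (x_n - mean_(n-1)) / n
            mean += (x - mean) / n;
        }
        return mean;
    }

    public static void main(String[] args) {
        // The two values quoted in the issue description.
        System.out.println(
            collapseToSameDouble(9223372036854775707L, 9223372036854775708L));

        long[] samples = {1, 2, 3};
        System.out.println(meanFromTotal(samples));
        System.out.println(meanPerSample(samples));
    }
}
```

For small inputs both methods agree. The difference only shows up once the cumulative total becomes large enough that adding another sample to it is rounded.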