[jira] [Updated] (HADOOP-18426) Improve the accuracy of MutableStat mean
[ https://issues.apache.org/jira/browse/HADOOP-18426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HADOOP-18426:
    Labels: pull-request-available  (was: )

> Improve the accuracy of MutableStat mean
>
>                 Key: HADOOP-18426
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18426
>             Project: Hadoop Common
>          Issue Type: Bug
>            Reporter: Shuyan Zhang
>            Priority: Major
>              Labels: pull-request-available
>
> The current MutableStat mean calculation is prone to losing accuracy
> because the sum of samples grows too large. We can process each sample
> on its own to improve mean accuracy.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-18426) Improve the accuracy of MutableStat mean
[ https://issues.apache.org/jira/browse/HADOOP-18426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shuyan Zhang updated HADOOP-18426:
    Description:
The current MutableStat mean calculation is prone to losing accuracy because the sum of samples grows too large.
Storing large integers in the double type loses precision. For example, 9223372036854775707 and 9223372036854775708 both round to the same double, approximately 9223372036854776000. Therefore, instead of dividing a cumulative sum by the sample count to get the average, we should update the average each time a sample arrives. In short, we can process each sample on its own to improve mean accuracy.

  was:The current MutableStat mean calculation is prone to losing accuracy because the sum of samples grows too large. We can process each sample on its own to improve mean accuracy.

> Improve the accuracy of MutableStat mean
>
>                 Key: HADOOP-18426
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18426
>             Project: Hadoop Common
>          Issue Type: Bug
>            Reporter: Shuyan Zhang
>            Assignee: Shuyan Zhang
>            Priority: Major
>              Labels: pull-request-available
>
> The current MutableStat mean calculation is prone to losing accuracy
> because the sum of samples grows too large.
> Storing large integers in the double type loses precision. For
> example, 9223372036854775707 and 9223372036854775708 both round to the
> same double, approximately 9223372036854776000. Therefore, instead of
> dividing a cumulative sum by the sample count to get the average, we
> should update the average each time a sample arrives. In short, we can
> process each sample on its own to improve mean accuracy.
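The idea in the description above can be illustrated with a small, self-contained Java sketch. It is not the Hadoop patch: the class RunningMean and its methods add, mean and naiveMean are hypothetical names chosen for this example, and the update rule shown (mean += (x - mean) / n) is one common way to "update the average every time we sample", assumed here rather than taken from the MutableStat source.

```java
// Illustrative sketch only; RunningMean is a hypothetical class, not Hadoop code.
final class RunningMean {
    private long count = 0;
    private double mean = 0.0;

    // Fold one sample into the mean without keeping a cumulative sum.
    void add(double x) {
        count++;
        mean += (x - mean) / count;
    }

    double mean() {
        return mean;
    }

    // Baseline for comparison: accumulate a total, then divide by the count.
    static double naiveMean(double[] samples) {
        double sum = 0.0;
        for (double x : samples) {
            sum += x;
        }
        return sum / samples.length;
    }

    public static void main(String[] args) {
        // The two integers from the issue description convert to the same double.
        System.out.println((double) 9223372036854775707L == (double) 9223372036854775708L);

        // Once the running sum reaches 2^53, adding 1.0 no longer changes it,
        // so the small samples are lost by the sum-based mean.
        double[] samples = {0x1p53, 1.0, 1.0, 1.0};
        RunningMean running = new RunningMean();
        for (double x : samples) {
            running.add(x);
        }
        System.out.println("sum-based mean:   " + naiveMean(samples));
        System.out.println("incremental mean: " + running.mean());
    }
}
```

In this input the exact mean is 2^51 + 0.75. The sum-based result drops the three small samples entirely, while the incremental result stays closer because it never has to hold the full total. The incremental update still rounds at each step, so it reduces the error rather than eliminating it.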
[jira] [Updated] (HADOOP-18426) Improve the accuracy of MutableStat mean
[ https://issues.apache.org/jira/browse/HADOOP-18426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shilun Fan updated HADOOP-18426:
          Component/s: common
     Target Version/s: 3.4.0
    Affects Version/s: 3.4.0

> Improve the accuracy of MutableStat mean
>
>                 Key: HADOOP-18426
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18426
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: common
>    Affects Versions: 3.4.0
>            Reporter: Shuyan Zhang
>            Assignee: Shuyan Zhang
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.4.0
>
> The current MutableStat mean calculation is prone to losing accuracy
> because the sum of samples grows too large.
> Storing large integers in the double type loses precision. For
> example, 9223372036854775707 and 9223372036854775708 both round to the
> same double, approximately 9223372036854776000. Therefore, instead of
> dividing a cumulative sum by the sample count to get the average, we
> should update the average each time a sample arrives. In short, we can
> process each sample on its own to improve mean accuracy.