[jira] [Updated] (HADOOP-18426) Improve the accuracy of MutableStat mean

Shuyan Zhang (Jira) Wed, 31 Aug 2022 22:15:05 -0700


     [ https://issues.apache.org/jira/browse/HADOOP-18426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Shuyan Zhang updated HADOOP-18426:
----------------------------------
    Description: 
The current MutableStat mean calculation is prone to losing accuracy because 
the sum of samples can grow very large. 
Storing large integers in a double loses precision. For example, 
9223372036854775707 and 9223372036854775708 both round to the same double, 
9223372036854775808 (2^63). Therefore, we should avoid computing the average 
from a cumulative total sum and instead update the average each time a sample 
is recorded. Processing each sample on its own in this way improves the 
accuracy of the mean.

  was:The current MutableStat mean calculation is prone to losing accuracy 
because the sum of samples can grow very large. We can process each sample on 
its own to improve mean accuracy.
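The collision in the description's example can be reproduced in a few lines of Java. This is a minimal sketch; the class `DoublePrecisionDemo` and its `sameDouble` helper are illustrative names, not Hadoop code. A double carries a 53-bit significand, so between 2^62 and 2^63 adjacent doubles are 1024 apart, and widening either long to double rounds it to 2^63.

```java
import java.math.BigDecimal;

public class DoublePrecisionDemo {
    // Widening long -> double rounds to the nearest representable double,
    // so distinct large longs can end up as the same double value.
    static boolean sameDouble(long x, long y) {
        return (double) x == (double) y;
    }

    public static void main(String[] args) {
        long a = 9223372036854775707L; // 2^63 - 101
        long b = 9223372036854775708L; // 2^63 - 100
        System.out.println(sameDouble(a, b));           // true
        System.out.println(new BigDecimal((double) a)); // 9223372036854775808, i.e. 2^63
        System.out.println(new BigDecimal((double) b)); // 9223372036854775808, i.e. 2^63
    }
}
```

`new BigDecimal(double)` is used here because it prints the exact binary value of the double, whereas `Double.toString` prints a shortened decimal approximation.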


> Improve the accuracy of MutableStat mean
> ----------------------------------------
>
>                 Key: HADOOP-18426
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18426
>             Project: Hadoop Common
>          Issue Type: Bug
>            Reporter: Shuyan Zhang
>            Assignee: Shuyan Zhang
>            Priority: Major
>              Labels: pull-request-available
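The per-sample approach proposed in the description can be sketched as a running mean updated with mean += (x - mean) / n, which keeps intermediate values on the scale of the samples rather than on the scale of their total. This is a minimal sketch under that assumption: `RunningMeanSketch`, `IncrementalMean`, `sumBasedMean` and `incrementalMean` are hypothetical names, and the code is not the MutableStat implementation from the patch. The comparison feeds ten identical samples to both methods; the running total accumulates rounding error, while the incremental update stays exact because every later sample equals the current mean.

```java
import java.util.Arrays;

public class RunningMeanSketch {
    // Hypothetical accumulator that keeps a running mean instead of a running sum.
    static final class IncrementalMean {
        private long count;
        private double mean;

        void add(double x) {
            count++;
            // Move the mean toward x by 1/count of the gap; the update works
            // on the scale of the samples, never on the scale of their sum.
            mean += (x - mean) / count;
        }

        double mean() {
            return mean;
        }
    }

    // Sum-then-divide, shown for comparison.
    static double sumBasedMean(double[] samples) {
        double total = 0.0;
        for (double x : samples) {
            total += x;
        }
        return samples.length == 0 ? 0.0 : total / samples.length;
    }

    static double incrementalMean(double[] samples) {
        IncrementalMean m = new IncrementalMean();
        for (double x : samples) {
            m.add(x);
        }
        return m.mean();
    }

    public static void main(String[] args) {
        double[] tenths = new double[10];
        Arrays.fill(tenths, 0.1);
        System.out.println(sumBasedMean(tenths));    // slightly below 0.1
        System.out.println(incrementalMean(tenths)); // 0.1

        // Same samples scaled by 2^60 (about 1.15e17 each): scaling by a power
        // of two does not change how the additions round, so the gap persists.
        double[] large = new double[10];
        Arrays.fill(large, Math.scalb(0.1, 60));
        System.out.println(sumBasedMean(large) == large[0]);    // false
        System.out.println(incrementalMean(large) == large[0]); // true
    }
}
```

The incremental form is not exact for every input, but because it never materializes the total, its error does not grow with the magnitude of the sum the way the sum-based mean's error does.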



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
