[jira] [Updated] (HADOOP-18426) Improve the accuracy of MutableStat mean

2022-08-26 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HADOOP-18426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HADOOP-18426:

Labels: pull-request-available  (was: )

> Improve the accuracy of MutableStat mean
> 
>
> Key: HADOOP-18426
> URL: https://issues.apache.org/jira/browse/HADOOP-18426
> Project: Hadoop Common
> Issue Type: Bug
> Reporter: Shuyan Zhang
> Priority: Major
> Labels: pull-request-available
>
> The current MutableStat mean calculation is prone to losing accuracy
> because the sum of samples can grow too large. We can process each sample
> on its own to improve the accuracy of the mean.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org

[jira] [Updated] (HADOOP-18426) Improve the accuracy of MutableStat mean

2022-08-31 Thread Shuyan Zhang (Jira)



 [ 
https://issues.apache.org/jira/browse/HADOOP-18426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shuyan Zhang updated HADOOP-18426:
--
Description: 
The current MutableStat mean calculation is prone to losing accuracy because
the sum of samples can grow too large.
Storing large integers in the double type results in a loss of accuracy. For
example, 9223372036854775707 and 9223372036854775708 are both stored as the
same double value, approximately 9223372036854776000. Therefore, we should
avoid computing the average from a cumulative total sum and instead update the
average every time we sample. In short, we can process each sample on its own
to improve the accuracy of the mean.
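
The following Java sketch illustrates the two points made in the description.
It is not the patch attached to this issue: the class RunningMeanSketch and
its nested RunningMean helper are hypothetical names chosen for illustration,
and the per-sample update shown is one common way to maintain a mean without
storing a cumulative sum, which may differ from what MutableStat does.

```java
public class RunningMeanSketch {

  /** Hypothetical helper: keeps a mean that is updated on every sample. */
  static final class RunningMean {
    private long count;
    private double mean;

    void add(double sample) {
      count++;
      // Move the mean toward the sample by 1/count of the gap,
      // so no cumulative sum is ever stored.
      mean += (sample - mean) / count;
    }

    double mean() {
      return mean;
    }
  }

  public static void main(String[] args) {
    // The two longs from the description collapse to the same double.
    long a = 9223372036854775707L;
    long b = 9223372036854775708L;
    System.out.println((double) a == (double) b); // true

    // Once a double sum reaches 2^53, adding 1.0 no longer changes it.
    double big = 9007199254740992.0; // 2^53
    System.out.println(big + 1.0 == big); // true

    // Per-sample update on a small input.
    RunningMean m = new RunningMean();
    for (double s : new double[] {1, 2, 3, 4}) {
      m.add(s);
    }
    System.out.println(m.mean()); // 2.5
  }
}
```

The per-sample update keeps the stored value at the magnitude of the samples
rather than of their total, which is why it tends to lose less precision over
long runs. It is still floating-point arithmetic, so it reduces rounding error
rather than eliminating it.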

  was: The current MutableStat mean calculation is prone to losing accuracy
because the sum of samples can grow too large. We can process each sample on
its own to improve the accuracy of the mean.


> Improve the accuracy of MutableStat mean
> 
>
> Key: HADOOP-18426
> URL: https://issues.apache.org/jira/browse/HADOOP-18426
> Project: Hadoop Common
> Issue Type: Bug
> Reporter: Shuyan Zhang
> Assignee: Shuyan Zhang
> Priority: Major
> Labels: pull-request-available
>
> The current MutableStat mean calculation is prone to losing accuracy
> because the sum of samples can grow too large.
> Storing large integers in the double type results in a loss of accuracy. For
> example, 9223372036854775707 and 9223372036854775708 are both stored as the
> same double value, approximately 9223372036854776000. Therefore, we should
> avoid computing the average from a cumulative total sum and instead update
> the average every time we sample. In short, we can process each sample on
> its own to improve the accuracy of the mean.




[jira] [Updated] (HADOOP-18426) Improve the accuracy of MutableStat mean

2024-01-27 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/HADOOP-18426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HADOOP-18426:

  Component/s: common
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Improve the accuracy of MutableStat mean
> 
>
> Key: HADOOP-18426
> URL: https://issues.apache.org/jira/browse/HADOOP-18426
> Project: Hadoop Common
> Issue Type: Bug
> Components: common
> Affects Versions: 3.4.0
> Reporter: Shuyan Zhang
> Assignee: Shuyan Zhang
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.4.0
>
>
> The current MutableStat mean calculation is prone to losing accuracy
> because the sum of samples can grow too large.
> Storing large integers in the double type results in a loss of accuracy. For
> example, 9223372036854775707 and 9223372036854775708 are both stored as the
> same double value, approximately 9223372036854776000. Therefore, we should
> avoid computing the average from a cumulative total sum and instead update
> the average every time we sample. In short, we can process each sample on
> its own to improve the accuracy of the mean.



