[jira] [Commented] (MATH-1253) Skewness could get more precision from slightly reordered code.
[ https://issues.apache.org/jira/browse/MATH-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15025190#comment-15025190 ] Otmar Ertl commented on MATH-1253: -- Provided that there is no overflow, I do not think that my proposal is more accurate. The relative error of a sum of floating point numbers is the same after scaling the numbers by the same factor. > Skewness could get more precision from slightly reordered code. > --- > > Key: MATH-1253 > URL: https://issues.apache.org/jira/browse/MATH-1253 > Project: Commons Math > Issue Type: Bug >Affects Versions: 3.5 >Reporter: Bill Murphy >Priority: Minor > > In Skewness.java, approx line 180, there is code like: > {noformat} > double accum3 = 0.0; > for (int i = begin; i < begin + length; i++) { > final double d = values[i] - m; > accum3 += d * d * d; > } > accum3 /= variance * FastMath.sqrt(variance); > {noformat} > If the division was moved into the for loop, accum3 would be less likely to > overflow to Infinity (or -Infinity). This might allow computation to return a > result in a case such as: > {noformat} > double[] numArray = { 1.234E11, 1.234E51, 1.234E101, 1.234E151 }; > Skewnessskew = new Skewness(); > doublesk = skew.evaluate( numArray ); > {noformat} > Currently, this returns NaN, but I'd prefer it returned approx > 1.154700538379252. > The change I'm proposing would have the code instead read like: > {noformat} > double accum3 = 0.0; > double divisor = variance * FastMath.sqrt(variance); > for (int i = begin; i < begin + length; i++) { > final double d = values[i] - m; > accum3 += d * d * d / divisor; > } > {noformat} > Thanks! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MATH-1253) Skewness could get more precision from slightly reordered code.
[ https://issues.apache.org/jira/browse/MATH-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15024241#comment-15024241 ] Gilles commented on MATH-1253: -- Independently of extending the valid range, wouldn't Otmar's proposal also give a more accurate result (than the current code), especially when the number of samples is large? > Skewness could get more precision from slightly reordered code. > --- > > Key: MATH-1253 > URL: https://issues.apache.org/jira/browse/MATH-1253 > Project: Commons Math > Issue Type: Bug >Affects Versions: 3.5 >Reporter: Bill Murphy >Priority: Minor > > In Skewness.java, approx line 180, there is code like: > {noformat} > double accum3 = 0.0; > for (int i = begin; i < begin + length; i++) { > final double d = values[i] - m; > accum3 += d * d * d; > } > accum3 /= variance * FastMath.sqrt(variance); > {noformat} > If the division was moved into the for loop, accum3 would be less likely to > overflow to Infinity (or -Infinity). This might allow computation to return a > result in a case such as: > {noformat} > double[] numArray = { 1.234E11, 1.234E51, 1.234E101, 1.234E151 }; > Skewnessskew = new Skewness(); > doublesk = skew.evaluate( numArray ); > {noformat} > Currently, this returns NaN, but I'd prefer it returned approx > 1.154700538379252. > The change I'm proposing would have the code instead read like: > {noformat} > double accum3 = 0.0; > double divisor = variance * FastMath.sqrt(variance); > for (int i = begin; i < begin + length; i++) { > final double d = values[i] - m; > accum3 += d * d * d / divisor; > } > {noformat} > Thanks! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MATH-1253) Skewness could get more precision from slightly reordered code.
[ https://issues.apache.org/jira/browse/MATH-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15023289#comment-15023289 ] Thomas Neidhart commented on MATH-1253: --- just ftr: I did some tests with octave and scipy, and both return NaN or overflow for the input. > Skewness could get more precision from slightly reordered code. > --- > > Key: MATH-1253 > URL: https://issues.apache.org/jira/browse/MATH-1253 > Project: Commons Math > Issue Type: Bug >Affects Versions: 3.5 >Reporter: Bill Murphy >Priority: Minor > > In Skewness.java, approx line 180, there is code like: > {noformat} > double accum3 = 0.0; > for (int i = begin; i < begin + length; i++) { > final double d = values[i] - m; > accum3 += d * d * d; > } > accum3 /= variance * FastMath.sqrt(variance); > {noformat} > If the division was moved into the for loop, accum3 would be less likely to > overflow to Infinity (or -Infinity). This might allow computation to return a > result in a case such as: > {noformat} > double[] numArray = { 1.234E11, 1.234E51, 1.234E101, 1.234E151 }; > Skewnessskew = new Skewness(); > doublesk = skew.evaluate( numArray ); > {noformat} > Currently, this returns NaN, but I'd prefer it returned approx > 1.154700538379252. > The change I'm proposing would have the code instead read like: > {noformat} > double accum3 = 0.0; > double divisor = variance * FastMath.sqrt(variance); > for (int i = begin; i < begin + length; i++) { > final double d = values[i] - m; > accum3 += d * d * d / divisor; > } > {noformat} > Thanks! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MATH-1253) Skewness could get more precision from slightly reordered code.
[ https://issues.apache.org/jira/browse/MATH-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14702114#comment-14702114 ] Bill Murphy commented on MATH-1253: --- I agree with Otmar's comments entirely. I do not yet know a better suggestion. Even a slight extension of the range of correct results seems to me to be an improvement. I would still prefer going forwards with these mods as suggested, but a true fix as outlined by Otmar would be hugely better. Does someone else know enough numerical analysis stuff to provide a mostly-correct approximation over the range of doubles? It is sadly beyond me as of today. > Skewness could get more precision from slightly reordered code. > --- > > Key: MATH-1253 > URL: https://issues.apache.org/jira/browse/MATH-1253 > Project: Commons Math > Issue Type: Bug >Affects Versions: 3.5 >Reporter: Bill Murphy >Priority: Minor > > In Skewness.java, approx line 180, there is code like: > {noformat} > double accum3 = 0.0; > for (int i = begin; i < begin + length; i++) { > final double d = values[i] - m; > accum3 += d * d * d; > } > accum3 /= variance * FastMath.sqrt(variance); > {noformat} > If the division was moved into the for loop, accum3 would be less likely to > overflow to Infinity (or -Infinity). This might allow computation to return a > result in a case such as: > {noformat} > double[] numArray = { 1.234E11, 1.234E51, 1.234E101, 1.234E151 }; > Skewnessskew = new Skewness(); > doublesk = skew.evaluate( numArray ); > {noformat} > Currently, this returns NaN, but I'd prefer it returned approx > 1.154700538379252. > The change I'm proposing would have the code instead read like: > {noformat} > double accum3 = 0.0; > double divisor = variance * FastMath.sqrt(variance); > for (int i = begin; i < begin + length; i++) { > final double d = values[i] - m; > accum3 += d * d * d / divisor; > } > {noformat} > Thanks! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MATH-1253) Skewness could get more precision from slightly reordered code.
[ https://issues.apache.org/jira/browse/MATH-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14662994#comment-14662994 ] Otmar Ertl commented on MATH-1253: -- The given example is problematic anyway, because the numbers (1.234E11, 1.234E51, 1.234E101, 1.234E151) are many orders of magnitude apart, for which cancellation effects are a major issue when using floating point types. I can reproduce the NaN result for values that have either large or small exponents, for example (1.234E148, 1.234E149, 1.234E150, 1.234E151) and (1.234E-148, 1.234E-149, 1.234E-150, 1.234E-151). The proposed code does not avoid this over-/underflows either, because the division is performed after the two multiplications which cause the over-/underflow. However, following code would work: {code} final double variance = (accum - (accum2 * accum2 / length)) / (length - 1); final double stdDevInverse = 1d / FastMath.sqrt(variance); double accum3 = 0.0; for (int i = begin; i < begin + length; i++) { final double d = (values[i] - m) * stdDevInverse; accum3 += d * d * d; } {code} (Here the inverse is precomputed, in order to avoid additional divisions which are likely to be more expensive than multiplications.) Nevertheless, I am not sure, If we really should fix that. The variance calculation is also prone to over-/underflows for numbers greater than sqrt(Double.MAX_VALUE) and smaller than sqrt(Double.MIN_VALUE), respectively. So the fix would "slightly" extend the valid input range from (cbrt(Double.MIN_VALUE), cbrt(Double.MAX_VALUE)) to (sqrt(Double.MIN_VALUE), sqrt(Double.MAX_VALUE)) at the expense of an additional multiplication within the loop. Of course, if there was also an accurate variance calculation method, that avoids over-/underflows for values outside of (sqrt(Double.MIN_VALUE), sqrt(Double.MAX_VALUE)), this fix would make much more sense to me. > Skewness could get more precision from slightly reordered code. > --- > > Key: MATH-1253 > URL: https://issues.apache.org/jira/browse/MATH-1253 > Project: Commons Math > Issue Type: Bug >Affects Versions: 3.5 >Reporter: Bill Murphy >Priority: Minor > > In Skewness.java, approx line 180, there is code like: > {noformat} > double accum3 = 0.0; > for (int i = begin; i < begin + length; i++) { > final double d = values[i] - m; > accum3 += d * d * d; > } > accum3 /= variance * FastMath.sqrt(variance); > {noformat} > If the division was moved into the for loop, accum3 would be less likely to > overflow to Infinity (or -Infinity). This might allow computation to return a > result in a case such as: > {noformat} > double[] numArray = { 1.234E11, 1.234E51, 1.234E101, 1.234E151 }; > Skewnessskew = new Skewness(); > doublesk = skew.evaluate( numArray ); > {noformat} > Currently, this returns NaN, but I'd prefer it returned approx > 1.154700538379252. > The change I'm proposing would have the code instead read like: > {noformat} > double accum3 = 0.0; > double divisor = variance * FastMath.sqrt(variance); > for (int i = begin; i < begin + length; i++) { > final double d = values[i] - m; > accum3 += d * d * d / divisor; > } > {noformat} > Thanks! -- This message was sent by Atlassian JIRA (v6.3.4#6332)