raulcd commented on code in PR #45762:
URL: https://github.com/apache/arrow/pull/45762#discussion_r1995440896
##########
cpp/src/arrow/compute/kernels/aggregate_var_std_internal.h:
##########
@@ -84,14 +84,31 @@ struct Moments {
double Stddev(int ddof) const { return sqrt(Variance(ddof)); }
- double Skew() const {
+ double Skew(bool bias = true) const {
+ double result;
// This may return NaN for m2 == 0 and m3 == 0, which is expected
- return sqrt(count) * m3 / sqrt(m2 * m2 * m2);
+ // or if unbiased and not enough samples (count < 2).
+ if (bias) {
+ result = sqrt(count) * m3 / sqrt(m2 * m2 * m2);
+ } else {
+ result =
+ sqrt(count * (count - 1)) / (count - 2) * (m3 / count) / pow((m2 /
count), 1.5);
+ }
+ return result;
}
- double Kurtosis() const {
+ double Kurtosis(bool bias = true) const {
+ double result;
// This may return NaN for m2 == 0 and m4 == 0, which is expected
- return count * m4 / (m2 * m2) - 3;
+ // or if unbiased and not enough samples (count < 3).
+ if (bias) {
+ result = count * m4 / (m2 * m2) - 3;
+ } else {
+ result = 1.0 / (count - 2) / (count - 3) *
Review Comment:
I think I've tackled this on
https://github.com/apache/arrow/pull/45762/commits/606a663197c56f4bd71378dbc50f0dda44c78ca8
but let me know if there's further improvements you think I could do there!
Thanks for your review @pitrou
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]