[ https://issues.apache.org/jira/browse/ARROW-14778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17446568#comment-17446568 ]
David Li commented on ARROW-14778:
----------------------------------

Ah, it's because we perform all the computations at the input decimal precision/scale (so only 1 decimal digit here). We could perhaps promote it to the max precision/scale, then round it back down? (E.g. for decimal128(5, 1), do the computations at decimal128(38, 2) or something, then round back down to (5, 1).) I haven't thought this through too much; this would also apply to many of the other decimal kernels.

> [C++] mean on a decimal truncates and does not round
> ----------------------------------------------------
>
>                 Key: ARROW-14778
>                 URL: https://issues.apache.org/jira/browse/ARROW-14778
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++
>            Reporter: Jonathan Keane
>            Priority: Major
>              Labels: query-engine
>
> {code}
> library(arrow, warn.conflicts = FALSE)
> library(dplyr, warn.conflicts = FALSE)
> df <- data.frame(
>   x = c(0.1, 0.2, 0.2, 0.2, 0.2)
> )
> tab <- Table$create(df)
> tab %>%
>   summarise(mean(x)) %>%
>   collect()
> #> # A tibble: 1 × 1
> #>   `mean(x)`
> #>       <dbl>
> #> 1      0.18
> tab %>%
>   summarise(x = mean(x)) %>%
>   mutate(x = cast(x, decimal(5, 1))) %>%
>   collect()
> #> # A tibble: 1 × 1
> #>       x
> #>   <dbl>
> #> 1   0.2
> tab %>%
>   mutate(x = cast(x, decimal(5, 1))) %>%
>   summarise(x = mean(x)) %>%
>   collect()
> #> # A tibble: 1 × 1
> #>       x
> #>   <dbl>
> #> 1   0.1
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
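
To make the arithmetic in the comment concrete, here is a minimal Python sketch of the two behaviours: a mean computed entirely at the input scale, and a mean computed at a wider scale and then rounded back. It is not Arrow code. The function names, the `extra_digits` parameter, and the choice of half-up rounding are all assumptions made for illustration; the thread does not say which rounding mode the kernel should use.

```python
from decimal import Decimal, ROUND_HALF_UP


def truncating_mean(unscaled, scale):
    """Mean of decimals stored as unscaled integers, computed at `scale`.

    Integer division discards the remainder, which is the truncation
    described in the issue. (Hypothetical helper, not an Arrow API.)
    """
    total = sum(unscaled)
    quotient = abs(total) // len(unscaled)  # truncate toward zero
    if total < 0:
        quotient = -quotient
    return Decimal(quotient).scaleb(-scale)


def promoted_mean(unscaled, scale, extra_digits=1):
    """Compute the mean at a wider scale, then round back to `scale`.

    ROUND_HALF_UP is an assumed rounding mode for this sketch.
    """
    widened = [v * 10**extra_digits for v in unscaled]
    wide_mean = truncating_mean(widened, scale + extra_digits)
    return wide_mean.quantize(Decimal(1).scaleb(-scale), rounding=ROUND_HALF_UP)


# 0.1, 0.2, 0.2, 0.2, 0.2 at scale 1 are stored as 1, 2, 2, 2, 2.
x = [1, 2, 2, 2, 2]
print(truncating_mean(x, 1))  # 0.1: 9 // 5 == 1, the remainder is lost
print(promoted_mean(x, 1))    # 0.2: 90 // 5 == 18, i.e. 0.18, rounded to 0.2
```

With the values from the reproduction, the first call matches the 0.1 reported for casting before `mean`, and the second matches the 0.2 obtained by casting the double result afterwards.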