Hi all,

I am trying to understand the performance of functions applied to integer sequences. Consider the following:
### begin example ###

library(lobstr)
library(microbenchmark)

x <- sample(1e6)
obj_size(x)
# 4,000,048 B

y <- 1:1e6
obj_size(y)
# 680 B

# So we can see that 'y' uses ALTREP. These give, as expected, the same result:

sum(x)
# [1] 500000500000
sum(y)
# [1] 500000500000

# For 'x', we have to go through the trouble of actually summing up 1e6 integers.
# For 'y', knowing its form, we really just need to do:

1e6*(1e6+1)/2
# [1] 500000500000

# which should be a whole lot faster. And indeed, it is:

microbenchmark(sum(x), sum(y))
# Unit: nanoseconds
#    expr    min       lq      mean   median       uq    max neval cld
#  sum(x) 533452 595204.5 634266.90 613102.5 638271.5 978519   100   b
#  sum(y)    183    245.5    446.09    338.5    447.0   3233   100  a

# Now what about mean()?

mean(x)
# [1] 500000.5
mean(y)
# [1] 500000.5

# which is the same as

(1e6+1)/2
# [1] 500000.5

# But this surprised me:

microbenchmark(mean(x), mean(y))
# Unit: microseconds
#     expr      min        lq     mean   median       uq      max neval cld
#  mean(x)  935.389  943.4795 1021.423  954.689  985.122 2065.974   100  a
#  mean(y) 3500.262 3581.9530 3814.664 3637.984 3734.598 5866.768   100   b

### end example ###

So why is mean() on an ALTREP integer sequence slower than on a regular integer vector, when sum() is so much faster? And more generally, when sum() is applied to an ALTREP integer sequence, does R actually use something like n*(n+1)/2 (or, generalized to a sequence a:b, (a+b)*(b-a+1)/2) to compute the sum? If so, why does the same (apparently) not happen for mean()?

Best,
Wolfgang
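P.S. In case it is useful: below is a small sanity check, in plain base R, that the closed-form expressions above really do agree with sum() and mean() for a general sequence a:b. The helper names seq_sum_closed() and seq_mean_closed() are just made up for this illustration; they are not anything R uses internally.

# Closed-form sum and mean of the integer sequence a:b (hypothetical helpers,
# only for this sanity check):
#   sum(a:b)  = (a+b)*(b-a+1)/2
#   mean(a:b) = (a+b)/2
seq_sum_closed  <- function(a, b) (a + b) * (b - a + 1) / 2
seq_mean_closed <- function(a, b) (a + b) / 2

a <- 1000
b <- 50000  # kept small enough that sum(a:b) stays within integer range

all.equal(seq_sum_closed(a, b),  sum(a:b))   # should be TRUE
all.equal(seq_mean_closed(a, b), mean(a:b))  # should be TRUE

# One way to look at the representation directly (the output format is
# version-dependent, but for the ALTREP case it should mention a compact
# integer sequence, while the sampled vector is a plain INTSXP):
# .Internal(inspect(1:1e6))
# .Internal(inspect(sample(1e6)))

This only covers the arithmetic side, of course; whether and where R actually applies such closed forms to ALTREP sequences is exactly what I am asking above.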