Re: [Rd] Performance issue in stats:::weighted.mean.default method

2015-03-05 Thread Tadeáš Palusga

Oops, such an amateur mistake. Thanks a lot for your quick response.

Regards

TP

On 03/05/2015 06:49 PM, Prof Brian Ripley wrote:

On 05/03/2015 14:55, Tadeáš Palusga wrote:

Hi,
   I'm using this mailing list for the first time and I hope this is the
right one. I don't think that the following is a bug but it can be a
performance issue.

By my opinion, there is no need to filter by [w != 0] in last sum of
weighted.mean.default method defined in
src/library/stats/R/weighted.mean.R. There is no need to do it because
you can always sum zero numbers and filtering is too expensive (see
following benchmark snippet)


But 0*x is not necessarily 0, so there is a need to do it ... see

> w <- c(0, 1)
> x <- c(Inf, 1)
> weighted.mean(x, w)
[1] 1
> fun.new(x, w)
[1] NaN





library(microbenchmark)
x <- sample(500,5000,replace=TRUE)
w <- sample(1000,5000,replace=TRUE)/1000 *
ifelse((sample(10,5000,replace=TRUE) -1) > 0, 1, 0)
fun.new <- function(x,w) {sum(x*w)/sum(w)}
fun.orig  <- function(x,w) {sum(x*w[w!=0])/sum(w)}
print(microbenchmark(
   ORIGFN = fun.orig(x,w),
   NEWFN  = fun.new(x,w),
   times = 1000))

#results:
#Unit: microseconds
#   expr min   lq  mean  median  uq  max neval
# ORIGFN 190.889 194.6590 210.08952 198.847 202.928 1779.789 1000
#  NEWFN  20.857  21.7175  24.61149  22.080  22.594 1744.014 1000




So my suggestion is to remove the w != check




Index: weighted.mean.R
===
--- weighted.mean.R (revision 67941)
+++ weighted.mean.R (working copy)
@@ -29,7 +29,7 @@
  stop("'x' and 'w' must have the same length")
  w <- as.double(w) # avoid overflow in sum for integer weights.
  if (na.rm) { i <- !is.na(x); w <- w[i]; x <- x[i] }
-sum((x*w)[w != 0])/sum(w) # --> NaN in empty case
+sum(x*w)/sum(w) # --> NaN in empty case
  }

  ## see note for ?mean.Date


I hope i'm not missing something - I really don't see the reason to have
this filtration here.

BR

Tadeas 'donarus' Palusga

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel





__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Performance issue in stats:::weighted.mean.default method

2015-03-05 Thread Henrik Bengtsson
See weightedMean() in the matrixStats package.  It's optimized for
data type, speed and memory and implemented in native code so it can
avoid some of these intermediate copies.  It's a few times faster than
weighted.mean[.default]();

library(matrixStats)
library(microbenchmark)
n <- 5000
x <- sample(500,n,replace=TRUE)
w <- sample(1000,n,replace=TRUE)/1000 *
ifelse((sample(10,n,replace=TRUE) -1) > 0, 1, 0)
fun.new <- function(x,w) {sum(x*w)/sum(w)}
fun.orig  <- function(x,w) {sum(x*w[w!=0])/sum(w)}
stats <- microbenchmark(
  weightedMean(x,w),
  weighted.mean(x,w),
  ORIGFN = fun.orig(x,w),
  NEWFN  = fun.new(x,w),
  times = 1000
)

> print(stats, digits=3)
Unit: microseconds
expr   minlq  mean medianuqmax neval
  weightedMean(x, w)  28.7  31.7  33.4   32.9  33.8   81.7  1000
 weighted.mean(x, w) 129.6 141.6 149.6  143.7 147.1 2332.9  1000
  ORIGFN 205.7 222.0 235.0  225.4 231.4 2655.8  1000
   NEWFN  38.9  42.3  44.3   42.8  43.6  385.8  1000

Relative performance will vary with n = length(x).

The weightedMean() function handles zero-weight Inf values:

> w <- c(0, 1)
> x <- c(Inf, 1)
> weighted.mean(x, w)
[1] 1
> fun.new(x, w)
[1] NaN
> weightedMean(x,w)
[1] 1

You'll find more benchmark results on weightedMean() vs
weighted.mean() on
https://github.com/HenrikBengtsson/matrixStats/wiki/weightedMean

/Henrik

On Thu, Mar 5, 2015 at 9:49 AM, Prof Brian Ripley  wrote:
> On 05/03/2015 14:55, Tadeáš Palusga wrote:
>>
>> Hi,
>>I'm using this mailing list for the first time and I hope this is the
>> right one. I don't think that the following is a bug but it can be a
>> performance issue.
>>
>> By my opinion, there is no need to filter by [w != 0] in last sum of
>> weighted.mean.default method defined in
>> src/library/stats/R/weighted.mean.R. There is no need to do it because
>> you can always sum zero numbers and filtering is too expensive (see
>> following benchmark snippet)
>
>
> But 0*x is not necessarily 0, so there is a need to do it ... see
>
>> w <- c(0, 1)
>> x <- c(Inf, 1)
>> weighted.mean(x, w)
> [1] 1
>> fun.new(x, w)
> [1] NaN
>
>
>>
>>
>>
>> library(microbenchmark)
>> x <- sample(500,5000,replace=TRUE)
>> w <- sample(1000,5000,replace=TRUE)/1000 *
>> ifelse((sample(10,5000,replace=TRUE) -1) > 0, 1, 0)
>> fun.new <- function(x,w) {sum(x*w)/sum(w)}
>> fun.orig  <- function(x,w) {sum(x*w[w!=0])/sum(w)}
>> print(microbenchmark(
>>ORIGFN = fun.orig(x,w),
>>NEWFN  = fun.new(x,w),
>>times = 1000))
>>
>> #results:
>> #Unit: microseconds
>> #   expr min   lq  mean  median  uq  max neval
>> # ORIGFN 190.889 194.6590 210.08952 198.847 202.928 1779.789  1000
>> #  NEWFN  20.857  21.7175  24.61149  22.080  22.594 1744.014  1000
>>
>>
>>
>>
>> So my suggestion is to remove the w != check
>>
>>
>>
>>
>> Index: weighted.mean.R
>> ===
>> --- weighted.mean.R (revision 67941)
>> +++ weighted.mean.R (working copy)
>> @@ -29,7 +29,7 @@
>>   stop("'x' and 'w' must have the same length")
>>   w <- as.double(w) # avoid overflow in sum for integer weights.
>>   if (na.rm) { i <- !is.na(x); w <- w[i]; x <- x[i] }
>> -sum((x*w)[w != 0])/sum(w) # --> NaN in empty case
>> +sum(x*w)/sum(w) # --> NaN in empty case
>>   }
>>
>>   ## see note for ?mean.Date
>>
>>
>> I hope i'm not missing something - I really don't see the reason to have
>> this filtration here.
>>
>> BR
>>
>> Tadeas 'donarus' Palusga
>>
>> __
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>
>
>
> --
> Brian D. Ripley,  rip...@stats.ox.ac.uk
> Emeritus Professor of Applied Statistics, University of Oxford
> 1 South Parks Road, Oxford OX1 3TG, UK
>
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Performance issue in stats:::weighted.mean.default method

2015-03-05 Thread Prof Brian Ripley

On 05/03/2015 14:55, Tadeáš Palusga wrote:

Hi,
   I'm using this mailing list for the first time and I hope this is the
right one. I don't think that the following is a bug but it can be a
performance issue.

By my opinion, there is no need to filter by [w != 0] in last sum of
weighted.mean.default method defined in
src/library/stats/R/weighted.mean.R. There is no need to do it because
you can always sum zero numbers and filtering is too expensive (see
following benchmark snippet)


But 0*x is not necessarily 0, so there is a need to do it ... see

> w <- c(0, 1)
> x <- c(Inf, 1)
> weighted.mean(x, w)
[1] 1
> fun.new(x, w)
[1] NaN





library(microbenchmark)
x <- sample(500,5000,replace=TRUE)
w <- sample(1000,5000,replace=TRUE)/1000 *
ifelse((sample(10,5000,replace=TRUE) -1) > 0, 1, 0)
fun.new <- function(x,w) {sum(x*w)/sum(w)}
fun.orig  <- function(x,w) {sum(x*w[w!=0])/sum(w)}
print(microbenchmark(
   ORIGFN = fun.orig(x,w),
   NEWFN  = fun.new(x,w),
   times = 1000))

#results:
#Unit: microseconds
#   expr min   lq  mean  median  uq  max neval
# ORIGFN 190.889 194.6590 210.08952 198.847 202.928 1779.789  1000
#  NEWFN  20.857  21.7175  24.61149  22.080  22.594 1744.014  1000




So my suggestion is to remove the w != check




Index: weighted.mean.R
===
--- weighted.mean.R (revision 67941)
+++ weighted.mean.R (working copy)
@@ -29,7 +29,7 @@
  stop("'x' and 'w' must have the same length")
  w <- as.double(w) # avoid overflow in sum for integer weights.
  if (na.rm) { i <- !is.na(x); w <- w[i]; x <- x[i] }
-sum((x*w)[w != 0])/sum(w) # --> NaN in empty case
+sum(x*w)/sum(w) # --> NaN in empty case
  }

  ## see note for ?mean.Date


I hope i'm not missing something - I really don't see the reason to have
this filtration here.

BR

Tadeas 'donarus' Palusga

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel



--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Emeritus Professor of Applied Statistics, University of Oxford
1 South Parks Road, Oxford OX1 3TG, UK

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel