On 26/03/2014 22:00, peter dalgaard wrote:

On 26 Mar 2014, at 18:24 , Radford Neal <radf...@cs.toronto.edu> wrote:

From: Richard Cotton <richiero...@gmail.com>

The rep function is very versatile, but that versatility comes at a
cost: it takes a bit of effort to learn (and remember) its syntax.
This is a problem, since rep is one of the first functions many
beginners will come across.  Of the three main uses of rep, two have
simpler alternatives.

rep(x, times = ) has rep.int
rep(x, length.out  = ) has rep_len

I think that a rep_each function would be a worthy addition for the
third use case

rep(x, each = )

(It might also be worth having rep_times as a synonym for rep.int.)
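
For readers less familiar with the three uses listed above, here is a minimal sketch of each form next to its existing shortcut. The vector x and the counts are illustrative values, not taken from the original post.

```r
x <- c("a", "b", "c")   # illustrative input, not from the thread

# Whole-vector repetition: rep(x, times = ) and its shortcut rep.int()
by_times <- rep(x, times = 2)        # "a" "b" "c" "a" "b" "c"
by_int   <- rep.int(x, 2)

# Recycling to a fixed length: rep(x, length.out = ) and rep_len()
by_length <- rep(x, length.out = 5)  # "a" "b" "c" "a" "b"
by_len    <- rep_len(x, 5)

# Element-wise repetition: base R offers this only through the each argument
by_each <- rep(x, each = 2)          # "a" "a" "b" "b" "c" "c"
```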

I think this is exactly the wrong approach.  Indeed, the aim should be
to get rid of functions like rep.int (or at least discourage their
use, even if they have to be kept for compatibility).

Why is rep_each(x,n) better than rep(x,each=n)?  There is no saving in
typing (which would be trivial anyway).  There *ought* to be no
significant difference in speed (though that seems to have been the
motive for rep.int).  Are you trying to let students learn R without
ever learning about specifying arguments by name?

And where would you stop?  How about seq_by(a,b,s) rather than having
to arduously type seq(a,b,by=s)?  Maybe we should have glm_binomial,
glm_poisson, etc. so we don't have to remember the "family" argument?
This way lies madness...
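
To make the point about named arguments concrete, the sketch below shows the existing by-name calls that the hypothetical seq_by and glm_poisson would duplicate. The data are made up purely for illustration.

```r
# The step can be given by name or by position; both calls are equivalent
by_name     <- seq(1, 10, by = 3)   # 1 4 7 10
by_position <- seq(1, 10, 3)

# One glm() covers every family through a single named argument.
# counts and dose are made-up data, used only to have something to fit.
counts <- c(2, 3, 6, 7, 8, 9, 10, 12, 15)
dose   <- 1:9
fit <- glm(counts ~ dose, family = poisson)
fitted_family <- family(fit)$family   # "poisson"
```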

Spot on.

Well, maybe a slight disagreement: In a weakly typed language like R, you will 
always have performance losses due to type testing and dispatching, and no 
compiler/interpreter is intelligent enough to predict the types so that this 
can be avoided. Some amount of hinting is needed for reliable speedups, either 
by having special functions for simple cases (allowed to make assumptions on 
their inputs), or some sort of #pragma-like construction.

Actually, rep.int seems to be a poor example of this since the speedup is 
pretty negligible unless you do huge amounts of short replicates. I expect that 
the S-PLUS compatibility was the main reason to have it. Case in point:

As the help says:

     Function ‘rep.int’ is a simple case handled by internal code, and
     provided as a separate function partly for S compatibility and
     partly for speed (especially when names can be dropped).

E.g.

> a <- letters[1:10]; names(a) <- a
> system.time(for(i in 1:1000000) rep.int(a,10))
   user  system elapsed
  1.568   0.001   1.574
> system.time(for(i in 1:1000000) rep(a,10))
   user  system elapsed
  2.804   0.002   2.816
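
The "names can be dropped" remark in the help text can be seen directly. This is a small sketch using a shortened version of the vector from the timing above: rep() replicates the names along with the values, while rep.int() returns a bare vector.

```r
a <- letters[1:3]; names(a) <- a   # shortened version of the vector above

kept    <- rep(a, 2)       # default method carries the names along
dropped <- rep.int(a, 2)   # internal code returns no names

names(kept)      # "a" "b" "c" "a" "b" "c"
names(dropped)   # NULL
```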

There are also rare occasions where it is useful to use rep.int to circumvent method dispatch.
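
A minimal sketch of what circumventing dispatch looks like, assuming a hypothetical S3 class "tagged" with its own rep method (neither is from the thread; both are defined here only for illustration):

```r
# Hypothetical S3 method, defined only for this illustration
rep.tagged <- function(x, ...) {
  structure(rep(unclass(x), ...), class = "tagged")
}

x <- structure(1:3, class = "tagged")

via_generic <- rep(x, 2)       # dispatches to rep.tagged, so the class is kept
via_int     <- rep.int(x, 2)   # no dispatch: plain integer vector, class dropped

class(via_generic)   # "tagged"
class(via_int)       # "integer"
```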

Note that rep() was an interpreted function when that comment was first written, and the gap was much larger then. (Nor was it byte-compiled, nor generic.) For the version of rep in R 0.65.1:

> system.time(for(i in 1:1000000) rep("a",10))
   user  system elapsed
  1.612   0.000   1.616

vs the current

> system.time(for(i in 1:1000000) rep("a",10))
   user  system elapsed
  0.518   0.000   0.519
> system.time(for(i in 1:1000000) rep.int("a",10))
   user  system elapsed
  0.471   0.000   0.473

> system.time(for(i in 1:10000000) rep("a",10))
    user  system elapsed
  16.721   0.125  19.037
> system.time(for(i in 1:10000000) rep.int("a",10))
    user  system elapsed
  14.356   0.050  14.611
> system.time(for(i in 1:1000000) rep("a",1000))
    user  system elapsed
  11.655   2.157  14.263
> system.time(for(i in 1:1000000) rep.int("a",1000))
    user  system elapsed
  10.957   1.708  12.917
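
Anyone wishing to repeat comparisons like these on their own machine could wrap the loop in a small helper. The function name time_reps and the reduced iteration count are assumptions made for this sketch; absolute timings will vary with hardware and R version, so no expected output is shown.

```r
# Hypothetical helper: elapsed seconds for n calls of f(x, times)
time_reps <- function(f, x, times, n = 1e5) {
  system.time(for (i in seq_len(n)) f(x, times))[["elapsed"]]
}

# Same inputs as the short-replicate case above, with fewer iterations
elapsed <- c(
  rep     = time_reps(rep, "a", 10),
  rep.int = time_reps(rep.int, "a", 10)
)
elapsed
```

Calling through f adds the same small overhead to both measurements, so the comparison between the two stays fair even though the absolute numbers are slightly inflated.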

--
Brian D. Ripley,                  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
