[Rd] function call overhead

2011-02-16 Thread Paul Gilbert
(subject changed from: RE: [Rd] Avoiding name clashes: opinion on best practice 
naming  conventions)

Dominick

Is this really true? Is there a speed advantage to defining a local function 
this way, say, within another function, and then calling it within a loop 
rather than the original? Do you have data on this?

Paul

> -Original Message-
> From: r-devel-boun...@r-project.org [mailto:r-devel-bounces@r-
> project.org] On Behalf Of Dominick Samperi
> Sent: February 16, 2011 12:44 PM
... 
> Since the resolution of myPkg::foo() occurs at runtime (via a function
> call) instead
> of at compile time (as it would in C++), this practice can introduce a
> significant
> performance hit. This can be avoided by doing something like:
> mylocalfunc <- myPkg::foo
> [tight loop that uses mylocalfunc repeatedly]
> 
> Here mylocalfunc would not be exported, of course.
> 
> Dominick
...


This email may contain privileged and/or confidential information, and the Bank of
Canada does not waive any related rights. Any distribution, use, or copying of this
email or the information it contains by other than the intended recipient is
unauthorized. If you received this email in error please delete it immediately from
your system and notify the sender promptly by email that you have done so.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] function call overhead

2011-02-16 Thread Jeffrey Ryan
Hi Paul,

> `:::`
function (pkg, name)
{
    pkg <- as.character(substitute(pkg))
    name <- as.character(substitute(name))
    get(name, envir = asNamespace(pkg), inherits = FALSE)
}


and

> `::`
function (pkg, name)
{
    pkg <- as.character(substitute(pkg))
    name <- as.character(substitute(name))
    ns <- tryCatch(asNamespace(pkg), hasNoNamespaceError = function(e) NULL)
    if (is.null(ns)) {
        pos <- match(paste("package", pkg, sep = ":"), search(),
            0L)
        if (pos == 0)
            stop(gettextf("package %s has no name space and is not on the search path"),
                sQuote(pkg), domain = NA)
        get(name, pos = pos, inherits = FALSE)
    }
    else getExportedValue(pkg, name)
}



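are the reasons, I think: both operators are ordinary R functions, so every
use of pkg::name is a function call that looks the name up at run time.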
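
To make the point concrete, here is a minimal sketch (not part of the original message) showing that a qualified name is just a call to the function `::`, and that it returns the same object as getExportedValue(). The choice of stats::sd is only an illustrative assumption; any exported function would do.

```r
# `::` is an ordinary function: the operator form and the prefix-call form
# perform the same lookup every time they are evaluated.
f_operator <- stats::sd
f_prefix   <- `::`(stats, sd)

# getExportedValue() is the helper that the source shown above falls through
# to when the package has a namespace.
f_exported <- getExportedValue("stats", "sd")

stopifnot(
  is.function(`::`),
  identical(f_operator, f_prefix),
  identical(f_operator, f_exported)
)
```

Because the lookup is repeated on each evaluation, its cost is paid once per use of the qualified name, not once per session.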

Jeff

On Wed, Feb 16, 2011 at 12:13 PM, Paul Gilbert wrote:
> (subject changed from: RE: [Rd] Avoiding name clashes: opinion on best 
> practice naming  conventions)
>
> Dominick
>
> Is this really true? Is there a speed advantage to defining a local function 
> this way, say, within another function, and then calling it within a loop 
> rather than the original? Do you have data on this?
>
> Paul
>
>> -Original Message-
>> From: r-devel-boun...@r-project.org [mailto:r-devel-bounces@r-
>> project.org] On Behalf Of Dominick Samperi
>> Sent: February 16, 2011 12:44 PM
> ...
>> Since the resolution of myPkg::foo() occurs at runtime (via a function
>> call) instead
>> of at compile time (as it would in C++), this practice can introduce a
>> significant
>> performance hit. This can be avoided by doing something like:
>> mylocalfunc <- myPkg::foo
>> [tight loop that uses mylocalfunc repeatedly]
>>
>> Here mylocalfunc would not be exported, of course.
>>
>> Dominick
> ...



-- 
Jeffrey Ryan
jeffrey.r...@lemnica.com

www.lemnica.com



Re: [Rd] function call overhead

2011-02-16 Thread Hadley Wickham
On Wed, Feb 16, 2011 at 12:13 PM, Paul Gilbert wrote:
> (subject changed from: RE: [Rd] Avoiding name clashes: opinion on best 
> practice naming  conventions)
>
> Dominick
>
> Is this really true? Is there a speed advantage to defining a local function 
> this way, say, within another function, and then calling it within a loop 
> rather than the original? Do you have data on this?

I wondered about this statement too but:

> system.time(replicate(1e4, base::print))
   user  system elapsed
  0.539   0.001   0.541
> system.time(replicate(1e4, print))
   user  system elapsed
  0.013   0.000   0.012

So it is (relatively) significant, although it's not going to make an
impact unless you're doing thousands of function calls.

Hadley


-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/



Re: [Rd] function call overhead

2011-02-16 Thread Dominick Samperi
On Wed, Feb 16, 2011 at 1:13 PM, Paul Gilbert wrote:
> (subject changed from: RE: [Rd] Avoiding name clashes: opinion on best 
> practice naming  conventions)
>
> Dominick,
>
> Is this really true? Is there a speed advantage to defining a local function 
> this way, say, within another function, and then calling it within a loop 
> rather than the original? Do you have data on this?
>
> Paul

I worked on an application where a complex characteristic function was
computed over and over again to compute a Fourier transform, and there
was a very significant performance penalty to be paid by using
myPkg::foo() compared with foo(). It was recommended on this list that
I try the local assignment trick, and it worked great.

Unfortunately this discourages the use of programming styles that are
more explicit and easier for the human reader to follow. It also
complicates the problem of explicitly specifying which version of
"foo()" you really mean to use.
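
For readers who have not seen the local assignment trick, the following is a minimal sketch of it, not code from the application described above. The function names sum_sqrt_qualified and sum_sqrt_local are hypothetical, and base::sqrt merely stands in for myPkg::foo.

```r
# Version 1: the qualified name is resolved again on every iteration.
sum_sqrt_qualified <- function(x) {
  total <- 0
  for (xi in x) total <- total + base::sqrt(xi)
  total
}

# Version 2: resolve the qualified name once, then reuse the local binding.
# The qualification stays visible in the code, but only on a single line.
sum_sqrt_local <- function(x) {
  local_sqrt <- base::sqrt
  total <- 0
  for (xi in x) total <- total + local_sqrt(xi)
  total
}

x <- runif(1e4)
# Both versions call the same function in the same order.
stopifnot(identical(sum_sqrt_qualified(x), sum_sqrt_local(x)))
```

Inside a package, an importFrom() directive in the NAMESPACE file is another way to have the unqualified name bound once at load time, at the price of not seeing the package name at the call site.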

Dominick

>> -Original Message-
>> From: r-devel-boun...@r-project.org [mailto:r-devel-bounces@r-
>> project.org] On Behalf Of Dominick Samperi
>> Sent: February 16, 2011 12:44 PM
> ...
>> Since the resolution of myPkg::foo() occurs at runtime (via a function
>> call) instead
>> of at compile time (as it would in C++), this practice can introduce a
>> significant
>> performance hit. This can be avoided by doing something like:
>> mylocalfunc <- myPkg::foo
>> [tight loop that uses mylocalfunc repeatedly]
>>
>> Here mylocalfunc would not be exported, of course.
>>
>> Dominick
> ...


Re: [Rd] function call overhead

2011-02-16 Thread Olaf Mersmann
Dear Hadley, dear list,

On Wed, Feb 16, 2011 at 9:53 PM, Hadley Wickham  wrote:
> I wondered about this statement too but:
>
>> system.time(replicate(1e4, base::print))
>   user  system elapsed
>  0.539   0.001   0.541
>> system.time(replicate(1e4, print))
>   user  system elapsed
>  0.013   0.000   0.012

These timings are skewed. Because I too have wondered about this in
the past, I recently published the microbenchmark package, which tries
hard to accurately measure the time it takes to evaluate some
expression(s). Using this package I get:

> library("microbenchmark")
> res <- microbenchmark(print, base::print, times=1)
> res
Unit: nanoeconds  ## I've fixed the typo, but not pushed to CRAN
              min    lq  median    uq     max
print          57    65    68.0    69   48389
base::print 41763 43357 44278.5 48403 4749851

A better way to look at this is by converting to evaluations per second:

> print(res, unit="eps")
Unit: evaluations per second
                    min          lq      median          uq        max
print       17543859.65 15384615.38 14705882.35 14492753.62 20665.8538
base::print    23944.64    23064.33    22584.32    20659.88   210.5329

Resolving about 23,000 names per second versus ~15 million is quite a
dramatic difference in my world. The timings obtained by

>  system.time(replicate(1e4, base::print))
   user  system elapsed
  0.475   0.006   0.483
>  system.time(replicate(1e4, print))
   user  system elapsed
  0.011   0.001   0.014

are skewed by the overhead of replicate() in this case because the
execution time of the expression under test is so short.
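
If microbenchmark is not at hand, a rough base-R approximation is to time a plain for loop and subtract the cost of an empty loop. This is only a sketch under that assumption: the variable names are hypothetical, the iteration count is arbitrary, and the result is far noisier than what microbenchmark reports (a difference can even come out slightly negative on a fast machine).

```r
n   <- 1e5L
idx <- seq_len(n)

# Cost of the loop machinery alone.
t_empty <- system.time(for (i in idx) NULL)[["elapsed"]]

# Loop that evaluates the bare symbol, and loop that evaluates the
# qualified name; neither calls print(), they only look it up.
t_plain <- system.time(for (i in idx) print)[["elapsed"]]
t_qual  <- system.time(for (i in idx) base::print)[["elapsed"]]

# Approximate nanoseconds per evaluation after removing loop overhead.
per_eval_ns <- c(plain = t_plain - t_empty,
                 qualified = t_qual - t_empty) / n * 1e9
per_eval_ns
```

A for loop carries much less per-iteration overhead than replicate(), which calls a function for each repetition, so the ratio between the two rows is less distorted than in the system.time(replicate(...)) comparison.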

Cheers,
Olaf Mersmann



Re: [Rd] function call overhead

2011-02-28 Thread Paul Johnson
Snipping down to bare minimum history before comment:

On Wed, Feb 16, 2011 at 4:28 PM, Olaf Mersmann wrote:
> Dear Hadley, dear list,
>
> On Wed, Feb 16, 2011 at 9:53 PM, Hadley Wickham  wrote:
>
>>> system.time(replicate(1e4, base::print))
>>   user  system elapsed
>>  0.539   0.001   0.541
>>> system.time(replicate(1e4, print))
>>   user  system elapsed
>>  0.013   0.000   0.012

>> library("microbenchmark")
>> res <- microbenchmark(print, base::print, times=1)
>> res
>> print(res, unit="eps")
> Unit: evaluations per second
>                    min          lq      median          uq        max
> print       17543859.65 15384615.38 14705882.35 14492753.62 20665.8538
> base::print    23944.64    23064.33    22584.32    20659.88   210.5329
>

I think it is important to say that this slowdown is not unique to R
and is unrelated to the fact that R is interpreted.  The same happens
in compiled object-oriented languages like C++ or Objective-C. There
is an inherent cost in the runtime system to find a function or method
that is suitable to an object.

In agent-based modeling simulations, we call it the cost of "method
lookup" because the runtime system has to check for the existence of a
method each time it is called for a given object.   There is a
time-saving approach where one can cache the result of the lookup and
then call that result directly each time through the loop.
Implementing this is pretty complicated, however, and it is
discouraged unless you really need it.  It is especially dangerous
because this optimization throws away the runtime benefit of matching
the correct method to the class of the object.  (See
http://www.mulle-kybernetik.com/artikel/Optimization/opti-3.html,
which shows how one can even cache C library functions to avoid
lookup overhead. I'm told that the Obj-C 2.0 runtime will try to
optimize this automatically, but I've not tested it.)

The R solution is achieving that exact same kind of speed-up by saving
the function lookup in a local variable. The R approach, however, is
implemented much more easily than the Objective-C solution. There is
an obvious danger: if the saved method is not appropriate to an object
to which it applies, something unpredictable will happen.
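
The danger can be illustrated with a small S3 sketch. Everything in it (the generic area, the classes circle and square, the variable cached) is hypothetical and chosen only for illustration; it is not taken from any package discussed in this thread.

```r
# A generic with two methods; dispatch picks the method from the class.
area <- function(shape) UseMethod("area")
area.circle <- function(shape) pi * shape$r^2
area.square <- function(shape) shape$side^2

circle <- structure(list(r = 1), class = "circle")
square <- structure(list(side = 2), class = "square")

# Cache the result of the lookup: from here on there is no dispatch at all.
cached <- area.circle

# Fine as long as every object really is a circle.
stopifnot(isTRUE(all.equal(cached(circle), area(circle))))

# Applied to a square, the cached method runs without any error or warning
# but does not compute the square's area; normal dispatch does.
wrong <- cached(square)
right <- area(square)
stopifnot(!identical(wrong, right))
```

The cached call skips the class check entirely, which is where both the speed-up and the risk come from.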

The same is true in C++.  I was fiddling around with the C++ code that
is included with the R package Siena (awesome package, incidentally)
last year and noticed a similar slowdown with method lookup.  In C++,
I was surprised to find a slowdown inside a class using an instance
variable prefixed with  "this.".  For an IVAR, "this.x" and "x" are
the same thing, but to the runtime system, well, there's slowdown in
finding "this" class and getting x, compared to just using  x.  To the
programmer who is trying to be clear and careful, putting "this." on
the front of IVAR is tidy, but it also slows down the runtime a lot.

Hope this is not more confusing than when I started :)

pj
-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas



Re: [Rd] function call overhead

2011-02-28 Thread Dominick Samperi
On Mon, Feb 28, 2011 at 6:37 PM, Paul Johnson  wrote:
> Snipping down to bare minimum history before comment:
>
> On Wed, Feb 16, 2011 at 4:28 PM, Olaf Mersmann wrote:
>> Dear Hadley, dear list,
>>
>> On Wed, Feb 16, 2011 at 9:53 PM, Hadley Wickham  wrote:
>>
>>>> system.time(replicate(1e4, base::print))
>>>   user  system elapsed
>>>  0.539   0.001   0.541
>>>> system.time(replicate(1e4, print))
>>>   user  system elapsed
>>>  0.013   0.000   0.012
>
>>> library("microbenchmark")
>>> res <- microbenchmark(print, base::print, times=1)
>>> res
>>> print(res, unit="eps")
>> Unit: evaluations per second
>>                    min          lq      median          uq        max
>> print       17543859.65 15384615.38 14705882.35 14492753.62 20665.8538
>> base::print    23944.64    23064.33    22584.32    20659.88   210.5329
>>
>
> I think it is important to say that this slowdown is not unique to R
> and is unrelated to the fact that R is interpreted.  The same happens
> in compiled object-oriented languages like C++ or Objective-C. There
> is an inherent cost in the runtime system to find a function or method
> that is suitable to an object.
>
> In agent-based modeling simulations, we call it the cost of "method
> lookup" because the runtime system has to check for the existence of a
> method each time it is called for a given object.   There is a
> time-saving approach where one can cache the result of the lookup and
> then call that result directly each time through the loop.
> Implementing this is pretty complicated, however, and it is
> discouraged unless you really need it.  It is especially dangerous
> because this optimization throws away the runtime benefit of matching
> the correct method to the class of the object.  (See
> http://www.mulle-kybernetik.com/artikel/Optimization/opti-3.html,
> which shows how one can even cache C library functions to avoid
> lookup overhead. I'm told that the Obj-C 2.0 runtime will try to
> optimize this automatically, but I've not tested it.)
>
> The R solution is achieving that exact same kind of speed-up by saving
> the function lookup in a local variable. The R approach, however, is
> implemented much more easily than the Objective-C solution. There is
> an obvious danger: if the saved method is not appropriate to an object
> to which it applies, something unpredictable will happen.
>
> The same is true in C++.  I was fiddling around with the C++ code that
> is included with the R package Siena (awesome package, incidentally)
> last year and noticed a similar slowdown with method lookup.  In C++,
> I was surprised to find a slowdown inside a class using an instance
> variable prefixed with  "this.".  For an IVAR, "this.x" and "x" are
> the same thing, but to the runtime system, well, there's slowdown in
> finding "this" class and getting x, compared to just using  x.  To the
> programmer who is trying to be clear and careful, putting "this." on
> the front of IVAR is tidy, but it also slows down the runtime a lot.

In the case of namespace qualification (or template
metaprogramming) in C++ the qualification is resolved at
compile time, so there is no performance hit at runtime.

On the cost of this.x vs x, this probably becomes very small (or zero)
when a smart optimizer is used (one that knows that they are the same).

The performance hit results when what appears to be a field access (foo.x)
is really syntactic sugar for message dispatch (a function call), as is often
the case in agent-based modelling (and in languages that follow the Smalltalk
model, or the Actor model).
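
R itself offers a small way to see this effect: an active binding looks like a plain field but runs a function on every read. The sketch below is only an illustration under that assumption; the environment agent, its field x and the counter calls are hypothetical names.

```r
agent <- new.env()
agent$calls <- 0L

# agent$x looks like a field access, but each read invokes this function.
makeActiveBinding("x", function() {
  agent$calls <- agent$calls + 1L  # count how often the "field" is computed
  42
}, agent)

total <- 0
for (i in 1:3) total <- total + agent$x   # three reads, three function calls

x_cached <- agent$x                        # one more call, result saved
for (i in 1:3) total <- total + x_cached   # plain variable, no further calls

agent$calls
```

Saving the value in x_cached is the same caching idea as the local assignment of myPkg::foo, with the same caveat: the saved value no longer tracks later changes to the underlying object.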

Dominick

> Hope this is not more confusing than when I started :)
>
> pj
> --
> Paul E. Johnson
> Professor, Political Science
> 1541 Lilac Lane, Room 504
> University of Kansas
>