Re: [Rd] RFC: getifexists() {was [Bug 16065] "exists" ...}

2015-01-08 Thread Duncan Murdoch
On 08/01/2015 4:16 AM, Martin Maechler wrote:
> In November, we had a "bug repository conversation"
> with Peter Hagerty and myself:
> 
>   https://bugs.r-project.org/bugzilla/show_bug.cgi?id=16065
> 
> where the bug report title started with
> 
>  --->>  "exists" is a bottleneck for dispatch and package loading, ...
> 
> Peter proposed an extra simplified and henc faster version of exists(),
> and I commented
> 
> > --- Comment #2 from Martin Maechler  ---
> > I'm very grateful that you've started exploring the bottlenecks of 
> loading
> > packages with many S4 classes (and methods)...
> > and I hope we can make real progress there rather sooner than later.
> 
> > OTOH, your `summaryRprof()` in your vignette indicates that exists() 
> may use
> > upto 10% of the time spent in library(reportingTools),  and your speedup
> > proposals of exist()  may go up to ca 30%  which is good and well worth
> > considering,  but still we can only expect 2-3% speedup for package 
> loading
> > which unfortunately is not much.
> 
> > Still I agree it is worth looking at exists() as you did  ... and 
> > consider providing a fast simplified version of it in addition to 
> current
> > exists() [I think].
> 
> > BTW, as we talk about enhancements here, maybe consider a further 
> possibility:
> > My subjective guess is that probably more than half of exists() uses 
> are of the
> > form
> 
> > if(exists(name, where, ...)) {
> >get(name, whare, )
> >..
> > } else { 
> > NULL / error() / .. or similar
> > }
> 
> > i.e. many exists() calls when returning TRUE are immediately followed 
> by the
> > corresponding get() call which repeats quite a bit of the lookup that 
> exists()
> > has done.
> 
> > Instead, I'd imagine a function, say  getifexists(name, ...) that does 
> both at
> > once in the "exists is TRUE" case but in a way we can easily keep the 
> if(.) ..
> > else clause above.  One already existing approach would use
> 
> > if(!inherits(tryCatch(xx <- get(name, where, ...), error=function(e)e), 
> "error")) {
> 
> >   ... (( work with xx )) ...
> 
> > } else  { 
> >NULL / error() / .. or similar
> > }
> 
> > but of course our C implementation would be more efficient and use more 
> concise
> > syntax {which should not look like error handling}.   Follow ups to 
> this idea
> > should really go to R-devel (the mailing list).
> 
> and now I do follow up here myself :
> 
> I found that  'getifexists()' is actually very simple to implement,
> I have already tested it a bit, but not yet committed to R-devel
> (the "R trunk" aka "master branch") because I'd like to get
> public comments {RFC := Request For Comments}.
> 

I don't like the name -- I'd prefer getIfExists.  As Baath (2012, R
Journal) pointed out, R names are very inconsistent in naming
conventions, but lowerCamelCase is the most common choice.  Second most
common is period.separated, so an argument could be made for
get.if.exists, but there's still the possibility of confusion with S3
methods, and users of other languages where "." is an operator find it a
little strange.

If you don't like lowerCamelCase (and a lot of people don't), then I
think underscore_separated is the next best choice, so would use
get_if_exists.

Another possibility is to make no new name at all, and just add an
optional parameter to get() (which if present acts as your value.if.not
parameter, if not present keeps the current "object not found" error).

Duncan Murdoch


> My version of the help file {for both exists() and getifexists()}
> rendered in text is
> 
> -- help(getifexists) ---
> Is an Object Defined?
> 
> Description:
> 
>  Look for an R object of the given name and possibly return it
> 
> Usage:
> 
>  exists(x, where = -1, envir = , frame, mode = "any",
> inherits = TRUE)
>  
>  getifexists(x, where = -1, envir = as.environment(where),
>  mode = "any", inherits = TRUE, value.if.not = NULL)
>  
> Arguments:
> 
>x: a variable name (given as a character string).
> 
>where: where to look for the object (see the details section); if
>   omitted, the function will search as if the name of the
>   object appeared unquoted in an expression.
> 
>envir: an alternative way to specify an environment to look in, but
>   it is usually simpler to just use the ‘where’ argument.
> 
>frame: a frame in the calling list.  Equivalent to giving ‘where’ as
>   ‘sys.frame(frame)’.
> 
> mode: the mode or type of object sought: see the ‘Details’ section.
> 
> inherits: should the enclosing frames of the environment be searched?
> 
> value.if.not: the return value of ‘getifexists(x, *)’ when ‘x’ does not
>   exist.
> 
> Details:
> 
>  The ‘where’ argument can specify the environme

Re: [Rd] RFC: getifexists() {was [Bug 16065] "exists" ...}

2015-01-08 Thread John Nolan
Adding an optional argument to get (and mget) like

val <- get(name, where, ..., value.if.not.found=NULL )   (*)

would be useful for many.  HOWEVER, it is possible that there could be 
some confusion here: (*) can give a NULL because either x exists and 
has value NULL, or because x doesn't exist.   If that matters, the user 
would need to be careful about specifying a value.if.not.found that cannot 
be confused with a valid value of x.  

To avoid this difficulty, perhaps we want both: have Martin's getifexists( ) 
return a list with two values: 
  - a boolean variable 'found'  # = value returned by exists( )
  - a variable 'value'

Then implement get( ) as:

get <- function(x,...,value.if.not.found ) {

  if( missing(value.if.not.found) ) {
a <- getifexists(x,... )
if (!a$found) error("x not found")
  } else {
a <- getifexists(x,...,value.if.not.found )
  }
  return(a$value)
}

Note that value.if.not.found has no default value in above.
It behaves exactly like current get does if value.if.not.found 
is not specified, and if it is specified, it would be faster 
in the common situation mentioned below:   
 if(exists(x,...)) { get(x,...) }

John

P.S. if you like dromedaries call it valueIfNotFound ...

 ..
 John P. Nolan
 Math/Stat Department
 227 Gray Hall,   American University
 4400 Massachusetts Avenue, NW
 Washington, DC 20016-8050

 jpno...@american.edu   voice: 202.885.3140  
 web: academic2.american.edu/~jpnolan
 ..


-"R-devel"  wrote: - 
To: Martin Maechler , R-devel@r-project.org
From: Duncan Murdoch 
Sent by: "R-devel" 
Date: 01/08/2015 06:39AM
Subject: Re: [Rd] RFC: getifexists() {was [Bug 16065] "exists" ...}

On 08/01/2015 4:16 AM, Martin Maechler wrote:
> In November, we had a "bug repository conversation"
> with Peter Hagerty and myself:
> 
>   https://bugs.r-project.org/bugzilla/show_bug.cgi?id=16065
> 
> where the bug report title started with
> 
>  --->>  "exists" is a bottleneck for dispatch and package loading, ...
> 
> Peter proposed an extra simplified and henc faster version of exists(),
> and I commented
> 
> > --- Comment #2 from Martin Maechler  ---
> > I'm very grateful that you've started exploring the bottlenecks of 
> loading
> > packages with many S4 classes (and methods)...
> > and I hope we can make real progress there rather sooner than later.
> 
> > OTOH, your `summaryRprof()` in your vignette indicates that exists() 
> may use
> > upto 10% of the time spent in library(reportingTools),  and your speedup
> > proposals of exist()  may go up to ca 30%  which is good and well worth
> > considering,  but still we can only expect 2-3% speedup for package 
> loading
> > which unfortunately is not much.
> 
> > Still I agree it is worth looking at exists() as you did  ... and 
> > consider providing a fast simplified version of it in addition to 
> current
> > exists() [I think].
> 
> > BTW, as we talk about enhancements here, maybe consider a further 
> possibility:
> > My subjective guess is that probably more than half of exists() uses 
> are of the
> > form
> 
> > if(exists(name, where, ...)) {
> >get(name, whare, )
> >..
> > } else { 
> > NULL / error() / .. or similar
> > }
> 
> > i.e. many exists() calls when returning TRUE are immediately followed 
> by the
> > corresponding get() call which repeats quite a bit of the lookup that 
> exists()
> > has done.
> 
> > Instead, I'd imagine a function, say  getifexists(name, ...) that does 
> both at
> > once in the "exists is TRUE" case but in a way we can easily keep the 
> if(.) ..
> > else clause above.  One already existing approach would use
> 
> > if(!inherits(tryCatch(xx <- get(name, where, ...), error=function(e)e), 
> "error")) {
> 
> >   ... (( work with xx )) ...
> 
> > } else  { 
> >NULL / error() / .. or similar
> > }
> 
> > but of course our C implementation would be more efficient and use more 
> concise
> > syntax {which should not look like error handling}.   Follow ups to 
> this idea
> > should really go to R-devel (the mailing list).
> 
> and now I do follow up here myself :
> 
> I found that  'getifexists()' is actually very simple to implement,
> I have already tested it a bit, but not yet committed to R-devel

Re: [Rd] RFC: getifexists() {was [Bug 16065] "exists" ...}

2015-01-08 Thread Duncan Murdoch

On 08/01/2015 9:03 AM, John Nolan wrote:

Adding an optional argument to get (and mget) like

val <- get(name, where, ..., value.if.not.found=NULL )   (*)


That would be a bad idea, as it would change behaviour of existing uses 
of get().  What I suggested would not
give a default.  If the arg was missing, we'd get the old behaviour, if 
the arg was present, we'd use it.


I'm not sure this is preferable to the separate function 
implementation.  This makes the documentation and implementation of 
get() more complicated, and it would probably be slower for everyone.


Duncan Murdoch



would be useful for many.  HOWEVER, it is possible that there could be
some confusion here: (*) can give a NULL because either x exists and
has value NULL, or because x doesn't exist.   If that matters, the user
would need to be careful about specifying a value.if.not.found that cannot
be confused with a valid value of x.

To avoid this difficulty, perhaps we want both: have Martin's getifexists( )
return a list with two values:
   - a boolean variable 'found'  # = value returned by exists( )
   - a variable 'value'

Then implement get( ) as:

get <- function(x,...,value.if.not.found ) {

   if( missing(value.if.not.found) ) {
 a <- getifexists(x,... )
 if (!a$found) error("x not found")
   } else {
 a <- getifexists(x,...,value.if.not.found )
   }
   return(a$value)
}

Note that value.if.not.found has no default value in above.
It behaves exactly like current get does if value.if.not.found
is not specified, and if it is specified, it would be faster
in the common situation mentioned below:
  if(exists(x,...)) { get(x,...) }

John

P.S. if you like dromedaries call it valueIfNotFound ...

  ..
  John P. Nolan
  Math/Stat Department
  227 Gray Hall,   American University
  4400 Massachusetts Avenue, NW
  Washington, DC 20016-8050

  jpno...@american.edu   voice: 202.885.3140
  web: academic2.american.edu/~jpnolan
  ..


-"R-devel"  wrote: -
To: Martin Maechler , R-devel@r-project.org
From: Duncan Murdoch
Sent by: "R-devel"
Date: 01/08/2015 06:39AM
Subject: Re: [Rd] RFC: getifexists() {was [Bug 16065] "exists" ...}

On 08/01/2015 4:16 AM, Martin Maechler wrote:
> In November, we had a "bug repository conversation"
> with Peter Hagerty and myself:
>
>   https://bugs.r-project.org/bugzilla/show_bug.cgi?id=16065
>
> where the bug report title started with
>
>  --->>  "exists" is a bottleneck for dispatch and package loading, ...
>
> Peter proposed an extra simplified and henc faster version of exists(),
> and I commented
>
> > --- Comment #2 from Martin Maechler  ---
> > I'm very grateful that you've started exploring the bottlenecks of 
loading
> > packages with many S4 classes (and methods)...
> > and I hope we can make real progress there rather sooner than later.
>
> > OTOH, your `summaryRprof()` in your vignette indicates that exists() 
may use
> > upto 10% of the time spent in library(reportingTools),  and your speedup
> > proposals of exist()  may go up to ca 30%  which is good and well worth
> > considering,  but still we can only expect 2-3% speedup for package 
loading
> > which unfortunately is not much.
>
> > Still I agree it is worth looking at exists() as you did  ... and
> > consider providing a fast simplified version of it in addition to 
current
> > exists() [I think].
>
> > BTW, as we talk about enhancements here, maybe consider a further 
possibility:
> > My subjective guess is that probably more than half of exists() uses 
are of the
> > form
>
> > if(exists(name, where, ...)) {
> >get(name, whare, )
> >..
> > } else {
> > NULL / error() / .. or similar
> > }
>
> > i.e. many exists() calls when returning TRUE are immediately followed 
by the
> > corresponding get() call which repeats quite a bit of the lookup that 
exists()
> > has done.
>
> > Instead, I'd imagine a function, say  getifexists(name, ...) that does 
both at
> > once in the "exists is TRUE" case but in a way we can easily keep the 
if(.) ..
> > else clause above.  One already existing approach would use
>
> > if(!inherits(tryCatch(xx <- get(name, where, ...), error=function(e)e), 
"error")) {
>
> >   ... (( work with xx )) ...
>
> > } else  {
> >NULL / error() / .. or similar
> > }
>
> > but of course our C implementation would

Re: [Rd] RFC: getifexists() {was [Bug 16065] "exists" ...}

2015-01-08 Thread Michael Lawrence
If we do add an argument to get(), then it should be named consistently
with the ifnotfound argument of mget(). As mentioned, the possibility of a
NULL value is problematic. One solution is a sentinel value that indicates
an unbound value (like R_UnboundValue).

But another idea (and one pretty similar to John's) is to follow the SYMSXP
design at the C level, where there is a structure that points to the name
and a value. We already have SYMSXPs at the R level of course (name
objects) but they do not provide access to the value, which is typically
R_UnboundValue. But this does not even need to be implemented with SYMSXP.
The design would allow something like:

binding <- getBinding("x", env)
if (hasValue(binding)) {
  x <- value(binding) # throws an error if none
  message(name(binding), "has value", x)
}

That I think it is a bit verbose but readable and could be made fast. And I
think binding objects would be useful in other ways, as they are
essentially a "named object". For example, when iterating over an
environment.

Michael




On Thu, Jan 8, 2015 at 6:03 AM, John Nolan  wrote:

> Adding an optional argument to get (and mget) like
>
> val <- get(name, where, ..., value.if.not.found=NULL )   (*)
>
> would be useful for many.  HOWEVER, it is possible that there could be
> some confusion here: (*) can give a NULL because either x exists and
> has value NULL, or because x doesn't exist.   If that matters, the user
> would need to be careful about specifying a value.if.not.found that cannot
> be confused with a valid value of x.
>
> To avoid this difficulty, perhaps we want both: have Martin's getifexists(
> )
> return a list with two values:
>   - a boolean variable 'found'  # = value returned by exists( )
>   - a variable 'value'
>
> Then implement get( ) as:
>
> get <- function(x,...,value.if.not.found ) {
>
>   if( missing(value.if.not.found) ) {
> a <- getifexists(x,... )
> if (!a$found) error("x not found")
>   } else {
> a <- getifexists(x,...,value.if.not.found )
>   }
>   return(a$value)
> }
>
> Note that value.if.not.found has no default value in above.
> It behaves exactly like current get does if value.if.not.found
> is not specified, and if it is specified, it would be faster
> in the common situation mentioned below:
>  if(exists(x,...)) { get(x,...) }
>
> John
>
> P.S. if you like dromedaries call it valueIfNotFound ...
>
>  ..
>  John P. Nolan
>  Math/Stat Department
>  227 Gray Hall,   American University
>  4400 Massachusetts Avenue, NW
>  Washington, DC 20016-8050
>
>  jpno...@american.edu   voice: 202.885.3140
>  web: academic2.american.edu/~jpnolan
>  ..................
>
>
> -"R-devel"  wrote: -
> To: Martin Maechler , R-devel@r-project.org
> From: Duncan Murdoch
> Sent by: "R-devel"
> Date: 01/08/2015 06:39AM
> Subject: Re: [Rd] RFC: getifexists() {was [Bug 16065] "exists" ...}
>
> On 08/01/2015 4:16 AM, Martin Maechler wrote:
> > In November, we had a "bug repository conversation"
> > with Peter Hagerty and myself:
> >
> >   https://bugs.r-project.org/bugzilla/show_bug.cgi?id=16065
> >
> > where the bug report title started with
> >
> >  --->>  "exists" is a bottleneck for dispatch and package loading, ...
> >
> > Peter proposed an extra simplified and henc faster version of exists(),
> > and I commented
> >
> > > --- Comment #2 from Martin Maechler 
> ---
> > > I'm very grateful that you've started exploring the bottlenecks of
> loading
> > > packages with many S4 classes (and methods)...
> > > and I hope we can make real progress there rather sooner than
> later.
> >
> > > OTOH, your `summaryRprof()` in your vignette indicates that
> exists() may use
> > > upto 10% of the time spent in library(reportingTools),  and your
> speedup
> > > proposals of exist()  may go up to ca 30%  which is good and well
> worth
> > > considering,  but still we can only expect 2-3% speedup for
> package loading
> > > which unfortunately is not much.
> >
> > > Still I agree it is worth looking at exists() as you did  ... and
> > > consider providing a fast simplified version of it in addition to
> current
> > > exists() [I think].
> >
> > > BTW, as we talk about enhancements here, maybe consider a further
> possibility:
> > > My subjective g

Re: [Rd] RFC: getifexists() {was [Bug 16065] "exists" ...}

2015-01-08 Thread Martin Maechler

> Adding an optional argument to get (and mget) like
> val <- get(name, where, ..., value.if.not.found=NULL )   (*)

> would be useful for many.  HOWEVER, it is possible that there could be 
> some confusion here: (*) can give a NULL because either x exists and 
> has value NULL, or because x doesn't exist.   If that matters, the user 
> would need to be careful about specifying a value.if.not.found that cannot 
> be confused with a valid value of x.  

Exactly -- well, of course: That problem { NULL can be the legit value of what 
you
want to get() } was the only reason to have a 'value.if.not' argument at all. 

Note that this is not about a universal replacement of 
the  if(exists(..)) { .. get(..) } idiom, but rather a
replacement of these in the cases where speed matters very much,
which is e.g. in the low level support code for S4 method dispatch.

'value.if.not.found':
Note that CRAN checks requires all arguments to be written in
full length.  Even though we have auto completion in ESS,
Rstudio or other good R IDE's,  I very much like to keep
function calls somewhat compact.

And yes, as you mention the dromedars aka 2-hump camels:  
getIfExist is already horrible to my taste (and "_" is not S-like; 
yes that's all very much a matter of taste and yes I'm from the
20th century).

> To avoid this difficulty, perhaps we want both: have Martin's getifexists( ) 
> return a list with two values: 
>   - a boolean variable 'found'  # = value returned by exists( )
>   - a variable 'value'

> Then implement get( ) as:

> get <- function(x,...,value.if.not.found ) {

>   if( missing(value.if.not.found) ) {
> a <- getifexists(x,... )
> if (!a$found) error("x not found")
>   } else {
> a <- getifexists(x,...,value.if.not.found )
>   }
>   return(a$value)
> }

Interesting...
Note that the above get() implementation would just be "conceptually", as 
all of this is also quite a bit about speed, and we do the
different cases in C anyway [via 'op' code].

> Note that value.if.not.found has no default value in above.
> It behaves exactly like current get does if value.if.not.found 
> is not specified, and if it is specified, it would be faster 
> in the common situation mentioned below:   
>  if(exists(x,...)) { get(x,...) }

Good... Let's talk about your getifexists() as I argue we'd keep
get() exactly as it is now anyway, if we use a new 3rd function (I keep
calling 'getifexists()' for now):

I think in that case, getifexists() would not even *need* an argument 
'value.if.not' (or 'value.if.not.found'); it rather would return a 
  list(found = *, value = *)
in any case.
Alternatively, it could return
  structure(, value = *)

In the first case, our main use case would be

  if((r <- getifexists(x, *))$found) {
 ## work with  r$value
  }

in the 2nd case {structure} :

  if((r <- getifexists(x, *))) {
 ## work with  attr(r,"value")
  }

I think that (both cases) would still be a bit slower (for the above
most important use case) but probably not much
and it would like slightly more readable than my

   if (!is.null(r <- getifexists(x, *))) {
  ## work with  r
   }

After all of this, I think I'd still somewhat prefer my original proposal,
but not strongly -- I had originally also thought of returning the
two parts explicitly, but then tended to prefer the version that
behaved exactly like get() in the case the object is found.

... Nice interesting ideas! ... 
let the proposals and consideration flow ...

Martin


> John

> P.S. if you like dromedaries call it valueIfNotFound ...

:-) ;-)  
I don't .. as I said above, I already strongly dislike more than one hump. 
[ Each capital is one key stroke ("Shift") more ,
  and each "_" is two key strokes more on most key boards...,
  and I do like identifiers that I can also quickly pronounce on
  the phone or in teaching .. ]

>  ..
>  John P. Nolan
>  Math/Stat Department
>  227 Gray Hall,   American University
>  4400 Massachusetts Avenue, NW
>  Washington, DC 20016-8050
>  ..


> -"R-devel"  wrote: - 
> To: Martin Maechler , R-devel@r-project.org
> From: Duncan Murdoch 
> Sent by: "R-devel" 
> Date: 01/08/2015 06:39AM
> Subject: Re: [Rd] RFC: getifexists() {was [Bug 16065] "exists" ...}

> On 08/01/2015 4:16 AM, Martin Maechler wrote:
> > In November, we had a "bug repository conversation"
> > with Peter Hagerty and myself:
> > 
> >   https://bugs.r-project.org/bugzilla/show_bug.cgi?id=1606

Re: [Rd] RFC: getifexists() {was [Bug 16065] "exists" ...}

2015-01-08 Thread luke-tierney

On Thu, 8 Jan 2015, Michael Lawrence wrote:


If we do add an argument to get(), then it should be named consistently
with the ifnotfound argument of mget(). As mentioned, the possibility of a
NULL value is problematic. One solution is a sentinel value that indicates
an unbound value (like R_UnboundValue).


A null default is fine -- it's a default; if it isn't right for a
particular case you can provide something else.



But another idea (and one pretty similar to John's) is to follow the SYMSXP
design at the C level, where there is a structure that points to the name
and a value. We already have SYMSXPs at the R level of course (name
objects) but they do not provide access to the value, which is typically
R_UnboundValue. But this does not even need to be implemented with SYMSXP.
The design would allow something like:

binding <- getBinding("x", env)
if (hasValue(binding)) {
 x <- value(binding) # throws an error if none
 message(name(binding), "has value", x)
}

That I think it is a bit verbose but readable and could be made fast. And I
think binding objects would be useful in other ways, as they are
essentially a "named object". For example, when iterating over an
environment.


This would need a lot more thought. Directly exposing the internals is
definitely not something we want to do as we may well want to change
that design. But there are lots of other corner issues that would have
to be thought through before going forward, such as what happens if an
rm occurs between obtaining a binding object and doing something with
it. Serialization would also need thinking through. This doesn't seem
like a worthwhile place to spend our efforts to me.

Adding getIfExists, or .get, or get0, or whatever seems fine. Adding
an argument to get() with missing giving current behavior may be OK
too. Rewriting exists and get as .Primitives may be sufficient though.

Best,

luke



Michael




On Thu, Jan 8, 2015 at 6:03 AM, John Nolan  wrote:


Adding an optional argument to get (and mget) like

val <- get(name, where, ..., value.if.not.found=NULL )   (*)

would be useful for many.  HOWEVER, it is possible that there could be
some confusion here: (*) can give a NULL because either x exists and
has value NULL, or because x doesn't exist.   If that matters, the user
would need to be careful about specifying a value.if.not.found that cannot
be confused with a valid value of x.

To avoid this difficulty, perhaps we want both: have Martin's getifexists(
)
return a list with two values:
  - a boolean variable 'found'  # = value returned by exists( )
  - a variable 'value'

Then implement get( ) as:

get <- function(x,...,value.if.not.found ) {

  if( missing(value.if.not.found) ) {
a <- getifexists(x,... )
if (!a$found) error("x not found")
  } else {
a <- getifexists(x,...,value.if.not.found )
  }
  return(a$value)
}

Note that value.if.not.found has no default value in above.
It behaves exactly like current get does if value.if.not.found
is not specified, and if it is specified, it would be faster
in the common situation mentioned below:
 if(exists(x,...)) { get(x,...) }

John

P.S. if you like dromedaries call it valueIfNotFound ...

 ..
 John P. Nolan
 Math/Stat Department
 227 Gray Hall,   American University
 4400 Massachusetts Avenue, NW
 Washington, DC 20016-8050

 jpno...@american.edu   voice: 202.885.3140
 web: academic2.american.edu/~jpnolan
 ..


-"R-devel"  wrote: -
To: Martin Maechler , R-devel@r-project.org
From: Duncan Murdoch
Sent by: "R-devel"
Date: 01/08/2015 06:39AM
Subject: Re: [Rd] RFC: getifexists() {was [Bug 16065] "exists" ...}

On 08/01/2015 4:16 AM, Martin Maechler wrote:
> In November, we had a "bug repository conversation"
> with Peter Hagerty and myself:
>
>   https://bugs.r-project.org/bugzilla/show_bug.cgi?id=16065
>
> where the bug report title started with
>
>  --->>  "exists" is a bottleneck for dispatch and package loading, ...
>
> Peter proposed an extra simplified and henc faster version of exists(),
> and I commented
>
> > --- Comment #2 from Martin Maechler 
---
> > I'm very grateful that you've started exploring the bottlenecks of
loading
> > packages with many S4 classes (and methods)...
> > and I hope we can make real progress there rather sooner than
later.
>
> > OTOH, your `summaryRprof()` in your vignette indicates that
exists() may use
> > upto 10% of the time spent in library(reportingTools),  and your
speedup
> > proposals of exist()  may go up to ca 30%  which is good and well
worth
> > considering,  but still we can only expect 2-3% speedup

Re: [Rd] RFC: getifexists() {was [Bug 16065] "exists" ...}

2015-01-08 Thread Jeroen Ooms
On Thu, Jan 8, 2015 at 6:36 AM, Duncan Murdoch  wrote:
>> val <- get(name, where, ..., value.if.not.found=NULL )   (*)
>
> That would be a bad idea, as it would change behaviour of existing uses of
> get().

Another approach would be if the "not found" behavior consists of a
callback, e.g. an expression or function:

  get(name, where, ..., not.found=stop("object ", name, " not found"))

This would cover the case of not.found=NULL, but also allows for
writing code with syntax similar to tryCatch

  obj <- get("foo", not.found = someDefaultValue())

Not sure what this would do to performance though.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] RFC: getifexists() {was [Bug 16065] "exists" ...}

2015-01-08 Thread Peter Haverty
 in above.
>>> It behaves exactly like current get does if value.if.not.found
>>> is not specified, and if it is specified, it would be faster
>>> in the common situation mentioned below:
>>>  if(exists(x,...)) { get(x,...) }
>>>
>>> John
>>>
>>> P.S. if you like dromedaries call it valueIfNotFound ...
>>>
>>>  ..
>>>  John P. Nolan
>>>  Math/Stat Department
>>>  227 Gray Hall,   American University
>>>  4400 Massachusetts Avenue, NW
>>>  Washington, DC 20016-8050
>>>
>>>  jpno...@american.edu   voice: 202.885.3140
>>>  web: academic2.american.edu/~jpnolan
>>>  ..
>>>
>>>
>>> -"R-devel"  wrote: -
>>> To: Martin Maechler , R-devel@r-project.org
>>> From: Duncan Murdoch
>>> Sent by: "R-devel"
>>> Date: 01/08/2015 06:39AM
>>> Subject: Re: [Rd] RFC: getifexists() {was [Bug 16065] "exists" ...}
>>>
>>> On 08/01/2015 4:16 AM, Martin Maechler wrote:
>>> > In November, we had a "bug repository conversation"
>>> > with Peter Hagerty and myself:
>>> >
>>> >   https://bugs.r-project.org/bugzilla/show_bug.cgi?id=16065
>>> >
>>> > where the bug report title started with
>>> >
>>> >  --->>  "exists" is a bottleneck for dispatch and package loading, ...
>>> >
>>> > Peter proposed an extra simplified and henc faster version of exists(),
>>> > and I commented
>>> >
>>> > > --- Comment #2 from Martin Maechler 
>>> ---
>>> > > I'm very grateful that you've started exploring the bottlenecks
>>> of
>>> loading
>>> > > packages with many S4 classes (and methods)...
>>> > > and I hope we can make real progress there rather sooner than
>>> later.
>>> >
>>> > > OTOH, your `summaryRprof()` in your vignette indicates that
>>> exists() may use
>>> > > upto 10% of the time spent in library(reportingTools),  and your
>>> speedup
>>> > > proposals of exist()  may go up to ca 30%  which is good and well
>>> worth
>>> > > considering,  but still we can only expect 2-3% speedup for
>>> package loading
>>> > > which unfortunately is not much.
>>> >
>>> > > Still I agree it is worth looking at exists() as you did  ... and
>>> > > consider providing a fast simplified version of it in addition to
>>> current
>>> > > exists() [I think].
>>> >
>>> > > BTW, as we talk about enhancements here, maybe consider a further
>>> possibility:
>>> > > My subjective guess is that probably more than half of exists()
>>> uses are of the
>>> > > form
>>> >
>>> > > if(exists(name, where, ...)) {
>>> > >get(name, whare, )
>>> > >..
>>> > > } else {
>>> > > NULL / error() / .. or similar
>>> > > }
>>> >
>>> > > i.e. many exists() calls when returning TRUE are immediately
>>> followed by the
>>> > > corresponding get() call which repeats quite a bit of the lookup
>>> that exists()
>>> > > has done.
>>> >
>>> > > Instead, I'd imagine a function, say  getifexists(name, ...) that
>>> does both at
>>> > > once in the "exists is TRUE" case but in a way we can easily keep
>>> the if(.) ..
>>> > > else clause above.  One already existing approach would use
>>> >
>>> > > if(!inherits(tryCatch(xx <- get(name, where, ...),
>>> error=function(e)e), "error")) {
>>> >
>>> > >   ... (( work with xx )) ...
>>> >
>>> > > } else  {
>>> > >NULL / error() / .. or similar
>>> > > }
>>> >
>>> > > but of course our C implementation would be more efficient and
>>> use
>>> more concise
>>> > > syntax {which should not look like error handling}.   Follow ups
>>> to this idea
>>> > > should really go to R-deve

Re: [Rd] RFC: getifexists() {was [Bug 16065] "exists" ...}

2015-01-08 Thread Peter Haverty
Michael's idea has an interesting bonus that he and I discussed earlier.
It would be very convenient to have a container of key/value pairs.  I
imagine many people often write this:

x - mapply( names(x), x, FUN=function(k,v) { # work with key and value }

especially ex perl people accustomed to

while ( ($key, $value) = each( some_hash ) { }

Perhaps there is room for additional discussion of using lists of SYMSXPs
in this manner. (If SYMSXPs are not that safe, perhaps a looping construct
for named vectors that gave the illusion iterating over a list of
two-tuples.)



Pete


Peter M. Haverty, Ph.D.
Genentech, Inc.
phave...@gene.com

On Thu, Jan 8, 2015 at 11:57 AM,  wrote:

> On Thu, 8 Jan 2015, Michael Lawrence wrote:
>
>  If we do add an argument to get(), then it should be named consistently
>> with the ifnotfound argument of mget(). As mentioned, the possibility of a
>> NULL value is problematic. One solution is a sentinel value that indicates
>> an unbound value (like R_UnboundValue).
>>
>
> A null default is fine -- it's a default; if it isn't right for a
> particular case you can provide something else.
>
>
>> But another idea (and one pretty similar to John's) is to follow the
>> SYMSXP
>> design at the C level, where there is a structure that points to the name
>> and a value. We already have SYMSXPs at the R level of course (name
>> objects) but they do not provide access to the value, which is typically
>> R_UnboundValue. But this does not even need to be implemented with SYMSXP.
>> The design would allow something like:
>>
>> binding <- getBinding("x", env)
>> if (hasValue(binding)) {
>>  x <- value(binding) # throws an error if none
>>  message(name(binding), "has value", x)
>> }
>>
>> That I think it is a bit verbose but readable and could be made fast. And
>> I
>> think binding objects would be useful in other ways, as they are
>> essentially a "named object". For example, when iterating over an
>> environment.
>>
>
> This would need a lot more thought. Directly exposing the internals is
> definitely not something we want to do as we may well want to change
> that design. But there are lots of other corner issues that would have
> to be thought through before going forward, such as what happens if an
> rm occurs between obtaining a binding object and doing something with
> it. Serialization would also need thinking through. This doesn't seem
> like a worthwhile place to spend our efforts to me.
>
> Adding getIfExists, or .get, or get0, or whatever seems fine. Adding
> an argument to get() with missing giving current behavior may be OK
> too. Rewriting exists and get as .Primitives may be sufficient though.
>
> Best,
>
> luke
>
>
>  Michael
>>
>>
>>
>>
>> On Thu, Jan 8, 2015 at 6:03 AM, John Nolan  wrote:
>>
>>  Adding an optional argument to get (and mget) like
>>>
>>> val <- get(name, where, ..., value.if.not.found=NULL )   (*)
>>>
>>> would be useful for many.  HOWEVER, it is possible that there could be
>>> some confusion here: (*) can give a NULL because either x exists and
>>> has value NULL, or because x doesn't exist.   If that matters, the user
>>> would need to be careful about specifying a value.if.not.found that
>>> cannot
>>> be confused with a valid value of x.
>>>
>>> To avoid this difficulty, perhaps we want both: have Martin's
>>> getifexists(
>>> )
>>> return a list with two values:
>>>   - a boolean variable 'found'  # = value returned by exists( )
>>>   - a variable 'value'
>>>
>>> Then implement get( ) as:
>>>
>>> get <- function(x,...,value.if.not.found ) {
>>>
>>>   if( missing(value.if.not.found) ) {
>>> a <- getifexists(x,... )
>>> if (!a$found) error("x not found")
>>>   } else {
>>> a <- getifexists(x,...,value.if.not.found )
>>>   }
>>>   return(a$value)
>>> }
>>>
>>> Note that value.if.not.found has no default value in above.
>>> It behaves exactly like current get does if value.if.not.found
>>> is not specified, and if it is specified, it would be faster
>>> in the common situation mentioned below:
>>>      if(exists(x,...)) { get(x,...) }
>>>
>>> John
>>>
>>> P.S. if you like dromedaries call it valueIfNotFound ...
>>>
>>>  ..
>&g

Re: [Rd] RFC: getifexists() {was [Bug 16065] "exists" ...}

2015-01-08 Thread Michael Lawrence
On Thu, Jan 8, 2015 at 11:57 AM,  wrote:

> On Thu, 8 Jan 2015, Michael Lawrence wrote:
>
>  If we do add an argument to get(), then it should be named consistently
>> with the ifnotfound argument of mget(). As mentioned, the possibility of a
>> NULL value is problematic. One solution is a sentinel value that indicates
>> an unbound value (like R_UnboundValue).
>>
>
> A null default is fine -- it's a default; if it isn't right for a
> particular case you can provide something else.
>
>
>> But another idea (and one pretty similar to John's) is to follow the
>> SYMSXP
>> design at the C level, where there is a structure that points to the name
>> and a value. We already have SYMSXPs at the R level of course (name
>> objects) but they do not provide access to the value, which is typically
>> R_UnboundValue. But this does not even need to be implemented with SYMSXP.
>> The design would allow something like:
>>
>> binding <- getBinding("x", env)
>> if (hasValue(binding)) {
>>  x <- value(binding) # throws an error if none
>>  message(name(binding), "has value", x)
>> }
>>
>> That I think it is a bit verbose but readable and could be made fast. And
>> I
>> think binding objects would be useful in other ways, as they are
>> essentially a "named object". For example, when iterating over an
>> environment.
>>
>
> This would need a lot more thought. Directly exposing the internals is
> definitely not something we want to do as we may well want to change
> that design. But there are lots of other corner issues that would have
> to be thought through before going forward, such as what happens if an
> rm occurs between obtaining a binding object and doing something with
> it. Serialization would also need thinking through. This doesn't seem
> like a worthwhile place to spend our efforts to me.
>
>

Just wanted to be clear that I was not suggesting to expose any internals.
We could implement the behavior using SYMSXP, or not. Nor would the binding
need to be mutable. The binding would be considered independent of the
environment from which it was retrieved. As Pete has mentioned, it could be
a useful abstraction to have in general.


> Adding getIfExists, or .get, or get0, or whatever seems fine. Adding
> an argument to get() with missing giving current behavior may be OK
> too. Rewriting exists and get as .Primitives may be sufficient though.
>
> Best,
>
> luke
>
>
>  Michael
>>
>>
>>
>>
>> On Thu, Jan 8, 2015 at 6:03 AM, John Nolan  wrote:
>>
>>  Adding an optional argument to get (and mget) like
>>>
>>> val <- get(name, where, ..., value.if.not.found=NULL )   (*)
>>>
>>> would be useful for many.  HOWEVER, it is possible that there could be
>>> some confusion here: (*) can give a NULL because either x exists and
>>> has value NULL, or because x doesn't exist.   If that matters, the user
>>> would need to be careful about specifying a value.if.not.found that
>>> cannot
>>> be confused with a valid value of x.
>>>
>>> To avoid this difficulty, perhaps we want both: have Martin's
>>> getifexists(
>>> )
>>> return a list with two values:
>>>   - a boolean variable 'found'  # = value returned by exists( )
>>>   - a variable 'value'
>>>
>>> Then implement get( ) as:
>>>
>>> get <- function(x,...,value.if.not.found ) {
>>>
>>>   if( missing(value.if.not.found) ) {
>>> a <- getifexists(x,... )
>>> if (!a$found) error("x not found")
>>>   } else {
>>> a <- getifexists(x,...,value.if.not.found )
>>>   }
>>>   return(a$value)
>>> }
>>>
>>> Note that value.if.not.found has no default value in above.
>>> It behaves exactly like current get does if value.if.not.found
>>> is not specified, and if it is specified, it would be faster
>>> in the common situation mentioned below:
>>>      if(exists(x,...)) { get(x,...) }
>>>
>>> John
>>>
>>> P.S. if you like dromedaries call it valueIfNotFound ...
>>>
>>>  ..
>>>  John P. Nolan
>>>  Math/Stat Department
>>>  227 Gray Hall,   American University
>>>  4400 Massachusetts Avenue, NW
>>>  Washington, DC 20016-8050
>>>
>>>  jpno...@american.edu   voice: 202.885.3140
>>>  web: academic2.american.edu/~jpnolan
&g

Re: [Rd] RFC: getifexists() {was [Bug 16065] "exists" ...}

2015-01-09 Thread Martin Maechler
> Michael Lawrence 
> on Thu, 8 Jan 2015 14:02:26 -0800 writes:

> On Thu, Jan 8, 2015 at 11:57 AM,  wrote:
>> On Thu, 8 Jan 2015, Michael Lawrence wrote:
>> 
>> If we do add an argument to get(), then it should be named consistently
>>> with the ifnotfound argument of mget(). As mentioned, the possibility 
of a
>>> NULL value is problematic. One solution is a sentinel value that 
indicates
>>> an unbound value (like R_UnboundValue).
>>> 
>> 
>> A null default is fine -- it's a default; if it isn't right for a
>> particular case you can provide something else.
>> 
>> 
>>> But another idea (and one pretty similar to John's) is to follow the
>>> SYMSXP
>>> design at the C level, where there is a structure that points to the 
name
>>> and a value. We already have SYMSXPs at the R level of course (name
>>> objects) but they do not provide access to the value, which is typically
>>> R_UnboundValue. But this does not even need to be implemented with 
SYMSXP.
>>> The design would allow something like:
>>> 
>>> binding <- getBinding("x", env)
>>> if (hasValue(binding)) {
>>>   x <- value(binding) # throws an error if none
>>>   message(name(binding), "has value", x)
>>> }
>>> 
>>> That I think it is a bit verbose but readable and could be made fast. 
And
>>> I
>>> think binding objects would be useful in other ways, as they are
>>> essentially a "named object". For example, when iterating over an
>>> environment.
>>> 
>> 
>> This would need a lot more thought. Directly exposing the internals is
>> definitely not something we want to do as we may well want to change
>> that design. But there are lots of other corner issues that would have
>> to be thought through before going forward, such as what happens if an
>> rm occurs between obtaining a binding object and doing something with
>> it. Serialization would also need thinking through. This doesn't seem
>> like a worthwhile place to spend our efforts to me.
>> 
>> 

> Just wanted to be clear that I was not suggesting to expose any internals.
> We could implement the behavior using SYMSXP, or not. Nor would the 
binding
> need to be mutable. The binding would be considered independent of the
> environment from which it was retrieved. As Pete has mentioned, it could 
be
> a useful abstraction to have in general.

It could be, indeed.   Luke's advice (above) and my own gut
feeling do tell me that this is a much larger step than solving
the "getIfExists()" problem.  
In the R development cycle I'd think that it should go to the
next (2015-2016) "3.3" cycle, rather than the current "3.2" one
with goal in April.

>> Adding getIfExists, or .get, or get0, or whatever seems fine. Adding
>> an argument to get() with missing giving current behavior may be OK
>> too. Rewriting exists and get as .Primitives may be sufficient though.

Thank you, Luke.  Given that, Duncan's and the other inputs,
I think we should go for a new function -- .Internal() for now.

To Pete's point about arguments, I did drop 'frame' on purpose 
and indeed we could try to do away with 'where/pos' as well and
have the environment only specified by 'envir'.

Name: I like  get0() for its brevity and prefer it to .get().

Let me expose my current implementation on R-devel ... and start
using it in the 'methods' package so we (Pete H. :-) can start
measuring its impact.

Martin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] RFC: getifexists() {was [Bug 16065] "exists" ...}

2015-01-09 Thread Martin Maechler
> Martin Maechler 
> on Fri, 9 Jan 2015 14:00:38 +0100 writes:

> Michael Lawrence 
> on Thu, 8 Jan 2015 14:02:26 -0800 writes:

>> On Thu, Jan 8, 2015 at 11:57 AM,  wrote:
>>> On Thu, 8 Jan 2015, Michael Lawrence wrote:
>>> 
>>> If we do add an argument to get(), then it should be named consistently
 with the ifnotfound argument of mget(). 

You are right... I forgot to say so earlier in the thread.

The definition now is

get0 <- function (x, envir = pos.to.env(-1L), mode = "any", inherits = TRUE,
  ifnotfound = NULL)
.Internal(get0(x, envir, mode, inherits, ifnotfound))



 As mentioned, the possibility of a
 NULL value is problematic. One solution is a sentinel value that 
indicates
 an unbound value (like R_UnboundValue).
 
>>> 
>>> A null default is fine -- it's a default; if it isn't right for a
>>> particular case you can provide something else.
>>> 

[..]

>>> Adding getIfExists, or .get, or get0, or whatever seems fine. Adding
>>> an argument to get() with missing giving current behavior may be OK
>>> too. Rewriting exists and get as .Primitives may be sufficient though.

> Thank you, Luke.  Given that, Duncan's and the other inputs,
> I think we should go for a new function -- .Internal() for now.

> To Pete's point about arguments, I did drop 'frame' on purpose 
> and indeed we could try to do away with 'where/pos' as well and
> have the environment only specified by 'envir'.

> Name: I like  get0() for its brevity and prefer it to .get().

> Let me expose my current implementation on R-devel ... and start
> using it in the 'methods' package so we (Pete H. :-) can start
> measuring its impact.

I have now committed  get0() to R-devel  (svn rev 67386) 

which is already using it in quite a few places:
in 'base', notably in base/R/namespace.R   where it may speedup, also
in 'methods' in quite a few places also in the hope of some S4
speedup.

{{Now I feel having deserved some weekend break ...}}

Martin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] RFC: getifexists() {was [Bug 16065] "exists" ...}

2015-01-09 Thread Peter Haverty
Fantastic. I'm eager to try it out.  Thanks for seeing this through.

Regards,

Pete


Peter M. Haverty, Ph.D.
Genentech, Inc.
phave...@gene.com

On Fri, Jan 9, 2015 at 7:37 AM, Martin Maechler 
wrote:

> > Martin Maechler 
> > on Fri, 9 Jan 2015 14:00:38 +0100 writes:
>
> > Michael Lawrence 
> > on Thu, 8 Jan 2015 14:02:26 -0800 writes:
>
> >> On Thu, Jan 8, 2015 at 11:57 AM,  wrote:
> >>> On Thu, 8 Jan 2015, Michael Lawrence wrote:
> >>>
> >>> If we do add an argument to get(), then it should be named
> consistently
>  with the ifnotfound argument of mget().
>
> You are right... I forgot to say so earlier in the thread.
>
> The definition now is
>
> get0 <- function (x, envir = pos.to.env(-1L), mode = "any", inherits =
> TRUE,
>   ifnotfound = NULL)
> .Internal(get0(x, envir, mode, inherits, ifnotfound))
>
>
>
>  As mentioned, the possibility of a
>  NULL value is problematic. One solution is a sentinel value that
> indicates
>  an unbound value (like R_UnboundValue).
> 
> >>>
> >>> A null default is fine -- it's a default; if it isn't right for a
> >>> particular case you can provide something else.
> >>>
>
> [..]
>
> >>> Adding getIfExists, or .get, or get0, or whatever seems fine.
> Adding
> >>> an argument to get() with missing giving current behavior may be OK
> >>> too. Rewriting exists and get as .Primitives may be sufficient
> though.
>
> > Thank you, Luke.  Given that, Duncan's and the other inputs,
> > I think we should go for a new function -- .Internal() for now.
>
> > To Pete's point about arguments, I did drop 'frame' on purpose
> > and indeed we could try to do away with 'where/pos' as well and
> > have the environment only specified by 'envir'.
>
> > Name: I like  get0() for its brevity and prefer it to .get().
>
> > Let me expose my current implementation on R-devel ... and start
> > using it in the 'methods' package so we (Pete H. :-) can start
> > measuring its impact.
>
> I have now committed  get0() to R-devel  (svn rev 67386)
>
> which is already using it in quite a few places:
> in 'base', notably in base/R/namespace.R   where it may speedup, also
> in 'methods' in quite a few places also in the hope of some S4
> speedup.
>
> {{Now I feel having deserved some weekend break ...}}
>
> Martin
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] RFC: getifexists() {was [Bug 16065] "exists" ...}

2015-01-09 Thread Peter Haverty
Here are some quick measurements of Martin's accomplishment with "get0":

In loading the package GenomicRanges, 30K calls to "exists" have been
skipped.  (However 99K still remain!)
Overall, the current usage of "get0" seems to save us 10% in package
loading time (no error bars on that measurement).
microbenchmark says that

env = asNamespace("base"); get0("match", env)

is a 6X speedup over the same call to "get", which is pretty neat by
itself.  It might be good to just generally use get0.
Unless of course, when one really doesn't need any "exists" checking and
NULL results are fine, then the .Primitive "[[" is 30X faster than "get".
Thanks everyone for your thoughts, code and time on this topic!





Pete


Peter M. Haverty, Ph.D.
Genentech, Inc.
phave...@gene.com

On Fri, Jan 9, 2015 at 7:37 AM, Martin Maechler 
wrote:

> > Martin Maechler 
> > on Fri, 9 Jan 2015 14:00:38 +0100 writes:
>
> > Michael Lawrence 
> > on Thu, 8 Jan 2015 14:02:26 -0800 writes:
>
> >> On Thu, Jan 8, 2015 at 11:57 AM,  wrote:
> >>> On Thu, 8 Jan 2015, Michael Lawrence wrote:
> >>>
> >>> If we do add an argument to get(), then it should be named
> consistently
>  with the ifnotfound argument of mget().
>
> You are right... I forgot to say so earlier in the thread.
>
> The definition now is
>
> get0 <- function (x, envir = pos.to.env(-1L), mode = "any", inherits =
> TRUE,
>   ifnotfound = NULL)
> .Internal(get0(x, envir, mode, inherits, ifnotfound))
>
>
>
>  As mentioned, the possibility of a
>  NULL value is problematic. One solution is a sentinel value that
> indicates
>  an unbound value (like R_UnboundValue).
> 
> >>>
> >>> A null default is fine -- it's a default; if it isn't right for a
> >>> particular case you can provide something else.
> >>>
>
> [..]
>
> >>> Adding getIfExists, or .get, or get0, or whatever seems fine.
> Adding
> >>> an argument to get() with missing giving current behavior may be OK
> >>> too. Rewriting exists and get as .Primitives may be sufficient
> though.
>
> > Thank you, Luke.  Given that, Duncan's and the other inputs,
> > I think we should go for a new function -- .Internal() for now.
>
> > To Pete's point about arguments, I did drop 'frame' on purpose
> > and indeed we could try to do away with 'where/pos' as well and
> > have the environment only specified by 'envir'.
>
> > Name: I like  get0() for its brevity and prefer it to .get().
>
> > Let me expose my current implementation on R-devel ... and start
> > using it in the 'methods' package so we (Pete H. :-) can start
> > measuring its impact.
>
> I have now committed  get0() to R-devel  (svn rev 67386)
>
> which is already using it in quite a few places:
> in 'base', notably in base/R/namespace.R   where it may speedup, also
> in 'methods' in quite a few places also in the hope of some S4
> speedup.
>
> {{Now I feel having deserved some weekend break ...}}
>
> Martin
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] RFC: getifexists() {was [Bug 16065] "exists" ...}

2015-01-09 Thread Hervé Pagès

Hi,

On 01/08/2015 07:02 AM, Martin Maechler wrote:



Adding an optional argument to get (and mget) like
val <- get(name, where, ..., value.if.not.found=NULL )   (*)



would be useful for many.  HOWEVER, it is possible that there could be
some confusion here: (*) can give a NULL because either x exists and
has value NULL, or because x doesn't exist.   If that matters, the user
would need to be careful about specifying a value.if.not.found that cannot
be confused with a valid value of x.


Exactly -- well, of course: That problem { NULL can be the legit value of what 
you
want to get() } was the only reason to have a 'value.if.not' argument at all.

Note that this is not about a universal replacement of
the  if(exists(..)) { .. get(..) } idiom,


FWIW, if(exists(..)) { x <- get(..) } is not safe because it's not
atomic. I've seen situations where exists() returns TRUE but then
get() fails to find the symbol (even if called immediately after
exists()).

After scratching my head for a while I found out that the symbol was
removed by some finalizer function defined somewhere (not on the 
environment exists() and gets() were looking at, of course). And

since garbage collection is triggered between the moment exists() sees
the symbol and get() tries to get it, the finalizer was executed and
the symbol removed.

After that, I started to systematically use x <- try(get(...)) instead
(which is atomic).

Cheers,
H.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel