All,

So that suggests that .GlobalEnv[["X"]] is more efficient than get("X", pos=1L).

What about .GlobalEnv[["X"]] <- value, compared to assign("X", value)?

Dave
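
One rough way to check the assignment question, as a sketch rather than a proper benchmark: time both forms in a loop with base R's system.time(). The names env, n, and value are made up for this example, and a fresh environment stands in for .GlobalEnv so the workspace is left alone; substitute globalenv() to test the global case directly.

```r
# Sketch only: compare assign() with `[[<-` on an environment.
# `env`, `n`, and `value` are hypothetical names chosen for this example;
# `env` is a stand-in for .GlobalEnv (an assumption, see the note above).
env <- new.env()
n <- 1e5L
value <- 1

# assign() is an ordinary R closure that handles its arguments and then
# calls .Internal(assign(...)).
t_assign <- system.time(
  for (i in seq_len(n)) assign("X", value, envir = env)
)

# `[[<-` is a primitive, so there is no R-level wrapper in the way.
t_bracket <- system.time(
  for (i in seq_len(n)) env[["X"]] <- value
)

# Both forms leave the same binding behind.
stopifnot(identical(get("X", envir = env, inherits = FALSE), value))

# Elapsed seconds for each form; the numbers depend on machine and R version.
print(c(assign = t_assign[["elapsed"]], bracket = t_bracket[["elapsed"]]))
```

A loop under system.time() is coarser than microbenchmark(), which the timings quoted below use, so treat any small difference as indicative only.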
On Wed, Dec 3, 2014 at 3:30 PM, Peter Haverty <haverty.pe...@gene.com> wrote:
> Thanks Winston! I'm amazed that "[[" beats calling the .Internal
> directly. I guess the difference between .Primitive vs. .Internal is
> pretty significant for things on this time scale.
>
> NULL meaning NULL and NULL meaning undefined would lead to the same path
> for much of my code. I'll be swapping out many exists and get calls later
> today. Thanks!
>
> I do still think it would be very useful to have some way to discriminate
> the two NULL cases. I'm reminded of how perl does the same thing. It's
> been a while, but it was something like
>
> if (defined(x{'c'})) { print x{'c'}; } # This is still two lookups, but it has the "defined" concept.
>
> or maybe even
>
> if (defined( foo = x{'c'} ) ) { print foo; }
>
>
> Thanks again for the timings!
>
>
> Pete
>
> ____________________
> Peter M. Haverty, Ph.D.
> Genentech, Inc.
> phave...@gene.com
>
> On Wed, Dec 3, 2014 at 12:48 PM, Winston Chang <winstoncha...@gmail.com> wrote:
>
> > I've looked at related speed issues in the past, and have a couple
> > related points to add. (I've put the info below at
> > http://rpubs.com/wch/46428.)
> >
> > There's a significant amount of overhead just from calling the R
> > function get(). This is true even when you skip the pos argument and
> > provide envir. For example, if you call get(), it takes much more time
> > than .Internal(get()), which is what get() does.
> >
> > If you already know that the object exists in an environment, it's
> > faster to use e$x, and slightly faster still to use e[["x"]]:
> >
> > e <- new.env()
> > e$a <- 1
> >
> > # Accessing objects in environments
> > microbenchmark(
> >   get("a", e, inherits = FALSE),
> >   get("a", envir = e, inherits = FALSE),
> >   .Internal(get("a", e, "any", FALSE)),
> >   e$a,
> >   e[["a"]],
> >   .Primitive("[[")(e, "a"),
> >
> >   unit = "us"
> > )
> > #>   median name
> > #> 1 1.0300 get("a", e, inherits = FALSE)
> > #> 2 0.9425 get("a", envir = e, inherits = FALSE)
> > #> 3 0.3080 .Internal(get("a", e, "any", FALSE))
> > #> 4 0.2305 e$a
> > #> 5 0.1740 e[["a"]]
> > #> 6 0.2905 .Primitive("[[")(e, "a")
> >
> >
> > A similar thing happens with exists(): the R function wrapper adds
> > significant overhead on top of .Internal(exists()). It's also faster
> > to use $ and [[, then test for NULL, but of course this won't
> > distinguish between objects that don't exist, and those that do exist
> > but have a NULL value:
> >
> > # Test for existence of `a` (which exists), and `c` (which doesn't)
> > microbenchmark(
> >   exists('a', e, inherits = FALSE),
> >   exists('a', envir = e, inherits = FALSE),
> >   .Internal(exists('a', e, 'any', FALSE)),
> >   'a' %in% ls(e, all.names = TRUE),
> >   is.null(e[['a']]),
> >   is.null(e$a),
> >
> >   exists('c', e, inherits = FALSE),
> >   exists('c', envir = e, inherits = FALSE),
> >   .Internal(exists('c', e, 'any', FALSE)),
> >   'c' %in% ls(e, all.names = TRUE),
> >   is.null(e[['c']]),
> >   is.null(e$c),
> >
> >   unit = "us"
> > )
> > #>    median name
> > #> 1  1.2015 exists("a", e, inherits = FALSE)
> > #> 2  1.0545 exists("a", envir = e, inherits = FALSE)
> > #> 3  0.3615 .Internal(exists("a", e, "any", FALSE))
> > #> 4  7.6345 "a" %in% ls(e, all.names = TRUE)
> > #> 5  0.3055 is.null(e[["a"]])
> > #> 6  0.3270 is.null(e$a)
> > #> 7  1.1890 exists("c", e, inherits = FALSE)
> > #> 8  1.0370 exists("c", envir = e, inherits = FALSE)
> > #> 9  0.3465 .Internal(exists("c", e, "any", FALSE))
> > #> 10 7.5475 "c" %in% ls(e, all.names = TRUE)
> > #> 11 0.2675 is.null(e[["c"]])
> > #> 12 0.3010 is.null(e$c)
> >
> >
> > -Winston
> >
> > On Tue, Dec 2, 2014 at 8:46 PM, Peter Haverty <haverty.pe...@gene.com> wrote:
> > > Hi All,
> > >
> > > I've been looking into speeding up the loading of packages that use a lot
> > > of S4. After profiling I noticed the "exists" function accounts for a
> > > surprising fraction of the time. I have some thoughts about speeding up
> > > exists (below). More to the point of this post, Martin Mächler noted that
> > > 'exists' and 'get' are often used in conjunction. Both functions are
> > > different usages of the do_get C function, so it's a pity to run that twice.
> > >
> > > "get" gives an error when a symbol is not found, so you can't just do a
> > > 'get'. With R's C library, one might do
> > >
> > > /* findVarInFrame3 takes (rho, symbol, doGet) */
> > > SEXP x = findVarInFrame3(env, symbol, TRUE);
> > > if (x != R_UnboundValue) {
> > >   // do stuff with x
> > > }
> > >
> > > It would be very convenient to have something like this at the R level. We
> > > don't want to do any tryCatch stuff or to add args to get (That would kill
> > > any speed advantage. The overhead for handling redundant args accounts for
> > > 30% of the time used by "exists"). Michael Lawrence and I worked out that
> > > we need a function that returns either the desired object, or something
> > > that represents R_UnboundValue. We also need a very cheap way to check if
> > > something equals this new R_UnboundValue. This might look like
> > >
> > > if (defined(x <- fetch(symbol, env))) {
> > >   do_stuff_with_x(x)
> > > }
> > >
> > > A few more thoughts about "exists":
> > >
> > > Moving the bit of R in the exists function to C saves 10% of the time.
> > > Dropping the redundant pos and frame args entirely saves 30% of the time
> > > used by this function.
> > > I suggest that the arguments of both get and exists should
> > > be simplified to (x, envir, mode, inherits). The existing C code handles
> > > numeric, character, and environment input for where. The arg frame is
> > > rarely used (0/128 exists calls in the methods package). Users that need to
> > > can call sys.frame themselves. get already lacks a frame argument and the
> > > manpage for exists notes that envir is only there for backwards
> > > compatibility. Let's deprecate the extra args in exists and get and perhaps
> > > move the extra argument handling to C in the interim. Similarly, the
> > > "assign" function does nothing with the "immediate" argument.
> > >
> > > I'd be interested to hear if there is any support for a "fetch"-like
> > > function (and/or deprecating some unused arguments).
> > >
> > > All the best,
> > > Pete
> > >
> > >
> > > Pete
> > >
> > > ____________________
> > > Peter M. Haverty, Ph.D.
> > > Genentech, Inc.
> > > phave...@gene.com
> > >
> > > [[alternative HTML version deleted]]
> > >
> > > ______________________________________________
> > > R-devel@r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-devel