I ran into an interesting issue with `evalq` (and also
`eval(quote(...))`):

     f <- function() {
       list(
         sys.parent(1),
         evalq(sys.parent(1)),
         evalq((function() sys.parent(2))()),  # add an anon fun layer
         evalq((function() sys.parent(1))())
       )
     }
     res <- f()
     str(res)
     ## List of 4
     ##  $ : int 0         # sys.parent(1)
     ##  $ : int 2         # evalq(sys.parent(1))
     ##  $ : int 0         # evalq((function() sys.parent(2))())
     ##  $ : int 1         # evalq((function() sys.parent(1))())

In order of least to most surprising:

1. `sys.parent(1)` and `evalq(sys.parent(1))` are not the same
2. `evalq(sys.parent(1))` and `evalq((function() sys.parent(2))())`
   are not the same
3. `evalq((function() sys.parent(1))())` returns a lower frame number
   than `evalq(sys.parent(1))`

The root cause of this is that the `evalq` **closure** sets a context,
but then the C-level `do_eval` it invokes sets another one[1] with the
new `evalq` context as the calling frame (`cptr->sysparent`)[2].  This
then interacts with how `sys.parent` resolves parents when a target
frame appears more than once in the context stack.  `sys.parent`
returns the oldest context that matches[3], and in this case `f`'s
frame appears twice because `evalq` adds it via `do_eval`.

One option is to change what `sysparent` of the `evalq` `envir`.
For example, if we set it to be the same as it would be for commands
outside the `evalq` we get:

     str(res)
     ## List of 4
     ##  $ : int 0         # sys.parent(1)
     ##  $ : int 0         # evalq(sys.parent(1))
     ##  $ : int 0         # evalq((function() sys.parent(2))())
     ##  $ : int 1         # evalq((function() sys.parent(1))())

There is precedent for doing this in S3 generics and their methods
where method `sysparent` is set to be that of the generic.  Now
`evalq` no longer interferes with the resolution of calling frames.
It seems reasonable to set evaluation environments without affecting
what the calling frame is. Indeed that happens when we do something like
`environment(fun) <- blah` as the calling frame is unaffected when `fun` is
invoked.

I attach a patch that implements this change.  The patch is a
hack-job intended solely for illustrative purposes, though it does
pass `make check-all` on a current version of r-devel.  I also ran the
`rlang` tests as those probably push the envelope in this area.  There
only one failed with 2,613 passing.  The failed one is for a
deprecated function that was specifically checking for the repeated
`evalq` contexts[7].

I also attach a document with additional examples and commentary for
those interested.

Best,

Brodie.

PS: for a loosely related issue see #15531[8].

[1]: https://github.com/wch/r-source/blob/tags/R-4-0-0/src/main/eval.c#L3329
[2]: https://github.com/wch/r-source/blob/tags/R-4-0-0/src/main/context.c#L260
[3]: https://github.com/wch/r-source/blob/tags/R-4-0-0/src/main/context.c#L433
[4]: https://github.com/wch/r-source/blob/tags/R-4-0-0/src/main/eval.c#L1815
[5]: https://cran.r-project.org/doc/manuals/r-devel/R-ints.html#Contexts
[6]: https://bugs.r-project.org/bugzilla/show_bug.cgi?id=15531
[7]: 
https://github.com/r-lib/rlang/blob/v0.4.6/tests/testthat/test-retired.R#L437
[8]: https://bugs.r-project.org/bugzilla/show_bug.cgi?id=15531


Index: src/library/base/R/eval.R
===================================================================
--- src/library/base/R/eval.R   (revision 78619)
+++ src/library/base/R/eval.R   (working copy)
@@ -23,7 +23,7 @@
     function(expr, envir = parent.frame(),
             enclos = if(is.list(envir) || is.pairlist(envir))
                        parent.frame() else baseenv())
-    .Internal(eval(expr, envir, enclos))
+    .Internal(eval(expr, envir, enclos, parent.frame(2L)))
 
 eval.parent <- function(expr, n = 1) {
     p <- parent.frame(n + 1)
@@ -33,7 +33,7 @@
 evalq <-
     function (expr, envir = parent.frame(), enclos = if (is.list(envir) ||
     is.pairlist(envir)) parent.frame() else baseenv())
-      .Internal(eval(substitute(expr), envir, enclos))
+     .Internal(eval(substitute(expr), envir, enclos, parent.frame(2L)))
 
 new.env <- function (hash = TRUE, parent = parent.frame(), size = 29L)
     .Internal(new.env(hash, parent, size))
Index: src/library/base/baseloader.R
===================================================================
--- src/library/base/baseloader.R       (revision 78619)
+++ src/library/base/baseloader.R       (working copy)
@@ -97,7 +97,7 @@
 
     ..lazyLoad(basedb, baseenv())
 
-}), .Internal(new.env(FALSE, baseenv(), 29L)), baseenv()))
+}), .Internal(new.env(FALSE, baseenv(), 29L)), baseenv(), baseenv()))
 
 ## keep in sync with R/zzz.R
 as.numeric <- as.double
Index: src/main/eval.c
===================================================================
--- src/main/eval.c     (revision 78619)
+++ src/main/eval.c     (working copy)
@@ -3267,7 +3267,7 @@
 
 SEXP attribute_hidden do_eval(SEXP call, SEXP op, SEXP args, SEXP rho)
 {
-    SEXP encl, x;
+    SEXP encl, x, caller;
     volatile SEXP expr, env, tmp;
 
     int frame;
@@ -3277,6 +3277,7 @@
     expr = CAR(args);
     env = CADR(args);
     encl = CADDR(args);
+    caller = CADDDR(args);
     SEXPTYPE tEncl = TYPEOF(encl);
     if (isNull(encl)) {
        /* This is supposed to be defunct, but has been kept here
@@ -3330,7 +3331,7 @@
     if (TYPEOF(expr) == LANGSXP || TYPEOF(expr) == SYMSXP || isByteCode(expr)) 
{
        PROTECT(expr);
        begincontext(&cntxt, CTXT_RETURN, R_GlobalContext->call,
-                    env, rho, args, op);
+                    env, caller, args, op);
        if (!SETJMP(cntxt.cjmpbuf))
            expr = eval(expr, env);
        else {
@@ -3350,7 +3351,7 @@
        PROTECT(expr);
        tmp = R_NilValue;
        begincontext(&cntxt, CTXT_RETURN, R_GlobalContext->call,
-                    env, rho, args, op);
+                    env, caller, args, op);
        if (!SETJMP(cntxt.cjmpbuf)) {
            int n = LENGTH(expr);
            for(int i = 0 ; i < n ; i++) {
Index: src/main/names.c
===================================================================
--- src/main/names.c    (revision 78619)
+++ src/main/names.c    (working copy)
@@ -703,7 +703,7 @@
 {"dyn.unload", do_dynunload,   0,      111,    1,      {PP_FUNCALL, PREC_FN,   
0}},
 {"ls",         do_ls,          1,      11,     3,      {PP_FUNCALL, PREC_FN,   
0}},
 {"typeof",     do_typeof,      1,      11,     1,      {PP_FUNCALL, PREC_FN,   
0}},
-{"eval",       do_eval,        0,      211,    3,      {PP_FUNCALL, PREC_FN,   
0}},
+{"eval",       do_eval,        0,      211,    4,      {PP_FUNCALL, PREC_FN,   
0}},
 {"returnValue",   do_returnValue,0,     11,     1,      {PP_FUNCALL, PREC_FN,  
0}},
 {"sys.parent", do_sys,         1,      11,     -1,     {PP_FUNCALL, PREC_FN,   
0}},
 {"sys.call",   do_sys,         2,      11,     -1,     {PP_FUNCALL, PREC_FN,   
0}},
Let's run our prior example with more supporting data (pre-patch).  We
use a single `evalq` so that the frames we will inspect now are
consistent:

    f <- function() {
      c(list(sys.parent(1)),
        evalq(
          list(
            sys.parent(1),
            (function() sys.parent(2))(),
            (function() sys.parent(1))(),
            status=list(
              sys.funs=sapply(sys.calls(), '[[', 1),
              sys.frames=sys.frames(), sys.parents=sys.parents()
            )
    ) ) ) }
    res <- f()

First, the output of the various `sys.parent` which is unchanged:

    str(res[1:4])
     ## List of 4
     ##  $ : int 0         # sys.parent(1)
     ##  $ : int 2         # evalq(sys.parent(1))
     ##  $ : int 0         # evalq((function() sys.parent(2))())
     ##  $ : int 1         # evalq((function() sys.parent(1))())

And the additional data we recorded:

    str(res[['status']])
    ## List of 3
    ##  $ sys.funs   :List of 3
    ##   ..$ : symbol f
    ##   ..$ : symbol evalq
    ##   ..$ : symbol evalq
    ##  $ sys.frames :Dotted pair list of 3
    ##   ..$ :<environment: 0x7f962b9c78f0>      # <<< Original
    ##   ..$ :<environment: 0x7f962b9c7bc8>
    ##   ..$ :<environment: 0x7f962b9c78f0>      # <<< Duplicate
    ##  $ sys.parents: int [1:3] 0 1 2

Notice how `f`'s evaluation environment (<0x7f962b9c78f0>) shows up
twice, first as itself at position 1 in the frame stack, and then
again at position 3 because we `evalq`ed in that environment.

The reason `(f() sys.parent(1))()` returns the evaluation
environment of `f`, frame #1, and by extension `(f() sys.parent(2))()`
returns frame #0 (i.e. global env) is because `R_sysparent` looks for
every frame (`cptr->cloenv`) in the context stack that matches the
`cptr->sysparent` frame, **and** returns the oldest one[1].

The `cptr->sysparent` for the anonymous function in `(f()
sys.parent(1))` is frame #3 (<0x7f962b9c78f0>), but `sys.parent` finds
the earlier reference to it at position #1, and returns that.  The
`sysparent` for `sys.parent(1)` is instead frame #2
(<0x7f962b9c7bc8>), the `evalq` closure evaluation environment.  Since
that is unique `sys.parent` returns that frame number.

That this happens seems like an unfortunate implementation detail
(e.g. it would not happen if `evalq` where a special).  If that frame
were not eligible to be returned there would be no inconsistency.  Are
there situations where retrieving that particular frame is desirable?
I'm having a hard time thinking of any.

If we use a new frame for the `envir` of `evalq` we get:

     f <- function() {
       e <- new.env()
       list(
         sys.parent(1),
         evalq(sys.parent(1), e),
         evalq((function() sys.parent(2))(), e),
         evalq((function() sys.parent(1))(), e)
       )
     }
     res <- f()
     str(res)
     ## List of 4
     ##  $ : int 0         # sys.parent(1)
     ##  $ : int 2         # evalq(sys.parent(1), e)
     ##  $ : int 2         # evalq((function() sys.parent(2))(), e)
     ##  $ : int 3         # evalq((function() sys.parent(1))(), e)

At least the last three are now internally consistent, but two of them
point to the `evalq` closure frame, which is not particularly useful.
With the patch applied we get:

    ## List of 4
    ##  $ : int 0         # sys.parent(1)
    ##  $ : int 0         # evalq(sys.parent(1), e)
    ##  $ : int 0         # evalq((function() sys.parent(2))(), e)
    ##  $ : int 3         # evalq((function() sys.parent(1))(), e)

The last is the new environment we created.  This makes sense as that
is the environment that `(function() sys.parent(1))` is called in.

We'll run a few more tests with and without the patch applied.  First
the original example, but with an additional layer added to remove the
global environment from the results as that is the fallback frame
which could be hiding bugs:

    f <- function() {
      list(
        sys.parent(1),
        evalq(sys.parent(1)),
        evalq((function() sys.parent(2))()),  # add an anon fun layer
        evalq((function() sys.parent(1))())
      )
    }
    g <- function() f()
    res <- g()
    str(res)
    ## R.4.0.1 (beta)         With Patch
    ## List of 4              List of 4
    ##  $ : int 1              $ : int 1
    ##  $ : int 3              $ : int 1
    ##  $ : int 1              $ : int 1
    ##  $ : int 2              $ : int 2

Now explicitly setting an evaluation environment:

    f <- function() {
      e <- new.env()
      list(
        sys.parent(1),
        evalq(sys.parent(1), e),
        evalq((function() sys.parent(2))(), e),
        evalq((function() sys.parent(1))(), e)
      )
    }
    res <- g()
    str(res)
    ## R.4.0.1 (beta)         With Patch
    ## List of 4              List of 4
    ##  $ : int 1              $ : int 1
    ##  $ : int 3              $ : int 1
    ##  $ : int 3              $ : int 1
    ##  $ : int 4              $ : int 4

Setting an evaluation environment that already exists in the context
stack.  This triggers the `sys.parent` behavior of finding the
earliest instance of the `cptr->sysparent` frame in the stack:

    f <- function(e) {
      list(
        sys.parent(1),
        evalq(sys.parent(1), e),
        evalq((function() sys.parent(2))(), e),
        evalq((function() sys.parent(1))(), e)
      )
    }
    g <- function(e) f(e)
    h <- function() g(environment())
    i <- function() h()
    res <- i()
    str(res)
    ## R.4.0.1 (beta)         With Patch
    ## List of 4              List of 4
    ##  $ : int 3              $ : int 3
    ##  $ : int 5              $ : int 3
    ##  $ : int 1              $ : int 1
    ##  $ : int 2              $ : int 2

    f <- function(e) {
      list(
        sys.parent(1),
        evalq(sys.parent(1), e),
        evalq((function() sys.parent(2))(), e),
        evalq((function() sys.parent(1))(), e),
        evalq(sys.frames(), e)
      )
    }
    g <- function(e) f(e)
    h <- function() g(environment())
    i <- function() h()
    res <- i()

The seemingly surprising one here is that 
`evalq((function() sys.parent(1))(), e)` returns an earlier frame than
the first two.  This happens because the `sys.parent(1)` inside the
anonymous function resolves to `e`, and `e` shows up both as the
`evalq` `envir` and as the frame of `h` (frame #2).  Again,
`sys.parent` returns the oldest frame that matches the
`cptr->sysparent` of the frame it is evaluated in, so it returns frame
#2 instead of the frame set by `do_eval`.  The first and second
expressions return the calling frame of `f`, which is `g`'s evaluation
environment.  Since that environment is unique on the stack
`sys.parent` stops there.

The patch I propose prevents `evalq` (and `eval`) from affecting what
the calling frame is.  This makes sense to me because `evalq` is not a
function call.  Internally though R explicitly sets function contexts
for `evalq`, as demonstrated by the fact that it's possible to set
`on.exit` and `return` that don't affect the environment the `evalq`
call is evaluated in.  While it is useful to be able to set `on.exit`
on e.g. a `local` call, this is an undocumented implementation.  Given
that this is undocumented it doesn't seem sufficient to require that
`evalq` and `eval` affect the calling frame.


[1]: https://github.com/wch/r-source/blob/tags/R-4-0-0/src/main/context.c#L433
______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Reply via email to