Re: [Rd] removeSource() vs. function literals
On Fri, 31 Mar 2023 08:49:53 +0200 Lionel Henry wrote:

> If you can afford a dependency on rlang, `rlang::zap_srcref()` deals
> with this. It's recursive over expression vectors, calls (including
> calls to `function` and their hidden srcref arg), and function
> objects.

Thanks for the suggestion! I hope that the source reference argument in
the `function` calls is the last way a source reference could sneak by
in an expression obtained using substitute().

> It's implemented in C for efficiency as we found it to be a
> bottleneck in some applications (IIRC caching). I'd be happy to
> upstream this in base if R core is interested.

A removeSource() that handles all the corner cases would be a nice
improvement.

--
Best regards,
Ivan

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] removeSource() vs. function literals
Thanks for the comments and sorry I didn't reply sooner!

On Thu, 30 Mar 2023 12:38:24 -0400 Duncan Murdoch wrote:

> You'd need to recurse through all expressions in the object. Some of
> those expressions might be environments, so your changes could leak
> out of the function you're working on.

In my efforts to get arbitrary objects to hash consistently, I already
walk them recursively, but I do stop at environments. (In theory, it
could be possible to create mock environments with the same
relationships between each other and then "fix up" and hash their
contents, but it's hard to do right. Imagine environments e1 and e2
where e1$other <- e2 and e2$other <- e1.)

I think that removeSource() already walks language objects recursively,
it just doesn't remove source references from unevaluated function
expressions.

> Things are simpler if you know the expression is the unmodified
> result of parsing source code, but if you know that, wouldn't you
> usually be able to control things by setting keep.source = FALSE?

I receive the expression object from substitute(). The idea is to hash
the expression, locate and hash its dependencies and then see if
there's already a file named like the resulting hash. In theory, the
user could be constructing elaborate scary-looking expressions and then
calling my function on them, but I think I can be reasonably certain I
get the calls straight from the parser. Unfortunately, this doesn't put
me in a position to be controlling options(keep.source=...).

> Maybe a workable solution is something like parse(deparse(expr,
> control = "exact"), keep.source = FALSE).

Thanks for this idea! At some point I was considering hashing text
representations of objects, but then I got serialize()-hashing working
and forgot about it.

--
Best regards,
Ivan
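The recursive walk described above, stopping at environments, could be sketched roughly as follows. `strip_function_srcrefs()` is a hypothetical helper written for illustration; it is not the code from depcache or rlang. It only blanks the hidden fourth element of calls to `function` and leaves srcref attributes to removeSource(). Function objects (closures) are returned untouched.

```r
strip_function_srcrefs <- function(x) {
  # Symbols (including the empty symbol used for missing arguments) and
  # environments are returned as-is; environments are never entered.
  if (is.name(x) || is.environment(x)) return(x)

  if (typeof(x) == "pairlist") {
    # Formals: default values may themselves contain function literals.
    out <- as.list(x)
    for (i in seq_along(out)) out[i] <- list(strip_function_srcrefs(out[[i]]))
    return(as.pairlist(out))
  }

  if (is.call(x)) {
    # The head of a call is a symbol, so compare against a symbol,
    # not against the string "function".
    if (identical(x[[1]], as.name("function")) && length(x) >= 4L)
      x[4] <- list(NULL)  # keep the slot, set it to NULL
    for (i in seq_along(x)) x[i] <- list(strip_function_srcrefs(x[[i]]))
    return(x)
  }

  x  # constants, closures and anything else are left alone
}

src <- "lapply(xs, function(x, f = function() 1) x[1, ])"
e <- parse(text = src, keep.source = TRUE)[[1]]
clean <- strip_function_srcrefs(e)
class(e[[3]][[4]])
is.null(clean[[3]][[4]])
```

Assigning with `x[i] <- list(value)` rather than `x[[i]] <- value` is deliberate: it keeps NULL elements (such as empty formals) in place instead of deleting them.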
Re: [Rd] removeSource() vs. function literals
On 3/31/23 08:49, Lionel Henry via R-devel wrote:

> If you can afford a dependency on rlang, `rlang::zap_srcref()` deals
> with this. It's recursive over expression vectors, calls (including
> calls to `function` and their hidden srcref arg), and function
> objects.
>
> It's implemented in C for efficiency as we found it to be a
> bottleneck in some applications (IIRC caching). I'd be happy to
> upstream this in base if R core is interested.

That would be very helpful. When having to implement caching, I have
been hit by this issue several times in the past, too (before
rlang::zap_srcref() existed).

Regards,
Denes
Re: [Rd] removeSource() vs. function literals
If you can afford a dependency on rlang, `rlang::zap_srcref()` deals
with this. It's recursive over expression vectors, calls (including
calls to `function` and their hidden srcref arg), and function
objects.

It's implemented in C for efficiency as we found it to be a
bottleneck in some applications (IIRC caching). I'd be happy to
upstream this in base if R core is interested.

Best,
Lionel
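A short usage sketch of the function mentioned above, assuming the rlang package is installed. Only the call to `rlang::zap_srcref()` comes from the message; the example expression and the `has_srcref()` helper are made up for illustration.

```r
if (requireNamespace("rlang", quietly = TRUE)) {
  e <- parse(text = "lapply(xs, function(x) x)", keep.source = TRUE)[[1]]
  clean <- rlang::zap_srcref(e)

  # Hypothetical helper: is any element of a call still a srcref object?
  has_srcref <- function(call) {
    any(vapply(as.list(call), inherits, logical(1), what = "srcref"))
  }

  has_srcref(e[[3]])      # the inner `function` call as parsed
  has_srcref(clean[[3]])  # the same call after zap_srcref()
}
```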
Re: [Rd] removeSource() vs. function literals
On 30/03/2023 10:32 a.m., Ivan Krylov wrote:

> Dear R-devel,
>
> In a package of mine, I use removeSource on expression objects in order
> to make expressions that are semantically the same serialize to the
> same byte sequences:
> https://github.com/cran/depcache/blob/854d68a/R/fixup.R#L8-L34
>
> Today I learned that expressions containing function definitions also
> contain the source references for the functions, not as an attribute,
> but as a separate argument to the `function` call:
>
> str(quote(function() NULL)[[4]])
> # 'srcref' int [1:8] 1 11 1 25 11 25 1 1
> # - attr(*, "srcfile")=Classes 'srcfilecopy', 'srcfile'
> #
>
> This means that removeSource() on an expression that would define a
> function when evaluated doesn't actually remove the source reference
> from the object.
>
> Do you think it would be appropriate to teach removeSource() to remove
> such source references? What could be a good way to implement that?
> if (is.call(fn) && identical(fn[[1]], 'function')) fn[[4]] <- NULL
> sounds too arbitrary. if (inherits(fn, 'srcref')) return(NULL) sounds
> too broad.

I don't think there's a simple way to do that. Functions can define
functions within themselves. If you're talking about code that was
constructed by messing with language objects, it could contain both
function objects and calls to `function` to construct them. You'd need
to recurse through all expressions in the object. Some of those
expressions might be environments, so your changes could leak out of
the function you're working on.

Things are simpler if you know the expression is the unmodified result
of parsing source code, but if you know that, wouldn't you usually be
able to control things by setting keep.source = FALSE?

Maybe a workable solution is something like parse(deparse(expr, control
= "exact"), keep.source = FALSE). Wouldn't work on environments or
various exotic types, but would probably warn you if it wasn't working.

Duncan Murdoch
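The deparse/parse round trip suggested above might look like the following. This is a sketch under assumptions: `strip_by_reparse()` is a hypothetical helper name, the deparsed text is handed to parse() through its `text` argument (the first positional argument of parse() is `file`), and the "exact" option set is only available in reasonably recent versions of R. Because "exact" includes `quoteExpressions`, the deparsed call should come back wrapped in quote(), which the helper unwraps.

```r
strip_by_reparse <- function(expr) {
  stopifnot(is.call(expr))
  txt <- deparse(expr, control = "exact")
  out <- parse(text = txt, keep.source = FALSE)[[1]]
  # Unwrap the quote() added by the quoteExpressions deparse option.
  if (is.call(out) && identical(out[[1]], as.name("quote"))) out <- out[[2]]
  out
}

src <- "lapply(xs, function(x) toupper(x))"
e <- parse(text = src, keep.source = TRUE)[[1]]
clean <- strip_by_reparse(e)
class(e[[3]][[4]])        # srcref carried by the parsed `function` call
is.null(clean[[3]][[4]])  # blank after the round trip
```

As noted in the message, this approach is not expected to cope with environments or other objects that have no faithful text representation.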
[Rd] removeSource() vs. function literals
Dear R-devel,

In a package of mine, I use removeSource on expression objects in order
to make expressions that are semantically the same serialize to the
same byte sequences:
https://github.com/cran/depcache/blob/854d68a/R/fixup.R#L8-L34

Today I learned that expressions containing function definitions also
contain the source references for the functions, not as an attribute,
but as a separate argument to the `function` call:

str(quote(function() NULL)[[4]])
# 'srcref' int [1:8] 1 11 1 25 11 25 1 1
# - attr(*, "srcfile")=Classes 'srcfilecopy', 'srcfile'
#

This means that removeSource() on an expression that would define a
function when evaluated doesn't actually remove the source reference
from the object.

Do you think it would be appropriate to teach removeSource() to remove
such source references? What could be a good way to implement that?
if (is.call(fn) && identical(fn[[1]], 'function')) fn[[4]] <- NULL
sounds too arbitrary. if (inherits(fn, 'srcref')) return(NULL) sounds
too broad.

--
Best regards,
Ivan
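The effect described above can be reproduced without depending on session options. A minimal sketch, assuming base R only; the variable names `a`, `b`, and `ref` are illustrative. It parses the same function literal at two different positions with `keep.source = TRUE` and compares the hidden fourth element of the resulting `function` calls.

```r
# Parse the same function literal twice; the second copy sits on line 2.
a <- parse(text = "function() NULL", keep.source = TRUE)[[1]]
b <- parse(text = "\nfunction() NULL", keep.source = TRUE)[[1]]

# A call to `function` has four elements: the symbol `function`, the
# formals, the body, and (when sources are kept) a srcref.
length(a)
class(a[[4]])

# The two calls mean the same thing but carry different srcrefs, so they
# are not identical and do not serialize to the same bytes.
identical(a, b)

# Blanking the fourth element makes both match what the parser produces
# with keep.source = FALSE. `x[4] <- list(NULL)` keeps the slot and sets
# it to NULL.
a[4] <- list(NULL)
b[4] <- list(NULL)
ref <- parse(text = "function() NULL", keep.source = FALSE)[[1]]
identical(a, ref) && identical(b, ref)
```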