Re: [Rd] unlink() on "~" removes the home directory
On 2/26/20 14:47, Gábor Csárdi wrote: !!! DON'T TRY THE CODE IN THIS EMAIL AT HOME !!! Ok I'll try it at work on my boss's computer, sounds a lot safer. H. Well, unlink() does what it is supposed to do, so you could argue that there is nothing wrong with it. Also, nobody would call unlink() on "~", right? The situation is not so simple, however. E.g. if you happen to have a directory called "~", and you iterate over all files and directories to selectively remove some of them, then your code might end up calling unlink on the local "~" directory, and then your home is gone. But you would not create a directory named "~", that is just asking for trouble. Well, surely, _intentionally_ you would not do that. Unintentionally, you might. E.g. something like this is enough: # Create a subpath within a base directory badfun <- function(base = ".", path) { dir.create(file.path(base, path), recursive = TRUE, showWarnings = FALSE) } badfun(path = "~/foo") (If you did run this, be very careful how you remove the directory called "~"!) A real example is `R CMD build` which deletes the home directory of the current user if the root of the package contains a non-empty "~" directory. Luckily this is now fixed in R-devel, so R 4.0.0 will do better. (R 3.6.3 will not.) See https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_wch_r-2Dsource_commit_1d4f7aa1dac427ea2213d1f7cd7b5c16e896af22=DwICAg=eRAMFD45gAfqt84VtBcfhQ=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA=C3VCGF44o7jATPOlC8aZhaT4YGU1JtcOixJKZgu6KyI=iWNt-0G2gZa99bnOqNBMOHph0NyVoJdsIwuA07GhJZQ= I have seen several bug reports about various packages (that call R CMD build) removing the home directory, so this indeed happens in practice to a number of people. The commit above will fix `R CMD build`, but it would be great to "fix" this in general. It seems pretty hard to prevent users from creating of a "~" directory. But preventing unlink() from deleting "~" does not actually seem too hard. If unlink() could just refuse removing "~" (when expand = TRUE), that would be great. It seems to me that the current behavior is very-very rarely intended, and its consequences are potentially disastrous. If unlink("~", recursive = TRUE) errors, you can still remove a local "~" file/dir with unlink("./~", ...). And you can still remove your home directory if you really want to do that, with unlink(path.expand("~"), ...). So no functionality is lost. Also, if anyone is aware of packages/functions that tend to create "~" directories or files, please let me know. I would be happy to submit a patch for the new unlink("~") behavior. Thanks, Gabor __ R-devel@r-project.org mailing list https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_mailman_listinfo_r-2Ddevel=DwICAg=eRAMFD45gAfqt84VtBcfhQ=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA=C3VCGF44o7jATPOlC8aZhaT4YGU1JtcOixJKZgu6KyI=FeZWU9uN-HwDNkSBOmbYXiGqu8q8-U6DI-ddyUn7HHw= -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpa...@fredhutch.org Phone: (206) 667-5791 Fax:(206) 667-1319 __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] unlink() on "~" removes the home directory
!!! DON'T TRY THE CODE IN THIS EMAIL AT HOME !!! Well, unlink() does what it is supposed to do, so you could argue that there is nothing wrong with it. Also, nobody would call unlink() on "~", right? The situation is not so simple, however. E.g. if you happen to have a directory called "~", and you iterate over all files and directories to selectively remove some of them, then your code might end up calling unlink on the local "~" directory, and then your home is gone. But you would not create a directory named "~", that is just asking for trouble. Well, surely, _intentionally_ you would not do that. Unintentionally, you might. E.g. something like this is enough: # Create a subpath within a base directory badfun <- function(base = ".", path) { dir.create(file.path(base, path), recursive = TRUE, showWarnings = FALSE) } badfun(path = "~/foo") (If you did run this, be very careful how you remove the directory called "~"!) A real example is `R CMD build` which deletes the home directory of the current user if the root of the package contains a non-empty "~" directory. Luckily this is now fixed in R-devel, so R 4.0.0 will do better. (R 3.6.3 will not.) See https://github.com/wch/r-source/commit/1d4f7aa1dac427ea2213d1f7cd7b5c16e896af22 I have seen several bug reports about various packages (that call R CMD build) removing the home directory, so this indeed happens in practice to a number of people. The commit above will fix `R CMD build`, but it would be great to "fix" this in general. It seems pretty hard to prevent users from creating of a "~" directory. But preventing unlink() from deleting "~" does not actually seem too hard. If unlink() could just refuse removing "~" (when expand = TRUE), that would be great. It seems to me that the current behavior is very-very rarely intended, and its consequences are potentially disastrous. If unlink("~", recursive = TRUE) errors, you can still remove a local "~" file/dir with unlink("./~", ...). And you can still remove your home directory if you really want to do that, with unlink(path.expand("~"), ...). So no functionality is lost. Also, if anyone is aware of packages/functions that tend to create "~" directories or files, please let me know. I would be happy to submit a patch for the new unlink("~") behavior. Thanks, Gabor __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] RIOT 2020
If people want to create a new interpreter (for R or any other data-driven programming language), or do something closely related (such as adapt an existing interpreter), I think a better strategy would be to focus on real time computing. I note that Oracle who appears to be sponsoring this event, also acquired ChorusOS (via their acquisition of Sun). I think they should release ChorusOS under an open source license, and consider investing into that. On Wed, Feb 26, 2020 at 5:41 AM Stepan wrote: > > I hope you don’t mind us using this mailing list for a small > advertisement, but we think it is most relevant for this group: > > We'd like to invite you to RIOT 2020 - the 5rd workshop on R > Implementation, Optimization and Tooling [1]. It will take place > co-located with, and during, useR! 2020 in St. Louis on July 8th. RIOT > is an excellent venue for deep technical discussions about R > implementations, tools, optimizations and R extension, and will be very > interesting for anyone interested in what’s under the hood of R. > > Regards, > Stepan Sindelar, Lukas Stadler (Oracle Labs), Jan Vitek (Northeastern), > Alexander Bertram (BeDataDriven) > > [1] http://riotworkshop.github.io/ > > __ > R-devel@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Profiling: attributing costs to place of invocation (instead of place of evaluation)?
There's a patch under review: https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17595 Best, Lionel On 2/26/20, Kirill Müller wrote: > Hi > > > Consider the following example: > > f <- function(expr) g(expr) > g <- function(expr) { > h(expr) > } > h <- function(expr) { > expr # evaluation happens here > i(expr) > } > i <- function(expr) { > expr # already evaluated, no costs here > invisible() > } > > rprof <- tempfile() > Rprof(rprof) > f(replicate(1e2, sample.int(1e4))) > Rprof(NULL) > cat(readLines(rprof), sep = "\n") > #> sample.interval=2 > #> "sample.int" "FUN" "lapply" "sapply" "replicate" "h" "g" "f" > #> "sample.int" "FUN" "lapply" "sapply" "replicate" "h" "g" "f" > #> "sample.int" "FUN" "lapply" "sapply" "replicate" "h" "g" "f" > > The evaluation of the slow replicate() call is deferred to the execution > of h(), but there's no replicate() call in h's definition. This makes > parsing the profile much more difficult than necessary. > > I have pasted an experimental patch below (off of 3.6.2) that produces > the following output: > > cat(readLines(rprof), sep = "\n") > #> sample.interval=2 > #> "sample.int" "FUN" "lapply" "sapply" "replicate" "f" > #> "sample.int" "FUN" "lapply" "sapply" "replicate" "f" > #> "sample.int" "FUN" "lapply" "sapply" "replicate" "f" > > This attributes the cost to the replicate() call to f(), where the call > is actually defined. From my experience, this will give a much better > understanding of the actual costs of each part of the code. The SIGPROF > handler looks at sysparent and cloenv before deciding if an element of > the call stack is to be included in the profile. > > Is there interest in integrating a variant of this patch, perhaps with > an optional argument to Rprof()? > > Thanks! > > > Best regards > > Kirill > > > Index: src/main/eval.c > === > --- src/main/eval.c (revision 77857) > +++ src/main/eval.c (working copy) > @@ -218,7 +218,10 @@ > if (R_Line_Profiling) > lineprof(buf, R_getCurrentSrcref()); > > + SEXP sysparent = NULL; > + > for (cptr = R_GlobalContext; cptr; cptr = cptr->nextcontext) { > + if (sysparent != NULL && cptr->cloenv != sysparent && > cptr->sysparent != sysparent) continue; > if ((cptr->callflag & (CTXT_FUNCTION | CTXT_BUILTIN)) > && TYPEOF(cptr->call) == LANGSXP) { > SEXP fun = CAR(cptr->call); > @@ -292,6 +295,8 @@ > else > lineprof(buf, cptr->srcref); > } > + > + sysparent = cptr->sysparent; > } > } > } > > __ > R-devel@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel > __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] Profiling: attributing costs to place of invocation (instead of place of evaluation)?
Hi Consider the following example: f <- function(expr) g(expr) g <- function(expr) { h(expr) } h <- function(expr) { expr # evaluation happens here i(expr) } i <- function(expr) { expr # already evaluated, no costs here invisible() } rprof <- tempfile() Rprof(rprof) f(replicate(1e2, sample.int(1e4))) Rprof(NULL) cat(readLines(rprof), sep = "\n") #> sample.interval=2 #> "sample.int" "FUN" "lapply" "sapply" "replicate" "h" "g" "f" #> "sample.int" "FUN" "lapply" "sapply" "replicate" "h" "g" "f" #> "sample.int" "FUN" "lapply" "sapply" "replicate" "h" "g" "f" The evaluation of the slow replicate() call is deferred to the execution of h(), but there's no replicate() call in h's definition. This makes parsing the profile much more difficult than necessary. I have pasted an experimental patch below (off of 3.6.2) that produces the following output: cat(readLines(rprof), sep = "\n") #> sample.interval=2 #> "sample.int" "FUN" "lapply" "sapply" "replicate" "f" #> "sample.int" "FUN" "lapply" "sapply" "replicate" "f" #> "sample.int" "FUN" "lapply" "sapply" "replicate" "f" This attributes the cost to the replicate() call to f(), where the call is actually defined. From my experience, this will give a much better understanding of the actual costs of each part of the code. The SIGPROF handler looks at sysparent and cloenv before deciding if an element of the call stack is to be included in the profile. Is there interest in integrating a variant of this patch, perhaps with an optional argument to Rprof()? Thanks! Best regards Kirill Index: src/main/eval.c === --- src/main/eval.c (revision 77857) +++ src/main/eval.c (working copy) @@ -218,7 +218,10 @@ if (R_Line_Profiling) lineprof(buf, R_getCurrentSrcref()); + SEXP sysparent = NULL; + for (cptr = R_GlobalContext; cptr; cptr = cptr->nextcontext) { + if (sysparent != NULL && cptr->cloenv != sysparent && cptr->sysparent != sysparent) continue; if ((cptr->callflag & (CTXT_FUNCTION | CTXT_BUILTIN)) && TYPEOF(cptr->call) == LANGSXP) { SEXP fun = CAR(cptr->call); @@ -292,6 +295,8 @@ else lineprof(buf, cptr->srcref); } + + sysparent = cptr->sysparent; } } } __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel