[Rd] Comments requested on changedFiles function
In a number of places internal to R, we need to know which files have
changed (e.g. after building a vignette). I've just written a general
purpose function "changedFiles" that I'll probably commit to R-devel.
Comments on the design (or bug reports) would be appreciated.

The source for the function and the Rd page for it are inline below.

------------------------------ changedFiles.R:

changedFiles <- function(snapshot, timestamp = tempfile("timestamp"),
                         file.info = NULL, md5sum = FALSE,
                         full.names = FALSE, ...) {
    dosnapshot <- function(args) {
        fullnames <- do.call(list.files, c(full.names = TRUE, args))
        names <- do.call(list.files, c(full.names = full.names, args))
        if (isTRUE(file.info) ||
            (is.character(file.info) && length(file.info))) {
            info <- file.info(fullnames)
            rownames(info) <- names
            if (isTRUE(file.info))
                file.info <- c("size", "isdir", "mode", "mtime")
        } else
            info <- data.frame(row.names=names)
        if (md5sum)
            info <- data.frame(info, md5sum = tools::md5sum(fullnames))
        list(info = info, timestamp = timestamp, file.info = file.info,
             md5sum = md5sum, full.names = full.names, args = args)
    }
    if (missing(snapshot) || !inherits(snapshot, "changedFilesSnapshot")) {
        if (length(timestamp) == 1)
            file.create(timestamp)
        if (missing(snapshot)) snapshot <- "."
        pre <- dosnapshot(list(path = snapshot, ...))
        pre$pre <- pre$info
        pre$info <- NULL
        pre$wd <- getwd()
        class(pre) <- "changedFilesSnapshot"
        return(pre)
    }

    if (missing(timestamp)) timestamp <- snapshot$timestamp
    if (missing(file.info) || isTRUE(file.info))
        file.info <- snapshot$file.info
    if (identical(file.info, FALSE)) file.info <- NULL
    if (missing(md5sum)) md5sum <- snapshot$md5sum
    if (missing(full.names)) full.names <- snapshot$full.names

    pre <- snapshot$pre
    savewd <- getwd()
    on.exit(setwd(savewd))
    setwd(snapshot$wd)

    args <- snapshot$args
    newargs <- list(...)
    args[names(newargs)] <- newargs
    post <- dosnapshot(args)$info
    prenames <- rownames(pre)
    postnames <- rownames(post)

    added <- setdiff(postnames, prenames)
    deleted <- setdiff(prenames, postnames)
    common <- intersect(prenames, postnames)

    if (length(file.info)) {
        preinfo <- pre[common, file.info]
        postinfo <- post[common, file.info]
        changes <- preinfo != postinfo
    } else
        changes <- matrix(logical(0), nrow = length(common), ncol = 0,
                          dimnames = list(common, character(0)))
    if (length(timestamp))
        changes <- cbind(changes,
                         Newer = file_test("-nt", common, timestamp))
    if (md5sum) {
        premd5 <- pre[common, "md5sum"]
        postmd5 <- post[common, "md5sum"]
        changes <- cbind(changes, md5sum = premd5 != postmd5)
    }
    changes1 <- changes[rowSums(changes, na.rm = TRUE) > 0, , drop = FALSE]
    changed <- rownames(changes1)
    structure(list(added = added, deleted = deleted, changed = changed,
                   unchanged = setdiff(common, changed),
                   changes = changes),
              class = "changedFiles")
}

print.changedFilesSnapshot <- function(x, ...) {
    cat("changedFiles snapshot:\n timestamp = \"", x$timestamp,
        "\"\n file.info = ",
        if (length(x$file.info))
            paste(paste0('"', x$file.info, '"'), collapse=","),
        "\n md5sum = ", x$md5sum,
        "\n args = ", deparse(x$args, control = NULL), "\n", sep="")
    x
}

print.changedFiles <- function(x, ...) {
    if (length(x$added))
        cat("Files added:\n", paste0("  ", x$added, collapse="\n"), "\n",
            sep="")
    if (length(x$deleted))
        cat("Files deleted:\n", paste0("  ", x$deleted, collapse="\n"), "\n",
            sep="")
    changes <- x$changes
    changes <- changes[rowSums(changes, na.rm = TRUE) > 0, , drop=FALSE]
    changes <- changes[, colSums(changes, na.rm = TRUE) > 0, drop=FALSE]
    if (nrow(changes)) {
        cat("Files changed:\n")
        print(changes)
    }
    x
}

------------------------------ changedFiles.Rd:

\name{changedFiles}
\alias{changedFiles}
\alias{print.changedFiles}
\alias{print.changedFilesSnapshot}
\title{
Detect which files have changed
}
\description{
On the first call, \code{changedFiles} takes a snapshot of a selection of
files.  In subsequent calls, it takes another snapshot, and returns an
object containing data on the differences between the two snapshots.  The
snapshots need not be the same directory; this could be used to compare
two directories.
}
\usage{
changedFiles(snapshot, timestamp = tempfile("timestamp"), file.info = NULL,
             md5sum = FALSE, full.names = FALSE, ...)
}
\arguments{
  \item{snapshot}{
The path to record, or a previous snapshot.  See the Details.
}
  \item{timestamp}{
The name of a file to write at the time the initial snapshot is taken.
In subsequent calls, modification times of files will be compared to
this file, and newer files will be reported as changed.  Set to
\code{NULL} to skip this test.
}
  \item{file.info}{
A vector of columns from the
Re: [Rd] libR.so: cannot open shared object file
On 04/09/2013 19:58, Geoff Jentry wrote:
>> Can you add some details? Suppose I have the package Model.tar.gz and
>> my writable area is in user/area; what do I have to do next to install
>> the package?
>
> What I was picturing was something like this (forgive me if syntax
> isn't 100%):
>
> mkdir user/area/myRLib
> R CMD INSTALL --library=user/area/myRLib Model.tar.gz
>
> and then in R:
>
> library(Model, lib.loc="user/area/myRLib")
>
> Note though Brian Ripley's response to me where he indicates that this
> is handled automatically.

Yes, install.packages("Model.tar.gz") should suffice.

> -J

--
Brian D. Ripley,                  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] libR.so: cannot open shared object file
> Can you add some details? Suppose I have the package Model.tar.gz and
> my writable area is in user/area; what do I have to do next to install
> the package?

What I was picturing was something like this (forgive me if syntax isn't
100%):

mkdir user/area/myRLib
R CMD INSTALL --library=user/area/myRLib Model.tar.gz

and then in R:

library(Model, lib.loc="user/area/myRLib")

Note though Brian Ripley's response to me where he indicates that this
is handled automatically.

-J
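The private-library install described in this thread can also be done from inside R. The following is a minimal sketch, not a tested recipe for the poster's system: a temporary directory stands in for user/area/myRLib, and the install and library calls are left commented out because Model.tar.gz is the poster's own package. Passing repos = NULL and type = "source" explicitly is my assumption about the safest form for a local source tarball.

```r
# Private library location. The thread uses "user/area/myRLib"; a
# temporary directory stands in for it so the sketch runs anywhere.
lib <- file.path(tempdir(), "myRLib")
dir.create(lib, recursive = TRUE, showWarnings = FALSE)

# Put the private library first on the library search path for this
# session, so later library() calls look there before the system library.
.libPaths(c(lib, .libPaths()))

# Install the source tarball into the private library, then load it.
# Commented out: Model.tar.gz is the poster's package, not available here.
# install.packages("Model.tar.gz", repos = NULL, type = "source", lib = lib)
# library(Model, lib.loc = lib)
```

Once the directory is on .libPaths(), the lib.loc argument to library() becomes optional for that session.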
Re: [Rd] Comments requested on changedFiles function
Hi Duncan,

I think this functionality would be much easier to use and understand if
you split the functionality of taking snapshots and comparing them into
separate functions. In addition, the 'timestamp' functionality seems both
confusing and brittle to me. I think it would be better to store file
modification times in the snapshot and use those instead of an external
file. Maybe:

# Take a snapshot of the files.
takeFileSnapshot(directory, file.info = TRUE, md5sum = FALSE,
                 full.names = FALSE, recursive = TRUE, ...)

# Take a snapshot using the same options as used for snapshot.
retakeFileSnapshot(snapshot, directory = snapshot$directory) {
    takeFileSnapshot(directory, file.info = snapshot$file.info,
                     md5sum = snapshot$md5sum, etc)
}

compareFileSnapshots(snapshot1, snapshot2)
-or-
getNewFiles(snapshot1, snapshot2)  # These names are probably too generic
getDeletedFiles(snapshot1, snapshot2)
getUpdatedFiles(snapshot1, snapshot2)
-or-
setdiff(snapshot1, snapshot2)  # Unclear how this should treat updated files

This approach does have the difficulty that users could attempt to
compare snapshots that were taken with different options and that can't
be compared, but that should be an easy error to detect.

Karl

On Wed, Sep 4, 2013 at 10:53 AM, Duncan Murdoch <murdoch.dun...@gmail.com> wrote:
> In a number of places internal to R, we need to know which files have
> changed (e.g. after building a vignette). I've just written a general
> purpose function "changedFiles" that I'll probably commit to R-devel.
> Comments on the design (or bug reports) would be appreciated.
>
> The source for the function and the Rd page for it are inline below.
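Karl's proposed split into separate snapshot and compare functions can be sketched as below. This is only an illustration under stated assumptions: the function names come from his pseudocode and are not an existing R API, the snapshot stores size and mtime (plus an optional MD5 checksum) as he suggests instead of using an external timestamp file, and the "fileSnapshotSketch" class name and the demo files are made up for the example.

```r
# Hypothetical functions following the proposal; not part of R itself.
takeFileSnapshot <- function(directory = ".", md5sum = FALSE,
                             recursive = TRUE) {
    files <- list.files(directory, recursive = recursive)
    full <- file.path(directory, files)
    # Store size and modification time inside the snapshot itself.
    info <- file.info(full)[, c("size", "mtime")]
    rownames(info) <- files
    if (md5sum) info$md5sum <- unname(tools::md5sum(full))
    structure(list(directory = directory, md5sum = md5sum,
                   recursive = recursive, info = info),
              class = "fileSnapshotSketch")
}

# Retake a snapshot with the options recorded in an earlier one.
retakeFileSnapshot <- function(snapshot, directory = snapshot$directory)
    takeFileSnapshot(directory, md5sum = snapshot$md5sum,
                     recursive = snapshot$recursive)

compareFileSnapshots <- function(snapshot1, snapshot2) {
    # Snapshots taken with different options cannot be compared.
    stopifnot(identical(snapshot1$md5sum, snapshot2$md5sum))
    before <- rownames(snapshot1$info)
    after <- rownames(snapshot2$info)
    common <- intersect(before, after)
    s1 <- snapshot1$info[common, , drop = FALSE]
    s2 <- snapshot2$info[common, , drop = FALSE]
    differs <- s1$size != s2$size | s1$mtime != s2$mtime
    if (snapshot1$md5sum) differs <- differs | s1$md5sum != s2$md5sum
    list(added = setdiff(after, before),
         deleted = setdiff(before, after),
         updated = common[differs])
}

# Demo in a scratch directory.
demo_dir <- tempfile("snapsketch")
dir.create(demo_dir)
writeLines("one", file.path(demo_dir, "a.txt"))
writeLines("bee", file.path(demo_dir, "b.txt"))
snap1 <- takeFileSnapshot(demo_dir, md5sum = TRUE)

writeLines("one two three", file.path(demo_dir, "a.txt"))  # modify
unlink(file.path(demo_dir, "b.txt"))                       # delete
writeLines("sea", file.path(demo_dir, "c.txt"))            # add
snap2 <- retakeFileSnapshot(snap1)

cmp <- compareFileSnapshots(snap1, snap2)
print(cmp)
```

The stopifnot() check at the top of compareFileSnapshots is the "easy error to detect" from the message: it refuses to compare snapshots taken with different options.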
Re: [Rd] Comments requested on changedFiles function
On 13-09-04 8:02 PM, Karl Millar wrote:
> Hi Duncan,
>
> I think this functionality would be much easier to use and understand
> if you split the functionality of taking snapshots and comparing them
> into separate functions.

Yes, that's another possibility. Some more comment below...

> In addition, the 'timestamp' functionality seems both confusing and
> brittle to me. I think it would be better to store file modification
> times in the snapshot and use those instead of an external file. Maybe:

You can do that, using file.info = "mtime", but the file.info snapshots
are quite a bit slower than using the timestamp file (when looking at a
big recursive directory of files).

> # Take a snapshot of the files.
> takeFileSnapshot(directory, file.info = TRUE, md5sum = FALSE,
>                  full.names = FALSE, recursive = TRUE, ...)
>
> # Take a snapshot using the same options as used for snapshot.
> retakeFileSnapshot(snapshot, directory = snapshot$directory) {
>     takeFileSnapshot(directory, file.info = snapshot$file.info,
>                      md5sum = snapshot$md5sum, etc)
> }
>
> compareFileSnapshots(snapshot1, snapshot2)
> -or-
> getNewFiles(snapshot1, snapshot2)  # These names are probably too generic
> getDeletedFiles(snapshot1, snapshot2)
> getUpdatedFiles(snapshot1, snapshot2)
> -or-
> setdiff(snapshot1, snapshot2)  # Unclear how this should treat updated files
>
> This approach does have the difficulty that users could attempt to
> compare snapshots that were taken with different options and that
> can't be compared, but that should be an easy error to detect.

I don't want to add too many new functions. The general R style is to
have functions that do a lot, rather than have a lot of different
functions to achieve different parts of related tasks. This is better
for interactive use (fewer functions to remember, a simpler help system
to navigate), though it probably results in less readable code.

I can see an argument for two functions (a get and a compare), but I
don't think there are many cases where doing two gets and comparing the
snapshots would be worth the extra runtime. (It's extra because
file.info is only a little faster than list.files, and it would be
unavoidable to call both twice in that version. Using the timestamp
file avoids one of those calls, and replaces the other with file_test,
which takes a similar amount of time. So overall it's about 20-25%
faster.) It also makes the code a bit more complicated, i.e. three
calls (get, get, compare) instead of two (get, compare).

Thanks for your comments.

Duncan Murdoch
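For comparison with the draft discussed in this thread, released versions of R (3.1.0 and later, as far as I know) ship a two-function form in the utils package: fileSnapshot() records a snapshot and changedFiles() compares it with a later one. The sketch below assumes that released interface, which differs from the single-function draft above. The scratch directory and file names are made up for the example.

```r
# Scratch directory with two files.
dir <- tempfile("snapdemo")
dir.create(dir)
writeLines("one", file.path(dir, "a.txt"))
writeLines("bee", file.path(dir, "b.txt"))

# First call: record a snapshot (file.info columns plus MD5 checksums).
before <- utils::fileSnapshot(dir, md5sum = TRUE)

# Change the directory contents.
writeLines("one two three", file.path(dir, "a.txt"))  # modify
unlink(file.path(dir, "b.txt"))                       # delete
writeLines("sea", file.path(dir, "c.txt"))            # add

# Second call: with no 'after' snapshot supplied, changedFiles() takes a
# fresh snapshot using the options recorded in 'before' and compares.
diffs <- utils::changedFiles(before)
print(diffs)
```

The result is a list with added, deleted, changed and unchanged components, the same component names as the structure() call in the draft.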
Re: [Rd] Comments requested on changedFiles function
On Wed, Sep 4, 2013 at 1:53 PM, Duncan Murdoch <murdoch.dun...@gmail.com> wrote:
> In a number of places internal to R, we need to know which files have
> changed (e.g. after building a vignette). I've just written a general
> purpose function "changedFiles" that I'll probably commit to R-devel.
> Comments on the design (or bug reports) would be appreciated.
>
> The source for the function and the Rd page for it are inline below.

This looks like a useful function. Thanks for writing it. I have only
one (picky) comment below.
[Rd] Why does duplicate() make deep copies?
Some experimentation with the function below should convince you that
the runtime of the part inside system.time is proportional to
size*number*times. I think it should only be proportional to
number*times. The function is only manipulating a list of references to
vectors and not trying to make changes to the vectors themselves.

overcopying <- function(size, number, times) {
    # Make a list of NUMBER vectors of SIZE, then reorder the list. The
    # vectors themselves are never touched, only their references are
    # moved around. If R implements copy on write correctly the
    # elapsed time should be ~number*times.
    L <- replicate(number, list(vector("numeric", size)), simplify=FALSE)
    system.time(for (i in 1:times) {
        L[sample(number)] <- L
    })
}

I see that duplicate.c makes a recursive copy of each element when it
encounters a VECSXP or a LISTSXP, which it seems there should be no need
for (it should be sufficient to make a shallow copy and ensure NAMED is
set on the elements).

Why is R making apparently unnecessary deep copies?

Peter
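The experiment Peter describes can be run by timing the function at several vector sizes while holding number and times fixed. This is a rough sketch with parameter values I picked arbitrarily. Timings depend heavily on the machine and the R version, so no expected output is given.

```r
# Same function as in the message, with arrows and quotes restored.
overcopying <- function(size, number, times) {
    L <- replicate(number, list(vector("numeric", size)), simplify = FALSE)
    system.time(for (i in 1:times) {
        L[sample(number)] <- L
    })
}

# Vary only 'size'. If reordering the list copies just the references,
# elapsed time should stay roughly flat; if every vector is deep-copied,
# it should grow roughly in proportion to 'size'.
sizes <- c(1e3, 1e4, 1e5)
elapsed <- sapply(sizes, function(s)
    overcopying(s, number = 50, times = 10)[["elapsed"]])
print(data.frame(size = sizes, elapsed = elapsed))
```

Elapsed times at the small sizes may be too short to measure reliably. Raising times gives a steadier signal at the cost of a longer run.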