On Wed, Sep 4, 2013 at 1:53 PM, Duncan Murdoch <murdoch.dun...@gmail.com> wrote:
> In a number of places internal to R, we need to know which files have
> changed (e.g. after building a vignette). I've just written a general
> purpose function "changedFiles" that I'll probably commit to R-devel.
> Comments on the design (or bug reports) would be appreciated.
>
> The source for the function and the Rd page for it are inline below.

This looks like a useful function. Thanks for writing it. I have only
one (picky) comment below.

> ----- changedFiles.R:
> changedFiles <- function(snapshot, timestamp = tempfile("timestamp"),
>                          file.info = NULL,
>                          md5sum = FALSE, full.names = FALSE, ...) {
>     dosnapshot <- function(args) {
>         fullnames <- do.call(list.files, c(full.names = TRUE, args))
>         names <- do.call(list.files, c(full.names = full.names, args))
>         if (isTRUE(file.info) || (is.character(file.info) &&
>                                   length(file.info))) {
>             info <- file.info(fullnames)
>             rownames(info) <- names
>             if (isTRUE(file.info))
>                 file.info <- c("size", "isdir", "mode", "mtime")
>         } else
>             info <- data.frame(row.names=names)
>         if (md5sum)
>             info <- data.frame(info, md5sum = tools::md5sum(fullnames))
>         list(info = info, timestamp = timestamp, file.info = file.info,
>              md5sum = md5sum, full.names = full.names, args = args)
>     }
>     if (missing(snapshot) || !inherits(snapshot, "changedFilesSnapshot")) {
>         if (length(timestamp) == 1)
>             file.create(timestamp)
>         if (missing(snapshot)) snapshot <- "."
>         pre <- dosnapshot(list(path = snapshot, ...))
>         pre$pre <- pre$info
>         pre$info <- NULL
>         pre$wd <- getwd()
>         class(pre) <- "changedFilesSnapshot"
>         return(pre)
>     }
>
>     if (missing(timestamp)) timestamp <- snapshot$timestamp
>     if (missing(file.info) || isTRUE(file.info)) file.info <-
>         snapshot$file.info
>     if (identical(file.info, FALSE)) file.info <- NULL
>     if (missing(md5sum)) md5sum <- snapshot$md5sum
>     if (missing(full.names)) full.names <- snapshot$full.names
>
>     pre <- snapshot$pre
>     savewd <- getwd()
>     on.exit(setwd(savewd))
>     setwd(snapshot$wd)
>
>     args <- snapshot$args
>     newargs <- list(...)
>     args[names(newargs)] <- newargs
>     post <- dosnapshot(args)$info
>     prenames <- rownames(pre)
>     postnames <- rownames(post)
>
>     added <- setdiff(postnames, prenames)
>     deleted <- setdiff(prenames, postnames)
>     common <- intersect(prenames, postnames)
>
>     if (length(file.info)) {
>         preinfo <- pre[common, file.info]
>         postinfo <- post[common, file.info]
>         changes <- preinfo != postinfo
>     }
>     else changes <- matrix(logical(0), nrow = length(common), ncol = 0,
>                            dimnames = list(common, character(0)))
>     if (length(timestamp))
>         changes <- cbind(changes, Newer = file_test("-nt", common,
>                                                     timestamp))
>     if (md5sum) {
>         premd5 <- pre[common, "md5sum"]
>         postmd5 <- post[common, "md5sum"]
>         changes <- cbind(changes, md5sum = premd5 != postmd5)
>     }
>     changes1 <- changes[rowSums(changes, na.rm = TRUE) > 0, , drop = FALSE]
>     changed <- rownames(changes1)
>     structure(list(added = added, deleted = deleted, changed = changed,
>                    unchanged = setdiff(common, changed), changes = changes),
>               class = "changedFiles")
> }
>
> print.changedFilesSnapshot <- function(x, ...) {
>     cat("changedFiles snapshot:\n timestamp = \"", x$timestamp,
>         "\"\n file.info = ",
>         if (length(x$file.info)) paste(paste0('"', x$file.info, '"'),
>                                        collapse=","),
>         "\n md5sum = ", x$md5sum, "\n args = ",
>         deparse(x$args, control = NULL), "\n", sep="")
>     x
> }
>
> print.changedFiles <- function(x, ...)
> {
>     if (length(x$added)) cat("Files added:\n", paste0(" ", x$added,
>         collapse="\n"), "\n", sep="")
>     if (length(x$deleted)) cat("Files deleted:\n", paste0(" ", x$deleted,
>         collapse="\n"), "\n", sep="")
>     changes <- x$changes
>     changes <- changes[rowSums(changes, na.rm = TRUE) > 0, , drop=FALSE]
>     changes <- changes[, colSums(changes, na.rm = TRUE) > 0, drop=FALSE]
>     if (nrow(changes)) {
>         cat("Files changed:\n")
>         print(changes)
>     }
>     x
> }
> ----------------------
>
> --- changedFiles.Rd:
> \name{changedFiles}
> \alias{changedFiles}
> \alias{print.changedFiles}
> \alias{print.changedFilesSnapshot}
> \title{
> Detect which files have changed
> }
> \description{
> On the first call, \code{changedFiles} takes a snapshot of a selection of
> files. In subsequent calls, it takes another snapshot, and returns an
> object containing data on the differences between the two snapshots. The
> snapshots need not be the same directory; this could be used to compare
> two directories.
> }
> \usage{
> changedFiles(snapshot, timestamp = tempfile("timestamp"), file.info = NULL,
>              md5sum = FALSE, full.names = FALSE, ...)
> }
> \arguments{
>   \item{snapshot}{
> The path to record, or a previous snapshot. See the Details.
> }
>   \item{timestamp}{
> The name of a file to write at the time the initial snapshot
> is taken. In subsequent calls, modification times of files will be
> compared to this file, and newer files will be reported as changed. Set
> to \code{NULL} to skip this test.
> }
>   \item{file.info}{
> A vector of columns from the result of the \code{file.info} function, or a
> logical value. If \code{TRUE}, columns
> \code{c("size", "isdir", "mode", "mtime")} will be used. Set to
> \code{FALSE} or \code{NULL} to skip this test. See the Details.
> }
>   \item{md5sum}{
> A logical value indicating whether MD5 summaries should be taken as part of
> the snapshot.
> }
>   \item{full.names}{
> A logical value indicating whether full names (as in
> \code{\link{list.files}}) should be recorded.
> }
>   \item{\dots}{
> Additional parameters to pass to \code{\link{list.files}} to control the
> set of files in the snapshots.
> }
> }
> \details{
> This function works in two modes. If the \code{snapshot} argument is
> missing or is not of S3 class \code{"changedFilesSnapshot"}, it is used as
> the \code{path} argument to \code{\link{list.files}} to obtain a list of
> files. If it is of class \code{"changedFilesSnapshot"}, then it is taken
> to be the baseline file and a new snapshot is taken and compared with it.
> In the latter case, missing arguments default to match those from the
> initial snapshot.
>
> If the \code{timestamp} argument is length 1, a file with that name is
> created in the current directory during the initial snapshot, and
> \code{\link{file_test}} is used to compare the age of all files to it
> during subsequent calls.
>
> If the \code{file.info} argument is \code{TRUE} or it contains a non-empty
> character vector, the indicated columns from the result of a call to
> \code{\link{file.info}} will be recorded and compared.
>
> If \code{md5sum} is \code{TRUE}, the \code{tools::\link{md5sum}} function
> will be called to record the 32 byte MD5 checksum for each file, and these
> values will be compared.
> }
> \value{
> In the initial snapshot phase, an object of class
> \code{"changedFilesSnapshot"} is returned. This is a list containing the
> fields
> \item{pre}{a dataframe whose rownames are the filenames, and whose columns
> contain the requested snapshot data}
> \item{timestamp, file.info, md5sum, full.names}{a record of the arguments
> in the initial call}
> \item{args}{other arguments passed via \code{...} to
> \code{\link{list.files}}.}
>
> In the comparison phase, an object of class \code{"changedFiles"}.
> This is a list containing
> \item{added, deleted, changed, unchanged}{character vectors of filenames
> from the before and after snapshots, with obvious meanings}
> \item{changes}{a logical matrix with a row for each common file, and a
> column for each comparison test. \code{TRUE} indicates a change in that
> test.}
>
> \code{\link{print}} methods are defined for each of these types. The
> \code{\link{print}} method for \code{"changedFilesSnapshot"} objects
> displays the arguments used to produce it, while the one for
> \code{"changedFiles"} displays the \code{added}, \code{deleted}
> and \code{changed} fields if non-empty, and a submatrix of the
> \code{changes} matrix containing all of the \code{TRUE} values.
> }
> \author{
> Duncan Murdoch
> }
> \seealso{
> \code{\link{file.info}}, \code{\link{file_test}}, \code{\link{md5sum}}.
> }
> \examples{
> # Create some files in a temporary directory
> dir <- tempfile()
> dir.create(dir)

Should a different name than 'dir' be used since 'dir' is a base
function? Further, if someone is not very familiar with R (or just not
in "R mode" at the time of reading), they might think that 'dir.create'
is calling the create member of the object named 'dir' that you just
made.

Scott

> writeBin(1, file.path(dir, "file1"))
> writeBin(2, file.path(dir, "file2"))
> dir.create(file.path(dir, "dir"))
>
> # Take a snapshot
> snapshot <- changedFiles(dir, file.info=TRUE, md5sum=TRUE)
>
> # Change one of the files
> writeBin(3, file.path(dir, "file2"))
>
> # Display the detected changes
> changedFiles(snapshot)
> changedFiles(snapshot)$changes
> }
> \keyword{utilities}
> \keyword{file}
>
> ______________________________________________
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

--
Scott Kostyshak
Economics PhD Candidate
Princeton University