[Rd] Comments requested on changedFiles function

2013-09-04 Thread Duncan Murdoch
In a number of places internal to R, we need to know which files have 
changed (e.g. after building a vignette).  I've just written a general 
purpose function changedFiles that I'll probably commit to R-devel.  
Comments on the design (or bug reports) would be appreciated.


The source for the function and the Rd page for it are inline below.

- changedFiles.R:
changedFiles <- function(snapshot, timestamp = tempfile("timestamp"),
                         file.info = NULL,
                         md5sum = FALSE, full.names = FALSE, ...) {
    dosnapshot <- function(args) {
        fullnames <- do.call(list.files, c(full.names = TRUE, args))
        names <- do.call(list.files, c(full.names = full.names, args))
        if (isTRUE(file.info) || (is.character(file.info) &&
                                  length(file.info))) {
            info <- file.info(fullnames)
            rownames(info) <- names
            if (isTRUE(file.info))
                file.info <- c("size", "isdir", "mode", "mtime")
        } else
            info <- data.frame(row.names=names)
        if (md5sum)
            info <- data.frame(info, md5sum = tools::md5sum(fullnames))
        list(info = info, timestamp = timestamp, file.info = file.info,
             md5sum = md5sum, full.names = full.names, args = args)
    }
    if (missing(snapshot) || !inherits(snapshot, "changedFilesSnapshot")) {
        if (length(timestamp) == 1)
            file.create(timestamp)
        if (missing(snapshot)) snapshot <- "."
        pre <- dosnapshot(list(path = snapshot, ...))
        pre$pre <- pre$info
        pre$info <- NULL
        pre$wd <- getwd()
        class(pre) <- "changedFilesSnapshot"
        return(pre)
    }

    if (missing(timestamp)) timestamp <- snapshot$timestamp
    if (missing(file.info) || isTRUE(file.info))
        file.info <- snapshot$file.info
    if (identical(file.info, FALSE)) file.info <- NULL
    if (missing(md5sum)) md5sum <- snapshot$md5sum
    if (missing(full.names)) full.names <- snapshot$full.names

    pre <- snapshot$pre
    savewd <- getwd()
    on.exit(setwd(savewd))
    setwd(snapshot$wd)

    args <- snapshot$args
    newargs <- list(...)
    args[names(newargs)] <- newargs
    post <- dosnapshot(args)$info
    prenames <- rownames(pre)
    postnames <- rownames(post)

    added <- setdiff(postnames, prenames)
    deleted <- setdiff(prenames, postnames)
    common <- intersect(prenames, postnames)

    if (length(file.info)) {
        preinfo <- pre[common, file.info]
        postinfo <- post[common, file.info]
        changes <- preinfo != postinfo
    }
    else changes <- matrix(logical(0), nrow = length(common), ncol = 0,
                           dimnames = list(common, character(0)))
    if (length(timestamp))
        changes <- cbind(changes,
                         Newer = file_test("-nt", common, timestamp))
    if (md5sum) {
        premd5 <- pre[common, "md5sum"]
        postmd5 <- post[common, "md5sum"]
        changes <- cbind(changes, md5sum = premd5 != postmd5)
    }
    changes1 <- changes[rowSums(changes, na.rm = TRUE) > 0, , drop = FALSE]
    changed <- rownames(changes1)
    structure(list(added = added, deleted = deleted, changed = changed,
                   unchanged = setdiff(common, changed), changes = changes),
              class = "changedFiles")
}

print.changedFilesSnapshot <- function(x, ...) {
    cat("changedFiles snapshot:\n timestamp = \"", x$timestamp,
        "\"\n file.info = ",
        if (length(x$file.info))
            paste(paste0('"', x$file.info, '"'), collapse=","),
        "\n md5sum = ", x$md5sum,
        "\n args = ", deparse(x$args, control = NULL), "\n", sep="")
    x
}

print.changedFiles <- function(x, ...) {
    if (length(x$added))
        cat("Files added:\n", paste0("  ", x$added, collapse="\n"),
            "\n", sep="")
    if (length(x$deleted))
        cat("Files deleted:\n", paste0("  ", x$deleted, collapse="\n"),
            "\n", sep="")
    changes <- x$changes
    changes <- changes[rowSums(changes, na.rm = TRUE) > 0, , drop=FALSE]
    changes <- changes[, colSums(changes, na.rm = TRUE) > 0, drop=FALSE]
    if (nrow(changes)) {
        cat("Files changed:\n")
        print(changes)
    }
    x
}
--

--- changedFiles.Rd:
\name{changedFiles}
\alias{changedFiles}
\alias{print.changedFiles}
\alias{print.changedFilesSnapshot}
\title{
Detect which files have changed
}
\description{
On the first call, \code{changedFiles} takes a snapshot of a selection of
files.  In subsequent calls, it takes another snapshot, and returns an
object containing data on the differences between the two snapshots.  The
snapshots need not be of the same directory; this could be used to compare
two directories.
}
\usage{
changedFiles(snapshot, timestamp = tempfile("timestamp"), file.info = NULL,
             md5sum = FALSE, full.names = FALSE, ...)
}
\arguments{
  \item{snapshot}{
The path to record, or a previous snapshot.  See the Details.
}
  \item{timestamp}{
The name of a file to write at the time the initial snapshot
is taken.  In subsequent calls, modification times of files will be
compared to this file, and newer files will be reported as changed.
Set to \code{NULL} to skip this test.
}
  \item{file.info}{
A vector of columns from the 

Re: [Rd] libR.so: cannot open shared object file

2013-09-04 Thread Prof Brian Ripley

On 04/09/2013 19:58, Geoff Jentry wrote:
>> Can you add some details?
>> Suppose I have the package Model.tar.gz and my writable area is in
>> user/area, what do I have to do next to install the package?
>
> What I was picturing was something like this (forgive me if syntax isn't
> 100%):
>
> mkdir user/area/myRLib
> R CMD INSTALL --library=user/area/myRLib Model.tar.gz
>
> and then in R:
> library(Model, lib.loc="user/area/myRLib")
>
> Note though Brian Ripley's response to me where he indicates that this
> is handled automatically.

Yes,  install.packages("Model.tar.gz") should suffice.

> -J

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel



--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595



Re: [Rd] libR.so: cannot open shared object file

2013-09-04 Thread Geoff Jentry

> Can you add some details?
> Suppose I have the package Model.tar.gz and my writable area is in user/area,
> what do I have to do next to install the package?

What I was picturing was something like this (forgive me if syntax isn't
100%):

mkdir user/area/myRLib
R CMD INSTALL --library=user/area/myRLib Model.tar.gz

and then in R:
library(Model, lib.loc="user/area/myRLib")

Note though Brian Ripley's response to me where he indicates that this is
handled automatically.


-J



Re: [Rd] Comments requested on changedFiles function

2013-09-04 Thread Karl Millar
Hi Duncan,

I think this functionality would be much easier to use and understand if
you split the functionality of taking snapshots and comparing them
into separate functions.  In addition, the 'timestamp' functionality seems
both confusing and brittle to me.  I think it would be better to store file
modification times in the snapshot and use those instead of an external
file.  Maybe:

# Take a snapshot of the files.
takeFileSnapshot(directory, file.info = TRUE, md5sum = FALSE,
                 full.names = FALSE, recursive = TRUE, ...)

# Take a snapshot using the same options as used for snapshot.
retakeFileSnapshot(snapshot, directory = snapshot$directory) {
   takeFileSnapshot(directory, file.info = snapshot$file.info,
                    md5sum = snapshot$md5sum, etc)
}

compareFileSnapshots(snapshot1, snapshot2)
- or -
getNewFiles(snapshot1, snapshot2)   # These names are probably too generic
getDeletedFiles(snapshot1, snapshot2)
getUpdatedFiles(snapshot1, snapshot2)
- or -
setdiff(snapshot1, snapshot2)  # Unclear how this should treat updated files


This approach does have the difficulty that users could attempt to compare
snapshots that were taken with different options and that can't be
compared, but that should be an easy error to detect.
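
One way to picture this split is sketched below. It is only an illustration under assumptions: the function bodies, the `fileSnapshot` class name, and the choice to record just `size` and `mtime` (plus an optional MD5 checksum) are hypothetical and are not taken from the posted `changedFiles` code.

```r
# Hypothetical sketch of a two-function design; names and fields are assumptions.
takeFileSnapshot <- function(directory = ".", md5sum = FALSE, ...) {
  rel <- list.files(directory, ...)            # paths relative to `directory`
  paths <- file.path(directory, rel)
  info <- file.info(paths)[, c("size", "mtime"), drop = FALSE]
  rownames(info) <- rel
  if (md5sum)
    info$md5sum <- unname(tools::md5sum(paths))
  structure(list(directory = directory, md5sum = md5sum, info = info),
            class = "fileSnapshot")
}

compareFileSnapshots <- function(before, after) {
  # Snapshots taken with different options record different columns.
  if (!identical(names(before$info), names(after$info)))
    stop("snapshots were taken with different options")
  pre <- rownames(before$info)
  post <- rownames(after$info)
  common <- intersect(pre, post)
  changed <- rep(FALSE, length(common))
  for (col in names(before$info)) {
    neq <- before$info[common, col] != after$info[common, col]
    changed <- changed | (!is.na(neq) & neq)   # treat NA comparisons as "no change"
  }
  list(added = setdiff(post, pre), deleted = setdiff(pre, post),
       changed = common[changed])
}

# Usage: snapshot a temporary directory, modify it, snapshot again, compare.
demo_dir <- tempfile("snapdemo")
dir.create(demo_dir)
writeLines("one", file.path(demo_dir, "a.txt"))
writeLines("two", file.path(demo_dir, "b.txt"))
s1 <- takeFileSnapshot(demo_dir, md5sum = TRUE)
writeLines("one, edited", file.path(demo_dir, "a.txt"))
unlink(file.path(demo_dir, "b.txt"))
writeLines("three", file.path(demo_dir, "c.txt"))
s2 <- takeFileSnapshot(demo_dir, md5sum = TRUE)
result <- compareFileSnapshots(s1, s2)
```

The column check in `compareFileSnapshots` is one simple way to detect the mismatched-options case mentioned above.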

Karl



Re: [Rd] Comments requested on changedFiles function

2013-09-04 Thread Duncan Murdoch

On 13-09-04 8:02 PM, Karl Millar wrote:
> Hi Duncan,
>
> I think this functionality would be much easier to use and understand if
> you split the functionality of taking snapshots and comparing them
> into separate functions.

Yes, that's another possibility.  Some more comment below...

> In addition, the 'timestamp' functionality
> seems both confusing and brittle to me.  I think it would be better to
> store file modification times in the snapshot and use those instead of
> an external file.  Maybe:

You can do that, using file.info = "mtime", but the file.info snapshots
are quite a bit slower than using the timestamp file (when looking at a
big recursive directory of files).

> # Take a snapshot of the files.
> takeFileSnapshot(directory, file.info = TRUE, md5sum = FALSE,
>                  full.names = FALSE, recursive = TRUE, ...)
>
> # Take a snapshot using the same options as used for snapshot.
> retakeFileSnapshot(snapshot, directory = snapshot$directory) {
>    takeFileSnapshot(directory, file.info = snapshot$file.info,
>                     md5sum = snapshot$md5sum, etc)
> }
>
> compareFileSnapshots(snapshot1, snapshot2)
> - or -
> getNewFiles(snapshot1, snapshot2)   # These names are probably too generic
> getDeletedFiles(snapshot1, snapshot2)
> getUpdatedFiles(snapshot1, snapshot2)
> - or -
> setdiff(snapshot1, snapshot2)  # Unclear how this should treat updated files
>
> This approach does have the difficulty that users could attempt to
> compare snapshots that were taken with different options and that can't
> be compared, but that should be an easy error to detect.


I don't want to add too many new functions.  The general R style is to 
have functions that do a lot, rather than have a lot of different 
functions to achieve different parts of related tasks.  This is better 
for interactive use (fewer functions to remember, a simpler help system 
to navigate), though it probably results in less readable code.


I can see an argument for two functions (a get and a compare), but I 
don't think there are many cases where doing two gets and comparing the 
snapshots would be worth the extra runtime.  (It's extra because 
file.info is only a little faster than list.files, and it would be 
unavoidable to call both twice in that version.  Using the timestamp 
file avoids one of those calls, and replaces the other with file_test, 
which takes a similar amount of time.  So overall it's about 20-25% 
faster.)  It also makes the code a bit more complicated, i.e. three 
calls (get, get, compare) instead of two (get, compare).
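
The timestamp-file test mentioned above can be illustrated in isolation. The sketch below is not the `changedFiles` code itself: the file names are made up, and `Sys.setFileTime` is used only so that the modification times are deterministic rather than dependent on filesystem timestamp resolution.

```r
# Sketch of the timestamp-file test; file names here are illustrative only.
demo_dir <- tempfile("tsdemo")
dir.create(demo_dir)
timestamp <- file.path(demo_dir, "timestamp")
old_file <- file.path(demo_dir, "old.txt")
new_file <- file.path(demo_dir, "new.txt")
file.create(timestamp, old_file, new_file)

# Force known modification times (an assumption made for the demo only).
now <- Sys.time()
Sys.setFileTime(old_file, now - 7200)   # two hours ago
Sys.setFileTime(timestamp, now - 3600)  # one hour ago
Sys.setFileTime(new_file, now)

# "-nt" is TRUE where the first file is newer than the second.
newer <- utils::file_test("-nt", c(old_file, new_file), timestamp)
```

With these times, only `new.txt` is reported as newer than the timestamp file, which is the condition `changedFiles` records in its `Newer` column.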


Thanks for your comments.

Duncan Murdoch





Re: [Rd] Comments requested on changedFiles function

2013-09-04 Thread Scott Kostyshak
On Wed, Sep 4, 2013 at 1:53 PM, Duncan Murdoch murdoch.dun...@gmail.com wrote:
> In a number of places internal to R, we need to know which files have
> changed (e.g. after building a vignette).  I've just written a general
> purpose function changedFiles that I'll probably commit to R-devel.
> Comments on the design (or bug reports) would be appreciated.
>
> The source for the function and the Rd page for it are inline below.

This looks like a useful function. Thanks for writing it. I have only
one (picky) comment below.


[Rd] Why does duplicate() make deep copies?

2013-09-04 Thread Peter Meilstrup
Some experimentation with the function below should convince you that the
runtime of the bit inside system.time is proportional to size*number*times. I
think it should only be proportional to number*times. The function is only
manipulating a list of references to vectors and not trying to make changes
to the vectors themselves.

overcopying <- function(size, number, times) {
  # Make a list of NUMBER vectors of SIZE, then reorder the list.  The
  # vectors themselves are never touched, only their references are
  # moved around.  If R implements copy on write correctly the
  # elapsed time should be ~number*times.
  L <- replicate(number, list(vector("numeric", size)), simplify=FALSE)
  system.time(for (i in 1:times) {
    L[sample(number)] <- L
  })
}

I see that duplicate.c makes a recursive copy of each element when it
encounters a VECSXP or a LISTSXP, which it seems there should be no need
for (it should be sufficient to make a shallow copy and ensure NAMED is set
on the elements.)

Why is R making apparently unnecessary deep copies?
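
A rough way to check the scaling claim is to time the same reordering at two vector sizes. This is only a sketch: the sizes, `number`, and `times` values are arbitrary choices, and the measured ratio may well depend on the R version in use.

```r
# Time list reordering at two vector sizes; parameter values are arbitrary.
overcopying <- function(size, number, times) {
  L <- replicate(number, list(vector("numeric", size)), simplify = FALSE)
  system.time(for (i in seq_len(times)) {
    L[sample(number)] <- L
  })
}

sizes <- c(10, 10000)
elapsed <- vapply(
  sizes,
  function(s) overcopying(s, number = 100, times = 20)[["elapsed"]],
  numeric(1)
)
names(elapsed) <- paste0("size=", sizes)
print(elapsed)
```

If only references were moved, the two elapsed times would be expected to be similar; a large gap between them would be consistent with the vectors themselves being copied.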

Peter
