Re: [Rd] difficulties with setMethod([ and ...

2010-05-18 Thread Martin Maechler
 Tony Plate tpl...@acm.org
 on Mon, 17 May 2010 20:51:12 -0600 writes:

 Jim, yes, I have dealt with that particular challenge that list(...) 
 throws an error for a call like f(x,,,) where the empty args match to a 
 ... formal argument.   Here's some fragments of code that I used to cope 
 with this:

 # to find the empty anon args, must work with the unevaluated dot args
 dot.args.uneval <- match.call(expand.dots=FALSE)$...
 if (length(dot.args.uneval))
     missing.dot.args <- sapply(dot.args.uneval, function(arg)
         is.symbol(arg) && as.character(arg)=="")
 else
     missing.dot.args <- logical(0)
 ...
 # Now we can work with evaluated dot args.
 # Can't do dot.args <- list(...) because that will
 # stop with an error for missing args.
 dot.args <- mapply(dot.args.uneval, missing.dot.args,
     FUN=function(arg, m) if (!m) eval(arg) else NULL)

I don't have much time at the moment, to delve into Jim's code,
nor to analyze what exactly Tony's does.

Some notes however which I deem important:

1) My experience in writing many S4 methods for "["  -- with the
   Matrix package, but also for 'Rmpfr' -- is that you 
   really need to work  with  nargs()
   rather than with things like  length(list(...))

2) If you really want to be compatible to the very rich
   semantics of S and R subsetting, you need to spend more time
   than you anticipate.

   - negative subscripts, names, logicals
   
   - A[i]  for an array A   where i can be a vector and then the
     array is treated as if it had no dim() attribute
   - A[i]  for an array A   where i is a *matrix* with k columns
     where  k <- length(dim(A))  --- (k = 2 for matrices)
   - A[]
   

  Are you sure you would not try to use
     setClass('myExample', contains = "array", representation = ...)
  rather than your
     setClass('myExample', representation(x = "array", ...))
  ?
  You would get all the "[" (and other array) methods for free,
  and would only need to specify those methods where 'myExample'
  really differed from array-subsetting.

3) Lots of well-tested  setMethod("[", ...)  examples
   are in the sources of the Matrix package.

   There, BTW, I found it useful to use

  ## for 'i' in x[i] or A[i,] : (numeric = {double, integer})
  setClassUnion("index", members =  c("numeric", "logical", "character"))

 and then, e.g.,  a simple example method ..

  setMethod("[", signature(x = "denseMatrix", i = "index", j = "missing",
                           drop = "logical"),
            function (x, i, j, ..., drop) {
                if((na <- nargs()) == 3)
                    r <- as(x, "matrix")[i, drop=drop]
                else if(na == 4)
                    r <- as(x, "matrix")[i, , drop=drop]
                else stop("invalid nargs()= ",na)
                if(is.null(dim(r))) r else as(r, geClass(x))
            })
   
  The examples in the Rmpfr package are far fewer and simpler.

  To find the methods, for both, use  
      fgrep 'setMethod("["' R/*R
  if you are on a decent OS and inside the package source directory.
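
  To make note 1) concrete, here is a minimal sketch with plain functions
  (not S4 methods). The names count_args and dots_length are hypothetical
  and exist only for this illustration. It shows that nargs() counts blank
  positional arguments, whereas list(...) stops with an error as soon as
  one of the arguments matched to '...' is empty.

```r
# Hypothetical helper: report how many arguments the caller supplied.
# nargs() also counts positional arguments that were left blank.
count_args <- function(x, ...) nargs()

n_plain <- count_args(1)      # only 'x' supplied
n_blank <- count_args(1, , )  # 'x' plus two blank positions

# Hypothetical helper: the tempting alternative, which evaluates '...'.
dots_length <- function(x, ...) length(list(...))

# Evaluating an empty '...' element is an error, so catch it for inspection.
res <- tryCatch(dots_length(1, , ), error = function(e) e)

print(c(plain = n_plain, blank = n_blank))
print(inherits(res, "error"))
```

  In a "[" method the same count is what lets you tell x[i] apart from
  x[i, ] without ever evaluating the empty argument.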
 
--
Martin Maechler, ETH Zurich

 Let me know if you need any further explanation.

 Several warnings:
 * I was using this code with S3 generics and methods.
 * There are quite possibly better ways of detecting empty unevaluated 
 arguments than 'is.symbol(arg) && as.character(arg)==""'.
 * You'll probably want to be careful that the eval() in the last line is 
 using the appropriate environment for your application.

 I didn't read your code in detail, so apologies if the above is 
 off-the-point, but your verbal description of the problem and the coding 
 style and comments in the "[" method for myExample triggered my memory.

 -- Tony Plate

 On 05/17/2010 07:48 PM, James Bullard wrote:
 Apologies if I am not understanding something about how things are being
 handled when using S4 methods, but I have been unable to find an answer to
 my problem for some time now.
 
 Briefly, I am associating the generic '[' with a class which I wrote
 (here: myExample). The underlying back-end allows me to read contiguous
 slabs, e.g., 1:10, but not c(1, 10). I want to shield the user from this
 infelicity, so I grab the slab and then subset in memory. The main problem
 is with datasets with dim(.) > 2. In this case, the '...' argument doesn't
 seem to be in a reasonable state. When it is indeed missing then it
 properly reports that fact, however, when it is not missing it reports
 that it is not missing, but then the call to: list(...) throws an
 "argument is missing" exception.
 
 I cannot imagine that this has not occurred before, so I am expecting
 someone might be able to point me to some example code. I have attached
 some code demonstrating my general problem ((A) and (B) below) as well as
 the outline of the sub-selection code. I have to say that coding this has
 proven non-trivial and 

[Rd] BIC() in stats {was [R-sig-ME] how to extract the BIC value}

2010-05-18 Thread Martin Maechler
 MM == Martin Maechler maech...@stat.math.ethz.ch
 on Tue, 18 May 2010 12:37:21 +0200 writes:

 GaGr == Gabor Grothendieck ggrothendi...@gmail.com
 on Mon, 17 May 2010 09:45:00 -0400 writes:

GaGr BIC seems like something that would logically go into stats in the
GaGr core of R, as AIC is already, and then various packages could define
GaGr methods for it.

MM Well, if you look at help(AIC):

 Usage:

 AIC(object, ..., k = 2)

 Arguments:

 object: a fitted model object, for which there exists a ‘logLik’
 method to extract the corresponding log-likelihood, or an
 object inheriting from class ‘logLik’.

 ...: optionally more fitted model objects.

 k: numeric, the _penalty_ per parameter to be used; the default
 ‘k = 2’ is the classical AIC.

MM you may note that the original authors of AIC were always
MM allowing the AIC() function (and its methods) to compute the BIC,
MM simply by using 'k = log(n)' where of course n  must be correct.

MM I do like the concept that BIC is just a variation of AIC and
MM AFAIK, AIC was really first (historically).

MM Typically (and with lme4), the 'n' needed is already part of the logLik()
MM attributes :

 AIC((ll <- logLik(fm1)), k = log(attr(ll,"nobs")))
MM REML 
MM 1774.786 

MM indeed gives the BIC (where the REML name may or may not be a
MM bit overkill)
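
The same identity can be checked without lme4. The following is only a
sketch under an assumption: it swaps the mixed model fm1 for an ordinary
lm() fit on the built-in 'cars' data, so the numbers differ from the ones
above. It compares AIC() with k = log(n) against the BIC written out by
hand as -2 * logLik + log(n) * df.

```r
# Assumption: a plain linear model stands in for the lmer fit 'fm1'.
fm <- lm(dist ~ speed, data = cars)

ll <- logLik(fm)
n  <- attr(ll, "nobs")   # number of observations stored by logLik()
df <- attr(ll, "df")     # number of estimated parameters

# BIC obtained through AIC() by changing the penalty per parameter
bic_via_aic <- AIC(ll, k = log(n))

# BIC written out explicitly
bic_by_hand <- -2 * as.numeric(ll) + log(n) * df

print(all.equal(bic_via_aic, bic_by_hand))
```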


MM A stats-package based  BIC function could then simply be defined as

 BIC <- function (object, ...) UseMethod("BIC")

 BIC.default <- function (object, ...) BIC(logLik(object), ...)

 BIC.logLik <- function (object, ...) 
     AIC(object, ..., k = log(attr(object,"nobs")))

 {well, modulo the fact that ... should really allow to do
  this for *several* models simultaneously}
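
A runnable version of that idea might look as follows. To avoid masking
any existing BIC(), the sketch uses the hypothetical name myBIC for the
generic and its methods, and it is tried on an lm() fit of the built-in
'cars' data rather than on a mixed model. Like the definition above, it
handles a single model only.

```r
# Hypothetical S3 generic, named myBIC so that no existing BIC() is masked.
myBIC <- function(object, ...) UseMethod("myBIC")

# Default: extract the log-likelihood and dispatch again on class "logLik".
myBIC.default <- function(object, ...) myBIC(logLik(object), ...)

# For a logLik object, BIC is AIC with penalty log(n) per parameter.
myBIC.logLik <- function(object, ...)
    AIC(object, ..., k = log(attr(object, "nobs")))

# Assumption: an ordinary linear model is enough to exercise the dispatch.
fm <- lm(dist ~ speed, data = cars)
print(myBIC(fm))
print(myBIC(logLik(fm)))
```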

In addition to that (and more replying to Doug Bates):

Given nlme's tradition of explicitly providing BIC(), and in
analogue to the S3 semantics of the AIC() methods,

- I think lme4 (and lme4a on R-forge) should end up having
  working  AIC() and BIC() directly for fitted models, instead of
  having to use
 AIC(logLik(.))   and   BIC(logLik(.))

  The reason that even the first of this currently does *not*
  work is that lme4 imports AIC from stats but should do so
  from stats4.
  -- I'm about to change that for 'lme4' (and 'lme4a').

  However, for the BIC case, ... see below


- I tend to agree with Gabor (for once! :-)  that
  basic BIC methods (S3, alas) should move from  nlme to stats.
  
  For this reason, I'm breaking the rule of "do not cross-post"
  for once, and am hereby diverting this thread to R-devel.

Martin


MM --
MM Martin Maechler, ETH Zurich

GaGr On Mon, May 17, 2010 at 9:29 AM, Douglas Bates ba...@stat.wisc.edu wrote:
 On Mon, May 17, 2010 at 5:54 AM, Andy Fugard (Work)
 andy.fug...@sbg.ac.at wrote:
 Greetings,
 
 Assuming you're using lmer, here's an example which does what you need:
 
 (fm1 <- lmer(Reaction ~ Days + (Days|Subject), sleepstudy))
 Linear mixed model fit by REML
 Formula: Reaction ~ Days + (Days | Subject)
   Data: sleepstudy
  AIC  BIC logLik deviance REMLdev
  1756 1775 -871.8     1752    1744
 Random effects:
  Groups   Name        Variance Std.Dev. Corr
  Subject  (Intercept) 612.092  24.7405
          Days         35.072   5.9221  0.066
  Residual             654.941  25.5918
 Number of obs: 180, groups: Subject, 18
 
 Fixed effects:
            Estimate Std. Error t value
 (Intercept)  251.405      6.825   36.84
 Days          10.467      1.546    6.77
 
 Correlation of Fixed Effects:
     (Intr)
 Days -0.138
 
 (fm1fit <- summary(fm1)@AICtab)
      AIC      BIC    logLik deviance  REMLdev
  1755.628 1774.786 -871.8141 1751.986 1743.628
 
 fm1fit$BIC
 [1] 1774.786
 
 That's one way of doing it but it relies on a particular
 representation of the object returned by summary, and that is subject
 to change.
 
 I had thought that it would work to use
 
 BIC(logLik(fm1))
 
 but that doesn't because the BIC function is imported from the nlme
 package but not later exported.  The situation is rather tricky - at
 one point I defined a generic for BIC in the lme4 package but that led
 to conflicts when multiple packages defined different versions.  The
 order in which the packages were loaded became important in
 determining which version was used.
 
 We agreed to use the generic from the nlme package, which is what is
 now done.  However, I don't want to make the entire nlme package
 visible when you have loaded lme4 because of resulting conflicts.
 
 I can get the result as
 
 (fm1 <- lmer(Reaction ~ Days + (Days|Subject), sleepstudy))
 Linear mixed model fit by REML
 Formula: Reaction ~ Days + (Days | Subject)
   Data: 

Re: [Rd] 00LOCK and nfs

2010-05-18 Thread Kasper Daniel Hansen
This is a follow-up to an old thread with a kind of solution to the
00LOCK problem on NFS.

I have made a patch to install.packages to accept a new argument
  locktype = c("lock", "no-lock", "pkglock")
which is passed to R CMD INSTALL.  This addition might have
independent interest aside from the NFS problem, as it exposes
functionality from R CMD INSTALL to install.packages and the very
convenient update.packages.  Patches are at
  http://www.biostat.jhsph.edu/~khansen/packages2.R-patch
  http://www.biostat.jhsph.edu/~khansen/install.packages.Rd-patch
(patches to files in the utils package) and both
  R-devel (R version 2.12.0 Under development (unstable) (2010-05-17 r52025))
and
  R-2.11 (R version 2.11.0 Patched (2010-05-17 r52025))
passed make check-all with these two patches applied.  I thought about
adding a note describing my findings below to the details section, but
decided against it.

Regarding the 00LOCK problem.  In my testing, using the patches above
and setting locktype = "pkglock" makes it possible to deal with the
NFS problem.  Specifically, I have not been able to make
update.packages() fail midway due to an un-removable 00LOCK file
(which is not too surprising, as I now have a per-package lock).

However, sometimes (but far less frequently than before), a
00LOCK-pkgname directory remains after update/install.packages.
Sometimes this 00LOCK-pkgname directory does not contain any .nfs*
files (!?) and sometimes it does. For this reason, I still precede any
install/update.packages with a check for the existence of a
00LOCK-pkgname directory and an attempt to remove it.
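
Such a pre-check could be sketched roughly like this. The helper name
remove_stale_locks is hypothetical and this is not the code from the
patches above. It assumes that no other installation is running in the
same library, since it deletes every 00LOCK* entry it finds. The demo at
the end runs against a throw-away directory, not a real library.

```r
# Hypothetical helper: delete leftover 00LOCK* entries in a library
# directory. Only safe when no other installation is in progress there.
remove_stale_locks <- function(lib = .libPaths()[1]) {
    locks <- list.files(lib, pattern = "^00LOCK", full.names = TRUE)
    # unlink() returns 0 on success
    removed <- vapply(locks,
                      function(d) unlink(d, recursive = TRUE) == 0,
                      logical(1))
    invisible(locks[removed])
}

# Demo on a temporary directory that mimics a library with a stale lock.
demo_lib <- file.path(tempdir(), "lock-demo-lib")
dir.create(demo_lib, showWarnings = FALSE)
dir.create(file.path(demo_lib, "00LOCK-somepkg"), showWarnings = FALSE)
dir.create(file.path(demo_lib, "somepkg"), showWarnings = FALSE)

gone <- remove_stale_locks(demo_lib)
print(basename(gone))
print(list.files(demo_lib))
```

After such a call one would run install.packages() or update.packages()
as usual.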

The difference between using locktype = "pkglock" and not is
specifically that without, it was possible for update.packages to fail
midway even though there were no 00LOCK* files at the start of the
update process.

Originally I hypothesized that the presence of the .nfs* files in the
00LOCK directory had to do with synchronization issues between the
file server and the compute node.  In order to approach this I tried
to insert a
  system("sleep 10")
at the beginning of
  do_cleanup
in
  tools/R/install.R
but that did not work.

Since the pkglock approach described above seems to solve this issue
for me, I have not pursued the synchronization issue further.

Kasper

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] BIC() in stats {was [R-sig-ME] how to extract the BIC value}

2010-05-18 Thread Martin Maechler

Adding to my own statements  (below) :

 MM == Martin Maechler maech...@stat.math.ethz.ch
 on Tue, 18 May 2010 13:05:27 +0200 writes:

 MM == Martin Maechler maech...@stat.math.ethz.ch
 on Tue, 18 May 2010 12:37:21 +0200 writes:

 GaGr == Gabor Grothendieck ggrothendi...@gmail.com
 on Mon, 17 May 2010 09:45:00 -0400 writes:

GaGr BIC seems like something that would logically go into stats in the
GaGr core of R, as AIC is already, and then various packages could define
GaGr methods for it.

MM Well, if you look at help(AIC):

 Usage:

 AIC(object, ..., k = 2)

 Arguments:

 object: a fitted model object, for which there exists a ‘logLik’
 method to extract the corresponding log-likelihood, or an
 object inheriting from class ‘logLik’.

 ...: optionally more fitted model objects.

 k: numeric, the _penalty_ per parameter to be used; the default
 ‘k = 2’ is the classical AIC.

MM you may note that the original authors of AIC were always
MM allowing the AIC() function (and its methods) to compute the BIC,
MM simply by using 'k = log(n)' where of course n  must be correct.

MM I do like the concept that BIC is just a variation of AIC and
MM AFAIK, AIC was really first (historically).

MM Typically (and with lme4), the 'n' needed is already part of the logLik()
MM attributes :

 AIC((ll <- logLik(fm1)), k = log(attr(ll,"nobs")))
MM REML 
MM 1774.786 

MM indeed gives the BIC (where the REML name may or may not be a
MM bit overkill)


MM A stats-package based  BIC function could then simply be defined as

 BIC <- function (object, ...) UseMethod("BIC")

 BIC.default <- function (object, ...) BIC(logLik(object), ...)

 BIC.logLik <- function (object, ...) 
     AIC(object, ..., k = log(attr(object,"nobs")))

MM {well, modulo the fact that ... should really allow to do
MM this for *several* models simultaneously}

MM In addition to that (and more replying to Doug Bates):

MM Given nlme's tradition of explicitly providing BIC(), and in
MM analogue to the S3 semantics of the AIC() methods,

MM - I think lme4 (and lme4a on R-forge) should end up having
MM working  AIC() and BIC() directly for fitted models, instead of
MM having to use
MM AIC(logLik(.))   and   BIC(logLik(.))

MM The reason that even the first of this currently does *not*
MM work is that lme4 imports AIC from stats but should do so
MM from stats4.
MM -- I'm about to change that for 'lme4' (and 'lme4a').

MM However, for the BIC case, ... see below


MM - I tend to agree with Gabor (for once! :-)  that
MM basic BIC methods (S3, alas) should move from  nlme to stats.
  
MM For this reason, I'm breaking the rule of "do not cross-post"
MM for once, and am hereby diverting this thread to R-devel.

What I *did* find is that the  stats4   package has already had
all necessary BIC methods -- S4, not S3.

So for lme4 (and R-forge's lme4a),
I've only needed to change the NAMESPACE file to have both

  importFrom(stats4, AIC, BIC, logLik)# so S4 methods are used!

and later 

  export(AIC, BIC, 
 .)

and also add 'stats4' to the 'Imports: ' line in DESCRIPTION.

So both (development versions of) lme4 and lme4a  now have
working AIC() and BIC(),
and I guess Doug could release a new version of lme4 (not ..a)
pretty soon.

I got private e-mail suggestions for extensive S3 methods for
AIC, BIC and logLik.  
I think these should happen more in public (i.e. here, on
R-devel), and while I still advocate that a BIC S3 generic +
simple default methods should be added (as above),
I'd be happy if others joined into the discussion,
(and possibly provided simple patches).

Martin Maechler, ETH Zurich


Re: [Rd] difficulties with setMethod([ and ...

2010-05-18 Thread Jim Bullard
On Tue, 18 May 2010 10:22:03 +0200, Martin Maechler
maech...@stat.math.ethz.ch wrote:
 Tony Plate tpl...@acm.org
 on Mon, 17 May 2010 20:51:12 -0600 writes:
 
  Jim, yes, I have dealt with that particular challenge that list(...)
  throws an error for a call like f(x,,,) where the empty args match to a
  ... formal argument.   Here's some fragments of code that I used to cope
  with this:
 
  # to find the empty anon args, must work with the unevaluated dot args
  dot.args.uneval <- match.call(expand.dots=FALSE)$...
  if (length(dot.args.uneval))
      missing.dot.args <- sapply(dot.args.uneval, function(arg)
          is.symbol(arg) && as.character(arg)=="")
  else
      missing.dot.args <- logical(0)
  ...
  # Now we can work with evaluated dot args.
  # Can't do dot.args <- list(...) because that will
  # stop with an error for missing args.
  dot.args <- mapply(dot.args.uneval, missing.dot.args,
      FUN=function(arg, m) if (!m) eval(arg) else NULL)
 
 I don't have much time at the moment, to delve into Jim's code,
 nor to analyze what exactly Tony's does.
 
 Some notes however which I deem important:
 
 1) My experience in writing many S4 methods for "["  -- with the
    Matrix package, but also for 'Rmpfr' -- is that you 
    really need to work  with  nargs()
    rather than with things like  length(list(...))
 
 2) If you really want to be compatible to the very rich
semantics of S and R subsetting, you need to spend more time
than you anticipate.
 
- negative subscripts, names, logicals

    - A[i]  for an array A   where i can be a vector and then the
      array is treated as if it had no dim() attribute
    - A[i]  for an array A   where i is a *matrix* with k columns
      where  k <- length(dim(A))  --- (k = 2 for matrices)
- A[]

 

I might not get all of the way, but random access and logical subsetting
seem very doable. 
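
As a rough sketch of the slab idea for the one-dimensional case: read the
smallest contiguous block that covers all requested indices, then subset
that block in memory. Everything here is a stand-in. 'backing' is an
ordinary vector playing the role of the file, and read_slab and
subset_via_slab are hypothetical names, not functions from my package.
Only positive indices and full-length logical vectors are handled.

```r
# Stand-in for the data on disk.
backing <- seq(10, 200, by = 10)

# Hypothetical back-end call: can only read a contiguous range.
read_slab <- function(start, end) backing[start:end]

# Random access built on top of contiguous reads.
subset_via_slab <- function(i) {
    if (is.logical(i)) i <- which(i)     # logical -> positions
    stopifnot(length(i) > 0, all(i >= 1))
    lo <- min(i)
    hi <- max(i)
    slab <- read_slab(lo, hi)            # one contiguous read
    slab[i - lo + 1]                     # shift indices into the slab
}

print(subset_via_slab(c(1, 10)))
print(subset_via_slab(c(3, 7, 5)))
```

The cost is reading more than was asked for when the requested indices
are far apart, so a real implementation would probably split widely
spaced requests into several reads.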


   Are you sure you would not try to use
      setClass('myExample', contains = "array", representation = ...)
   rather than your
      setClass('myExample', representation(x = "array", ...))
   ?
   You would get all the "[" (and other array) methods for free,
   and would only need to specify those methods where 'myExample'
   really differed from array-subsetting.

My example is completely contrived. My back end is hdf5 files, so the
'[' method calls C functions, which at the moment grab contiguous blocks. 


 
 3) Lots of well-tested  setMethod("[", ...)  examples
    are in the sources of the Matrix package.
 
There, BTW, I found it useful to use
 
   ## for 'i' in x[i] or A[i,] : (numeric = {double, integer})
   setClassUnion("index", members =  c("numeric", "logical", "character"))
 
  and then, e.g.,  a simple example method ..
 
   setMethod("[", signature(x = "denseMatrix", i = "index", j = "missing",
                            drop = "logical"),
             function (x, i, j, ..., drop) {
                 if((na <- nargs()) == 3)
                     r <- as(x, "matrix")[i, drop=drop]
                 else if(na == 4)
                     r <- as(x, "matrix")[i, , drop=drop]
                 else stop("invalid nargs()= ",na)
                 if(is.null(dim(r))) r else as(r, geClass(x))
             })

   The examples in the Rmpfr package are far fewer and simpler.
 
   To find the methods, for both, use  
       fgrep 'setMethod("["' R/*R
   if you are on a decent OS and inside the package source directory.
  

Thanks, I had briefly scanned the Matrix package, and the pointer to
nargs is extremely helpful.


thanks again, jim



Re: [Rd] [R] avoiding reinstall already installed *package*

2010-05-18 Thread Martin Maechler
On Tue, May 18, 2010 at 22:38, William Dunlap wdun...@tibco.com wrote:

  -Original Message-
  From: r-help-boun...@r-project.org
  [mailto:r-help-boun...@r-project.org] On Behalf Of Martin Maechler
  Sent: Tuesday, May 18, 2010 1:25 PM
  To: milton ruser
  Cc: r-h...@r-project.org
  Subject: Re: [R] avoiding reinstall already installed *package*
 
  On Tue, May 18, 2010 at 18:06, milton ruser
  milton.ru...@gmail.com wrote:
 
   Hi Martin,
  
   thanks for your reply, and many thanks for your kind tips about package
   and library.
   So, I was trying to understand *why* we load packages using library().
  
 
  I've started to use and suggest using   require(.) instead
  {as my efforts to introduce  use() or usePackage() *and* deprecating
   library()  were met with strong opposition}

 I hate to get into arguments over function names, but
 I would have thought that require(pkg) would throw
 an error if the required pkg was not available.  It seems
 like require() can be used when pkg is not really required
 but library(pkg) is easiest when pkg is required to
 continue:

   > { require(noSuchPackage); functionFromNoSuchPackage() }
   Loading required package: noSuchPackage
   Error: could not find function "functionFromNoSuchPackage"
   In addition: Warning message:
   In library(package, lib.loc = lib.loc, character.only = TRUE,
              logical.return = TRUE,  :
     there is no package called 'noSuchPackage'
   > { library(noSuchPackage); functionFromNoSuchPackage() }
   Error in library(noSuchPackage) :
     there is no package called 'noSuchPackage'


Well, both require() and library() pretty soon lead to an error if the
package is not available... but I agree that you'd prefer to get the more
helpful error message immediately rather than belatedly.

If that's an issue, the typical use of require() would be

if(require(...)) {
  ...
  ...
} else {
   stop(". not available; do this or that ")
}

but instead of stop(..) which can provide a context dependent, customized
error message, you can also work around the absence of the package in other
ways.

a   usePackage()  function would typically use the best of library() and
require(),
maybe  not allowing   usePackage(MASS)
but requiring  usePackage("MASS")
but also working logically e.g. for
   mylme <- "lme4"
   usePackage(mylme)
i.e. not allowing non-standard evaluation.
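
To illustrate, a helper along those lines could be sketched as below.
It is not an existing function: the name usePackageSketch is made up, and
the behaviour (character string only, immediate error with a customized
message) is just one reading of the idea above, built on
require(character.only = TRUE).

```r
# Hypothetical helper: attach a package given as a character string and
# fail at once, with a helpful message, if it is not installed.
usePackageSketch <- function(pkg) {
    stopifnot(is.character(pkg), length(pkg) == 1L)
    ok <- suppressWarnings(
        require(pkg, character.only = TRUE, quietly = TRUE))
    if (!ok)
        stop("package '", pkg, "' is not available; install it first",
             call. = FALSE)
    invisible(TRUE)
}

# Standard evaluation: the argument is an ordinary value.
mypkg <- "stats"
res_ok <- usePackageSketch(mypkg)

# A missing package gives the error right away, not a later
# 'could not find function' failure.
res_err <- tryCatch(usePackageSketch("noSuchPackage"),
                    error = function(e) conditionMessage(e))
print(res_err)
```

With this definition an unquoted call such as usePackageSketch(MASS) is
evaluated like any other argument, so it fails unless a variable named
MASS holds a package name.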

Bill Dunlap
 Spotfire, TIBCO Software
 wdunlap tibco.com

 
 
   I suggest that developers kill the problem at its root, deleting the
   library function :-)
   Good to know already installed packages will not be reinstalled.
  
   cheers
  
   milton
  
   On Tue, May 18, 2010 at 12:49 PM, Martin Maechler 
   maech...@stat.math.ethz.ch wrote:
  
   { I've modified the subject; I can't stand it hitting square into
my face ... }
  
mr == milton ruser milton.ru...@gmail.com
on Tue, 18 May 2010 12:36:23 -0300 writes:
  
  mr Dear R-experts,
  mr I am installing new libraries using
  mr install.packages("ggplot2",dependencies=T).
  mr But I perceive that many dependencies are already
  installed. As I
   am using
  mr a low-band internet, how can avoid reinstall
  installed libraries?
  
   There's no problem with installed libraries, as ...
   they DO NOT EXIST.
  
   These are *PACKAGES* !
   Why do you think are you talking about the function
  
install.packages()  
   
  
   ---
   To answer the question you did want to ask:
  
   Do not be afraid:  Dependencies are only installed when needed,
   i.e., no package will be downloaded and installed if it already
   is there.
  
   Martin Maechler, ETH Zurich
  
  mr cheers
  
  mr milton
  
  mr [[alternative HTML version deleted]]
  
   (another thing you should learn to avoid, please)
  
  
  
 
 
  __
  r-h...@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
  http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 





[Rd] R in sandbox/jail (long question)

2010-05-18 Thread Assaf Gordon

Hello,

I have a setup similar to Rweb (  http://www.math.montana.edu/Rweb/ ):
I get R scripts from users and need to execute them in a safe manner (they 
are executed automatically, without human inspection).

I would like to limit the user's script to reading from STDIN and writing to 
STDOUT/ERR.
Specifically, preventing any kind of interaction with the underlying operating 
system (files, sockets, system(), etc.).

I've found this old thread:
http://r.789695.n4.nabble.com/R-in-a-sandbox-jail-td921991.html
But for technical reasons I'd prefer not to setup a chroot jail.

I have written a patch that adds a --sandbox parameter.
When this parameter is used, the user's script can't create any kind of connection object 
or run system().

My plan is to run R like this:

cat INPUT | R --vanilla --slave --sandbox --file SCRIPT.R > OUTPUT

Where 'INPUT' is my chosen input and 'SCRIPT.R' is the script submitted by the 
user.
If the script tries to create a connection or run a disabled function, an error 
is printed.

This is the patch:
http://cancan.cshl.edu/labmembers/gordon/files/R_2.11.0_sandbox.patch

So my questions are:
1. Would you be willing to consider this feature for inclusion ?
2. Are there any other 'dangerous' functions I need to intercept ( .Internal 
perhaps ?)

All comments and suggestions are welcomed,
thanks,
  -gordon



[Rd] pretty.Date(): new halfmonth time step

2010-05-18 Thread Felix Andrews
Hi R-devel / R-core

In the new pretty() methods for Date and POSIXct
https://svn.r-project.org/R/trunk/src/library/grDevices/R/prettyDate.R
there is currently a pretty time step listed as "15 DSTdays"... but
this actually doesn't line up well with months.

Much better to implement directly what this is trying to do: i.e. to
have a "halfmonth" time step. This is just the union of two monthly
sequences, one on the 1st of each month and another on the 15th of
each month.

With this in place we have:

prettyDate(as.Date(c("2002-02-02", "2002-05-01")))
# [1] "2002-02-15" "2002-03-01" "2002-03-15" "2002-04-01" "2002-04-15"
# [6] "2002-05-01"
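
For readers without the patch, the union of the two monthly sequences can
be sketched on its own. This is not the patched prettyDate() code; the
function name halfmonth_seq is made up for the illustration, and it simply
returns every 1st and 15th that falls inside the given date range.

```r
# Hypothetical helper: all 1st-of-month and 15th-of-month dates that lie
# between 'from' and 'to' (inclusive).
halfmonth_seq <- function(from, to) {
    from <- as.Date(from)
    to   <- as.Date(to)
    # start from the 1st of the month that contains 'from'
    first  <- as.Date(format(from, "%Y-%m-01"))
    firsts <- seq(first, to, by = "month")   # the 1st of each month
    fifteenths <- firsts + 14                # the 15th of each month
    both <- sort(c(firsts, fifteenths))      # union of the two sequences
    both[both >= from & both <= to]
}

steps <- halfmonth_seq("2002-02-02", "2002-05-01")
print(steps)
```

For the range used above this gives the same six dates, from 2002-02-15
to 2002-05-01.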

The proposed patch is attached.

Regards
-Felix


-- 
Felix Andrews / 安福立
Postdoctoral Fellow
Integrated Catchment Assessment and Management (iCAM) Centre
Fenner School of Environment and Society [Bldg 48a]
The Australian National University
Canberra ACT 0200 Australia
M: +61 410 400 963
T: + 61 2 6125 4670
E: felix.andr...@anu.edu.au
CRICOS Provider No. 00120C
-- 
http://www.neurofractal.org/felix/


Re: [Rd] R in sandbox/jail (long question)

2010-05-18 Thread Duncan Murdoch

On 18/05/2010 10:38 PM, Assaf Gordon wrote:

Hello,

I have a setup similar to Rweb (  http://www.math.montana.edu/Rweb/ ):
I get R scripts from users and need to execute them in a safe manner (they 
are executed automatically, without human inspection).

I would like to limit the user's script to reading from STDIN and writing to 
STDOUT/ERR.
Specifically, preventing any kind of interaction with the underlying operating 
system (files, sockets, system(), etc.).

I've found this old thread:
http://r.789695.n4.nabble.com/R-in-a-sandbox-jail-td921991.html
But for technical reasons I'd prefer not to setup a chroot jail.

I have written a patch that adds a --sandbox parameter.
When this parameter is used, the user's script can't create any kind of connection object 
or run system().
  


That sounds too restrictive.  R uses connections internally in various 
places, with no reference to the file system.  It also uses them when 
reading its own files.  So if you stop a user from creating connections, 
you'll somehow need to distinguish between user-created ones and 
internally necessary ones:  not easy.



My plan is to run R like this:

cat INPUT | R --vanilla --slave --sandbox --file SCRIPT.R > OUTPUT

Where 'INPUT' is my chosen input and 'SCRIPT.R' is the script submitted by the 
user.
If the script tries to create a connection or run a disabled function, an error 
is printed.

This is the patch:
http://cancan.cshl.edu/labmembers/gordon/files/R_2.11.0_sandbox.patch

So my questions are:
1. Would you be willing to consider this feature for inclusion ?
2. Are there any other 'dangerous' functions I need to intercept ( .Internal 
perhaps ?)
  


.Internal is needed by tons of base functions.  So again, you'll need to 
distinguish where the call is coming from, and that's not easy.


Duncan Murdoch

All comments and suggestions are welcomed,
thanks,
   -gordon
