Re: [Rd] Problem using model.frame with argument subset in own function

2009-09-10 Thread Greg B. Hill

Gavin,

I ran into the same cryptic "invalid subscript type 'closure'" message in
a slightly less complicated scenario, and wanted to post the cause in
my case (the root cause is probably the same either way).

As in your case, I was subsetting a data frame. I had a list of
variable names corresponding to columns in the frame. Unfortunately,
the name I had given this list, var, coincides with the name of R's
built-in variance function, var().

When I attempted to subset df[, var], I got the 'closure' error message,
but once I renamed the list of variable names so that the collision didn't
occur, e.g. df[, vars] instead of df[, var], it worked as expected.
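
A minimal sketch of the collision described above, assuming a small made-up data frame (the column names a, b, c are only for illustration). No object called var is created here, so R finds the variance function instead and tries to use that function (a "closure") as a column index:

```r
# Hypothetical data; only the name collision matters here.
df <- data.frame(a = 1:3, b = 4:6, c = 7:9)

# `var` resolves to the variance function, not to a vector of column
# names, so the subsetting fails. tryCatch() keeps the message instead
# of stopping the script.
msg <- tryCatch(df[, var], error = function(e) conditionMessage(e))
print(msg)

# With a name that does not collide with a function, subsetting works.
vars <- c("a", "b")
sub <- df[, vars]
print(names(sub))
```

The same lookup rule is what bites in the wrapper case: a bare name that is not found as data falls through to whatever function of that name is on the search path.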

Sincerely, 
Greg B. Hill


Gavin Simpson wrote:
 
 Dear List,
 
 I am writing a formula method for a function in a package I maintain. I
 want the method to return a data.frame that potentially only contains
 some of the variables in 'data', as specified by the formula.
 
 The problem I am having is in writing the function and wrapping it
 around model.frame. Consider the following data frame:
 
 dat <- data.frame(A = runif(10), B = runif(10), C = runif(10))
 
 And the wrapper function:
 
 foo <- function(formula, data = NULL, ..., subset = NULL,
                 na.action = na.pass) {
     mt <- terms(formula, data = data, simplify = TRUE)
     mf <- model.frame(formula(mt), data = data, subset = subset,
                       na.action = na.action)
     ## real function would do more stuff here and pass mf on to
     ## other functions
     mf
 }
 
 This is how I envisage the function being called. The real world use
 would have a data.frame with tens or hundreds of components where only a
 few need to be excluded. Hence wanting formulas of the form below to
 work.
 
 foo(~ . - B, data = dat)
 
 The aim is to return only columns A and C in an object returned by
 model.frame. However, when I run the above, I get the following error:
 
 foo(~ A + B, data = dat)
 Error in xj[i] : invalid subscript type 'closure'
 
 I've tracked this down to the line in model.frame.default
 
 subset <- eval(substitute(subset), data, env)
 
 After evaluating this line, subset contains:
 
 Browse[1]> subset
 function (x, ...)
 UseMethod("subset")
 <environment: namespace:base>
 
 Not NULL, and hence the error later on when calling the internal
 model.frame code.
 
 So the question is, what am I doing wrong?
 
 If I leave the subset argument out of the definition of foo and rely
 upon the default in model.frame.default, the function works as
 expected. 
 
 Perhaps the question should be, how do I modify foo() to allow it to
 have a formal subset argument, passed to model.frame?
 
 Any other suggestions gratefully accepted.
 
 Thanks in advance,
 
 G
 -- 
 %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
  Dr. Gavin Simpson [t] +44 (0)20 7679 0522
  ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
  Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
  Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
  UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
 %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 
 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel
 
 



Re: [Rd] Problem using model.frame with argument subset in own function

2009-08-09 Thread Douglas Bates
On Sat, Aug 8, 2009 at 1:31 PM, Gavin Simpson <gavin.simp...@ucl.ac.uk> wrote:
 Dear List,

 I am writing a formula method for a function in a package I maintain. I
 want the method to return a data.frame that potentially only contains
 some of the variables in 'data', as specified by the formula.

The usual way to call model.frame (the method that Thomas Lumley has
called "the standard, non-standard evaluation") is to match the call to
foo, replace the name of the function being called with
as.name("model.frame") and force an evaluation in the parent frame.
It looks like

mf <- match.call()
if (missing(data)) data <- environment(formula)
## evaluate and install the model frame
m <- match(c("formula", "data", "subset", "weights", "na.action", "offset"),
           names(mf), 0)
mf <- mf[c(1, m)]
mf$drop.unused.levels <- TRUE
mf[[1]] <- as.name("model.frame")
fr <- eval(mf, parent.frame())

The point of all of this manipulation is to achieve the kind of result
you need, where the subset argument is evaluated in the correct
environment.
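
To make the contrast concrete, here is a minimal sketch, not the code of any particular package. The names foo_bad and foo_ok and the data are made up for illustration. In foo_bad the argument is forwarded as subset = subset, so model.frame.default ends up evaluating the bare name subset and finds the function of that name. In foo_ok the matched call is rewritten and evaluated in the caller's frame, so model.frame sees the expression the user actually typed:

```r
dat <- data.frame(A = 1:10, B = 11:20, C = 21:30)

# Naive wrapper: forwards the argument by name.
foo_bad <- function(formula, data = NULL, subset = NULL) {
  model.frame(formula, data = data, subset = subset)
}
bad <- tryCatch(foo_bad(~ A + B, data = dat), error = function(e) e)
print(inherits(bad, "error"))

# Wrapper using the matched-call idiom.
foo_ok <- function(formula, data = NULL, ..., subset = NULL) {
  mf <- match.call(expand.dots = FALSE)
  # Keep only the arguments that model.frame should see.
  m <- match(c("formula", "data", "subset"), names(mf), 0L)
  mf <- mf[c(1L, m)]
  # Turn the call to foo_ok() into a call to model.frame().
  mf[[1L]] <- quote(stats::model.frame)
  # Evaluate where the user's objects live.
  eval(mf, parent.frame())
}

all_rows  <- foo_ok(~ A + B, data = dat)
some_rows <- foo_ok(~ A + B, data = dat, subset = A > 5)
print(dim(all_rows))
print(dim(some_rows))
```

Because a subset that was not supplied never appears in the matched call, the NULL default in the formals does no harm in foo_ok.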



Re: [Rd] Problem using model.frame with argument subset in own function

2009-08-09 Thread Gavin Simpson
On Sun, 2009-08-09 at 11:32 -0500, Douglas Bates wrote:
 On Sat, Aug 8, 2009 at 1:31 PM, Gavin Simpson <gavin.simp...@ucl.ac.uk> wrote:
  Dear List,
 
  I am writing a formula method for a function in a package I maintain. I
  want the method to return a data.frame that potentially only contains
  some of the variables in 'data', as specified by the formula.
 
 The usual way to call model.frame (the method that Thomas Lumley has
 called "the standard, non-standard evaluation") is to match the call to
 foo, replace the name of the function being called with
 as.name("model.frame") and force an evaluation in the parent frame.
 It looks like
 

Thanks Doug. I also received an off-list reply from Brian Ripley
suggesting two alternative approaches.

The bit I was missing was how to manipulate other aspects of the call -
it hadn't clicked that the arguments of the function can be manipulated
by altering the components of the matched call.

In the end I came up with something like:

mf <- match.call()
mf[[1]] <- as.name("model.frame")
mt <- terms(formula, data = data, simplify = TRUE)
mf[[2]] <- formula(mt, data = data)
mf$na.action <- substitute(na.action)
dots <- list(...)
mf[[names(dots)]] <- NULL
mf <- eval(mf, parent.frame())
tran.default(mf, ...)

which seems to be working in the tests I have been running, allowing me
to pass along some components of the call to model.frame, whilst
reserving ... for the default methods arguments, and also get the
simplified formula.
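
For readers who want to try this outside the package, here is a self-contained sketch of the same idea. It is an illustration under assumptions, not the package's code: the name foo_simplify is made up, the call to tran.default is left out, and the arguments held in ... are dropped by selecting named arguments with match() rather than by removing them one at a time (which also copes with an empty ...):

```r
dat <- data.frame(A = 1:10, B = 11:20, C = 21:30)

foo_simplify <- function(formula, data = NULL, ..., subset = NULL,
                         na.action = na.pass) {
  mf <- match.call(expand.dots = FALSE)
  # Keep only what model.frame needs; anything in ... is left behind.
  m <- match(c("formula", "data", "subset"), names(mf), 0L)
  mf <- mf[c(1L, m)]
  mf[[1L]] <- quote(stats::model.frame)
  # Expand "." and remove excluded terms before model.frame sees them.
  mt <- terms(formula, data = data, simplify = TRUE)
  mf$formula <- formula(mt)
  # Pass the na.action expression along, default included.
  mf$na.action <- substitute(na.action)
  eval(mf, parent.frame())
}

res <- foo_simplify(~ . - B, data = dat, subset = A > 5)
print(names(res))
print(nrow(res))
```

Assigning to mf$formula by name, rather than to position 2, avoids relying on the formula being the first argument in the matched call.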

All the best,

G


Re: [Rd] problem using model.frame()

2005-08-19 Thread Gavin Simpson
On Thu, 2005-08-18 at 09:00 -0400, Gabor Grothendieck wrote:
 I think this one is a hard call.  Designing software is a
 series of tradeoffs. It's nice to maintain consistency with
 the R base, but in case of extensions (rather than changing
 behavior) as in this case, the argument against the change
 carries less weight.
 
 The main problems with extensions are (1) that one has to
 remember which functions/packages have which extensions if
 one is to use them and (2) they can interfere with other
 future extensions.
 
 On the other hand, if one is using a particular package a
 lot then convenience features like this may be attractive.
 Also, packages are where authors have the freedom to try out 
 new ideas and new functionality without being constrained.
 
 Perhaps, if the extension in question is added there could be 
 a warning in the help file that this is a convenience feature 
 of this particular package and is not generally available 
 throughout R.

Thanks again Gabor for another useful contribution to this debate. Also
thanks to Martin, Gabor and Jari for their comments, ideas, suggestions
and viewpoints.

I still like y1 ~ y2 (both data frames), but during my bike ride to work
this morning I considered both sides of the argument and my position has
moved towards the R way of doing things - far be it from little old me to
go against years of S-formula tradition. So I'll revert the code to
accepting y1 ~ ., data = y2 and let it throw an error when the rhs is a
data frame.

Once again, thank you for helping me work through this dilemma.

All the best,

Gav


Re: [Rd] problem using model.frame()

2005-08-17 Thread Martin Maechler
 GS == Gavin Simpson [EMAIL PROTECTED]
 on Tue, 16 Aug 2005 18:44:23 +0100 writes:

GS On Tue, 2005-08-16 at 12:35 -0400, Gabor Grothendieck
GS wrote:
 On 8/16/05, Gavin Simpson [EMAIL PROTECTED] wrote:
  On Tue, 2005-08-16 at 11:25 -0400, Gabor Grothendieck wrote:
   It can handle data frames like this:
   
     model.frame(y1)
   or
     model.frame(~., y1)
  
  Thanks Gabor,
  
  Yes, I know that works, but I want the function coca.formula to
  accept a formula like this y2 ~ y1, with both y1 and y2 being data
  frames. It is
 
 The expressions I gave work generally (i.e. lm, glm, ...), not just
 in model.matrix, so would it be ok if the user just does this?
 
 yourfunction(y2 ~., y1)

GS Thanks again Gabor for your comments,

GS I'd prefer the y1 ~ y2 as data frames - as this is the
GS most natural way of doing things. I'd like to have (y2
GS ~., y1) as well, and (y2 ~ spp1 + spp2 + spp3, y1) also
GS work - silently without any trouble.

I'm sorry, Gavin, I tend to disagree quite a bit.

The formula notation has quite a history in the S language, and
AFAIK never was the idea to use data.frames as formula
components, but rather as environments in which formula
components are looked up --- exactly as Gabor has explained.

To break with such a deeply rooted principle, 
you should have very very good reasons, because you're breaking
the concepts on which all other uses of formulae are based.
And this would potentially lead to much confusion of your users,
at least in the way they should learn to think about what
formulae mean.

Martin


 If it really is important to do it the way you describe,
 are the data frames necessarily numeric? If so you could
 preprocess your formula by placing as.matrix around all
 the variables representing data frames using something
 like this:
 
 https://www.stat.math.ethz.ch/pipermail/r-help/2004-December/061485.html

GS Yes, they are numeric matrices (as data frames). I've
GS looked at this, but I'd prefer to not have to do too
GS much messing with the formula.

 Of course, if they are necessarily numeric maybe they can
 be matrices in the first place?

GS Because read.table etc. produce data.frames and this is
GS the natural way to work with data in R.

but it is also slightly inefficient if they are numeric.
There are places for data frames and for matrices.

Why should it be a problem to use 
M <- as.matrix(read.table(..))
?

For large files, it could be quite a bit more efficient,
at the cost of a bit more code, to
use scan() to read the numeric data directly:

  h1 <- scan(..., n=1) ## read variable names
  nc <- length(h1)
  a <- matrix(scan(, what = numeric(), ...),
              ncol = nc, dimnames = list(NULL, h1))

maybe this would be useful to be packaged into
a small utility with usage

  read.matrix(...,  type = numeric(), ...)  
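
A runnable sketch of such a utility might look as follows. Everything here is an assumption for illustration: the name read_numeric_matrix is made up (it is not a base R function), the file is taken to be whitespace-separated with one header line of column names, and byrow = TRUE is added because scan() returns the values in file order, one row of the table after another:

```r
# Hypothetical helper, assuming a header line followed by numeric rows.
read_numeric_matrix <- function(file, sep = "") {
  # First line: the variable names.
  h1 <- scan(file, what = character(), nlines = 1, sep = sep, quiet = TRUE)
  nc <- length(h1)
  # Remaining lines: the numbers, read as one long vector.
  vals <- scan(file, what = numeric(), skip = 1, sep = sep, quiet = TRUE)
  matrix(vals, ncol = nc, byrow = TRUE, dimnames = list(NULL, h1))
}

# Small made-up file to try it on.
tmp <- tempfile()
writeLines(c("spp1 spp2 spp3", "3 1 0", "0 1 1"), tmp)
m <- read_numeric_matrix(tmp)
print(m)
unlink(tmp)
```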


GS Following your suggestions, I altered my code to
GS evaluate the rhs of the formula and check if it was of
GS class data.frame. If it is then I stop processing and
GS return it as a data.frame as this point. If not, it
GS eventually gets passed on to model.frame() for it to
GS deal with it.

GS So far - limited testing - it seems to do what I wanted
GS all along. I'm sure there's a gotcha in there somewhere
GS but at least the code runs so I can check for problems
GS against my examples.

GS Right, back to writing documentation...

GS G


Re: [Rd] problem using model.frame()

2005-08-17 Thread Gabor Grothendieck
If it's just a matter of specifying two data frames how about just
letting the user specify them as the first two arguments without
injecting formulas into it so that any of these are allowed but
data frames are still not allowed in formulas other than in the
data argument:

yourfunction(df1, df2)
yourfunction(y ~ sp1 + sp2)
yourfunction(y ~., df)

This could easily be implemented by having yourfunction be
generic in which case the first one would dispatch
yourfunction.data.frame and the second and third would
dispatch yourfunction.formula .  
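
The dispatch itself can be sketched in a few lines. This only shows which method is chosen for each of the three calls above; the method bodies are placeholders and the data frames are made up:

```r
# Generic: dispatches on the class of the first argument.
yourfunction <- function(x, ...) UseMethod("yourfunction")

# Placeholder methods that just report which one was called.
yourfunction.data.frame <- function(x, y, ...) "data.frame method"
yourfunction.formula <- function(x, data = NULL, ...) "formula method"

df1 <- data.frame(y = 1:3)
df2 <- data.frame(sp1 = 4:6, sp2 = 7:9)

r1 <- yourfunction(df1, df2)        # two data frames
r2 <- yourfunction(y ~ sp1 + sp2)   # formula only
r3 <- yourfunction(y ~ ., df2)      # formula plus data
print(c(r1, r2, r3))
```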

On 8/17/05, Gavin Simpson [EMAIL PROTECTED] wrote:
 On Wed, 2005-08-17 at 20:24 +0200, Martin Maechler wrote:
   GS == Gavin Simpson [EMAIL PROTECTED]
   on Tue, 16 Aug 2005 18:44:23 +0100 writes:
 
  GS On Tue, 2005-08-16 at 12:35 -0400, Gabor Grothendieck
  GS wrote:
   On 8/16/05, Gavin Simpson [EMAIL PROTECTED] wrote:
    On Tue, 2005-08-16 at 11:25 -0400, Gabor Grothendieck wrote:
     It can handle data frames like this:
     
       model.frame(y1)
     or
       model.frame(~., y1)
    
    Thanks Gabor,
    
    Yes, I know that works, but I want the function coca.formula to
    accept a formula like this y2 ~ y1, with both y1 and y2 being data
    frames. It is
   
   The expressions I gave work generally (i.e. lm, glm, ...), not just
   in model.matrix, so would it be ok if the user just does this?
   
   yourfunction(y2 ~., y1)
 
  GS Thanks again Gabor for your comments,
 
  GS I'd prefer the y1 ~ y2 as data frames - as this is the
  GS most natural way of doing things. I'd like to have (y2
  GS ~., y1) as well, and (y2 ~ spp1 + spp2 + spp3, y1) also
  GS work - silently without any trouble.
 
  I'm sorry, Gavin, I tend to disagree quite a bit.
 
  The formula notation has quite a history in the S language, and
  AFAIK never was the idea to use data.frames as formula
  components, but rather as environments in which formula
  components are looked up --- exactly as Gabor has explained.
 
 Hi Martin, thanks for your comments,
 
 But then one could have a matrix of variables on the rhs of the formula
 and it would work - whether this is a documented feature or un-intended
 side-effect of matrices being stored as vectors with dims, I don't know.
 
 And whilst the formula may have a long history, a number of packages
 have extended the interface to implement a specific feature, which don't
 work with standard functions like lm, glm and friends. I don't see how
 what I wanted to achieve is greatly different to that or using a matrix.
 
  To break with such a deeply rooted principle,
  you should have very very good reasons, because you're breaking
  the concepts on which all other uses of formulae are based.
  And this would potentially lead to much confusion of your users,
  at least in the way they should learn to think about what
  formulae mean.
 
 In the end I managed to treat y1 ~ y2 (both data frames) as a special
 case, which allows the existing formula notation to work as well, so I
 can use y1 ~ y2, y1 ~ ., data = y2, or y1 ~ var + var2, data = y2. This
 is what I wanted all along, to extend my interface (not do anything to
 R's formulae), but to also work in the traditional sense.
 
 The model I am writing code for really is modelling the relationship
 between two matrices of data. In one version of the method, there is
 real equivalence between both sides of the formula so it would seem odd
 to treat the two sides of the formula differently. At least to me ;-)
 
  Martin
 
 
   If it really is important to do it the way you describe,
   are the data frames necessarily numeric? If so you could
   preprocess your formula by placing as.matrix around all
   the variables representing data frames using something
   like this:
  
   
  https://www.stat.math.ethz.ch/pipermail/r-help/2004-December/061485.html
 
  GS Yes, they are numeric matrices (as data frames). I've
  GS looked at this, but I'd prefer to not have to do too
  GS much messing with the formula.
 
   Of course, if they are necessarily numeric maybe they can
   be matrices in the first place?
 
  GS Because read.table etc. produce data.frames and this is
  GS the natural way to work with data in R.
 
  but it is also slightly inefficient if they are numeric.
  There are places for data frames and for matrices.
 
 I agree - and in the code I've written, y1 and y2 quickly get coerced to
 matrices before the real number crunching begins.
 
 However, all the other R modelling functions I have used work with
 data.frames. Arguably, it could cause more confusion to write a function
 that looked, walked and quacked like an R modelling function but needed
 the user to apply an extra step to use - a step not usually required
 under normal R usage.
 
 All the best,
 
 Gav
 
  Why should it be a problem to use
  M <- as.matrix(read.table(..))
  ?
 
  

Re: [Rd] problem using model.frame()

2005-08-17 Thread Jari Oksanen

On 18 Aug 2005, at 1:49, Gavin Simpson wrote:

 On Wed, 2005-08-17 at 20:24 +0200, Martin Maechler wrote:
 GS == Gavin Simpson [EMAIL PROTECTED]
 on Tue, 16 Aug 2005 18:44:23 +0100 writes:

 GS On Tue, 2005-08-16 at 12:35 -0400, Gabor Grothendieck
 GS wrote:
 On 8/16/05, Gavin Simpson [EMAIL PROTECTED] wrote:
  On Tue, 2005-08-16 at 11:25 -0400, Gabor Grothendieck wrote:
   It can handle data frames like this:
   
     model.frame(y1)
   or
     model.frame(~., y1)
  
  Thanks Gabor,
  
  Yes, I know that works, but I want the function coca.formula to
  accept a formula like this y2 ~ y1, with both y1 and y2 being data
  frames. It is
 
 The expressions I gave work generally (i.e. lm, glm, ...), not just
 in model.matrix, so would it be ok if the user just does this?
 
 yourfunction(y2 ~., y1)

 GS Thanks again Gabor for your comments,

 GS I'd prefer the y1 ~ y2 as data frames - as this is the
 GS most natural way of doing things. I'd like to have (y2
 GS ~., y1) as well, and (y2 ~ spp1 + spp2 + spp3, y1) also
 GS work - silently without any trouble.

 I'm sorry, Gavin, I tend to disagree quite a bit.

 The formula notation has quite a history in the S language, and
 AFAIK never was the idea to use data.frames as formula
 components, but rather as environments in which formula
 components are looked up --- exactly as Gabor has explained.

 Hi Martin, thanks for your comments,

 But then one could have a matrix of variables on the rhs of the formula
 and it would work - whether this is a documented feature or un-intended
 side-effect of matrices being stored as vectors with dims, I don't 
 know.

 And whilst the formula may have a long history, a number of packages
 have extended the interface to implement a specific feature, which 
 don't
 work with standard functions like lm, glm and friends. I don't see how
 what I wanted to achieve is greatly different to that or using a 
 matrix.

 To break with such a deeply rooted principle,
 you should have very very good reasons, because you're breaking
 the concepts on which all other uses of formulae are based.
 And this would potentially lead to much confusion of your users,
 at least in the way they should learn to think about what
 formulae mean.

 In the end I managed to treat y1 ~ y2 (both data frames) as a special
 case, which allows the existing formula notation to work as well, so I
 can use y1 ~ y2, y1 ~ ., data = y2, or y1 ~ var + var2, data = y2. This
 is what I wanted all along, to extend my interface (not do anything to
 R's formulae), but to also work in the traditional sense.

 The model I am writing code for really is modelling the relationship
 between two matrices of data. In one version of the method, there is
 real equivalence between both sides of the formula so it would seem odd
 to treat the two sides of the formula differently. At least to me ;-)

It seems that I may be responsible for one of these extensions (lhs as 
a data.frame in cca and rda in vegan package). There the response (lhs) 
is multivariate or a multispecies community, and you must take that as 
a whole without manipulation (and if you tried using VGAM you see that it 
really is painful to define lhs with, say, 127 elements). However, in 
general you shouldn't use models where you use all the 'explanatory' 
variables (rhs) that you happen to have by accident. So much bad science 
has been created with that approach even in your field, Gav. The whole 
idea of formula is the ability to choose from candidate variables. That 
is: to build a model. Therefore you have one-sided formulae in prcomp() 
and princomp(): you can say prcomp(~ x1 + log(x2) + x4, data) or 
prcomp(~ . - x3, data). I think you should try to keep it so. Do 
instead as Gabor suggested: you could have a function coca.default or 
coca.matrix with interface:

coca.matrix(matx, maty, matz) -- or you can name this as coca.default.

and coca.formula which essentially parses your formula and returns a 
list of matrices you need:

coca.formula <- function(formula, data)
{
  matricesout <- parsemyformula(formula, data)
  coca(matricesout$matx, matricesout$maty, matricesout$matz)
}
Then you need the generic: coca <- function(...) UseMethod("coca") and 
it's done (but fails in R CMD check unless you add "..." in all 
specific functions...). The real work is always done in coca.matrix (or 
coca.default), and the others just chew your data into suitable form 
for your workhorse.

If then somebody thinks that they need all possible variables as 
'explanatory' variables (or perhaps constraints in your case), they 
just call the function as

coca(matx, maty, matz)

And if you have coca.data.frame they don't need 'quacking' with extra 
steps:

coca.data.frame <- function(dfx, dfy, dfz) coca(as.matrix(dfx), 
as.matrix(dfy), as.matrix(dfz)).

This you call as coca(dfx, dfy, dfz) and there you go.
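
Put together, the layout could be sketched like this. It is a simplified illustration, not the real coca code: it uses two matrices instead of three, the workhorse is a stand-in that only records dimensions, the first argument is called x in every method so that the signatures agree with the generic, and the formula method leans on model.frame instead of a hand-written parser:

```r
coca <- function(x, ...) UseMethod("coca")

# Stand-in workhorse: real code would do the analysis here.
coca.default <- function(x, y, ...) {
  x <- as.matrix(x)
  y <- as.matrix(y)
  stopifnot(nrow(x) == nrow(y))
  structure(list(n = nrow(x), nx = ncol(x), ny = ncol(y)), class = "coca")
}

# Data frames are converted and handed to the workhorse.
coca.data.frame <- function(x, y, ...) {
  coca.default(as.matrix(x), as.matrix(y), ...)
}

# Formula method: lhs is taken whole, rhs variables are chosen from data.
coca.formula <- function(x, data, ...) {
  lhs <- eval(x[[2L]], data, environment(x))
  rhs <- model.frame(x[-2L], data = data)   # one-sided formula
  coca.default(lhs, rhs, ...)
}

# Made-up data for illustration.
y1 <- data.frame(a = 1:5, b = 5:1)
y2 <- data.frame(spp1 = c(3, 0, 0, 0, 0), spp2 = c(1, 1, 0, 0, 1),
                 spp3 = c(0, 1, 1, 1, 1))

r_all  <- coca(y1, y2)                        # every column of y2
r_pick <- coca(y1 ~ spp1 + spp2, data = y2)   # a chosen subset
print(c(r_all$ny, r_pick$ny))
```

The formula route is the one that lets the user say which variables belong in the model; the data frame route takes everything.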

The essential feature in formula is the ability to define the model. 
Don't give it away.

cheers, jazza
--
Jari Oksanen, 

Re: [Rd] problem using model.frame()

2005-08-16 Thread Gavin Simpson
On Tue, 2005-08-16 at 11:25 -0400, Gabor Grothendieck wrote:
 It can handle data frames like this:
 
   model.frame(y1)
 or
   model.frame(~., y1)

Thanks Gabor,

Yes, I know that works, but I want the function coca.formula to accept a
formula like this y2 ~ y1, with both y1 and y2 being data frames. It is
more intuitive, to my mind at least for this particular example and
analysis, to specify the formula with a data frame on the rhs.

model.frame doesn't work with the formula ~ y1 if the object y1, in
the environment when model.frame evaluates the formula, is a data.frame.
It works if y1 is a matrix, however. I'd like to work around this
problem, say by creating an environment in which y1 is modified to be a
matrix, if possible. Can this be done?
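
One way such an environment might be built is sketched below. It is an assumption-laden illustration, not a tested solution for coca.formula: it creates a child of the formula's environment, stores matrix copies of any data frames named in the formula there, and points the formula at the new environment before calling model.frame().

```r
y1 <- as.data.frame(matrix(c(3,1,0,1,0,1,1,0,0,0,1,0,0,0,1,1,0,1,1,1),
                           nrow = 5, byrow = TRUE))
colnames(y1) <- paste("spp", 1:4, sep = "")

f <- ~ y1

# Child environment: anything not masked here is still found in the
# formula's original environment.
env <- new.env(parent = environment(f))
for (nm in all.vars(f)) {
  obj <- get(nm, envir = environment(f))
  if (is.data.frame(obj))
    assign(nm, as.matrix(obj), envir = env)  # mask with a matrix copy
}
environment(f) <- env

mf <- model.frame(f)  # y1 is now seen as a matrix
```

The original y1 is left untouched; only the copy seen by model.frame() is a matrix.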

At the moment I have something working by grabbing the bits of the
formula and then using get() to grab the named object. Of course, this
won't work if someone wants to use R's formula interface with the
following formula y2 ~ var1 + var2 + var3, data = y1, or to use the
subset argument common to many formula implementations. I'd like to have
the function work in as general a manner as possible, so I'm fishing
around for potential solutions.
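
For the data and subset arguments, a common idiom (used in lm() and similar functions) is to rebuild the call and evaluate it as a model.frame() call in the caller's frame. A minimal sketch, with the hypothetical wrapper name foo and made-up data:

```r
foo <- function(formula, data, subset, ...) {
  mf <- match.call(expand.dots = FALSE)
  # keep only the arguments that model.frame() should see
  m <- match(c("formula", "data", "subset"), names(mf), 0L)
  mf <- mf[c(1L, m)]
  mf[[1L]] <- quote(stats::model.frame)
  # subset is evaluated inside data, as the user would expect
  eval(mf, parent.frame())
}

dat <- data.frame(A = 1:10, B = 11:20, C = 21:30)
res <- foo(~ A + C, data = dat, subset = A > 5)
```

Because subset is passed on unevaluated, expressions that refer to columns of data work without the wrapper having to look them up itself.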

All the best,

Gav 

 
 On 8/16/05, Gavin Simpson [EMAIL PROTECTED] wrote:
  Hi I'm having a problem with model.frame, encapsulated in this example:
  
  y1 <- matrix(c(3,1,0,1,0,1,1,0,0,0,1,0,0,0,1,1,0,1,1,1),
               nrow = 5, byrow = TRUE)
  y1 <- as.data.frame(y1)
  rownames(y1) <- paste("site", 1:5, sep = "")
  colnames(y1) <- paste("spp", 1:4, sep = "")
  y1
  
  model.frame(~ y1)
  Error in model.frame(formula, rownames, variables, varnames, extras, 
  extranames,  :
 invalid variable type
  
  temp <- as.matrix(y1)
  model.frame(~ temp)
    temp.spp1 temp.spp2 temp.spp3 temp.spp4
  1         3         1         0         1
  2         0         1         1         0
  3         0         0         1         0
  4         0         0         1         1
  5         0         1         1         1
  
  Ideally the above wouldn't have names like temp.var1, temp.var2, but one
  could deal with that later.
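
A small sketch of how the names might be dealt with. The prefixed names are produced when the model frame is printed; the frame itself holds a single variable, temp, whose value is the whole matrix, so the original column names can be recovered from it.

```r
y1 <- as.data.frame(matrix(c(3,1,0,1,0,1,1,0,0,0,1,0,0,0,1,1,0,1,1,1),
                           nrow = 5, byrow = TRUE))
rownames(y1) <- paste("site", 1:5, sep = "")
colnames(y1) <- paste("spp", 1:4, sep = "")

temp <- as.matrix(y1)
mf <- model.frame(~ temp)

names(mf)                      # a single variable holding the matrix
spp <- as.data.frame(mf$temp)  # back to one column per species
names(spp)
```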
  
  I have tracked down the source of the error message to line 1330 in
  model.c - here I'm stumped as I don't know any C, but it looks as if the
  code is looping over the variables in the formula and checking if they
  are the right type. So a matrix of variables gets through, but a
  data.frame doesn't.
  
  It would be good if model.frame could cope with data.frames in formulae,
  but seeing as I am incapable of providing a patch, is there a way around
  this problem?
  
  Below is the head of the function I am currently using, including the
  function for parsing the formula - borrowed and hacked from
  ordiParseFormula() in package vegan.
  
  I can work out the class of the rhs of the formula. Is there a way to
  create a suitable environment for the data argument of parseFormula()
  such that it contains the rhs dataframe coerced to a matrix, which then
  should get through model.frame.default without error? How would I go
  about manipulating/creating such an environment? Any other ideas?
  
  Thanks in advance
  
  Gav
  
  coca.formula <- function(formula, method = c("predictive", "symmetric"),
                           reg.method = c("simpls", "eigen"), weights = NULL,
                           n.axes = NULL, symmetric = FALSE, data)
    {
      parseFormula <- function (formula, data)
        {
          browser()
          Terms <- terms(formula, "Condition", data = data)
          flapart <- fla <- formula <- formula(Terms, width.cutoff = 500)
          specdata <- formula[[2]]
          X <- eval(specdata, data, parent.frame())
          X <- as.matrix(X)
          formula[[2]] <- NULL
          if (formula[[2]] == "1" || formula[[2]] == "0")
            Y <- NULL
          else {
            mf <- model.frame(formula, data, na.action = na.fail)
            Y <- model.matrix(formula, mf)
            if (any(colnames(Y) == "(Intercept)")) {
              xint <- which(colnames(Y) == "(Intercept)")
              Y <- Y[, -xint, drop = FALSE]
            }
          }
          list(X = X, Y = Y)
        }
      if (missing(data))
        data <- parent.frame()
      #browser()
      dat <- parseFormula(formula, data)
  
  --
  %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
  Gavin Simpson [T] +44 (0)20 7679 5522
  ENSIS Research Fellow [F] +44 (0)20 7679 7565
  ENSIS Ltd.  ECRC [E] gavin.simpsonATNOSPAMucl.ac.uk
  UCL Department of Geography   [W] http://www.ucl.ac.uk/~ucfagls/cv/
  26 Bedford Way[W] http://www.ucl.ac.uk/~ucfagls/
  London.  WC1H 0AP.
  %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
  
  __
  R-devel@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-devel
 
-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
Gavin Simpson

Re: [Rd] problem using model.frame()

2005-08-16 Thread Gabor Grothendieck
On 8/16/05, Gavin Simpson [EMAIL PROTECTED] wrote:
 On Tue, 2005-08-16 at 11:25 -0400, Gabor Grothendieck wrote:
  It can handle data frames like this:
 
model.frame(y1)
  or
model.frame(~., y1)
 
 Thanks Gabor,
 
 Yes, I know that works, but I want the function coca.formula to accept a
 formula like this y2 ~ y1, with both y1 and y2 being data frames. It is

The expressions I gave work generally (e.g. lm, glm, ...), not just in 
model.matrix, so would it be ok if the user just does this?

yourfunction(y2 ~., y1)

If it really is important to do it the way you describe, are the data 
frames necessarily numeric? If so you could preprocess your formula 
by placing as.matrix around all the variables representing data frames 
using something like this:

https://www.stat.math.ethz.ch/pipermail/r-help/2004-December/061485.html

Of course, if they are necessarily numeric maybe they can be matrices in
the first place?
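
The preprocessing idea can be sketched roughly as below. This is not the code from the linked post; wrap_df is a hypothetical helper that walks the formula and wraps every symbol that currently refers to a data frame in as.matrix(). The example data are made up.

```r
# Recursively wrap symbols that name data frames in as.matrix().
wrap_df <- function(e, env) {
  if (is.name(e)) {
    nm <- as.character(e)
    if (exists(nm, envir = env) && is.data.frame(get(nm, envir = env)))
      return(call("as.matrix", e))
    return(e)
  }
  if (is.call(e)) {
    # skip element 1 (the function being called, e.g. `~` or `+`)
    for (i in seq_along(e)[-1])
      e[[i]] <- wrap_df(e[[i]], env)
  }
  e
}

y1 <- data.frame(spp1 = c(3, 0, 0, 0, 0), spp2 = c(1, 1, 0, 0, 1))
y2 <- data.frame(env1 = c(1.5, 2.0, 2.5, 3.0, 3.5))

f <- y2 ~ y1
g <- wrap_df(f, environment(f))  # as.matrix(y2) ~ as.matrix(y1)
mf <- model.frame(g)
```

As noted above, this only makes sense when the data frames are entirely numeric, since as.matrix() on mixed columns yields a character matrix.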

 more intuitive, to my mind at least for this particular example and
 analysis, to specify the formula with a data frame on the rhs.
 
 model.frame doesn't work with the formula ~ y1 if the object y1, in
 the environment when model.frame evaluates the formula, is a data.frame.
 It works if y1 is a matrix, however. I'd like to work around this
 problem, say by creating an environment in which y1 is modified to be a
 matrix, if possible. Can this be done?
 
 At the moment I have something working by grabbing the bits of the
 formula and then using get() to grab the named object. Of course, this
 won't work if someone wants to use R's formula interface with the
 following formula y2 ~ var1 + var2 + var3, data = y1, or to use the
 subset argument common to many formula implementations. I'd like to have
 the function work in as general a manner as possible, so I'm fishing
 around for potential solutions.
 
 All the best,
 
 Gav
 
 
   --