Re: [Rd] Problem using model.frame with argument subset in own function

2009-09-10 Thread Greg B. Hill

Gavin,

I ran into the same cryptic "invalid subscript type 'closure'" message in
a slightly less complicated scenario, and wanted to post the cause in 
my case (the root cause is probably the same either way).

Similarly to your case, I was subsetting a data frame. I had a list
of variable names corresponding to columns in the frame. 
Unfortunately the variable name I had assigned to this list, var,
coincided with the name of an existing R function, var() (variance, in the
stats package).

When I attempted to subset df[, var], I got the 'closure' error message,
but if I renamed the list of variable names so the collision didn't occur,
e.g. df[, vars] instead of df[, var], it worked as expected.
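
A minimal sketch of this kind of collision, assuming a fresh R session in
which no object named var has been created (the data frame and its column
names are made up for illustration):

```r
# Hypothetical data frame; the name 'df' is assigned locally here
df <- data.frame(A = 1:3, B = 4:6, C = 7:9)

# No variable called 'var' exists, so R finds the function stats::var
# and tries to use it as a column index, which fails
res <- tryCatch(df[, var], error = function(e) conditionMessage(e))
print(res)

# A name that does not collide with an existing function works
vars <- c("A", "C")
print(df[, vars])
```

The error only appears when the intended object is not visible at the point
of the call, so a distinctive name makes the mistake easier to spot.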

Sincerely, 
Greg B. Hill


Gavin Simpson wrote:
> 
> Dear List,
> 
> I am writing a formula method for a function in a package I maintain. I
> want the method to return a data.frame that potentially only contains
> some of the variables in 'data', as specified by the formula.
> 
> The problem I am having is in writing the function and wrapping it
> around model.frame. Consider the following data frame:
> 
> dat <- data.frame(A = runif(10), B = runif(10), C = runif(10))
> 
> And the wrapper function:
> 
> foo <- function(formula, data = NULL, ..., subset = NULL,
>                 na.action = na.pass) {
>     mt <- terms(formula, data = data, simplify = TRUE)
>     mf <- model.frame(formula(mt), data = data, subset = subset,
>                       na.action = na.action)
>     ## real function would do more stuff here and pass mf on to
>     ## other functions
>     mf
> }
> 
> This is how I envisage the function being called. The real world use
> would have a data.frame with tens or hundreds of components where only a
> few need to be excluded. Hence wanting formulas of the form below to
> work.
> 
> foo(~ . - B, data = dat)
> 
> The aim is to return only columns A and C in an object returned by
> model.frame. However, when I run the above, I get the following error:
> 
>> foo(~ A + B, data = dat)
> Error in xj[i] : invalid subscript type 'closure'
> 
> I've tracked this down to the line in model.frame.default
> 
> subset <- eval(substitute(subset), data, env)
> 
> After evaluating this line, subset contains:
> 
> Browse[1]> subset
> function (x, ...) 
> UseMethod("subset")
> 
> 
> Not NULL, and hence the error later on when calling the internal
> model.frame code.
> 
> So the question is, what am I doing wrong?
> 
> If I leave the subset argument out of the definition of foo and rely
> upon the default in model.frame.default, the function works as
> expected. 
> 
> Perhaps the question should be, how do I modify foo() to allow it to
> have a formal subset argument, passed to model.frame?
> 
> Any other suggestions gratefully accepted.
> 
> Thanks in advance,
> 
> G
> -- 
> %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
>  Dr. Gavin Simpson [t] +44 (0)20 7679 0522
>  ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
>  Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
>  Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
>  UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
> %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
> 
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
> 
> 
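
For readers following along, the lookup that produces the closure can be
reproduced outside model.frame. This is only a sketch of the mechanism
described in the quoted message: foo_naive is a made-up name, and the eval()
call imitates the quoted line from model.frame.default rather than
reproducing its full code.

```r
dat <- data.frame(A = runif(10), B = runif(10), C = runif(10))

# Hypothetical wrapper that forwards 'subset = subset', as in the question
foo_naive <- function(formula, data = NULL, subset = NULL) {
    model.frame(formula, data = data, subset = subset)
}

# model.frame.default sees the unevaluated symbol 'subset', not NULL
res <- tryCatch(foo_naive(~ A + B, data = dat), error = function(e) e)
print(inherits(res, "error"))

# Imitation of eval(substitute(subset), data, env): the symbol is not a
# column of 'dat', so the search continues in the formula's environment
# and its parents, where it finds the function base::subset
found <- eval(quote(subset), dat, environment(~ A + B))
print(is.function(found))
```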



Re: [Rd] Problem using model.frame with argument subset in own function

2009-08-09 Thread Gavin Simpson
On Sun, 2009-08-09 at 11:32 -0500, Douglas Bates wrote:
> On Sat, Aug 8, 2009 at 1:31 PM, Gavin Simpson wrote:
> > Dear List,
> 
> > I am writing a formula method for a function in a package I maintain. I
> > want the method to return a data.frame that potentially only contains
> > some of the variables in 'data', as specified by the formula.
> 
> The usual way to call model.frame (the method that Thomas Lumley has
> called "the standard, non-standard evaluation") is to match the call to
> foo, replace the name of the function being called with
> as.name("model.frame") and force an evaluation in the parent frame.
> It looks like
> 

Thanks Doug. I also received an off-list reply from Brian Ripley
suggesting two alternative approaches.

The bit I was missing was how to manipulate other aspects of the call -
it hadn't clicked that the arguments of the function can be manipulated
by altering the components of the matched call.

In the end I came up with something like:

mf <- match.call()
mf[[1]] <- as.name("model.frame")
mt <- terms(formula, data = data, simplify = TRUE)
mf[[2]] <- formula(mt, data = data)
mf$na.action <- substitute(na.action)
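dots <- list(...)
## drop each named '...' argument from the call; a loop is used because
## mf[[names(dots)]] fails when there are zero or several such names
for (nm in setdiff(names(dots), "")) mf[[nm]] <- NULL
mf <- eval(mf, parent.frame())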
tran.default(mf, ...)

which seems to be working in the tests I have been running, allowing me
to pass along some components of the call to model.frame, whilst
reserving ... for the default methods arguments, and also get the
simplified formula.
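
A self-contained sketch along these lines, applied to the foo() wrapper from
the original question. It is an illustration under assumptions rather than
the code used in the package: tran.default is left out, and the arguments
wanted by model.frame are picked by name instead of deleting the '...'
entries.

```r
# Sketch only: 'foo' mirrors the wrapper from the original question
foo <- function(formula, data = NULL, ..., subset = NULL,
                na.action = na.pass) {
    mf <- match.call()
    # keep only the arguments meant for model.frame; '...' is left out
    keep <- match(c("formula", "data", "subset"), names(mf), 0L)
    mf <- mf[c(1L, keep)]
    mf[[1L]] <- as.name("model.frame")
    # expand '.' and drop removed terms so only the wanted columns remain
    mt <- terms(formula, data = data, simplify = TRUE)
    mf$formula <- formula(mt)
    # pass on the unevaluated na.action (the default if none was given)
    mf$na.action <- substitute(na.action)
    eval(mf, parent.frame())
}

set.seed(1)
dat <- data.frame(A = runif(10), B = runif(10), C = runif(10))

res <- foo(~ . - B, data = dat)
print(names(res))

# 'subset' is now evaluated inside 'dat', as model.frame expects
res_sub <- foo(~ . - B, data = dat, subset = A > 0.5)
print(nrow(res_sub))
```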

All the best,

G

> mf <- match.call()
> if (missing(data)) data <- environment(formula)
> ## evaluate and install the model frame
> m <- match(c("formula", "data", "subset", "weights", "na.action", "offset"),
>            names(mf), 0)
> mf <- mf[c(1, m)]
> mf$drop.unused.levels <- TRUE
> mf[[1]] <- as.name("model.frame")
> fr <- eval(mf, parent.frame())
> 
> The point of all of this manipulation is to achieve the kind of result
> you need where the subset argument is evaluated in the correct
> environment.
> 
> > The problem I am having is in writing the function and wrapping it
> > around model.frame. Consider the following data frame:
> >
> > dat <- data.frame(A = runif(10), B = runif(10), C = runif(10))
> >
> > And the wrapper function:
> >
> > foo <- function(formula, data = NULL, ..., subset = NULL,
> >                 na.action = na.pass) {
> >     mt <- terms(formula, data = data, simplify = TRUE)
> >     mf <- model.frame(formula(mt), data = data, subset = subset,
> >                       na.action = na.action)
> >     ## real function would do more stuff here and pass mf on to
> >     ## other functions
> >     mf
> > }
> >
> > This is how I envisage the function being called. The real world use
> > would have a data.frame with tens or hundreds of components where only a
> > few need to be excluded. Hence wanting formulas of the form below to
> > work.
> >
> > foo(~ . - B, data = dat)
> >
> > The aim is to return only columns A and C in an object returned by
> > model.frame. However, when I run the above, I get the following error:
> >
> >> foo(~ A + B, data = dat)
> > Error in xj[i] : invalid subscript type 'closure'
> >
> > I've tracked this down to the line in model.frame.default
> >
> >subset <- eval(substitute(subset), data, env)
> >
> > After evaluating this line, subset contains:
> >
> > Browse[1]> subset
> > function (x, ...)
> > UseMethod("subset")
> > 
> >
> > Not NULL, and hence the error later on when calling the internal
> > model.frame code.
> >
> > So the question is, what am I doing wrong?
> >
> > If I leave the subset argument out of the definition of foo and rely
> > upon the default in model.frame.default, the function works as
> > expected.
> >
> > Perhaps the question should be, how do I modify foo() to allow it to
> > have a formal subset argument, passed to model.frame?
> >
> > Any other suggestions gratefully accepted.
> >
> > Thanks in advance,
> >
> > G

Re: [Rd] Problem using model.frame with argument subset in own function

2009-08-09 Thread Douglas Bates
On Sat, Aug 8, 2009 at 1:31 PM, Gavin Simpson wrote:
> Dear List,

> I am writing a formula method for a function in a package I maintain. I
> want the method to return a data.frame that potentially only contains
> some of the variables in 'data', as specified by the formula.

The usual way to call model.frame (the method that Thomas Lumley has
called "the standard, non-standard evaluation") is to match the call to
foo, replace the name of the function being called with
as.name("model.frame") and force an evaluation in the parent frame.
It looks like

mf <- match.call()
if (missing(data)) data <- environment(formula)
## evaluate and install the model frame
m <- match(c("formula", "data", "subset", "weights", "na.action", "offset"),
   names(mf), 0)
mf <- mf[c(1, m)]
mf$drop.unused.levels <- TRUE
mf[[1]] <- as.name("model.frame")
fr <- eval(mf, parent.frame())

The point of all of this manipulation is to achieve the kind of result
you need where the subset argument is evaluated in the correct
environment.
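
The fragment above can be wrapped in a complete function to try it out. The
following is a sketch under assumptions: the name foo and its argument list
are illustrative, and the line handling a missing data argument is omitted
because model.frame deals with that case itself.

```r
# Illustrative wrapper; only the body follows the idiom shown above
foo <- function(formula, data, subset, weights, na.action, offset, ...) {
    mf <- match.call()
    # positions of the arguments model.frame understands (0 if absent)
    m <- match(c("formula", "data", "subset", "weights", "na.action",
                 "offset"), names(mf), 0L)
    mf <- mf[c(1L, m)]
    mf$drop.unused.levels <- TRUE
    # turn the call to foo() into a call to model.frame()
    mf[[1L]] <- as.name("model.frame")
    # evaluate where the user called foo(), so 'subset' sees the data
    eval(mf, parent.frame())
}

set.seed(1)
dat <- data.frame(A = runif(10), B = runif(10), C = runif(10))

full <- foo(~ A + B, data = dat)
part <- foo(~ A + B, data = dat, subset = A > 0.5)
print(nrow(full))
print(nrow(part))
```

Because the subset expression is never evaluated inside foo(), it reaches
model.frame as written by the user and is evaluated with the columns of the
data frame in scope.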

> The problem I am having is in writing the function and wrapping it
> around model.frame. Consider the following data frame:
>
> dat <- data.frame(A = runif(10), B = runif(10), C = runif(10))
>
> And the wrapper function:
>
> foo <- function(formula, data = NULL, ..., subset = NULL,
>                na.action = na.pass) {
>    mt <- terms(formula, data = data, simplify = TRUE)
>    mf <- model.frame(formula(mt), data = data, subset = subset,
>                      na.action = na.action)
>    ## real function would do more stuff here and pass mf on to
>    ## other functions
>    mf
> }
>
> This is how I envisage the function being called. The real world use
> would have a data.frame with tens or hundreds of components where only a
> few need to be excluded. Hence wanting formulas of the form below to
> work.
>
> foo(~ . - B, data = dat)
>
> The aim is to return only columns A and C in an object returned by
> model.frame. However, when I run the above, I get the following error:
>
>> foo(~ A + B, data = dat)
> Error in xj[i] : invalid subscript type 'closure'
>
> I've tracked this down to the line in model.frame.default
>
>    subset <- eval(substitute(subset), data, env)
>
> After evaluating this line, subset contains:
>
> Browse[1]> subset
> function (x, ...)
> UseMethod("subset")
> 
>
> Not NULL, and hence the error later on when calling the internal
> model.frame code.
>
> So the question is, what am I doing wrong?
>
> If I leave the subset argument out of the definition of foo and rely
> upon the default in model.frame.default, the function works as
> expected.
>
> Perhaps the question should be, how do I modify foo() to allow it to
> have a formal subset argument, passed to model.frame?
>
> Any other suggestions gratefully accepted.
>
> Thanks in advance,
>
> G


Re: [Rd] problem using model.frame()

2005-08-19 Thread Gavin Simpson
On Thu, 2005-08-18 at 09:00 -0400, Gabor Grothendieck wrote:
> I think this one is a hard call.  Designing software is a
> series of tradeoffs. It's nice to maintain consistency with
> the R base, but in case of extensions (rather than changing
> behavior) as in this case, the argument against the change
> carries less weight.
> 
> The main problems with extensions are (1) that one has to
> remember which functions/packages have which extensions if
> one is to use them and (2) they can interfere with other
> future extensions.
> 
> On the other hand, if one is using a particular package a
> lot then convenience features like this may be attractive.
> Also, packages are where authors have the freedom to try out 
> new ideas and new functionality without being constrained.
> 
> Perhaps, if the extension in question is added there could be 
> a warning in the help file that this is a convenience feature 
> of this particular package and is not generally available 
> throughout R.

Thanks again Gabor for another useful contribution to this debate. Also
thanks to Martin, Gabor and Jari for their comments, ideas, suggestions
and viewpoints.

I still like y1 ~ y2 (both data frames), but during my bike ride to work
this morning I considered both sides of the argument and my position has
moved towards the R way of doing things - far be it from little old me to
go against years of S-formula tradition. So I'll revert the code to
accepting y1 ~ ., data = y2 and leave it to throw an error when the rhs
is a data frame.

Once again, thank you for helping me work through this dilemma.

All the best,

Gav

Re: [Rd] problem using model.frame()

2005-08-18 Thread Gavin Simpson
On Thu, 2005-08-18 at 07:57 +0300, Jari Oksanen wrote:
> On 18 Aug 2005, at 1:49, Gavin Simpson wrote:
> 
> > On Wed, 2005-08-17 at 20:24 +0200, Martin Maechler wrote:
> >>> "GS" == Gavin Simpson <[EMAIL PROTECTED]>
> >>> on Tue, 16 Aug 2005 18:44:23 +0100 writes:
> >>
> >> GS> On Tue, 2005-08-16 at 12:35 -0400, Gabor Grothendieck
> >> GS> wrote:
> >> >> On 8/16/05, Gavin Simpson <[EMAIL PROTECTED]> wrote:
> >> >> > On Tue, 2005-08-16 at 11:25 -0400, Gabor Grothendieck wrote:
> >> >> > > It can handle data frames like this:
> >> >> > >
> >> >> > > model.frame(y1)
> >> >> > > or
> >> >> > > model.frame(~., y1)
> >> >> >
> >> >> > Thanks Gabor,
> >> >> >
> >> >> > Yes, I know that works, but I want the function coca.formula
> >> >> > to accept a formula like this y2 ~ y1, with both y1 and y2
> >> >> > being data frames. It is
> >> >>
> >> >> The expressions I gave work generally (i.e. lm, glm, ...), not
> >> >> just in model.matrix, so would it be ok if the user just does
> >> >> this?
> >> >>
> >> >> yourfunction(y2 ~., y1)
> >>
> >> GS> Thanks again Gabor for your comments,
> >>
> >> GS> I'd prefer the y1 ~ y2 as data frames - as this is the
> >> GS> most natural way of doing things. I'd like to have (y2
> >> GS> ~., y1) as well, and (y2 ~ spp1 + spp2 + spp3, y1) also
> >> GS> work - silently without any trouble.
> >>
> >> I'm sorry, Gavin, I tend to disagree quite a bit.
> >>
> >> The formula notation has quite a history in the S language, and
> >> AFAIK never was the idea to use data.frames as formula
> >> components, but rather as "environments" in which formula
> >> components are looked up --- exactly as Gabor has explained.
> >
> > Hi Martin, thanks for your comments,
> >
> > But then one could have a matrix of variables on the rhs of the formula
> > and it would work - whether this is a documented feature or un-intended
> > side-effect of matrices being stored as vectors with dims, I don't 
> > know.
> >
> > And whilst the formula may have a long history, a number of packages
> > have extended the interface to implement a specific feature, which 
> > don't
> > work with standard functions like lm, glm and friends. I don't see how
> > what I wanted to achieve is greatly different to that or using a 
> > matrix.
> >
> >> To break with such a deeply rooted principle,
> >> you should have very very good reasons, because you're breaking
> >> the concepts on which all other uses of formulae are based.
> >> And this would potentially lead to much confusion of your users,
> >> at least in the way they should learn to think about what
> >> formulae mean.
> >
> > In the end I managed to treat y1 ~ y2 (both data frames) as a special
> > case, which allows the existing formula notation to work as well, so I
> > can use y1 ~ y2, y1 ~ ., data = y2, or y1 ~ var + var2, data = y2. This
> > is what I wanted all along, to extend my interface (not do anything to
> > R's formulae), but to also work in the traditional sense.
> >
> > The model I am writing code for really is modelling the relationship
> > between two matrices of data. In one version of the method, there is
> > real equivalence between both sides of the formula so it would seem odd
> > to treat the two sides of the formula differently. At least to me ;-)
> 
> It seems that I may be responsible for one of these extensions (lhs as 
> a data.frame in cca and rda in vegan package). There the response (lhs) 
> is multivariate or a multispecies community, and you must take that as 
> a whole without manipulation (and if you tried using VGAM you will see
> that it really is painful to define the lhs with, say, 127 elements).

Hi Jari,

Thanks for reminding me about this - I'd forgotten about not normally
being able to have a data.frame on the lhs of the formula either - I'm
surprised no-one pulled me up on that one before, either ;-) 

I guess what I'm proposing is really pushing the formula representation
too far for some people. I'm coming round to the y1 ~ ., data = y2 way
of doing things - still prefer y1 ~ y2 though ;-)

Also, both y1 and y2 are community matrices (i.e. both have many, many
species, aka variables for the non-community ecologists reading this).
I'm not sure that it makes sense to treat the two sides differently. In
the predictive co-correspondence mode (the default), multivariate pls is
used to regress one matrix on another, with the number of pls components
being chosen by cross-validation or a permutation test.

> However, in 
> general you shouldn't use models where you use all the 'explanatory' 
> variables (rhs) that you happen to have by accident. So much bad science
> has been created with that approach even in your field, Gav. 

Well, I agree with you there...

> The whole 
> idea of formula is the ability to choose from candidate variables. That 
> is: to build a model. Therefore you have one-sided formulae in prcomp() 
> and princomp(): you can say prcomp(~ x1 + log(x2) +x4, data) or 
> prcomp(~ . - x3, data). I think you should try to 

Re: [Rd] problem using model.frame()

2005-08-17 Thread Gavin Simpson
On Wed, 2005-08-17 at 21:48 -0400, Gabor Grothendieck wrote:
> If it's just a matter of specifying two data frames how about just
> letting the user specify them as the first two arguments without
> injecting formulas into it so that any of these are allowed but
> data frames are still not allowed in formulas other than in the
> data argument:
> 
> yourfunction(df1, df2)
> yourfunction(y ~ sp1 + sp2)
> yourfunction(y ~., df)
> 
> This could easily be implemented by having yourfunction be
> generic in which case the first one would dispatch
> yourfunction.data.frame and the second and third would
> dispatch yourfunction.formula .  

Hi Gabor,

yourfunction() is already generic, I have .default and .formula methods.
The default implementation of the method (Co-correspondence analysis) is
akin to a regression and uses a form of multivariate PLS. So one data
matrix plays the role of the response and one the predictor. Which is
the reason for wanting to use a formula interface.

Cheers,

G

> On 8/17/05, Gavin Simpson <[EMAIL PROTECTED]> wrote:
> > On Wed, 2005-08-17 at 20:24 +0200, Martin Maechler wrote:
> > > > "GS" == Gavin Simpson <[EMAIL PROTECTED]>
> > > > on Tue, 16 Aug 2005 18:44:23 +0100 writes:
> > >
> > > GS> On Tue, 2005-08-16 at 12:35 -0400, Gabor Grothendieck
> > > GS> wrote:
> > > >> On 8/16/05, Gavin Simpson <[EMAIL PROTECTED]> wrote:
> > > >> > On Tue, 2005-08-16 at 11:25 -0400, Gabor Grothendieck wrote:
> > > >> > > It can handle data frames like this:
> > > >> > >
> > > >> > > model.frame(y1)
> > > >> > > or
> > > >> > > model.frame(~., y1)
> > > >> >
> > > >> > Thanks Gabor,
> > > >> >
> > > >> > Yes, I know that works, but I want the function coca.formula
> > > >> > to accept a formula like this y2 ~ y1, with both y1 and y2
> > > >> > being data frames. It is
> > > >>
> > > >> The expressions I gave work generally (i.e. lm, glm,
> > > >> ...), not just in model.matrix, so would it be ok if the
> > > >> user just does this?
> > > >>
> > > >> yourfunction(y2 ~., y1)
> > >
> > > GS> Thanks again Gabor for your comments,
> > >
> > > GS> I'd prefer the y1 ~ y2 as data frames - as this is the
> > > GS> most natural way of doing things. I'd like to have (y2
> > > GS> ~., y1) as well, and (y2 ~ spp1 + spp2 + spp3, y1) also
> > > GS> work - silently without any trouble.
> > >
> > > I'm sorry, Gavin, I tend to disagree quite a bit.
> > >
> > > The formula notation has quite a history in the S language, and
> > > AFAIK never was the idea to use data.frames as formula
> > > components, but rather as "environments" in which formula
> > > components are looked up --- exactly as Gabor has explained.
> > 
> > Hi Martin, thanks for your comments,
> > 
> > But then one could have a matrix of variables on the rhs of the formula
> > and it would work - whether this is a documented feature or un-intended
> > side-effect of matrices being stored as vectors with dims, I don't know.
> > 
> > And whilst the formula may have a long history, a number of packages
> > have extended the interface to implement a specific feature, which don't
> > work with standard functions like lm, glm and friends. I don't see how
> > what I wanted to achieve is greatly different to that or using a matrix.
> > 
> > > To break with such a deeply rooted principle,
> > > you should have very very good reasons, because you're breaking
> > > the concepts on which all other uses of formulae are based.
> > > And this would potentially lead to much confusion of your users,
> > > at least in the way they should learn to think about what
> > > formulae mean.
> > 
> > In the end I managed to treat y1 ~ y2 (both data frames) as a special
> > case, which allows the existing formula notation to work as well, so I
> > can use y1 ~ y2, y1 ~ ., data = y2, or y1 ~ var + var2, data = y2. This
> > is what I wanted all along, to extend my interface (not do anything to
> > R's formulae), but to also work in the traditional sense.
> > 
> > The model I am writing code for really is modelling the relationship
> > between two matrices of data. In one version of the method, there is
> > real equivalence between both sides of the formula so it would seem odd
> > to treat the two sides of the formula differently. At least to me ;-)
> > 
> > > Martin
> > >
> > >
> > > >> If it really is important to do it the way you describe,
> > > >> are the data frames necessarily numeric? If so you could
> > > >> preprocess your formula by placing as.matrix around all
> > > >> the variables representing data frames using something
> > > >> like this:
> > > >>
> > > >> 
> > > https://www.stat.math.ethz.ch/pipermail/r-help/2004-December/061485.html
> > >
> > > GS> Yes, they are numeric matrices (as data frames). I've
> > > GS> looked at this, but I'd prefer to not have to do too
> > > GS> much messing with the formula.
> > >
> > > >> Of co

Re: [Rd] problem using model.frame()

2005-08-17 Thread Jari Oksanen

On 18 Aug 2005, at 1:49, Gavin Simpson wrote:

> On Wed, 2005-08-17 at 20:24 +0200, Martin Maechler wrote:
>>> "GS" == Gavin Simpson <[EMAIL PROTECTED]>
>>> on Tue, 16 Aug 2005 18:44:23 +0100 writes:
>>
>> GS> On Tue, 2005-08-16 at 12:35 -0400, Gabor Grothendieck
>> GS> wrote:
>> >> On 8/16/05, Gavin Simpson <[EMAIL PROTECTED]> wrote:
>> >> > On Tue, 2005-08-16 at 11:25 -0400, Gabor Grothendieck wrote:
>> >> > > It can handle data frames like this:
>> >> > >
>> >> > > model.frame(y1)
>> >> > > or
>> >> > > model.frame(~., y1)
>> >> >
>> >> > Thanks Gabor,
>> >> >
>> >> > Yes, I know that works, but I want the function coca.formula
>> >> > to accept a formula like this y2 ~ y1, with both y1 and y2
>> >> > being data frames. It is
>> >>
>> >> The expressions I gave work generally (i.e. lm, glm, ...), not
>> >> just in model.matrix, so would it be ok if the user just does
>> >> this?
>> >>
>> >> yourfunction(y2 ~., y1)
>>
>> GS> Thanks again Gabor for your comments,
>>
>> GS> I'd prefer the y1 ~ y2 as data frames - as this is the
>> GS> most natural way of doing things. I'd like to have (y2
>> GS> ~., y1) as well, and (y2 ~ spp1 + spp2 + spp3, y1) also
>> GS> work - silently without any trouble.
>>
>> I'm sorry, Gavin, I tend to disagree quite a bit.
>>
>> The formula notation has quite a history in the S language, and
>> AFAIK never was the idea to use data.frames as formula
>> components, but rather as "environments" in which formula
>> components are looked up --- exactly as Gabor has explained.
>
> Hi Martin, thanks for your comments,
>
> But then one could have a matrix of variables on the rhs of the formula
> and it would work - whether this is a documented feature or un-intended
> side-effect of matrices being stored as vectors with dims, I don't 
> know.
>
> And whilst the formula may have a long history, a number of packages
> have extended the interface to implement a specific feature, which 
> don't
> work with standard functions like lm, glm and friends. I don't see how
> what I wanted to achieve is greatly different to that or using a 
> matrix.
>
>> To break with such a deeply rooted principle,
>> you should have very very good reasons, because you're breaking
>> the concepts on which all other uses of formulae are based.
>> And this would potentially lead to much confusion of your users,
>> at least in the way they should learn to think about what
>> formulae mean.
>
> In the end I managed to treat y1 ~ y2 (both data frames) as a special
> case, which allows the existing formula notation to work as well, so I
> can use y1 ~ y2, y1 ~ ., data = y2, or y1 ~ var + var2, data = y2. This
> is what I wanted all along, to extend my interface (not do anything to
> R's formulae), but to also work in the traditional sense.
>
> The model I am writing code for really is modelling the relationship
> between two matrices of data. In one version of the method, there is
> real equivalence between both sides of the formula so it would seem odd
> to treat the two sides of the formula differently. At least to me ;-)

It seems that I may be responsible for one of these extensions (lhs as 
a data.frame in cca and rda in vegan package). There the response (lhs) 
is multivariate or a multispecies community, and you must take that as 
a whole without manipulation (and if you tried using VGAM you will see
that it really is painful to define the lhs with, say, 127 elements).
However, in general you shouldn't use models where you use all the
'explanatory' variables (rhs) that you happen to have by accident. So
much bad science
has been created with that approach even in your field, Gav. The whole 
idea of formula is the ability to choose from candidate variables. That 
is: to build a model. Therefore you have one-sided formulae in prcomp() 
and princomp(): you can say prcomp(~ x1 + log(x2) +x4, data) or 
prcomp(~ . - x3, data). I think you should try to keep it so. Do 
instead like Gabor suggested: you could have a function coca.default or 
coca.matrix with interface:

coca.matrix(matx, maty, matz) -- or you can name this as coca.default.

and coca.formula which essentially parses your formula and returns a 
list of matrices you need:

coca.formula <- function(formula, data)
{
  matricesout <- parsemyformula(formula, data)
  coca(matricesout$matx, matricesout$maty, matricesout$matz)
}
Then you need the generic: coca <- function(...) UseMethod("coca") and 
it's done (but fails in R CMD check unless you add "..." in all 
specific functions...). The real work is always done in coca.matrix (or 
coca.default), and the others just chew your data into suitable form 
for your workhorse.
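
The layout described above can be sketched as follows. This is a toy
illustration only: the coca defined here is a stand-in that records
dimensions, the method bodies and the example data are assumptions, and only
two matrices are used to keep it short.

```r
# Toy generic; 'coca' here is a stand-in, not the real implementation
coca <- function(x, ...) UseMethod("coca")

# Workhorse: the real computations would live here. This toy version
# only records the dimensions of the two matrices.
coca.default <- function(x, y, ...) {
    x <- as.matrix(x)
    y <- as.matrix(y)
    stopifnot(nrow(x) == nrow(y))
    list(nobs = nrow(x), nresp = ncol(x), npred = ncol(y))
}

# Data frames are converted and handed to the workhorse
coca.data.frame <- function(x, y, ...) {
    coca.default(as.matrix(x), as.matrix(y), ...)
}

# Formula method: the lhs names the response object, the rhs selects
# predictor variables from 'data'
coca.formula <- function(x, data, ...) {
    resp <- eval(x[[2L]], environment(x))
    pred <- model.frame(x[-2L], data = data)
    coca.default(resp, pred, ...)
}

# Made-up community and site data
comm <- data.frame(spA = c(1, 0, 3, 2), spB = c(0, 2, 1, 1))
site <- data.frame(x1 = c(1, 2, 3, 4), x2 = c(2, 1, 4, 3),
                   x3 = c(0, 1, 0, 1))

all_vars <- coca(comm, site)                    # every column of 'site'
chosen   <- coca(comm ~ x1 + x2, data = site)   # a chosen subset
print(all_vars$npred)
print(chosen$npred)
```

Every method carries "..." in its argument list so that it stays consistent
with the generic, which is the R CMD check point mentioned above.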

If then somebody thinks that they need all possible variables as 
'explanatory' variables (or perhaps constraints in your case), they 
just call the function as

coca(matx, maty, matz)

And if you have coca.data.frame they don't need 'quacking' with extra 
steps:

coca.data.frame <- function(dfx, dfy, dfz) coca(as.matrix(dfx),
as.matrix(dfy), a

Re: [Rd] problem using model.frame()

2005-08-17 Thread Gabor Grothendieck
If it's just a matter of specifying two data frames how about just
letting the user specify them as the first two arguments without
injecting formulas into it so that any of these are allowed but
data frames are still not allowed in formulas other than in the
data argument:

yourfunction(df1, df2)
yourfunction(y ~ sp1 + sp2)
yourfunction(y ~., df)

This could easily be implemented by having yourfunction be
generic in which case the first one would dispatch
yourfunction.data.frame and the second and third would
dispatch yourfunction.formula .  

On 8/17/05, Gavin Simpson <[EMAIL PROTECTED]> wrote:
> On Wed, 2005-08-17 at 20:24 +0200, Martin Maechler wrote:
> > > "GS" == Gavin Simpson <[EMAIL PROTECTED]>
> > > on Tue, 16 Aug 2005 18:44:23 +0100 writes:
> >
> > GS> On Tue, 2005-08-16 at 12:35 -0400, Gabor Grothendieck
> > GS> wrote:
> > >> On 8/16/05, Gavin Simpson <[EMAIL PROTECTED]>
> > >> wrote: > On Tue, 2005-08-16 at 11:25 -0400, Gabor
> > >> Grothendieck wrote: > > It can handle data frames like
> > >> this:
> > >> > >
> > >> > > model.frame(y1) > > or > > model.frame(~., y1)
> > >> >
> > >> > Thanks Gabor,
> > >> >
> > >> > Yes, I know that works, but I want the function
> > >> coca.formula to accept a > formula like this y2 ~ y1,
> > >> with both y1 and y2 being data frames. It is
> > >>
> > >> The expressions I gave work generally (i.e. lm, glm,
> > >> ...), not just in model.matrix, so would it be ok if the
> > >> user just does this?
> > >>
> > >> yourfunction(y2 ~., y1)
> >
> > GS> Thanks again Gabor for your comments,
> >
> > GS> I'd prefer the y1 ~ y2 as data frames - as this is the
> > GS> most natural way of doing things. I'd like to have (y2
> > GS> ~., y1) as well, and (y2 ~ spp1 + spp2 + spp3, y1) also
> > GS> work - silently without any trouble.
> >
> > I'm sorry, Gavin, I tend to disagree quite a bit.
> >
> > The formula notation has quite a history in the S language, and
> > AFAIK never was the idea to use data.frames as formula
> > components, but rather as "environments" in which formula
> > components are looked up --- exactly as Gabor has explained.
> 
> Hi Martin, thanks for your comments,
> 
> But then one could have a matrix of variables on the rhs of the formula
> and it would work - whether this is a documented feature or un-intended
> side-effect of matrices being stored as vectors with dims, I don't know.
> 
> And whilst the formula may have a long history, a number of packages
> have extended the interface to implement a specific feature, which don't
> work with standard functions like lm, glm and friends. I don't see how
> what I wanted to achieve is greatly different to that or using a matrix.
> 
> > To break with such a deeply rooted principle,
> > you should have very very good reasons, because you're breaking
> > the concepts on which all other uses of formulae are based.
> > And this would potentially lead to much confusion of your users,
> > at least in the way they should learn to think about what
> > formulae mean.
> 
> In the end I managed to treat y1 ~ y2 (both data frames) as a special
> case, which allows the existing formula notation to work as well, so I
> can use y1 ~ y2, y1 ~ ., data = y2, or y1 ~ var + var2, data = y2. This
> is what I wanted all along, to extend my interface (not do anything to
> R's formulae), but to also work in the traditional sense.
> 
> The model I am writing code for really is modelling the relationship
> between two matrices of data. In one version of the method, there is
> real equivalence between both sides of the formula so it would seem odd
> to treat the two sides of the formula differently. At least to me ;-)
> 
> > Martin
> >
> >
> > >> If it really is important to do it the way you describe,
> > >> are the data frames necessarily numeric? If so you could
> > >> preprocess your formula by placing as.matrix around all
> > >> the variables representing data frames using something
> > >> like this:
> > >>
> > >> 
> > https://www.stat.math.ethz.ch/pipermail/r-help/2004-December/061485.html
> >
> > GS> Yes, they are numeric matrices (as data frames). I've
> > GS> looked at this, but I'd prefer to not have to do too
> > GS> much messing with the formula.
> >
> > >> Of course, if they are necessarily numeric maybe they can
> > >> be matrices in the first place?
> >
> > GS> Because read.table etc. produce data.frames and this is
> > GS> the natural way to work with data in R.
> >
> > but it is also slightly inefficient if they are numeric.
> > There are places for data frames and for matrices.
> 
> I agree - and in the code I've written, y1 and y2 quickly get coerced to
> matrices before the real number crunching begins.
> 
> However, all the other R modelling functions I have used work with
> data.frames. Arguably, it could cause more confusion to write a function
> tha

Re: [Rd] problem using model.frame()

2005-08-17 Thread Gavin Simpson
On Wed, 2005-08-17 at 20:24 +0200, Martin Maechler wrote:
> > "GS" == Gavin Simpson <[EMAIL PROTECTED]>
> > on Tue, 16 Aug 2005 18:44:23 +0100 writes:
> 
> GS> On Tue, 2005-08-16 at 12:35 -0400, Gabor Grothendieck
> GS> wrote:
> >> On 8/16/05, Gavin Simpson <[EMAIL PROTECTED]>
> >> wrote: > On Tue, 2005-08-16 at 11:25 -0400, Gabor
> >> Grothendieck wrote: > > It can handle data frames like
> >> this:
> >> > >
> >> > > model.frame(y1) > > or > > model.frame(~., y1)
> >> > 
> >> > Thanks Gabor,
> >> > 
> >> > Yes, I know that works, but I want the function
> >> coca.formula to accept a > formula like this y2 ~ y1,
> >> with both y1 and y2 being data frames. It is
> >> 
> >> The expressions I gave work generally (i.e. lm, glm,
> >> ...), not just in model.matrix, so would it be ok if the
> >> user just does this?
> >> 
> >> yourfunction(y2 ~., y1)
> 
> GS> Thanks again Gabor for your comments,
> 
> GS> I'd prefer the y1 ~ y2 as data frames - as this is the
> GS> most natural way of doing things. I'd like to have (y2
> GS> ~., y1) as well, and (y2 ~ spp1 + spp2 + spp3, y1) also
> GS> work - silently without any trouble.
> 
> I'm sorry, Gavin, I tend to disagree quite a bit.
> 
> The formula notation has quite a history in the S language, and
> AFAIK never was the idea to use data.frames as formula
> components, but rather as "environments" in which formula
> components are looked up --- exactly as Gabor has explained.

Hi Martin, thanks for your comments,

But then one could have a matrix of variables on the rhs of the formula
and it would work - whether this is a documented feature or un-intended
side-effect of matrices being stored as vectors with dims, I don't know.

And whilst the formula may have a long history, a number of packages
have extended the interface to implement a specific feature, which don't
work with standard functions like lm, glm and friends. I don't see how
what I wanted to achieve is greatly different to that or using a matrix.

> To break with such a deeply rooted principle, 
> you should have very very good reasons, because you're breaking
> the concepts on which all other uses of formulae are based.
> And this would potentially lead to much confusion of your users,
> at least in the way they should learn to think about what
> formulae mean.

In the end I managed to treat y1 ~ y2 (both data frames) as a special
case, which allows the existing formula notation to work as well, so I
can use y1 ~ y2, y1 ~ ., data = y2, or y1 ~ var + var2, data = y2. This
is what I wanted all along, to extend my interface (not do anything to
R's formulae), but to also work in the traditional sense.

The model I am writing code for really is modelling the relationship
between two matrices of data. In one version of the method, there is
real equivalence between both sides of the formula so it would seem odd
to treat the two sides of the formula differently. At least to me ;-)

> Martin
> 
> 
> >> If it really is important to do it the way you describe,
> >> are the data frames necessarily numeric? If so you could
> >> preprocess your formula by placing as.matrix around all
> >> the variables representing data frames using something
> >> like this:
> >> 
> >> 
> https://www.stat.math.ethz.ch/pipermail/r-help/2004-December/061485.html
> 
> GS> Yes, they are numeric matrices (as data frames). I've
> GS> looked at this, but I'd prefer to not have to do too
> GS> much messing with the formula.
> 
> >> Of course, if they are necessarily numeric maybe they can
> >> be matrices in the first place?
> 
> GS> Because read.table etc. produce data.frames and this is
> GS> the natural way to work with data in R.
> 
> but it is also slightly inefficient if they are numeric.
> There are places for data frames and for matrices.

I agree - and in the code I've written, y1 and y2 quickly get coerced to
matrices before the real number crunching begins.

However, all the other R modelling functions I have used work with
data.frames. Arguably, it could cause more confusion to write a function
that looked, walked and quacked like an R modelling function but needed
the user to apply an extra step to use - a step not usually required
under normal R usage.

All the best,

Gav

> Why should it be a problem to use 
> M <- as.matrix(read.table(..))
> ?
> 
> For large files, it could be quite a bit more efficient,
> needing a bit more of code, to
> use scan() to read the numeric data directly :
> 
>   h1 <- scan(..., n=1) ## 
>   nc <- length(h1)
>   a <- matrix(scan(, what = numeric(), ...),  
>   ncol = nc, dimnames = list(NULL, h1))
> 
> maybe this would be useful to be packaged into
> a small utility with usage
> 
>   read.matrix(...,  type = numeric(), ...)  
> 
> 
> GS> Following your suggestions, I altered my code to
>

Re: [Rd] problem using model.frame()

2005-08-17 Thread Martin Maechler
> "GS" == Gavin Simpson <[EMAIL PROTECTED]>
> on Tue, 16 Aug 2005 18:44:23 +0100 writes:

GS> On Tue, 2005-08-16 at 12:35 -0400, Gabor Grothendieck
GS> wrote:
>> On 8/16/05, Gavin Simpson <[EMAIL PROTECTED]>
>> wrote: > On Tue, 2005-08-16 at 11:25 -0400, Gabor
>> Grothendieck wrote: > > It can handle data frames like
>> this:
>> > >
>> > > model.frame(y1) > > or > > model.frame(~., y1)
>> > 
>> > Thanks Gabor,
>> > 
>> > Yes, I know that works, but I want the function
>> coca.formula to accept a > formula like this y2 ~ y1,
>> with both y1 and y2 being data frames. It is
>> 
>> The expressions I gave work generally (i.e. lm, glm,
>> ...), not just in model.matrix, so would it be ok if the
>> user just does this?
>> 
>> yourfunction(y2 ~., y1)

GS> Thanks again Gabor for your comments,

GS> I'd prefer the y1 ~ y2 as data frames - as this is the
GS> most natural way of doing things. I'd like to have (y2
GS> ~., y1) as well, and (y2 ~ spp1 + spp2 + spp3, y1) also
GS> work - silently without any trouble.

I'm sorry, Gavin, I tend to disagree quite a bit.

The formula notation has quite a history in the S language, and
AFAIK never was the idea to use data.frames as formula
components, but rather as "environments" in which formula
components are looked up --- exactly as Gabor has explained.

To break with such a deeply rooted principle, 
you should have very very good reasons, because you're breaking
the concepts on which all other uses of formulae are based.
And this would potentially lead to much confusion of your users,
at least in the way they should learn to think about what
formulae mean.

Martin


>> If it really is important to do it the way you describe,
>> are the data frames necessarily numeric? If so you could
>> preprocess your formula by placing as.matrix around all
>> the variables representing data frames using something
>> like this:
>> 
>> https://www.stat.math.ethz.ch/pipermail/r-help/2004-December/061485.html

GS> Yes, they are numeric matrices (as data frames). I've
GS> looked at this, but I'd prefer to not have to do too
GS> much messing with the formula.

>> Of course, if they are necessarily numeric maybe they can
>> be matrices in the first place?

GS> Because read.table etc. produce data.frames and this is
GS> the natural way to work with data in R.

but it is also slightly inefficient if they are numeric.
There are places for data frames and for matrices.

Why should it be a problem to use 
M <- as.matrix(read.table(..))
?

For large files, it could be quite a bit more efficient,
needing a bit more of code, to
use scan() to read the numeric data directly :

  h1 <- scan(..., n=1) ## 
  nc <- length(h1)
  a <- matrix(scan(, what = numeric(), ...),  
  ncol = nc, dimnames = list(NULL, h1))

maybe this would be useful to be packaged into
a small utility with usage

  read.matrix(...,  type = numeric(), ...)  
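
A minimal sketch of such a utility, assuming a plain whitespace-separated text file whose first line holds the column names. The function name read_matrix_sketch and its arguments are hypothetical, and byrow = TRUE is an assumption added here because scan() returns the values in file order, one row after another.

```r
## Hypothetical helper: read a purely numeric table straight into a matrix.
read_matrix_sketch <- function(file, sep = "") {
  ## header line: one field per column
  h1 <- scan(file, what = character(), nlines = 1, sep = sep, quiet = TRUE)
  nc <- length(h1)
  ## remaining lines: numeric values only
  vals <- scan(file, what = numeric(), skip = 1, sep = sep, quiet = TRUE)
  matrix(vals, ncol = nc, byrow = TRUE, dimnames = list(NULL, h1))
}

## Usage on a small temporary file
tmp <- tempfile()
writeLines(c("a b c", "1 2 3", "4 5 6"), tmp)
m <- read_matrix_sketch(tmp)
```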


GS> Following your suggestions, I altered my code to
GS> evaluate the rhs of the formula and check if it was of
GS> class "data.frame". If it is then I stop processing and
GS> return it as a data.frame at this point. If not, it
GS> eventually gets passed on to model.frame() for it to
GS> deal with it.

GS> So far - limited testing - it seems to do what I wanted
GS> all along. I'm sure there's a gotcha in there somewhere
GS> but at least the code runs so I can check for problems
GS> against my examples.

GS> Right, back to writing documentation...

GS> G

>> > more intuitive, to my mind at least for this particular
>> example and > analysis, to specify the formula with a
>> data frame on the rhs.
>> > 
>> > model.frame doesn't work with the formula "~ y1" if the
>> object y1, in > the environment when model.frame
>> evaluates the formula, is a data.frame.  > It works if y1
>> is a matrix, however. I'd like to work around this >
>> problem, say by creating an environment in which y1 is
>> modified to be a > matrix, if possible. Can this be done?
>> > 
>> > At the moment I have something working by grabbing the
>> bits of the > formula and then using get() to grab the
>> named object. Of course, this > won't work if someone
>> wants to use R's formula interface with the > following
>> formula y2 ~ var1 + var2 + var3, data = y1, or to use the
>> > subset argument common to many formula
>> implementations. I'd like to have > the function work in
>> as general a manner as possible, so I'm fishing > around
>> for potential solutions.
>> > 
>> > All the best,
>> > 
>> > Gav
>> > 
>> > >
>> > > On 8/16/05, Gavin Simpson <[EMAIL PROTECTED]>
>> wrote: > > > Hi I'm having a problem with model.frame,
>> encapsulated in this e

Re: [Rd] problem using model.frame()

2005-08-16 Thread Gavin Simpson
On Tue, 2005-08-16 at 12:35 -0400, Gabor Grothendieck wrote:
> On 8/16/05, Gavin Simpson <[EMAIL PROTECTED]> wrote:
> > On Tue, 2005-08-16 at 11:25 -0400, Gabor Grothendieck wrote:
> > > It can handle data frames like this:
> > >
> > >   model.frame(y1)
> > > or
> > >   model.frame(~., y1)
> > 
> > Thanks Gabor,
> > 
> > Yes, I know that works, but I want the function coca.formula to accept a
> > formula like this y2 ~ y1, with both y1 and y2 being data frames. It is
> 
> The expressions I gave work generally (i.e. lm, glm, ...), not just in 
> model.matrix, so would it be ok if the user just does this?
> 
> yourfunction(y2 ~., y1)

Thanks again Gabor for your comments,

I'd prefer the y1 ~ y2 as data frames - as this is the most natural way
of doing things. I'd like to have (y2 ~., y1) as well, and (y2 ~ spp1 +
spp2 + spp3, y1) also work - silently without any trouble.

> If it really is important to do it the way you describe, are the data 
> frames necessarily numeric? If so you could preprocess your formula 
> by placing as.matrix around all the variables representing data frames 
> using something like this:
> 
> https://www.stat.math.ethz.ch/pipermail/r-help/2004-December/061485.html

Yes, they are numeric matrices (as data frames). I've looked at this,
but I'd prefer to not have to do too much messing with the formula.

> Of course, if they are necessarily numeric maybe they can be matrices in
> the first place?

Because read.table etc. produce data.frames and this is the natural way
to work with data in R.

Following your suggestions, I altered my code to evaluate the rhs of the
formula and check if it was of class "data.frame". If it is then I stop
processing and return it as a data.frame at this point. If not, it
eventually gets passed on to model.frame() for it to deal with it.
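
The check described in the paragraph above might look roughly like this. It is only a sketch of the idea, not the code actually used in the package: the helper name rhs_data and the decision to drop the lhs before calling model.frame() are assumptions made for illustration.

```r
## Hypothetical helper: return the rhs of a formula as a data frame.
rhs_data <- function(formula, data = NULL) {
  rhs <- formula[[length(formula)]]   # last element is the rhs
  env <- environment(formula)
  ## A bare name on the rhs may refer to a whole data frame
  if (is.name(rhs) && !identical(rhs, as.name("."))) {
    obj <- tryCatch(eval(rhs, data, env), error = function(e) NULL)
    if (is.data.frame(obj))
      return(obj)                     # special case: stop processing here
  }
  ## Otherwise leave everything to model.frame()
  f <- formula
  if (length(f) == 3L)
    f[[2]] <- NULL                    # drop the lhs
  if (is.null(data)) model.frame(f) else model.frame(f, data = data)
}

## Toy data frames (values made up for illustration only)
y1 <- data.frame(spp1 = c(3, 0, 0, 0, 0), spp2 = c(1, 1, 0, 0, 1),
                 spp3 = c(0, 1, 1, 1, 1), spp4 = c(1, 0, 0, 1, 1))
y2 <- data.frame(sppA = 1:5, sppB = 5:1)

r1 <- rhs_data(y2 ~ y1)                      # data frame on the rhs
r2 <- rhs_data(y2 ~ ., data = y1)            # traditional: all columns
r3 <- rhs_data(y2 ~ spp1 + spp2, data = y1)  # traditional: chosen columns
```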

So far - limited testing - it seems to do what I wanted all along. I'm
sure there's a gotcha in there somewhere but at least the code runs so I
can check for problems against my examples.

Right, back to writing documentation...

G

> > more intuitive, to my mind at least for this particular example and
> > analysis, to specify the formula with a data frame on the rhs.
> > 
> > model.frame doesn't work with the formula "~ y1" if the object y1, in
> > the environment when model.frame evaluates the formula, is a data.frame.
> > It works if y1 is a matrix, however. I'd like to work around this
> > problem, say by creating an environment in which y1 is modified to be a
> > matrix, if possible. Can this be done?
> > 
> > At the moment I have something working by grabbing the bits of the
> > formula and then using get() to grab the named object. Of course, this
> > won't work if someone wants to use R's formula interface with the
> > following formula y2 ~ var1 + var2 + var3, data = y1, or to use the
> > subset argument common to many formula implementations. I'd like to have
> > the function work in as general a manner as possible, so I'm fishing
> > around for potential solutions.
> > 
> > All the best,
> > 
> > Gav
> > 
> > >
> > > On 8/16/05, Gavin Simpson <[EMAIL PROTECTED]> wrote:
> > > > Hi I'm having a problem with model.frame, encapsulated in this example:
> > > >
> > > > y1 <- matrix(c(3,1,0,1,0,1,1,0,0,0,1,0,0,0,1,1,0,1,1,1),
> > > > nrow = 5, byrow = TRUE)
> > > > y1 <- as.data.frame(y1)
> > > > rownames(y1) <- paste("site", 1:5, sep = "")
> > > > colnames(y1) <- paste("spp", 1:4, sep = "")
> > > > y1
> > > >
> > > > model.frame(~ y1)
> > > > Error in model.frame(formula, rownames, variables, varnames, extras, 
> > > > extranames,  :
> > > >invalid variable type
> > > >
> > > > temp <- as.matrix(y1)
> > > > model.frame(~ temp)
> > > >  temp.spp1 temp.spp2 temp.spp3 temp.spp4
> > > > 1 3 1 0 1
> > > > 2 0 1 1 0
> > > > 3 0 0 1 0
> > > > 4 0 0 1 1
> > > > 5 0 1 1 1
> > > >
> > > > Ideally the above wouldn't have names like temp.var1, temp.var2, but one
> > > > could deal with that later.
> > > >
> > > > I have tracked down the source of the error message to line 1330 in
> > > > model.c - here I'm stumped as I don't know any C, but it looks as if the
> > > > code is looping over the variables in the formula and checking if they
> > > > are the right "type". So a matrix of variables gets through, but a
> > > > data.frame doesn't.
> > > >
> > > > It would be good if model.frame could cope with data.frames in formulae,
> > > > but seeing as I am incapable of providing a patch, is there a way around
> > > > this problem?
> > > >
> > > > Below is the head of the function I am currently using, including the
> > > > function for parsing the formula - borrowed and hacked from
> > > > ordiParseFormula() in package vegan.
> > > >
> > > > I can work out the class of the rhs of the formula. Is there a way to
> > > > create a suitable environment

Re: [Rd] problem using model.frame()

2005-08-16 Thread Gabor Grothendieck
On 8/16/05, Gavin Simpson <[EMAIL PROTECTED]> wrote:
> On Tue, 2005-08-16 at 11:25 -0400, Gabor Grothendieck wrote:
> > It can handle data frames like this:
> >
> >   model.frame(y1)
> > or
> >   model.frame(~., y1)
> 
> Thanks Gabor,
> 
> Yes, I know that works, but I want the function coca.formula to accept a
> formula like this y2 ~ y1, with both y1 and y2 being data frames. It is

The expressions I gave work generally (i.e. lm, glm, ...), not just in 
model.matrix, so would it be ok if the user just does this?

yourfunction(y2 ~., y1)

If it really is important to do it the way you describe, are the data 
frames necessarily numeric? If so you could preprocess your formula 
by placing as.matrix around all the variables representing data frames 
using something like this:

https://www.stat.math.ethz.ch/pipermail/r-help/2004-December/061485.html

Of course, if they are necessarily numeric maybe they can be matrices in
the first place?

> more intuitive, to my mind at least for this particular example and
> analysis, to specify the formula with a data frame on the rhs.
> 
> model.frame doesn't work with the formula "~ y1" if the object y1, in
> the environment when model.frame evaluates the formula, is a data.frame.
> It works if y1 is a matrix, however. I'd like to work around this
> problem, say by creating an environment in which y1 is modified to be a
> matrix, if possible. Can this be done?
> 
> At the moment I have something working by grabbing the bits of the
> formula and then using get() to grab the named object. Of course, this
> won't work if someone wants to use R's formula interface with the
> following formula y2 ~ var1 + var2 + var3, data = y1, or to use the
> subset argument common to many formula implementations. I'd like to have
> the function work in as general a manner as possible, so I'm fishing
> around for potential solutions.
> 
> All the best,
> 
> Gav
> 
> >
> > On 8/16/05, Gavin Simpson <[EMAIL PROTECTED]> wrote:
> > > Hi I'm having a problem with model.frame, encapsulated in this example:
> > >
> > > y1 <- matrix(c(3,1,0,1,0,1,1,0,0,0,1,0,0,0,1,1,0,1,1,1),
> > > nrow = 5, byrow = TRUE)
> > > y1 <- as.data.frame(y1)
> > > rownames(y1) <- paste("site", 1:5, sep = "")
> > > colnames(y1) <- paste("spp", 1:4, sep = "")
> > > y1
> > >
> > > model.frame(~ y1)
> > > Error in model.frame(formula, rownames, variables, varnames, extras, 
> > > extranames,  :
> > >invalid variable type
> > >
> > > temp <- as.matrix(y1)
> > > model.frame(~ temp)
> > >  temp.spp1 temp.spp2 temp.spp3 temp.spp4
> > > 1 3 1 0 1
> > > 2 0 1 1 0
> > > 3 0 0 1 0
> > > 4 0 0 1 1
> > > 5 0 1 1 1
> > >
> > > Ideally the above wouldn't have names like temp.var1, temp.var2, but one
> > > could deal with that later.
> > >
> > > I have tracked down the source of the error message to line 1330 in
> > > model.c - here I'm stumped as I don't know any C, but it looks as if the
> > > code is looping over the variables in the formula and checking if they
> > > are the right "type". So a matrix of variables gets through, but a
> > > data.frame doesn't.
> > >
> > > It would be good if model.frame could cope with data.frames in formulae,
> > > but seeing as I am incapable of providing a patch, is there a way around
> > > this problem?
> > >
> > > Below is the head of the function I am currently using, including the
> > > function for parsing the formula - borrowed and hacked from
> > > ordiParseFormula() in package vegan.
> > >
> > > I can work out the class of the rhs of the formula. Is there a way to
> > > create a suitable environment for the data argument of parseFormula()
> > > such that it contains the rhs dataframe coerced to a matrix, which then
> > > should get through model.frame.default without error? How would I go
> > > about manipulating/creating such an environment? Any other ideas?
> > >
> > > Thanks in advance
> > >
> > > Gav
> > >
> > > coca.formula <- function(formula, method = c("predictive", "symmetric"),
> > > reg.method = c("simpls", "eigen"), weights = NULL,
> > > n.axes = NULL, symmetric = FALSE, data)
> > >  {
> > >parseFormula <- function (formula, data)
> > >  {
> > >browser()
> > >Terms <- terms(formula, "Condition", data = data)
> > >flapart <- fla <- formula <- formula(Terms, width.cutoff = 500)
> > >specdata <- formula[[2]]
> > >X <- eval(specdata, data, parent.frame())
> > >X <- as.matrix(X)
> > >formula[[2]] <- NULL
> > >if (formula[[2]] == "1" || formula[[2]] == "0")
> > >  Y <- NULL
> > >else {
> > >  mf <- model.frame(formula, data, na.action = na.fail)
> > >  Y <- model.matrix(formula, mf)
> > >  if (any(colnames(Y) == "(Intercept)")) {
> > >  

Re: [Rd] problem using model.frame()

2005-08-16 Thread Gavin Simpson
On Tue, 2005-08-16 at 11:25 -0400, Gabor Grothendieck wrote:
> It can handle data frames like this:
> 
>   model.frame(y1)
> or
>   model.frame(~., y1)

Thanks Gabor,

Yes, I know that works, but I want the function coca.formula to accept a
formula like this y2 ~ y1, with both y1 and y2 being data frames. It is
more intuitive, to my mind at least for this particular example and
analysis, to specify the formula with a data frame on the rhs.

model.frame doesn't work with the formula "~ y1" if the object y1, in
the environment when model.frame evaluates the formula, is a data.frame.
It works if y1 is a matrix, however. I'd like to work around this
problem, say by creating an environment in which y1 is modified to be a
matrix, if possible. Can this be done?
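
One way this could be done, sketched here under assumptions rather than taken from any reply in the thread: give the formula a child environment in which each data frame it names is shadowed by a matrix copy, then let model.frame() look the variables up there. The helper name as_matrix_env is hypothetical.

```r
## Hypothetical helper: re-home a formula in an environment where every
## variable that is a data frame has been replaced by as.matrix() of it.
as_matrix_env <- function(formula) {
  parent <- environment(formula)
  env <- new.env(parent = parent)
  for (nm in all.vars(formula)) {
    if (exists(nm, envir = parent)) {
      obj <- get(nm, envir = parent)
      if (is.data.frame(obj))
        assign(nm, as.matrix(obj), envir = env)
    }
  }
  environment(formula) <- env
  formula
}

## The example data from the original post
y1 <- matrix(c(3,1,0,1,0,1,1,0,0,0,1,0,0,0,1,1,0,1,1,1),
             nrow = 5, byrow = TRUE)
y1 <- as.data.frame(y1)
rownames(y1) <- paste("site", 1:5, sep = "")
colnames(y1) <- paste("spp", 1:4, sep = "")

f <- as_matrix_env(~ y1)
mf <- model.frame(f)   # y1 now enters the model frame as a matrix column
```

The original y1 is left untouched; only the copy seen through the formula's environment is a matrix.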

At the moment I have something working by grabbing the bits of the
formula and then using get() to grab the named object. Of course, this
won't work if someone wants to use R's formula interface with the
following formula y2 ~ var1 + var2 + var3, data = y1, or to use the
subset argument common to many formula implementations. I'd like to have
the function work in as general a manner as possible, so I'm fishing
around for potential solutions.

All the best,

Gav 

> 
> On 8/16/05, Gavin Simpson <[EMAIL PROTECTED]> wrote:
> > Hi I'm having a problem with model.frame, encapsulated in this example:
> > 
> > y1 <- matrix(c(3,1,0,1,0,1,1,0,0,0,1,0,0,0,1,1,0,1,1,1),
> > nrow = 5, byrow = TRUE)
> > y1 <- as.data.frame(y1)
> > rownames(y1) <- paste("site", 1:5, sep = "")
> > colnames(y1) <- paste("spp", 1:4, sep = "")
> > y1
> > 
> > model.frame(~ y1)
> > Error in model.frame(formula, rownames, variables, varnames, extras, 
> > extranames,  :
> >invalid variable type
> > 
> > temp <- as.matrix(y1)
> > model.frame(~ temp)
> >  temp.spp1 temp.spp2 temp.spp3 temp.spp4
> > 1 3 1 0 1
> > 2 0 1 1 0
> > 3 0 0 1 0
> > 4 0 0 1 1
> > 5 0 1 1 1
> > 
> > Ideally the above wouldn't have names like temp.var1, temp.var2, but one
> > could deal with that later.
> > 
> > I have tracked down the source of the error message to line 1330 in
> > model.c - here I'm stumped as I don't know any C, but it looks as if the
> > code is looping over the variables in the formula and checking if they
> > are the right "type". So a matrix of variables gets through, but a
> > data.frame doesn't.
> > 
> > It would be good if model.frame could cope with data.frames in formulae,
> > but seeing as I am incapable of providing a patch, is there a way around
> > this problem?
> > 
> > Below is the head of the function I am currently using, including the
> > function for parsing the formula - borrowed and hacked from
> > ordiParseFormula() in package vegan.
> > 
> > I can work out the class of the rhs of the formula. Is there a way to
> > create a suitable environment for the data argument of parseFormula()
> > such that it contains the rhs dataframe coerced to a matrix, which then
> > should get through model.frame.default without error? How would I go
> > about manipulating/creating such an environment? Any other ideas?
> > 
> > Thanks in advance
> > 
> > Gav
> > 
> > coca.formula <- function(formula, method = c("predictive", "symmetric"),
> > reg.method = c("simpls", "eigen"), weights = NULL,
> > n.axes = NULL, symmetric = FALSE, data)
> >  {
> >parseFormula <- function (formula, data)
> >  {
> >browser()
> >Terms <- terms(formula, "Condition", data = data)
> >flapart <- fla <- formula <- formula(Terms, width.cutoff = 500)
> >specdata <- formula[[2]]
> >X <- eval(specdata, data, parent.frame())
> >X <- as.matrix(X)
> >formula[[2]] <- NULL
> >if (formula[[2]] == "1" || formula[[2]] == "0")
> >  Y <- NULL
> >else {
> >  mf <- model.frame(formula, data, na.action = na.fail)
> >  Y <- model.matrix(formula, mf)
> >  if (any(colnames(Y) == "(Intercept)")) {
> >xint <- which(colnames(Y) == "(Intercept)")
> >Y <- Y[, -xint, drop = FALSE]
> >  }
> >}
> >list(X = X, Y = Y)
> >  }
> >if (missing(data))
> >  data <- parent.frame()
> >#browser()
> >dat <- parseFormula(formula, data)
> > 
> > --
> > %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
> > Gavin Simpson [T] +44 (0)20 7679 5522
> > ENSIS Research Fellow [F] +44 (0)20 7679 7565
> > ENSIS Ltd. & ECRC [E] gavin.simpsonATNOSPAMucl.ac.uk
> > UCL Department of Geography   [W] http://www.ucl.ac.uk/~ucfagls/cv/
> > 26 Bedford Way[W] http://www.ucl.ac.uk/~ucfagls/
> > London.  WC1H 0AP.
> > %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

Re: [Rd] problem using model.frame()

2005-08-16 Thread Gabor Grothendieck
It can handle data frames like this:

model.frame(y1)
or
model.frame(~., y1)
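
For concreteness, here are the two calls run against the example data from the original post, plus a third showing a column dropped through the formula. The object names mf_df, mf_all and mf_sub are only for illustration.

```r
## Example data from the original post
y1 <- matrix(c(3,1,0,1,0,1,1,0,0,0,1,0,0,0,1,1,0,1,1,1),
             nrow = 5, byrow = TRUE)
y1 <- as.data.frame(y1)
rownames(y1) <- paste("site", 1:5, sep = "")
colnames(y1) <- paste("spp", 1:4, sep = "")

mf_df  <- model.frame(y1)                     # data frame passed directly
mf_all <- model.frame(~ ., data = y1)         # "." expands to every column
mf_sub <- model.frame(~ . - spp3, data = y1)  # every column except spp3
```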


On 8/16/05, Gavin Simpson <[EMAIL PROTECTED]> wrote:
> Hi I'm having a problem with model.frame, encapsulated in this example:
> 
> y1 <- matrix(c(3,1,0,1,0,1,1,0,0,0,1,0,0,0,1,1,0,1,1,1),
> nrow = 5, byrow = TRUE)
> y1 <- as.data.frame(y1)
> rownames(y1) <- paste("site", 1:5, sep = "")
> colnames(y1) <- paste("spp", 1:4, sep = "")
> y1
> 
> model.frame(~ y1)
> Error in model.frame(formula, rownames, variables, varnames, extras, 
> extranames,  :
>invalid variable type
> 
> temp <- as.matrix(y1)
> model.frame(~ temp)
>  temp.spp1 temp.spp2 temp.spp3 temp.spp4
> 1 3 1 0 1
> 2 0 1 1 0
> 3 0 0 1 0
> 4 0 0 1 1
> 5 0 1 1 1
> 
> Ideally the above wouldn't have names like temp.var1, temp.var2, but one
> could deal with that later.
> 
> I have tracked down the source of the error message to line 1330 in
> model.c - here I'm stumped as I don't know any C, but it looks as if the
> code is looping over the variables in the formula and checking if they
> are the right "type". So a matrix of variables gets through, but a
> data.frame doesn't.
> 
> It would be good if model.frame could cope with data.frames in formulae,
> but seeing as I am incapable of providing a patch, is there a way around
> this problem?
> 
> Below is the head of the function I am currently using, including the
> function for parsing the formula - borrowed and hacked from
> ordiParseFormula() in package vegan.
> 
> I can work out the class of the rhs of the formula. Is there a way to
> create a suitable environment for the data argument of parseFormula()
> such that it contains the rhs dataframe coerced to a matrix, which then
> should get through model.frame.default without error? How would I go
> about manipulating/creating such an environment? Any other ideas?
> 
> Thanks in advance
> 
> Gav
> 
> coca.formula <- function(formula, method = c("predictive", "symmetric"),
> reg.method = c("simpls", "eigen"), weights = NULL,
> n.axes = NULL, symmetric = FALSE, data)
>  {
>parseFormula <- function (formula, data)
>  {
>browser()
>Terms <- terms(formula, "Condition", data = data)
>flapart <- fla <- formula <- formula(Terms, width.cutoff = 500)
>specdata <- formula[[2]]
>X <- eval(specdata, data, parent.frame())
>X <- as.matrix(X)
>formula[[2]] <- NULL
>if (formula[[2]] == "1" || formula[[2]] == "0")
>  Y <- NULL
>else {
>  mf <- model.frame(formula, data, na.action = na.fail)
>  Y <- model.matrix(formula, mf)
>  if (any(colnames(Y) == "(Intercept)")) {
>xint <- which(colnames(Y) == "(Intercept)")
>Y <- Y[, -xint, drop = FALSE]
>  }
>}
>list(X = X, Y = Y)
>  }
>if (missing(data))
>  data <- parent.frame()
>#browser()
>dat <- parseFormula(formula, data)
> 
> --
> %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
> Gavin Simpson [T] +44 (0)20 7679 5522
> ENSIS Research Fellow [F] +44 (0)20 7679 7565
> ENSIS Ltd. & ECRC [E] gavin.simpsonATNOSPAMucl.ac.uk
> UCL Department of Geography   [W] http://www.ucl.ac.uk/~ucfagls/cv/
> 26 Bedford Way[W] http://www.ucl.ac.uk/~ucfagls/
> London.  WC1H 0AP.
> %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
> 
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>