Re: [Rd] Problem using model.frame with argument subset in own function
Gavin,

I ran into the same cryptic "invalid subscript type 'closure'" message in a slightly less complicated scenario, and wanted to post the cause in my case (the root cause is probably the same either way).

Similarly to your case, I was subsetting a data frame. I had a list of variable names corresponding to columns in the frame. Unfortunately the variable name I had assigned to this list, var, coincided with the name of a base package function in R for variance. When I attempted to subset df[, var], I got the 'closure' error message, but if I renamed the list of variable names so the collision didn't occur, e.g. df[, vars] instead of df[, var], it worked as expected.

Sincerely,
Greg B. Hill

Gavin Simpson wrote:
>
> Dear List,
>
> I am writing a formula method for a function in a package I maintain. I
> want the method to return a data.frame that potentially only contains
> some of the variables in 'data', as specified by the formula.
>
> The problem I am having is in writing the function and wrapping it
> around model.frame. Consider the following data frame:
>
> dat <- data.frame(A = runif(10), B = runif(10), C = runif(10))
>
> And the wrapper function:
>
> foo <- function(formula, data = NULL, ..., subset = NULL,
>                 na.action = na.pass) {
>     mt <- terms(formula, data = data, simplify = TRUE)
>     mf <- model.frame(formula(mt), data = data, subset = subset,
>                       na.action = na.action)
>     ## real function would do more stuff here and pass mf on to
>     ## other functions
>     mf
> }
>
> This is how I envisage the function being called. The real world use
> would have a data.frame with tens or hundreds of components where only a
> few need to be excluded. Hence wanting formulas of the form below to
> work.
>
> foo(~ . - B, data = dat)
>
> The aim is to return only columns A and C in an object returned by
> model.frame. However, when I run the above, I get the following error:
>
> > foo(~ A + B, data = dat)
> Error in xj[i] : invalid subscript type 'closure'
>
> I've tracked this down to the line in model.frame.default
>
> subset <- eval(substitute(subset), data, env)
>
> After evaluating this line, subset contains:
>
> Browse[1]> subset
> function (x, ...)
> UseMethod("subset")
>
>
> Not NULL, and hence the error later on when calling the internal
> model.frame code.
>
> So the question is, what am I doing wrong?
>
> If I leave the subset argument out of the definition of foo and rely
> upon the default in model.frame.default, the function works as
> expected.
>
> Perhaps the question should be, how do I modify foo() to allow it to
> have a formal subset argument, passed to model.frame?
>
> Any other suggestions gratefully accepted.
>
> Thanks in advance,
>
> G
> --
> %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
> Dr. Gavin Simpson [t] +44 (0)20 7679 0522
> ECRC, UCL Geography, [f] +44 (0)20 7679 0565
> Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
> Gower Street, London [w] http://www.ucl.ac.uk/~ucfagls/
> UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
> %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

--
View this message in context: http://www.nabble.com/Problem-using-model.frame-with-argument-subset-in-own-function-tp24880908p25373059.html
Sent from the R devel mailing list archive at Nabble.com.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
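The name collision Greg describes can be reproduced in a few lines. What follows is only a minimal sketch, assuming a session in which no object called `var` has been created; the data frame `df` and the vector `vars` are made up for illustration and do not come from the thread.

```r
# Illustrative data; not taken from the original posts.
df <- data.frame(x = 1:3, y = 4:6)

# No object named `var` exists here, so R's lookup finds stats::var,
# which is a function (a "closure"), and the subscript is rejected.
msg <- tryCatch(df[, var], error = function(e) conditionMessage(e))

# A name that does not collide with a function behaves as expected.
vars <- c("x", "y")
sub <- df[, vars]
```

Here `msg` holds the text of the error raised by the first subscript, while `sub` is the ordinary two-column subset.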
Re: [Rd] Problem using model.frame with argument subset in own function
On Sun, 2009-08-09 at 11:32 -0500, Douglas Bates wrote:
> On Sat, Aug 8, 2009 at 1:31 PM, Gavin Simpson wrote:
> > Dear List,
> >
> > I am writing a formula method for a function in a package I maintain. I
> > want the method to return a data.frame that potentially only contains
> > some of the variables in 'data', as specified by the formula.
>
> The usual way to call model.frame (the method that Thomas Lumley has
> called "the standard, non-standard evaluation") is to match the call to
> foo, replace the name of the function being called with
> as.name("model.frame") and force an evaluation in the parent frame.
> It looks like
>

Thanks Doug. I also received an off-list reply from Brian Ripley suggesting two alternative approaches.

The bit I was missing was how to manipulate other aspects of the call - it hadn't clicked that the arguments of the function can be manipulated by altering the components of the matched call.

In the end I came up with something like:

    mf <- match.call()
    mf[[1]] <- as.name("model.frame")
    mt <- terms(formula, data = data, simplify = TRUE)
    mf[[2]] <- formula(mt, data = data)
    mf$na.action <- substitute(na.action)
    dots <- list(...)
    mf[[names(dots)]] <- NULL
    mf <- eval(mf, parent.frame())
    tran.default(mf, ...)

which seems to be working in the tests I have been running, allowing me to pass along some components of the call to model.frame, whilst reserving ... for the default method's arguments, and also get the simplified formula.

All the best,

G

> mf <- match.call()
> if (missing(data)) data <- environment(formula)
> ## evaluate and install the model frame
> m <- match(c("formula", "data", "subset", "weights", "na.action",
>              "offset"),
>            names(mf), 0)
> mf <- mf[c(1, m)]
> mf$drop.unused.levels <- TRUE
> mf[[1]] <- as.name("model.frame")
> fr <- eval(mf, parent.frame())
>
> The point of all of this manipulation is to achieve the kind of result
> you need where the subset argument is evaluated in the correct
> environment.
Re: [Rd] Problem using model.frame with argument subset in own function
On Sat, Aug 8, 2009 at 1:31 PM, Gavin Simpson wrote:
> Dear List,
>
> I am writing a formula method for a function in a package I maintain. I
> want the method to return a data.frame that potentially only contains
> some of the variables in 'data', as specified by the formula.

The usual way to call model.frame (the method that Thomas Lumley has called "the standard, non-standard evaluation") is to match the call to foo, replace the name of the function being called with as.name("model.frame") and force an evaluation in the parent frame. It looks like

    mf <- match.call()
    if (missing(data)) data <- environment(formula)
    ## evaluate and install the model frame
    m <- match(c("formula", "data", "subset", "weights", "na.action",
                 "offset"),
               names(mf), 0)
    mf <- mf[c(1, m)]
    mf$drop.unused.levels <- TRUE
    mf[[1]] <- as.name("model.frame")
    fr <- eval(mf, parent.frame())

The point of all of this manipulation is to achieve the kind of result you need where the subset argument is evaluated in the correct environment.

> The problem I am having is in writing the function and wrapping it
> around model.frame. Consider the following data frame:
>
> dat <- data.frame(A = runif(10), B = runif(10), C = runif(10))
>
> And the wrapper function:
>
> foo <- function(formula, data = NULL, ..., subset = NULL,
>                 na.action = na.pass) {
>     mt <- terms(formula, data = data, simplify = TRUE)
>     mf <- model.frame(formula(mt), data = data, subset = subset,
>                       na.action = na.action)
>     ## real function would do more stuff here and pass mf on to
>     ## other functions
>     mf
> }
>
> This is how I envisage the function being called. The real world use
> would have a data.frame with tens or hundreds of components where only a
> few need to be excluded. Hence wanting formulas of the form below to
> work.
>
> foo(~ . - B, data = dat)
>
> The aim is to return only columns A and C in an object returned by
> model.frame.
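The match.call() idiom discussed in this message can be put into a small self-contained wrapper. This is a sketch under stated assumptions rather than the code from any package: the name `foo2` is hypothetical, only `formula`, `data`, `subset` and `na.action` are forwarded, and `set.seed()` is added purely so the example is repeatable.

```r
# Hypothetical wrapper illustrating the match.call() idiom.
foo2 <- function(formula, data = NULL, subset = NULL, ...) {
  mf <- match.call(expand.dots = FALSE)
  # Keep only the arguments that model.frame() should see; match() returns 0
  # for arguments the caller did not supply, and zeros are dropped by `[`.
  m <- match(c("formula", "data", "subset", "na.action"), names(mf), 0L)
  mf <- mf[c(1L, m)]
  mf[[1L]] <- as.name("model.frame")
  # Evaluate the rebuilt call where foo2() was called, so that `subset`
  # is an unevaluated expression resolved inside `data`.
  eval(mf, parent.frame())
}

set.seed(1)
dat <- data.frame(A = runif(10), B = runif(10), C = runif(10))
res <- foo2(~ A + B, data = dat, subset = A > 0.5)
```

Because the call is rebuilt from what the caller actually typed, defaults declared in the wrapper's own formals are not forwarded; a wrapper that needs its own default would have to add it to the call explicitly, as the `mf$na.action <- substitute(na.action)` line elsewhere in this thread does.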
Re: [Rd] problem using model.frame()
On Thu, 2005-08-18 at 09:00 -0400, Gabor Grothendieck wrote:
> I think this one is a hard call. Designing software is a
> series of tradeoffs. It's nice to maintain consistency with
> the R base, but in case of extensions (rather than changing
> behavior) as in this case, the argument against the change
> carries less weight.
>
> The main problems with extensions are (1) that one has to
> remember which functions/packages have which extensions if
> one is to use them and (2) they can interfere with other
> future extensions.
>
> On the other hand, if one is using a particular package a
> lot then convenience features like this may be attractive.
> Also, packages are where authors have the freedom to try out
> new ideas and new functionality without being constrained.
>
> Perhaps, if the extension in question is added there could be
> a warning in the help file that this is a convenience feature
> of this particular package and is not generally available
> throughout R.

Thanks again Gabor for another useful contribution to this debate. Also thanks to Martin, Gabor and Jari for their comments, ideas, suggestions and viewpoints.

I still like y1 ~ y2 (both data frames), but during my bike ride to work this morning I considered both sides of the argument and my position has moved towards the R way of doing things - far be it from little old me to go against years of S-formula tradition.

So I'll revert the code back to accepting y1 ~ ., data = y2 and leave it to throw an error when the rhs is a data frame.

Once again, thank you for helping me work through this dilemma.
All the best, Gav > On 8/18/05, Gavin Simpson <[EMAIL PROTECTED]> wrote: > > On Thu, 2005-08-18 at 07:57 +0300, Jari Oksanen wrote: > > > On 18 Aug 2005, at 1:49, Gavin Simpson wrote: > > > > > > > On Wed, 2005-08-17 at 20:24 +0200, Martin Maechler wrote: > > > >>> "GS" == Gavin Simpson <[EMAIL PROTECTED]> > > > >>> on Tue, 16 Aug 2005 18:44:23 +0100 writes: > > > >> > > > >> GS> On Tue, 2005-08-16 at 12:35 -0400, Gabor Grothendieck > > > >> GS> wrote: > > > On 8/16/05, Gavin Simpson <[EMAIL PROTECTED]> > > > wrote: > On Tue, 2005-08-16 at 11:25 -0400, Gabor > > > Grothendieck wrote: > > It can handle data frames like > > > this: > > > >> > > > >> model.frame(y1) > > or > > model.frame(~., y1) > > > > > > > > Thanks Gabor, > > > > > > > > Yes, I know that works, but I want the function > > > coca.formula to accept a > formula like this y2 ~ y1, > > > with both y1 and y2 being data frames. It is > > > > > > The expressions I gave work generally (i.e. lm, glm, > > > ...), not just in model.matrix, so would it be ok if the > > > user just does this? > > > > > > yourfunction(y2 ~., y1) > > > >> > > > >> GS> Thanks again Gabor for your comments, > > > >> > > > >> GS> I'd prefer the y1 ~ y2 as data frames - as this is the > > > >> GS> most natural way of doing things. I'd like to have (y2 > > > >> GS> ~., y1) as well, and (y2 ~ spp1 + spp2 + spp3, y1) also > > > >> GS> work - silently without any trouble. > > > >> > > > >> I'm sorry, Gavin, I tend to disagree quite a bit. > > > >> > > > >> The formula notation has quite a history in the S language, and > > > >> AFAIK never was the idea to use data.frames as formula > > > >> components, but rather as "environments" in which formula > > > >> components are looked up --- exactly as Gabor has explained. 
> > > > > > > > Hi Martin, thanks for your comments, > > > > > > > > But then one could have a matrix of variables on the rhs of the formula > > > > and it would work - whether this is a documented feature or un-intended > > > > side-effect of matrices being stored as vectors with dims, I don't > > > > know. > > > > > > > > And whilst the formula may have a long history, a number of packages > > > > have extended the interface to implement a specific feature, which > > > > don't > > > > work with standard functions like lm, glm and friends. I don't see how > > > > what I wanted to achieve is greatly different to that or using a > > > > matrix. > > > > > > > >> To break with such a deeply rooted principle, > > > >> you should have very very good reasons, because you're breaking > > > >> the concepts on which all other uses of formulae are based. > > > >> And this would potentially lead to much confusion of your users, > > > >> at least in the way they should learn to think about what > > > >> formulae mean. > > > > > > > > In the end I managed to treat y1 ~ y2 (both data frames) as a special > > > > case, which allows the existing formula notation to work as well, so I > > > > can use y1 ~ y2, y1 ~ ., data = y2, or y1 ~ var + var2, data = y2. This > > > > is what I wanted all along, to extend my interface (not do anything to > > > > R's formulae), but to also work in the traditional sense. > > > > > > > > The model I am writing code for really is modelling the relation
Re: [Rd] problem using model.frame()
On Thu, 2005-08-18 at 07:57 +0300, Jari Oksanen wrote: > On 18 Aug 2005, at 1:49, Gavin Simpson wrote: > > > On Wed, 2005-08-17 at 20:24 +0200, Martin Maechler wrote: > >>> "GS" == Gavin Simpson <[EMAIL PROTECTED]> > >>> on Tue, 16 Aug 2005 18:44:23 +0100 writes: > >> > >> GS> On Tue, 2005-08-16 at 12:35 -0400, Gabor Grothendieck > >> GS> wrote: > On 8/16/05, Gavin Simpson <[EMAIL PROTECTED]> > wrote: > On Tue, 2005-08-16 at 11:25 -0400, Gabor > Grothendieck wrote: > > It can handle data frames like > this: > >> > >> model.frame(y1) > > or > > model.frame(~., y1) > > > > Thanks Gabor, > > > > Yes, I know that works, but I want the function > coca.formula to accept a > formula like this y2 ~ y1, > with both y1 and y2 being data frames. It is > > The expressions I gave work generally (i.e. lm, glm, > ...), not just in model.matrix, so would it be ok if the > user just does this? > > yourfunction(y2 ~., y1) > >> > >> GS> Thanks again Gabor for your comments, > >> > >> GS> I'd prefer the y1 ~ y2 as data frames - as this is the > >> GS> most natural way of doing things. I'd like to have (y2 > >> GS> ~., y1) as well, and (y2 ~ spp1 + spp2 + spp3, y1) also > >> GS> work - silently without any trouble. > >> > >> I'm sorry, Gavin, I tend to disagree quite a bit. > >> > >> The formula notation has quite a history in the S language, and > >> AFAIK never was the idea to use data.frames as formula > >> components, but rather as "environments" in which formula > >> components are looked up --- exactly as Gabor has explained. > > > > Hi Martin, thanks for your comments, > > > > But then one could have a matrix of variables on the rhs of the formula > > and it would work - whether this is a documented feature or un-intended > > side-effect of matrices being stored as vectors with dims, I don't > > know. 
> >
> > And whilst the formula may have a long history, a number of packages
> > have extended the interface to implement a specific feature, which
> > don't
> > work with standard functions like lm, glm and friends. I don't see how
> > what I wanted to achieve is greatly different to that or using a
> > matrix.
> >
> >> To break with such a deeply rooted principle,
> >> you should have very very good reasons, because you're breaking
> >> the concepts on which all other uses of formulae are based.
> >> And this would potentially lead to much confusion of your users,
> >> at least in the way they should learn to think about what
> >> formulae mean.
> >
> > In the end I managed to treat y1 ~ y2 (both data frames) as a special
> > case, which allows the existing formula notation to work as well, so I
> > can use y1 ~ y2, y1 ~ ., data = y2, or y1 ~ var + var2, data = y2. This
> > is what I wanted all along, to extend my interface (not do anything to
> > R's formulae), but to also work in the traditional sense.
> >
> > The model I am writing code for really is modelling the relationship
> > between two matrices of data. In one version of the method, there is
> > real equivalence between both sides of the formula so it would seem odd
> > to treat the two sides of the formula differently. At least to me ;-)
>
> It seems that I may be responsible for one of these extensions (lhs as
> a data.frame in cca and rda in vegan package). There the response (lhs)
> is multivariate or a multispecies community, and you must take that as
> a whole without manipulation (and if you tried using VGAM you see it
> really is painful to define lhs with, say, 127 elements).

Hi Jari,

Thanks for reminding me about this - I'd forgotten about not normally being able to have a data.frame on the lhs of the formula either - I'm surprised no-one pulled me up on that one before, either ;-) I guess what I'm proposing is really pushing the formula representation too far for some people.

I'm coming round to the y1 ~ ., data = y2 way of doing things - still prefer y1 ~ y2 though ;-)

Also, both y1 and y2 are community matrices (i.e. both have many, many species, aka variables for the non-community ecologists reading this). I'm not sure that it makes sense to treat the two sides differently. In the predictive co-correspondence mode (the default), multivariate pls is used to regress one matrix on another, with the number of pls components being chosen by cross-validation or a permutation test.

> However, in
> general you shouldn't use models where you use all the 'explanatory'
> variables (rhs) that you happen to have by accident. So much bad science
> has been created with that approach even in your field, Gav.

Well, I agree with you there...

> The whole
> idea of formula is the ability to choose from candidate variables. That
> is: to build a model. Therefore you have one-sided formulae in prcomp()
> and princomp(): you can say prcomp(~ x1 + log(x2) +x4, data) or
> prcomp(~ . - x3, data). I think you should try to
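The one-sided formula interface of prcomp() that Jari points to can be tried on a data set shipped with R. This is only an illustrative sketch: the choice of `USArrests` and of the dropped column `UrbanPop` is an assumption made for the example and is not taken from the thread.

```r
# Select candidate variables by formula: every column except UrbanPop.
p_drop <- prcomp(~ . - UrbanPop, data = USArrests)

# Name the variables explicitly, with a transformation on one of them.
p_pick <- prcomp(~ Murder + log(Assault), data = USArrests)

# Variables that entered the first analysis.
used <- rownames(p_drop$rotation)
```

In both calls the formula only chooses (and optionally transforms) columns of `data`; the data frame itself never appears inside the formula.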
Re: [Rd] problem using model.frame()
On Wed, 2005-08-17 at 21:48 -0400, Gabor Grothendieck wrote:
> If it's just a matter of specifying two data frames how about just
> letting the user specify them as the first two arguments without
> injecting formulas into it so that any of these are allowed but
> data frames are still not allowed in formulas other than in the
> data argument:
>
> yourfunction(df1, df2)
> yourfunction(y ~ sp1 + sp2)
> yourfunction(y ~., df)
>
> This could easily be implemented by having yourfunction be
> generic in which case the first one would dispatch
> yourfunction.data.frame and the second and third would
> dispatch yourfunction.formula .

Hi Gabor,

yourfunction() is already generic; I have .default and .formula methods.

The default implementation of the method (Co-correspondence analysis) is akin to a regression and uses a form of multivariate PLS. So one data matrix plays the role of the response and one the predictor, which is the reason for wanting to use a formula interface.

Cheers,

G

> On 8/17/05, Gavin Simpson <[EMAIL PROTECTED]> wrote:
> > On Wed, 2005-08-17 at 20:24 +0200, Martin Maechler wrote:
> > > > "GS" == Gavin Simpson <[EMAIL PROTECTED]>
> > > > on Tue, 16 Aug 2005 18:44:23 +0100 writes:
> > >
> > > GS> On Tue, 2005-08-16 at 12:35 -0400, Gabor Grothendieck
> > > GS> wrote:
> > > >> On 8/16/05, Gavin Simpson <[EMAIL PROTECTED]>
> > > >> wrote: > On Tue, 2005-08-16 at 11:25 -0400, Gabor
> > > >> Grothendieck wrote: > > It can handle data frames like
> > > >> this:
> > > >> > >
> > > >> > > model.frame(y1) > > or > > model.frame(~., y1)
> > > >> >
> > > >> > Thanks Gabor,
> > > >> >
> > > >> > Yes, I know that works, but I want the function
> > > >> coca.formula to accept a > formula like this y2 ~ y1,
> > > >> with both y1 and y2 being data frames. It is
> > > >>
> > > >> The expressions I gave work generally (i.e. lm, glm,
> > > >> ...), not just in model.matrix, so would it be ok if the
> > > >> user just does this?
> > > >> > > > >> yourfunction(y2 ~., y1) > > > > > > GS> Thanks again Gabor for your comments, > > > > > > GS> I'd prefer the y1 ~ y2 as data frames - as this is the > > > GS> most natural way of doing things. I'd like to have (y2 > > > GS> ~., y1) as well, and (y2 ~ spp1 + spp2 + spp3, y1) also > > > GS> work - silently without any trouble. > > > > > > I'm sorry, Gavin, I tend to disagree quite a bit. > > > > > > The formula notation has quite a history in the S language, and > > > AFAIK never was the idea to use data.frames as formula > > > components, but rather as "environments" in which formula > > > components are looked up --- exactly as Gabor has explained. > > > > Hi Martin, thanks for your comments, > > > > But then one could have a matrix of variables on the rhs of the formula > > and it would work - whether this is a documented feature or un-intended > > side-effect of matrices being stored as vectors with dims, I don't know. > > > > And whilst the formula may have a long history, a number of packages > > have extended the interface to implement a specific feature, which don't > > work with standard functions like lm, glm and friends. I don't see how > > what I wanted to achieve is greatly different to that or using a matrix. > > > > > To break with such a deeply rooted principle, > > > you should have very very good reasons, because you're breaking > > > the concepts on which all other uses of formulae are based. > > > And this would potentially lead to much confusion of your users, > > > at least in the way they should learn to think about what > > > formulae mean. > > > > In the end I managed to treat y1 ~ y2 (both data frames) as a special > > case, which allows the existing formula notation to work as well, so I > > can use y1 ~ y2, y1 ~ ., data = y2, or y1 ~ var + var2, data = y2. This > > is what I wanted all along, to extend my interface (not do anything to > > R's formulae), but to also work in the traditional sense. 
> > > > The model I am writing code for really is modelling the relationship > > between two matrices of data. In one version of the method, there is > > real equivalence between both sides of the formula so it would seem odd > > to treat the two sides of the formula differently. At least to me ;-) > > > > > Martin > > > > > > > > > >> If it really is important to do it the way you describe, > > > >> are the data frames necessarily numeric? If so you could > > > >> preprocess your formula by placing as.matrix around all > > > >> the variables representing data frames using something > > > >> like this: > > > >> > > > >> > > > https://www.stat.math.ethz.ch/pipermail/r-help/2004-December/061485.html > > > > > > GS> Yes, they are numeric matrices (as data frames). I've > > > GS> looked at this, but I'd prefer to not have to do too > > > GS> much messing with the formula. > > > > > > >> Of co
Re: [Rd] problem using model.frame()
On 18 Aug 2005, at 1:49, Gavin Simpson wrote: > On Wed, 2005-08-17 at 20:24 +0200, Martin Maechler wrote: >>> "GS" == Gavin Simpson <[EMAIL PROTECTED]> >>> on Tue, 16 Aug 2005 18:44:23 +0100 writes: >> >> GS> On Tue, 2005-08-16 at 12:35 -0400, Gabor Grothendieck >> GS> wrote: On 8/16/05, Gavin Simpson <[EMAIL PROTECTED]> wrote: > On Tue, 2005-08-16 at 11:25 -0400, Gabor Grothendieck wrote: > > It can handle data frames like this: >> >> model.frame(y1) > > or > > model.frame(~., y1) > > Thanks Gabor, > > Yes, I know that works, but I want the function coca.formula to accept a > formula like this y2 ~ y1, with both y1 and y2 being data frames. It is The expressions I gave work generally (i.e. lm, glm, ...), not just in model.matrix, so would it be ok if the user just does this? yourfunction(y2 ~., y1) >> >> GS> Thanks again Gabor for your comments, >> >> GS> I'd prefer the y1 ~ y2 as data frames - as this is the >> GS> most natural way of doing things. I'd like to have (y2 >> GS> ~., y1) as well, and (y2 ~ spp1 + spp2 + spp3, y1) also >> GS> work - silently without any trouble. >> >> I'm sorry, Gavin, I tend to disagree quite a bit. >> >> The formula notation has quite a history in the S language, and >> AFAIK never was the idea to use data.frames as formula >> components, but rather as "environments" in which formula >> components are looked up --- exactly as Gabor has explained. > > Hi Martin, thanks for your comments, > > But then one could have a matrix of variables on the rhs of the formula > and it would work - whether this is a documented feature or un-intended > side-effect of matrices being stored as vectors with dims, I don't > know. > > And whilst the formula may have a long history, a number of packages > have extended the interface to implement a specific feature, which > don't > work with standard functions like lm, glm and friends. I don't see how > what I wanted to achieve is greatly different to that or using a > matrix. 
>
>> To break with such a deeply rooted principle,
>> you should have very very good reasons, because you're breaking
>> the concepts on which all other uses of formulae are based.
>> And this would potentially lead to much confusion of your users,
>> at least in the way they should learn to think about what
>> formulae mean.
>
> In the end I managed to treat y1 ~ y2 (both data frames) as a special
> case, which allows the existing formula notation to work as well, so I
> can use y1 ~ y2, y1 ~ ., data = y2, or y1 ~ var + var2, data = y2. This
> is what I wanted all along, to extend my interface (not do anything to
> R's formulae), but to also work in the traditional sense.
>
> The model I am writing code for really is modelling the relationship
> between two matrices of data. In one version of the method, there is
> real equivalence between both sides of the formula so it would seem odd
> to treat the two sides of the formula differently. At least to me ;-)

It seems that I may be responsible for one of these extensions (lhs as a data.frame in cca and rda in vegan package). There the response (lhs) is multivariate or a multispecies community, and you must take that as a whole without manipulation (and if you tried using VGAM you see it really is painful to define lhs with, say, 127 elements).

However, in general you shouldn't use models where you use all the 'explanatory' variables (rhs) that you happen to have by accident. So much bad science has been created with that approach even in your field, Gav. The whole idea of formula is the ability to choose from candidate variables. That is: to build a model. Therefore you have one-sided formulae in prcomp() and princomp(): you can say prcomp(~ x1 + log(x2) +x4, data) or prcomp(~ . - x3, data). I think you should try to keep it so.

Do instead like Gabor suggested: you could have a function coca.default or coca.matrix with interface:

    coca.matrix(matx, maty, matz)

-- or you can name this as coca.default.

and coca.formula which essentially parses your formula and returns a list of matrices you need:

    coca.formula <- function(formula, data)
    {
        matricesout <- parsemyformula(formula, data)
        coca(matricesout$matx, matricesout$maty, matricesout$matz)
    }

Then you need the generic:

    coca <- function(...) UseMethod("coca")

and it's done (but fails in R CMD check unless you add "..." in all specific functions...). The real work is always done in coca.matrix (or coca.default), and the others just chew your data into suitable form for your workhorse. If then somebody thinks that they need all possible variables as 'explanatory' variables (or perhaps constraints in your case), they just call the function as

    coca(matx, maty, matz)

And if you have coca.data.frame they don't need 'quacking' with extra steps:

    coca.data.frame <- function(dfx, dfy, dfz) coca(as.matrix(dfx), as.matrix(dfy), a
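The generic-plus-methods layout sketched in this message can be written out as a small runnable example. Everything here is a stand-in: the name `coca2`, the two-matrix interface, the toy `crossprod()` computation and the sample objects `X` and `Y` are assumptions made for illustration, not the actual co-correspondence code or the `parsemyformula()` helper mentioned above.

```r
# Hypothetical generic; methods differ only in how they prepare the data.
coca2 <- function(x, ...) UseMethod("coca2")

# Workhorse: expects two matrices with the same number of rows.
coca2.default <- function(x, y, ...) {
  x <- as.matrix(x)
  y <- as.matrix(y)
  stopifnot(nrow(x) == nrow(y))
  crossprod(x, y)  # toy stand-in for the real analysis
}

# Data frames are coerced and handed to the workhorse.
coca2.data.frame <- function(x, y, ...) {
  coca2.default(as.matrix(x), as.matrix(y), ...)
}

# Formula method: the response comes from the lhs, predictors from `data`.
coca2.formula <- function(x, data, ...) {
  mf <- model.frame(x, data = data)
  resp <- model.response(mf)
  pred <- mf[, -1L, drop = FALSE]
  coca2.default(pred, resp, ...)
}

X <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))
Y <- matrix(c(1, 0, 2, 1, 1, 3), nrow = 3)
r_df <- coca2(X, as.data.frame(Y))  # dispatches on data.frame
r_fm <- coca2(Y ~ ., data = X)      # dispatches on formula
```

Both calls end up in `coca2.default()`, so the formula method stays a thin layer that selects variables, and users who want every column can pass the data frames directly.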
Re: [Rd] problem using model.frame()
If it's just a matter of specifying two data frames how about just letting the user specify them as the first two arguments without injecting formulas into it so that any of these are allowed but data frames are still not allowed in formulas other than in the data argument:

    yourfunction(df1, df2)
    yourfunction(y ~ sp1 + sp2)
    yourfunction(y ~., df)

This could easily be implemented by having yourfunction be generic in which case the first one would dispatch yourfunction.data.frame and the second and third would dispatch yourfunction.formula .

On 8/17/05, Gavin Simpson <[EMAIL PROTECTED]> wrote:
> On Wed, 2005-08-17 at 20:24 +0200, Martin Maechler wrote:
> > > "GS" == Gavin Simpson <[EMAIL PROTECTED]>
> > > on Tue, 16 Aug 2005 18:44:23 +0100 writes:
> >
> > GS> On Tue, 2005-08-16 at 12:35 -0400, Gabor Grothendieck
> > GS> wrote:
> > >> On 8/16/05, Gavin Simpson <[EMAIL PROTECTED]>
> > >> wrote: > On Tue, 2005-08-16 at 11:25 -0400, Gabor
> > >> Grothendieck wrote: > > It can handle data frames like
> > >> this:
> > >> > >
> > >> > > model.frame(y1) > > or > > model.frame(~., y1)
> > >> >
> > >> > Thanks Gabor,
> > >> >
> > >> > Yes, I know that works, but I want the function
> > >> coca.formula to accept a > formula like this y2 ~ y1,
> > >> with both y1 and y2 being data frames. It is
> > >>
> > >> The expressions I gave work generally (i.e. lm, glm,
> > >> ...), not just in model.matrix, so would it be ok if the
> > >> user just does this?
> > >>
> > >> yourfunction(y2 ~., y1)
> >
> > GS> Thanks again Gabor for your comments,
> >
> > GS> I'd prefer the y1 ~ y2 as data frames - as this is the
> > GS> most natural way of doing things. I'd like to have (y2
> > GS> ~., y1) as well, and (y2 ~ spp1 + spp2 + spp3, y1) also
> > GS> work - silently without any trouble.
> >
> > I'm sorry, Gavin, I tend to disagree quite a bit.
> > > > The formula notation has quite a history in the S language, and > > AFAIK never was the idea to use data.frames as formula > > components, but rather as "environments" in which formula > > components are looked up --- exactly as Gabor has explained. > > Hi Martin, thanks for your comments, > > But then one could have a matrix of variables on the rhs of the formula > and it would work - whether this is a documented feature or un-intended > side-effect of matrices being stored as vectors with dims, I don't know. > > And whilst the formula may have a long history, a number of packages > have extended the interface to implement a specific feature, which don't > work with standard functions like lm, glm and friends. I don't see how > what I wanted to achieve is greatly different to that or using a matrix. > > > To break with such a deeply rooted principle, > > you should have very very good reasons, because you're breaking > > the concepts on which all other uses of formulae are based. > > And this would potentially lead to much confusion of your users, > > at least in the way they should learn to think about what > > formulae mean. > > In the end I managed to treat y1 ~ y2 (both data frames) as a special > case, which allows the existing formula notation to work as well, so I > can use y1 ~ y2, y1 ~ ., data = y2, or y1 ~ var + var2, data = y2. This > is what I wanted all along, to extend my interface (not do anything to > R's formulae), but to also work in the traditional sense. > > The model I am writing code for really is modelling the relationship > between two matrices of data. In one version of the method, there is > real equivalence between both sides of the formula so it would seem odd > to treat the two sides of the formula differently. At least to me ;-) > > > Martin > > > > > > >> If it really is important to do it the way you describe, > > >> are the data frames necessarily numeric? 
If so you could > > >> preprocess your formula by placing as.matrix around all > > >> the variables representing data frames using something > > >> like this: > > >> > > >> > > https://www.stat.math.ethz.ch/pipermail/r-help/2004-December/061485.html > > > > GS> Yes, they are numeric matrices (as data frames). I've > > GS> looked at this, but I'd prefer to not have to do too > > GS> much messing with the formula. > > > > >> Of course, if they are necessarily numeric maybe they can > > >> be matrices in the first place? > > > > GS> Because read.table etc. produce data.frames and this is > > GS> the natural way to work with data in R. > > > > but it is also slightly inefficient if they are numeric. > > There are places for data frames and for matrices. > > I agree - and in the code I've written, y1 and y2 quickly get coerced to > matrices before the real number crunching begins. > > However, all the other R modelling functions I have used work with > data.frames. Arguably, it could cause more confusion to write a function > tha
Re: [Rd] problem using model.frame()
On Wed, 2005-08-17 at 20:24 +0200, Martin Maechler wrote: > > "GS" == Gavin Simpson <[EMAIL PROTECTED]> > > on Tue, 16 Aug 2005 18:44:23 +0100 writes: > > GS> On Tue, 2005-08-16 at 12:35 -0400, Gabor Grothendieck > GS> wrote: > >> On 8/16/05, Gavin Simpson <[EMAIL PROTECTED]> > >> wrote: > On Tue, 2005-08-16 at 11:25 -0400, Gabor > >> Grothendieck wrote: > > It can handle data frames like > >> this: > >> > > > >> > > model.frame(y1) > > or > > model.frame(~., y1) > >> > > >> > Thanks Gabor, > >> > > >> > Yes, I know that works, but I want the function > >> coca.formula to accept a > formula like this y2 ~ y1, > >> with both y1 and y2 being data frames. It is > >> > >> The expressions I gave work generally (i.e. lm, glm, > >> ...), not just in model.matrix, so would it be ok if the > >> user just does this? > >> > >> yourfunction(y2 ~., y1) > > GS> Thanks again Gabor for your comments, > > GS> I'd prefer the y1 ~ y2 as data frames - as this is the > GS> most natural way of doing things. I'd like to have (y2 > GS> ~., y1) as well, and (y2 ~ spp1 + spp2 + spp3, y1) also > GS> work - silently without any trouble. > > I'm sorry, Gavin, I tend to disagree quite a bit. > > The formula notation has quite a history in the S language, and > AFAIK never was the idea to use data.frames as formula > components, but rather as "environments" in which formula > components are looked up --- exactly as Gabor has explained. Hi Martin, thanks for your comments, But then one could have a matrix of variables on the rhs of the formula and it would work - whether this is a documented feature or un-intended side-effect of matrices being stored as vectors with dims, I don't know. And whilst the formula may have a long history, a number of packages have extended the interface to implement a specific feature, which don't work with standard functions like lm, glm and friends. I don't see how what I wanted to achieve is greatly different to that or using a matrix. 
> To break with such a deeply rooted principle, > you should have very very good reasons, because you're breaking > the concepts on which all other uses of formulae are based. > And this would potentially lead to much confusion of your users, > at least in the way they should learn to think about what > formulae mean. In the end I managed to treat y1 ~ y2 (both data frames) as a special case, which allows the existing formula notation to work as well, so I can use y1 ~ y2, y1 ~ ., data = y2, or y1 ~ var + var2, data = y2. This is what I wanted all along, to extend my interface (not do anything to R's formulae), but to also work in the traditional sense. The model I am writing code for really is modelling the relationship between two matrices of data. In one version of the method, there is real equivalence between both sides of the formula so it would seem odd to treat the two sides of the formula differently. At least to me ;-) > Martin > > > >> If it really is important to do it the way you describe, > >> are the data frames necessarily numeric? If so you could > >> preprocess your formula by placing as.matrix around all > >> the variables representing data frames using something > >> like this: > >> > >> > https://www.stat.math.ethz.ch/pipermail/r-help/2004-December/061485.html > > GS> Yes, they are numeric matrices (as data frames). I've > GS> looked at this, but I'd prefer to not have to do too > GS> much messing with the formula. > > >> Of course, if they are necessarily numeric maybe they can > >> be matrices in the first place? > > GS> Because read.table etc. produce data.frames and this is > GS> the natural way to work with data in R. > > but it is also slightly inefficient if they are numeric. > There are places for data frames and for matrices. I agree - and in the code I've written, y1 and y2 quickly get coerced to matrices before the real number crunching begins. However, all the other R modelling functions I have used work with data.frames. 
Arguably, it could cause more confusion to write a function that looked, walked and quacked like an R modelling function but needed the user to apply an extra step to use - a step not usually required under normal R usage. All the best, Gav > Why should it be a problem to use > M <- as.matrix(read.table(..)) > ? > > For large files, it could be quite a bit more efficient, > needing a bit more of code, to > use scan() to read the numeric data directly : > > h1 <- scan(..., n=1) ## > nc <- length(h1) > a <- matrix(scan(, what = numeric(), ...), > ncol = nc, dimnames = list(NULL, h1)) > > maybe this would be useful to be packaged into > a small utility with usage > > read.matrix(..., type = numeric(), ...) > > > GS> Following your suggestions, I altered my code to >
Re: [Rd] problem using model.frame()
> "GS" == Gavin Simpson <[EMAIL PROTECTED]>
> on Tue, 16 Aug 2005 18:44:23 +0100 writes:

GS> On Tue, 2005-08-16 at 12:35 -0400, Gabor Grothendieck
GS> wrote:
>> On 8/16/05, Gavin Simpson <[EMAIL PROTECTED]>
>> wrote: > On Tue, 2005-08-16 at 11:25 -0400, Gabor
>> Grothendieck wrote: > > It can handle data frames like
>> this:
>> > >
>> > > model.frame(y1) > > or > > model.frame(~., y1)
>> >
>> > Thanks Gabor,
>> >
>> > Yes, I know that works, but I want the function
>> coca.formula to accept a > formula like this y2 ~ y1,
>> with both y1 and y2 being data frames. It is
>>
>> The expressions I gave work generally (i.e. lm, glm,
>> ...), not just in model.matrix, so would it be ok if the
>> user just does this?
>>
>> yourfunction(y2 ~., y1)

GS> Thanks again Gabor for your comments,

GS> I'd prefer the y1 ~ y2 as data frames - as this is the
GS> most natural way of doing things. I'd like to have (y2
GS> ~., y1) as well, and (y2 ~ spp1 + spp2 + spp3, y1) also
GS> work - silently without any trouble.

I'm sorry, Gavin, I tend to disagree quite a bit.

The formula notation has quite a history in the S language, and
AFAIK never was the idea to use data.frames as formula
components, but rather as "environments" in which formula
components are looked up --- exactly as Gabor has explained.

To break with such a deeply rooted principle,
you should have very very good reasons, because you're breaking
the concepts on which all other uses of formulae are based.
And this would potentially lead to much confusion of your users,
at least in the way they should learn to think about what
formulae mean.

Martin

>> If it really is important to do it the way you describe,
>> are the data frames necessarily numeric? If so you could
>> preprocess your formula by placing as.matrix around all
>> the variables representing data frames using something
>> like this:
>>
>> https://www.stat.math.ethz.ch/pipermail/r-help/2004-December/061485.html

GS> Yes, they are numeric matrices (as data frames). I've
GS> looked at this, but I'd prefer to not have to do too
GS> much messing with the formula.

>> Of course, if they are necessarily numeric maybe they can
>> be matrices in the first place?

GS> Because read.table etc. produce data.frames and this is
GS> the natural way to work with data in R.

but it is also slightly inefficient if they are numeric.
There are places for data frames and for matrices.

Why should it be a problem to use

    M <- as.matrix(read.table(..))

?

For large files, it could be quite a bit more efficient,
needing a bit more of code, to
use scan() to read the numeric data directly :

    h1 <- scan(..., n=1) ##
    nc <- length(h1)
    a <- matrix(scan(, what = numeric(), ...),
                ncol = nc, dimnames = list(NULL, h1))

maybe this would be useful to be packaged into
a small utility with usage

    read.matrix(..., type = numeric(), ...)

GS> Following your suggestions, I altered my code to
GS> evaluate the rhs of the formula and check if it was of
GS> class "data.frame". If it is then I stop processing and
GS> return it as a data.frame as this point. If not, it
GS> eventually gets passed on to model.frame() for it to
GS> deal with it.

GS> So far - limited testing - it seems to do what I wanted
GS> all along. I'm sure there's a gotcha in there somewhere
GS> but at least the code runs so I can check for problems
GS> against my examples.

GS> Right, back to writing documentation...

GS> G

>> > more intuitive, to my mind at least for this particular
>> example and > analysis, to specify the formula with a
>> data frame on the rhs.
>> >
>> > model.frame doesn't work with the formula "~ y1" if the
>> object y1, in > the environment when model.frame
>> evaluates the formula, is a data.frame. > It works if y1
>> is a matrix, however. I'd like to work around this >
>> problem, say by creating an environment in which y1 is
>> modified to be a > matrix, if possible. Can this be done?
>> > >> > At the moment I have something working by grabbing the >> bits of the > formula and then using get() to grab the >> named object. Of course, this > won't work if someone >> wants to use R's formula interface with the > following >> formula y2 ~ var1 + var2 + var3, data = y1, or to use the >> > subset argument common to many formula >> implementations. I'd like to have > the function work in >> as general a manner as possible, so I'm fishing > around >> for potential solutions. >> > >> > All the best, >> > >> > Gav >> > >> > > >> > > On 8/16/05, Gavin Simpson <[EMAIL PROTECTED]> >> wrote: > > > Hi I'm having a problem with model.frame, >> encapsulated in this e
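The scan() idea in Martin's message can be turned into a small runnable helper along the following lines. Everything specific here is an assumption made for illustration: the name read_numeric_matrix is made up (it is not an existing R function), and the file is assumed to be whitespace-separated with one header line of column names followed by purely numeric rows. The header is read with nlines = 1 so that the whole line is picked up, and byrow = TRUE is set because scan() returns the values in row order.

```r
## Hypothetical helper: read a numeric table straight into a matrix.
read_numeric_matrix <- function(file, sep = "") {
    ## First line holds the column names.
    h1 <- scan(file, what = character(), nlines = 1, sep = sep, quiet = TRUE)
    nc <- length(h1)
    ## Remaining lines go directly into one numeric vector.
    vals <- scan(file, what = numeric(), skip = 1, sep = sep, quiet = TRUE)
    ## scan() returns values row by row, hence byrow = TRUE.
    matrix(vals, ncol = nc, byrow = TRUE, dimnames = list(NULL, h1))
}

## Usage with a small temporary file
tmp <- tempfile(fileext = ".txt")
writeLines(c("spp1 spp2 spp3", "3 1 0", "0 1 1"), tmp)
m <- read_numeric_matrix(tmp)
```

Unlike read.table(), this never builds a data frame, which is where the saving on large files would come from.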
Re: [Rd] problem using model.frame()
On Tue, 2005-08-16 at 12:35 -0400, Gabor Grothendieck wrote: > On 8/16/05, Gavin Simpson <[EMAIL PROTECTED]> wrote: > > On Tue, 2005-08-16 at 11:25 -0400, Gabor Grothendieck wrote: > > > It can handle data frames like this: > > > > > > model.frame(y1) > > > or > > > model.frame(~., y1) > > > > Thanks Gabor, > > > > Yes, I know that works, but I want the function coca.formula to accept a > > formula like this y2 ~ y1, with both y1 and y2 being data frames. It is > > The expressions I gave work generally (i.e. lm, glm, ...), not just in > model.matrix, so would it be ok if the user just does this? > > yourfunction(y2 ~., y1) Thanks again Gabor for your comments, I'd prefer the y1 ~ y2 as data frames - as this is the most natural way of doing things. I'd like to have (y2 ~., y1) as well, and (y2 ~ spp1 + spp2 + spp3, y1) also work - silently without any trouble. > If it really is important to do it the way you describe, are the data > frames necessarily numeric? If so you could preprocess your formula > by placing as.matrix around all the variables representing data frames > using something like this: > > https://www.stat.math.ethz.ch/pipermail/r-help/2004-December/061485.html Yes, they are numeric matrices (as data frames). I've looked at this, but I'd prefer to not have to do too much messing with the formula. > Of course, if they are necessarily numeric maybe they can be matrices in > the first place? Because read.table etc. produce data.frames and this is the natural way to work with data in R. Following your suggestions, I altered my code to evaluate the rhs of the formula and check if it was of class "data.frame". If it is then I stop processing and return it as a data.frame as this point. If not, it eventually gets passed on to model.frame() for it to deal with it. So far - limited testing - it seems to do what I wanted all along. I'm sure there's a gotcha in there somewhere but at least the code runs so I can check for problems against my examples. 
Right, back to writing documentation... G > > more intuitive, to my mind at least for this particular example and > > analysis, to specify the formula with a data frame on the rhs. > > > > model.frame doesn't work with the formula "~ y1" if the object y1, in > > the environment when model.frame evaluates the formula, is a data.frame. > > It works if y1 is a matrix, however. I'd like to work around this > > problem, say by creating an environment in which y1 is modified to be a > > matrix, if possible. Can this be done? > > > > At the moment I have something working by grabbing the bits of the > > formula and then using get() to grab the named object. Of course, this > > won't work if someone wants to use R's formula interface with the > > following formula y2 ~ var1 + var2 + var3, data = y1, or to use the > > subset argument common to many formula implementations. I'd like to have > > the function work in as general a manner as possible, so I'm fishing > > around for potential solutions. > > > > All the best, > > > > Gav > > > > > > > > On 8/16/05, Gavin Simpson <[EMAIL PROTECTED]> wrote: > > > > Hi I'm having a problem with model.frame, encapsulated in this example: > > > > > > > > y1 <- matrix(c(3,1,0,1,0,1,1,0,0,0,1,0,0,0,1,1,0,1,1,1), > > > > nrow = 5, byrow = TRUE) > > > > y1 <- as.data.frame(y1) > > > > rownames(y1) <- paste("site", 1:5, sep = "") > > > > colnames(y1) <- paste("spp", 1:4, sep = "") > > > > y1 > > > > > > > > model.frame(~ y1) > > > > Error in model.frame(formula, rownames, variables, varnames, extras, > > > > extranames, : > > > >invalid variable type > > > > > > > > temp <- as.matrix(y1) > > > > model.frame(~ temp) > > > > temp.spp1 temp.spp2 temp.spp3 temp.spp4 > > > > 1 3 1 0 1 > > > > 2 0 1 1 0 > > > > 3 0 0 1 0 > > > > 4 0 0 1 1 > > > > 5 0 1 1 1 > > > > > > > > Ideally the above wouldn't have names like temp.var1, temp.var2, but one > > > > could deal with that later. 
> > > > > > > > I have tracked down the source of the error message to line 1330 in > > > > model.c - here I'm stumped as I don't know any C, but it looks as if the > > > > code is looping over the variables in the formula and checking of they > > > > are the right "type". So a matrix of variables gets through, but a > > > > data.frame doesn't. > > > > > > > > It would be good if model.frame could cope with data.frames in formulae, > > > > but seeing as I am incapable of providing a patch, is there a way around > > > > this problem? > > > > > > > > Below is the head of the function I am currently using, including the > > > > function for parsing the formula - borrowed and hacked from > > > > ordiParseFormula() in package vegan. > > > > > > > > I can work out the class of the rhs of the forumla. Is there a way to > > > > create a suitable environment
Re: [Rd] problem using model.frame()
On 8/16/05, Gavin Simpson <[EMAIL PROTECTED]> wrote:
> On Tue, 2005-08-16 at 11:25 -0400, Gabor Grothendieck wrote:
> > It can handle data frames like this:
> >
> > model.frame(y1)
> > or
> > model.frame(~., y1)
>
> Thanks Gabor,
>
> Yes, I know that works, but I want the function coca.formula to accept a
> formula like this y2 ~ y1, with both y1 and y2 being data frames. It is

The expressions I gave work generally (i.e. lm, glm, ...), not just in
model.matrix, so would it be ok if the user just does this?

  yourfunction(y2 ~., y1)

If it really is important to do it the way you describe, are the data
frames necessarily numeric? If so you could preprocess your formula
by placing as.matrix around all the variables representing data frames
using something like this:

https://www.stat.math.ethz.ch/pipermail/r-help/2004-December/061485.html

Of course, if they are necessarily numeric maybe they can be matrices in
the first place?

> more intuitive, to my mind at least for this particular example and
> analysis, to specify the formula with a data frame on the rhs.
>
> model.frame doesn't work with the formula "~ y1" if the object y1, in
> the environment when model.frame evaluates the formula, is a data.frame.
> It works if y1 is a matrix, however. I'd like to work around this
> problem, say by creating an environment in which y1 is modified to be a
> matrix, if possible. Can this be done?
>
> At the moment I have something working by grabbing the bits of the
> formula and then using get() to grab the named object. Of course, this
> won't work if someone wants to use R's formula interface with the
> following formula y2 ~ var1 + var2 + var3, data = y1, or to use the
> subset argument common to many formula implementations. I'd like to have
> the function work in as general a manner as possible, so I'm fishing
> around for potential solutions.
> > All the best, > > Gav > > > > > On 8/16/05, Gavin Simpson <[EMAIL PROTECTED]> wrote: > > > Hi I'm having a problem with model.frame, encapsulated in this example: > > > > > > y1 <- matrix(c(3,1,0,1,0,1,1,0,0,0,1,0,0,0,1,1,0,1,1,1), > > > nrow = 5, byrow = TRUE) > > > y1 <- as.data.frame(y1) > > > rownames(y1) <- paste("site", 1:5, sep = "") > > > colnames(y1) <- paste("spp", 1:4, sep = "") > > > y1 > > > > > > model.frame(~ y1) > > > Error in model.frame(formula, rownames, variables, varnames, extras, > > > extranames, : > > >invalid variable type > > > > > > temp <- as.matrix(y1) > > > model.frame(~ temp) > > > temp.spp1 temp.spp2 temp.spp3 temp.spp4 > > > 1 3 1 0 1 > > > 2 0 1 1 0 > > > 3 0 0 1 0 > > > 4 0 0 1 1 > > > 5 0 1 1 1 > > > > > > Ideally the above wouldn't have names like temp.var1, temp.var2, but one > > > could deal with that later. > > > > > > I have tracked down the source of the error message to line 1330 in > > > model.c - here I'm stumped as I don't know any C, but it looks as if the > > > code is looping over the variables in the formula and checking of they > > > are the right "type". So a matrix of variables gets through, but a > > > data.frame doesn't. > > > > > > It would be good if model.frame could cope with data.frames in formulae, > > > but seeing as I am incapable of providing a patch, is there a way around > > > this problem? > > > > > > Below is the head of the function I am currently using, including the > > > function for parsing the formula - borrowed and hacked from > > > ordiParseFormula() in package vegan. > > > > > > I can work out the class of the rhs of the forumla. Is there a way to > > > create a suitable environment for the data argument of parseFormula() > > > such that it contains the rhs dataframe coerced to a matrix, which then > > > should get through model.frame.default without error? How would I go > > > about manipulating/creating such an environment? Any other ideas? 
> > > > > > Thanks in advance > > > > > > Gav > > > > > > coca.formula <- function(formula, method = c("predictive", "symmetric"), > > > reg.method = c("simpls", "eigen"), weights = NULL, > > > n.axes = NULL, symmetric = FALSE, data) > > > { > > >parseFormula <- function (formula, data) > > > { > > >browser() > > >Terms <- terms(formula, "Condition", data = data) > > >flapart <- fla <- formula <- formula(Terms, width.cutoff = 500) > > >specdata <- formula[[2]] > > >X <- eval(specdata, data, parent.frame()) > > >X <- as.matrix(X) > > >formula[[2]] <- NULL > > >if (formula[[2]] == "1" || formula[[2]] == "0") > > > Y <- NULL > > >else { > > > mf <- model.frame(formula, data, na.action = na.fail) > > > Y <- model.matrix(formula, mf) > > > if (any(colnames(Y) == "(Intercept)")) { > > >
Re: [Rd] problem using model.frame()
On Tue, 2005-08-16 at 11:25 -0400, Gabor Grothendieck wrote:
> It can handle data frames like this:
>
> model.frame(y1)
> or
> model.frame(~., y1)

Thanks Gabor,

Yes, I know that works, but I want the function coca.formula to accept a
formula like this y2 ~ y1, with both y1 and y2 being data frames. It is
more intuitive, to my mind at least for this particular example and
analysis, to specify the formula with a data frame on the rhs.

model.frame doesn't work with the formula "~ y1" if the object y1, in
the environment when model.frame evaluates the formula, is a data.frame.
It works if y1 is a matrix, however. I'd like to work around this
problem, say by creating an environment in which y1 is modified to be a
matrix, if possible. Can this be done?

At the moment I have something working by grabbing the bits of the
formula and then using get() to grab the named object. Of course, this
won't work if someone wants to use R's formula interface with the
following formula y2 ~ var1 + var2 + var3, data = y1, or to use the
subset argument common to many formula implementations. I'd like to have
the function work in as general a manner as possible, so I'm fishing
around for potential solutions.
All the best, Gav > > On 8/16/05, Gavin Simpson <[EMAIL PROTECTED]> wrote: > > Hi I'm having a problem with model.frame, encapsulated in this example: > > > > y1 <- matrix(c(3,1,0,1,0,1,1,0,0,0,1,0,0,0,1,1,0,1,1,1), > > nrow = 5, byrow = TRUE) > > y1 <- as.data.frame(y1) > > rownames(y1) <- paste("site", 1:5, sep = "") > > colnames(y1) <- paste("spp", 1:4, sep = "") > > y1 > > > > model.frame(~ y1) > > Error in model.frame(formula, rownames, variables, varnames, extras, > > extranames, : > >invalid variable type > > > > temp <- as.matrix(y1) > > model.frame(~ temp) > > temp.spp1 temp.spp2 temp.spp3 temp.spp4 > > 1 3 1 0 1 > > 2 0 1 1 0 > > 3 0 0 1 0 > > 4 0 0 1 1 > > 5 0 1 1 1 > > > > Ideally the above wouldn't have names like temp.var1, temp.var2, but one > > could deal with that later. > > > > I have tracked down the source of the error message to line 1330 in > > model.c - here I'm stumped as I don't know any C, but it looks as if the > > code is looping over the variables in the formula and checking of they > > are the right "type". So a matrix of variables gets through, but a > > data.frame doesn't. > > > > It would be good if model.frame could cope with data.frames in formulae, > > but seeing as I am incapable of providing a patch, is there a way around > > this problem? > > > > Below is the head of the function I am currently using, including the > > function for parsing the formula - borrowed and hacked from > > ordiParseFormula() in package vegan. > > > > I can work out the class of the rhs of the forumla. Is there a way to > > create a suitable environment for the data argument of parseFormula() > > such that it contains the rhs dataframe coerced to a matrix, which then > > should get through model.frame.default without error? How would I go > > about manipulating/creating such an environment? Any other ideas? 
> > > > Thanks in advance > > > > Gav > > > > coca.formula <- function(formula, method = c("predictive", "symmetric"), > > reg.method = c("simpls", "eigen"), weights = NULL, > > n.axes = NULL, symmetric = FALSE, data) > > { > >parseFormula <- function (formula, data) > > { > >browser() > >Terms <- terms(formula, "Condition", data = data) > >flapart <- fla <- formula <- formula(Terms, width.cutoff = 500) > >specdata <- formula[[2]] > >X <- eval(specdata, data, parent.frame()) > >X <- as.matrix(X) > >formula[[2]] <- NULL > >if (formula[[2]] == "1" || formula[[2]] == "0") > > Y <- NULL > >else { > > mf <- model.frame(formula, data, na.action = na.fail) > > Y <- model.matrix(formula, mf) > > if (any(colnames(Y) == "(Intercept)")) { > >xint <- which(colnames(Y) == "(Intercept)") > >Y <- Y[, -xint, drop = FALSE] > > } > >} > >list(X = X, Y = Y) > > } > >if (missing(data)) > > data <- parent.frame() > >#browser() > >dat <- parseFormula(formula, data) > > > > -- > > %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~% > > Gavin Simpson [T] +44 (0)20 7679 5522 > > ENSIS Research Fellow [F] +44 (0)20 7679 7565 > > ENSIS Ltd. & ECRC [E] gavin.simpsonATNOSPAMucl.ac.uk > > UCL Department of Geography [W] http://www.ucl.ac.uk/~ucfagls/cv/ > > 26 Bedford Way[W] http://www.ucl.ac.uk/~ucfagls/ > > London. WC1H 0AP. > > %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
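On the question asked in this message (whether model.frame can be made to see a matrix where the user supplied a data frame), one possible sketch is to pass model.frame a plain list in which any data frame named in the formula has already been coerced with as.matrix(). The helper name frame_with_matrices is hypothetical, and the sketch assumes every variable in the formula can be found from the formula's environment; it does not handle "." or a separate data argument.

```r
## Example data from the original post
y1 <- as.data.frame(matrix(c(3,1,0,1, 0,1,1,0, 0,0,1,0, 0,0,1,1, 0,1,1,1),
                           nrow = 5, byrow = TRUE))
rownames(y1) <- paste("site", 1:5, sep = "")
colnames(y1) <- paste("spp", 1:4, sep = "")

## Hypothetical helper: look up every variable named in the formula,
## coerce the data frames among them to matrices, and let model.frame()
## find the coerced copies first via its 'data' argument.
frame_with_matrices <- function(formula, env = environment(formula)) {
    vars <- all.vars(formula)
    objs <- mget(vars, envir = env, inherits = TRUE)
    objs <- lapply(objs, function(v) if (is.data.frame(v)) as.matrix(v) else v)
    model.frame(formula, data = objs)
}

mf <- frame_with_matrices(~ y1)
```

The user's own y1 is left untouched; only the copy handed to model.frame() is a matrix, so the result has a single matrix-valued column named y1.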
Re: [Rd] problem using model.frame()
It can handle data frames like this:

  model.frame(y1)
or
  model.frame(~., y1)

On 8/16/05, Gavin Simpson <[EMAIL PROTECTED]> wrote:
> Hi I'm having a problem with model.frame, encapsulated in this example:
>
> y1 <- matrix(c(3,1,0,1,0,1,1,0,0,0,1,0,0,0,1,1,0,1,1,1),
>              nrow = 5, byrow = TRUE)
> y1 <- as.data.frame(y1)
> rownames(y1) <- paste("site", 1:5, sep = "")
> colnames(y1) <- paste("spp", 1:4, sep = "")
> y1
>
> model.frame(~ y1)
> Error in model.frame(formula, rownames, variables, varnames, extras,
> extranames, :
>    invalid variable type
>
> temp <- as.matrix(y1)
> model.frame(~ temp)
>   temp.spp1 temp.spp2 temp.spp3 temp.spp4
> 1         3         1         0         1
> 2         0         1         1         0
> 3         0         0         1         0
> 4         0         0         1         1
> 5         0         1         1         1
>
> Ideally the above wouldn't have names like temp.var1, temp.var2, but one
> could deal with that later.
>
> I have tracked down the source of the error message to line 1330 in
> model.c - here I'm stumped as I don't know any C, but it looks as if the
> code is looping over the variables in the formula and checking if they
> are the right "type". So a matrix of variables gets through, but a
> data.frame doesn't.
>
> It would be good if model.frame could cope with data.frames in formulae,
> but seeing as I am incapable of providing a patch, is there a way around
> this problem?
>
> Below is the head of the function I am currently using, including the
> function for parsing the formula - borrowed and hacked from
> ordiParseFormula() in package vegan.
>
> I can work out the class of the rhs of the formula. Is there a way to
> create a suitable environment for the data argument of parseFormula()
> such that it contains the rhs dataframe coerced to a matrix, which then
> should get through model.frame.default without error? How would I go
> about manipulating/creating such an environment? Any other ideas?
>
> Thanks in advance
>
> Gav
>
> coca.formula <- function(formula, method = c("predictive", "symmetric"),
>                          reg.method = c("simpls", "eigen"), weights = NULL,
>                          n.axes = NULL, symmetric = FALSE, data)
> {
>     parseFormula <- function (formula, data)
>     {
>         browser()
>         Terms <- terms(formula, "Condition", data = data)
>         flapart <- fla <- formula <- formula(Terms, width.cutoff = 500)
>         specdata <- formula[[2]]
>         X <- eval(specdata, data, parent.frame())
>         X <- as.matrix(X)
>         formula[[2]] <- NULL
>         if (formula[[2]] == "1" || formula[[2]] == "0")
>             Y <- NULL
>         else {
>             mf <- model.frame(formula, data, na.action = na.fail)
>             Y <- model.matrix(formula, mf)
>             if (any(colnames(Y) == "(Intercept)")) {
>                 xint <- which(colnames(Y) == "(Intercept)")
>                 Y <- Y[, -xint, drop = FALSE]
>             }
>         }
>         list(X = X, Y = Y)
>     }
>     if (missing(data))
>         data <- parent.frame()
>     #browser()
>     dat <- parseFormula(formula, data)
>
> --
> %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
> Gavin Simpson                 [T] +44 (0)20 7679 5522
> ENSIS Research Fellow         [F] +44 (0)20 7679 7565
> ENSIS Ltd. & ECRC             [E] gavin.simpsonATNOSPAMucl.ac.uk
> UCL Department of Geography   [W] http://www.ucl.ac.uk/~ucfagls/cv/
> 26 Bedford Way                [W] http://www.ucl.ac.uk/~ucfagls/
> London. WC1H 0AP.
> %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
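The approach Gavin reports settling on in his follow-up (evaluate the rhs of the formula, return it directly if it is a data frame, otherwise pass the formula on to model.frame) might look roughly like the sketch below. The function name rhs_as_frame and its exact behaviour are assumptions made for illustration; the actual code was not posted in this thread. Only a bare name on the rhs is treated as the special case, so y2 ~ ., data = y1 and y2 ~ spp1 + spp2, data = y1 keep their usual meaning.

```r
## Hypothetical helper: return the rhs of a formula as a data frame.
rhs_as_frame <- function(formula, data = NULL) {
    env <- environment(formula)
    rhs <- formula[[length(formula)]]      # last element is the rhs
    ## Special case: a bare name (other than ".") that is a data frame.
    if (is.name(rhs) && !identical(rhs, as.name("."))) {
        obj <- tryCatch(
            if (is.null(data)) eval(rhs, env) else eval(rhs, data, env),
            error = function(e) NULL)
        if (is.data.frame(obj)) return(obj)
    }
    ## Otherwise build a one-sided formula and use the standard machinery.
    one_sided <- eval(call("~", rhs), env)
    if (is.null(data)) model.frame(one_sided)
    else model.frame(one_sided, data = data)
}

## Usage with the example data from the original post
y1 <- as.data.frame(matrix(c(3,1,0,1, 0,1,1,0, 0,0,1,0, 0,0,1,1, 0,1,1,1),
                           nrow = 5, byrow = TRUE))
rownames(y1) <- paste("site", 1:5, sep = "")
colnames(y1) <- paste("spp", 1:4, sep = "")
y2 <- y1

a <- rhs_as_frame(y2 ~ y1)                      # data frame special case
b <- rhs_as_frame(y2 ~ ., data = y1)            # usual "." expansion
d <- rhs_as_frame(y2 ~ spp1 + spp2, data = y1)  # usual named terms
```

As Gavin notes, limited testing is not proof that there is no gotcha: a sketch like this ignores subset, na.action and anything more complicated than a bare name on the rhs.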