Re: [Rd] Problem using model.frame with argument subset in own function
Gavin,

I ran into the same cryptic "invalid subscript type 'closure'" message in a slightly less complicated scenario, and wanted to post the cause in my case (the root cause is probably the same either way).

Similarly to your case, I was subsetting a data frame. I had a list of variable names corresponding to columns in the frame. Unfortunately the name I had assigned to this list, var, coincided with the name of R's built-in variance function, var() (from the stats package). When I attempted to subset df[, var], I got the 'closure' error message, but if I renamed the list of variable names so the collision didn't occur, e.g. df[, vars] instead of df[, var], it worked as expected.

Sincerely,
Greg B. Hill

Gavin Simpson wrote:
> Dear List,
>
> I am writing a formula method for a function in a package I maintain. I want the method to return a data.frame that potentially only contains some of the variables in 'data', as specified by the formula.
>
> The problem I am having is in writing the function and wrapping it around model.frame. Consider the following data frame:
>
>     dat <- data.frame(A = runif(10), B = runif(10), C = runif(10))
>
> And the wrapper function:
>
>     foo <- function(formula, data = NULL, ..., subset = NULL,
>                     na.action = na.pass) {
>         mt <- terms(formula, data = data, simplify = TRUE)
>         mf <- model.frame(formula(mt), data = data, subset = subset,
>                           na.action = na.action)
>         ## real function would do more stuff here and pass mf on to
>         ## other functions
>         mf
>     }
>
> This is how I envisage the function being called. The real world use would have a data.frame with tens or hundreds of components where only a few need to be excluded. Hence wanting formulas of the form below to work.
>
>     foo(~ . - B, data = dat)
>
> The aim is to return only columns A and C in an object returned by model.frame.
>
> However, when I run the above, I get the following error:
>
>     foo(~ A + B, data = dat)
>     Error in xj[i] : invalid subscript type 'closure'
>
> I've tracked this down to the line in model.frame.default
>
>     subset <- eval(substitute(subset), data, env)
>
> After evaluating this line, subset contains:
>
>     Browse[1]> subset
>     function (x, ...)
>     UseMethod("subset")
>     <environment: namespace:base>
>
> Not NULL, and hence the error later on when calling the internal model.frame code.
>
> So the question is, what am I doing wrong?
>
> If I leave the subset argument out of the definition of foo and rely upon the default in model.frame.default, the function works as expected.
>
> Perhaps the question should be, how do I modify foo() to allow it to have a formal subset argument, passed to model.frame?
>
> Any other suggestions gratefully accepted.
>
> Thanks in advance,
>
> G
> --
> %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
> Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
> ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
> Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
> Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
> UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
> %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

--
View this message in context: http://www.nabble.com/Problem-using-model.frame-with-argument-subset-in-own-function-tp24880908p25373059.html
Sent from the R devel mailing list archive at Nabble.com.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
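The name collision Greg describes in this message can be reproduced in a few lines. What follows is only a minimal sketch, assuming a toy data frame and assuming that no object called var has been created in the calling scope, so that the name resolves to the variance function; the names df and vars and the column names are illustrative and not from the thread.

```r
# Toy data frame; the column names are made up for illustration.
df <- data.frame(a = 1:3, b = 4:6, c = 7:9)

# No user-defined 'var' is in scope here, so 'var' resolves to the
# variance function (a closure), which cannot be used as a subscript.
res <- tryCatch(df[, var], error = function(e) conditionMessage(e))
print(res)

# A name that does not collide with an existing function works as expected.
vars <- c("a", "c")
df[, vars]
```

Checking exists("vars") or simply printing the object before subsetting is a quick way to see whether a name points at data or at a function.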
Re: [Rd] Problem using model.frame with argument subset in own function
On Sat, Aug 8, 2009 at 1:31 PM, Gavin Simpson <gavin.simp...@ucl.ac.uk> wrote:
> Dear List,
>
> I am writing a formula method for a function in a package I maintain. I want the method to return a data.frame that potentially only contains some of the variables in 'data', as specified by the formula.

The usual way to call model.frame (the method that Thomas Lumley has called "the standard, non-standard evaluation") is to match the call to foo, replace the name of the function being called with as.name("model.frame") and force an evaluation in the parent frame. It looks like

    mf <- match.call()
    if (missing(data)) data <- environment(formula)
    ## evaluate and install the model frame
    m <- match(c("formula", "data", "subset", "weights", "na.action", "offset"),
               names(mf), 0)
    mf <- mf[c(1, m)]
    mf$drop.unused.levels <- TRUE
    mf[[1]] <- as.name("model.frame")
    fr <- eval(mf, parent.frame())

The point of all of this manipulation is to achieve the kind of result you need, where the subset argument is evaluated in the correct environment.
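To make the idiom in this message concrete, here is a small sketch that puts a naive wrapper next to one built with match.call(). It is an illustration under stated assumptions, not the code of any package: naive() is a hypothetical name, the data are random, and the choice to fall back to na.pass when the caller gives no na.action is an assumption made here to mirror foo()'s signature from the original question.

```r
set.seed(1)
dat <- data.frame(A = runif(10), B = runif(10), C = runif(10))

# Naive wrapper: forwards 'subset' by name. model.frame.default() calls
# substitute(subset), gets back the symbol 'subset', and evaluates it in
# 'data' and then in the formula's environment, where it finds base::subset.
naive <- function(formula, data = NULL, subset = NULL) {
  model.frame(formula, data = data, subset = subset)
}
naive_result <- tryCatch(naive(~ A + B, data = dat),
                         error = function(e) conditionMessage(e))
print(naive_result)

# Wrapper using the match.call() idiom: keep only the arguments that
# model.frame() understands, rename the call, and evaluate it in the caller.
foo <- function(formula, data = NULL, ..., subset = NULL, na.action = na.pass) {
  mf <- match.call(expand.dots = FALSE)
  m <- match(c("formula", "data", "subset", "na.action"), names(mf), 0L)
  mf <- mf[c(1L, m)]
  # Assumption: honour foo()'s own default when the caller gave no na.action.
  if (!"na.action" %in% names(mf)) mf$na.action <- quote(na.pass)
  mf[[1L]] <- as.name("model.frame")
  eval(mf, parent.frame())
}

all_rows  <- foo(~ A + B, data = dat)
some_rows <- foo(~ A + B, data = dat, subset = A > 0.5)
```

Because the unevaluated call is handed to model.frame(), an expression such as A > 0.5 is evaluated inside the data frame rather than in the wrapper, which is the behaviour the subset argument is meant to have.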
Re: [Rd] Problem using model.frame with argument subset in own function
On Sun, 2009-08-09 at 11:32 -0500, Douglas Bates wrote:
> The usual way to call model.frame (the method that Thomas Lumley has called "the standard, non-standard evaluation") is to match the call to foo, replace the name of the function being called with as.name("model.frame") and force an evaluation in the parent frame.

Thanks Doug. I also received an off-list reply from Brian Ripley suggesting two alternative approaches.

The bit I was missing was how to manipulate other aspects of the call - it hadn't clicked that the arguments of the function can be manipulated by altering the components of the matched call. In the end I came up with something like:

    mf <- match.call()
    mf[[1]] <- as.name("model.frame")
    mt <- terms(formula, data = data, simplify = TRUE)
    mf[[2]] <- formula(mt, data = data)
    mf$na.action <- substitute(na.action)
    dots <- list(...)
    mf[[names(dots)]] <- NULL
    mf <- eval(mf, parent.frame())
    tran.default(mf, ...)

which seems to be working in the tests I have been running, allowing me to pass along some components of the call to model.frame, whilst reserving ... for the default method's arguments, and also get the simplified formula.

All the best,

G
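The combination Gavin describes here (a simplified formula plus a cleaned-up call) can be sketched as below. This is a hedged illustration only: bar() is a hypothetical wrapper, not the tran method from the thread, and selecting arguments by name is a swapped-in way of leaving out anything passed through '...', in place of the mf[[names(dots)]] <- NULL step shown in the message.

```r
set.seed(42)
dat <- data.frame(A = runif(10), B = runif(10), C = runif(10))

# Hypothetical wrapper (bar() is not from the thread): simplify the formula
# first, then hand a cleaned-up call to model.frame().
bar <- function(formula, data = NULL, ..., subset = NULL, na.action = na.pass) {
  mf <- match.call(expand.dots = FALSE)
  # Selecting arguments by name leaves anything passed via '...' behind.
  m <- match(c("formula", "data", "subset", "na.action"), names(mf), 0L)
  mf <- mf[c(1L, m)]
  # Expand '.' against 'data' and drop terms removed with '-'.
  mt <- terms(formula, data = data, simplify = TRUE)
  mf$formula <- formula(mt)
  mf[[1L]] <- as.name("model.frame")
  eval(mf, parent.frame())
}

# 'extra' stands in for an argument meant for some later default method.
out <- bar(~ . - B, data = dat, extra = "ignored")
names(out)
```

Replacing the formula by name rather than by position avoids relying on the formula being the second element of the matched call.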
Re: [Rd] problem using model.frame()
On Thu, 2005-08-18 at 09:00 -0400, Gabor Grothendieck wrote:
> I think this one is a hard call. Designing software is a series of tradeoffs. It's nice to maintain consistency with the R base, but in the case of extensions (rather than changes in behavior), as in this case, the argument against the change carries less weight.
>
> The main problems with extensions are (1) that one has to remember which functions/packages have which extensions if one is to use them and (2) they can interfere with other future extensions.
>
> On the other hand, if one is using a particular package a lot then convenience features like this may be attractive. Also, packages are where authors have the freedom to try out new ideas and new functionality without being constrained.
>
> Perhaps, if the extension in question is added, there could be a warning in the help file that this is a convenience feature of this particular package and is not generally available throughout R.

Thanks again Gabor for another useful contribution to this debate. Also thanks to Martin, Gabor and Jari for their comments, ideas, suggestions and viewpoints.

I still like y1 ~ y2 (both data frames), but during my bike ride to work this morning I considered both sides of the argument and my position has moved towards the R way of doing things - far be it from little old me to go against years of S-formula tradition. So I'll revert the code back to accepting y1 ~ ., data = y2 and leave it to throw an error when the rhs is a data frame.

Once again, thank you for helping me work through this dilemma.

All the best,

Gav
Re: [Rd] problem using model.frame()
GS == Gavin Simpson [EMAIL PROTECTED] on Tue, 16 Aug 2005 18:44:23 +0100 writes:

GS> On Tue, 2005-08-16 at 12:35 -0400, Gabor Grothendieck wrote:
>> On 8/16/05, Gavin Simpson [EMAIL PROTECTED] wrote:
>>> On Tue, 2005-08-16 at 11:25 -0400, Gabor Grothendieck wrote:
>>>> It can handle data frames like this:
>>>>
>>>>     model.frame(y1)
>>>> or
>>>>     model.frame(~., y1)
>>>
>>> Thanks Gabor,
>>>
>>> Yes, I know that works, but I want the function coca.formula to accept a formula like this y2 ~ y1, with both y1 and y2 being data frames. It is
>>
>> The expressions I gave work generally (i.e. lm, glm, ...), not just in model.matrix, so would it be ok if the user just does this?
>>
>>     yourfunction(y2 ~., y1)

GS> Thanks again Gabor for your comments,

GS> I'd prefer the y1 ~ y2 as data frames - as this is the most natural way of doing things. I'd like to have (y2 ~., y1) as well, and (y2 ~ spp1 + spp2 + spp3, y1) also work - silently without any trouble.

I'm sorry, Gavin, I tend to disagree quite a bit.

The formula notation has quite a history in the S language, and AFAIK never was the idea to use data.frames as formula components, but rather as environments in which formula components are looked up --- exactly as Gabor has explained.

To break with such a deeply rooted principle, you should have very very good reasons, because you're breaking the concepts on which all other uses of formulae are based. And this would potentially lead to much confusion of your users, at least in the way they should learn to think about what formulae mean.

Martin

>> If it really is important to do it the way you describe, are the data frames necessarily numeric? If so you could preprocess your formula by placing as.matrix around all the variables representing data frames using something like this:
>>
>> https://www.stat.math.ethz.ch/pipermail/r-help/2004-December/061485.html

GS> Yes, they are numeric matrices (as data frames). I've looked at this, but I'd prefer to not have to do too much messing with the formula.

>> Of course, if they are necessarily numeric maybe they can be matrices in the first place?

GS> Because read.table etc. produce data.frames and this is the natural way to work with data in R.

but it is also slightly inefficient if they are numeric. There are places for data frames and for matrices.

Why should it be a problem to use

    M <- as.matrix(read.table(..))

?

For large files, it could be quite a bit more efficient, needing a bit more of code, to use scan() to read the numeric data directly:

    h1 <- scan(..., n=1) ## read variable names
    nc <- length(h1)
    a <- matrix(scan(, what = numeric(), ...), ncol = nc,
                dimnames = list(NULL, h1))

maybe this would be useful to be packaged into a small utility with usage

    read.matrix(..., type = numeric(), ...)

GS> Following your suggestions, I altered my code to evaluate the rhs of the formula and check if it was of class data.frame. If it is then I stop processing and return it as a data.frame at this point. If not, it eventually gets passed on to model.frame() for it to deal with it.

GS> So far - limited testing - it seems to do what I wanted all along. I'm sure there's a gotcha in there somewhere but at least the code runs so I can check for problems against my examples.

GS> Right, back to writing documentation...

GS> G
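Martin's scan() suggestion in this message can be turned into a small runnable helper. The sketch below is one possible reading of it, not his utility: read_matrix() is a hypothetical name, it assumes a whitespace-separated text file whose first line holds the column names, and it fills the matrix with byrow = TRUE because scan() returns the values row by row.

```r
# Hypothetical helper: read a numeric table straight into a matrix.
read_matrix <- function(file) {
  # First line: variable names.
  header <- scan(file, what = character(), nlines = 1, quiet = TRUE)
  # Remaining lines: numeric values, read row by row.
  values <- scan(file, what = numeric(), skip = 1, quiet = TRUE)
  matrix(values, ncol = length(header), byrow = TRUE,
         dimnames = list(NULL, header))
}

# Small demonstration file with made-up species counts.
tmp <- tempfile(fileext = ".txt")
writeLines(c("spp1 spp2 spp3", "3 1 0", "0 1 1"), tmp)
m <- read_matrix(tmp)
m
```

For small files as.matrix(read.table(tmp, header = TRUE)) gives an equivalent numeric matrix; the scan() route only pays off when the file is large.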
Re: [Rd] problem using model.frame()
If it's just a matter of specifying two data frames, how about just letting the user specify them as the first two arguments without injecting formulas into it, so that any of these are allowed but data frames are still not allowed in formulas other than in the data argument:

    yourfunction(df1, df2)
    yourfunction(y ~ sp1 + sp2)
    yourfunction(y ~., df)

This could easily be implemented by having yourfunction be generic, in which case the first one would dispatch yourfunction.data.frame and the second and third would dispatch yourfunction.formula.

On 8/17/05, Gavin Simpson [EMAIL PROTECTED] wrote:
> On Wed, 2005-08-17 at 20:24 +0200, Martin Maechler wrote:
> > I'm sorry, Gavin, I tend to disagree quite a bit.
> >
> > The formula notation has quite a history in the S language, and AFAIK never was the idea to use data.frames as formula components, but rather as environments in which formula components are looked up --- exactly as Gabor has explained.
>
> Hi Martin, thanks for your comments,
>
> But then one could have a matrix of variables on the rhs of the formula and it would work - whether this is a documented feature or un-intended side-effect of matrices being stored as vectors with dims, I don't know.
>
> And whilst the formula may have a long history, a number of packages have extended the interface to implement a specific feature, which don't work with standard functions like lm, glm and friends. I don't see how what I wanted to achieve is greatly different to that or using a matrix.
>
> > To break with such a deeply rooted principle, you should have very very good reasons, because you're breaking the concepts on which all other uses of formulae are based. And this would potentially lead to much confusion of your users, at least in the way they should learn to think about what formulae mean.
>
> In the end I managed to treat y1 ~ y2 (both data frames) as a special case, which allows the existing formula notation to work as well, so I can use y1 ~ y2, y1 ~ ., data = y2, or y1 ~ var + var2, data = y2. This is what I wanted all along, to extend my interface (not do anything to R's formulae), but to also work in the traditional sense.
>
> The model I am writing code for really is modelling the relationship between two matrices of data. In one version of the method, there is real equivalence between both sides of the formula so it would seem odd to treat the two sides of the formula differently. At least to me ;-)
>
> > but it is also slightly inefficient if they are numeric. There are places for data frames and for matrices.
>
> I agree - and in the code I've written, y1 and y2 quickly get coerced to matrices before the real number crunching begins. However, all the other R modelling functions I have used work with data.frames. Arguably, it could cause more confusion to write a function that looked, walked and quacked like an R modelling function but needed the user to apply an extra step to use - a step not usually required under normal R usage.
>
> All the best,
>
> Gav
>
> > Why should it be a problem to use M <- as.matrix(read.table(..)) ?
Re: [Rd] problem using model.frame()
On 18 Aug 2005, at 1:49, Gavin Simpson wrote:
> On Wed, 2005-08-17 at 20:24 +0200, Martin Maechler wrote:
> > The formula notation has quite a history in the S language, and AFAIK never was the idea to use data.frames as formula components, but rather as environments in which formula components are looked up --- exactly as Gabor has explained.
>
> Hi Martin, thanks for your comments,
>
> But then one could have a matrix of variables on the rhs of the formula and it would work - whether this is a documented feature or un-intended side-effect of matrices being stored as vectors with dims, I don't know.
>
> And whilst the formula may have a long history, a number of packages have extended the interface to implement a specific feature, which don't work with standard functions like lm, glm and friends. I don't see how what I wanted to achieve is greatly different to that or using a matrix.
>
> > To break with such a deeply rooted principle, you should have very very good reasons, because you're breaking the concepts on which all other uses of formulae are based. And this would potentially lead to much confusion of your users, at least in the way they should learn to think about what formulae mean.
>
> In the end I managed to treat y1 ~ y2 (both data frames) as a special case, which allows the existing formula notation to work as well, so I can use y1 ~ y2, y1 ~ ., data = y2, or y1 ~ var + var2, data = y2. This is what I wanted all along, to extend my interface (not do anything to R's formulae), but to also work in the traditional sense.
>
> The model I am writing code for really is modelling the relationship between two matrices of data. In one version of the method, there is real equivalence between both sides of the formula so it would seem odd to treat the two sides of the formula differently. At least to me ;-)

It seems that I may be responsible for one of these extensions (lhs as a data.frame in cca and rda in the vegan package). There the response (lhs) is multivariate or a multispecies community, and you must take that as a whole without manipulation (and if you have tried using VGAM you will have seen that it really is painful to define an lhs with, say, 127 elements).

However, in general you shouldn't use models where you use all the 'explanatory' variables (rhs) that you happen to have by accident. So much bad science has been created with that approach even in your field, Gav. The whole idea of a formula is the ability to choose from candidate variables. That is: to build a model. Therefore you have one-sided formulae in prcomp() and princomp(): you can say prcomp(~ x1 + log(x2) + x4, data) or prcomp(~ . - x3, data). I think you should try to keep it so.

Do instead as Gabor suggested: you could have a function coca.default or coca.matrix with the interface:

    coca.matrix(matx, maty, matz)

-- or you can name this coca.default -- and coca.formula, which essentially parses your formula and returns a list of the matrices you need:

    coca.formula <- function(formula, data)
    {
        matricesout <- parsemyformula(formula, data)
        coca(matricesout$matx, matricesout$maty, matricesout$matz)
    }

Then you need the generic:

    coca <- function(...) UseMethod("coca")

and it's done (but fails in R CMD check unless you add ... in all specific functions...). The real work is always done in coca.matrix (or coca.default), and the others just chew your data into a suitable form for your workhorse. If then somebody thinks that they need all possible variables as 'explanatory' variables (or perhaps constraints in your case), they just call the function as

    coca(matx, maty, matz)

And if you have coca.data.frame they don't need 'quacking' with extra steps:

    coca.data.frame <- function(dfx, dfy, dfz)
        coca(as.matrix(dfx), as.matrix(dfy), as.matrix(dfz))

This you call as

    coca(dfx, dfy, dfz)

and there you go.

The essential feature in formula is the ability to define the model. Don't give it away.

cheers, jazza
--
Jari Oksanen,
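The generic-plus-methods layout Jari outlines can be sketched end to end as follows. Everything here is a simplified assumption for illustration: the methods take two tables rather than three, the "workhorse" only stores its inputs under a made-up class name coca_sketch, and the formula method expects the lhs to name an object holding the response table. None of it is the real coca code.

```r
# Generic: dispatches on the class of the first argument.
coca <- function(x, ...) UseMethod("coca")

# Workhorse: all real computation would live here (placeholder only).
coca.default <- function(x, y, ...) {
  x <- as.matrix(x)
  y <- as.matrix(y)
  stopifnot(nrow(x) == nrow(y))
  structure(list(x = x, y = y), class = "coca_sketch")
}

# Data frames are coerced and passed straight to the workhorse.
coca.data.frame <- function(x, y, ...) {
  coca.default(as.matrix(x), as.matrix(y), ...)
}

# Formula method: the lhs names the response table, the rhs selects
# explanatory variables from 'data'.
coca.formula <- function(x, data, ...) {
  lhs <- eval(x[[2L]], data, environment(x))
  rhs <- x[-2L]                          # drop the response, leaving ~ rhs
  mf <- model.frame(rhs, data = data, na.action = na.fail)
  y <- model.matrix(rhs, mf)
  y <- y[, colnames(y) != "(Intercept)", drop = FALSE]
  coca.default(lhs, y, ...)
}

# Made-up example data.
spp <- data.frame(sp1 = c(3, 0, 0, 0, 0), sp2 = c(1, 1, 0, 0, 1))
env <- data.frame(pH = c(5.1, 6.2, 5.8, 7.0, 6.5), depth = c(1, 2, 3, 4, 5))

fit_all  <- coca(spp, env)                      # all variables, no formula
fit_full <- coca(spp ~ pH + depth, data = env)  # same model via a formula
fit_sub  <- coca(spp ~ pH, data = env)          # formula used to choose
```

The formula method is where model building happens; the data frame method is the shortcut for callers who really do want every column.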
Re: [Rd] problem using model.frame()
On Tue, 2005-08-16 at 11:25 -0400, Gabor Grothendieck wrote: It can handle data frames like this: model.frame(y1) or model.frame(~., y1) Thanks Gabor, Yes, I know that works, but I want the function coca.formula to accept a formula like this y2 ~ y1, with both y1 and y2 being data frames. It is more intuitive, to my mind at least for this particular example and analysis, to specify the formula with a data frame on the rhs. model.frame doesn't work with the formula ~ y1 if the object y1, in the environment when model.frame evaluates the formula, is a data.frame. It works if y1 is a matrix, however. I'd like to work around this problem, say by creating an environment in which y1 is modified to be a matrix, if possible. Can this be done? At the moment I have something working by grabbing the bits of the formula and then using get() to grab the named object. Of course, this won't work if someone wants to use R's formula interface with the following formula y2 ~ var1 + var2 + var3, data = y1, or to use the subset argument common to many formula implementations. I'd like to have the function work in as general a manner as possible, so I'm fishing around for potential solutions. All the best, Gav On 8/16/05, Gavin Simpson [EMAIL PROTECTED] wrote: Hi I'm having a problem with model.frame, encapsulated in this example: y1 - matrix(c(3,1,0,1,0,1,1,0,0,0,1,0,0,0,1,1,0,1,1,1), nrow = 5, byrow = TRUE) y1 - as.data.frame(y1) rownames(y1) - paste(site, 1:5, sep = ) colnames(y1) - paste(spp, 1:4, sep = ) y1 model.frame(~ y1) Error in model.frame(formula, rownames, variables, varnames, extras, extranames, : invalid variable type temp - as.matrix(y1) model.frame(~ temp) temp.spp1 temp.spp2 temp.spp3 temp.spp4 1 3 1 0 1 2 0 1 1 0 3 0 0 1 0 4 0 0 1 1 5 0 1 1 1 Ideally the above wouldn't have names like temp.var1, temp.var2, but one could deal with that later. 
>> I have tracked down the source of the error message to line 1330 in model.c - here I'm stumped as I don't know any C, but it looks as if the code is looping over the variables in the formula and checking if they are the right type. So a matrix of variables gets through, but a data.frame doesn't.
>>
>> It would be good if model.frame could cope with data.frames in formulae, but seeing as I am incapable of providing a patch, is there a way around this problem?
>>
>> Below is the head of the function I am currently using, including the function for parsing the formula - borrowed and hacked from ordiParseFormula() in package vegan.
>>
>> I can work out the class of the rhs of the formula. Is there a way to create a suitable environment for the data argument of parseFormula() such that it contains the rhs dataframe coerced to a matrix, which then should get through model.frame.default without error? How would I go about manipulating/creating such an environment? Any other ideas?
>>
>> Thanks in advance
>>
>> Gav
>>
>> coca.formula <- function(formula, method = c("predictive", "symmetric"),
>>                          reg.method = c("simpls", "eigen"), weights = NULL,
>>                          n.axes = NULL, symmetric = FALSE, data)
>>   {
>>     parseFormula <- function (formula, data)
>>       {
>>         browser()
>>         Terms <- terms(formula, "Condition", data = data)
>>         flapart <- fla <- formula <- formula(Terms, width.cutoff = 500)
>>         specdata <- formula[[2]]
>>         X <- eval(specdata, data, parent.frame())
>>         X <- as.matrix(X)
>>         formula[[2]] <- NULL
>>         if (formula[[2]] == 1 || formula[[2]] == 0)
>>           Y <- NULL
>>         else {
>>           mf <- model.frame(formula, data, na.action = na.fail)
>>           Y <- model.matrix(formula, mf)
>>           if (any(colnames(Y) == "(Intercept)")) {
>>             xint <- which(colnames(Y) == "(Intercept)")
>>             Y <- Y[, -xint, drop = FALSE]
>>           }
>>         }
>>         list(X = X, Y = Y)
>>       }
>>     if (missing(data))
>>       data <- parent.frame()
>>     #browser()
>>     dat <- parseFormula(formula, data)
>>
>> --
>> %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
>> Gavin Simpson                     [T] +44 (0)20 7679 5522
>> ENSIS Research Fellow             [F] +44 (0)20 7679 7565
>> ENSIS Ltd. ECRC                   [E] gavin.simpsonATNOSPAMucl.ac.uk
>> UCL Department of Geography       [W] http://www.ucl.ac.uk/~ucfagls/cv/
>> 26 Bedford Way                    [W] http://www.ucl.ac.uk/~ucfagls/
>> London. WC1H 0AP.
>> %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
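The environment-based workaround asked about in this message could be sketched roughly as follows. This is only an illustration, not a tested solution from the thread: model_frame_df is a hypothetical helper name, and the sketch assumes the data frames named in the formula live in the formula's environment (or one of its parents) and can sensibly be coerced with as.matrix(). It also relies on get0(), which is only available in reasonably recent versions of R.

```r
# Hypothetical helper (not part of base R or vegan): call model.frame() with
# the formula's environment replaced by a child environment in which every
# data-frame variable is shadowed by a matrix copy.
model_frame_df <- function(formula, ...) {
  orig_env <- environment(formula)
  env <- new.env(parent = orig_env)      # lookups fall through to orig_env
  for (nm in all.vars(formula)) {
    obj <- get0(nm, envir = orig_env)    # NULL if the name is not found
    if (is.data.frame(obj)) {
      # Shadow the data frame; the original object is left untouched
      assign(nm, as.matrix(obj), envir = env)
    }
  }
  environment(formula) <- env
  model.frame(formula, ...)
}

# Usage with the example data from this thread
y1 <- matrix(c(3,1,0,1,0,1,1,0,0,0,1,0,0,0,1,1,0,1,1,1),
             nrow = 5, byrow = TRUE)
y1 <- as.data.frame(y1)
rownames(y1) <- paste("site", 1:5, sep = "")
colnames(y1) <- paste("spp", 1:4, sep = "")
mf <- model_frame_df(~ y1)
```

Because the matrix copies live in a child environment, anything passed through ... (for example data or subset) is still handled by model.frame itself; a variable found in data takes precedence over the shadowed copy, so a data frame stored inside data would not be converted by this sketch.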
Re: [Rd] problem using model.frame()
On 8/16/05, Gavin Simpson wrote:
> On Tue, 2005-08-16 at 11:25 -0400, Gabor Grothendieck wrote:
>> It can handle data frames like this:
>>
>>   model.frame(y1)
>> or
>>   model.frame(~., y1)
>
> Thanks Gabor,
>
> Yes, I know that works, but I want the function coca.formula to accept a formula like this y2 ~ y1, with both y1 and y2 being data frames. It is

The expressions I gave work generally (i.e. lm, glm, ...), not just in model.matrix, so would it be ok if the user just does this?

  yourfunction(y2 ~., y1)

If it really is important to do it the way you describe, are the data frames necessarily numeric? If so you could preprocess your formula by placing as.matrix around all the variables representing data frames using something like this:

  https://www.stat.math.ethz.ch/pipermail/r-help/2004-December/061485.html

Of course, if they are necessarily numeric maybe they can be matrices in the first place?

> more intuitive, to my mind at least for this particular example and analysis, to specify the formula with a data frame on the rhs.
>
> model.frame doesn't work with the formula ~ y1 if the object y1, in the environment when model.frame evaluates the formula, is a data.frame. It works if y1 is a matrix, however. I'd like to work around this problem, say by creating an environment in which y1 is modified to be a matrix, if possible. Can this be done?
>
> At the moment I have something working by grabbing the bits of the formula and then using get() to grab the named object. Of course, this won't work if someone wants to use R's formula interface with the following formula y2 ~ var1 + var2 + var3, data = y1, or to use the subset argument common to many formula implementations. I'd like to have the function work in as general a manner as possible, so I'm fishing around for potential solutions.
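The suggestion above, placing as.matrix around the variables that represent data frames, could be sketched along these lines. This is a rough illustration and not necessarily the approach taken in the linked r-help post: wrap_df_vars is a hypothetical name, and the sketch assumes the variables can be found from the formula's environment and that the formula contains no empty arguments (such as x[, 1]).

```r
# Hypothetical helper: walk the formula and replace each symbol that refers
# to a data frame with the call as.matrix(<symbol>).
wrap_df_vars <- function(formula, env = environment(formula)) {
  wrap <- function(e) {
    if (is.name(e)) {
      obj <- get0(as.character(e), envir = env)   # NULL if not found
      if (is.data.frame(obj)) return(call("as.matrix", e))
      return(e)
    }
    if (is.call(e)) {
      # Skip element 1 (the function or operator name), recurse into the rest
      for (i in seq_along(e)[-1]) e[[i]] <- wrap(e[[i]])
    }
    e
  }
  wrap(formula)
}

# Usage: both sides of the formula are data frames
y1 <- data.frame(spp1 = c(3, 0, 0, 0, 0), spp2 = c(1, 1, 0, 0, 1))
y2 <- data.frame(env1 = 1:5, env2 = 5:1)
f  <- wrap_df_vars(y2 ~ y1)
mf <- model.frame(f)
```

Unlike shadowing the objects in a separate environment, this changes the formula itself, so the columns of the resulting model frame are named after the wrapped terms (for example as.matrix(y1)) rather than the original variable names.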