On Thu, 2005-08-18 at 09:00 -0400, Gabor Grothendieck wrote: > I think this one is a hard call. Designing software is a > series of tradeoffs. Its nice to maintain consistency with > the R base, but in case of extensions (rather than changing > behavior) as in this case, the argument against the change > carries less weight. > > The main problems with extensions are (1) that one has to > remember which functions/packages have which extensions if > one is to use them and (2) they can interfere with other > future extensions. > > On the other hand, if one is using a particular package a > lot then convenience features like this may be attractive. > Also, packages are where authors have the freedom to try out > new ideas and new functionality without being constrained. > > Perhaps, if the extension in question is added there could be > a warning in the help file that this is a convenience feature > of this particular package and is not generally available > throughout R.
Thanks again Gabor for another useful contribution to this debate. Also thanks to Martin, Gabor and Jari for their comments, ideas, suggestions and viewpoints. I still like y1 ~ y2 (both data frames), but during my bike ride to work this morning I considered both sides of the argument and my position has moved towards the R way of doing things - far be it for little old me to go against years of S-formula tradition. So I'll revert the code back to accepting y1 ~ ., data = y2 and leave it to throw an error for the rhs being a data frame case. Once again, thank you for helping me work through this dilemma. All the best, Gav > On 8/18/05, Gavin Simpson <[EMAIL PROTECTED]> wrote: > > On Thu, 2005-08-18 at 07:57 +0300, Jari Oksanen wrote: > > > On 18 Aug 2005, at 1:49, Gavin Simpson wrote: > > > > > > > On Wed, 2005-08-17 at 20:24 +0200, Martin Maechler wrote: > > > >>>>>>> "GS" == Gavin Simpson <[EMAIL PROTECTED]> > > > >>>>>>> on Tue, 16 Aug 2005 18:44:23 +0100 writes: > > > >> > > > >> GS> On Tue, 2005-08-16 at 12:35 -0400, Gabor Grothendieck > > > >> GS> wrote: > > > >>>> On 8/16/05, Gavin Simpson <[EMAIL PROTECTED]> > > > >>>> wrote: > On Tue, 2005-08-16 at 11:25 -0400, Gabor > > > >>>> Grothendieck wrote: > > It can handle data frames like > > > >>>> this: > > > >>>>>> > > > >>>>>> model.frame(y1) > > or > > model.frame(~., y1) > > > >>>>> > > > >>>>> Thanks Gabor, > > > >>>>> > > > >>>>> Yes, I know that works, but I want the function > > > >>>> coca.formula to accept a > formula like this y2 ~ y1, > > > >>>> with both y1 and y2 being data frames. It is > > > >>>> > > > >>>> The expressions I gave work generally (i.e. lm, glm, > > > >>>> ...), not just in model.matrix, so would it be ok if the > > > >>>> user just does this? > > > >>>> > > > >>>> yourfunction(y2 ~., y1) > > > >> > > > >> GS> Thanks again Gabor for your comments, > > > >> > > > >> GS> I'd prefer the y1 ~ y2 as data frames - as this is the > > > >> GS> most natural way of doing things. I'd like to have (y2 > > > >> GS> ~., y1) as well, and (y2 ~ spp1 + spp2 + spp3, y1) also > > > >> GS> work - silently without any trouble. > > > >> > > > >> I'm sorry, Gavin, I tend to disagree quite a bit. > > > >> > > > >> The formula notation has quite a history in the S language, and > > > >> AFAIK never was the idea to use data.frames as formula > > > >> components, but rather as "environments" in which formula > > > >> components are looked up --- exactly as Gabor has explained. > > > > > > > > Hi Martin, thanks for your comments, > > > > > > > > But then one could have a matrix of variables on the rhs of the formula > > > > and it would work - whether this is a documented feature or un-intended > > > > side-effect of matrices being stored as vectors with dims, I don't > > > > know. > > > > > > > > And whilst the formula may have a long history, a number of packages > > > > have extended the interface to implement a specific feature, which > > > > don't > > > > work with standard functions like lm, glm and friends. I don't see how > > > > what I wanted to achieve is greatly different to that or using a > > > > matrix. > > > > > > > >> To break with such a deeply rooted principle, > > > >> you should have very very good reasons, because you're breaking > > > >> the concepts on which all other uses of formulae are based. > > > >> And this would potentially lead to much confusion of your users, > > > >> at least in the way they should learn to think about what > > > >> formulae mean. > > > > > > > > In the end I managed to treat y1 ~ y2 (both data frames) as a special > > > > case, which allows the existing formula notation to work as well, so I > > > > can use y1 ~ y2, y1 ~ ., data = y2, or y1 ~ var + var2, data = y2. This > > > > is what I wanted all along, to extend my interface (not do anything to > > > > R's formulae), but to also work in the traditional sense. > > > > > > > > The model I am writing code for really is modelling the relationship > > > > between two matrices of data. In one version of the method, there is > > > > real equivalence between both sides of the formula so it would seem odd > > > > to treat the two sides of the formula differently. At least to me ;-) > > > > > > It seems that I may be responsible for one of these extensions (lhs as > > > a data.frame in cca and rda in vegan package). There the response (lhs) > > > is multivariate or a multispecies community, and you must take that as > > > a whole without manipulation (and if you tried using VGAM you see there > > > really is painful to define lhs with, say, 127 elements). > > > > Hi Jari, > > > > Thanks for reminding me about this - I'd forgotten about not normally > > being able to have a data.frame on the lhs of the formula either - I'm > > surprised no-one pulled me up on that one before, either ;-) > > > > I guess what I'm proposing is really pushing the formula representation > > too far for some people. I'm coming round to the y1 ~ ., data = y2 way > > of doing things - still prefer y1 ~ y2 though ;-) > > > > Also, both y1 and y2 are community matrices (i.e. both have many, many > > species, aka variables for the non-community ecologists reading this). > > I'm not sure that it makes sense to treat the two sides differently. In > > the predictive co-correspondence mode (the default), multivariate pls is > > used to regress one matrix on another, with the number of pls components > > being chosen by cross-validation or a permutation test. > > > > > However, in > > > general you shouldn't use models where you use all the 'explanatory' > > > variables (rhs) that yo happen to have by accident. So much bad science > > > has been created with that approach even in your field, Gav. > > > > Well, I agree with you there... > > > > > The whole > > > idea of formula is the ability to choose from candidate variables. That > > > is: to build a model. Therefore you have one-sided formulae in prcomp() > > > and princomp(): you can say prcomp(~ x1 + log(x2) +x4, data) or > > > prcomp(~ . - x3, data). I think you should try to keep it so. Do > > > instead like Gabor suggested: you could have a function coca.default or > > > coca.matrix with interface: > > > > > > coca.matrix(matx, maty, matz) -- or you can name this as coca.default. > > > > > > and coca.formula which essentially parses your formula and returns a > > > list of matrices you need: > > > > > > coca.formula <- function(formula, data) > > > { > > > matricesout <- parsemyformula(formula, data) > > > coca(matricesout$matx, matricesout$maty, matricesoutz) > > > } > > > Then you need the generic: coca <- function(...) UseMethod("coca") and > > > it's done (but fails in R CMD check unless you add "..." in all > > > specific functions...). The real work is always done in coca.matrix (or > > > coca.default), and the others just chew your data into suitable form > > > for your workhorse. > > > > > > If then somebody thinks that they need all possible variables as > > > 'explanatory' variables (or perhaps constraints in your case), they > > > just call the function as > > > > > > coca(matx, maty, matz) > > > > My functions are already generic with coca.default and coca.formula. The > > issue with matrices/data.frames was only a problem in the formula > > interface. > > > > > And if you have coca.data.frame they don't need 'quacking' with extra > > > steps: > > > > > > coca.data.frame <- function(dfx, dfy dfz) coca(as.matrix(dfx), > > > as.matrix(dfy), as.matrix(dfz)). > > > > > > This you call as coca(dfx, dfy, dfz) and there you go. > > > > > > The essential feature in formula is the ability to define the model. > > > Don't give it away. > > > > I think the point I'm trying to make is that I don't think what I'm > > trying to do is any different than doing lm(y ~ x, data), (where y, x > > are vectors) - it is just that my x and y happen to be multivariate. I > > think it is easier to think of each community as a single entity in this > > regard - the relationship *is* between community 1 and community 2, not > > parts of community 2, or some parsimonious model of community 2 - but > > that might just be semantics - unlike your cca/rda functions which > > really are a (weighted) multivariate multiple regression. Happy to be > > convinced otherwise though. > > > > Also, it is worth re-iterating that I haven't broken the traditional way > > of working with formulae with my function - you can still do y1 ~ ., > > data = y2, or y1 ~ spp1 + spp2 + spp3, data = y2, for maximum > > flexibility. All I wanted (and worked out how) to do was treat the rhs > > in a special way if it were a data frame, just like Jari treats a > > data.frame on the lhs of formulae in package vegan as a special case. > > > > Thanks everyone for your ideas and comments - lots of food for thought. > > I wavering between both camps on this - still time to be convinced and > > change it before I finish the package. > > > > All the best, > > > > G > > > > > > > > cheers, jazza > > > -- > > > Jari Oksanen, Oulu, Finland > > > > > -- > > %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~% > > Gavin Simpson [T] +44 (0)20 7679 5522 > > ENSIS Research Fellow [F] +44 (0)20 7679 7565 > > ENSIS Ltd. & ECRC [E] gavin.simpsonATNOSPAMucl.ac.uk > > UCL Department of Geography [W] http://www.ucl.ac.uk/~ucfagls/cv/ > > 26 Bedford Way [W] http://www.ucl.ac.uk/~ucfagls/ > > London. WC1H 0AP. > > %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~% > > > > ______________________________________________ > > R-devel@r-project.org mailing list > > https://stat.ethz.ch/mailman/listinfo/r-devel > > -- %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~% Gavin Simpson [T] +44 (0)20 7679 5522 ENSIS Research Fellow [F] +44 (0)20 7679 7565 ENSIS Ltd. & ECRC [E] gavin.simpsonATNOSPAMucl.ac.uk UCL Department of Geography [W] http://www.ucl.ac.uk/~ucfagls/cv/ 26 Bedford Way [W] http://www.ucl.ac.uk/~ucfagls/ London. WC1H 0AP. %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~% ______________________________________________ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel