[R] Loading of namespace on load of .Rdata (was strange behaviour of load)
Last week Giovanni Parrinello posted a message asking why various packages were loaded when he loaded an .Rdata file. Brian Ripley replied saying he thought it was because the saved workspace contained a reference to the namespace of ipred. (Correspondence copied below). This begs the question: how did the reference to the namespace of ipred come to be in the .Rdata file? Brian did say it is likely to be because the workspace contained object(s) saved with environment the namespace of ipred - but how would this come about? In this case I think is because the .Rdata file contained an object whose *parent* environment was the namespace of ipred. Take the following example from ?bagging (having loaded ipred): > data(BreastCancer) > > mod <- bagging(Class ~ Cl.thickness + Cell.size + + Cell.shape + Marg.adhesion + + Epith.c.size + Bare.nuclei + + Bl.cromatin + Normal.nucleoli + + Mitoses, data=BreastCancer, coob=TRUE) > > environment(mod$mtrees[[1]]$btree$terms) > > parent.env(environment(mod$mtrees[[1]]$btree$terms)) This occurs because the terms object is taken from the model frame which was evaluated within the environment of a function from the ipred package (here ipred:::irpart). Therefore I think the behaviour observed by Giovanni will only occur in unusual circumstances: when the workspace contains a formula object, a terms object, a function, or some other object with a non-NULL environment, which has been created in the environment of a packaged function. In particular, this would not always occur with a packaged model fitting function, e.g. (from ?loglm in MASS) > library(MASS) > minn38a <- array(0, c(3,4,7,2), lapply(minn38[, -5], levels)) > minn38a[data.matrix(minn38[,-5])] <- minn38$f > fm <- loglm(~1 + 2 + 3 + 4, minn38a) > environment(fm$terms) in this case because the terms component is obtained from the formula, whose environment is .GlobalEnv. So, I have two points on this (more for R-devel than R-help now) 1. There is a more general situation where it would be useful to load the namespace of a package after loading a saved workspace: when the workspace contains objects of a class for which special methods are required. E.g. if 'fm' from the example above were saved in a workspace, the namespace of MASS would not be loaded when the workspace was loaded into R. Thus unless MASS was loaded by the user, default methods would be used by summary(), print() etc rather than the specialised methods for objects of class "loglm". Of course the user should quickly realise this, but there may be cases where the default method gives a convincing but incorrect or unhelpful result. An alternative would be to add an attribute to objects of class "loglm" (say), e.g. attr(loglmObject, ".Environment") <- environment(MASS) so that the namespace would automatically be loaded when it is required. [In fact alternatives such as environment(loglmObject) <- environment(MASS) or loglmObject$anyoldthing <- environment(MASS) would work just as well, but perhaps the first suggestion is neatest.]. What do others think of this idea? Should it (or an equivalent idea) be encouraged amongst package writers? 2. In the case highlighted by Giovanni, the namespace of ipred was loaded, but the package was not. This would be fine, except that the packages on which ipred depends *were* loaded. This seems inconsistent. I guess as long as there are packages without namespaces though, this is the only way to proceed. Perhaps in the meantime, package authors should be encouraged to use importFrom() rather than import()? Or perhaps where packages do have namespaces, only the namespace should be loaded in such a case. Heather > From: Prof Brian Ripley <[EMAIL PROTECTED]> > Date: 12 January 2006 08:21:35 GMT > To: giovanni parrinello <[EMAIL PROTECTED]> > Cc: r-help@stat.math.ethz.ch > Subject: Re: [R] Strange behaviour of load > > On Wed, 11 Jan 2006, giovanni parrinello wrote: > >> Dear All, >> simetimes when I load an Rdata I get this message >> >> ### >> Code: >> >> load('bladder1.RData') >> Carico il pacchetto richiesto: rpart ( Bad traslastion: Load required >> package-...) >> Carico il pacchetto richiesto: MASS >> Carico il pacchetto richiesto: mlbench >> Carico il pacchetto richiesto: survival >> Carico il pacchetto richiesto: splines >> >> Carico il pacchetto richiesto: 'survival' >> >> >>The following object(s) are masked from package:Hmisc : >> >> untangle.specials >> >> Carico il pacchetto richiesto: class >> Carico il pacchetto richiesto: nnet >> # >> >> So I have many unrequired packages loaded. >> Any idea? > > They are required! My guess is that you have object(s) saved with > environment the namespace of some package, and loading that namespace > is > pulling these in. The only CRAN package which requires mlbench > appears to > be ipred, and that requires all of those
Re: [R] sweep() and recycling
Agreed my examples may be trivial and I'm sure there are more efficient ways to do the same thing, but I disagree that my examples are against the spirit of sweep (). In the first example a vector is swept out from the rows, with one value for odd rows and one value for even rows. In the second example an array of values is swept out across the third dimension. In the third example an array of values is swept out from the full array. The first example is a natural use of recycling. E.g. sweep(matrix(1:100, 50, 2), 1, c(1, 1000), "+", give.warning = TRUE) is a quicker way of writing sweep(matrix(1:100, 50, 2), 1, rep(c(1, 1000), 25), "+", give.warning = TRUE) but your code would give a warning in the first case, even though the intent and the result are exactly the same as in the second case. As you say, it is only a warning, that can be ignored. However the warning should at least reflect the warning condition used, i.e. warn that the length of STATS does not equal the extent of MARGIN, rather warning that STATS does not recycle exactly. Heather >>> Robin Hankin <[EMAIL PROTECTED]> 06/21/05 02:47pm >>> Hi On Jun 21, 2005, at 02:33 pm, Heather Turner wrote: > I think the warning condition in Robin's patch is too harsh - the > following examples seem reasonable to me, but all produce warnings > > sweep(array(1:24, dim = c(4,3,2)), 1, 1:2, give.warning = TRUE) > sweep(array(1:24, dim = c(4,3,2)), 1, 1:12, give.warning = TRUE) > sweep(array(1:24, dim = c(4,3,2)), 1, 1:24, give.warning = TRUE) > The examples above do give warnings (as intended) but I think all three cases above are inimical to the spirit of sweep(): nothing is being "swept" out. So a warning is appropriate, IMO. In any case, one can always suppress (or ignore!) a warning if one knows what one is doing. YMMV, but if I wanted to do the above operations I would replace sweep(array(0, dim = c(4,3,2)), c(1,3), 1:12, "+" , give.warning = FALSE) with aperm(array(1:12,c(4,2,3)),c(1,3,2)) best wishes rksh > I have written an alternative (given below) which does not give > warnings in the above cases, but does warn in the following case > >> sweep(array(1:24, dim = c(4,3,2)), 1:2, 1:3) > , , 1 > > [,1] [,2] [,3] > [1,]036 > [2,]039 > [3,]069 > [4,]369 > > , , 2 > > [,1] [,2] [,3] > [1,] 12 15 18 > [2,] 12 15 21 > [3,] 12 18 21 > [4,] 15 18 21 > > Warning message: > STATS does not recycle exactly across MARGIN > > The code could be easily modified to warn in other cases, e.g. when > length of STATS is a divisor of the corresponding array extent (as in > the first example above, with length(STATS) = 2). > > The code also includes Gabor's suggestion. > > Heather > > sweep <- function (x, MARGIN, STATS, FUN = "-", warn = > getOption("warn"), ...) > { > FUN <- match.fun(FUN) > dims <- dim(x) > perm <- c(MARGIN, (1:length(dims))[-MARGIN]) > if (warn >= 0) { > s <- length(STATS) > cumDim <- c(1, cumprod(dims[perm])) > if (s > max(cumDim)) > warning("length of STATS greater than length of array", > call. = FALSE) > else { > upper <- min(ifelse(cumDim > s, cumDim, max(cumDim))) > lower <- max(ifelse(cumDim < s, cumDim, min(cumDim))) > if (any(upper %% s != 0, s %% lower != 0)) > warning("STATS does not recycle exactly across MARGIN", > call. = FALSE) > } > } > FUN(x, aperm(array(STATS, dims[perm]), order(perm)), ...) > } > >>>> Gabor Grothendieck <[EMAIL PROTECTED]> 06/21/05 01:25pm >>> > \ > Perhaps the signature should be: > >sweep(...other args go here..., warn=getOption("warn")) > > so that the name and value of the argument are consistent with > the R warn option. > > On 6/21/05, Robin Hankin <[EMAIL PROTECTED]> wrote: >> >> On Jun 20, 2005, at 04:58 pm, Prof Brian Ripley wrote: >> >>> The issue here is that the equivalent command array(1:5, c(6,6)) (to >>> matrix(1:5,6,6)) gives no warning, and sweep uses array(). >>> >>> I am not sure either should: fractional recycling was normally >>> allowed >>> in S3 (S4 tightened up a bit). >>> >>> Perhaps someone who thinks sweep() should warn could contribute a >>> tested patch? >>> >> >> >> OK, modified R code and Rd file below (is this the best way to do >> this?) >> &g
Re: [R] sweep() and recycling
I think the warning condition in Robin's patch is too harsh - the following examples seem reasonable to me, but all produce warnings sweep(array(1:24, dim = c(4,3,2)), 1, 1:2, give.warning = TRUE) sweep(array(1:24, dim = c(4,3,2)), 1, 1:12, give.warning = TRUE) sweep(array(1:24, dim = c(4,3,2)), 1, 1:24, give.warning = TRUE) I have written an alternative (given below) which does not give warnings in the above cases, but does warn in the following case > sweep(array(1:24, dim = c(4,3,2)), 1:2, 1:3) , , 1 [,1] [,2] [,3] [1,]036 [2,]039 [3,]069 [4,]369 , , 2 [,1] [,2] [,3] [1,] 12 15 18 [2,] 12 15 21 [3,] 12 18 21 [4,] 15 18 21 Warning message: STATS does not recycle exactly across MARGIN The code could be easily modified to warn in other cases, e.g. when length of STATS is a divisor of the corresponding array extent (as in the first example above, with length(STATS) = 2). The code also includes Gabor's suggestion. Heather sweep <- function (x, MARGIN, STATS, FUN = "-", warn = getOption("warn"), ...) { FUN <- match.fun(FUN) dims <- dim(x) perm <- c(MARGIN, (1:length(dims))[-MARGIN]) if (warn >= 0) { s <- length(STATS) cumDim <- c(1, cumprod(dims[perm])) if (s > max(cumDim)) warning("length of STATS greater than length of array", call. = FALSE) else { upper <- min(ifelse(cumDim > s, cumDim, max(cumDim))) lower <- max(ifelse(cumDim < s, cumDim, min(cumDim))) if (any(upper %% s != 0, s %% lower != 0)) warning("STATS does not recycle exactly across MARGIN", call. = FALSE) } } FUN(x, aperm(array(STATS, dims[perm]), order(perm)), ...) } >>> Gabor Grothendieck <[EMAIL PROTECTED]> 06/21/05 01:25pm >>> \ Perhaps the signature should be: sweep(...other args go here..., warn=getOption("warn")) so that the name and value of the argument are consistent with the R warn option. On 6/21/05, Robin Hankin <[EMAIL PROTECTED]> wrote: > > On Jun 20, 2005, at 04:58 pm, Prof Brian Ripley wrote: > > > The issue here is that the equivalent command array(1:5, c(6,6)) (to > > matrix(1:5,6,6)) gives no warning, and sweep uses array(). > > > > I am not sure either should: fractional recycling was normally allowed > > in S3 (S4 tightened up a bit). > > > > Perhaps someone who thinks sweep() should warn could contribute a > > tested patch? > > > > > OK, modified R code and Rd file below (is this the best way to do > this?) > > > > > "sweep" <- > function (x, MARGIN, STATS, FUN = "-", give.warning = FALSE, ...) > { > FUN <- match.fun(FUN) > dims <- dim(x) > if(give.warning & length(STATS)>1 & any(dims[MARGIN] != > dim(as.array(STATS{ > warning("array extents do not recycle exactly") > } > perm <- c(MARGIN, (1:length(dims))[-MARGIN]) > FUN(x, aperm(array(STATS, dims[perm]), order(perm)), ...) > } > > > > > > > > \name{sweep} > \alias{sweep} > \title{Sweep out Array Summaries} > \description{ > Return an array obtained from an input array by sweeping out a summary > statistic. > } > \usage{ > sweep(x, MARGIN, STATS, FUN="-", give.warning = FALSE, \dots) > } > \arguments{ > \item{x}{an array.} > \item{MARGIN}{a vector of indices giving the extents of \code{x} > which correspond to \code{STATS}.} > \item{STATS}{the summary statistic which is to be swept out.} > \item{FUN}{the function to be used to carry out the sweep. In the > case of binary operators such as \code{"/"} etc., the function name > must be quoted.} > \item{give.warning}{Boolean, with default \code{FALSE} meaning to > give no warning, even if array extents do not match. If > \code{TRUE}, check for the correct dimensions and if a > mismatch is detected, give a suitable warning.} > \item{\dots}{optional arguments to \code{FUN}.} > } > \value{ > An array with the same shape as \code{x}, but with the summary > statistics swept out. > } > \note{ > If \code{STATS} is of length 1, recycling is carried out with no > warning irrespective of the value of \code{give.warning}. > } > > \references{ > Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) > \emph{The New S Language}. > Wadsworth \& Brooks/Cole. > } > \seealso{ > \code{\link{apply}} on which \code{sweep} used to be based; > \code{\link{scale}} for centering and scaling. > } > \examples{ > require(stats) # for median > med.att <- apply(attitude, 2, median) > sweep(data.matrix(attitude), 2, med.att)# subtract the column medians > > a <- array(0, c(2, 3, 4)) > b <- matrix(1:8, c(2, 4)) > sweep(a, c(1, 3), b, "+", give.warning = TRUE) # no warning: > all(dim(a)[c(1,3)] == dim(b)) > sweep(a, c(1, 2), b, "+", give.warning = TRUE) # warning given > > } > \keyword{array} > \keyword{iteration} > > > > > -- > Robin Hankin > Uncertainty Analyst > National Oceanography
Re: [R] sweep() and recycling
Dr H Turner Research Assistant Dept. of Statistics The University of Warwick Coventry CV4 7AL Tel: 024 76575870 >>> Robin Hankin <[EMAIL PROTECTED]> 06/20/05 10:32am >>> Hi I had a hard-to-find bug in some of my code the other day, which I eventually traced to my misusing of sweep(). I would expect sweep() to give me a warning if the elements don't recycle nicely, but X <- matrix(1:36,6,6) sweep(X,1,1:5,"+") [,1] [,2] [,3] [,4] [,5] [,6] [1,]29 16 23 30 32 [2,]4 11 18 25 27 34 [3,]6 13 20 22 29 36 [4,]8 15 17 24 31 38 [5,] 10 12 19 26 33 40 [6,]7 14 21 28 35 37 gives no warning, even though (5,36)=1. Also, sweep(X,1,c(1,1000),"+") and sweep(X,2,c(1,1000),"+") behave as expected, But sweep(X,1,1:5,"+") and sweep(X,2,1:5,"+") are identical! ...which is also expected since the matrix is square. Perhaps the following will help you see why: > matrix(1:5,6,6) [,1] [,2] [,3] [,4] [,5] [,6] [1,]123451 [2,]234512 [3,]345123 [4,]451234 [5,]512345 [6,]123451 Warning message: data length [5] is not a sub-multiple or multiple of the number of rows [6] in matrix > matrix(1:5,6,6, byrow = TRUE) [,1] [,2] [,3] [,4] [,5] [,6] [1,]123451 [2,]234512 [3,]345123 [4,]451234 [5,]512345 [6,]123451 Warning message: data length [5] is not a sub-multiple or multiple of the number of rows [6] in matrix i.e. recycling a vector of length 5 down the columns gives the same result as recycling the same vector along the rows. I agree with your main point however, it would be useful if 'sweep' would give a warning when the length of the vector of statistics to be swept out was not a sub-multiple or multiple of the corresponding array dimension. Heather __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] using eval() with pre-built expression inside function
Sorry, I forgot to add to my original post that I want to use the "envir" argument of eval() so that I can specify a secondary data source as follows: eval(expression(model.frame(formula, data = lookHereFirst), envir = lookHereSecond, enclos = lookHereThird) Your version of f1 does produce the desired model frame, but does this by passing all the data to model.frame() directly, i.e. it evaluates model.frame(formula = y~x, data = dataset) However, if I adapt your f1 as follows: f1 <- function(formula, data) { mf <- match.call(expand.dots = FALSE) mf[[1]] <- as.name("model.frame") print(mf) mf } it still works if some of the data is provided through the "data" argument of model.frame() and the rest of the data is provided through the "envir" argument of eval(): > test1 <- f1(formula = y ~ x, data = dataset[,1, drop = FALSE]) model.frame(formula = y ~ x, data = dataset[, 1, drop = FALSE]) > eval(test1, dataset[,2,drop = FALSE], parent.frame()) y x 1 66.71259 1 2 83.21937 1 3 85.24412 1 4 88.10189 1 5 75.71918 1 So you have provided a solution, but I still don't understand why I can't construct the call as I was trying to before: f2 <- function(formula, data) { mf <- as.call(c(as.name("model.frame"), formula = formula, data = substitute(data))) print(mf) mf } This doesn't work: > test2 <- f2(formula = y ~ x, data = dataset[,1, drop = FALSE]) model.frame(formula = y ~ x, data = dataset[, 1, drop = FALSE]) > eval(test2, dataset[,2,drop = FALSE], parent.frame()) Error in eval(expr, envir, enclos) : Object "x" not found Apparently test1 and test2 are not identical, but I can't see why, which may be the root of my problem here. Can anyone explain why they are different? Thanks Heather >>> "Roger D. Peng" <[EMAIL PROTECTED]> 01/24/05 04:48pm >>> If you look at the beginning of lm(), you'll see that match.call() is used and name of the function (in this case "f1") is replaced with "model.frame". Does something like this work? f1 <- function(formula, data) { mf <- match.call(expand.dots = FALSE) mf[[1]] <- as.name("model.frame") eval(mf, parent.frame()) } -roger Heather Turner wrote: > I'm trying to evaluate a pre-built expression using eval(), e.g. > > dataset <- data.frame(y = runif(30, 50,100), x = gl(5, 6)) > > # one like this > mf <- expression(model.frame(y~x)) > eval(mf, dataset, parent.frame()) > > # rather than this > eval(expression(model.frame(y~x)), dataset, parent.frame()) > > In the example above there is no problem, the problem comes when I try to do > a similar thing within a function, e.g. > > f1 <- function(formula, data) { > mt <- terms(formula) > mf <- as.expression(as.call(c(as.name("model.frame"), formula = mt))) > eval(mf, data, parent.frame()) > } > > >>f1(formula = y ~ x, data = dataset) > > Error in eval(expr, envir, enclos) : Object "y" not found > > I can get round this by building a call to eval using paste, e.g. > > f2 <- function(formula, data) { > mt <- terms(formula) > mf <- as.expression(as.call(c(as.name("model.frame"), formula = mt))) > direct <- parse(text = paste("eval(expression(", mf, >"), data, parent.frame())")) > print(direct) > eval(direct) > } > > >>f2(formula = y ~ x, data = dataset) > > expression(eval(expression(model.frame(formula = y ~ x)), data, > parent.frame())) > y x > 1 92.23087 1 > 2 63.43658 1 > 3 55.24448 1 > 4 72.75650 1 > 5 67.58781 1 > ... > > but this seems rather convoluted. Can anyone explain why f1 doesn't work > (when f2 does) and/or suggest a neater way of dealing with this? > > Thanks > > Heather > > Mrs H Turner > Research Assistant > Dept. of Statistics > University of Warwick > > __ > R-help@stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html > -- Roger D. Peng http://www.biostat.jhsph.edu/~rpeng/ __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] using eval() with pre-built expression inside function
I'm trying to evaluate a pre-built expression using eval(), e.g. dataset <- data.frame(y = runif(30, 50,100), x = gl(5, 6)) # one like this mf <- expression(model.frame(y~x)) eval(mf, dataset, parent.frame()) # rather than this eval(expression(model.frame(y~x)), dataset, parent.frame()) In the example above there is no problem, the problem comes when I try to do a similar thing within a function, e.g. f1 <- function(formula, data) { mt <- terms(formula) mf <- as.expression(as.call(c(as.name("model.frame"), formula = mt))) eval(mf, data, parent.frame()) } > f1(formula = y ~ x, data = dataset) Error in eval(expr, envir, enclos) : Object "y" not found I can get round this by building a call to eval using paste, e.g. f2 <- function(formula, data) { mt <- terms(formula) mf <- as.expression(as.call(c(as.name("model.frame"), formula = mt))) direct <- parse(text = paste("eval(expression(", mf, "), data, parent.frame())")) print(direct) eval(direct) } > f2(formula = y ~ x, data = dataset) expression(eval(expression(model.frame(formula = y ~ x)), data, parent.frame())) y x 1 92.23087 1 2 63.43658 1 3 55.24448 1 4 72.75650 1 5 67.58781 1 ... but this seems rather convoluted. Can anyone explain why f1 doesn't work (when f2 does) and/or suggest a neater way of dealing with this? Thanks Heather Mrs H Turner Research Assistant Dept. of Statistics University of Warwick __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
RE: [R] Advice on parsing formulae
Okay, well you could define a tvar function as follows tvar <- function(term) term Then stick to fm1 X <- model.matrix(fm1,keep.order=TRUE) pAssign <- attr(X, "assign") tvar.terms <- terms( fm1, specials = "tvar",keep.order=TRUE ) idx <- attr(tvar.terms,"specials")$tvar if (attr(tvar.terms,"response")) idx <- idx -1 all.labels <- attr(tvar.terms, "term.labels") tvar.labels <- all.labels[idx] tvarAssign <- match(pAssign, match(tvar.labels, all.labels)) tvarAssign[is.na(tvarAssign)] <- 0 I think that the specials attributes is an index of the variables attribute, hence I have replaced if (attr(tvar.terms,"intercept")) idx <- idx -1 with if (attr(tvar.terms,"response")) idx <- idx -1 check e.g. fm1 <- Y ~ tvar(x:A) + tvar(z) + u + tvar(B) + tvar(poly(v,3)) fm1 <- ~ 1 + tvar(x:A) + tvar(z) + u + tvar(B) + tvar(poly(v,3)) Heather From: "Claus Dethlefsen" <[EMAIL PROTECTED]> To: "'Heather Turner'" <[EMAIL PROTECTED]>, <[EMAIL PROTECTED]> Date: 12/16/04 8:33am Subject:RE: [R] Advice on parsing formulae Thank you for the advice. I have now boiled my problem down to the following: How do I create fm2 from fm1 ? fm1 <- Y ~ 1 + tvar(x:A) + tvar(z) + u + tvar(B) + tvar(poly(v,3)) fm2 <- Y ~ 1 + x:A + z + u + B + poly(v, 3) Thus, how do I simply remove tvar( * ) from a formula? Do I have to write a function to parse the string and then re-create the formula? Is there an easy way of doing this? When my above problem is solved, I can (with the help from Heather Turner and Chuck Berry) do the following ## define som data x <- z <- u <- v <- rnorm(5) A <- B <- factor( rep( c(1,2), c(3,2) ) ) ## define my formula fm1 and manually create fm2. fm1 <- Y ~ 1 + tvar(x:A) + tvar(z) + u + tvar(B) + tvar(poly(v,3)) fm2 <- Y ~ 1 + x:A + z + u + B + poly(v, 3) ## extract the term.labels from fm2, make the design matrix and extract 'assign' term.labels <- unname(sapply(attr(terms(fm2), "term.labels"), removeSpace)) X <- model.matrix(fm2,keep.order=TRUE) pAssign <- attr(X, "assign") ## Now, extract the tvar-terms from fm1 tvar.terms <- terms( fm1, specials = "tvar",keep.order=TRUE ) idx <- attr(tvar.terms,"specials")$tvar if (attr(tvar.terms,"intercept")) idx <- idx -1 tvar <- attr(terms(fm2,keep.order=TRUE),"term.labels")[idx] tvar <- unname( sapply( tvar, removeSpace) ) ## Finally, combine the information to get the vector I asked for tvarAssign <- match(pAssign, sort(match(tvar, term.labels))) tvarAssign[is.na(tvarAssign)] <- 0 Mrs H Turner Research Assistant Dept. of Statistics University of Warwick __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Advice on parsing formulae
I think this will do what you want: # Need this function to remove spaces from term labels later on > removeSpace <- function(string) gsub("[[:space:]]", "", string) # Specify which terms are in a "tvar" group # (could remove spaces separately) > tvar <- unname(sapply(c("x:A", "z", "B", "poly(v,3)"), removeSpace)) # Use terms to get term labels from formula > formula <- Y ~ 1 + x:A + z + u + B + poly(v,3) > term.labels <- unname(sapply(attr(terms(formula), "term.labels"), > removeSpace)) > tvar [1] "x:A" "z" "B" "poly(v,3)" > term.labels [1] "z" "u" "B" "poly(v,3)" "x:A" # Get assign variable for parameters # (You would use first two lines, but I don't have data so defined assign variable myself) > #X <- model.matrix(formula) > #pAssign <- attr(X, "assign") > pAssign <- c(0,1,2,3,4,4,4,5,5) # Define "tvarAssign" > tvarAssign <- match(pAssign, sort(match(tvar, term.labels))) > tvarAssign[is.na(tvarAssign)] <- 0 > tvarAssign [1] 0 1 0 2 3 3 3 4 4 HTH Heather Mrs H Turner Research Assistant Dept. of Statistics University of Warwick >>> "Claus Dethlefsen" <[EMAIL PROTECTED]> 12/13/04 04:10pm >>> Dear list I would like to be able to group terms in a formula using a function that I will call tvar(), eg. the formula Y ~ 1 + tvar(x:A) + tvar(z) + u + tvar(B) + tvar(poly(v,3)) where x,u and v are numeric and A and B are factors - binary, say. As output, I want the model.matrix as if tvar had not been there at all. In addition, I would like to have information on the grouping, as a vector as long as ncol( model.matrix ) with zeros corresponding to terms outside tvar and with an index grouping the terms inside each tvar(). In the (sick) example: > model.matrix(Y ~ 1 + tvar(x:A) + tvar(z) + u + tvar(B) + tvar(poly(v,3))) (Intercept) z u B2 poly(v, 3)1 poly(v, 3)2 poly(v, 3)3 x:A1 x:A2 11 -1.55 -1.03 0 0.160 -0.350 -0.281 0.66 0.00 21 -1.08 0.55 0 -0.164 -0.211 0.340 0.91 0.00 31 0.29 -0.26 0 -0.236 -0.073 0.311 -1.93 0.00 41 -1.11 0.96 0 0.222 -0.285 -0.385 -0.23 0.00 51 0.43 -0.76 1 -0.434 0.515 -0.532 0.22 0.00 I would like the vector c(0,1,0,2,3,3,3,4,4) pointing to the tvar-grouped terms. Thus what I would like, looks a bit like the 'assign' attribute of the model.matrix() output. I have not figured out a way of doing this in a nice way and would like some help, please. I hope somebody can help me (or point the manual-pages I should read), Best, Claus Dethlefsen --- Assistant Professor, Claus Dethlefsen, Ph.D. mailto:[EMAIL PROTECTED], http://www.math.auc.dk/~dethlef Dpt. of Mathematical Sciences, Aalborg University Fr. Bajers Vej 7G, 9220 Aalborg East Denmark -- No virus found in this outgoing message. Checked by AVG Anti-Virus. __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html