[R] Loading of namespace on load of .Rdata (was strange behaviour of load)

2006-01-18 Thread Heather Turner
Last week Giovanni Parrinello posted a message asking why various packages were 
loaded when he loaded an .Rdata file. Brian Ripley replied saying he thought it 
was because the saved workspace contained a reference to the namespace of 
ipred. (Correspondence copied below).

This begs the question: how did the reference to the namespace of ipred come to 
be in the .Rdata file? Brian did say it is likely to be because the workspace 
contained object(s) saved with environment the namespace of ipred - but how 
would this come about?

In this case I think is because the .Rdata file contained an object whose 
*parent* environment was the namespace of ipred. Take the following example 
from ?bagging (having loaded ipred):

> data(BreastCancer)
> 
> mod <- bagging(Class ~ Cl.thickness + Cell.size
+ + Cell.shape + Marg.adhesion   
+ + Epith.c.size + Bare.nuclei   
+ + Bl.cromatin + Normal.nucleoli
+ + Mitoses, data=BreastCancer, coob=TRUE)
>
> environment(mod$mtrees[[1]]$btree$terms)

>
> parent.env(environment(mod$mtrees[[1]]$btree$terms))


This occurs because the terms object is taken from the model frame which was 
evaluated within the environment of a function from the ipred package (here 
ipred:::irpart).

Therefore I think the behaviour observed by Giovanni will only occur in unusual 
circumstances: when the workspace contains a formula object, a terms object, a 
function, or some other object with a non-NULL environment, which has been 
created in the environment of a packaged function. In particular, this would 
not always occur with a packaged model fitting function, e.g. (from ?loglm in 
MASS)

> library(MASS)
> minn38a <- array(0, c(3,4,7,2), lapply(minn38[, -5], levels))
> minn38a[data.matrix(minn38[,-5])] <- minn38$f
> fm <- loglm(~1 + 2 + 3 + 4, minn38a)  
> environment(fm$terms)


in this case because the terms component is obtained from the formula, whose 
environment is .GlobalEnv.

So, I have two points on this (more for R-devel than R-help now)

1. There is a more general situation where it would be useful to load the 
namespace of a package after loading a saved workspace: when the workspace 
contains objects of a class for which special methods are required. E.g. if 
'fm' from the example above were saved in a workspace, the namespace of MASS 
would not be loaded when the workspace was loaded into R. Thus unless MASS was 
loaded by the user, default methods would be used by summary(), print() etc 
rather than the specialised methods for objects of class "loglm".

Of course the user should quickly realise this, but there may be cases where 
the default method gives a convincing but incorrect or unhelpful result. An 
alternative would be to add an attribute to objects of class "loglm" (say), 
e.g. attr(loglmObject, ".Environment") <- environment(MASS)
so that the namespace would automatically be loaded when it is required. [In 
fact alternatives such as environment(loglmObject) <- environment(MASS) or 
loglmObject$anyoldthing <- environment(MASS) would work just as well, but 
perhaps the first suggestion is neatest.].

What do others think of this idea? Should it (or an equivalent idea) be 
encouraged amongst package writers?

2. In the case highlighted by Giovanni, the namespace of ipred was loaded, but 
the package was not. This would be fine, except that the packages on which 
ipred depends *were* loaded. This seems inconsistent. I guess as long as there 
are packages without namespaces though, this is the only way to proceed. 
Perhaps in the meantime, package authors should be encouraged to use 
importFrom() rather than import()? Or perhaps where packages do have 
namespaces, only the namespace should be loaded in such a case.

Heather

> From: Prof Brian Ripley <[EMAIL PROTECTED]>
> Date: 12 January 2006 08:21:35 GMT
> To: giovanni parrinello <[EMAIL PROTECTED]>
> Cc: r-help@stat.math.ethz.ch
> Subject: Re: [R] Strange behaviour of load
>
> On Wed, 11 Jan 2006, giovanni parrinello wrote:
>
>> Dear All,
>> simetimes when I load an Rdata I get this message
>>
>> ###
>> Code:
>>
>> load('bladder1.RData')
>> Carico il pacchetto richiesto: rpart ( Bad traslastion: Load required 
>> package-...)
>> Carico il pacchetto richiesto: MASS
>> Carico il pacchetto richiesto: mlbench
>> Carico il pacchetto richiesto: survival
>> Carico il pacchetto richiesto: splines
>>
>> Carico il pacchetto richiesto: 'survival'
>>
>>
>>The following object(s) are masked from package:Hmisc :
>>
>> untangle.specials
>>
>> Carico il pacchetto richiesto: class
>> Carico il pacchetto richiesto: nnet
>> #
>>
>> So  I have many unrequired packages loaded.
>> Any idea?
>
> They are required!  My guess is that you have object(s) saved with
> environment the namespace of some package, and loading that namespace 
> is
> pulling these in.  The only CRAN package which requires mlbench 
> appears to
> be ipred, and that requires all of those

Re: [R] sweep() and recycling

2005-06-21 Thread Heather Turner
Agreed my examples may be trivial and I'm sure there are more efficient ways to 
do the same thing, but I disagree that my examples are against the spirit of 
sweep (). In the first example a vector is swept out from the rows, with one 
value for odd rows and one value for even rows. In the second example an array 
of values is swept out across the third dimension. In the third example an 
array of values is swept out from the full array.

The first example is a natural use of recycling. E.g. 

sweep(matrix(1:100, 50, 2), 1, c(1, 1000), "+", give.warning = TRUE)

is a quicker way of writing

sweep(matrix(1:100, 50, 2), 1, rep(c(1, 1000), 25), "+", give.warning = TRUE)

but your code would give a warning in the first case, even though the intent 
and the result are exactly the same as in the second case.

As you say, it is only a warning, that can be ignored. However the warning 
should at least reflect the warning condition used, i.e. warn that the length 
of STATS does not equal the extent of MARGIN, rather warning that STATS does 
not recycle exactly.

Heather

>>> Robin Hankin <[EMAIL PROTECTED]> 06/21/05 02:47pm >>>
Hi

On Jun 21, 2005, at 02:33 pm, Heather Turner wrote:

> I think the warning condition in Robin's patch is too harsh - the 
> following examples seem reasonable to me, but all produce warnings
>
> sweep(array(1:24, dim = c(4,3,2)), 1, 1:2, give.warning = TRUE)
> sweep(array(1:24, dim = c(4,3,2)), 1, 1:12, give.warning = TRUE)
> sweep(array(1:24, dim = c(4,3,2)), 1, 1:24, give.warning = TRUE)
>


The examples above do give warnings (as intended) but I think all three 
cases above
are inimical to the spirit of sweep(): nothing is being "swept" out.

So a warning is appropriate, IMO.

In any case, one can always suppress (or ignore!) a warning if one knows
what one is doing.  YMMV, but if I wanted to do the above operations I 
would
replace


sweep(array(0, dim = c(4,3,2)), c(1,3), 1:12, "+" , give.warning = 
FALSE)

with

  aperm(array(1:12,c(4,2,3)),c(1,3,2))


best wishes

rksh







> I have written an alternative (given below) which does not give 
> warnings in the above cases, but does warn in the following case
>
>> sweep(array(1:24, dim = c(4,3,2)), 1:2, 1:3)
> , , 1
>
>  [,1] [,2] [,3]
> [1,]036
> [2,]039
> [3,]069
> [4,]369
>
> , , 2
>
>  [,1] [,2] [,3]
> [1,]   12   15   18
> [2,]   12   15   21
> [3,]   12   18   21
> [4,]   15   18   21
>
> Warning message:
> STATS does not recycle exactly across MARGIN
>
> The code could be easily modified to warn in other cases, e.g. when 
> length of STATS is a divisor of the corresponding array extent (as in 
> the first example above, with length(STATS) = 2).
>
> The code also includes Gabor's suggestion.
>
> Heather
>
> sweep <- function (x, MARGIN, STATS, FUN = "-", warn = 
> getOption("warn"), ...)
> {
> FUN <- match.fun(FUN)
> dims <- dim(x)
> perm <- c(MARGIN, (1:length(dims))[-MARGIN])
> if (warn >= 0) {
> s <- length(STATS)
> cumDim <- c(1, cumprod(dims[perm]))
> if (s > max(cumDim))
> warning("length of STATS greater than length of array",
> call. = FALSE)
> else {
> upper <- min(ifelse(cumDim > s, cumDim, max(cumDim)))
> lower <- max(ifelse(cumDim < s, cumDim, min(cumDim)))
> if (any(upper %% s != 0, s %% lower != 0))
> warning("STATS does not recycle exactly across MARGIN",
> call. = FALSE)
> }
> }
> FUN(x, aperm(array(STATS, dims[perm]), order(perm)), ...)
> }
>
>>>> Gabor Grothendieck <[EMAIL PROTECTED]> 06/21/05 01:25pm >>>
> \
> Perhaps the signature should be:
>
>sweep(...other args go here..., warn=getOption("warn"))
>
> so that the name and value of the argument are consistent with
> the R warn option.
>
> On 6/21/05, Robin Hankin <[EMAIL PROTECTED]> wrote:
>>
>> On Jun 20, 2005, at 04:58 pm, Prof Brian Ripley wrote:
>>
>>> The issue here is that the equivalent command array(1:5, c(6,6)) (to
>>> matrix(1:5,6,6)) gives no warning, and sweep uses array().
>>>
>>> I am not sure either should: fractional recycling was normally 
>>> allowed
>>> in S3 (S4 tightened up a bit).
>>>
>>> Perhaps someone who thinks sweep() should warn could contribute a
>>> tested patch?
>>>
>>
>>
>> OK,  modified R code and Rd file below (is this the best way to do
>> this?)
>>
&g

Re: [R] sweep() and recycling

2005-06-21 Thread Heather Turner
I think the warning condition in Robin's patch is too harsh - the following 
examples seem reasonable to me, but all produce warnings

sweep(array(1:24, dim = c(4,3,2)), 1, 1:2, give.warning = TRUE)
sweep(array(1:24, dim = c(4,3,2)), 1, 1:12, give.warning = TRUE)
sweep(array(1:24, dim = c(4,3,2)), 1, 1:24, give.warning = TRUE)

I have written an alternative (given below) which does not give warnings in the 
above cases, but does warn in the following case

> sweep(array(1:24, dim = c(4,3,2)), 1:2, 1:3)
, , 1

 [,1] [,2] [,3]
[1,]036
[2,]039
[3,]069
[4,]369

, , 2

 [,1] [,2] [,3]
[1,]   12   15   18
[2,]   12   15   21
[3,]   12   18   21
[4,]   15   18   21

Warning message:
STATS does not recycle exactly across MARGIN

The code could be easily modified to warn in other cases, e.g. when length of 
STATS is a divisor of the corresponding array extent (as in the first example 
above, with length(STATS) = 2).

The code also includes Gabor's suggestion.

Heather

sweep <- function (x, MARGIN, STATS, FUN = "-", warn = getOption("warn"), ...) 
{
FUN <- match.fun(FUN)
dims <- dim(x)
perm <- c(MARGIN, (1:length(dims))[-MARGIN])
if (warn >= 0) {
s <- length(STATS)
cumDim <- c(1, cumprod(dims[perm]))
if (s > max(cumDim))
warning("length of STATS greater than length of array",
call. = FALSE)
else {
upper <- min(ifelse(cumDim > s, cumDim, max(cumDim)))
lower <- max(ifelse(cumDim < s, cumDim, min(cumDim)))
if (any(upper %% s != 0, s %% lower != 0)) 
warning("STATS does not recycle exactly across MARGIN",
call. = FALSE)
}
}
FUN(x, aperm(array(STATS, dims[perm]), order(perm)), ...)
}

>>> Gabor Grothendieck <[EMAIL PROTECTED]> 06/21/05 01:25pm >>>
\
Perhaps the signature should be:

   sweep(...other args go here..., warn=getOption("warn"))

so that the name and value of the argument are consistent with
the R warn option.

On 6/21/05, Robin Hankin <[EMAIL PROTECTED]> wrote:
> 
> On Jun 20, 2005, at 04:58 pm, Prof Brian Ripley wrote:
> 
> > The issue here is that the equivalent command array(1:5, c(6,6)) (to
> > matrix(1:5,6,6)) gives no warning, and sweep uses array().
> >
> > I am not sure either should: fractional recycling was normally allowed
> > in S3 (S4 tightened up a bit).
> >
> > Perhaps someone who thinks sweep() should warn could contribute a
> > tested patch?
> >
> 
> 
> OK,  modified R code and Rd file below (is this the best way to do
> this?)
> 
> 
> 
> 
> "sweep" <-
>   function (x, MARGIN, STATS, FUN = "-", give.warning = FALSE, ...)
> {
>   FUN <- match.fun(FUN)
>   dims <- dim(x)
>   if(give.warning & length(STATS)>1 & any(dims[MARGIN] !=
> dim(as.array(STATS{
> warning("array extents do not recycle exactly")
>   }
>   perm <- c(MARGIN, (1:length(dims))[-MARGIN])
>   FUN(x, aperm(array(STATS, dims[perm]), order(perm)), ...)
> }
> 
> 
> 
> 
> 
> 
> 
> \name{sweep}
> \alias{sweep}
> \title{Sweep out Array Summaries}
> \description{
>   Return an array obtained from an input array by sweeping out a summary
>   statistic.
> }
> \usage{
> sweep(x, MARGIN, STATS, FUN="-", give.warning = FALSE, \dots)
> }
> \arguments{
>   \item{x}{an array.}
>   \item{MARGIN}{a vector of indices giving the extents of \code{x}
> which correspond to \code{STATS}.}
>   \item{STATS}{the summary statistic which is to be swept out.}
>   \item{FUN}{the function to be used to carry out the sweep.  In the
> case of binary operators such as \code{"/"} etc., the function name
> must be quoted.}
>   \item{give.warning}{Boolean, with default \code{FALSE} meaning to
>   give no warning, even if array extents do not match.  If
>   \code{TRUE}, check for the correct dimensions and if a
>   mismatch is detected, give a suitable warning.}
>   \item{\dots}{optional arguments to \code{FUN}.}
> }
> \value{
>   An array with the same shape as \code{x}, but with the summary
>   statistics swept out.
> }
> \note{
>   If \code{STATS} is of length 1, recycling is carried out with no
>   warning irrespective of the value of \code{give.warning}.
> }
> 
> \references{
>   Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988)
>   \emph{The New S Language}.
>   Wadsworth \& Brooks/Cole.
> }
> \seealso{
>   \code{\link{apply}} on which \code{sweep} used to be based;
>   \code{\link{scale}} for centering and scaling.
> }
> \examples{
> require(stats) # for median
> med.att <- apply(attitude, 2, median)
> sweep(data.matrix(attitude), 2, med.att)# subtract the column medians
> 
> a <- array(0, c(2, 3, 4))
> b <- matrix(1:8, c(2, 4))
> sweep(a, c(1, 3), b, "+", give.warning = TRUE)  # no warning:
> all(dim(a)[c(1,3)] == dim(b))
> sweep(a, c(1, 2), b, "+", give.warning = TRUE)  # warning given
> 
> }
> \keyword{array}
> \keyword{iteration}
> 
> 
> 
> 
> --
> Robin Hankin
> Uncertainty Analyst
> National Oceanography

Re: [R] sweep() and recycling

2005-06-20 Thread Heather Turner


Dr H Turner
Research Assistant
Dept. of Statistics
The University of Warwick
Coventry
CV4 7AL

Tel: 024 76575870

>>> Robin Hankin <[EMAIL PROTECTED]> 06/20/05 10:32am >>>
Hi


I had a hard-to-find bug in some of my code the other day, which I 
eventually
traced to my misusing of sweep().

I would expect sweep() to give
me a warning if the elements don't recycle nicely, but

  X <- matrix(1:36,6,6)
  sweep(X,1,1:5,"+")
  [,1] [,2] [,3] [,4] [,5] [,6]
[1,]29   16   23   30   32
[2,]4   11   18   25   27   34
[3,]6   13   20   22   29   36
[4,]8   15   17   24   31   38
[5,]   10   12   19   26   33   40
[6,]7   14   21   28   35   37

gives no warning, even though (5,36)=1.

Also,


sweep(X,1,c(1,1000),"+")

and

sweep(X,2,c(1,1000),"+")


behave as expected,   But


sweep(X,1,1:5,"+")

and

  sweep(X,2,1:5,"+")



are identical!

...which is also expected since the matrix is square. Perhaps the
following will help you see why:
> matrix(1:5,6,6)
 [,1] [,2] [,3] [,4] [,5] [,6]
[1,]123451
[2,]234512
[3,]345123
[4,]451234
[5,]512345
[6,]123451
Warning message:
data length [5] is not a sub-multiple or multiple of the number of rows
[6] in matrix 
> matrix(1:5,6,6, byrow = TRUE)
 [,1] [,2] [,3] [,4] [,5] [,6]
[1,]123451
[2,]234512
[3,]345123
[4,]451234
[5,]512345
[6,]123451
Warning message:
data length [5] is not a sub-multiple or multiple of the number of rows
[6] in matrix 
i.e. recycling a vector of length 5 down the columns gives the same
result as recycling the same vector along the rows.

I agree with your main point however, it would be useful if 'sweep'
would give a warning when the length of the vector of statistics to be
swept out was not a sub-multiple or multiple of the corresponding array
dimension.

Heather

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] using eval() with pre-built expression inside function

2005-01-25 Thread Heather Turner
Sorry, I forgot to add to my original post that I want to use the "envir" 
argument of eval() so that I can specify a secondary data source as follows:

eval(expression(model.frame(formula, data = lookHereFirst), envir = 
lookHereSecond, enclos = lookHereThird)

Your version of f1 does produce the desired model frame, but does this by 
passing all the data to model.frame() directly, i.e. it evaluates

model.frame(formula = y~x, data = dataset)

However, if I adapt your f1 as follows:

 f1 <- function(formula, data) {
 mf <- match.call(expand.dots = FALSE)
 mf[[1]] <- as.name("model.frame")
 print(mf)
 mf
 }
 
it still works if some of the data is provided through the "data" argument of 
model.frame() and the rest of the data is provided through the "envir" argument 
of eval():
> test1 <- f1(formula = y ~ x, data = dataset[,1, drop = FALSE])
model.frame(formula = y ~ x, data = dataset[, 1, drop = FALSE])
> eval(test1, dataset[,2,drop = FALSE], parent.frame())
  y x
1  66.71259 1
2  83.21937 1
3  85.24412 1
4  88.10189 1
5  75.71918 1


So you have provided a solution, but I still don't understand why I can't 
construct the call as I was trying to before:

f2 <- function(formula, data) {
mf <- as.call(c(as.name("model.frame"), formula = formula, 
data = substitute(data)))
print(mf)
mf
}

This doesn't work:

> test2 <- f2(formula = y ~ x, data = dataset[,1, drop = FALSE])
model.frame(formula = y ~ x, data = dataset[, 1, drop = FALSE])
> eval(test2, dataset[,2,drop = FALSE], parent.frame())
Error in eval(expr, envir, enclos) : Object "x" not found

Apparently test1 and test2 are not identical, but I can't see why, which may be 
the root of my problem here. Can anyone explain why they are different?

Thanks

Heather

>>> "Roger D. Peng" <[EMAIL PROTECTED]> 01/24/05 04:48pm >>>
If you look at the beginning of lm(), you'll see that match.call() is 
used and name of the function (in this case "f1") is replaced with 
"model.frame".  Does something like this work?

f1 <- function(formula, data) {
mf <- match.call(expand.dots = FALSE)
mf[[1]] <- as.name("model.frame")
eval(mf, parent.frame())
}

-roger

Heather Turner wrote:
> I'm trying to evaluate a pre-built expression using eval(), e.g.
> 
> dataset <- data.frame(y = runif(30, 50,100), x = gl(5, 6))
> 
> # one like this
> mf <- expression(model.frame(y~x))
> eval(mf, dataset, parent.frame())
> 
> # rather than this
> eval(expression(model.frame(y~x)), dataset, parent.frame())
> 
> In the example above there is no problem, the problem comes when I try to do 
> a similar thing within a function, e.g.
> 
> f1 <- function(formula, data) {
> mt <- terms(formula)
> mf <- as.expression(as.call(c(as.name("model.frame"), formula = mt)))
> eval(mf, data, parent.frame())
> }
> 
> 
>>f1(formula = y ~ x, data = dataset)
> 
> Error in eval(expr, envir, enclos) : Object "y" not found
> 
> I can get round this by building a call to eval using paste, e.g.
> 
> f2 <- function(formula, data) {
> mt <- terms(formula)
> mf <- as.expression(as.call(c(as.name("model.frame"), formula = mt)))
> direct <- parse(text = paste("eval(expression(", mf,
>"), data, parent.frame())"))
> print(direct)
> eval(direct)
> }
> 
> 
>>f2(formula = y ~ x, data = dataset)
> 
> expression(eval(expression(model.frame(formula = y ~ x)), data, 
> parent.frame()))
>   y x
> 1  92.23087 1
> 2  63.43658 1
> 3  55.24448 1
> 4  72.75650 1
> 5  67.58781 1
> ...
> 
> but this seems rather convoluted. Can anyone explain why f1 doesn't work 
> (when f2 does) and/or suggest a neater way of dealing with this?
> 
> Thanks
> 
> Heather
> 
> Mrs H Turner
> Research Assistant
> Dept. of Statistics
> University of Warwick
> 
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help 
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html 
> 

-- 
Roger D. Peng
http://www.biostat.jhsph.edu/~rpeng/

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] using eval() with pre-built expression inside function

2005-01-24 Thread Heather Turner
I'm trying to evaluate a pre-built expression using eval(), e.g.

dataset <- data.frame(y = runif(30, 50,100), x = gl(5, 6))

# one like this
mf <- expression(model.frame(y~x))
eval(mf, dataset, parent.frame())

# rather than this
eval(expression(model.frame(y~x)), dataset, parent.frame())

In the example above there is no problem, the problem comes when I try to do a 
similar thing within a function, e.g.

f1 <- function(formula, data) {
mt <- terms(formula)
mf <- as.expression(as.call(c(as.name("model.frame"), formula = mt)))
eval(mf, data, parent.frame())
}

> f1(formula = y ~ x, data = dataset)
Error in eval(expr, envir, enclos) : Object "y" not found

I can get round this by building a call to eval using paste, e.g.

f2 <- function(formula, data) {
mt <- terms(formula)
mf <- as.expression(as.call(c(as.name("model.frame"), formula = mt)))
direct <- parse(text = paste("eval(expression(", mf,
   "), data, parent.frame())"))
print(direct)
eval(direct)
}

> f2(formula = y ~ x, data = dataset)
expression(eval(expression(model.frame(formula = y ~ x)), data, 
parent.frame()))
  y x
1  92.23087 1
2  63.43658 1
3  55.24448 1
4  72.75650 1
5  67.58781 1
...

but this seems rather convoluted. Can anyone explain why f1 doesn't work (when 
f2 does) and/or suggest a neater way of dealing with this?

Thanks

Heather

Mrs H Turner
Research Assistant
Dept. of Statistics
University of Warwick

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


RE: [R] Advice on parsing formulae

2004-12-16 Thread Heather Turner
Okay, well you could define a tvar function as follows
tvar <- function(term) term

Then stick to fm1
X <- model.matrix(fm1,keep.order=TRUE)
pAssign <- attr(X, "assign")

tvar.terms <- terms( fm1, specials = "tvar",keep.order=TRUE )
idx <- attr(tvar.terms,"specials")$tvar
if (attr(tvar.terms,"response")) idx <- idx -1
all.labels <- attr(tvar.terms, "term.labels")
tvar.labels <- all.labels[idx]
tvarAssign <- match(pAssign, match(tvar.labels, all.labels))
tvarAssign[is.na(tvarAssign)] <- 0

I think that the specials attributes is an index of the variables attribute, 
hence I have replaced
if (attr(tvar.terms,"intercept")) idx <- idx -1
with
if (attr(tvar.terms,"response")) idx <- idx -1
check e.g.
fm1 <-  Y ~  tvar(x:A) + tvar(z) + u + tvar(B) + tvar(poly(v,3))
fm1 <-  ~ 1 + tvar(x:A) + tvar(z) + u + tvar(B) + tvar(poly(v,3))

Heather

From:   "Claus Dethlefsen" <[EMAIL PROTECTED]>
To: "'Heather Turner'" <[EMAIL PROTECTED]>, <[EMAIL PROTECTED]>
Date:   12/16/04 8:33am
Subject:RE: [R] Advice on parsing formulae

Thank you for the advice. I have now boiled my problem down to the
following:

How do I create fm2 from fm1 ?

fm1 <-  Y ~ 1 + tvar(x:A) + tvar(z) + u + tvar(B) + tvar(poly(v,3))
fm2 <-  Y ~ 1 + x:A + z + u + B + poly(v, 3)

Thus, how do I simply remove tvar( * ) from a formula? Do I have to write a
function to parse the string and then re-create the formula? Is there an
easy way of doing this?


When my above problem is solved, I can (with the help from Heather Turner
and Chuck Berry) do the following

## define som data
x <- z <- u <- v <- rnorm(5)
A <- B <- factor( rep( c(1,2), c(3,2) ) )

## define my formula fm1 and manually create fm2.
fm1 <-  Y ~ 1 + tvar(x:A) + tvar(z) + u + tvar(B) + tvar(poly(v,3))
fm2 <-  Y ~ 1 + x:A + z + u + B + poly(v, 3)

## extract the term.labels from fm2, make the design matrix and extract
'assign'
term.labels <- unname(sapply(attr(terms(fm2), "term.labels"),
 removeSpace))
X <- model.matrix(fm2,keep.order=TRUE)
pAssign <- attr(X, "assign")

## Now, extract the tvar-terms from fm1
tvar.terms <- terms( fm1, specials = "tvar",keep.order=TRUE )
idx <- attr(tvar.terms,"specials")$tvar
if (attr(tvar.terms,"intercept")) idx <- idx -1
tvar <- attr(terms(fm2,keep.order=TRUE),"term.labels")[idx]
tvar <- unname( sapply( tvar, removeSpace) )

## Finally, combine the information to get the vector I asked for
tvarAssign <- match(pAssign, sort(match(tvar, term.labels)))
tvarAssign[is.na(tvarAssign)] <- 0

Mrs H Turner
Research Assistant
Dept. of Statistics
University of Warwick

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Advice on parsing formulae

2004-12-15 Thread Heather Turner
I think this will do what you want:

# Need this function to remove spaces from term labels later on
> removeSpace <-  function(string) gsub("[[:space:]]", "", string)

# Specify which terms are in a "tvar" group
# (could remove spaces separately)
> tvar <- unname(sapply(c("x:A", "z", "B", "poly(v,3)"), removeSpace))

# Use terms to get term labels from formula
> formula <- Y ~ 1 + x:A + z + u + B + poly(v,3)
> term.labels <- unname(sapply(attr(terms(formula), "term.labels"), 
> removeSpace))
> tvar
[1] "x:A"   "z" "B" "poly(v,3)"
> term.labels
[1] "z" "u" "B" "poly(v,3)" "x:A"

# Get assign variable for parameters
# (You would use first two lines, but I don't have data so defined assign 
variable myself)
> #X <- model.matrix(formula)
> #pAssign <- attr(X, "assign")
> pAssign <- c(0,1,2,3,4,4,4,5,5)  

# Define "tvarAssign"
> tvarAssign <- match(pAssign, sort(match(tvar, term.labels)))
> tvarAssign[is.na(tvarAssign)] <- 0
> tvarAssign
[1] 0 1 0 2 3 3 3 4 4

HTH

Heather

Mrs H Turner
Research Assistant
Dept. of Statistics
University of Warwick

>>> "Claus Dethlefsen" <[EMAIL PROTECTED]> 12/13/04 04:10pm >>>
Dear list

I would like to be able to group terms in a formula using a function that I
will call tvar(), eg. the formula

Y ~ 1 + tvar(x:A) + tvar(z) + u + tvar(B) + tvar(poly(v,3))

where x,u and v are numeric and A and B are factors - binary, say.

As output, I want the model.matrix as if tvar had not been there at all. In
addition, I would like to have information on the grouping, as a vector as
long as ncol( model.matrix ) with zeros corresponding to terms outside tvar
and with an index grouping the terms inside each tvar(). In the (sick)
example:


> model.matrix(Y ~ 1 + tvar(x:A) + tvar(z) + u + tvar(B) + tvar(poly(v,3)))
   (Intercept) z u B2 poly(v, 3)1 poly(v, 3)2 poly(v, 3)3  x:A1
x:A2
11 -1.55 -1.03  0   0.160  -0.350  -0.281  0.66
0.00
21 -1.08  0.55  0  -0.164  -0.211   0.340  0.91
0.00
31  0.29 -0.26  0  -0.236  -0.073   0.311 -1.93
0.00
41 -1.11  0.96  0   0.222  -0.285  -0.385 -0.23
0.00
51  0.43 -0.76  1  -0.434   0.515  -0.532  0.22
0.00

I would like the vector

c(0,1,0,2,3,3,3,4,4)

pointing to the tvar-grouped terms.

Thus what I would like, looks a bit like the 'assign' attribute of the
model.matrix() output. I have not figured out a way of doing this in a nice
way and would like some help, please.

I hope somebody can help me (or point the manual-pages I should read),

Best, 

Claus Dethlefsen

---
Assistant Professor, Claus Dethlefsen, Ph.D.
mailto:[EMAIL PROTECTED], http://www.math.auc.dk/~dethlef 
Dpt. of Mathematical Sciences, Aalborg University
Fr. Bajers Vej 7G, 9220 Aalborg East
Denmark

-- 
No virus found in this outgoing message.
Checked by AVG Anti-Virus.

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html