On 6/15/07, Philipp Benner <[EMAIL PROTECTED]> wrote:
>
> Thanks for your explanation!
>
> > With this in mind, either of the following might do what you want:
> >
> > badFunction <- function(mydata, myformula) {
> >     mydata$myweight <- abs(rnorm(nrow(mydata)))
> >     hyp <-
> >         rpart(myformula,
> >               data=mydata,
> >               weights=myweight,
> >               method="class")
> >     prev <- hyp
> > }
> >
> >
> > badFunction <- function(mydata, myformula) {
> >     myweight <- abs(rnorm(nrow(mydata)))
> >     environment(myformula) <- environment()
> >     hyp <-
> >         rpart(myformula,
> >               data=mydata,
> >               weights=myweight,
> >               method="class")
> >     prev <- hyp
> > }
>
> OK, this is what I have now:
>
> adaboostBad <- function(formula, data) {
>     ## local definition of the weight vector (won't work because
>     ## pima.formula is not defined within this function)
>     w <- abs(rnorm(nrow(data)))
>     rpart(formula, data=data, weights=w)
> }
>
> adaboostGood <- function(formula, data) {
>     ## create weight vector in the data object
>     data$w <- abs(rnorm(nrow(data)))
>     rpart(formula, data=data, weights=w)
> }
>
> adaboostBest <- function(formula, data) {
>     ## associate the current environment (this function's one) with
>     ## the object `formula'
>     environment(formula) <- environment()
>     w <- abs(rnorm(nrow(data)))
>     rpart(formula, data=data, weights=w)
> }
>
> As far as I understand this non-standard evaluation stuff,
> adaboostGood() and adaboostBest() are the only two possibilities to
> call rpart() with weight vectors. Now suppose that I don't know what
> `data' contains and suppose further that it already contains a
> column called `w'. adaboostGood() would overwrite that column with
> new data which is then used as weight vector and as training data
> for rpart(). adaboostBest() would just use the wrong data as weight
> vector as it finds data$w before the real weight vector. So, in both
> cases I have to check for `names(data) == "w"` and stop if TRUE? Or
> is there a better way?

Well, that depends on what you want to happen when there is a column
called 'w' in data. I don't see a situation where it makes sense to
use data$w as weights ('w' is just a name you happened to choose
inside adaboostBest), so I would just go with adaboostGood.

If you are worried about overwriting the original data, that is
probably not happening in the sense you are thinking of. When you say

    data$w <- abs(rnorm(nrow(data)))

inside adaboostGood, that modifies a local copy of the data argument,
not the original (R argument semantics are call by value, not call by
reference). You do lose data$w in the local copy inside your function,
but why would you care if you are not using it anyway?

Of course, if your formula contains a reference to 'w' then you will
get wrong results, so checking that the name is unique is always
safer. In addition, use an obfuscated name like '.__myWeights' instead
of 'w', and the check will almost always be irrelevant.

-Deepayan
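
The advice above (store the weights in the local copy of the data under an obfuscated name, and check that the name is unused) could be sketched roughly as follows. This is only an illustration, not code from the thread: the function name adaboostSafe is hypothetical, and the kyphosis data set that ships with rpart stands in for the poster's Pima data.

```r
library(rpart)  # also provides the kyphosis example data

## Hypothetical helper, sketched under the assumptions stated above.
adaboostSafe <- function(formula, data) {
    ## Stop early if the obfuscated name is already a column of `data`
    ## or is referenced by the formula.
    if (".__myWeights" %in% c(names(data), all.vars(formula)))
        stop("'.__myWeights' is already in use")
    ## Call by value: this changes only the local copy of `data`.
    data$.__myWeights <- abs(rnorm(nrow(data)))
    rpart(formula, data = data, weights = .__myWeights, method = "class")
}

fit <- adaboostSafe(Kyphosis ~ Age + Number + Start, kyphosis)

## The caller's data frame is untouched: no '.__myWeights' column.
names(kyphosis)
```

One caveat with this approach: a formula written with `.` on the right-hand side (such as `Kyphosis ~ .`) would most likely pick up the added weight column as a predictor, so the sketch is best used with formulas that name their variables explicitly.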