I can't grasp how the mean prediction at each terminal node can
perfectly match the true mean of the observed variable at that node.
I'm afraid I'm missing something completely obvious here:
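# make a regression tree
# (assumes the party package; airq defined as in the ctree() help example):
library(party)
airq <- subset(airquality, !is.na(Ozone))
rt <- ctree(Ozone ~ ., data = airq)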
# Validate:
Prediction <- unlist(treeresponse(rt))
(Val <- data.frame(Node = rt@where,
Prediction, True = airq$Ozone))
# compare mean prediction per node
# with observed mean values per node:
options(scipen = 999)
cbind(aggregate(True ~ Node, FUN = mean, data = Val),
Pred = aggregate(Prediction ~ Node, FUN = mean, data = Val)[, 2])
# also, plot predictions vs. true values:
plot(Val$Prediction, Val$True)
# identity line (perfect agreement) and fitted regression line:
abline(0, 1)
abline(lm(True ~ Prediction, data = Val), lty = 2)
myseq <- seq(0, 75, 25)
abline(v = myseq, h = myseq)
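
A possible explanation, sketched here in base R: if the tree predicts the mean of the training responses in each terminal node (my assumption about ctree()), and the same data are used for fitting and for validation, then the per-node means agree by construction. The split at Temp <= 82 is an arbitrary stand-in for a real tree, not the split ctree() would select.

```r
# Base-R sketch: a hand-made two-node "tree" standing in for ctree().
# The split at Temp <= 82 is an arbitrary illustration, not ctree's split.
airq <- subset(airquality, !is.na(Ozone))

# Assign each observation to a node.
node <- ifelse(airq$Temp <= 82, 1L, 2L)

# Assumed prediction rule: the mean of the training responses in the node.
prediction <- ave(airq$Ozone, node, FUN = mean)

val <- data.frame(Node = node, Prediction = prediction, True = airq$Ozone)

# Per-node mean of the predictions vs. per-node mean of the observations.
node_means <- aggregate(cbind(Prediction, True) ~ Node, data = val, FUN = mean)
print(node_means)

# Compare the two columns; they should agree up to floating-point error.
all.equal(node_means$Prediction, node_means$True)
```

If this is what happens, a fair validation would need held-out data, i.e. observations that were not used to grow the tree.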
_______________________________________________
R-sig-ecology mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology