Hello list,

I'm trying to generate classifiers for a certain task using several
methods, one of them being decision trees. My doubts arise when I try to
estimate the cross-validation error of the generated tree:

library(rpart)

tree <- rpart(y ~ ., data = data.frame(xsel, y), cp = 0.00001)
ptree <- prune(tree,
               cp = tree$cptable[which.min(tree$cptable[, "xerror"]), "CP"])
ptree$cptable


           CP nsplit rel error xerror       xstd
1  0.33120000      0    1.0000 1.0000 0.02856022
2  0.08640000      1    0.6688 0.6704 0.02683544
3  0.02986667      2    0.5824 0.5856 0.02584564
4  0.02880000      5    0.4928 0.5760 0.02571738
5  0.01920000      6    0.4640 0.5168 0.02484761
6  0.01440000      8    0.4256 0.5056 0.02466708
7  0.00960000     12    0.3552 0.5024 0.02461452
8  0.00880000     15    0.3264 0.4944 0.02448120
9  0.00800000     17    0.3088 0.4768 0.02417800
10 0.00480000     25    0.2448 0.4672 0.02400673


If I got it right, "xerror" is the cross-validation error (10-fold by
default), and it is pretty high (0.4672 on a scale where 1 is the maximum).
However, if I do something similar using tune() from e1071, I get a much
lower error:


library(e1071)

treetune <- tune(rpart, y ~ ., data = data.frame(xsel, y),
                 predict.func = treeClassPrediction, cp = 0.0048)

> treetune$best.performance
[1] 0.2243049
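(For what it's worth, the gap between the two numbers may just be a matter of scale. A hedged arithmetic sketch, assuming rpart's xerror is relative to the root-node error while tune() reports an absolute misclassification rate; the 0.48 baseline error rate below is hypothetical, not taken from the output above:)

```r
# rpart's cptable reports errors relative to the root node (root = 1.0);
# tune() from e1071 reports an absolute misclassification rate.
# With a hypothetical root-node error rate of 0.48:
xerror_rel <- 0.4672          # relative CV error from ptree$cptable
root_error <- 0.48            # hypothetical baseline (root-node) error rate
xerror_rel * root_error       # absolute CV error, roughly 0.224
```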


I'm also assuming that the performance returned by "tune" is the
cross-validation error (also 10-fold by default). So where does this
enormous difference come from? What am I missing?

Also, is "rel error" the relative error on the training set? The
documentation is not very descriptive:

    cptable: the table of optimal prunings based on a complexity parameter.
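(To check how "rel error" and "xerror" are scaled, here is a minimal self-contained sketch using rpart's bundled kyphosis data; the dataset and column names come from the rpart package, not from my own task:)

```r
library(rpart)

# Fit a tree on rpart's built-in kyphosis data
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)

# Root-node (baseline) error: the misclassification rate of always
# predicting the majority class
root_error <- min(table(kyphosis$Kyphosis)) / nrow(kyphosis)

# cptable errors are scaled so the root node has error 1, hence:
#   absolute training error = "rel error" * root_error
#   absolute CV error       = "xerror"    * root_error
fit$cptable[, "rel error"] * root_error   # absolute resubstitution error
fit$cptable[, "xerror"] * root_error      # absolute cross-validation error
```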

Thanks and happy pre-new year,
-- israel


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
