I am trying to understand "deviance" in the classification tree output
from the tree package.

library(tree)

set.seed(911)
mydf <- data.frame(
    name = as.factor(rep(c("A", "B"), c(10, 10))),
    x = c(rnorm(10, -1), rnorm(10, 1)),
    y = c(rnorm(10, 1), rnorm(10, -1)))

mytree <- tree(name ~ ., data = mydf)

mytree
# node), split, n, deviance, yval, (yprob)
#       * denotes terminal node

# 1) root 20 27.730 A ( 0.5 0.5 )  
#   2) y < -0.00467067 10  6.502 B ( 0.1 0.9 )  
#     4) x < 1.50596 5  5.004 B ( 0.2 0.8 ) *
#     5) x > 1.50596 5  0.000 B ( 0.0 1.0 ) *
#   3) y > -0.00467067 10  6.502 A ( 0.9 0.1 )  
#     6) x < -0.578851 5  0.000 A ( 1.0 0.0 ) *
#     7) x > -0.578851 5  5.004 A ( 0.8 0.2 ) *

# Replicate results for node 2
# Probabilities tie out
with(subset(mydf, y < -0.00467067), table(name))
# name
# A B 
# 1 9

# Cannot replicate deviance = -1 * sum(p_mk * log(p_mk))
-1 * (0.1 * log(0.1) + 0.9 * log(0.9))
# [1] 0.325083
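
For comparison, a count-weighted form, -2 * sum(n_mk * log(p_mk)), does
seem to line up with the printed values. The sketch below is only a check
against the output above, not something taken from the package source;
node_deviance is a hypothetical helper name, and the class counts are read
off the n and yprob columns of the printed tree.

```r
# Hypothetical helper (not part of the tree package): count-weighted
# multinomial deviance of a single node, given its class counts.
node_deviance <- function(counts) {
    counts <- counts[counts > 0]      # treat 0 * log(0) as 0
    p <- counts / sum(counts)         # class proportions in the node
    -2 * sum(counts * log(p))
}

node_deviance(c(A = 10, B = 10))  # root:   about 27.73
node_deviance(c(A = 1,  B = 9))   # node 2: about 6.502
node_deviance(c(A = 1,  B = 4))   # node 4: about 5.004
node_deviance(c(A = 0,  B = 5))   # node 5: 0
```

If this form is the one the package uses, the entropy expression above
differs from it by a factor of 2 * n for the node (2 * 10 * 0.325083 is
about 6.502).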

1.  Is deviance defined anywhere in the documentation?
2.  Is it possible to see the code that calculates deviance?

Thanks,
Naresh
