Hi there,
Following up on yesterday's question (a new level of a categorical
variable appeared in my validation dataset and predict() complained about
it; I wrote some Python code to work around the problem), I am now just
curious about the details of the mechanism:

I believe rpart follows CART, so for a categorical variable the
splitting criterion should be something like:
is it A or not?
   -- yes: go to the left branch
   -- no: go to the right branch
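
To make the rule above concrete, here is a minimal Python sketch of how I picture such a split. It only illustrates my understanding and is not rpart's actual implementation; the names route and left_levels are made up for this example.

```python
def route(value, left_levels):
    """Send a value left if it is in the set of levels chosen for the
    left branch; send everything else right (hypothetical helper)."""
    return "left" if value in left_levels else "right"

# Split "is it A or not?": only level A goes to the left branch.
left_levels = {"A"}

for level in ["A", "B", "C"]:
    # "C" stands for a level that was never seen in training; under
    # this rule it is simply "not A" and follows the right branch.
    print(level, "->", route(level, left_levels))
```

Under this reading, an unseen level is routed like any other "not A" value, which is why I would not expect an error.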

So, at prediction time, if a new level such as C appears, predict()
should not need to complain about the occurrence of "C": it is not A,
so it would simply go to the right branch (of course, if there are many
"C"s in the validation set, a complaint would make sense).
Maybe, for robustness, predict() has to check first whether there are
any new levels.
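
The kind of check I have in mind can be sketched in a few lines of Python. This is only an assumption about what such a check might look like, not what predict() really does; unseen_levels is a made-up helper name and the data are toy values.

```python
def unseen_levels(train_values, new_values):
    """Return, sorted, the levels in new_values that never occur in
    train_values (hypothetical helper)."""
    return sorted(set(new_values) - set(train_values))

train = ["A", "B", "A", "B"]   # levels seen when the tree was fit
valid = ["A", "C", "B", "C"]   # validation data with a new level "C"

print(unseen_levels(train, valid))  # prints ['C']
```

Such a check could then warn, drop the rows, or recode the new level before predict() is called.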

I am not sure whether my understanding is right; please advise!

Thanks,

-- 
Weiwei Shi, Ph.D

"Did you always know?"
"No, I did not. But I believed..."
---Matrix III
