[R] Importance of levels in a factor variable

Saeed Abu Nimeh Thu, 26 Aug 2010 12:44:55 -0700

I have a dataset of multiple variables and a response. For example,
> str(x)
'data.frame':   3557238 obs. of  44 variables:
 $ response: Factor w/ 2 levels
 $ var2    : Factor w/ 5000 levels



If var2, for example, is a factor with 5000 levels, what is the best
approach to determine which of these levels are the most important to
include when building the model, and which ones can be discarded? Assuming
there is a way to do that, is it valid to include only the important
levels of that variable and discard the rest when building the
model?
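
To make the question concrete, here is a minimal sketch of one possible
way to reduce the levels. It is an illustration only, not a recommended
answer. It assumes that "important" can be approximated by how often a
level occurs, and it lumps the rare levels into a single "Other" level
rather than dropping their rows. The toy data, the cutoff min_n, and the
column name var2_lumped are all hypothetical stand-ins for the real data.

```r
# Toy stand-in for the real data: a binary response and a many-level factor.
# (Assumption: the real data frame has the same two column names.)
set.seed(1)
n <- 2000
x <- data.frame(
  response = factor(rbinom(n, 1, 0.3)),
  var2 = factor(sample(sprintf("lev%03d", 1:200), n, replace = TRUE,
                       prob = (200:1)^2))
)

# Count observations per level and keep the levels seen at least min_n
# times. min_n is an arbitrary cutoff chosen for this illustration.
min_n <- 20
counts <- table(x$var2)
keep <- names(counts)[counts >= min_n]

# Lump every other level into a single "Other" level, so no rows are lost.
v <- as.character(x$var2)
v[!(v %in% keep)] <- "Other"
x$var2_lumped <- factor(v)

# Compare the number of levels before and after lumping.
nlevels(x$var2)
nlevels(x$var2_lumped)

# Fit a logistic regression on the reduced factor.
fit <- glm(response ~ var2_lumped, data = x, family = binomial)
```

Frequency is only one possible notion of importance; a rare level can
still be strongly associated with the response, which is part of what the
question is asking about.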
Thanks,
Saeed

---
> sessionInfo()
R version 2.10.1 (2009-12-14)
x86_64-pc-linux-gnu
32 GB RAM

