> On 20 Sep 2016, at 11:34, Michael Haenlein <haenl...@escpeurope.eu> wrote:
>
> Dear all,
>
> I am trying to estimate a lm model with one continuous dependent variable
> and 11 independent variables that are all categorical, some of which have
> many categories (several dozens in some cases).

If I am not mistaken (and assuming the categorical variables are stored as factors), lm will expand each factor into dummy variables, one per level except the reference level, and estimate a separate coefficient for each. (Somebody please correct me if this is wrong.)

> I am not interested in statistical inference to a larger population. The
> objective of my model is to find a way to best predict my continuous
> variable within the sample.

The best pick would be a CART (Classification and Regression Tree, rpart) or CIT (Conditional Inference Tree, ctree) model to predict a continuous response variable from categorical predictors. For CIT, please see the newer partykit package (or the older party package).

> When I run the lm model I evidently get many regression coefficients that
> are not significant. Is there some way to automatically combine levels of a
> categorical variable together if the regression coefficients for the
> individual levels are not significant?
>
> My idea is to find some form of grouping of the different categories that
> allows me to work with less levels while keeping or even improving the
> quality of predictions.

I also want to mention cforest here, which lets you measure the importance of your predictor variables. I would recommend the partykit package for categorical predictors, but you can also give rpart a try.

> Thanks,
>
> Michael
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
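To make the tree suggestion a bit more concrete, here is a minimal sketch with rpart. It uses simulated data, and the names d, y, f1 and f2 are made up for illustration; they are not from the original question. The point is only that each split of a regression tree partitions the levels of a factor into two groups, so the terminal nodes can be read as merged groups of levels.

```r
# Sketch only: simulated data; d, y, f1 and f2 are hypothetical names.
library(rpart)

set.seed(1)
n <- 500
d <- data.frame(
  f1 = factor(sample(letters[1:20], n, replace = TRUE)),
  f2 = factor(sample(LETTERS[1:10], n, replace = TRUE))
)
# By construction, y depends only on whether f1 is among its first ten levels
d$y <- ifelse(d$f1 %in% letters[1:10], 5, 0) + rnorm(n)

# Regression tree (method = "anova") on the categorical predictors
fit <- rpart(y ~ f1 + f2, data = d, method = "anova")

# fit$where holds the terminal node each observation falls into;
# treating it as a factor gives one merged group of levels per leaf
d$group <- factor(fit$where)

# In-sample predictions: the mean of y within each terminal node
pred <- predict(fit, newdata = d)

print(fit)  # shows which factor levels go left or right at each split
```

Printing the fitted tree lists the levels sent to each side of a split, which is one way to see a data-driven grouping of categories. A ctree fit from partykit could be inspected in a similar way, although I have not shown that here.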