> On 20 Sep 2016, at 11:34, Michael Haenlein <haenl...@escpeurope.eu> wrote:
> 
> Dear all,
> 
> I am trying to estimate a lm model with one continuous dependent variable
> and 11 independent variables that are all categorical, some of which have
> many categories (several dozens in some cases).

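Assuming the categorical variables are in factor form, lm() expands each factor into dummy (indicator) variables using treatment contrasts by default. The first level is the baseline, and every other level gets its own coefficient, measured relative to that baseline. With several dozen levels in some variables, that adds up to a lot of coefficients.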
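
A minimal sketch of that encoding, using hypothetical toy data (one factor `grp` with three levels): model.matrix() shows the dummy columns that lm() actually fits, and each coefficient is a level mean minus the baseline mean.

```r
# Hypothetical toy data: one continuous response, one factor with 3 levels
df <- data.frame(
  y   = c(1.2, 2.3, 3.1, 4.8, 5.0, 6.2),
  grp = factor(c("a", "a", "b", "b", "c", "c"))
)

# Design matrix: intercept plus one dummy per non-baseline level ("a" is baseline)
X <- model.matrix(y ~ grp, data = df)
print(colnames(X))  # "(Intercept)" "grpb" "grpc"

# Intercept = mean of level "a"; grpb / grpc = level mean minus mean of "a"
fit <- lm(y ~ grp, data = df)
print(coef(fit))
```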

> 
> I am not interested in statistical inference to a larger population. The
> objective of my model is to find a way to best predict my continuous
> variable within the sample.

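The best pick would be a CART (Classification and Regression Tree, package rpart) or a CIT (Conditional Inference Tree, ctree()) model to predict a continuous response variable from categorical predictors. When a tree splits on a factor, it partitions the factor's levels into groups, so it effectively combines categories for you. For CIT, please see the new partykit package (successor of the old party package).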
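
A minimal sketch with rpart (shipped with R as a recommended package), on simulated data that is purely hypothetical: the response depends only on whether `f1` falls in levels a-d or e-h, and the fitted tree recovers that grouping as its first split instead of estimating one coefficient per level.

```r
library(rpart)

set.seed(1)
n <- 200
# Hypothetical data: two factors; only f1 matters, via a 2-group structure
df <- data.frame(
  f1 = factor(sample(letters[1:8], n, replace = TRUE)),
  f2 = factor(sample(LETTERS[1:5], n, replace = TRUE))
)
df$y <- ifelse(df$f1 %in% c("a", "b", "c", "d"), 10, 20) + rnorm(n)

# Regression tree ("anova" method for a continuous response)
fit <- rpart(y ~ f1 + f2, data = df, method = "anova")
print(fit)  # root split should send levels a-d one way and e-h the other

# In-sample predictions, which is what the original question asks for
pred <- predict(fit, newdata = df)
```

The printed tree lists which levels of each factor go to each branch, which is one way to read off a data-driven grouping of categories.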

> 
> When I run the lm model I evidently get many regression coefficients that
> are not significant. Is there some way to automatically combine levels of a
> categorical variable together if the regression coefficients for the
> individual levels are not significant?


> 
> My idea is to find some form of grouping of the different categories that
> allows me to work with less levels while keeping or even improving the
> quality of predictions.

I also want to mention cforest here: with it you can measure the importance of your predictor variables. I would recommend the partykit package for categorical predictors, but you can also give rpart a try.
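
To illustrate the idea behind permutation-based variable importance, here is a simplified sketch. It is not what cforest does internally (cforest averages over a forest of conditional inference trees and uses out-of-bag data); it uses a single rpart tree, in-sample error, and hypothetical simulated data where `f1` matters most, `f2` a little, and `f3` not at all.

```r
library(rpart)

set.seed(42)
n <- 300
# Hypothetical data: f1 has a strong effect, f2 a weaker one, f3 is noise
df <- data.frame(
  f1 = factor(sample(letters[1:6], n, replace = TRUE)),
  f2 = factor(sample(LETTERS[1:4], n, replace = TRUE)),
  f3 = factor(sample(c("x", "y"), n, replace = TRUE))
)
df$y <- ifelse(df$f1 %in% c("a", "b", "c"), 0, 5) +
        ifelse(df$f2 == "A", 2, 0) + rnorm(n)

fit <- rpart(y ~ f1 + f2 + f3, data = df)

# Mean squared prediction error of a model on a data set
mse <- function(model, data) mean((data$y - predict(model, newdata = data))^2)
base_mse <- mse(fit, df)

# Shuffle one predictor at a time; the increase in MSE is its importance
perm_importance <- sapply(c("f1", "f2", "f3"), function(v) {
  shuffled <- df
  shuffled[[v]] <- sample(shuffled[[v]])  # sample() keeps the factor levels
  mse(fit, shuffled) - base_mse
})
print(round(perm_importance, 2))
```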

> 
> Thanks,
> 
> Michael
> 
> ______________________________________________
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
