You need statistical help, which is generally off topic here. I suggest you post to a statistical site like stats.stackexchange.com instead. Better yet, find a local statistical expert with whom you can consult.

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)

On Tue, Sep 20, 2016 at 1:34 AM, Michael Haenlein <haenl...@escpeurope.eu> wrote:

> Dear all,
>
> I am trying to estimate a lm model with one continuous dependent variable
> and 11 independent variables that are all categorical, some of which have
> many categories (several dozens in some cases).
>
> I am not interested in statistical inference to a larger population. The
> objective of my model is to find a way to best predict my continuous
> variable within the sample.
>
> When I run the lm model I evidently get many regression coefficients that
> are not significant. Is there some way to automatically combine levels of a
> categorical variable together if the regression coefficients for the
> individual levels are not significant?
>
> My idea is to find some form of grouping of the different categories that
> allows me to work with less levels while keeping or even improving the
> quality of predictions.
>
> Thanks,
>
> Michael
>
> ______________________________________________
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
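
For readers who want to see what "combining levels" can look like mechanically, here is a minimal sketch in base R. It is not a method proposed by anyone in this thread, and it pools levels by how rarely they occur, not by whether their coefficients are significant. The data are simulated, and the helper `lump_rare()`, the threshold `min_n`, and the label "Other" are all hypothetical names chosen for illustration.

```r
# Simulated data (an assumption: the thread contains no data).
set.seed(1)
n <- 500
f <- factor(sample(sprintf("lev%02d", 1:40), n, replace = TRUE,
                   prob = c(rep(10, 10), rep(1, 30))))
y <- rnorm(n, mean = as.integer(f) %% 5)
d <- data.frame(y = y, f = f)

# Hypothetical helper: pool every level with fewer than min_n rows
# into a single level called `other`.
lump_rare <- function(x, min_n = 10, other = "Other") {
  x <- droplevels(as.factor(x))
  counts <- table(x)
  rare <- names(counts)[counts < min_n]
  # Assigning the same name to several levels merges them.
  levels(x)[levels(x) %in% rare] <- other
  x
}

d$f_lumped <- lump_rare(d$f, min_n = 10)

# Fit the same model with the original and the pooled factor.
fit_full   <- lm(y ~ f, data = d)
fit_lumped <- lm(y ~ f_lumped, data = d)

# Compare the number of levels and the adjusted R-squared.
c(full = nlevels(d$f), lumped = nlevels(d$f_lumped))
c(full   = summary(fit_full)$adj.r.squared,
  lumped = summary(fit_lumped)$adj.r.squared)
```

Note that merging levels only adds constraints to the model, so the in-sample residual sum of squares of the pooled fit can never be smaller than that of the full fit. Any gain from pooling would have to show up in a penalized or out-of-sample measure, which is one reason the question is statistical in nature.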