Re: [R] Logistic regression X^2 test with large sample size (fwd)

David Winsemius Tue, 31 Jul 2012 12:24:37 -0700


On Jul 31, 2012, at 10:25 AM, M Pomati wrote:

Marc, thank you very much for your help.
I've posted it on
<http://math.stackexchange.com/questions/177252/x2-tests-to-compare-the-fit-of-large-samples-logistic-models>
and added details.

I think you might have gotten a more statistically knowledgeable audience at:


http://stats.stackexchange.com/

(And I suggested to the moderators at math-SE that it be migrated.)

--
David.

Many thanks

Marco
--On 31 July 2012 11:50 -0500 Marc Schwartz <marc_schwa...@me.com> wrote:
On Jul 31, 2012, at 10:35 AM, M Pomati <marco.pom...@bristol.ac.uk> wrote:
Does anyone know of any X^2 tests to compare the fit of logistic models
which factor out the sample size? I'm dealing with a very large sample and
I fear the significant X^2 test I get when adding a variable to the model
is simply a result of the sample size (>200,000 cases).
I'd rather use the whole dataset instead of taking (small) random samples
as it is highly skewed. I've seen things like Phi and Cramer's V for
crosstabs but I'm not sure whether they have been used before on logistic
regression, if there are better ones and if there are any packages.


Many thanks

Marco
Sounds like you are bordering on some type of stepwise approach to
including or not including covariates in the model. You can search the list
archives for a myriad of discussions as to why that is a poor approach.
You have the luxury of a large sample. You also have the challenge of
interpreting covariates that appear to be statistically significant, but
may have a rather small *effect size* in context. That is where subject
matter experts need to provide input as to interpretation of the contextual
significance of the variable, as opposed to the statistical significance of
that same variable.
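
To illustrate the gap between statistical and contextual significance, here
is a minimal sketch in Python (standard library only). It is not the
likelihood ratio test from a logistic model. It uses the simpler 2x2
crosstab case, where phi = sqrt(X^2 / n) and therefore X^2 = n * phi^2. The
value phi = 0.01 and the sample sizes are assumptions chosen for
illustration, and the function names are hypothetical.

```python
import math


def chi2_sf_1df(x):
    """Upper-tail p-value of a chi-square statistic with 1 df.

    For 1 df, P(X^2 > x) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2.0))


def x2_from_phi(phi, n):
    """Pearson X^2 of a 2x2 table implied by phi = sqrt(X^2 / n)."""
    return n * phi ** 2


phi = 0.01  # assumed: a very weak association, held fixed
for n in (1_000, 20_000, 200_000):  # assumed sample sizes
    x2 = x2_from_phi(phi, n)
    print(f"n={n:>7}  phi={phi}  X^2={x2:6.2f}  p={chi2_sf_1df(x2):.2g}")
```

The effect size stays the same on every row. Only n changes, and with it
the X^2 statistic and the p-value.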
A general approach is to simply pre-specify your model based upon rather
simple considerations. Also, you need to determine if your goal for the
model is prediction or explanation.
What is the incidence of your 'event' in the sample? If it is, say, 10%,
then you should have around 20,000 events. The rule of thumb for logistic
regression is to have around 20 events per covariate degree of freedom (df)
to minimize the risk of over-fitting the model to your dataset. A
continuous covariate is 1 df, a k-level factor is k-1 df. So with 20,000
events, your model could feasibly have 1,000 covariate df. I am guessing
that you don't have that much independent data to begin with.
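
The arithmetic behind that rule of thumb can be sketched as follows, in
Python with the standard library only. The 10% incidence is the
hypothetical figure used above, not a known property of the dataset. The
function names and the example covariates are also assumptions made for
illustration.

```python
def covariate_df_budget(n_cases, event_rate, events_per_df=20):
    """Return (events, maximum covariate df) under the events-per-df rule."""
    events = round(n_cases * event_rate)
    return events, events // events_per_df


def factor_df(k_levels):
    """A k-level factor uses k - 1 df; a continuous covariate uses 1 df."""
    return k_levels - 1


events, max_df = covariate_df_budget(n_cases=200_000, event_rate=0.10)
print(f"events: {events}, covariate df budget: {max_df}")

# Hypothetical model: 3 continuous covariates plus a 5-level factor
# and a 12-level factor.
used_df = 3 * 1 + factor_df(5) + factor_df(12)
print(f"df used by the example model: {used_df} of {max_df}")
```

With a budget that large, over-fitting is unlikely to be the binding
constraint. The interpretation of each covariate's effect size is.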
So, pre-specify your model on the full dataset and stick with it. Interact
with subject matter experts on the interpretation of the model.
BTW, this question is really about statistical modeling generally, not
really R specific. Such queries are best posed to general statistical
lists/forums such as Stack Exchange. I would also point you to Frank
Harrell's book, Regression Modeling Strategies.
Regards,

Marc Schwartz
----------------------
M Pomati
University of Bristol



David Winsemius, MD
Alameda, CA, USA

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
