I suggest that you use the P-values from the full-sample
fit.

Be aware that the proportion correctly classified is an
improper scoring rule, i.e., it can easily be
optimized by a model that is inferior in most other
respects, especially by a model that is very
poorly calibrated.  One can add a significant
predictor to the model and see the proportion
correctly classified decrease.
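
A minimal sketch of why the rule is improper, in Python. The true
event probability of 0.7, the 0.5 cutoff, and the three candidate
forecasts are illustrative assumptions, and the Brier score is used
only as one familiar example of a proper rule. The expected proportion
correctly classified is the same for every forecast on the same side
of the cutoff, so it cannot tell a calibrated forecast from a badly
miscalibrated one; the expected Brier score is lowest at the true
probability.

```python
def expected_accuracy(p, q, cutoff=0.5):
    """Expected proportion correctly classified when the true event
    probability is p and the forecast q is dichotomized at `cutoff`."""
    return p if q > cutoff else 1.0 - p


def expected_brier(p, q):
    """Expected squared error of forecast q when the true event
    probability is p (lower is better)."""
    return p * (1.0 - q) ** 2 + (1.0 - p) * q ** 2


p_true = 0.7                   # assumed true probability (illustrative)
forecasts = [0.51, 0.70, 0.99]  # under-confident, calibrated, over-confident

acc = {q: expected_accuracy(p_true, q) for q in forecasts}
brier = {q: expected_brier(p_true, q) for q in forecasts}

for q in forecasts:
    print(f"forecast={q:.2f}  accuracy={acc[q]:.3f}  brier={brier[q]:.4f}")

best_by_brier = min(brier, key=brier.get)
```

All three forecasts score identically on accuracy, while the Brier
score singles out the forecast equal to the true probability.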

10-fold cross-validation is not as accurate as
the bootstrap for estimating likely forecast accuracy.
One needs to average roughly 20 repetitions of 10-fold
CV to get the same accuracy as a bootstrap with fewer
than 200 resamples.  The bootstrap is easy to program
and, unlike cross-validation, can correctly penalize for
model uncertainty caused by variable selection.
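
As a rough sketch of how such a bootstrap might be programmed, here is
one common form, the optimism-corrected bootstrap, in Python. Whether
this is exactly the variant intended above is an assumption. The data
are synthetic, and the toy "model" (event rates within the levels of
one selected binary predictor) is a hypothetical stand-in for a
logistic regression with variable selection. The key point is that the
whole procedure, selection included, is repeated in every resample.

```python
import random


def brier(probs, ys):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, ys)) / len(ys)


def fit_one(X, y, j):
    """Event rate within x_j = 0 and x_j = 1; an empty cell falls back
    to the overall event rate."""
    overall = sum(y) / len(y)
    rates = []
    for level in (0, 1):
        cell = [yi for xi, yi in zip(X, y) if xi[j] == level]
        rates.append(sum(cell) / len(cell) if cell else overall)
    return j, rates


def predict(model, X):
    j, rates = model
    return [rates[xi[j]] for xi in X]


def fit_with_selection(X, y):
    """Whole modeling procedure: fit every candidate predictor and keep
    the one with the best apparent (training) Brier score."""
    models = [fit_one(X, y, j) for j in range(len(X[0]))]
    return min(models, key=lambda m: brier(predict(m, X), y))


def optimism_corrected_brier(X, y, n_boot=200, seed=1):
    rng = random.Random(seed)
    n = len(y)
    apparent = brier(predict(fit_with_selection(X, y), X), y)
    optimism = 0.0
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]   # resample with replacement
        Xb = [X[i] for i in idx]
        yb = [y[i] for i in idx]
        mb = fit_with_selection(Xb, yb)              # selection repeated here
        # Optimism: error on the original data minus error on the resample.
        optimism += brier(predict(mb, X), y) - brier(predict(mb, Xb), yb)
    optimism /= n_boot
    return apparent, optimism, apparent + optimism


# Synthetic data (assumption): 8 binary candidates, only the first matters.
data_rng = random.Random(0)
n, k = 150, 8
X = [[data_rng.randint(0, 1) for _ in range(k)] for _ in range(n)]
y = [1 if data_rng.random() < 0.35 + 0.30 * xi[0] else 0 for xi in X]

apparent, optimism, corrected = optimism_corrected_brier(X, y)
print(f"apparent={apparent:.4f}  optimism={optimism:.4f}  corrected={corrected:.4f}")
```

The corrected value is the apparent Brier score plus the average
optimism, so a procedure that searches over many candidate predictors
is penalized for that search.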

Frank Harrell
-- 
Frank E Harrell Jr              Prof. of Biostatistics & Statistics
Div. of Biostatistics & Epidem. Dept. of Health Evaluation Sciences
U. Virginia School of Medicine  http://hesweb1.med.virginia.edu/biostat

Roberta Nacif wrote:
> 
> Dear List members,
> 
> I am studying customers' repatronage decisions using logistic regression
> and tobit.
> 
> I analysed a data set of 516 observations (516 customers' purchasing
> history and survey measures) using 10-fold cross-validation and
> logistic regression. To validate the logistic regression results, the
> logistic regression classifiers were trained on 9 folds and tested on
> 1 fold.  This was performed 10 times, each time using different
> training and testing folds. Hence, I obtained 10 p-values and
> coefficients for each input.
> 
> With this procedure I averaged the several classification matrices to
> obtain a single measure (as recommended in Hair et al., p. 275).
> However, my problem is that I want to aggregate these 10 p-values and
> coefficients into one measure of input significance. Unlike the
> classification matrix, I have the feeling that I cannot average these
> p-values... but I am not sure.
> 
> Do you have any suggestions how I can do this? For marketing purposes
> the classification accuracy is important, but the p-values (and
> log-odds) are as important because they will allow me to test
> different hypotheses.
> 
> I thank you for any literature reference or idea on how I can solve
> this problem.
> 

