On Mon, 10 May 2004 15:22:14 +0000 (UTC), [EMAIL PROTECTED] wrote:

> A simulation paper by Steyerberg a few years ago showed that the split
> half approach is probably too conservative.  You're better off estimating 
> an a priori model and then using the bootstrap for validation.  If you've 
> "always" had success before wiht the split half method, I'd say you've 
> been a very lucky fellow till now :-)
> 
> Mike Babyak 

Straightforward:  the failure of replication means that the
apparent success in half the sample was due to overfitting.
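
For what it's worth, here is a rough sketch of the optimism-corrected
bootstrap that Mike is pointing to.  This is my own toy illustration
in Python (numpy plus scikit-learn), with made-up data and R^2 as the
performance measure -- not code from Steyerberg's paper:

    # Optimism-corrected bootstrap, roughly Efron/Steyerberg style.
    # Illustrative only; the data and the R^2 measure are my choices.
    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(0)
    n, p = 200, 10
    X = rng.normal(size=(n, p))
    y = X[:, 0] + rng.normal(size=n)     # one real predictor, nine noise

    model = LinearRegression().fit(X, y)
    apparent = model.score(X, y)         # apparent R^2 on the same data

    optimism = []
    for _ in range(200):                 # bootstrap replications
        idx = rng.integers(0, n, n)      # resample rows with replacement
        m = LinearRegression().fit(X[idx], y[idx])
        # score on the bootstrap sample minus score on the original data
        optimism.append(m.score(X[idx], y[idx]) - m.score(X, y))

    corrected = apparent - np.mean(optimism)
    print(apparent, corrected)

Each bootstrap model gets scored twice, once on its own bootstrap
sample and once on the original data; the average difference estimates
how much the apparent fit flatters itself, and the whole sample is
used for both fitting and validation.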

Anyone can check my stats-FAQ  for some comments posted
years ago about stepwise selection, which is the popular error.
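
To make that concrete, here is a small split-half demonstration of my
own (a toy, not anything from the FAQ): forward stepwise selection on
pure noise.  The apparent R^2 in the selection half looks respectable;
the replication half gives it back.

    # Forward stepwise selection on pure noise, with a split-half check.
    # Illustrative only; the 5-step stopping rule is arbitrary.
    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(1)
    n, p = 100, 50
    X = rng.normal(size=(n, p))
    y = rng.normal(size=n)               # y is unrelated to every column of X

    train, test = slice(0, n // 2), slice(n // 2, n)

    # at each step, add whichever unused predictor most improves in-sample R^2
    chosen = []
    for _ in range(5):
        best_j, best_r2 = None, -np.inf
        for j in range(p):
            if j in chosen:
                continue
            Xs = X[train][:, chosen + [j]]
            r2 = LinearRegression().fit(Xs, y[train]).score(Xs, y[train])
            if r2 > best_r2:
                best_j, best_r2 = j, r2
        chosen.append(best_j)

    m = LinearRegression().fit(X[train][:, chosen], y[train])
    print("apparent R^2:   ", m.score(X[train][:, chosen], y[train]))
    print("replication R^2:", m.score(X[test][:, chosen], y[test]))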

Here is something that I posted on June 6, 2003, relevant to
stepwise selection.

===== from sci.stat.consult .
I was impressed by an argument in this:
"Linear model selection by cross-validation", Jun Shao, JASA,
vol. 88, no. 422 (June 1993), pp. 486-494.  Available online
through JSTOR if you subscribe.

It seemed to make a *certain* amount of sense -- he argues,
as I understand it, that as N (sample size) gets larger,
the training fraction should approach zero.  [He lays out the
logic, and it satisfies my prejudice that you need a lot more
replication than most folks figure.]

I like that conclusion,  that replication is tough;
I know that I haven't followed all the reasoning.
======
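
As best I can reconstruct the prescription (this is only my sketch of
the idea in Python, not Shao's algorithm or his notation), it amounts
to Monte Carlo cross-validation where the validation piece, not the
construction piece, takes up most of each split:

    # Monte Carlo cross-validation with a deliberately small training
    # fraction, in the spirit of Shao's argument.  My own illustration.
    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(2)
    n = 400
    X = rng.normal(size=(n, 5))
    y = 2 * X[:, 0] + rng.normal(size=n)   # only the first column matters

    candidates = {"x1 only": [0], "x1..x5": [0, 1, 2, 3, 4]}
    n_train = 40                           # small training fraction, 40/400

    errors = {name: [] for name in candidates}
    for _ in range(500):                   # repeated random splits
        perm = rng.permutation(n)
        tr, va = perm[:n_train], perm[n_train:]
        for name, cols in candidates.items():
            m = LinearRegression().fit(X[tr][:, cols], y[tr])
            pred = m.predict(X[va][:, cols])
            errors[name].append(np.mean((y[va] - pred) ** 2))

    for name in candidates:
        print(name, np.mean(errors[name]))

The candidate with the smaller average validation error gets picked;
as I read it, keeping the training piece small is what makes the
extra, useless predictors hurt enough to get noticed.
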
I never did study that paper more, or hear more about it.
It seems to me that the decision-tree approach that came to be
called 'random forests' is perhaps implicitly using
tiny samples and searching for multiple replications.
 - My doubts about tiny samples-plus-replication are
summed up by this observation:  If you have evidence
apparent in multiple, small sections of the sample, then
you will see the same effect tested at, say, p = 0.0001 (or
better) in the full sample.  Isn't that just another way to
achieve a 'Bonferroni correction' for doing multiple tests?
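
Back-of-the-envelope, in case the arithmetic helps (a rough scaling
sketch of my own, not a theorem): if the same standardized effect
shows up at about t = 2 in each of k disjoint sections of equal size,
the pooled t grows roughly like 2*sqrt(k), so the full-sample p drops
very fast.

    # Rough scaling: an effect that is "barely there" (t ~ 2) in each of
    # k disjoint sections implies a pooled t of about 2*sqrt(k).
    import numpy as np
    from scipy import stats

    for k in (1, 2, 4, 8):
        t_pooled = 2.0 * np.sqrt(k)
        p = 2 * stats.norm.sf(t_pooled)    # two-sided, normal approximation
        print(f"k={k:2d}  pooled t ~ {t_pooled:.2f}  p ~ {p:.2g}")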

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html