Re: [R] Coefficients of Logistic Regression from bootstrap - how to get them?

Michal Figurski Wed, 30 Jul 2008 13:08:11 -0700

Tim,

If I understand correctly, you are saying that one can't improve on estimating a mean by doing bootstrap and summarizing the means of many such steps. As far as I understand (again), you're saying that this way one can only add bias without any improvement...

Well, this contradicts some guides to the bootstrap that I found on the web (I did my homework), for example this one:
http://people.revoledu.com/kardi/tutorial/Bootstrap/Lyra/BootstrapStatistic Mean.htm

It is all confusing, guys... Somebody once said that there are as many opinions on a topic as there are statisticians...

Also, translating your statements into the hammer-and-rock example, you are saying that one cannot use a hammer to break rocks because it was created to drive nails.


With all respect, despite my limited knowledge, I do not agree.

The big point is that the mean, standard error, or confidence intervals of the data itself are *meaningless* for the pharmacokinetic dataset. These data are time series of a highly variable quantity that is known to display a peak (or two, in the case of Pawinski's paper). It is as if you tried to calculate the mean of a chromatogram (an example for chemists, sorry).

Nevertheless, I thank all of you, experts, for your insight and advice. In the end, I learned a lot, though I keep my initial view. Summarizing your criticism of the procedure described in Pawinski's paper:

- Some of you say that this isn't bootstrap at all. On terminology I fully defer to you, because I know too little. Would anyone suggest a name?
- Most of you say that this procedure is not the best one, and that there are better ways. I will definitely do my homework on penalized regression, though none of you has actually discredited this methodology. Therefore, though possibly not optimal, it remains valid.
- The criticism of "predictive performance" is that one also has to take into account other important quantities, like bias, variance, etc. Fortunately, I did that in my work, using RMSE and log residuals from the validation process. I just observed that models with relatively small RMSE and log residuals (compared to other models) usually have good predictive performance, and vice versa.

Predictive performance also has a great advantage over RMSE or variance or anything else suggested here: it is easily understood by non-statisticians. I don't think it is /too simple/ in Einstein's terms; it's just simple.
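Michal mentions judging models by RMSE and log residuals on validation data. The thread does not define "log residual," so the following minimal Python sketch assumes it means log(predicted) - log(observed), that is, the log of the predicted/observed ratio. The function names and the numbers are hypothetical and serve only as an illustration. They are not taken from Pawinski's paper.

```python
import math

def rmse(observed, predicted):
    """Root mean squared error, in the units of the measured quantity."""
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / len(observed))

def log_residuals(observed, predicted):
    """Assumed definition: log(predicted) - log(observed), i.e. the log of the
    predicted/observed ratio; requires strictly positive values."""
    return [math.log(p) - math.log(o) for o, p in zip(observed, predicted)]

# Hypothetical validation-set values, for illustration only
observed = [10.0, 20.0, 40.0]
predicted = [12.0, 18.0, 40.0]
print(rmse(observed, predicted))           # sqrt(8/3), about 1.633
print(log_residuals(observed, predicted))  # about [0.182, -0.105, 0.0]
```

On the log scale, a 20% over-prediction and a 20% under-prediction have roughly the same size whatever the concentration level. RMSE, by contrast, is dominated by the largest values.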


Kind regards,

--
Michal J. Figurski


Tim Hesterberg wrote:

I'll address the question of whether you can use the bootstrap to
improve estimates, and whether you can use the bootstrap to "virtually
increase the size of the sample".

Short answer - no, with some exceptions (bumping / Random Forests).
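A bootstrap sample does not enlarge the data. It is drawn with replacement from the original n values, so it contains repeats and, on average, only about 63% of the distinct original observations. The minimal Python sketch below checks this by simulation against the exact expectation 1 - (1 - 1/n)^n, which tends to 1 - 1/e. Python is used only for illustration (in R, the resampling step is sample(n, replace = TRUE)). The helper name distinct_fraction is hypothetical.

```python
import random

def distinct_fraction(n: int, n_boot: int = 2000, seed: int = 1) -> float:
    """Average fraction of the n original observations that appear at least
    once in a bootstrap sample of size n (drawn with replacement)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_boot):
        # Indices of one bootstrap sample: n draws with replacement from 0..n-1
        idx = [rng.randrange(n) for _ in range(n)]
        total += len(set(idx))
    return total / (n_boot * n)

n = 50
print(distinct_fraction(n))      # simulated fraction of distinct observations
print(1 - (1 - 1 / n) ** n)      # exact expectation, about 0.636 for n = 50
```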

Longer answer:
Suppose you have data (x1, ..., xn) and a statistic ThetaHat,
that you take a number of bootstrap samples (all of size n), and
let ThetaHatBar be the average of the statistic computed on
those samples.

Is ThetaHatBar better than ThetaHat?  Usually not.  Usually it
is worse.  You have not collected any new data; you are just using the
existing data in a different way, one that is usually harmful:
* If the statistic is the sample mean, all this does is to add
  some noise to the estimate
* If the statistic is nonlinear, this gives an estimate that
  has roughly double the bias, without improving the variance.
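A small simulation can make both points concrete. The following Python sketch is only an illustration. It uses standard-normal data with n = 10, and it takes the divide-by-n ("plug-in") variance as the nonlinear statistic so that the answer is known in advance. Under those assumptions E[ThetaHat] = (n-1)/n and E[ThetaHatBar] = ((n-1)/n)^2, so the bias grows from -1/n to -(2n-1)/n^2, roughly double. The helper names are hypothetical.

```python
import random
import statistics

def plug_in_var(xs):
    """A nonlinear statistic: the variance with divisor n, biased by -sigma^2/n."""
    m = statistics.fmean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def bootstrap_average(xs, stat, n_boot, rng):
    """ThetaHatBar: the average of stat over n_boot bootstrap samples of size n."""
    return statistics.fmean(stat(rng.choices(xs, k=len(xs))) for _ in range(n_boot))

def mc_bias(stat, true_value, n=10, n_data=400, n_boot=100, seed=2008):
    """Monte Carlo bias of ThetaHat and of ThetaHatBar over n_data simulated
    standard-normal datasets of size n."""
    rng = random.Random(seed)
    hat, bar = [], []
    for _ in range(n_data):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        hat.append(stat(xs))
        bar.append(bootstrap_average(xs, stat, n_boot, rng))
    return statistics.fmean(hat) - true_value, statistics.fmean(bar) - true_value

# Sample mean (true value 0): both biases are near zero; averaging the
# bootstrap means only adds resampling noise around ThetaHat.
print(mc_bias(statistics.fmean, 0.0))
# Plug-in variance (true value 1): theory gives bias -1/n = -0.10 for
# ThetaHat and -(2n - 1)/n^2 = -0.19 for ThetaHatBar.
print(mc_bias(plug_in_var, 1.0))
```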

What are the exceptions?  The prime example is tree models (random
forests) - taking bootstrap averages helps smooth out the
discontinuities in tree models.  For a simple example, suppose that a
simple linear regression model really holds:
        y = beta x + epsilon
but that you fit a tree model; the tree model predictions are
a step function.  If you bootstrap the data, the boundaries of
the step function will differ from one sample to another, so
the average of the bootstrap predictions smears out the steps, getting
closer to the smooth linear relationship.
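The tree example can be reproduced in a few lines. This Python sketch is a hypothetical illustration and not code from the thread. It fits a single-split regression tree (a "stump") to data from y = beta x + epsilon. It then averages stumps fitted to bootstrap samples, which is bagging. The choices beta = 2, noise sd 0.2, n = 50 and 200 bootstrap samples are arbitrary assumptions.

```python
import random
import statistics

def fit_stump(xs, ys):
    """Fit a one-split regression tree (a 'stump') by least squares and
    return its prediction function, which is a single step."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no split possible between tied x values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        ml, mr = statistics.fmean(left), statistics.fmean(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, (pairs[i - 1][0] + pairs[i][0]) / 2, ml, mr)
    if best is None:  # all x values tied: predict the overall mean
        overall = statistics.fmean(ys)
        return lambda x: overall
    _, cut, ml, mr = best
    return lambda x: ml if x <= cut else mr

def bagged_stump(xs, ys, n_boot, rng):
    """Average the predictions of stumps fitted to bootstrap samples."""
    n = len(xs)
    models = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: statistics.fmean(m(x) for m in models)

def rmse_vs_line(f, beta, grid):
    """Root mean squared distance between f and the true line beta * x."""
    return statistics.fmean((f(g) - beta * g) ** 2 for g in grid) ** 0.5

rng = random.Random(1)
beta = 2.0
xs = [rng.uniform(0.0, 1.0) for _ in range(50)]
ys = [beta * x + rng.gauss(0.0, 0.2) for x in xs]
grid = [i / 100 for i in range(101)]

single = fit_stump(xs, ys)
bagged = bagged_stump(xs, ys, n_boot=200, rng=rng)
print("levels, single stump:", len({single(g) for g in grid}))
print("levels, bagged:", len({round(bagged(g), 6) for g in grid}))
print("RMSE vs true line:", rmse_vs_line(single, beta, grid), rmse_vs_line(bagged, beta, grid))
```

The single stump has exactly two levels. The bagged average has many small steps, because the split point moves from one bootstrap sample to the next, so the bagged fit tracks the line more closely.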

Aside from such exceptions, the bootstrap is used for inference
(bias, standard error, confidence intervals), not improving on
ThetaHat.
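Here is a minimal Python sketch of those inferential uses: bias, standard error and a confidence interval. The function name bootstrap_inference, the simple percentile interval and the exponential example data are illustrative assumptions, not a prescribed method. Note that ThetaHatBar appears here only to estimate the bias of ThetaHat. It does not replace ThetaHat.

```python
import random
import statistics

def bootstrap_inference(xs, stat, n_boot=2000, alpha=0.05, seed=1):
    """Bootstrap bias and standard-error estimates for stat(xs), plus a
    simple percentile confidence interval at level 1 - alpha."""
    rng = random.Random(seed)
    theta_hat = stat(xs)
    reps = sorted(stat(rng.choices(xs, k=len(xs))) for _ in range(n_boot))
    bias = statistics.fmean(reps) - theta_hat  # ThetaHatBar - ThetaHat
    se = statistics.stdev(reps)                # spread of the bootstrap replicates
    # Percentile interval from the empirical alpha/2 and 1 - alpha/2 quantiles
    lower = reps[int(n_boot * alpha / 2)]
    upper = reps[int(n_boot * (1 - alpha / 2)) - 1]
    return theta_hat, bias, se, (lower, upper)

data_rng = random.Random(42)
data = [data_rng.expovariate(1.0) for _ in range(30)]  # illustrative data only
theta_hat, bias, se, ci = bootstrap_inference(data, statistics.pvariance)
print(f"estimate {theta_hat:.3f}, bias {bias:.3f}, SE {se:.3f}, 95% CI {ci}")
```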

Tim Hesterberg

Hi Doran,

Maybe I am wrong, but I think the bootstrap is a general resampling method which
can be used for different purposes... Usually it works well when you do not
have a representative sample set (maybe with a limited number of samples).
Therefore, I side with Michal...

P.S. Overfitting, in my opinion, describes the situation where you get a model
that is quite specific to the training dataset but cannot be generalized
to new samples...

Thanks,

--Jerry
2008/7/21 Doran, Harold <[EMAIL PROTECTED]>:

I used bootstrap to virtually increase the size of my
dataset, so it should result in estimates closer to those
from the population - isn't that the purpose of bootstrap?

No, not really. The bootstrap is a resampling method for variance
estimation. It is often used when there is not an easy way, or a closed
form expression, for estimating the sampling variance of a statistic.
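A minimal Python sketch of that use follows. Its assumptions are lognormal example data, 2000 resamples and a hypothetical helper called bootstrap_se. For the sample mean, the bootstrap standard error can be checked against the textbook s/sqrt(n). For the median, the usual large-sample formula needs an estimate of the density at the median, but the same bootstrap code applies unchanged.

```python
import math
import random
import statistics

def bootstrap_se(xs, stat, n_boot=2000, seed=7):
    """Bootstrap standard error: the standard deviation of stat computed on
    n_boot resamples drawn with replacement, each the same size as xs."""
    rng = random.Random(seed)
    reps = [stat(rng.choices(xs, k=len(xs))) for _ in range(n_boot)]
    return statistics.stdev(reps)

data_rng = random.Random(2008)
data = [data_rng.lognormvariate(0.0, 1.0) for _ in range(40)]  # illustrative data

# Sample mean: a closed form exists, so the bootstrap can be checked against it.
closed_form = statistics.stdev(data) / math.sqrt(len(data))
print("mean:   bootstrap SE", bootstrap_se(data, statistics.fmean), "vs s/sqrt(n)", closed_form)
# Sample median: no equally simple formula, but the same code applies.
print("median: bootstrap SE", bootstrap_se(data, statistics.median))
```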


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
