Hi Chris,


I often do the following (a rough R sketch follows the list):

- Look at MSE

- Plot the residuals

- Plot predicted vs. actual (should be approximately linear)
  - (very similar to the residual plot, but clients seem to understand it
better)

- Look at the min and max residuals, and think about whether they are
acceptable

- Look at various quantiles of the residuals, e.g. 5% and 95%; this
gives me some idea of what the middle 90% of my residuals look like. I then
think about whether this interval is acceptable.

- If it's spatial data I'll create a map of the residuals, so I can see if
there are areas where the model is performing poorly.
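
In R, those checks might look something like this (fit, test, the response
column y, and the coordinates lon/lat are placeholders for your own objects):

## Residual checks for a fitted model `fit` on a held-out data frame `test`
pred <- predict(fit, newdata = test)
res  <- test$y - pred

mean(res^2)                          # MSE

plot(pred, res, xlab = "Predicted", ylab = "Residual")
abline(h = 0, lty = 2)               # residual plot

plot(test$y, pred, xlab = "Actual", ylab = "Predicted")
abline(0, 1, lty = 2)                # points should hug the 1:1 line

range(res)                           # min and max residuals
quantile(res, c(0.05, 0.95))         # middle 90% of the residuals

## if it's spatial: map the residuals (sign as colour, size as magnitude)
plot(test$lon, test$lat, col = ifelse(res > 0, "blue", "red"),
     cex = 2 * abs(res) / max(abs(res)))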



Chris Howden

Founding Partner

Tricky Solutions

Tricky Solutions 4 Tricky Problems

Evidence Based Strategic Development, IP development, Data Analysis,
Modelling, and Training

(mobile) 0410 689 945

(fax / office) (+618) 8952 7878

ch...@trickysolutions.com.au



*From:* Chris Mcowen [mailto:chrismco...@me.com]
*Sent:* Thursday, 5 August 2010 2:54 PM
*To:* Chris Howden
*Cc:* bbol...@gmail.com; Chris Mcowen; r-sig-ecology@r-project.org
*Subject:* Re: [R-sig-eco] AIC / BIC vs P-Values / MAM



> Then I evaluate the predictive ability of the best few models on a “test
> data set” which wasn’t used to create them.

 Hi Chris and Ben,



This is exactly what I intended to do: I took 20 percent of my data set and
left it out of the data I used to build the model.
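
(Something along these lines in R, with dat as a stand-in for my full data
set:)

set.seed(1)                                       # reproducible split
test_rows <- sample(nrow(dat), floor(0.2 * nrow(dat)))
test  <- dat[test_rows, ]                         # 20% held out
train <- dat[-test_rows, ]                        # 80% used for fitting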



I am relatively new to models in general, and my PhD supervisors are both
ecology/conservation-based. I was therefore wondering if you could offer
some advice as to the best method of evaluating the predictive ability of a
model: both the method for actually predicting the result and then how to
check the confidence. If a full workflow is too much to ask, then a few
steps I can build on would be gratefully received.



  Thanks again for your help,



  Chris





Sent from my iPhone


On 5 Aug 2010, at 02:12, Chris Howden <ch...@trickysolutions.com.au> wrote:

 Hi Ben,



You’re absolutely right.



Which was why I said you should test the model’s predictive ability on the
“test data set”. I probably should have made it clearer that the “test
data set” isn’t used when building the model. And I agree that cross
validation is best, if you have the time and code that does it.



It’s also why I said that using AIC to decide which models to actually
bother testing would be a good idea.



At least, that’s the approach I usually use, i.e.:



1. Create the models and make an initial assessment of which are best using
AIC, comparing each model’s log-likelihood to the null model and other
applicable models, plus some common sense.



2. Then I evaluate the predictive ability of the best few models on a
“test data set” which wasn’t used to create them. (A rough sketch of both
steps is below.)
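
In R the two steps might look roughly like this (the formulas, train, and
test are stand-ins for your own candidate models and data):

## Step 1: fit candidates on the training data; compare AIC and
## log-likelihood against a null (intercept-only) model
m0 <- lm(y ~ 1,       data = train)        # null model
m1 <- lm(y ~ x1,      data = train)
m2 <- lm(y ~ x1 + x2, data = train)
AIC(m0, m1, m2)                            # lower AIC is better
sapply(list(m0, m1, m2), logLik)           # log-likelihoods

## Step 2: predictive ability of the best few on the test set
pred <- predict(m2, newdata = test)
mean((test$y - pred)^2)                    # test-set MSE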



Chris Howden

Founding Partner

Tricky Solutions

Tricky Solutions 4 Tricky Problems

Evidence Based Strategic Development, IP development, Data Analysis,
Modelling, and Training

(mobile) 0410 689 945

(fax / office) (+618) 8952 7878

ch...@trickysolutions.com.au



*From:* bbol...@gmail.com [mailto:bbol...@gmail.com]
*Sent:* Thursday, 5 August 2010 10:17 AM
*To:* Chris Howden
*Cc:* Chris Mcowen; r-sig-ecology@r-project.org
*Subject:* Re: Re: [R-sig-eco] AIC / BIC vs P-Values / MAM



On Aug 4, 2010 8:13pm, Chris Howden <ch...@trickysolutions.com.au> wrote:
> Hi Chris,
>
> If you want good predictive ability, which is exactly what you do want when
> using a model for prediction, then why not use its predictive ability as a
> model selection criterion?

Because this will typically lead to overfitting the data, i.e. getting a
great
fit to the 'training' set but then doing miserably on future data? Unless
you do
something like split the data set into a training and a validation set, or
use cross-validation (which is a more sophisticated version of the same
idea),
just finding the model with the best predictive capability on a specified
data set will *not* give you a good model in general. That's why approaches
such as AIC, corrected R^2, and so forth, include a penalty for model
complexity.
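
For the record, a bare-bones version of k-fold cross-validation in base R
(dat and the formula are placeholders; boot::cv.glm() does the same job for
GLMs):

k <- 5
folds <- sample(rep(1:k, length.out = nrow(dat)))    # random fold labels
cv_mse <- sapply(1:k, function(i) {
  fit  <- lm(y ~ x1 + x2, data = dat[folds != i, ])  # fit on k-1 folds
  pred <- predict(fit, newdata = dat[folds == i, ])  # predict held-out fold
  mean((dat$y[folds == i] - pred)^2)
})
mean(cv_mse)                                         # average held-out MSE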

Unless I'm missing something really obvious, in which case I apologize.

Ben Bolker

