Re: [R] Fitting exponential curve to data points

2007-07-30 Thread Stephen Tucker
Sorry, just got back into town.

I wonder if AIC, BIC, or cross-validation scoring couldn't also be used as
criteria for model selection - I've seen it mostly in the context of variable
selection rather than 'form' selection but in principle might apply here?


--- Dieter Menne <[EMAIL PROTECTED]> wrote:

> Andrew Clegg  gmail.com> writes:
> 
> > 
> > ... If I want to demonstrate that a non-linear curve fits
> > better than an exponential, what's the best measure for that? Given
> > that neither of nls() or optim() provide R-squared. 
> 
> To supplement Karl's comment, try Douglas Bates' (author of nls) comments
> on the
> matter
> 
> http://www.ens.gu.edu.au/ROBERTK/R/HELP/00B/0399.HTML
> 
> Short summary:
> * ... "the lack of automatic ANOVA, R^2 and adj. R^2 from nls is a
> feature,
> not a bug :-)"
> * My best advice regarding R^2 statistics with nonlinear models is, as
> Nancy
> Reagan suggested, "Just say no."
> 
> Dieter
> 
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 



  


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Fitting exponential curve to data points

2007-07-24 Thread Dieter Menne
Andrew Clegg  gmail.com> writes:

> 
> ... If I want to demonstrate that a non-linear curve fits
> better than an exponential, what's the best measure for that? Given
> that neither of nls() or optim() provide R-squared. 

To supplement Karl's comment, try Douglas Bates' (author of nls) comments on the
matter

http://www.ens.gu.edu.au/ROBERTK/R/HELP/00B/0399.HTML

Short summary:
* ... "the lack of automatic ANOVA, R^2 and adj. R^2 from nls is a feature,
not a bug :-)"
* My best advice regarding R^2 statistics with nonlinear models is, as Nancy
Reagan suggested, "Just say no."

Dieter

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Fitting exponential curve to data points

2007-07-24 Thread Karl Ove Hufthammer
Andrew Clegg:

> Great, thanks. If I want to demonstrate that a non-linear curve fits
> better than an exponential, what's the best measure for that? Given
> that neither of nls() or optim() provide R-squared.

You really need to *very* careful when trying to interprete R² (which can
be defined in many nonequivalent ways) in the nonlinear case. Recommended
(and, dare I say, *required* reading):

Anderson-Sprecher R. (1994). ‘Model comparisons and R²’. The American
  Statistician, volume 48, no. 2, pages 113–117. DOI: 10.2307/2684259

Kvålseth T.O. (1985). ‘Cautionary note about R²’. The American Statistician,
  volume 39, no. 4, pages 279–285. DOI: 10.2307/2683704

Scott A. and Wild C. (1991). ‘Transformations and R²’. The American
  Statistician, volume 45, no. 2, pages 127–129. ISSN 0003-1305.
  DOI: 10.2307/2684375

The Scott & Wild paper has an example that looks very similar to yours, and
that may be instructive.

FYI, in case you’re not used to DOIs: you can resolve the above DOIs to
fulltext URLs using http://dx.doi.org/

-- 
Karl Ove Hufthammer

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Fitting exponential curve to data points

2007-07-24 Thread Andrew Clegg
On 7/24/07, Stephen Tucker <[EMAIL PROTECTED]> wrote:
> Hope these help for alternatives to lm()? I show the use of a 2nd order
> polynomial as an example to generalize a bit.

Great, thanks. If I want to demonstrate that a non-linear curve fits
better than an exponential, what's the best measure for that? Given
that neither of nls() or optim() provide R-squared. Sorry if these are
really silly questions.

> Sometimes from the subject line two separate responses can appear as reposts
> when in fact they are not... (though there are identical reposts too). I
> should probably figure a way around that.

Nope, my fault, I didn't read them properly and thought you were
demonstrating a different way to do exponential curves.

Cheers,

Andrew.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Fitting exponential curve to data points

2007-07-24 Thread Andrew Clegg
Stephen, Ted -- thanks for your input. I'm glad to know I was barking
up the right-ish tree at least.

On 7/24/07, Ted Harding <[EMAIL PROTECTED]> wrote:
>
> There are not enough data to properly identify the non-linearity,
> but the overall appearance of the data plot suggests to me that
> you should be considering one of the "growth curve" models.
>
> Many such models start of with an increasing rate of growth,
> which then slows down, and typically levels off to an asymptote.
> The apparent large discrepancy of your final data point could
> be compatible with this kind of behaviour.

You may have hit the nail on the head there. At least I now know that
my method would be reasonable *if* I had a genuine exponential curve.
Bound to come in handy.

> At this point, knowledge of what kind of thing is represented
> by your "count" variable might be helpful. If, for instance,
> it is the count of the numbers of individuals of a species in
> an area, then independent knowledge of growth mechanisms may
> help to narrow down the kind of model you should be tring to fit.

It's the cumulative number of citations in the MEDLINE literature
database about a particular topic (natural language processing in
biomedicine). So indeed, it can't maintain an exponential growth rate
for long, and an initial spurt while the field is novel and trendy,
followed by a levelling-off, is just what we'd expect. There was a
review a year or so ago that showed a very good exponential fit *then*
but if I could show the last point was indicative of a slowdown, that
would be news at least.

Can anyone point me at a better modelling framework than lm(), in that case?

Thanks again,

Andrew.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Fitting exponential curve to data points

2007-07-24 Thread Stephen Tucker
Well spoken. And since log transformations are nonlinear and 'compresses' the
data, it's not surprising to find that the fit doesn't look so nice while the
fit metrics tell you that a model does a good job.

--- [EMAIL PROTECTED] wrote:

> On 24-Jul-07 01:09:06, Andrew Clegg wrote:
> > Hi folks,
> > 
> > I've looked through the list archives and online resources, but I
> > haven't really found an answer to this -- it's pretty basic, but I'm
> > (very much) not a statistician, and I just want to check that my
> > solution is statistically sound.
> > 
> > Basically, I have a data file containing two columns of data, call it
> > data.tsv:
> > 
> > year  count
> > 1999  3
> > 2000  5
> > 2001  9
> > 2002  30
> > 2003  62
> > 2004  154
> > 2005  245
> > 2006  321
> > 
> > These look exponential to me, so what I want to do is plot these
> > points on a graph with linear axes, and add an exponential curve over
> > the top. I also want to give an R-squared for the fit.
> > 
> > The way I did it was like so:
> > 
> > 
> ># Read in the data, make a copy of it, and take logs
> > data = read.table("data.tsv", header=TRUE)
> > log.data = data
> > log.data$count = log(log.data$count)
> > 
> ># Fit a model to the logs of the data
> > model = lm(log.data$count ~ year, data = log.data)
> > 
> ># Plot the original data points on a graph
> > plot(data)
> > 
> ># Draw in the exponents of the model's output
> > lines(data$year, exp(fitted(model)))
> > 
> > 
> > Is this the right way to do it? log-ing the data and then exp-ing the
> > results seems like a bit of a long-winded way to achieve the desired
> > effect. Is the R-squared given by summary(model) a valid measurement
> > of the fit of the points to an exponential curve, and should I use
> > multiple R-squared or adjusted R-squared?
> > 
> > The R-squared I get from this method (0.98 multiple) seems a little
> > high going by the deviation of the last data point from the curve --
> > you'll see what I mean if you try it.
> 
> I just did. From the plot of log(count) against year, with the plot
> of the linear fit of log(count)~year superimposed, I see indications
> of a non-linear relationship.
> 
> The departures of the data from the fit follow a rather systematic
> pattern. Initially the data increase more slowly than the fit,
> and lie below it. Then they increase faster and corss over above it.
> Then the data increase less fast than the fit, and the final data
> point is below the fit.
> 
> There are not enough data to properly identify the non-linearity,
> but the overall appearance of the data plot suggests to me that
> you should be considering one of the "growth curve" models.
> 
> Many such models start of with an increasing rate of growth,
> which then slows down, and typically levels off to an asymptote.
> The apparent large discrepancy of your final data point could
> be compatible with this kind of behaviour.
> 
> At this point, knowledge of what kind of thing is represented
> by your "count" variable might be helpful. If, for instance,
> it is the count of the numbers of individuals of a species in
> an area, then independent knowledge of growth mechanisms may
> help to narrow down the kind of model you should be tring to fit.
> 
> As to your question about "Is this the right way to do it"
> (i.e. fitting an exponential curve by doing a linear fit of the
> logarithm), generally speaking the answer is "Yes". But of course
> you need to be confident that "exponential" is the right curve
> to be fitting in the first place. If it's the wrong type of
> curve to be considering, then it's not "the right way to do it"!
> 
> Hoping this help[s,
> Ted.
> 
> 
> E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
> Fax-to-email: +44 (0)870 094 0861
> Date: 24-Jul-07   Time: 10:08:33
> -- XFMail --
> 
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Fitting exponential curve to data points

2007-07-24 Thread Stephen Tucker
I think your way is probably the easiest (shockingly). For instance, here are
some alternatives - I think in both cases you have to calculate the
coefficient of determination (R^2) manually. My understanding is that
multiple R^2 in your case is the usual R^2 because you only have one
predictor variable, and the adjusted R^2 considers the degrees of freedom and
penalizes for additional predictors. Which is better... depends? (Perhaps
more stats-savvy people can help you on that one. I'm a chemical engineer so
I unjustifiably claim ignorance).

## Data input
input <-
"Year   Count
19993
20005
20019
200230
200362
2004154
2005245
2006321"

dat <- read.table(textConnection(input),header=TRUE)
dat[,] <- lapply(dat,function(x) x-x[1])
  # shifting in origin; will need to add back in later

## Nonlinear least squares
plot(dat)
out <- nls(Count~b0*exp(b1*Year),data=dat,
   start=list(b0=1,b1=1))
lines(dat[,1],fitted(out),col=2)
out <- nls(Count~b0+b1*Year+b2*Year^2,data=dat, #polynomial
   start=list(b0=0,b1=1,b2=1))
lines(dat[,1],fitted(out),col=3)

## Optim
f <- function(.pars,.dat,.fun) sum((.dat[,2]-.fun(.pars,.dat[,1]))^2)
fitFun <- function(b,x) cbind(1,x,x^2)%*%b
expFun <- function(b,x) b[1]*exp(b[2]*x)

plot(dat)
out <- optim(c(0,1,1),f,.dat=dat,.fun=fitFun)
lines(dat[,1],fitFun(out$par,dat[,1]),col=2)
out <- optim(c(1,1),f,.dat=dat,.fun=expFun)
lines(dat[,1],expFun(out$par,dat[,1]),col=3)


--- Andrew Clegg <[EMAIL PROTECTED]> wrote:

> Hi folks,
> 
> I've looked through the list archives and online resources, but I
> haven't really found an answer to this -- it's pretty basic, but I'm
> (very much) not a statistician, and I just want to check that my
> solution is statistically sound.
> 
> Basically, I have a data file containing two columns of data, call it
> data.tsv:
> 
> year  count
> 1999  3
> 2000  5
> 2001  9
> 2002  30
> 2003  62
> 2004  154
> 2005  245
> 2006  321
> 
> These look exponential to me, so what I want to do is plot these
> points on a graph with linear axes, and add an exponential curve over
> the top. I also want to give an R-squared for the fit.
> 
> The way I did it was like so:
> 
> 
> # Read in the data, make a copy of it, and take logs
> data = read.table("data.tsv", header=TRUE)
> log.data = data
> log.data$count = log(log.data$count)
> 
> # Fit a model to the logs of the data
> model = lm(log.data$count ~ year, data = log.data)
> 
> # Plot the original data points on a graph
> plot(data)
> 
> # Draw in the exponents of the model's output
> lines(data$year, exp(fitted(model)))
> 
> 
> Is this the right way to do it? log-ing the data and then exp-ing the
> results seems like a bit of a long-winded way to achieve the desired
> effect. Is the R-squared given by summary(model) a valid measurement
> of the fit of the points to an exponential curve, and should I use
> multiple R-squared or adjusted R-squared?
> 
> The R-squared I get from this method (0.98 multiple) seems a little
> high going by the deviation of the last data point from the curve --
> you'll see what I mean if you try it.
> 
> Thanks in advance for any help!
> 
> Yours gratefully,
> 
> Andrew.
> 
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Fitting exponential curve to data points

2007-07-24 Thread Ted Harding
On 24-Jul-07 01:09:06, Andrew Clegg wrote:
> Hi folks,
> 
> I've looked through the list archives and online resources, but I
> haven't really found an answer to this -- it's pretty basic, but I'm
> (very much) not a statistician, and I just want to check that my
> solution is statistically sound.
> 
> Basically, I have a data file containing two columns of data, call it
> data.tsv:
> 
> year  count
> 1999  3
> 2000  5
> 2001  9
> 2002  30
> 2003  62
> 2004  154
> 2005  245
> 2006  321
> 
> These look exponential to me, so what I want to do is plot these
> points on a graph with linear axes, and add an exponential curve over
> the top. I also want to give an R-squared for the fit.
> 
> The way I did it was like so:
> 
> 
># Read in the data, make a copy of it, and take logs
> data = read.table("data.tsv", header=TRUE)
> log.data = data
> log.data$count = log(log.data$count)
> 
># Fit a model to the logs of the data
> model = lm(log.data$count ~ year, data = log.data)
> 
># Plot the original data points on a graph
> plot(data)
> 
># Draw in the exponents of the model's output
> lines(data$year, exp(fitted(model)))
> 
> 
> Is this the right way to do it? log-ing the data and then exp-ing the
> results seems like a bit of a long-winded way to achieve the desired
> effect. Is the R-squared given by summary(model) a valid measurement
> of the fit of the points to an exponential curve, and should I use
> multiple R-squared or adjusted R-squared?
> 
> The R-squared I get from this method (0.98 multiple) seems a little
> high going by the deviation of the last data point from the curve --
> you'll see what I mean if you try it.

I just did. From the plot of log(count) against year, with the plot
of the linear fit of log(count)~year superimposed, I see indications
of a non-linear relationship.

The departures of the data from the fit follow a rather systematic
pattern. Initially the data increase more slowly than the fit,
and lie below it. Then they increase faster and corss over above it.
Then the data increase less fast than the fit, and the final data
point is below the fit.

There are not enough data to properly identify the non-linearity,
but the overall appearance of the data plot suggests to me that
you should be considering one of the "growth curve" models.

Many such models start of with an increasing rate of growth,
which then slows down, and typically levels off to an asymptote.
The apparent large discrepancy of your final data point could
be compatible with this kind of behaviour.

At this point, knowledge of what kind of thing is represented
by your "count" variable might be helpful. If, for instance,
it is the count of the numbers of individuals of a species in
an area, then independent knowledge of growth mechanisms may
help to narrow down the kind of model you should be tring to fit.

As to your question about "Is this the right way to do it"
(i.e. fitting an exponential curve by doing a linear fit of the
logarithm), generally speaking the answer is "Yes". But of course
you need to be confident that "exponential" is the right curve
to be fitting in the first place. If it's the wrong type of
curve to be considering, then it's not "the right way to do it"!

Hoping this help[s,
Ted.


E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 24-Jul-07   Time: 10:08:33
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Fitting exponential curve to data points

2007-07-23 Thread Andrew Clegg
Hi folks,

I've looked through the list archives and online resources, but I
haven't really found an answer to this -- it's pretty basic, but I'm
(very much) not a statistician, and I just want to check that my
solution is statistically sound.

Basically, I have a data file containing two columns of data, call it data.tsv:

yearcount
19993
20005
20019
200230
200362
2004154
2005245
2006321

These look exponential to me, so what I want to do is plot these
points on a graph with linear axes, and add an exponential curve over
the top. I also want to give an R-squared for the fit.

The way I did it was like so:


# Read in the data, make a copy of it, and take logs
data = read.table("data.tsv", header=TRUE)
log.data = data
log.data$count = log(log.data$count)

# Fit a model to the logs of the data
model = lm(log.data$count ~ year, data = log.data)

# Plot the original data points on a graph
plot(data)

# Draw in the exponents of the model's output
lines(data$year, exp(fitted(model)))


Is this the right way to do it? log-ing the data and then exp-ing the
results seems like a bit of a long-winded way to achieve the desired
effect. Is the R-squared given by summary(model) a valid measurement
of the fit of the points to an exponential curve, and should I use
multiple R-squared or adjusted R-squared?

The R-squared I get from this method (0.98 multiple) seems a little
high going by the deviation of the last data point from the curve --
you'll see what I mean if you try it.

Thanks in advance for any help!

Yours gratefully,

Andrew.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.