Re: [R] data extrapolation function

Dennis Murphy Sat, 29 Jan 2011 02:41:40 -0800

Hi:

This 'works':

> di <- read.csv(textConnection("
+ 10,2000
+ 12,2001
+ 13,2002
+ 15,2003
+ 17,2004"), header = FALSE)
> names(di) <- c('y', 'year')
> m <- lm(y ~ year, data = di)
> diextra <- data.frame(year = c(1990, 1991, 1993))
> predict(m, newdata = diextra)
   1    2    3
-7.0 -5.3 -1.9

You didn't describe how you got the original model in the first place, but
predict.lm() expects the data frame supplied as the newdata argument to
contain columns with the same names as the variables on the right-hand side
of the model formula in lm(). The default output is a vector of predicted
values, one for each row of the data frame passed to predict.lm().

In the above example, I supplied names 'y' and 'year' to the data frame di
before running the linear model. If you're using something like lm(di[, 1] ~
di[, 2]) instead, you're going to have problems when using predict.lm(),
because the model term is then literally 'di[, 2]' and no column name in
your prediction data frame will match it. Without going into the sordid
details, the safest strategy is to name your columns before you pass the
data to lm() and use the same variable names for the explanatory variables
in the data frame you feed to predict.lm().
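
A minimal sketch of the problem described above, reusing the same five data
points. The names m_named, m_indexed and new_years are made up for
illustration, and the exact warning text may differ between R versions.

```r
# Same data as in the example above
di <- data.frame(y = c(10, 12, 13, 15, 17), year = 2000:2004)
new_years <- data.frame(year = c(1990, 1991, 1993))

# Named columns: predict() finds 'year' in the new data frame
m_named <- lm(y ~ year, data = di)
p_named <- predict(m_named, newdata = new_years)
length(p_named)    # 3, one prediction per row of new_years

# Indexed columns: the model term is 'di[, 2]', which new_years lacks,
# so predict() falls back on the original di and (in the R versions I
# have seen) warns that newdata had 3 rows but the variables found have 5.
m_indexed <- lm(di[, 1] ~ di[, 2])
p_indexed <- suppressWarnings(predict(m_indexed, newdata = new_years))
length(p_indexed)  # 5, the fitted values for 2000-2004, not predictions
```

The second call does not fail outright, which is what makes it easy to miss:
you get numbers back, just not the ones you asked for.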

Notice that in the above example, I named the covariate column 'year', and
used the same name in the created data frame to be used for prediction. The
default output, as mentioned above, is a vector of predicted values. If you
want the predictions and covariate values together, you can do something
like

data.frame(yhat = predict(m, newdata = diextra), diextra)

and use write.csv() on that.
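
Put together for the range of years in the original post, that might look
like the sketch below. The tempfile() path is only a stand-in for a real
output path, and the object names out and outfile are illustrative.

```r
di <- data.frame(y = c(10, 12, 13, 15, 17), year = 2000:2004)
m  <- lm(y ~ year, data = di)

# Years requested in the original post
diextra <- data.frame(year = seq(1990, 2010, 1))

# Predictions alongside the covariate values
out <- data.frame(yhat = predict(m, newdata = diextra), diextra)

# tempfile() stands in for a real path such as the one in the question
outfile <- tempfile(fileext = ".csv")
write.csv(out, outfile, row.names = FALSE)

# Read the file back to check what was written
head(read.csv(outfile), 3)
```

row.names = FALSE keeps R from writing an extra column of row numbers into
the file.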

HTH,
Dennis

PS: I am not remotely endorsing extrapolation that far from the range of the
data. If you're actually doing this with your real data, you should think
carefully about the dangers. Much has been written on the topic, so please
consult that literature if you are unaware of the potential consequences.
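
One way to see part of the danger is to ask predict.lm() for prediction
intervals and compare their widths inside and outside the observed years.
This is only a sketch: the names pts, pi95 and width are illustrative, and
the intervals themselves assume the straight-line model still holds outside
2000-2004, which the data cannot confirm, so they understate the real risk.

```r
di <- data.frame(y = c(10, 12, 13, 15, 17), year = 2000:2004)
m  <- lm(y ~ year, data = di)

# One year far outside the data, one at the centre of the data
pts <- data.frame(year = c(1990, 2002))

# interval = "prediction" returns a matrix with columns fit, lwr, upr
pi95  <- predict(m, newdata = pts, interval = "prediction", level = 0.95)
width <- pi95[, "upr"] - pi95[, "lwr"]

# The interval at 1990 is wider than the one at 2002
cbind(pts, pi95, width)
```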

On Sat, Jan 29, 2011 at 1:39 AM, e-letter <inp...@gmail.com> wrote:

> Readers,
>
> Data was imported using the read csv command:
>
> dataimport<-read.csv("/path/to/dataimport.csv")
>
> 10,2000
> 12,2001
> 13,2002
> 15,2003
> 17,2004
>
> Using the help contents for 'predict.lm' (i.e. ?predict.lm) a new data
> frame was created
>
> dataimportextra<-data.frame(x=seq(1990,2010,1))
>
> predict(lm(dataimport),dataimportextra[,2],se.fit=TRUE)
> write.csv<-(dataimportextraout,"/path/to/dataimportextra.csv")
>
> I was expecting to see in the file dataimportextra.csv something like:
>
> 1,1990
> 2,1991
> 3,1993
> ...
> to previously known data
> 10,2000
> ...
> final extrapolated value, e.g.
> 20,2010
>
> I didn't; this suggests that I chose the wrong function! Can someone
> please advise me of the correct function to use for this extrapolation
> task?
>
> yours,
> r251
> mandriva2009
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
