Re: [R] predict.lm point forecasts with factors

2007-02-14 Thread Marc Schwartz
On Wed, 2007-02-14 at 13:54 -0700, sj wrote:
> hello,
>
> I am trying to use predict.lm to make point forecasts based on a model with
> continuous and categorical independent variables.
> I have no problems fitting the model using lm, but when I try to use predict
> to make point predictions, it reverts back to the original dataframe and
> gives me the point predictions for the fitted data rather than for the new
> data. I imagine that I am missing something simple, but for whatever reason I
> can't figure out why it does not like the new data and is reverting to the
> fitted data. The following code illustrates the problem I am running into.
> Any help would be appreciated.
>
> f1 <- rep(c("a","b","c","d"),25)
> f2 <- sample(rep(c("e","f","g","h"),250),100)
> x <- rnorm(100,100)
> y <- rnorm(100,150)
>
> mdl <- lm(y~x+f1+f2)
>
> f12 <- rep(c("a","b","c","d"),5)
> f22 <- sample(rep(c("e","f","g","h"),250),20)
> x2 <- rnorm(20,100)
>
> new <- data.frame(cbind(f12[1],f22[1],x2[1]))
>
>
> predict(mdl,new)
>
>
> best,
>
> Spencer

Spencer,

You have two distinct issues going on here:

The initial model that you create 'mdl' is based upon 'f1' and 'f2'
being created as character vectors, not as factors. While the modeling
functions will internally do the coercion, I do not believe that the
predict functions will. 

In fact, you should have noted the following warning messages:

> mdl <- lm(y~x+f1+f2)
Warning messages:
1: variable 'f1' converted to a factor in: model.matrix.default(mt, mf,
contrasts) 
2: variable 'f2' converted to a factor in: model.matrix.default(mt, mf,
contrasts) 


So you end up with a 'class' conflict between the model frame object and
the new data object, since the latter will default to coercing 'f12' and
'f22' to factors.

Secondly, 'new' needs to have columns created with the SAME names as
those used in the original model.

Thus, a code sequence along the lines of the following should work:

f1 <- rep(c("a","b","c","d"), 25)
f2 <- sample(rep(c("e","f","g","h"), 250), 100)
x <- rnorm(100, 100)
y <- rnorm(100, 150)

# Create a data frame from the data
# so that f1 and f2 become factors
DF <- data.frame(y, x, f1, f2)

mdl <- lm(y ~ x + f1 + f2, DF)


f12 <- rep(c("a","b","c","d"), 5)
f22 <- sample(rep(c("e","f","g","h"), 250), 20)
x2 <- rnorm(20, 100)

# Create 'new' in the same way, but naming the
# columns the same as in 'DF' above
new <- data.frame(f1 = f12, f2 = f22, x = x2)


# Now run predict on the first row in 'new'
> predict(mdl, new[1, ])
[1] 150.3273


The number you come up with should be different, since you are using
random data.
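
A note for readers on a recent R: since R 4.0.0, data.frame() no longer
converts character columns to factors by default, so relying on that
conversion is fragile. The sketch below is one way to make both points
above explicit (factors built by hand, and 'new' carrying the same column
names as the fitting data). The seed, the level vectors 'lev1'/'lev2' and
the name check at the end are illustrative assumptions, not part of the
original exchange.

```r
set.seed(1)  # arbitrary seed, only so the sketch is repeatable

lev1 <- c("a", "b", "c", "d")
lev2 <- c("e", "f", "g", "h")

# Fitting data: factors are created explicitly, with fixed level sets
DF <- data.frame(
  y  = rnorm(100, 150),
  x  = rnorm(100, 100),
  f1 = factor(rep(lev1, 25), levels = lev1),
  f2 = factor(sample(rep(lev2, 250), 100), levels = lev2)
)
mdl <- lm(y ~ x + f1 + f2, data = DF)

# New data: same column names and the same factor levels as 'DF'
new <- data.frame(
  x  = rnorm(20, 100),
  f1 = factor(rep(lev1, 5), levels = lev1),
  f2 = factor(sample(rep(lev2, 250), 20), levels = lev2)
)

# Check that every predictor in the model is present in 'new'
needed <- all.vars(delete.response(terms(mdl)))
stopifnot(all(needed %in% names(new)))

# One prediction per row of 'new'
p <- predict(mdl, newdata = new)
length(p)
```

Fitting with data = DF also keeps predict() from quietly picking up
same-named vectors left in the workspace.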

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] predict.lm variables found question

2006-11-09 Thread Peter Dalgaard
Larry White [EMAIL PROTECTED] writes:

> hello,
>
> I'm trying to predict some values based on a linear regression model.
> I've created the model using one dataframe, and have the prediction
> values in a second data frame (call it newdata). There are 56 rows in
> the dataframe used to create the model and 15 in newdata.
>
> I ran predict(model1, newdata) and get the warning: 'newdata' had 15
> rows but variable(s) found have 56 rows
>
> When I checked help(predict.lm) I found this:
>
> "Variables are first looked for in newdata and then searched for in
> the usual way (which will include the environment of the formula used
> in the fit). A warning will be given if the variables found are not of
> the same length as those in newdata if it was supplied."
>
> My questions are - how can I just get predicted values for the 15 rows
> in the newdata data frame, and if that's not possible, how can I tell
> which of the 56 predicted values are derived from newdata only, if
> any.

You need to have all your predictors represented in newdata. You seem
to have at least one of them missing (a typo in a variable name could
do that). 
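
A quick way to find the missing predictor is to compare the variable names
in the model with the column names of newdata. The following is a minimal
sketch on made-up data; 'model1' and 'newdata' follow the names in the
question, while the predictors 'x1'/'x2', the deliberate typo 'X2' and the
object 'absent' are assumptions for illustration only.

```r
set.seed(42)  # arbitrary; the data are simulated
x1 <- rnorm(56)
x2 <- rnorm(56)
y  <- 1 + 2 * x1 - x2 + rnorm(56)
model1 <- lm(y ~ x1 + x2)   # predictors live in the workspace (56 values each)

# 'X2' is a deliberate typo for 'x2'
newdata <- data.frame(x1 = rnorm(15), X2 = rnorm(15))

# Predictors the model needs that newdata does not supply
needed <- all.vars(delete.response(terms(model1)))
absent <- setdiff(needed, names(newdata))
absent   # "x2"

# After repairing the name, predict() returns one value per newdata row
names(newdata)[names(newdata) == "X2"] <- "x2"
p <- predict(model1, newdata)
length(p)   # 15
```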

-- 
   O__   Peter Dalgaard Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark  Ph:  (+45) 35327918
~~ - ([EMAIL PROTECTED])  FAX: (+45) 35327907



Re: [R] predict.lm

2006-05-02 Thread Christos Hatzis
I think you got it right.

The mean of the (weighted) sum of a set of random variables is the
(weighted) sum of the means and its variance is the (weighted) sum of the
individual variances (using squared weights).  Here you don't have to worry
about weights.

So what you proposed does exactly this.

-Christos

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Bill Szkotnicki
Sent: Tuesday, May 02, 2006 2:59 PM
To: 'R-Help help'
Subject: [R] predict.lm

I have a model with a few correlated explanatory variables.
i.e.
 m1=lm(y~x1+x2+x3+x4,protdata)
and I have used predict as follows:

 x=data.frame(x=1:36)
 yp=predict(m1,x,se.fit=T)
 tprot=sum(yp$fit) # add up the predictions tprot

tprot is the sum of the 36 predicted values and I would like the se of that
prediction.
I think  
 sqrt(sum(yp$se.fit^2))
is not correct.

Would anyone know the correct approach?
i.e. How to get the se of a function of predicted values (in this case sum)
 
Thanks, Bill



Re: [R] predict.lm

2006-05-02 Thread Prof Brian Ripley
On Tue, 2 May 2006, Christos Hatzis wrote:

> I think you got it right.
>
> The mean of the (weighted) sum of a set of random variables is the
> (weighted) sum of the means and its variance is the (weighted) sum of the
> individual variances (using squared weights).  Here you don't have to worry
> about weights.
>
> So what you proposed does exactly this.

Yes, but the theory has assumptions which are not met here: the random 
variables are correlated (in almost all cases).

> -Christos
>
> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Bill Szkotnicki
> Sent: Tuesday, May 02, 2006 2:59 PM
> To: 'R-Help help'
> Subject: [R] predict.lm
>
> I have a model with a few correlated explanatory variables.
> i.e.
> m1=lm(y~x1+x2+x3+x4,protdata)
> and I have used predict as follows:
>
> x=data.frame(x=1:36)
> yp=predict(m1,x,se.fit=T)

How can this work?  You fitted the model to x1...x4 and supplied x.

> tprot=sum(yp$fit) # add up the predictions tprot
>
> tprot is the sum of the 36 predicted values and I would like the se of that
> prediction.
> I think
> sqrt(sum(yp$se.fit^2))
> is not correct.
>
> Would anyone know the correct approach?
> i.e. How to get the se of a function of predicted values (in this case sum)

You need to go back to the theory: it is easy to do for a linear function, 
otherwise you will need to linearize.

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595



Re: [R] predict.lm

2006-05-02 Thread Bill Szkotnicki
I did mean to use x1,x2,x3,x4 in the new data frame.

And I think the theory would be something like

yhat = 1' K' bhat   
and so the variance should be  sigma^2 1' K'CK 1  where C=(X'X)^-1,
K' is the design matrix for the new data, sigma^2 is the residual variance
and 1 is a vector of ones.

The question is do I need to form these matrices and grind through it or is
there an easier way?
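
One possible shortcut, sketched here on simulated data: vcov() on the
fitted model already returns the estimate of sigma^2 (X'X)^-1, and
model.matrix() builds K' from the new data, so only a short quadratic form
is left. 'm1' and 'protdata' follow the names used above; the simulated
values and the objects 'newd', 'X0', 'a', 'se' and 'naive' are assumptions
for illustration.

```r
set.seed(123)  # arbitrary; illustrative data only
n <- 60
protdata <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
protdata$y <- with(protdata, 1 + x1 + 0.5 * x2 - x3 + 0.2 * x4 + rnorm(n))
m1 <- lm(y ~ x1 + x2 + x3 + x4, data = protdata)

# 36 new rows with the same predictor names as the fit
newd <- data.frame(x1 = rnorm(36), x2 = rnorm(36), x3 = rnorm(36), x4 = rnorm(36))

# Design matrix for the new rows (K' in the notation above)
X0 <- model.matrix(delete.response(terms(m1)), newd)

a     <- colSums(X0)                            # 1' K'
tprot <- sum(a * coef(m1))                      # sum of the 36 predictions
se    <- sqrt(drop(t(a) %*% vcov(m1) %*% a))    # sqrt(1' K' V K 1), V = vcov(m1)

# Cross-check the total against predict(), and compute the naive value
yp    <- predict(m1, newd, se.fit = TRUE)
all.equal(tprot, sum(yp$fit))
naive <- sqrt(sum(yp$se.fit^2))   # ignores covariances between the fitted values
c(se = se, naive = naive)
```

The two numbers differ because the 36 fitted values share the same
estimated coefficients and are therefore correlated.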

 
Bill
 



Re: [R] predict.lm - standard error of predicted means?

2005-07-20 Thread Peter Dalgaard
[EMAIL PROTECTED] writes:

> Simple question.
>
> For a simple linear regression, I obtained the standard error of
> predicted means, for both a confidence and prediction interval:
>
> x<-1:15
> y<-x + rnorm(n=15)
> model<-lm(y~x)
> predict.lm(model,newdata=data.frame(x=c(10,20)),se.fit=T,interval="confidence")$se.fit
>         1         2
> 0.2708064 0.7254615
>
> predict.lm(model,newdata=data.frame(x=c(10,20)),se.fit=T,interval="prediction")$se.fit
>         1         2
> 0.2708064 0.7254615
>
>
> I was surprised to find that the standard errors returned were in fact the
> standard errors of the sampling distribution of Y_hat:
>
> sqrt(MSE*(1/n + (x-x_bar)^2/SS_x)),
>
> not the standard errors of Y_new (predicted value):
>
> sqrt(MSE*(1 + 1/n + (x-x_bar)^2/SS_x)).
>
> Is there a reason this quantity is called the standard error of predicted
> means if it doesn't relate to the prediction distribution?

Yes. Yhat is the predicted mean and se.fit is its standard deviation.
It doesn't change its meaning because you desire another kind of
prediction interval.

>
> Turning to Neter et al.'s Applied Linear Statistical Models, I note that
> if we have multiple observations, then the standard error of the mean of
> the predicted value:
>
> sqrt(MSE*(1/m + 1/n + (x-x_bar)^2/SS_x)),
>
> reverts to the standard error of the sampling distribution of Y-hat, as m,
> the number of samples, gets large. Still, this doesn't explain the result
> for small sample sizes.

You can make completely similar considerations regarding the standard
errors of and about an estimated mean: sigma*sqrt(1+1/n) vs.
sigma*sqrt(1/m + 1/n) vs. sigma*sqrt(1/n). SEM is still the latter
quantity even if you are interested in another kind of prediction limit. 
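
If the standard error of a new observation is what is wanted, it can be
rebuilt from the pieces predict() returns, since the two variances add. A
minimal sketch along the lines of the example in the question; the seed and
the object names 'nd', 'ci', 'pr', 'se_pred' and 'tq' are assumptions.

```r
set.seed(7)  # arbitrary; simulated data as in the question
x <- 1:15
y <- x + rnorm(15)
model <- lm(y ~ x)
nd <- data.frame(x = c(10, 20))

ci <- predict(model, nd, se.fit = TRUE, interval = "confidence")
pr <- predict(model, nd, se.fit = TRUE, interval = "prediction")

# se.fit refers to the fitted mean in both calls
all.equal(ci$se.fit, pr$se.fit)

# SE for a single new observation: add the residual variance
se_pred <- sqrt(pr$se.fit^2 + pr$residual.scale^2)

# The prediction interval half-width is the t quantile times se_pred
tq <- qt(0.975, pr$df)
all.equal(unname(pr$fit[, "upr"] - pr$fit[, "fit"]), unname(tq * se_pred))
```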

-- 
   O__   Peter Dalgaard Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark  Ph:  (+45) 35327918
~~ - ([EMAIL PROTECTED])  FAX: (+45) 35327907



Re: [R] predict.lm - standard error of predicted means?

2005-07-20 Thread mark salsburg
Can someone please refer me to a function or method that resolves this
structuring issue:

I have two matrices with identical colnames (89), but varying number
of observations:

matrix A      matrix B

217 x 89      16063 x 89

I want to create one matrix C that has both matrices adjacent to one
another, where matrix A is duplicated many times to create the same
row number for matrix B, i.e. 16063.

matrixA  matrix B
matrixA
matrixA

so matrix C will be 16063 x 178

I've tried cbind() and merge() with no success..



Re: [R] predict.lm - standard error of predicted means?

2005-07-20 Thread Peter Dalgaard
mark salsburg [EMAIL PROTECTED] writes:

> Can someone please refer me to a function or method that resolves this
> structuring issue:
>
> I have two matrices with identical colnames (89), but varying number
> of observations:
>
> matrix A      matrix B
>
> 217 x 89      16063 x 89
>
> I want to create one matrix C that has both matrices adjacent to one
> another, where matrix A is duplicated many times to create the same
> row number for matrix B, i.e. 16063.
>
> matrixA  matrix B
> matrixA
> matrixA
>
> so matrix C will be 16063 x 178
>
> I've tried cbind() and merge() with no success..

A: What the !!##¤ does this have to do with the subject line?

B: This should do it:

cbind(A[rep(1:217,length=16063),], B)

-- 
   O__   Peter Dalgaard Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark  Ph:  (+45) 35327918
~~ - ([EMAIL PROTECTED])  FAX: (+45) 35327907



Re: [R] predict.lm - standard error of predicted means?

2005-07-20 Thread Prof Brian Ripley

On Wed, 20 Jul 2005, Peter Dalgaard wrote:


> mark salsburg [EMAIL PROTECTED] writes:
>
>> Can someone please refer me to a function or method that resolves this
>> structuring issue:
>>
>> I have two matrices with identical colnames (89), but varying number
>> of observations:
>>
>> matrix A      matrix B
>>
>> 217 x 89      16063 x 89
>>
>> I want to create one matrix C that has both matrices adjacent to one
>> another, where matrix A is duplicated many times to create the same
>> row number for matrix B, i.e. 16063.
>>
>> matrixA  matrix B
>> matrixA
>> matrixA
>>
>> so matrix C will be 16063 x 178
>>
>> I've tried cbind() and merge() with no success..
>
> A: What the !!##¤ does this have to do with the subject line?
>
> B: This should do it:
>
> cbind(A[rep(1:217,length=16063),], B)


But note that makes 74 + 5/217 copies of A, and I did wonder if that was 
the intention (or if not, what was intended).
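
To see what the recycling does, here is the same idea on toy matrices
(3 and 7 rows instead of 217 and 16063). The matrices and the names 'idx'
and 'C' are made up for illustration.

```r
A <- matrix(1:6,     nrow = 3, dimnames = list(NULL, c("a1", "a2")))
B <- matrix(101:114, nrow = 7, dimnames = list(NULL, c("b1", "b2")))

# Row indices of A, recycled until there are as many as B has rows
idx <- rep(seq_len(nrow(A)), length.out = nrow(B))
idx   # 1 2 3 1 2 3 1

C <- cbind(A[idx, ], B)
dim(C)   # 7 4

# A non-zero remainder means the last copy of A is incomplete
nrow(B) %% nrow(A)   # 1
16063 %% 217         # 5, i.e. 74 full copies plus 5 rows
```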


--
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

Re: [R] predict.lm with (logical) NA vector

2003-11-10 Thread Prof Brian Ripley
On Mon, 10 Nov 2003, Edzer J. Pebesma wrote:

> I was surprised by the following (R 1.8.0):
>
> R> lm.fit = lm(y~x, data.frame(x=1:10, y=1:10))
> R> predict(lm.fit, data.frame(x = rep(NA, 10)))
>              1              2              3              4              5
> -1.060998e-314 -1.060998e-314 -1.060998e-314 -1.060998e-314 -1.060998e-314
>              6              7              8              9             10
>       0.00e+00  1.406440e-269  6.715118e-265  4.940656e-323  1.782528e-265
> R> predict(lm.fit, data.frame(x = as.numeric(rep(NA, 10))))
>  1  2  3  4  5  6  7  8  9 10
> NA NA NA NA NA NA NA NA NA NA
>
> shouldn't the first predict() call return NA's, or else issue an error
> message?

The prediction methods do not in general check that new variables you give 
are of the correct type: the type used in the fit is not recorded in the 
model object.  In this case a logical column will `work' provided it has 
two values (even with NAs).  We can probably trap this exact case, but 
there will remain a lot of scope for user error.
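
Until such a check exists, the type can be verified by hand before calling
predict(). A minimal sketch: 'all_numeric' is a hypothetical helper written
for this note, not a function from R, and the data frame names are
assumptions. A bare NA is logical, whereas NA_real_ is a numeric missing
value.

```r
fit <- lm(y ~ x, data.frame(x = 1:10, y = 1:10))

nd_logical <- data.frame(x = rep(NA, 10))         # NA on its own is logical
nd_numeric <- data.frame(x = rep(NA_real_, 10))   # explicitly numeric NA

class(nd_logical$x)   # "logical"
class(nd_numeric$x)   # "numeric"

# Hypothetical helper: TRUE when every listed column of 'nd' is numeric
all_numeric <- function(nd, vars) {
  all(vapply(nd[vars], is.numeric, logical(1)))
}

vars <- all.vars(delete.response(terms(fit)))   # predictors used in the fit
all_numeric(nd_logical, vars)   # FALSE
all_numeric(nd_numeric, vars)   # TRUE

# With a numeric NA column the predictions are NA, as expected
p <- predict(fit, nd_numeric)
all(is.na(p))   # TRUE
```

The helper only covers models whose predictors are all numeric; a model
with factors would need a per-column comparison instead.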

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595
