If you have missing data in your data frame and want residuals for all observations, you need to use na.action=na.exclude, not the default na.omit.
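A minimal sketch of the difference, using made-up data (the data frame d and its columns are hypothetical stand-ins, not the poster's D.f). Under na.exclude the residual vector is padded with NA to one entry per row of the data frame, so a lag of one can be built by shifting the contents by hand:

```r
# Hypothetical toy data: rows 1 and 2 contain missing values.
set.seed(1)
d <- data.frame(x = c(NA, NA, 1:8), y = c(NA, 2, rnorm(8)))

m_omit <- lm(y ~ x, data = d)                          # default na.action = na.omit
m_excl <- lm(y ~ x, data = d, na.action = na.exclude)  # pad residuals with NA

length(residuals(m_omit))   # 8: complete cases only
length(residuals(m_excl))   # 10: one per row, NA where data were missing

# unname() drops the row labels so only the plain numbers remain.
e <- unname(residuals(m_excl))

# Lag by one by moving the contents: e_lag1[i] is e[i - 1].
e_lag1 <- c(NA, e[-length(e)])
```

Because e now lines up with the rows of d, e_lag1 could be added to the data frame as a column for a second-stage regression.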
As for lag, its description says

  Description:

     Compute a lagged version of a time series, shifting the time base
     back by a given number of observations.

and you don't have a time series. It works by shifting the time base
for a time series, not by moving the contents of a vector.

On Mon, 8 Mar 2004, Ajay Shah wrote:

> Folks,
>
> I'm most confused in trying to do something that (I thought) ought to be
> mainstream and straightforward R. :-) Could you please help?
>
> I am doing an ordinary linear regression. My goal is: After a
> regression, to make residuals, and make a new variable which is the
> lagged residuals (lagged by 1). I will use this variable in a 2nd
> stage regression (for an error-correcting model).
>
> This sounds simple and reasonable, and should be right up R's alley,
> but I am just not able to do this. Can I please show you the steps
> which I'm trying and failing in?
>
> I start with:
>
> > m = lm(NNDA ~ NFA + NFA.x.d1 + NFA.x.d2 + IIP.n + CRR, D.f)
> > e = residuals(m)
> > print(e)
>           34           35           36           37           38           39
>  -5073.24843  -4210.27886  -8218.01782  -1489.10583  -4426.11738 -11332.56052
> (lines deleted)
>           64           65           66           67           68           69
>   8362.93776   7564.14324   2311.41208   7660.00638  -1271.04645 -10917.29418
> (lines deleted)
>          160          161          162          163          164          165
>   3858.94591 -11783.04370 -21438.33646   1859.49628  -4988.82853 -25172.43241
>
> Here, the residuals only started at the 34th observation owing to
> missing data in my data frame. This is correct and sensible. The
> dataset is 167 observations, but 166 and 167 are also missing data and
> dropped.
>
> I tried to use lag(e,1) to make a new vector and failed. I think I am
> just not understanding the R concept of lag(). In my notion of a
> lagged vector, I want a vector f where f[35] is e[34], i.e. is the
> first residual above of -5073.24843. This is just not what I get by
> saying lag(e,1) - I am just not understanding lag(). I would be very
> happy if someone could educate me on how to utilise lag().
>
> Okay, I try to get my way in a different way:
>
> > print(T)
> [1] 167
> > f = numeric(T)
> > f[1] = NA
> > f[2:T] = e[1:(T-1)]
>
> This looks reasonable? I thought this should do the trick. I am
> hand-initialising a T-length vector with NA in the 1st elem, and I
> copy out the values of e[] from 1 till 166 into f[2:T]. I thought this
> should give me a lagged e. It doesn't --
>
> > print(f)
>   [1]           NA  -5073.24843  -4210.27886  -8218.01782  -1489.10583
> (lines deleted)
> [131]   1859.49628  -4988.82853 -25172.43241           NA           NA
> (lines deleted)
> [166]           NA           NA
>
> I thought "Okay, what seems to be happening is that the e[1] that I
> have is `actually' the e[34] of my thoughts". So I try:
>
> > f=rep(NA, T)                    # zap out f
> > f[35:T] = e[34:(T-1)]           # copy out useful stuff into 35..T
> > print(f)
>   [1]           NA           NA           NA           NA           NA
> (lines deleted)
>  [31]           NA           NA           NA           NA   7660.00638
>  [36]  -1271.04645 -10917.29418 -11111.60144  -1597.98355  -1066.01901
> (lines deleted)
> [131]   1859.49628  -4988.82853 -25172.43241           NA           NA
> (lines deleted)
> [166]           NA           NA
>
> This is wrong!!
>
> Recall (from upstairs) that e[34] was -5073.24843. That value seems to
> have mysteriously vanished. Instead, the first non-NA in f - which is
> f[35] - is 7660.00638, which (incidentally) was e[67]. I just don't
> know how that value got here. And, the values in f[] seem to peter out
> at 133! After 133, they are all NA until the end.
>
> I guess I'm _just_ not understanding what is the animal that is
> returned by residuals(lm()). I know I am missing something basic,
> because lots of people must be doing what I am trying: I.e. to run a
> regression, extract a residual, lag it, and use it for a 2nd stage
> regression.
>
> I know that the vector e (returned by residuals(lm())) is different
> from a simple vector, for when I say:
>
> > print(f[35])
> [1] 7660.006
> > print(e[35])
>        68
> -1271.046
>
> the two animals seem to be different. I don't understand e[35] - why
> is it not just a number - there seems to be some index tagging along?
> How do I get at the pure numbers of the residuals?
>
> Thanks much,
>
>         -ans.
>

-- 
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595