Folks, I'm quite confused trying to do something that (I thought) ought to be mainstream and straightforward in R. :-) Could you please help?
I am doing an ordinary linear regression. My goal is: after the regression, to extract the residuals and make a new variable holding the lagged residuals (lagged by 1). I will use this variable in a 2nd stage regression (for an error-correcting model). This sounds simple and reasonable, and should be right up R's alley, but I am just not able to do it. Can I please show you the steps I'm trying and failing at?

I start with:

  > m = lm(NNDA ~ NFA + NFA.x.d1 + NFA.x.d2 + IIP.n + CRR, D.f)
  > e = residuals(m)
  > print(e)
           34           35           36           37           38           39
  -5073.24843  -4210.27886  -8218.01782  -1489.10583  -4426.11738 -11332.56052
  (lines deleted)
           64           65           66           67           68           69
   8362.93776   7564.14324   2311.41208   7660.00638  -1271.04645 -10917.29418
  (lines deleted)
          160          161          162          163          164          165
   3858.94591 -11783.04370 -21438.33646   1859.49628  -4988.82853 -25172.43241

Here, the residuals only start at the 34th observation owing to missing data in my data frame. This is correct and sensible. The dataset has 167 observations, but 166 and 167 also have missing data and are dropped.

I tried to use lag(e,1) to make a new vector and failed. I think I am just not understanding the R concept of lag(). In my notion of a lagged vector, I want a vector f where f[35] is e[34], i.e. the first residual above, -5073.24843. This is just not what I get by saying lag(e,1). I would be very happy if someone could educate me on how to use lag().

Okay, I try to get my way in a different way:

  > print(T)
  [1] 167
  > f = numeric(T)
  > f[1] = NA
  > f[2:T] = e[1:(T-1)]

This looks reasonable? I thought this should do the trick. I am hand-initialising a T-length vector with NA in the 1st element, and I copy the values of e[] from 1 to 166 into f[2:T]. I thought this should give me a lagged e.
It doesn't:

  > print(f)
    [1]           NA  -5073.24843  -4210.27886  -8218.01782  -1489.10583
  (lines deleted)
  [131]   1859.49628  -4988.82853 -25172.43241           NA           NA
  (lines deleted)
  [166]           NA           NA

I thought "Okay, what seems to be happening is that the e[1] that I have is `actually' the e[34] of my thoughts". So I try:

  > f=rep(NA, T)             # zap out f
  > f[35:T] = e[34:(T-1)]    # copy out useful stuff into 35..T
  > print(f)
    [1]           NA           NA           NA           NA           NA
  (lines deleted)
   [31]           NA           NA           NA           NA   7660.00638
   [36]  -1271.04645 -10917.29418 -11111.60144  -1597.98355  -1066.01901
  (lines deleted)
  [131]   1859.49628  -4988.82853 -25172.43241           NA           NA
  (lines deleted)
  [166]           NA           NA

This is wrong!! Recall (from above) that e[34] was -5073.24843. That value seems to have mysteriously vanished. Instead, the first non-NA in f, which is f[35], is 7660.00638, which (incidentally) was printed above under the label 67. I just don't know how that value got here. And the values in f[] seem to peter out at 133! After 133, they are all NA until the end.

I guess I'm _just_ not understanding what kind of animal is returned by residuals(lm()). I know I am missing something basic, because lots of people must be doing what I am trying: i.e. to run a regression, extract the residuals, lag them, and use them in a 2nd stage regression.

I know that the vector e (returned by residuals(lm())) is different from a simple vector, for when I say:

  > print(f[35])
  [1] 7660.006
  > print(e[35])
         68
  -1271.046

the two animals seem to be different. I don't understand e[35]: why is it not just a number? There seems to be some index tagging along. How do I get at the pure numbers of the residuals?

Thanks much,

        -ans.

-- 
Ajay Shah                           Consultant
[EMAIL PROTECTED]                   Department of Economic Affairs
http://www.mayin.org/ajayshah       Ministry of Finance, New Delhi
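For what it's worth, the behaviour described in this post can be reproduced on made-up data. The sketch below is only an illustration under stated assumptions: the data frame `d` and its columns `x`, `y` and `e.lag1` are hypothetical stand-ins (not the poster's `D.f`), and it assumes the original fit used R's default `na.action` of `na.omit`. It shows that `residuals()` then returns a shorter, named vector whose names are the row labels of the rows actually used, that `na.exclude` is one way to get residuals padded with NA back to the full length, and that `stats::lag()` on a plain vector leaves the values in place and only shifts a time-series attribute.

```r
# Made-up data: 10 rows, with x missing in rows 1-3 and 10.
# (d, x, y are hypothetical names, not the poster's data.)
set.seed(1)
n <- 10
d <- data.frame(x = as.numeric(1:n), y = rnorm(n))
d$x[c(1:3, n)] <- NA

# With na.omit, rows with missing data are dropped, so residuals()
# is shorter than the data frame and is named by row label.
m <- lm(y ~ x, data = d, na.action = na.omit)
e <- residuals(m)
length(e)       # 6, not 10
names(e)        # "4" "5" "6" "7" "8" "9"
e[1]            # by position: the residual for row 4
e["4"]          # by name: the same element
unname(e[1])    # the bare number, without the row label

# With na.exclude, residuals() is padded with NA to full length,
# so position i lines up with row i of the data frame.
m2 <- lm(y ~ x, data = d, na.action = na.exclude)
e2 <- residuals(m2)
length(e2)      # 10

# Lag by one: f[i] holds e2[i - 1], with NA in the first slot.
f <- c(NA, e2[-n])
d$e.lag1 <- unname(f)

# stats::lag() on a plain vector keeps the values where they are and
# only attaches a shifted "tsp" (time-series) attribute.
l <- stats::lag(as.numeric(e2), 1)
attr(l, "tsp")
```

With the padded residuals, `d$e.lag1` can go straight into a second-stage `lm()` call on the same data frame; rows where the lagged residual is NA would simply be dropped by that fit.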