I have a situation where lagged values of a time-series are used to
predict future values. I have packed together the time-series and the
lagged values into a data frame:

> str(D)
'data.frame':   191 obs. of  13 variables:
 $ y    : num  -0.21 -2.28 -2.71 2.26 -1.11 1.71 2.63 -0.45 -0.11 4.79 ...
 $ y.l1 : num  NA -0.21 -2.28 -2.71 2.26 -1.11 1.71 2.63 -0.45 -0.11 ...
 $ y.l2 : num  NA NA -0.21 -2.28 -2.71 2.26 -1.11 1.71 2.63 -0.45 ...
 $ y.l3 : num  NA NA NA -0.21 -2.28 -2.71 2.26 -1.11 1.71 2.63 ...
 $ y.l4 : num  NA NA NA NA -0.21 -2.28 -2.71 2.26 -1.11 1.71 ...
 $ y.l5 : num  NA NA NA NA NA -0.21 -2.28 -2.71 2.26 -1.11 ...
 $ y.l6 : num  NA NA NA NA NA NA -0.21 -2.28 -2.71 2.26 ...
 $ y.l7 : num  NA NA NA NA NA NA NA -0.21 -2.28 -2.71 ...
 $ y.l8 : num  NA NA NA NA NA NA NA NA -0.21 -2.28 ...
 $ y.l9 : num  NA NA NA NA NA NA NA NA NA -0.21 ...
 $ y.l10: num  NA NA NA NA NA NA NA NA NA NA ...
 $ y.l11: num  NA NA NA NA NA NA NA NA NA NA ...
 $ y.l12: num  NA NA NA NA NA NA NA NA NA NA ...
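For anyone who wants to reproduce this layout, here is a minimal base-R sketch of how such a frame could be built. The helper make_lags() is hypothetical (not necessarily how D was actually made). It assumes the rows to be forecast are represented by NA values of y appended to the end of the series, and that k is smaller than the series length.

```r
# Hypothetical helper: data frame holding y and its first k lags,
# with leading NAs where a lag reaches back before the start of the series.
make_lags <- function(y, k = 12) {
  n <- length(y)
  stopifnot(k < n)
  D <- data.frame(y = y)
  for (j in seq_len(k)) {
    # lag j: shift y down by j positions, padding the top with j NAs
    D[[paste0("y.l", j)]] <- c(rep(NA, j), y[seq_len(n - j)])
  }
  D
}

# Small example using the first five values of y from str(D)
L <- make_lags(c(-0.21, -2.28, -2.71, 2.26, -1.11), k = 2)
L
```

Under the assumption above, something like make_lags(c(y.observed, rep(NA, 12)), k = 12), with y.observed a hypothetical vector of the 179 in-sample values, would give the same shape as D.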

I have:

> insample <- 1:179
> outsample <- 180:191

To help you see what is going on:

> D[outsample,]
     y y.l1 y.l2 y.l3 y.l4 y.l5 y.l6 y.l7 y.l8  y.l9 y.l10 y.l11 y.l12
180 NA 8.81 8.53 5.68 5.97 9.75 7.20 7.63 4.73 12.24 10.76  8.13  9.82
181 NA   NA 8.81 8.53 5.68 5.97 9.75 7.20 7.63  4.73 12.24 10.76  8.13
182 NA   NA   NA 8.81 8.53 5.68 5.97 9.75 7.20  7.63  4.73 12.24 10.76
183 NA   NA   NA   NA 8.81 8.53 5.68 5.97 9.75  7.20  7.63  4.73 12.24
184 NA   NA   NA   NA   NA 8.81 8.53 5.68 5.97  9.75  7.20  7.63  4.73
185 NA   NA   NA   NA   NA   NA 8.81 8.53 5.68  5.97  9.75  7.20  7.63
186 NA   NA   NA   NA   NA   NA   NA 8.81 8.53  5.68  5.97  9.75  7.20
187 NA   NA   NA   NA   NA   NA   NA   NA 8.81  8.53  5.68  5.97  9.75
188 NA   NA   NA   NA   NA   NA   NA   NA   NA  8.81  8.53  5.68  5.97
189 NA   NA   NA   NA   NA   NA   NA   NA   NA    NA  8.81  8.53  5.68
190 NA   NA   NA   NA   NA   NA   NA   NA   NA    NA    NA  8.81  8.53
191 NA   NA   NA   NA   NA   NA   NA   NA   NA    NA    NA    NA  8.81

Now this works nicely:

> library(rpart)
> predict(rpart(y ~ ., D[insample,], na.action=na.omit), newdata=D[outsample,])
     180      181      182      183      184      185      186      187 
7.551724 7.551724 7.551724 7.551724 7.551724 7.551724 7.551724 6.057636 
     188      189      190      191 
6.057636 6.057636 6.057636 6.057636 

But when I try to do:

> library(randomForest)
> predict(randomForest(y ~ ., D[insample,], na.action=na.omit),
+         newdata=D[outsample,])
[1] 7.71523

Instead of a vector of twelve predictions, I get only one. Is it the
case that randomForest doesn't handle missing values in newdata? Is
there anything I can do about it?

Further, when I try to do this:

> library(e1071)
> predict(svm(y ~ ., D[insample,], na.action=na.omit), newdata=D[outsample,])
Error in `names<-.default`(`*tmp*`, value = c("180", "181", "182", "183",  : 
      'names' attribute [12] must be the same length as the vector [0]
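One plausible explanation, which I have not checked against the package sources: rpart sends observations with missing predictors down the tree using surrogate splits, while the randomForest and svm predict methods seem to drop incomplete rows of newdata. In D[outsample,] only row 180 has all twelve lags, and y is NA in every row. A method that checks only the predictors would then keep one row (one prediction), and one that also checks y would keep none (a zero-length vector, as in the svm error). The toy data below are hypothetical and only mimic that pattern with three lags.

```r
# Hypothetical toy data: 6 observed values, then 3 NA placeholders to forecast
y <- c(1.2, -0.4, 0.8, 2.1, -1.0, 0.5, NA, NA, NA)
k <- 3
D <- data.frame(y = y)
for (j in seq_len(k)) {
  # lag j of y, padded with j leading NAs
  D[[paste0("y.l", j)]] <- c(rep(NA, j), y[seq_len(length(y) - j)])
}
outsample <- 7:9

# Rows usable when only the predictors (the lag columns) must be non-missing
ok_predictors <- complete.cases(D[outsample, -1])
# Rows usable when the response column y must be non-missing too
ok_all <- complete.cases(D[outsample, ])
ok_predictors   # TRUE FALSE FALSE: only the first forecast row has every lag
ok_all          # FALSE FALSE FALSE: y is NA in every forecast row
```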

Any idea how I should approach this? Is there a generic interface to
the wide range of statistical tools for doing prediction?
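One possible approach, offered as an assumption rather than a recipe from any package: forecast recursively, one step at a time, feeding each prediction back into the lag columns of the following rows, so that every row of newdata is complete when it is predicted. The hypothetical helper recursive_forecast() relies only on the S3 generic predict(), so in principle any fitting function called as f(formula, data) could be plugged in (rpart, randomForest, svm). It assumes outsample is a run of consecutive rows and that the lags are named y.l1, y.l2, ..., as in D. lm is used in the toy usage only so that the example runs with base R.

```r
# Hypothetical helper: recursive (iterated) multi-step forecasting.
# fit_fun is any function(formula, data) whose result works with predict(model, newdata).
recursive_forecast <- function(D, insample, outsample, fit_fun) {
  # Fit on the complete in-sample rows only
  model <- fit_fun(y ~ ., na.omit(D[insample, ]))
  lag_cols <- grep("^y\\.l[0-9]+$", names(D), value = TRUE)
  k <- length(lag_cols)
  h <- length(outsample)
  preds <- numeric(h)
  for (i in seq_len(h)) {
    # By now every lag of row outsample[i] is filled; pass the lags only, not y
    newrow <- D[outsample[i], lag_cols, drop = FALSE]
    preds[i] <- predict(model, newdata = newrow)
    # Feed the forecast into lag j of the row j steps ahead (assumes consecutive rows)
    for (j in seq_len(min(k, h - i))) {
      D[outsample[i + j], paste0("y.l", j)] <- preds[i]
    }
  }
  setNames(preds, outsample)
}

# Toy usage: a noiseless AR(1), y[t] = 1 + 0.5 * y[t-1], with 3 NA rows to forecast
y <- numeric(20)
for (t in 2:20) y[t] <- 1 + 0.5 * y[t - 1]
y <- c(y, rep(NA, 3))
D <- data.frame(y = y, y.l1 = c(NA, head(y, -1)))
fc <- recursive_forecast(D, insample = 1:20, outsample = 21:23, fit_fun = lm)
fc
```

Passing only the lag columns as newdata may also keep the all-NA y column away from predict methods that apply na.omit to the whole of newdata. Fitting a separate model per forecast horizon would be a different design choice with its own trade-offs.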

-- 
Ajay Shah                                      http://www.mayin.org/ajayshah  
[EMAIL PROTECTED]                             http://ajayshahblog.blogspot.com
<*(:-? - wizard who doesn't know the answer.

______________________________________________
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
