I have a situation where lagged values of a time-series are used to predict future values. I have packed together the time-series and the lagged values into a data frame:
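For anyone who wants to reproduce the layout, a data frame of this shape could be built roughly as follows. This is only a minimal sketch: the helper name `make_lags` is hypothetical, and the series is simulated rather than the actual data shown in the output further down.

```r
# Hypothetical helper (not part of any package): build a data frame holding a
# series and its first `p` lags, padded with `h` extra rows for the periods
# that are to be forecast.
make_lags <- function(y, p = 12, h = 12) {
  n <- length(y)
  y.ext <- c(y, rep(NA, h))              # future values of y are unknown
  D <- data.frame(y = y.ext)
  for (k in seq_len(p)) {
    # lag k: shift the series down by k rows, filling the top with NA
    D[[paste0("y.l", k)]] <- c(rep(NA, k), y.ext)[seq_len(n + h)]
  }
  D
}

set.seed(1)
y <- rnorm(179)                          # simulated stand-in for the real series
D <- make_lags(y)                        # 191 rows: 179 observed + 12 to forecast
insample  <- 1:179
outsample <- 180:191
```

With this construction, each successive row of the forecast block has one more missing lag, so `complete.cases(D[outsample, -1])` is a quick way to see how many out-of-sample rows have a full set of predictors.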

> str(D)
'data.frame':   191 obs. of  13 variables:
 $ y    : num  -0.21 -2.28 -2.71 2.26 -1.11 1.71 2.63 -0.45 -0.11 4.79 ...
 $ y.l1 : num  NA -0.21 -2.28 -2.71 2.26 -1.11 1.71 2.63 -0.45 -0.11 ...
 $ y.l2 : num  NA NA -0.21 -2.28 -2.71 2.26 -1.11 1.71 2.63 -0.45 ...
 $ y.l3 : num  NA NA NA -0.21 -2.28 -2.71 2.26 -1.11 1.71 2.63 ...
 $ y.l4 : num  NA NA NA NA -0.21 -2.28 -2.71 2.26 -1.11 1.71 ...
 $ y.l5 : num  NA NA NA NA NA -0.21 -2.28 -2.71 2.26 -1.11 ...
 $ y.l6 : num  NA NA NA NA NA NA -0.21 -2.28 -2.71 2.26 ...
 $ y.l7 : num  NA NA NA NA NA NA NA -0.21 -2.28 -2.71 ...
 $ y.l8 : num  NA NA NA NA NA NA NA NA -0.21 -2.28 ...
 $ y.l9 : num  NA NA NA NA NA NA NA NA NA -0.21 ...
 $ y.l10: num  NA NA NA NA NA NA NA NA NA NA ...
 $ y.l11: num  NA NA NA NA NA NA NA NA NA NA ...
 $ y.l12: num  NA NA NA NA NA NA NA NA NA NA ...

I have:

> insample <- 1:179
> outsample <- 180:191

To help you see what is going on:

> D[outsample,]
         y  y.l1  y.l2  y.l3  y.l4  y.l5  y.l6  y.l7  y.l8  y.l9 y.l10 y.l11 y.l12
180     NA  8.81  8.53  5.68  5.97  9.75  7.20  7.63  4.73 12.24 10.76  8.13  9.82
181     NA    NA  8.81  8.53  5.68  5.97  9.75  7.20  7.63  4.73 12.24 10.76  8.13
182     NA    NA    NA  8.81  8.53  5.68  5.97  9.75  7.20  7.63  4.73 12.24 10.76
183     NA    NA    NA    NA  8.81  8.53  5.68  5.97  9.75  7.20  7.63  4.73 12.24
184     NA    NA    NA    NA    NA  8.81  8.53  5.68  5.97  9.75  7.20  7.63  4.73
185     NA    NA    NA    NA    NA    NA  8.81  8.53  5.68  5.97  9.75  7.20  7.63
186     NA    NA    NA    NA    NA    NA    NA  8.81  8.53  5.68  5.97  9.75  7.20
187     NA    NA    NA    NA    NA    NA    NA    NA  8.81  8.53  5.68  5.97  9.75
188     NA    NA    NA    NA    NA    NA    NA    NA    NA  8.81  8.53  5.68  5.97
189     NA    NA    NA    NA    NA    NA    NA    NA    NA    NA  8.81  8.53  5.68
190     NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA  8.81  8.53
191     NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA  8.81

Now this works nicely:

> library(rpart)
> predict(rpart(y ~ ., D[insample,], na.action=na.omit), newdata=D[outsample,])
     180      181      182      183      184      185      186      187
7.551724 7.551724 7.551724 7.551724 7.551724 7.551724 7.551724 6.057636
     188      189      190      191
6.057636 6.057636 6.057636 6.057636

But when I try to do:

> library(randomForest)
> predict(randomForest(y ~ ., D[insample,], na.action=na.omit), newdata=D[outsample,])
[1] 7.71523

I don't seem to get a vector of twelve predictions; I only get one
prediction. Is it the case that randomForest doesn't like missing
data? Is there anything I can do about it?

Further, when I try to do this:

> library(e1071)
> predict(svm(y ~ ., D[insample,], na.action=na.omit), newdata=D[outsample,])
Error in `names<-.default`(`*tmp*`, value = c("180", "181", "182", "183", :
  'names' attribute [12] must be the same length as the vector [0]

Any idea how I should approach this? Is there a generic interface to
the wide range of statistical tools in doing prediction?

--
Ajay Shah                                      http://www.mayin.org/ajayshah
[EMAIL PROTECTED]                              http://ajayshahblog.blogspot.com
<*(:-? - wizard who doesn't know the answer.