That is great! Thank you very much.

Ben
On Thu, Mar 1, 2012 at 2:57 PM, Petr Savicky <savi...@cs.cas.cz> wrote:

> On Thu, Mar 01, 2012 at 02:31:01PM -0700, Ben quant wrote:
> > Hello,
> >
> > My direct desire is a good (fast) way to fill values forward until there is
> > another value, then fill that value forward in the data xx (at the bottom of
> > this email). For example, row 1 to row 45 should be NA (no change),
> > but from row 46 to row 136 the value should be 12649, and from row 137 to the
> > next value it should be 13039.00. The last line of code is all you need for
> > this part.
> >
> > If you are so inclined, my goal is this: I want to create a weekly time
> > series out of some data based on the report date. The report date is 'rd'
> > below, and is the correct date for the time series. My idea (in part seen
> > below) is to align rd and ua via the incorrect date (the time series date),
> > then merge that using the report date (rd) and a daily series of dates (dt),
> > so I capture all of the dates. That gets the data in the right start
> > period. I've done all of this so far below and it looks fine. Then I plan
> > to roll all of those values forward to the next value (see question above),
> > then I'll do something like this:
> >
> > xx[weekdays(xx[,1]) == "Friday",]
> >
> > ...to get a weekly series of Friday values. I'm thinking someone probably
> > has a faster way of doing this. I have to do this many times, so speed is
> > important. Thanks!
> >
> > Here is what I have done so far:
> >
> > dt <- seq(from =as.Date("2009-06-01"), to = Sys.Date(), by = "day")
> >
> > > nms
> >  [1] "2009-06-30" "2009-09-30" "2009-12-31" "2010-03-31" "2010-06-30"
> >  [6] "2010-09-30" "2010-12-31" "2011-03-31" "2011-06-30" "2011-09-30"
> > [11] "2011-12-31"
> >
> > > rd
> >   2009-06-30   2009-09-30   2009-12-31   2010-03-31   2010-06-30
> > "2009-07-16" "2009-10-15" "2010-01-19" "2010-04-19" "2010-07-19"
> >   2010-09-30   2010-12-31   2011-03-31   2011-06-30   2011-09-30
> > "2010-10-18" "2011-01-18" "2011-04-19" "2011-07-18" "2011-10-17"
> >   2011-12-31
> > "2012-01-19"
> >
> > > ua
> > 2009-06-30 2009-09-30 2009-12-31 2010-03-31 2010-06-30 2010-09-30
> >   12649.00   13039.00   13425.00   13731.00   14014.00   14389.00
> > 2010-12-31 2011-03-31 2011-06-30 2011-09-30 2011-12-31
> >   14833.00   15095.00   15481.43   15846.43   16186.43
> >
> > > x = merge(ua,rd,by='row.names')
> > > names(x) = c('z.date','val','rt_date')
> > > xx = merge(dt,x,by.y= 'rt_date',by.x=1,all.x=T)
> > > xx
> >              x     z.date      val
> > 1   2009-06-01       <NA>       NA
> > 2   2009-06-02       <NA>       NA
> > 3   2009-06-03       <NA>       NA
> > 4   2009-06-04       <NA>       NA
> > 5   2009-06-05       <NA>       NA
> >
> > ...etc....
> >
> > 36  2009-07-06       <NA>       NA
> > 37  2009-07-07       <NA>       NA
> > 38  2009-07-08       <NA>       NA
> > 39  2009-07-09       <NA>       NA
> > 40  2009-07-10       <NA>       NA
> > 41  2009-07-11       <NA>       NA
> > 42  2009-07-12       <NA>       NA
> > 43  2009-07-13       <NA>       NA
> > 44  2009-07-14       <NA>       NA
> > 45  2009-07-15       <NA>       NA
> > 46  2009-07-16 2009-06-30    12649
> > 47  2009-07-17       <NA>       NA
> > 48  2009-07-18       <NA>       NA
> > 49  2009-07-19       <NA>       NA
> > 50  2009-07-20       <NA>       NA
> > 51  2009-07-21       <NA>       NA
> > 52  2009-07-22       <NA>       NA
> > 53  2009-07-23       <NA>       NA
> > 54  2009-07-24       <NA>       NA
> > 55  2009-07-25       <NA>       NA
> > 56  2009-07-26       <NA>       NA
> > 57  2009-07-27       <NA>       NA
> > 58  2009-07-28       <NA>       NA
> >
> > ...etc....
> >
> > 129 2009-10-07       <NA>       NA
> > 130 2009-10-08       <NA>       NA
> > 131 2009-10-09       <NA>       NA
> > 132 2009-10-10       <NA>       NA
> > 133 2009-10-11       <NA>       NA
> > 134 2009-10-12       <NA>       NA
> > 135 2009-10-13       <NA>       NA
> > 136 2009-10-14       <NA>       NA
> > 137 2009-10-15 2009-09-30 13039.00
> > 138 2009-10-16       <NA>       NA
> > 139 2009-10-17       <NA>       NA
> > 140 2009-10-18       <NA>       NA
> > 141 2009-10-19       <NA>       NA
> > 142 2009-10-20       <NA>       NA
> > 143 2009-10-21       <NA>       NA
>
> Hi.
>
> Try first the following simpler version.
>
>   # an input vector
>   x <- rep(NA, times=20)
>   x[4] <- "A"
>   x[9] <- "B"
>   x[17] <- "C"
>
>   # extending the values forward
>   values <- c(NA, x[!is.na(x)])
>   ind <- cumsum(!is.na(x)) + 1
>   y <- values[ind]
>
>   # compare with the original
>   cbind(x, y)
>
>         x   y
>    [1,] NA  NA
>    [2,] NA  NA
>    [3,] NA  NA
>    [4,] "A" "A"
>    [5,] NA  "A"
>    [6,] NA  "A"
>    [7,] NA  "A"
>    [8,] NA  "A"
>    [9,] "B" "B"
>   [10,] NA  "B"
>   [11,] NA  "B"
>   [12,] NA  "B"
>   [13,] NA  "B"
>   [14,] NA  "B"
>   [15,] NA  "B"
>   [16,] NA  "B"
>   [17,] "C" "C"
>   [18,] NA  "C"
>   [19,] NA  "C"
>   [20,] NA  "C"
>
> This could be applied directly to the last two columns of your
> data frame "xx". However, it may be more natural to obtain the
> vector "values" from the input data and not from their sparse
> form, which is the data frame. Also, the logical vector !is.na(x)
> is the same for the last two columns of your data frame, so
> it may be computed only once.
>
> Hope this helps.
>
> Petr Savicky.
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
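
For the archives, here is a minimal sketch of how the cumsum indexing from the quoted reply might be applied to a data frame shaped like xx, computing the !is.na() mask once and reusing it for both columns. The helper name fill_forward, the shortened date range, and the two-row stand-in for rd and ua are assumptions made for illustration, not code from the thread. The Friday filter uses as.POSIXlt()$wday instead of weekdays() == "Friday" so that it does not depend on the locale; that is a swapped-in variant, not the original line.

```r
# Hypothetical helper: carry each non-NA value forward until the next one.
# 'present' marks the rows that hold a real value; it can be passed in so
# that it is computed only once for several columns.
fill_forward <- function(v, present = !is.na(v)) {
  values <- c(NA, v[present])     # leading NA covers rows before the first value
  values[cumsum(present) + 1]     # index 1 (NA) until the first value appears
}

# Stand-in inputs: only the first two report dates and values from the thread.
rd <- as.Date(c("2009-07-16", "2009-10-15"))   # report dates
z  <- c("2009-06-30", "2009-09-30")            # time series dates
ua <- c(12649, 13039)                          # values

# Shortened daily series (the thread starts at 2009-06-01 and runs to today).
dt <- seq(from = as.Date("2009-07-14"), to = as.Date("2009-10-16"), by = "day")

# Sparse data frame with the same columns as xx in the thread.
idx <- match(dt, rd)                           # NA where dt is not a report date
xx <- data.frame(x = dt, z.date = z[idx], val = ua[idx],
                 stringsAsFactors = FALSE)

# The mask is identical for both columns, so compute it once.
present   <- !is.na(xx$val)
xx$z.date <- fill_forward(xx$z.date, present)
xx$val    <- fill_forward(xx$val, present)

# Weekly series of Friday rows; wday counts from 0 = Sunday, so 5 = Friday.
weekly <- xx[as.POSIXlt(xx$x)$wday == 5, ]
print(head(weekly))
```

Because the whole fill is one cumsum and one vector index, there is no explicit loop over rows, which should help when this has to be repeated many times.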