I have a time series with an interval of 2.5 minutes, or 24 observations per 
hour.  I am trying to find a 1 hour moving average, looking backward, so 
that the moving average at n = mean(n-23 : n).

The time series has about 1.5 million rows, with occasional gaps due to 
poor data quality.  I only want to take a 1 hour moving average for those 
periods that are complete, i.e. have 24 observations in the previous hour.

The data is in 3 columns:

Value   DateTime        interval

For example:
Value <- rnorm(100, 50, 3)  # my data has 1.5 million rows; using 100 here for a simple example
# time steps 1:50 at 150 second intervals
DateTime <- seq(from = 915148800, to = 915156150, by = 150)
DateTime[51] <- 915156450  # skip one time step
# resume time steps of 150 seconds
DateTime[52:100] <- seq(from = 915156600, to = 915163800, by = 150)
 
x <- cbind (Value, DateTime)
x <- as.data.frame(x)
x$DateTime <-as.POSIXct(x$DateTime, origin="1970-01-01", tz="GMT")  
x1 <- x[-c(1:23), ]   # trimming x to create direct comparison of DateTimes in x and x1
x[,3] <- difftime(x1[,2], x[,2], units="mins")  # ignore warning message
colnames(x)[3] <- "interval"
# set interval to be the number of minutes between n-23 and n
x[24:nrow(x),3] <- x[1:(nrow(x)-23),3]
#   57.5 indicates no gaps in the previous hour up to and including n
#   >57.5 indicates a gap in the previous n-23 rows
x[1:23,3] <- 0/0  # NaN assigned to first 23 rows so as not to take average of first hour
# as expected, rows 51:73 indicate a gap, i.e. interval > 57.5
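
The same gap check can probably be done in one step with diff() and a lag of 23, without the trimmed copy x1 or the recycling warning. This is only a sketch: `secs`, `interval` and `complete` are names made up for the illustration, and `secs` stands in for the DateTime column as seconds since 1970.

```r
# Same 100 time stamps as the example above, kept as plain seconds
secs <- c(seq(from = 915148800, to = 915156150, by = 150),
          915156450,
          seq(from = 915156600, to = 915163800, by = 150))

# minutes between row n-23 and row n; the first 23 rows have no full hour
interval <- c(rep(NA, 23), diff(secs, lag = 23) / 60)

# TRUE where the previous hour (rows n-23 to n) has no gap
complete <- !is.na(interval) & interval == 57.5

# rows with a full window of 24 rows whose span is longer than 57.5 minutes
gap_rows <- which(!is.na(interval) & !complete)
```

On the example data, gap_rows should come out as rows 51 to 73, matching the interval column built above.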

index <- which (x [,3] == 57.5)  #which rows have no gaps in previous hour

# loop to calculate 1 hour moving average, only for periods which are complete
for (i in seq_along(index)) {
    # mean of the 24 Values in rows n-23 through n
    x[index[i], 4] <- mean(x[(index[i]-23):index[i], 1])
}

This loop runs fine on this simple example, but it takes a VERY long time 
to run on x with 1.5 million rows: over 1 hour running and still not 
complete.  I also tried increasing memory.limit to 4095, but it is still 
very slow.
 
Any suggestions to make this run faster?  I thought about using the lag 
function but could not get it to work.
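
For reference, one vectorized direction that might avoid the loop altogether is a cumulative-sum difference, masked wherever the previous hour is incomplete. This is a rough sketch under my own assumptions, not a tested solution for the full data: `rolling_mean_complete` and its arguments are hypothetical names, and `secs` is the DateTime column as seconds.

```r
# Backward-looking moving average over k rows, NA where the window has a gap.
# value: numeric vector; secs: time stamps in seconds; step: expected spacing.
rolling_mean_complete <- function(value, secs, k = 24, step = 150) {
  n  <- length(value)
  cs <- c(0, cumsum(value))
  ma <- rep(NA_real_, n)
  # sum of rows (j-k+1):j is cs[j+1] - cs[j-k+1], for j = k..n
  ma[k:n] <- (cs[(k + 1):(n + 1)] - cs[1:(n - k + 1)]) / k
  # seconds spanned by rows j-k+1 to j; a complete hour spans (k-1)*step
  span <- c(rep(NA, k - 1), diff(secs, lag = k - 1))
  ma[is.na(span) | span != (k - 1) * step] <- NA
  ma
}

# Example data shaped like the example above
set.seed(1)
Value <- rnorm(100, 50, 3)
secs  <- c(seq(from = 915148800, to = 915156150, by = 150),
           915156450,
           seq(from = 915156600, to = 915163800, by = 150))
ma <- rolling_mean_complete(Value, secs)
```

Everything here is a whole-vector operation, so it should scale to 1.5 million rows far better than assigning into the data frame one row at a time. stats::filter(Value, rep(1/24, 24), sides = 1) should give the same unmasked averages if the cumulative sum is a concern for rounding.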


 
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
