[R] Faster way to zero-pad a data frame...?
Hello List,

I am working on creating periodograms from IP network traffic logs using the Fast Fourier Transform. The FFT requires all the data points to be evenly spaced in the time domain (constant delta-T), so I have a step where I zero-pad the data. Lately I've been wondering if there is a faster way to do this. Here's what I've got:

* data1 is a data frame consisting of a timestamp, in seconds from the beginning of the network log, and the number of network events that fell on that timestamp. Example:

time,events
0,1
1,30
5,14
10,4

* data2 is the zero-filled data frame. It has length equal to the greatest value of time in data1:

time,events
1,0
2,0
3,0
4,0
5,0
6,0
7,0
8,0
9,0
10,0

So I run this for loop:

for (i in 1:nrow(data1)) {
    data2[data1[i, 1], 2] <- data1[i, 2]
}

which goes to each row in data1, reads the timestamp, and writes the events to the corresponding row in data2. The result is:

time,events
0,1
1,30
2,0
3,0
4,0
5,14
6,0
7,0
8,0
9,0
10,4

For a 24-hour log (86,400 seconds) this can take a while. Any advice on how to speed it up would be appreciated.

Thanks,
Pete Cap

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Faster way to zero-pad a data frame...?
How about starting your time from 1 instead of 0 to make indexing easier (you can always subtract one later)? If so:

> x
  time events
1    1      1
2    2     30
3    6     14
4   11      4

> y <- data.frame(time = seq(max(x$time)), events = rep(0, max(x$time)))
> y
   time events
1     1      0
2     2      0
3     3      0
4     4      0
5     5      0
6     6      0
7     7      0
8     8      0
9     9      0
10   10      0
11   11      0

> y$events[x$time] <- x$events
> y
   time events
1     1      1
2     2     30
3     3      0
4     4      0
5     5      0
6     6     14
7     7      0
8     8      0
9     9      0
10   10      0
11   11      4

On 5/30/06, Pete Cap [EMAIL PROTECTED] wrote:
> [quoted original message snipped]

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390 (Cell)
+1 513 247 0281 (Home)

What is the problem you are trying to solve?
Re: [R] Faster way to zero-pad a data frame...?
Try this:

Lines <- "time,events
0,1
1,30
5,14
10,4"

library(zoo)
data1 <- read.zoo(textConnection(Lines), header = TRUE, sep = ",")
data2 <- as.ts(data1)
data2[is.na(data2)] <- 0  # omit this line if NAs in the extra positions are ok

On 5/30/06, Pete Cap [EMAIL PROTECTED] wrote:
> [quoted original message snipped]
Re: [R] Faster way to zero-pad a data frame...?
Why not something simple like:

# Toy example:
data1 <- data.frame(time = c(0, 1, 5, 10), events = c(1, 30, 14, 4))
data2 <- rep(0, 11)
# Or more generally
data2 <- rep(0, 1 + max(data1$time))

# You don't need a for loop!  Use the indexing capabilities of R!
data2[data1$time + 1] <- data1$events  # The ``+1'' is to allow for 0-origin.
data2 <- ts(data2, start = 0)

???

cheers,

Rolf Turner
[EMAIL PROTECTED]
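Tying the replies back to the original goal: once the zero-filled vector exists, the periodogram itself is one call away via stats::spec.pgram(), which uses the FFT internally. A hedged sketch, not from the thread, reusing the toy data and the vectorized fill from the message above:

```r
# Build the zero-filled, evenly-spaced series (one sample per second).
data1 <- data.frame(time = c(0, 1, 5, 10), events = c(1, 30, 14, 4))
data2 <- rep(0, 1 + max(data1$time))
data2[data1$time + 1] <- data1$events        # vectorized fill, no loop
x <- ts(data2, start = 0, frequency = 1)

# Periodogram via the FFT; plot = FALSE returns frequencies and spectrum.
p <- spec.pgram(x, plot = FALSE)
head(cbind(freq = p$freq, spec = p$spec))
```

For a full 24-hour log, replacing the row-by-row loop with the single indexed assignment above is the part that matters: both the fill and the FFT are effectively instantaneous on 86,400 points.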