On Mar 10, 2011, at 8:23 AM, Benjamin Stier wrote:

Hello list!

I have a data.frame which looks like this:
serv
                  datum op.read op.write   read   write
1   2011-01-29 10:00:00       0        0      0       0
2   2011-01-29 10:00:01       0        0      0       0
3   2011-01-29 10:00:02       0        0      0       0
4   2011-01-29 10:00:03       0        4      0  647168
5   2011-01-29 10:00:04       0        0      0       0
6   2011-01-29 10:00:05       0       14      0 1960837
7   2011-01-29 10:00:06       0        0      0       0
...
115 2011-01-30 10:00:54       0        0      0       0
116 2011-01-30 10:00:55       0        0      0       0
117 2011-01-30 10:00:56       0        0      0       0
118 2011-01-30 10:00:57      54        0  29184       0
119 2011-01-30 10:00:58     204        0 122880       0
120 2011-01-30 10:00:59       0        0      0       0
...

I want to compare read/write from each day. I already have a solution, but it
is pretty slow.

See if this is any faster:
> aggregate(serv[, c("read", "write")], list(format(serv$datum, "%Y-%m-%d")), sum)
     Group.1    read    write
1 2011-01-29 1021439 11726356
2 2011-01-30 1089534  4634910
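
For anyone without the attachment, here is a minimal self-contained sketch of the same idea. The data below are made up (random counts, one row per second for one minute on each of two days), the UTC time zone is an assumption, and rowsum() is shown only as an alternative that gives the same sums; it is not part of the reply above.

```r
# Hypothetical stand-in for `serv`: two days, 60 rows each, random counts.
set.seed(1)
datum <- c(
  seq(as.POSIXct("2011-01-29 10:00:00", tz = "UTC"), by = "1 sec", length.out = 60),
  seq(as.POSIXct("2011-01-30 10:00:00", tz = "UTC"), by = "1 sec", length.out = 60)
)
serv <- data.frame(
  datum = datum,
  read  = rpois(120, 1000),
  write = rpois(120, 5000)
)

# One grouping key per row: the calendar day as a string.
day <- format(serv$datum, "%Y-%m-%d")

# Sum read and write within each day.
by_day <- aggregate(serv[, c("read", "write")], list(day = day), sum)
print(by_day)

# Alternative: rowsum() computes the same grouped sums,
# with the days as row names of the result.
by_day2 <- rowsum(serv[, c("read", "write")], group = day)
print(by_day2)
```

Both calls make a single pass over the data keyed on `day`, so neither needs a per-day subset of the data frame.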


# read the data
serv <- read.delim("cut.inp")

# Reformat the dates from the file
serv$datum <- strptime(serv$datum,  "%Y-%m-%d %H:%M:%S")

# select all single days
dates.serv <- unique(strptime(serv$datum, format="%Y-%m-%d"))

# create a data.frame
values <- data.frame(row.names=1, datum=numeric(0), write=numeric(0), read=numeric(0))
for(i in as.character(dates.serv)) {
       # build the start and end of the day range
       searchstart <- as.POSIXlt(paste(i, "00:00:00", sep=" "))
       searchend <- as.POSIXlt(paste(i, "23:59:59", sep=" "))
       # select all values from a specific day
       day <- serv[(serv$datum >= searchstart & serv$datum <= searchend),]
       write <- as.numeric(sum(as.numeric(day$write)))
       read <- as.numeric(sum(as.numeric(day$read)))
       # add to the data.frame
       values <- rbind(values, data.frame(datum=i, write=write, read=read))
}

This is my first try using R for statistics so I'm sure this isn't the best
solution.
The for loop does its job but, as I said, it is really slow. My data covers 21
days with 1 line per second.
Is there a better way to select the date ranges than a for loop? The line
where I select all values for "day" seems to be the most expensive. Any ideas?
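
A likely reason the selection line is expensive: on every pass it compares the whole POSIXlt column against two bounds, and rbind() then copies the growing result. One way around both, sketched here under assumptions (the four-row sample is made up, and tz = "UTC" is a guess at the time zone), is to compute the day of each row once and let tapply() do the per-day sums.

```r
# Made-up sample in the same shape as `serv`; the timestamps span midnight.
datum <- as.POSIXct("2011-01-29 23:59:58", tz = "UTC") + 0:3
serv <- data.frame(datum = datum,
                   read  = c(1, 2, 3, 4),
                   write = c(10, 20, 30, 40))

# Compute the calendar day of every row once, instead of once per loop pass.
day <- as.Date(serv$datum, tz = "UTC")

# Sum each column within each day.
read_by_day  <- tapply(serv$read,  day, sum)
write_by_day <- tapply(serv$write, day, sum)

# Assemble a result shaped like the `values` data.frame built by the loop.
values <- data.frame(datum = names(read_by_day),
                     write = as.numeric(write_by_day),
                     read  = as.numeric(read_by_day),
                     stringsAsFactors = FALSE)
print(values)
# 2011-01-29: write 30, read 3
# 2011-01-30: write 70, read 7
```

If the timestamps are in local time, pass that zone to as.Date() so that rows near midnight land on the intended day.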

Kind regards,

Benjamin

PS: I attached some sample data, in case you want to try for yourself.
<cut.inp>
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
West Hartford, CT
