Re: [R] Selecting ranges of dates from a dataframe

Francisco Gochez Thu, 10 Mar 2011 09:59:54 -0800

Benjamin,

A more elegant "R-style" solution would be to use one of R's
"apply"/aggregation routines, of which there are many. For example, the "by"
function can split a data.frame by some factor/categorical variable(s), and
then apply a function to each "slice".  The result can then be pieced back
together.  See below for an example in which this factor is simply a
parallel vector of pure dates:


# extract the date part (YYYY-MM-DD) of each timestamp
dates <- format(serv$datum, "%Y-%m-%d")

# write an auxiliary function to aggregate a "slice" of the data.frame
# x will be a "slice" of data from a single day
aggregateDf <- function(x)
{
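    # return a one-row data.frame; as.numeric() avoids integer overflow
    # when a day's total exceeds .Machine$integer.max
    data.frame(datum = format(x$datum[1], "%Y-%m-%d"),
               write = sum(as.numeric(x$write)),
               read = sum(as.numeric(x$read)))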
}

# now process each "slice" of the serv data.frame using "by"
splitVals <- by(serv, dates, aggregateDf )

# bind back into a single data.frame
values <- do.call(rbind, splitVals)
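If you want to try the by() approach without the attached file, here is a self-contained version on a few made-up rows that mimic the layout of serv. The values are illustrative only, the datum column is assumed to be POSIXct in UTC, and the sums go through as.numeric() as in your loop, so a full day of byte counts cannot overflow R's integer type.

```r
# Hypothetical stand-in for 'serv': made-up readings on two days
serv <- data.frame(
  datum = as.POSIXct(c("2011-01-29 10:00:03", "2011-01-29 10:00:05",
                       "2011-01-29 23:59:59", "2011-01-30 10:00:57",
                       "2011-01-30 10:00:58", "2011-01-30 10:00:59"),
                     tz = "UTC"),
  read  = c(0L, 0L, 0L, 29184L, 122880L, 0L),
  write = c(647168L, 1960837L, 0L, 0L, 0L, 0L)
)

# Calendar date of each row, used as the grouping factor for by()
dates <- format(serv$datum, "%Y-%m-%d")

# Collapse one day's slice into a one-row data.frame.
# as.numeric() avoids integer overflow: sum() of an integer vector
# returns NA once the total exceeds .Machine$integer.max.
aggregateDf <- function(x) {
  data.frame(datum = format(x$datum[1], "%Y-%m-%d"),
             write = sum(as.numeric(x$write)),
             read  = sum(as.numeric(x$read)))
}

# Apply aggregateDf to each day's slice, then stack the one-row results
values <- do.call(rbind, by(serv, dates, aggregateDf))
print(values)
```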


Compared with your loop, the difference in execution speed is negligible on my
machine, so this solution is more concise, but I'm not sure it is much faster.
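If speed matters, a vectorised option is base R's rowsum(), which sums the columns of a numeric matrix within groups in one pass instead of building a data.frame per day. I would expect it to beat both the loop and by() on 21 days of per-second data, but I have not timed it. A minimal sketch on the same kind of made-up data (the serv rows below are hypothetical):

```r
# Hypothetical stand-in for 'serv': made-up readings on two days
serv <- data.frame(
  datum = as.POSIXct(c("2011-01-29 10:00:03", "2011-01-29 10:00:05",
                       "2011-01-29 23:59:59", "2011-01-30 10:00:57",
                       "2011-01-30 10:00:58", "2011-01-30 10:00:59"),
                     tz = "UTC"),
  read  = c(0L, 0L, 0L, 29184L, 122880L, 0L),
  write = c(647168L, 1960837L, 0L, 0L, 0L, 0L)
)

# One grouping key per row: the calendar date as a string
day <- format(serv$datum, "%Y-%m-%d")

# rowsum() sums every column within each group; the result's row names
# are the sorted unique group keys. Convert to double first so large
# daily totals cannot overflow the integer type.
totals <- rowsum(cbind(write = as.numeric(serv$write),
                       read  = as.numeric(serv$read)),
                 group = day)

# Turn the matrix back into a data.frame with an explicit date column
values <- data.frame(datum = rownames(totals), totals, row.names = NULL)
print(values)
```

The design choice here is to let one grouped sum do all the work rather than subsetting the data frame once per day, which is the step Benjamin identified as the slowest.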

HTH,

Francisco

On Thu, Mar 10, 2011 at 1:23 PM, Benjamin Stier <
benjamin.st...@ub.uni-tuebingen.de> wrote:

> Hello list!
>
> I have a data.frame which looks like this:
> > serv
>                   datum op.read op.write   read   write
> 1   2011-01-29 10:00:00       0        0      0       0
> 2   2011-01-29 10:00:01       0        0      0       0
> 3   2011-01-29 10:00:02       0        0      0       0
> 4   2011-01-29 10:00:03       0        4      0  647168
> 5   2011-01-29 10:00:04       0        0      0       0
> 6   2011-01-29 10:00:05       0       14      0 1960837
> 7   2011-01-29 10:00:06       0        0      0       0
> ...
> 115 2011-01-30 10:00:54       0        0      0       0
> 116 2011-01-30 10:00:55       0        0      0       0
> 117 2011-01-30 10:00:56       0        0      0       0
> 118 2011-01-30 10:00:57      54        0  29184       0
> 119 2011-01-30 10:00:58     204        0 122880       0
> 120 2011-01-30 10:00:59       0        0      0       0
> ...
>
> I want to compare read/write from each day. I already have a solution, but
> it is pretty slow.
>
> # read the data
> serv <- read.delim("cut.inp")
>
> # Reformat the dates from the file
> serv$datum <- strptime(serv$datum,  "%Y-%m-%d %H:%M:%S")
>
> # select all single days
> dates.serv <- unique(strptime(serv$datum, format="%Y-%m-%d"))
>
> # create a data.frame
> values <- data.frame(row.names=1, datum=numeric(0), write=numeric(0),
>                      read=numeric(0))
> for(i in as.character(dates.serv)) {
>        # build the start and end of a one-day time range
>        searchstart <- as.POSIXlt(paste(i, "00:00:00", sep=" "))
>        searchend <- as.POSIXlt(paste(i, "23:59:59", sep=" "))
>        # select all values from a specific day
>        day <- serv[(serv$datum >= searchstart & serv$datum <= searchend),]
>        write <- as.numeric(sum(as.numeric(day$write)))
>        read <- as.numeric(sum(as.numeric(day$read)))
>        # add to the data.frame
>        values <- rbind(values, data.frame(datum=i, write=write, read=read))
> }
>
> This is my first try using R for statistics, so I'm sure this isn't the best
> solution.
> The for-loop does its job, but as I said it is really slow. My data covers
> 21 days at 1 line per second.
> Is there a better way to select the date ranges than a for-loop? The line
> where I select all values for "day" seems to be the heaviest. Any idea?
>
> Kind regards,
>
> Benjamin
>
> PS: I attached some sample data, in case you want to try for yourself.
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>
