Re: [R] dataframe calculations based on certain values of a column
Thanks, your solution using ave() works perfectly.

/johannes

-Original Message-
From: Bert Gunter
To: Johannes Radinger
Cc: R help
Sent: Wednesday, 26 March 2014 16:45:43 GMT+00:00
Subject: Re: [R] dataframe calculations based on certain values of a column

I believe this will generalize. But check carefully!

Using your example (Excellent!), use ave():

with(df, ave(seq_along(var1), var2,
     FUN = function(i) var3[i]/var3[i][var1[i]=="c"]))
[1] 0.500 1.000 1.000 0.833 0.333 1.000 1.750
[8] 1.000 1.000

This is kind of a low level brute force approach. Others may have more elegant approaches.

-- Bert

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

"Data is not information. Information is not knowledge. And knowledge is certainly not wisdom."
H. Gilbert Welch

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] dataframe calculations based on certain values of a column
dplyr's group_by and mutate can create those columns for you:

library(dplyr)

var1 <- c("a","b","c","a","b","c","a","b","c")
var2 <- c("X","X","X","Y","Y","Y","Z","Z","Z")
var3 <- c(1,2,2,5,2,6,7,4,4)
df <- data.frame(var1,var2,var3)

# %>% replaces the %.% operator of early dplyr releases;
# group_by() accepts a plain data.frame, so no tbl_df() conversion is needed
df %>%
  group_by(var2) %>%
  mutate(
    div = var3[var1 == "c"],
    result_calc = var3/div
  )
Re: [R] dataframe calculations based on certain values of a column
Create a list by splitting the data.frame into groups defined by column var2, and perform the calculation you need on each group. Like this:

df <- data.frame(var1,var2,var3, stringsAsFactors=FALSE)
L <- by(df, list(df$var2), FUN=function(x) {
  k <- which(x$var1=="c")
  x$rel <- x$var3/x$var3[k]
  x
})

Then convert the list L back to a data.frame. See the following two Stack Overflow pages for the various ways this can be done.

http://stackoverflow.com/questions/4227223/r-list-to-data-frame
http://stackoverflow.com/questions/4512465/what-is-the-most-efficient-way-to-cast-a-list-as-a-data-frame?rq=1

Two methods from the first page:

data.frame(Reduce(rbind,L))

library(plyr)
ldply(L, data.frame)

and one method from the second page:

do.call(rbind,L)

Berend
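As an alternative to splitting and recombining, the base value of each group can be looked up by name. This is only a sketch: it assumes exactly one "c" row per var2 group (as stated in the question), and the names is_base, base and rel are made up for illustration.

```r
var1 <- c("a","b","c","a","b","c","a","b","c")
var2 <- c("X","X","X","Y","Y","Y","Z","Z","Z")
var3 <- c(1,2,2,5,2,6,7,4,4)
df <- data.frame(var1, var2, var3, stringsAsFactors = FALSE)

# one base value per group, named by the group label
is_base <- df$var1 == "c"
base <- setNames(df$var3[is_base], df$var2[is_base])

# index the named vector by each row's group; row order does not matter
df$rel <- df$var3 / unname(base[df$var2])
df
```

Because the lookup goes through the group label rather than the row position, the data.frame does not need to be sorted first.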
Re: [R] dataframe calculations based on certain values of a column
I believe this will generalize. But check carefully!

Using your example (Excellent!), use ave():

with(df, ave(seq_along(var1), var2,
     FUN = function(i) var3[i]/var3[i][var1[i]=="c"]))
[1] 0.500 1.000 1.000 0.833 0.333 1.000 1.750
[8] 1.000 1.000

This is kind of a low level brute force approach. Others may have more elegant approaches.

-- Bert

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

"Data is not information. Information is not knowledge. And knowledge is certainly not wisdom."
H. Gilbert Welch
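One way to follow the "check carefully" advice is to run the ave() call on a shuffled copy of the example and compare it with the hand-computed divisor from the question. A minimal sketch; the permutation perm is arbitrary and the names shuffled, rel and expected are made up here.

```r
var1 <- c("a","b","c","a","b","c","a","b","c")
var2 <- c("X","X","X","Y","Y","Y","Z","Z","Z")
var3 <- c(1,2,2,5,2,6,7,4,4)
df <- data.frame(var1, var2, var3)

# shuffle the rows with a fixed, arbitrary permutation
perm <- c(9, 2, 6, 4, 1, 8, 3, 7, 5)
shuffled <- df[perm, ]

# same call as above, applied to the shuffled rows
rel <- with(shuffled,
            ave(seq_along(var1), var2,
                FUN = function(i) var3[i] / var3[i][var1[i] == "c"]))

# the divisor vector from the question, permuted the same way
expected <- (var3 / c(2,2,2,6,6,6,4,4,4))[perm]
all.equal(rel, expected)
```

If the two vectors agree, the result does not depend on where the "c" rows sit inside each group.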
[R] dataframe calculations based on certain values of a column
Hi,

I have data in a dataframe with the following structure:

var1 <- c("a","b","c","a","b","c","a","b","c")
var2 <- c("X","X","X","Y","Y","Y","Z","Z","Z")
var3 <- c(1,2,2,5,2,6,7,4,4)
df <- data.frame(var1,var2,var3)

Now I'd like to calculate relative values of var3. These values should be relative to the base value (where var1 == "c"), which is given for each group (var2).

To illustrate what my result column should look like, I divide the column var3 by the vector c(2,2,2,6,6,6,4,4,4) (= for each group of var2, the value of c).

Of course this can also be done like this:

df$div <- rep(df$var3[df$var1=="c"],each=length(unique(df$var1)))
df$result_calc <- df$var3/df$div

However, what if the dataframe is not as simple and not as well ordered as in the example here? For example, there is always a value c for each group, but all the "c"s are clumped in the last rows of the dataframe or scattered in a random manner. Is there a simple way to still calculate such relative values? Probably with an approach using apply, but maybe someone can give me a hint. Or do I need to sort my dataframe in order to do such calculations?

best,

/Johannes
Re: [R] Dataframe calculations
Hi

If I understand correctly, you want to add wait and travel time to the first arrival for each block of data in one day:

test <- SCHEDULE2
test$ARRIVE[test$ARRIVE==0] <- NA
library(zoo)
test$ARRIVE <- na.locf(test$ARRIVE)
datumA <- paste(paste(test$MM, test$DD, test$YEAR, sep="."), test$ARRIVE, sep=" ")
datumA <- as.POSIXct(strptime(datumA, format="%m.%d.%Y %H:%M:%S"))

w <- cumsum(test$WAIT[1:4]*60)
tr <- cumsum(test$TRAVEL[1:4]*60)

arrivals <- datumA[1:4]+w+tr
departures <- datumA[1:4]+w+c(0,tr[1:3])

Now you can either write a loop that picks the appropriate values from your data frame, or look at a split/lapply/sapply solution. I would try a loop with an index like this:

# start both columns from the carried-forward first arrival of each block
test$ARRIVALS <- datumA
test$DEPARTURES <- datumA
idx <- seq(1,316,4)
for (i in idx) {
  rows <- i:(i+3)   # the four rows of one block
  wi <- cumsum(test$WAIT[rows]*60)
  tri <- cumsum(test$TRAVEL[rows]*60)
  arrivals <- datumA[rows]+wi+tri
  departures <- datumA[rows]+wi+c(0,tri[1:3])
  test$ARRIVALS[(i+1):(i+3)] <- arrivals[1:3]
  test$DEPARTURES[rows] <- departures
}

untested

Regards
Petr
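When indexing blocks of rows in a loop like this, note that `:` binds more tightly than `+` in R, so an index such as i+1:i+3 is not the range from i+1 to i+3. A small sketch with an arbitrary i; the names wrong and right are made up for illustration.

```r
i <- 5

# parsed as i + (1:i) + 3, i.e. five values instead of three
wrong <- i + 1:i + 3
wrong
# [1]  9 10 11 12 13

# parentheses give the intended block of three rows
right <- (i + 1):(i + 3)
right
# [1] 6 7 8

# likewise, i:i+4 is (i:i) + 4, a single value
i:i + 4
# [1] 9
```

See ?Syntax for the full operator precedence table.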
Re: [R] Dataframe calculations
try this:

# add 'date' to separate the data
SCHEDULE2 <- within(SCHEDULE2, {
    date <- paste(YEAR, '-', MM, '-', DD, sep='')
    ARRIVE <- as.POSIXct(paste(date, ARRIVE))
    DEPART <- as.POSIXct(paste(date, DEPART))
})

# process each day
result <- lapply(split(SCHEDULE2, SCHEDULE2$date), function(.day){
    # assume first line is complete; convert to POSIXct
    for (i in 2:nrow(.day)){
        .day$ARRIVE[i] <- .day$DEPART[i - 1L] + (.day$TRAVEL[i - 1L] * 60)
        .day$DEPART[i] <- .day$ARRIVE[i] + (.day$WAIT[i] * 60)
    }
    # return the changes
    .day
})
SCHEDULE2 <- do.call(rbind, result)
Re: [R] Dataframe calculations
Unfortunately, that did not correct the problem. Times for 'ARRIVE' need to be either 07:00:00 or 14:30:00 for the first case of each unique 'MM' by 'DD' subgroup (the others will be calculated), and the code produces calculations that I can't interpret from the fixed numbers. Also, 'ARRIVE' and 'DEPART' incorrectly have the same value for the first case of each unique 'MM' by 'DD' subgroup. 'DEPART' should equal 'ARRIVE' plus the 'WAIT' time in minutes of the same line.

Thank you,

Mike
Re: [R] Dataframe calculations
Sorry,
Oddly I got the use of odds and evens the wrong way round.

addDelays <- function(arriveTime,waitVec,travelVec){
  start<-as.POSIXct(arriveTime,format="%H:%M:%S")
  delays<-as.vector(t(cbind(waitVec,travelVec)))
  newtimes<-format(start+cumsum(delays)*60,format="%H:%M:%S")
  list(departs=c(arriveTime,(evens(newtimes))[-1]),
       arrives=odds(newtimes))
}

Using the new definition of addDelays above should do the trick.
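The interleaving idea in addDelays can be traced by hand on a single toy day. This is only a sketch under one reading of the original question: the first ARRIVE is given, DEPART is ARRIVE plus WAIT on the same line, and the next ARRIVE is DEPART plus TRAVEL. The date, the wait and travel values, and the variable names are all made up, and at least two rows per day are assumed.

```r
start  <- as.POSIXct("2010-05-02 07:00:00", tz = "UTC")  # hypothetical first arrival
wait   <- c(30, 45, 20, 60)   # minutes, made-up values
travel <- c(15, 10, 25, 0)    # minutes, made-up values

# interleave as wait1, travel1, wait2, travel2, ...
delays <- as.vector(t(cbind(wait, travel)))
times  <- start + cumsum(delays) * 60

n <- length(wait)
# under this reading, odd positions are departures of rows 1..n
depart <- times[seq(1, 2 * n, by = 2)]
# even positions are arrivals of rows 2..n; row 1 keeps the given start
arrive <- c(start, times[seq(2, 2 * n - 2, by = 2)])

data.frame(ARRIVE = format(arrive, "%H:%M", tz = "UTC"),
           DEPART = format(depart, "%H:%M", tz = "UTC"))
#   ARRIVE DEPART
# 1  07:00  07:30
# 2  07:45  08:30
# 3  08:40  09:00
# 4  09:25  10:25
```

Checking a small case like this by hand shows which half of the interleaved vector belongs in which column before the code is applied to the full SCHEDULE2 data.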
Re: [R] Dataframe calculations
Erich,

Thank you so much for the effort you put into writing this code. I ran it and then assigned the two variables you created to the 'ARRIVE' and 'DEPART' variables of my dataframe as you directed, and the resultant calculations were incorrect. I am not sure why it did not work; I do not yet grasp the coding, as I am still a novice. Perhaps you or someone else could rerun your code on my original dataframe and see why it did not yield the correct results.

Thank you,

Mike
Re: [R] Dataframe calculations
With the following code, newvars()$ARRIVALS and newvars()$DEPARTURES will give you the new variables you need.

-=-=-=

addDelays <- function(arriveTime,waitVec,travelVec){
  start<-as.POSIXct(arriveTime,format="%H:%M:%S")
  delays<-as.vector(t(cbind(waitVec,travelVec)))
  newtimes<-format(start+cumsum(delays)*60,format="%H:%M:%S")
  list(departs=c(arriveTime,(odds(newtimes))[-1]),
       arrives=evens(newtimes))
}

odds <- function(inVec){
  indvec<-0:(floor((length(inVec)-1)/2))
  inVec[2*indvec+1]
}

evens <- function(inVec){
  odds(inVec[-1])
}

newvars <- function(){
  DATE<-with(SCHEDULE2,paste(YEAR,MM,DD,sep=""))
  starts<-as.list(with(SCHEDULE2,tapply(ARRIVE,DATE,function(x)x[1])))
  waits<-with(SCHEDULE2,tapply(WAIT,DATE,function(x)x))
  travels<-with(SCHEDULE2,tapply(TRAVEL,DATE,function(x)x))
  list(DEPARTURES=
         as.vector(mapply(function(...)addDelays(...)$departs,starts,waits,travels)),
       ARRIVALS=
         as.vector(mapply(function(...)addDelays(...)$arrives,starts,waits,travels)))
}

--
Erich Neuwirth, University of Vienna
Faculty of Computer Science
Computer Supported Didactics Working Group
Visit our SunSITE at http://sunsite.univie.ac.at
Phone: +43-1-4277-39464 Fax: +43-1-4277-39459
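The odd and even positions of a vector can also be picked with a logical test on seq_along(). This is a sketch of an alternative, not the code from the message above; the names odds2 and evens2 are hypothetical.

```r
# elements at positions 1, 3, 5, ...
odds2 <- function(x) x[seq_along(x) %% 2 == 1]

# elements at positions 2, 4, 6, ...
evens2 <- function(x) x[seq_along(x) %% 2 == 0]

x <- c("a", "b", "c", "d", "e")
odds2(x)
# [1] "a" "c" "e"
evens2(x)
# [1] "b" "d"

# a one-element vector has no even positions
evens2("a")
# character(0)
```

Since the index is built from seq_along(x), empty and one-element inputs need no special handling.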
[R] Dataframe calculations
Hi everyone, My question will probably seem simple to most of you, but I have spent many hours trying to solve it. I need to perform a series of sequential calculations on my dataframe that move across rows and down columns, and then repeat themselves at each unique 'MM' by 'DD' grouping. Specifically, I want to add 'DEPART' time (24 hr time) to 'TRAVEL'(minutes) in line 1 and put the result in 'ARRIVE' (24 hr time) of line 2, then I want to add 'WAIT' (minutes) to that 'ARRIVE' time of line 2 to create 'DEPART', which will then be combined with 'TRAVEL' (minutes) to yield the 'ARRIVE' time of line 3, etc. This series of calc's will start anew beginning at each unique 'MM' by 'DD' grouping. Any advice would be greatly appreciated. Thank you, Mike SCHEDULE2 <- structure(list(MM = c("05", "05", "05", "05", "05", "05", "05", "05", "05", "05", "05", "05", "05", "05", "05", "05", "05", "05", "05", "05", "05", "05", "05", "05", "05", "05", "05", "05", "05", "05", "05", "05", "05", "05", "05", "05", "05", "05", "05", "05", "05", "05", "05", "05", "05", "05", "05", "05", "05", "05", "05", "05", "05", "05", "05", "05", "06", "06", "06", "06", "06", "06", "06", "06", "06", "06", "06", "06", "06", "06", "06", "06", "06", "06", "06", "06", "06", "06", "06", "06", "06", "06", "06", "06", "06", "06", "06", "06", "06", "06", "06", "06", "06", "06", "06", "06", "06", "06", "06", "06", "06", "06", "06", "06", "06", "06", "06", "06", "07", "07", "07", "07", "07", "07", "07", "07", "07", "07", "07", "07", "07", "07", "07", "07", "07", "07", "07", "07", "07", "07", "07", "07", "07", "07", "07", "07", "07", "07", "07", "07", "07", "07", "07", "07", "07", "07", "07", "07", "07", "07", "07", "07", "07", "07", "07", "07", "08", "08", "08", "08", "08", "08", "08", "08", "08", "08", "08", "08", "08", "08", "08", "08", "08", "08", "08", "08", "08", "08", "08", "08", "08", "08", "08", "08", "08", "08", "08", "08", "08", "08", "08", "08", "08", "08", "08", "08", "08", "08", "08", "08", "08", 
"08", "08", "08", "08", "08", "08", "08", "08", "08", "08", "08", "09", "09", "09", "09", "09", "09", "09", "09", "09", "09", "09", "09", "09", "09", "09", "09", "09", "09", "09", "09", "09", "09", "09", "09", "09", "09", "09", "09", "09", "09", "09", "09", "09", "09", "09", "09", "09", "09", "09", "09", "09", "09", "09", "09", "09", "09", "09", "09", "09", "09", "09", "09", "10", "10", "10", "10", "10", "10", "10", "10", "10", "10", "10", "10", "10", "10", "10", "10", "10", "10", "10", "10", "10", "10", "10", "10", "10", "10", "10", "10", "10", "10", "10", "10", "10", "10", "10", "10", "10", "10", "10", "10", "10", "10", "10", "10", "10", "10", "10", "10", "10", "10", "10", "10"), DD = c("02", "02", "02", "02", "03", "03", "03", "03", "06", "06", "06", "06", "09", "09", "09", "09", "10", "10", "10", "10", "14", "14", "14", "14", "16", "16", "16", "16", "17", "17", "17", "17", "19", "19", "19", "19", "22", "22", "22", "22", "24", "24", "24", "24", "27", "27", "27", "27", "29", "29", "29", "29", "31", "31", "31", "31", "04", "04", "04", "04", "06", "06", "06", "06", "07", "07", "07", "07", "10", "10", "10", "10", "12", "12", "12", "12", "16", "16", "16", "16", "17", "17", "17", "17", "19", "19", "19", "19", "22", "22", "22", "22", "23", "23", "23", "23", "27", "27", "27", "27", "28", "28", "28", "28", "29", "29", "29", "29", "03", "03", "03", "03", "05", "05", "05", "05", "09", "09", "09", "09", "10", "10", "10", "10", "13", "13", "13", "13", "14", "14", "14", "14", "18", "18", "18", "18", "22", "22", "22", "22", "23", "23", "23", "23", "24", "24", "24", "24", "27", "27", "27", "27", "28", "28", "28", "28", "01", "01", "01", "01", "04", "04", "04", "04", "06", "06", "06", "06", "07", "07", "07", "07", "12", "12", "12", "12", "13", "13", "13", "13", "14", "14", "14", "14", "16", "16", "16", "16", "19", "19", "19", "19", "21", "21", "21", "21", "23", "23", "23", "23", "24", "24", "24", "24", "28", "28", "28", "28", "31", "31", "31", "31", "02", "02", "02", "02", "04", 
"04", "04", "04", "08", "08", "08", "08", "09", "09", "09", "09", "11", "11", "11", "11", "14", "14", "14", "14", "16", "16", "16", "16", "19", "19", "19", "19", "20", "20", "20", "20", "21", "21", "21", "21", "26", "26", "26", "26", "27", "27", "27", "27", "29", "29", "29", "29", "03", "03", "03", "03", "05", "05", "05", "05", "08", "08", "08", "08", "10", "10", "10", "10", "14", "14", "14", "14", "15", "15", "15", "15", "16", "16", "16", "16", "20", "20", "20", "20", "21", "21", "21", "21", "24", "24", "24", "24", "26", "26", "26", "26", "29", "29", "29", "29", "30", "30", "30", "30" ), YEAR = c("2010", "2010", "2010", "2010", "2010", "2010", "2010", "2010", "2010", "2010", "2010", "2010", "2010", "2010", "2010", "2010", "2010", "2010", "2010", "2010", "2010", "2010", "2010", "2010", "2010", "2010", "2010", "2010", "2010", "2010", "2010", "2010", "2010", "2010", "2010", "2010", "2010", "2010", "2010", "2010", "2010", "2010", "2010", "2010", "2010", "2010", "2010", "2010", "2010",