I got another 10% savings with this example by using only one subscripting adjustment. I also fixed a typo in my previous posting (which didn't affect the timing).
microbenchmark( rmh={ d3 <-data.frame(ID=rownames(d1), d1, test1=0, test2=0, test4=0, test5=0) myRowSubset <- d3$gender=="f" & d3$workshop==1 test1 <- 1 d3[myRowSubset, "test1"] <- test1 + 6 d3[myRowSubset, "test2"] <- test1 + 6 + 2 d3[myRowSubset, c("test4", "test5")] <- test1 }, rmh4={ d4 <- data.frame(ID=rownames(d1), d1, test1=0, test2=0, test4=0, test5=0) myRowSubset <- d4$gender=="f" & d4$workshop==1 test1 <- 1 d4[myRowSubset, c("test1", "test2", "test4", "test5")] <- matrix(test1 + c(6, 6+2, 0, 0), nrow=sum(myRowSubset), ncol=4, byrow=TRUE) } ) Unit: microseconds expr min lq mean median uq max neval cld rmh 956.187 1183.304 1538.012 1617.985 1865.149 2177.071 100 b rmh4 850.729 1042.997 1380.842 1416.476 1700.307 2448.545 100 a On Mon, Dec 17, 2018 at 12:49 PM Richard M. Heiberger <r...@temple.edu> wrote: > > this can be dome even faster, and I think more easily read, using only base R > > d1 <- data.frame(workshop=rep(1:2,4), > gender=rep(c("f","m"),each=4)) > > ## needed by vector and rowbased, not needed by rmh > library(tibble) > library(plyr) > library(magrittr) > > microbenchmark( > vector = {d1 %>% > rownames_to_column("ID") %>% > mutate( > test1 = ifelse(gender == "f" & workshop == 1, 7, 0), > test2 = ifelse(gender == "f" & workshop == 1, test1 + 2, 0), > test4 = ifelse(gender == "f" & workshop == 1, 1, 0), > test5 = test4 > ) }, > rowbased = {d1 %>% > rownames_to_column("ID") %>% > mutate(test1 = NA, test2 = NA, test4 = NA, test5 = NA) %>% > ddply("ID", > within, > if (gender == "f" & workshop == 1) { > test1 <- 1 > test1 <- 6 + test1 > test2 <- 2 + test1 > test4 <- 1 > test5 <- 1 > } else { > test1 <- test2 <- test4 <- test5 <- 0 > })}, > rmh={ > data.frame(ID=rownames(d1), > d1, > test1=0, > test2=0, > test4=0, > test5=0) > myRowSubset <- d3$gender=="f" & d3$workshop==1 > test1 <- 1 > d3[myRowSubset, "test1"] <- test1 + 6 > d3[myRowSubset, "test2"] <- test1 + 6 + 2 > d3[myRowSubset, c("test4", "test5")] <- test1 > } > ) > > Unit: microseconds > expr min lq mean median uq max neval cld > vector 1281.994 1468.102 1669.266 1573.043 1750.354 3171.777 100 a > rowbased 8131.230 8691.899 10894.700 9219.882 10435.642 133293.034 100 b > rmh 925.571 1056.530 1167.568 1116.425 1221.457 1968.199 100 a > On Mon, Dec 17, 2018 at 12:15 PM Thierry Onkelinx via R-help > <r-help@r-project.org> wrote: > > > > Dear Paul, > > > > R's power is that is works vectorised. Unlike SAS which is rowbased. Using > > R in a SAS way will lead to very slow code. > > > > Your examples can be written vectorised > > > > d1 %>% > > rownames_to_column("ID") %>% > > mutate( > > test1 = ifelse(gender == "f" & workshop == 1, 7, 0), > > test2 = ifelse(gender == "f" & workshop == 1, test1 + 2, 0), > > test4 = ifelse(gender == "f" & workshop == 1, 1, 0), > > test5 = test4 > > ) > > > > Here is a speed comparison. > > > > library(microbenchmark) > > microbenchmark( > > vector = {d1 %>% > > rownames_to_column("ID") %>% > > mutate( > > test1 = ifelse(gender == "f" & workshop == 1, 7, 0), > > test2 = ifelse(gender == "f" & workshop == 1, test1 + 2, 0), > > test4 = ifelse(gender == "f" & workshop == 1, 1, 0), > > test5 = test4 > > ) }, > > rowbased = {d1 %>% > > rownames_to_column("ID") %>% > > mutate(test1 = NA, test2 = NA, test4 = NA, test5 = NA) %>% > > ddply("ID", > > within, > > if (gender == "f" & workshop == 1) { > > test1 <- 1 > > test1 <- 6 + test1 > > test2 <- 2 + test1 > > test4 <- 1 > > test5 <- 1 > > } else { > > test1 <- test2 <- test4 <- test5 <- 0 > > })} > > ) > > > > > > Best regards, > > > > Thierry > > > > ir. Thierry Onkelinx > > Statisticus / Statistician > > > > Vlaamse Overheid / Government of Flanders > > INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE AND > > FOREST > > Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance > > thierry.onkel...@inbo.be > > Havenlaan 88 bus 73, 1000 Brussel > > www.inbo.be > > > > /////////////////////////////////////////////////////////////////////////////////////////// > > To call in the statistician after the experiment is done may be no more > > than asking him to perform a post-mortem examination: he may be able to say > > what the experiment died of. ~ Sir Ronald Aylmer Fisher > > The plural of anecdote is not data. ~ Roger Brinner > > The combination of some data and an aching desire for an answer does not > > ensure that a reasonable answer can be extracted from a given body of data. > > ~ John Tukey > > /////////////////////////////////////////////////////////////////////////////////////////// > > > > <https://www.inbo.be> > > > > > > Op ma 17 dec. 2018 om 16:30 schreef Paul Miller via R-help < > > r-help@r-project.org>: > > > > > Hello All, > > > > > > Season's greetings! > > > > > > Am trying to replicate some SAS code in R. The SAS code uses if-then-do > > > code blocks. I've been trying to do likewise in R as that seems to be the > > > most reliable way to get the same result. > > > > > > Below is some toy data and some code that does work. There are some things > > > I don't necessarily like about the code though. So I was hoping some > > > people > > > could help make it better. One thing I don't like is that the within > > > function reverses the order of the computed columns such that test1:test5 > > > becomes test5:test1. I've used a mutate to overcome that but would prefer > > > not to have to do so. > > > > > > Another, perhaps very small thing, is the need to calculate an ID > > > variable that becomes the basis for a grouping. > > > > > > I did considerable Internet searching for R code that conditionally > > > computes blocks of code. I didn't find much though and so am wondering if > > > my search terms were not sufficient or if there is some other reason. It > > > occurred to me that maybe if-then-do code blocks like we often see in SAS > > > as are frowned upon and therefore not much implemented. > > > > > > I'd be interested in seeing more R-compatible approaches if this is the > > > case. I've learned that it's a mistake to try and make R be like SAS. It's > > > better to let R be R. Trouble is I'm not always sure how to do that. > > > > > > Thanks, > > > > > > Paul > > > > > > > > > d1 <- data.frame(workshop=rep(1:2,4), > > > gender=rep(c("f","m"),each=4)) > > > > > > library(tibble) > > > library(plyr) > > > > > > d2 <- d1 %>% > > > rownames_to_column("ID") %>% > > > mutate(test1 = NA, test2 = NA, test4 = NA, test5 = NA) %>% > > > ddply("ID", > > > within, > > > if (gender == "f" & workshop == 1) { > > > test1 <- 1 > > > test1 <- 6 + test1 > > > test2 <- 2 + test1 > > > test4 <- 1 > > > test5 <- 1 > > > } else { > > > test1 <- test2 <- test4 <- test5 <- 0 > > > }) > > > > > > ______________________________________________ > > > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > > > https://stat.ethz.ch/mailman/listinfo/r-help > > > PLEASE do read the posting guide > > > http://www.R-project.org/posting-guide.html > > > and provide commented, minimal, self-contained, reproducible code. > > > > > > > [[alternative HTML version deleted]] > > > > ______________________________________________ > > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. ______________________________________________ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.