Re: [R] Competing with SPSS and SAS: improving code that loops through rows (data manipulation)

Dimitri Liakhovitski Sat, 27 Mar 2010 05:45:26 -0700

Dear all, thank you so much for your advice, and special thanks to
you, Jim, for digging into my code (which was too long).
I'll dig into yours now. It definitely looks very fast, and there is a
lot for me to learn from it; as you can see, I am only a
rudimentary programmer.
Thank you very, very much!
Dimitri


On Fri, Mar 26, 2010 at 7:28 PM, Jim Price <price...@hotmail.com> wrote:
>
> Here's my first stab. It removes some of the typical redundancies in your
> code (loops, building data frames by adding one column at a time) and
> instead does what is probably more canonical R style (although I'm willing
> to be corrected, as my code is probably a little suspect at times).
>
> For this example, I got a 10-fold speed-up, although I suspect this code
> will scale a lot better - primarily because I'm not continually expanding
> the data frames one column at a time, but instead working each part out
> separately and then sticking them together at the end. The key commands used
> (for when you look through the help files) are lapply, do.call, by and
> Reduce.
>
> If you use this scaled up you'd need to play with some of the indices in
> places, but I'm sure that's all pretty obvious.
>
> Oh, and because this is the usual (and good!) advice - don't call your data
> 'data':
>
> library(fortunes)
> fortune('dog')
>
>
>
> # This was your base set-up code
> set.seed(123)
> data <- data.frame(group = c(rep("first", 10), rep("second", 10)),
>                    week = c(1:10, 1:10),
>                    a = abs(round(rnorm(20) * 10, 0)),
>                    b = abs(round(rnorm(20) * 100, 0)))
> data
>
>
> # Set up the ratio variables
> system.time({
> temp <- cbind(data, do.call(cbind, lapply(names(data)[3:4], function(.x)
>        {
>                # Within each group, divide the column by its group maximum
>                unlist(by(data, data$group, function(.y) .y[,.x] / max(.y[,.x])))
>        })))
> colnames(temp)[5:6] <- paste(colnames(data)[3:4], 'ind.to.max', sep = '.')
> })
>
>
>
>
>
> system.time({
> constants <- expand.grid(vars = colnames(temp)[5:6], c1 = 1:3,
>                          c2 = seq(0.15, 0.45, 0.15))
>
>
> results <- lapply(seq(nrow(constants)), function(.x)
>        {
>                dat <- temp[, as.character(constants[.x, 1])]
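>                # constants[.x, 2] is c1. Note that ^ binds tighter than /,
>                # so this evaluates as (exp(1) ^ log(0.5)) / c1
>                d <- exp(1) ^ log(0.5) / constants[.x, 2]
>                # constants[.x, 3] is c2
>                l <- -10 * log(1 - constants[.x, 3])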
>
>                unlist(by(dat, temp$group, function(.y)
>                        Reduce(function(.u, .v) 1 - ((1 - .u * d) / (exp(1) ^ (.v * l))),
>                               .y, accumulate = TRUE, init = 0)[-1]))
>        })
>
> final <- cbind(temp, do.call(cbind, results))
> colnames(final)[-(1:6)] <- paste(substr(constants$vars, 1, 1), constants$c1,
> 100*constants$c2, '..transf', sep = '.')
> })
>
>
>
>
>
> Jim Price.
> Cardiome Pharma Corp.
>
>
> --
> View this message in context: 
> http://n4.nabble.com/Competing-with-SPSS-and-SAS-improving-code-that-loops-through-rows-data-manipulation-tp1692848p1692967.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
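A side note on the ratio step in the quoted code: unlist(by(...)) returns
its values group by group, so binding them back on with cbind relies on the
rows already being sorted by group. Below is a minimal sketch of one
alternative, using base R's ave(), which returns its result in the original
row order. The data frame `toy` and the helper `ratio_to_group_max` are made
up for illustration and do not come from the thread.

```r
# Hypothetical toy data; rows are deliberately NOT sorted by group
toy <- data.frame(group = c("first", "second", "first", "second"),
                  a     = c(2, 10, 4, 5))

# Hypothetical helper: divide each value by the maximum of its own group.
# ave() applies FUN within each group and keeps the original row order.
ratio_to_group_max <- function(x, g) {
  ave(x, g, FUN = function(v) v / max(v))
}

toy$a.ind.to.max <- ratio_to_group_max(toy$a, toy$group)
toy
```

This is a swapped-in technique, not what the quoted code does; on data that
is already sorted by group, both approaches should line up the same way.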

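The Reduce(..., accumulate = TRUE, init = 0) call in the quoted code carries
each result forward into the next step. Here is a minimal sketch of that
recursion on a short vector, checked against an explicit loop. The values of
d, l and x are arbitrary illustrations (not the thread's constants), the name
`carry_step` is made up, and exp(.v * l) is used in place of
exp(1) ^ (.v * l).

```r
# Arbitrary illustrative values (assumptions, not taken from the thread)
d <- 0.5
l <- 2
x <- c(0.2, 0.4, 1.0)

# One step of the recursion: .u is the previous result, .v the current input
carry_step <- function(.u, .v) 1 - ((1 - .u * d) / exp(.v * l))

# accumulate = TRUE keeps every intermediate result; init = 0 seeds the
# recursion, and [-1] drops that seed from the output
via_reduce <- Reduce(carry_step, x, accumulate = TRUE, init = 0)[-1]

# The same computation written as an explicit loop
via_loop <- numeric(length(x))
prev <- 0
for (i in seq_along(x)) {
  prev <- carry_step(prev, x[i])
  via_loop[i] <- prev
}

all.equal(via_reduce, via_loop)
```

Both versions walk the vector once; the Reduce form simply avoids managing
the running value by hand.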

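One detail that may be worth checking in the quoted line that computes d: in
R, ^ binds more tightly than /, so exp(1) ^ log(0.5) / c1 is evaluated as
(exp(1) ^ log(0.5)) / c1, which is 0.5 / c1. Whether that form or something
like exp(log(0.5) / c1) was intended is not stated in the thread, so the
second expression below is only an assumption, shown for comparison. The
value c1 = 2 is arbitrary.

```r
c1 <- 2  # arbitrary illustrative value

# As written in the quoted code: the power is taken first, then the division
as_written <- exp(1) ^ log(0.5) / c1

# Explicit parentheses giving the same result as the line above
same_thing <- (exp(1) ^ log(0.5)) / c1

# A different expression (an assumption, NOT confirmed by the thread):
# divide inside the exponent instead
alternative <- exp(log(0.5) / c1)

c(as_written = as_written, same_thing = same_thing, alternative = alternative)
```

If the first form is what was meant, nothing needs to change; the
parentheses just make the order of evaluation visible.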

-- 
Dimitri Liakhovitski
Ninah.com
dimitri.liakhovit...@ninah.com
