Here is the solution using pmin/pmax for 10,000 rows. > min_pctle_cut <- 0.01 > max_pctle_cut <- 0.99 > library(outliers) > > n <- 10000 > x1 <- runif(n) > x2 <- runif(n) > x3 <- x1 + x2 + runif(n)/10 > x4 <- x1 + x2 + x3 + runif(n)/10 > x5 <- factor(sample(c('a','b','c'),n,replace=TRUE)) > x6 <- factor(1*(x5=='a' | x5=='c')) > data1 <- cbind(x1,x2,x3,x4,x5,x6) > x <- data.frame(data1) > > z <- x[,sapply(x,is.numeric)] > zNew <- z # save for 2nd test > > qs <- sapply(z, function(z) quantile(z, + c(min_pctle_cut, max_pctle_cut), na.rm = TRUE)) > > > #Loop below taking time for execution > > system.time(for (i in 1:ncol(z)) + { + for (j in 1:nrow(z)) + { + if (z[j,i] < qs[1,i]) z[j,i]=qs[1,i] + if (z[j,i] > qs[2,i]) z[j,i]=qs[2,i] + } + }) user system elapsed 6.64 0.00 7.76 > > system.time({ + for (i in 1:ncol(z)) zNew[[i]] <- pmax(qs[1,i], pmin(qs[2,i], z[[i]])) + }) user system elapsed 0.02 0.00 0.00 > > all(z == zNew) # are they the same? [1] TRUE >
On Tue, Nov 22, 2011 at 6:24 AM, Jim Holtman <jholt...@gmail.com> wrote: > You can easily vectorize this code using pmin/pmax. > > Sent from my iPad > > On Nov 22, 2011, at 1:06, Aher <ajit.a...@cedar-consulting.com> wrote: > >> Hi Experts, >> >> I am new to R, using following sample code for capping outliers using >> percentile information. Working on large data (30000 observations and 150 >> variables), loop I am using in the below mentioned code for detecting >> outliers and capping to upper /lower percentile value is taking much time >> for the execution. >> Is there anything wrong with code, can anyone suggest improvement in the >> script to enhance performance! >> min_pctle_cut <- 0.01 >> max_pctle_cut <- 0.99 >> library(outliers) >> >> n <- 100 >> x1 <- runif(n) >> x2 <- runif(n) >> x3 <- x1 + x2 + runif(n)/10 >> x4 <- x1 + x2 + x3 + runif(n)/10 >> x5 <- factor(sample(c('a','b','c'),n,replace=TRUE)) >> x6 <- factor(1*(x5=='a' | x5=='c')) >> data1 <- cbind(x1,x2,x3,x4,x5,x6) >> x <- data.frame(data1) >> >> z <- x[,sapply(x,is.numeric)] >> >> qs <- sapply(z, function(z) quantile(z, >> c(min_pctle_cut, max_pctle_cut), na.rm = TRUE)) >> >> >> #Loop below taking time for execution >> >> system.time(for (i in 1:ncol(z)) >> { >> for (j in 1:nrow(z)) >> { >> if (z[j,i] < qs[1,i]) z[j,i]=qs[1,i] >> if (z[j,i] > qs[2,i]) z[j,i]=qs[2,i] >> } >> }) >> >> >> >> -- >> View this message in context: >> http://r.789695.n4.nabble.com/Capping-outliers-tp4094647p4094647.html >> Sent from the R help mailing list archive at Nabble.com. >> >> ______________________________________________ >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. > -- Jim Holtman Data Munger Guru What is the problem that you are trying to solve? Tell me what you want to do, not how you want to do it. ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.