HI Jorge, I was able to reproduce the error. The link below provides a way to adjust the stack. I didn't test it.
http://stackoverflow.com/questions/14719349/error-c-stack-usage-is-too-close-to-the-limit Also check this link http://stackoverflow.com/questions/13245019/how-to-change-the-stack-size-using-ulimit-or-per-process-on-mac-os-x-for-a-c-or A.K. On Monday, June 30, 2014 10:08 PM, Jorge I Velez <jorgeivanve...@gmail.com> wrote: Hi Arun, Thank you very much for your suggestion. While running some tests, I came across the following: # sample data n <- 2000 p <- 1000 x2 <- data.frame(variable = rep(paste0('x', 1:p), each = n), id = rep(paste0('p', 1:p), n), outcome = sample(0:2, n*p, TRUE), rate = runif(n*p, 0.5, 1)) str(x2) library(dplyr) library(tidyr) # Arun's suggestion system.time({wide1 <- x2%>% select(-rate) %>% mutate(variable=factor(variable, levels=unique(variable)),id=factor(id, levels=unique(id))) %>% spread(variable,outcome) colnames(wide1)[-1] <- paste("outcome",colnames(wide1)[-1],sep=".") }) # Error: C stack usage 18920219 is too close to the limit # Timing stopped at: 13.833 0.251 14.085 Do you happen to know what can be done to avoid this? Thank you. Best, Jorge.- On Mon, Jun 30, 2014 at 6:51 PM, arun <smartpink...@yahoo.com> wrote: > >Hi Jorge, > >You may try: >library(dplyr) >library(tidyr) > >#Looks like this is faster than the other methods. >system.time({wide1 <- x2%>% > select(-rate) %>% > mutate(variable=factor(variable, >levels=unique(variable)),id=factor(id, levels=unique(id))) %>% > spread(variable,outcome) >colnames(wide1)[-1] <- paste("outcome",colnames(wide1)[-1],sep=".") >}) > > #user system elapsed > # 0.006 0.00 0.006 > > > >system.time(wide <- reshape(x2[, -4], v.names = "outcome", idvar = "id", > timevar = "variable", direction = "wide")) > #user system elapsed > # 0.169 0.000 0.169 > > > >system.time({ >sel <- unique(x2$variable) >id <- unique(x2$id) > >X <- matrix(NA, ncol = length(sel) + 1, nrow = length(id)) >X[, 1] <- id >colnames(X) <- c('id', sel) >r <- mclapply(seq_along(sel), function(i){ > out <- x2[x2$variable == sel[i], ][, 3] > }, mc.cores = 4) >X[, -1] <- do.call(rbind, r) >X >}) > ># user system elapsed ># 0.125 0.011 0.074 > > > wide2 <- wide1 >wide2$id <- as.character(wide2$id) > wide$id <- as.character(wide$id) >all.equal(wide, wide2, check.attributes=F) >#[1] TRUE > >A.K. > > > > >On Sunday, June 29, 2014 11:48 PM, Jorge I Velez <jorgeivanve...@gmail.com> >wrote: >Dear R-help, > >I am working with some data stored as "filename.txt.gz" in my working >directory. >After reading the data in using read.table(), I can see that each of them >has four columns (variable, id, outcome, and rate) and the following >structure: > ># sample data >x2 <- data.frame(variable = rep(paste0('x', 1:100), each = 100), id = >rep(paste0('p', 1:100), 100), outcome = sample(0:2, 10000, TRUE), rate = >runif(10000, 0.5, 1)) >str(x2) > >Each variable, i.e., x1, x2,..., x100 is repeated as many times as the >number of unique IDs (100 in this example). What I would like to do is to >transform the data above >in a long format. I can do this by using > ># reshape >wide <- reshape(x2[, -4], v.names = "outcome", idvar = "id", > timevar = "variable", direction = "wide") >str(wide) > ># or a "hack" with mclapply: > >require(parallel) >sel <- as.character(unique(x2$variable)) >id <- as.character(unique(x2$id)) >X <- matrix(NA, ncol = length(sel) + 1, nrow = length(id)) >X[, 1] <- id >colnames(X) <- c('id', sel) >r <- mclapply(seq_along(sel), function(i){ > out <- x2[x2$variable == sel[i], ][, 3] > }, mc.cores = 4) >X[, -1] <- do.call(rbind, r) >X > >However, I was wondering if it is possible to come up with another solution >, hopefully faster than these >. Unfortunately, either one of these takes a very long time to process, >specially when the number of variables is very large >(> 250,000) and the number of ids is ~2000. > >I would very much appreciate your suggestions. At the end of this message >is my sessionInfo(). > >Thank you very much in advance. > >Best regards, >Jorge Velez.- > > >R> sessionInfo() > >R version 3.0.2 Patched (2013-12-11 r64449) >Platform: x86_64-apple-darwin10.8.0 (64-bit) > >locale: >[1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8 > >attached base packages: >[1] graphics grDevices utils datasets parallel compiler stats >[8] methods base > >other attached packages: >[1] knitr_1.6.3 ggplot2_1.0.0 slidifyLibraries_0.3.1 >[4] slidify_0.3.52 > >loaded via a namespace (and not attached): >[1] colorspace_1.2-4 digest_0.6.4 evaluate_0.5.5 formatR_0.10 >[5] grid_3.0.2 gtable_0.1.2 markdown_0.7.1 MASS_7.3-33 >[9] munsell_0.4.2 plyr_1.8.1 proto_0.3-10 Rcpp_0.11.2 >[13] reshape2_1.4 scales_0.2.4 stringr_0.6.2 tools_3.0.2 >[17] whisker_0.4 yaml_2.1.13 > > > [[alternative HTML version deleted]] > >______________________________________________ >R-help@r-project.org mailing list >https://stat.ethz.ch/mailman/listinfo/r-help >PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >and provide commented, minimal, self-contained, reproducible code. > ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.