Thanks for your advice, Jim! I tried Rprof, but since the code just freezes the system, I have not been able to get any results so far; I had to close R after waiting for a long time. I am confused that the same code would behave differently on the same system.

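For reference, the monitoring idea from Jim's message quoted below (periodic 'cat' statements inside the loop) might look roughly like this when run on a small subset first. This is only a sketch: run_with_progress is a hypothetical helper name, not part of R, and the squaring function is a stand-in for the real per-row work.

```r
# Hypothetical helper (not part of R): apply f to rows 1..n and report
# elapsed time and memory in use every `every` iterations.
run_with_progress <- function(n, f, every = 100) {
  out <- numeric(n)
  t0 <- proc.time()[["elapsed"]]
  for (i in seq_len(n)) {
    out[i] <- f(i)
    if (i %% every == 0) {
      # Column 2 of gc() is the memory currently used, in Mb
      # (one row for Ncells, one for Vcells).
      mem_mb <- sum(gc()[, 2])
      cat(sprintf("row %d: %.1f s elapsed, %.1f Mb in use\n",
                  i, proc.time()[["elapsed"]] - t0, mem_mb))
    }
  }
  out
}

# Stand-in workload; in practice f would do the lookup for row i.
squares <- run_with_progress(5, function(i) i^2, every = 2)
```

If the reported memory keeps climbing between progress lines, that would point to RAM rather than CPU as the bottleneck.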
I tried out the foreach package as well but didn't notice a significant improvement. Is it that my code is not efficient, or is there something wrong, or has something changed with my system?

Thanks!

On Fri, Oct 18, 2013 at 7:14 AM, jim holtman <jholt...@gmail.com> wrote:
> You might want to use the profiler (Rprof) on a subset of your code to
> see where time is being spent. Find a subset that runs for a minute
> or so, and enable profiling for the test. Take a look and see which
> functions are taking the time. This will be a start. You can also
> watch the task monitor while the application is running to see how
> fast it is using the CPU and memory. If you are going around a loop a
> number of times, you can put some monitoring 'cat' statements that
> will periodically print out the memory and CPU used. So these are
> some of the techniques to start looking at things in your program.
>
> Also, data.frames are very costly to 'index' into. You might want to
> consider converting to a matrix (where possible, since all columns have
> to have the same mode). This can provide significant improvement.
> This is something that you will be able to see when you use the
> profiling tool, since it will probably show a lot of time in the
> functions that handle data frames.
>
> Jim Holtman
> Data Munger Guru
>
> What is the problem that you are trying to solve?
> Tell me what you want to do, not how you want to do it.
>
>
> On Fri, Oct 18, 2013 at 9:23 AM, Ye Lin <ye...@lbl.gov> wrote:
> > Thanks for your help David!
> >
> > I was running the same code the other day and it worked fine, although it
> > took a while as well. You are right that dff should be df1, and maybe since
> > it's only a portion of my data it gives the error about length 0.
> >
> > About CPU usage, I got it by pressing Ctrl+Alt+Delete and it showed CPU
> > usage is really high. Is there any way to figure out why R is taxing my
> > system?
> >
> > Thanks!
> >
> > Ye
> >
> > On Thursday, October 17, 2013, David Winsemius wrote:
> >
> >>
> >> On Oct 17, 2013, at 2:56 PM, Ye Lin wrote:
> >>
> >> > Hey R professionals,
> >> >
> >> > I have a large dataset and I want to run a loop on it, basically creating a
> >> > new column which gathers information from another reference table.
> >> >
> >> > When I run the code, R just freezes and does not even respond after 30 min,
> >> > which is really unusual. I tried sapply as well, but it does not improve at
> >> > all.
> >> >
> >> > I am running R 3.0.2 on Windows 7. I checked the system: when I run the
> >> > code, my CPU usage is about 25%-30%, which is taxing my desktop.
> >>
> >> A guess: It's not your CPU use ... it's your RAM use. You've probably
> >> exhausted your RAM and your system has paged out to virtual memory.
> >>
> >> >
> >> > Here is my code:
> >> >
> >> > #df1 is the data set I want to add a new column#
> >> > #b is the reference table#
> >> >
> >> > for (i in (1:nrow(df1))) {
> >> >   begin=which(b$Time2==df1$start[i] & b$Date==df1$Date[i])
> >> >   date=unlist(strsplit(as.character(dff$end[i])," "))[1]
> >> >   end=ifelse(date=="2013-10-17",
> >> >     which(b$Time2==df1$end[i] & b$Date==df1$Date[i]),
> >> >     which(b$Time2==df1$end[i]-3600*24 & b$Date==as.Date(df1$Date[i])+1))
> >> >   df1$new[i] <- sum(b[begin:end,]$Power)
> >> > }
> >> >
> >>
> >> I get:
> >> Error in strsplit(as.character(dff$end[i]), " ") : object 'dff' not found
> >>
> >> If I change the dff to df1, I get:
> >> Error in begin:end : argument of length 0
> >>
> >> --
> >> David.
> >> > And here is a mimic sample of df1 & b:
> >> >
> >> > df1 <- structure(list(Date = structure(c(1369699200, 1369699200,
> >> > 1369699200, 1369699200, 1369699200), tzone = "UTC", class = c("POSIXct",
> >> > "POSIXt")), start = structure(c(1381991205, 1381990247, 1382010454,
> >> > 1382007281, 1381992288), tzone = "UTC", class = c("POSIXct",
> >> > "POSIXt")), end = structure(c(1381992405, 1381993727, 1382010694,
> >> > 1382007461, 1381992468), tzone = "UTC", class = c("POSIXct",
> >> > "POSIXt"))), .Names = c("Date", "start", "end"), row.names = c(NA,
> >> > -5L), class = "data.frame")
> >> >
> >> >
> >> > b <- structure(list(Date = structure(c(
> >> > 1369699200, 1369699200, 1369699200, 1369699200, 1369699200,
> >> > 1369699200, 1369699200, 1369699200, 1369699200, 1369699200,
> >> > 1369699200, 1369699200, 1369699200, 1369699200, 1369699200,
> >> > 1369699200, 1369699200, 1369699200, 1369699200, 1369699200,
> >> > 1369699200, 1369699200, 1369699200, 1369699200, 1369699200,
> >> > 1369699200, 1369699200, 1369699200, 1369699200, 1369699200,
> >> > 1369699200, 1369699200, 1369699200, 1369699200, 1369699200,
> >> > 1369699200, 1369699200, 1369699200, 1369699200, 1369699200,
> >> > 1369699200, 1369699200, 1369699200, 1369699200, 1369699200,
> >> > 1369699200, 1369699200, 1369699200, 1369699200, 1369699200),
> >> > tzone = "UTC", class = c("POSIXct",
> >> > "POSIXt")), Time2 = structure(c(
> >> > 1381989634, 1381989694, 1381989754, 1381989814, 1381989874,
> >> > 1381989934, 1381989994, 1381990054, 1381990114, 1381990174,
> >> > 1381990234, 1381990294, 1381990354, 1381990414, 1381990474,
> >> > 1381990534, 1381990594, 1381990654, 1381990714, 1381990774,
> >> > 1381990834, 1381990894, 1381990954, 1381991014, 1381991074,
> >> > 1381991134, 1381991194, 1381991254, 1381991314, 1381991374,
> >> > 1381991434, 1381991494, 1381991554, 1381991614, 1381991674,
> >> > 1381991734, 1381991794, 1381991854, 1381991914, 1381991974,
> >> > 1381992034, 1381992094, 1381992154, 1381992214, 1381992274,
> >> > 1381992334, 1381992394, 1381992454, 1381992514, 1381992574),
> >> > tzone = "UTC", class = c("POSIXct",
> >> > "POSIXt")), Power = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
> >> > 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28,
> >> > 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44,
> >> > 45, 46, 47, 48, 49, 50)), .Names = c("Date", "Time2", "Power"
> >> > ), row.names = c(NA, -50L), class = "data.frame")
> >> >
> >> > Thanks for your help!
> >> >
> >> > [[alternative HTML version deleted]]
> >> >
> >> > ______________________________________________
> >> > R-help@r-project.org mailing list
> >> > https://stat.ethz.ch/mailman/listinfo/r-help
> >> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >> > and provide commented, minimal, self-contained, reproducible code.
> >>
> >> David Winsemius
> >> Alameda, CA, USA
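
A minimal sketch of the vector-based idea from Jim's reply above, using toy data rather than the tables posted in this thread. It rests on several assumptions that are not confirmed anywhere in the thread: Time2 values in b are unique, every start and end matches a Time2 exactly, and begin <= end for each row. The Date matching and the day-rollover branch of the original loop are left out, and all values below are made up for illustration.

```r
# Toy stand-ins for the thread's tables (values are made up).
b <- data.frame(
  Time2 = as.POSIXct("2013-10-17 00:00:00", tz = "UTC") + 60 * (0:9),
  Power = 1:10
)
df1 <- data.frame(
  start = b$Time2[c(2, 5)],
  end   = b$Time2[c(4, 9)]
)

# Pull the columns out as plain vectors once, so nothing indexes into a
# data.frame row by row.
time2 <- as.numeric(b$Time2)
power <- b$Power

# Running total of Power; cum_power[k + 1] is the sum of the first k readings.
cum_power <- c(0, cumsum(power))

# Exact-match lookups for every row at once (NA where there is no match).
begin <- match(as.numeric(df1$start), time2)
end   <- match(as.numeric(df1$end), time2)

# Sum of Power over rows begin..end, as a difference of running totals.
df1$new <- cum_power[end + 1] - cum_power[begin]
```

Rows whose start or end has no exact match come out as NA instead of stopping the loop with "argument of length 0", which makes it easier to see how many rows fail to match.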