You might want to run the profiler (Rprof) on a subset of your code to see where the time is being spent. Find a subset that runs for a minute or so and enable profiling for that test. Look at which functions are taking the time; that will be a start.

You can also watch the task monitor while the application is running to see how much CPU and memory it is using. If you are going around a loop a number of times, you can put in some monitoring 'cat' statements that periodically print out the memory and CPU used. These are some of the techniques for starting to look at what your program is doing.

Also, data.frames are very costly to index into. You might want to consider converting to a matrix where possible (all columns have to have the same mode). This can provide a significant improvement, and it is something you will be able to see with the profiling tool, since it will probably show a lot of time in the functions that handle data frames.
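A rough sketch of these suggestions, not run against your data: the data frame, column names, sizes and helper functions below are all made up for illustration. It profiles a loop that indexes a data.frame one element at a time, prints a progress line with 'cat' every few thousand iterations, and then times the same loop on a matrix copy.

```r
# Made-up data purely for illustration; nothing here comes from your data set.
set.seed(1)
n  <- 20000
df <- data.frame(x = runif(n), y = runif(n))
m  <- as.matrix(df)  # possible only because every column is numeric

# Row-by-row work on a data.frame, with a periodic progress report.
sum_rows_df <- function(d, report_every = 5000) {
  out <- numeric(nrow(d))
  for (i in seq_len(nrow(d))) {
    out[i] <- d[i, "x"] + d[i, "y"]
    if (i %% report_every == 0) {
      cat("row", i,
          "| elapsed (s):", proc.time()[["elapsed"]],
          "| memory used (Mb):", sum(gc()[, 2]), "\n")
    }
  }
  out
}

# Same work on the matrix copy.
sum_rows_mat <- function(a) {
  out <- numeric(nrow(a))
  for (i in seq_len(nrow(a))) out[i] <- a[i, "x"] + a[i, "y"]
  out
}

prof_file <- tempfile(fileext = ".out")
Rprof(prof_file)                       # start the sampling profiler
time_df <- system.time(res_df <- sum_rows_df(df))
Rprof(NULL)                            # stop profiling

# summaryRprof() stops with an error if no samples were collected,
# so guard it in case the run was too short on a fast machine.
prof <- tryCatch(summaryRprof(prof_file)$by.self, error = function(e) NULL)
if (!is.null(prof)) print(head(prof))

# Time the matrix version and compare elapsed seconds.
time_mat <- system.time(res_mat <- sum_rows_mat(m))
print(c(data.frame = time_df[["elapsed"]], matrix = time_mat[["elapsed"]]))
```

If the data.frame version is the bottleneck, the profile would be expected to show most of the self time in functions such as `[.data.frame`; the exact numbers will depend on your machine and R version.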

Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.


On Fri, Oct 18, 2013 at 9:23 AM, Ye Lin <ye...@lbl.gov> wrote:

> Thanks for your help David!
>
> I was running the same code the other day and it worked fine although it
> took a while as well. You are right that dff should be df1, and maybe
> because it's only a portion of my data it gives the error of length = 0.
>
> About CPU usage, I got it by clicking ctrl+alt+delete and it showed CPU
> usage is really high. Is there any way to figure out why R is taxing my
> system?
>
> Thanks!
>
> Ye
>
> On Thursday, October 17, 2013, David Winsemius wrote:
>
>> On Oct 17, 2013, at 2:56 PM, Ye Lin wrote:
>>
>> > Hey R professionals,
>> >
>> > I have a large dataset and I want to run a loop on it basically creating a
>> > new column which gathers information from another reference table.
>> >
>> > When I run the code, R just freezes and does not even respond after 30 min,
>> > which is really unusual. I tried sapply as well but it does not improve at
>> > all.
>> >
>> > I am running R 3.0.2 on Windows 7. I checked the system, when I run the
>> > code, my CPU usage is about 25%-30% that is taxing my desktop.
>>
>> A guess: It's not your CPU use ... it's your RAM use. You've probably
>> exhausted your RAM and your system has paged out to virtual memory.
>>
>> > Here is my code:
>> >
>> > #df1 is the data set I want to add a new column#
>> > #b is the reference table#
>> >
>> > for (i in (1:nrow(df1))) {
>> > begin=which(b$Time2==df1$start[i] & b$Date==df1$Date[i])
>> > date=unlist(strsplit(as.character(dff$end[i])," "))[1]
>> > end=ifelse(date=="2013-10-17",
>> > which(b$Time2==df1$end[i] & b$Date==df1$Date[i]),
>> > which(b$Time2==df1$end[i]-3600*24 & b$Date==as.Date(df1$Date[i])+1))
>> > df1$new[i] <- sum(b[begin:end,]$Power)
>> > }
>>
>> I get:
>> Error in strsplit(as.character(dff$end[i]), " ") : object 'dff' not found
>>
>> If I change the dff to df1, I get:
>> Error in begin:end : argument of length 0
>>
>> --
>> David.
>>
>> > And here is a mimic sample of df1 & b:
>> >
>> > df1 <- structure(list(Date = structure(c(1369699200, 1369699200,
>> > 1369699200,
>> > 1369699200, 1369699200), tzone = "UTC", class = c("POSIXct",
>> > "POSIXt")), start = structure(c(1381991205, 1381990247, 1382010454,
>> > 1382007281, 1381992288), tzone = "UTC", class = c("POSIXct",
>> > "POSIXt")), end = structure(c(1381992405, 1381993727, 1382010694,
>> > 1382007461, 1381992468), tzone = "UTC", class = c("POSIXct",
>> > "POSIXt"))), .Names = c("Date", "start", "end"), row.names = c(NA,
>> > -5L), class = "data.frame")
>> >
>> >
>> > b <- structure(list(Date = structure(c(1369699200, 1369699200, 1369699200,
>> > 1369699200, 1369699200, 1369699200, 1369699200, 1369699200, 1369699200,
>> > 1369699200, 1369699200, 1369699200, 1369699200, 1369699200, 1369699200,
>> > 1369699200, 1369699200, 1369699200, 1369699200, 1369699200, 1369699200,
>> > 1369699200, 1369699200, 1369699200, 1369699200, 1369699200, 1369699200,
>> > 1369699200, 1369699200, 1369699200, 1369699200, 1369699200, 1369699200,
>> > 1369699200, 1369699200, 1369699200, 1369699200, 1369699200, 1369699200,
>> > 1369699200, 1369699200, 1369699200, 1369699200, 1369699200, 1369699200,
>> > 1369699200, 1369699200, 1369699200, 1369699200, 1369699200), tzone = "UTC",
>> > class = c("POSIXct",
>> > "POSIXt")), Time2 = structure(c(1381989634, 1381989694, 1381989754,
>> > 1381989814, 1381989874, 1381989934, 1381989994, 1381990054, 1381990114,
>> > 1381990174, 1381990234, 1381990294, 1381990354, 1381990414, 1381990474,
>> > 1381990534, 1381990594, 1381990654, 1381990714, 1381990774, 1381990834,
>> > 1381990894, 1381990954, 1381991014, 1381991074, 1381991134, 1381991194,
>> > 1381991254, 1381991314, 1381991374, 1381991434, 1381991494, 1381991554,
>> > 1381991614, 1381991674, 1381991734, 1381991794, 1381991854, 1381991914,
>> > 1381991974, 1381992034, 1381992094, 1381992154, 1381992214, 1381992274,
>> > 1381992334, 1381992394, 1381992454, 1381992514, 1381992574), tzone = "UTC",
>> > class = c("POSIXct",
>> > "POSIXt")), Power = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
>> > 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28,
>> > 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44,
>> > 45, 46, 47, 48, 49, 50)), .Names = c("Date", "Time2", "Power"
>> > ), row.names = c(NA, -50L), class = "data.frame")
>> >
>> > Thanks for your help!
>> >
>> > [[alternative HTML version deleted]]
>> >
>> > ______________________________________________
>> > R-help@r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>>
>> David Winsemius
>> Alameda, CA, USA