Hi Ian,

First of all, take a look at the functions sapply, mapply, lapply, tapply, ...: they are often a more efficient, and usually more compact, way of implementing loops.
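To give an idea of what these functions look like in use, here is a minimal sketch on toy vectors (made up for illustration, not your data). It computes the same thing once with an explicit loop and once with sapply, and then uses tapply for a per-group sum:

```r
# Toy data, invented for this illustration only
amounts <- c(120, 80, 40, 50)
service <- c("A", "B", "A", "A")

# Explicit loop: square every amount
squared_loop <- numeric(length(amounts))
for (i in seq_along(amounts)) {
  squared_loop[i] <- amounts[i]^2
}

# The same computation with sapply: apply a function to each element
squared_sapply <- sapply(amounts, function(x) x^2)

# tapply: apply a function (here sum) to the amounts within each service group
total_per_service <- tapply(amounts, service, sum)
```

For something as simple as squaring, plain vectorised arithmetic (amounts^2) would of course be the first choice; the point is only to show the calling pattern.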
Second, could you elaborate a bit on the data set: is the amount of the previous month a single value from another row, or the sum of all values in the previous month? I saw in your example dataset that the last month has 2 rows, but I couldn't figure out whether that's a typo or really means something. I need that information to optimize your code.

129s is indeed far too long for such a simple operation.

Cheers
Joris

On Mon, Oct 19, 2009 at 3:49 PM, Ian Willems <ian.will...@uz.kuleuven.ac.be> wrote:
> Short: get rid of the loops I use and optimize runtime
>
> Dear all,
>
> I want to calculate, for each row, the amount of the month before. I use a
> matrix with 2100 rows and 22 columns (which is still a very small matrix; the
> number of rows of other matrices can easily be more than 100000).
>
> Table before
> Year  month  quarter  yearmonth  Service  ...  Amount
> 2009  9      Q3       092009     A        ...  120
> 2009  9      Q3       092009     B        ...  80
> 2009  8      Q3       082009     A        ...  40
> 2009  7      Q3       072009     A        ...  50
>
> The result I want
> Year  month  quarter  yearmonth  Service  ...  Amount  amound_lastmonth
> 2009  9      Q3       092009     A        ...  120     40
> 2009  9      Q3       092009     B        ...  80      ...
> 2009  8      Q3       082009     A        ...  40      50
> 2009  7      Q3       072009     A        ...  50      ...
>
> The table is not exactly the same, but it gives a good idea of what I have
> and what I want.
>
> The code I have written (see below) does what I want, but it is very, very
> slow. It takes 129s for 400 rows, and the time gets four times higher each
> time I double the number of rows.
> I'm new to programming in R, but I found that you can use Rprof and
> summaryRprof to analyse your code (output see below).
> But I don't really understand the output.
> I guess I need code that requires linear time and need to get rid of the 2
> for loops.
> Can someone help me, or tell me what else I can do to optimize my runtime?
>
> I use R 2.9.2
> Windows XP Service Pack 3
>
> Thank you in advance
>
> Best regards,
>
> Willems Ian
>
>
> *****************************
> dataset[,5]  = month
> dataset[,3]  = year
> dataset[,22] = amount
> dataset[,14] = servicetype
>
> [CODE]
> # For each row of the matrix, check if each row has ...
> for (j in 1:Number_rows) {
>   sum <- 0
>   for (i in 1:Number_rows) {
>     if (dataset[j,14] == dataset[i,14]) {       # ... the same service type
>       if (dataset[j,18] == dataset[i,18]) {     # ... the same department
>         if (dataset[j,5] == "1") {              # if month = 1, month ago is 12 and year is year - 1
>           if ("12" == dataset[i,5]) {
>             if ((dataset[j,3] - 1) == dataset[i,3]) {
>               sum <- sum + dataset[i,22]
>             }
>           }
>         } else {
>           if ((dataset[j,5] - 1) == dataset[i,5]) {   # if month != 1, month ago is month - 1
>             if (dataset[j,3] == dataset[i,3]) {
>               sum <- sum + dataset[i,22]
>             }
>           }
>         }
>       }
>     }
>   }
> }
> [\Code]
>
> > summaryRprof()
> $by.self
>                self.time self.pct total.time total.pct
> [.data.frame       33.92     26.2      80.90      62.5
> NextMethod         12.68      9.8      12.68       9.8
> [.factor            8.60      6.6      18.36      14.2
> Ops.factor          8.10      6.3      40.08      31.0
> sort.int            6.82      5.3      13.70      10.6
> [                   6.70      5.2      85.44      66.0
> names               6.54      5.1       6.54       5.1
> length              5.66      4.4       5.66       4.4
> ==                  5.04      3.9      44.92      34.7
> levels              4.80      3.7       5.56       4.3
> is.na               4.24      3.3       4.24       3.3
> dim                 3.66      2.8       3.66       2.8
> switch              3.60      2.8       3.80       2.9
> vector              2.68      2.1       8.02       6.2
> inherits            1.90      1.5       1.90       1.5
> any                 1.68      1.3       1.68       1.3
> noNA.levels         1.46      1.1       7.84       6.1
> .Call               1.40      1.1       1.40       1.1
> !                   1.26      1.0       1.26       1.0
> attr<-              1.06      0.8       1.06       0.8
> .subset             1.00      0.8       1.00       0.8
> class<-             0.82      0.6       0.82       0.6
> !=                  0.80      0.6       0.80       0.6
> levels.default      0.68      0.5       0.76       0.6
> all                 0.62      0.5       0.62       0.5
> <                   0.54      0.4       0.54       0.4
> -                   0.48      0.4       0.48       0.4
> is.factor           0.44      0.3       2.34       1.8
> .subset2            0.38      0.3       0.38       0.3
> attr                0.36      0.3       0.36       0.3
> is.character        0.28      0.2       0.28       0.2
> is.null             0.28      0.2       0.28       0.2
> |                   0.26      0.2       0.26       0.2
> oldClass<-          0.20      0.2       0.20       0.2
> is.atomic           0.16      0.1       0.16       0.1
> nzchar              0.10      0.1       0.10       0.1
> is.numeric          0.06      0.0       0.06       0.0
> oldClass            0.06      0.0       0.06       0.0
> (                   0.04      0.0       0.04       0.0
> [.data              0.02      0.0       0.02       0.0
>
> $by.total
>                total.time total.pct self.time self.pct
> [                   85.44      66.0      6.70      5.2
> [.data.frame        80.90      62.5     33.92     26.2
> ==                  44.92      34.7      5.04      3.9
> Ops.factor          40.08      31.0      8.10      6.3
> [.factor            18.36      14.2      8.60      6.6
> sort.int            13.70      10.6      6.82      5.3
> NextMethod          12.68       9.8     12.68      9.8
> vector               8.02       6.2      2.68      2.1
> noNA.levels          7.84       6.1      1.46      1.1
> names                6.54       5.1      6.54      5.1
> length               5.66       4.4      5.66      4.4
> levels               5.56       4.3      4.80      3.7
> is.na                4.24       3.3      4.24      3.3
> switch               3.80       2.9      3.60      2.8
> dim                  3.66       2.8      3.66      2.8
> is.factor            2.34       1.8      0.44      0.3
> inherits             1.90       1.5      1.90      1.5
> any                  1.68       1.3      1.68      1.3
> .Call                1.40       1.1      1.40      1.1
> !                    1.26       1.0      1.26      1.0
> attr<-               1.06       0.8      1.06      0.8
> .subset              1.00       0.8      1.00      0.8
> class<-              0.82       0.6      0.82      0.6
> !=                   0.80       0.6      0.80      0.6
> levels.default       0.76       0.6      0.68      0.5
> all                  0.62       0.5      0.62      0.5
> <                    0.54       0.4      0.54      0.4
> -                    0.48       0.4      0.48      0.4
> .subset2             0.38       0.3      0.38      0.3
> attr                 0.36       0.3      0.36      0.3
> is.character         0.28       0.2      0.28      0.2
> is.null              0.28       0.2      0.28      0.2
> |                    0.26       0.2      0.26      0.2
> oldClass<-           0.20       0.2      0.20      0.2
> is.atomic            0.16       0.1      0.16      0.1
> nzchar               0.10       0.1      0.10      0.1
> is.numeric           0.06       0.0      0.06      0.0
> oldClass             0.06       0.0      0.06      0.0
> (                    0.04       0.0      0.04      0.0
> [.data               0.02       0.0      0.02      0.0
>
> $sampling.time
> [1] 129.38
>
> Warning message:
> In readLines(filename, n = chunksize) :
>   incomplete final line found on 'Rprof.out'
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
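Regarding the double loop quoted above: pending your answer to my question, here is a rough sketch of a loop-free variant. It rests on several assumptions that may not hold for your data: that "amount of the month ago" means the sum of all amounts with the same service type and department in the previous calendar month, that year and month are stored as numbers rather than factors, and that the columns have names (the names year, month, service, dept and amount below are invented for this toy example). The idea is to compute the monthly totals once with aggregate, shift them one month forward, and join them back with merge.

```r
# Toy data modelled on the example table; column names and dept are hypothetical
dataset <- data.frame(
  year    = c(2009, 2009, 2009, 2009),
  month   = c(9, 9, 8, 7),
  service = c("A", "B", "A", "A"),
  dept    = c("X", "X", "X", "X"),
  amount  = c(120, 80, 40, 50),
  stringsAsFactors = FALSE
)
keys <- c("year", "month", "service", "dept")

# 1. Total amount per (year, month, service, dept), computed once
totals <- aggregate(dataset["amount"], by = dataset[keys], FUN = sum)

# 2. Shift every total one month forward, so that it carries the key of the
#    month for which it is the "last month" value (December rolls over to January)
totals$month <- totals$month + 1
rollover <- totals$month == 13
totals$month[rollover] <- 1
totals$year[rollover]  <- totals$year[rollover] + 1
names(totals)[names(totals) == "amount"] <- "amount_lastmonth"

# 3. Left join: keep every original row, attach last month's total where one exists
result <- merge(dataset, totals, by = keys, all.x = TRUE)

# Rows without a previous month get NA from the join; the quoted loop would give 0
result$amount_lastmonth[is.na(result$amount_lastmonth)] <- 0
```

Note that merge returns the rows sorted by the key columns, so the row order differs from the input. Because the totals are computed once instead of once per row, this should scale much better than the nested loops, but I have not timed it on data of your size.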