Short: get rid of the loops I use and optimize runtime

Dear all,

For each row, I want to calculate the amount from one month earlier. I use a
matrix with 2100 rows and 22 columns (which is still a very small matrix; other
matrices can easily have more than 100000 rows).

Table before
Year  month  quarter  yearmonth  Service  ...  Amount
2009  9      Q3       092009     A        ...  120
2009  9      Q3       092009     B        ...  80
2009  8      Q3       082009     A        ...  40
2009  7      Q3       072009     A        ...  50

The result I want
Year  month  quarter  yearmonth  Service  ...  Amount  amount_lastmonth
2009  9      Q3       092009     A        ...  120     40
2009  9      Q3       092009     B        ...  80      ...
2009  8      Q3       082009     A        ...  40      50
2009  7      Q3       072009     A        ...  50      ...

The table is not exactly the same as my data, but it gives a good idea of what
I have and what I want.

The code I have written (see below) does what I want, but it is very slow.
It takes 129 s for 400 rows, and the time quadruples each time I double the
number of rows.
I'm new to programming in R, but I found that you can use Rprof and
summaryRprof to analyse your code (output below). However, I don't really
understand the output.
I guess I need code that runs in linear time, which means getting rid of the
two nested for loops.
Can someone help me, or tell me what else I can do to optimize my runtime?
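
To make the scaling concrete, here is a rough extrapolation. It assumes the run time keeps growing with the square of the number of rows, as I observed when doubling them; `predict_secs` is just a helper name made up for this illustration, and the real timings may differ.

```r
# Measured: 129.38 s of sampling time for 400 rows (from summaryRprof below).
rows_measured <- 400
secs_measured <- 129.38

# Assumption: time is proportional to n^2, because the two nested loops
# compare every row with every other row.
predict_secs <- function(n) secs_measured * (n / rows_measured)^2

predict_secs(800)            # double the rows: four times the measured time
predict_secs(2100) / 60      # estimated minutes for the current matrix
predict_secs(100000) / 86400 # estimated days for a 100000-row matrix
```

Under that assumption the 2100-row matrix would already take about an hour, which is why I think the nested loops have to go.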

I use R 2.9.2 on
Windows XP Service Pack 3.

Thank you in advance

Best regards,

Willems Ian


*****************************
dataset[,3]  = year
dataset[,5]  = month
dataset[,14] = service type
dataset[,18] = department
dataset[,22] = amount

[CODE]
# For each row j, sum the amounts of all rows i that have the same
# service type and department and belong to the previous month.
for (j in 1:Number_rows) {
  sum <- 0
  for (i in 1:Number_rows) {
    if (dataset[j,14] == dataset[i,14]) {          # same service type
      if (dataset[j,18] == dataset[i,18]) {        # same department
        if (dataset[j,5] == "1") {                 # month = 1: month ago is 12 of year - 1
          if ("12" == dataset[i,5]) {
            if ((dataset[j,3]-1) == dataset[i,3]) {
              sum <- sum + dataset[i,22]
            }
          }
        } else {                                   # month != 1: month ago is month - 1
          if ((dataset[j,5]-1) == dataset[i,5]) {
            if (dataset[j,3] == dataset[i,3]) {
              sum <- sum + dataset[i,22]
            }
          }
        }
      }
    }
  }
  # at this point sum holds last month's amount for row j
}
[/CODE]
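
For comparison, here is a minimal sketch of the kind of loop-free approach I have in mind, not something I have tried on the real data. It assumes the data sit in a data frame with named columns; the names `toy`, `year`, `month`, `service`, `department`, `amount` and the helper `month_index` are made up for this example, and the department values are invented because the example table does not show them. The idea is to sum the amounts once per group, then look up the previous month's total with `match()` instead of scanning all rows.

```r
# Toy data mirroring the example table (column names and departments are assumptions).
toy <- data.frame(
  year       = c(2009, 2009, 2009, 2009),
  month      = c(9, 9, 8, 7),
  service    = c("A", "B", "A", "A"),
  department = c("X", "X", "X", "X"),
  amount     = c(120, 80, 40, 50),
  stringsAsFactors = FALSE
)

# Running month number: subtracting 1 from January gives December of the
# previous year, so no special case for month 1 is needed.
month_index <- function(year, month) year * 12 + month

# Total amount per service type, department, year and month.
totals <- aggregate(toy["amount"],
                    by = toy[c("service", "department", "year", "month")],
                    FUN = sum)

# One lookup key per group, and for each row the key of its previous month.
totals$key <- paste(totals$service, totals$department,
                    month_index(totals$year, totals$month), sep = "|")
prev_key <- paste(toy$service, toy$department,
                  month_index(toy$year, toy$month) - 1, sep = "|")

# Look up last month's total; rows without a previous month get 0,
# which is what the nested loops produce as well.
toy$amount_lastmonth <- totals$amount[match(prev_key, totals$key)]
toy$amount_lastmonth[is.na(toy$amount_lastmonth)] <- 0

toy
```

On the toy data this gives 40 and 50 for service A in months 9 and 8, and 0 where there is no previous month, which corresponds to the "..." entries in the result table above.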

> summaryRprof()
$by.self
               self.time self.pct total.time total.pct
[.data.frame       33.92  26.2    80.90      62.5
NextMethod         12.68  9.8     12.68       9.8
[.factor            8.60  6.6      18.36      14.2
Ops.factor          8.10  6.3      40.08      31.0
sort.int            6.82  5.3      13.70      10.6
[                   6.70  5.2      85.44      66.0
names               6.54  5.1       6.54       5.1
length              5.66  4.4       5.66       4.4
==                  5.04  3.9      44.92      34.7
levels              4.80  3.7       5.56       4.3
is.na               4.24  3.3       4.24       3.3
dim                 3.66  2.8       3.66       2.8
switch              3.60  2.8       3.80       2.9
vector              2.68  2.1       8.02       6.2
inherits            1.90  1.5       1.90       1.5
any                 1.68  1.3       1.68       1.3
noNA.levels         1.46  1.1       7.84       6.1
.Call               1.40  1.1       1.40       1.1
!                   1.26  1.0       1.26       1.0
attr<-              1.06  0.8       1.06       0.8
.subset             1.00  0.8       1.00       0.8
class<-             0.82  0.6       0.82       0.6
!=                  0.80  0.6       0.80       0.6
levels.default      0.68  0.5       0.76       0.6
all                 0.62  0.5       0.62       0.5
<                   0.54  0.4       0.54       0.4
-                   0.48  0.4       0.48       0.4
is.factor           0.44  0.3       2.34       1.8
.subset2            0.38  0.3       0.38       0.3
attr                0.36  0.3       0.36       0.3
is.character        0.28  0.2       0.28       0.2
is.null             0.28  0.2       0.28       0.2
|                   0.26  0.2       0.26       0.2
oldClass<-          0.20  0.2       0.20       0.2
is.atomic           0.16  0.1       0.16       0.1
nzchar              0.10  0.1       0.10       0.1
is.numeric          0.06  0.0       0.06       0.0
oldClass            0.06  0.0       0.06       0.0
(                   0.04  0.0       0.04       0.0
[.data              0.02  0.0       0.02       0.0

$by.total
               total.time total.pct self.time self.pct
[                   85.44  66.0      6.70      5.2
[.data.frame        80.90  62.5     33.92     26.2
==                  44.92  34.7      5.04      3.9
Ops.factor          40.08  31.0      8.10      6.3
[.factor            18.36  14.2      8.60      6.6
sort.int            13.70  10.6      6.82      5.3
NextMethod          12.68  9.8     12.68      9.8
vector               8.02  6.2      2.68      2.1
noNA.levels          7.84  6.1      1.46      1.1
names                6.54  5.1      6.54      5.1
length               5.66  4.4      5.66      4.4
levels               5.56  4.3      4.80      3.7
is.na                4.24  3.3      4.24      3.3
switch               3.80  2.9      3.60      2.8
dim                  3.66  2.8      3.66      2.8
is.factor            2.34  1.8      0.44      0.3
inherits             1.90  1.5      1.90      1.5
any                  1.68  1.3      1.68      1.3
.Call                1.40  1.1      1.40      1.1
!                    1.26  1.0      1.26      1.0
attr<-               1.06  0.8      1.06      0.8
.subset              1.00  0.8      1.00      0.8
class<-              0.82  0.6      0.82      0.6
!=                   0.80  0.6      0.80      0.6
levels.default       0.76  0.6      0.68      0.5
all                  0.62  0.5      0.62      0.5
<                    0.54  0.4      0.54      0.4
-                    0.48  0.4      0.48      0.4
.subset2             0.38  0.3      0.38      0.3
attr                 0.36  0.3      0.36      0.3
is.character         0.28  0.2      0.28      0.2
is.null              0.28  0.2      0.28      0.2
|                    0.26  0.2      0.26      0.2
oldClass<-           0.20  0.2      0.20      0.2
is.atomic            0.16  0.1      0.16      0.1
nzchar               0.10  0.1      0.10      0.1
is.numeric           0.06  0.0      0.06      0.0
oldClass             0.06  0.0      0.06      0.0
(                    0.04  0.0      0.04      0.0
[.data               0.02  0.0      0.02      0.0

$sampling.time
[1] 129.38

Warning message:
In readLines(filename, n = chunksize) :
  incomplete final line found on 'Rprof.out'
