Re: [R] Replacing for loop with tapply!?

2005-06-12 Thread Sander Oom
 year. 
Unfortunately, the code breaks down (when uncommenting mat <- NA).

I have tried 'ifelse' statements in the functions, but it becomes even 
more of a mess. I could subset the matrix beforehand, but this would 
mean merging with a complete matrix afterwards to make it compatible 
with other years. That would slow things down.

How can I make the code robust for rows containing all missing values?

Thanks for your help,

Sander.


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Replacing for loop with tapply!?

2005-06-11 Thread Petr Pikal
Hi

On 10 Jun 2005 at 20:05, Sander Oom wrote:

> Dear all,
> 
> Dimitris and Andy, thanks for your great help. I have progressed to
> the following code which runs very fast and effective:
> 
> mat <- matrix(sample(-15:50, 15 * 10, TRUE), 15, 10)
> mat[mat > 45] <- NA

> mat <- NA

By this you redefine mat as a single logical NA:

> str(mat)
 logi NA


and your code then stops with an error, because rowSums() needs an 
array with at least two dimensions:

+  apply(mat, 1, max, na.rm=TRUE))
Error in rowSums(mat > temp, na.rm = TRUE) : 
'x' must be an array of at least two dimensions
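
The difference between replacing the object and filling it can be checked in a short session. This is only a sketch under assumptions: the names `scalar`, `filled` and `one_row` are made up for illustration and do not appear in the thread, and the matrix is the small 15 x 10 test matrix used above.

```r
set.seed(1)
mat <- matrix(sample(-15:50, 15 * 10, TRUE), 15, 10)

# Plain assignment throws the matrix away and leaves a length-one logical.
scalar <- NA
is.null(dim(scalar))   # no dimensions left, so rowSums() cannot work on it

# An empty subscript overwrites the values but keeps the dimensions.
filled <- mat
filled[] <- NA
dim(filled)

# Blanking a single row (one hypothetical station with no data at all).
one_row <- mat
one_row[3, ] <- NA
```

Either `filled` or `one_row` is probably closer to what "a station with a missing year" means than `mat <- NA`.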


If your matrix has one row full of NA's, R only warns but still 
computes a value (-Inf for the maximum of that row). 

> mat[3,] <- NA
> temps <- c(35, 37, 39)
> ind <- rbind(
+  t(sapply(temps, function(temp)
+    rowSums(mat > temp, na.rm=TRUE) )),
+  rowSums(!is.na(mat), na.rm=FALSE),
+  apply(mat, 1, max, na.rm=TRUE))
Warning message:
no finite arguments to max; returning -Inf 
> ind <- t(ind)
> ind
      [,1] [,2] [,3] [,4] [,5]
 [1,]    5    5    3    9   48
 [2,]    1    1    1    9   42
 [3,]    0    0    0    0 -Inf

> mat
> temps <- c(35, 37, 39)
> ind <- rbind(
>  t(sapply(temps, function(temp)
>    rowSums(mat > temp, na.rm=TRUE) )),
>  rowSums(!is.na(mat), na.rm=FALSE),
>  apply(mat, 1, max, na.rm=TRUE))
> ind <- t(ind)
> ind
> 
> However, some weather stations have missing values for the whole year.
> Unfortunately, the code breaks down (when uncommenting mat <- NA).
> 
> I have tried 'ifelse' statements in the functions, but it becomes even
> more of a mess. I could subset the matrix beforehand, but this would
> mean merging with a complete matrix afterwards to make it compatible
> with other years. That would slow things down.
> 
> How can I make the code robust for rows containing all missing values?


which(rowSums(!is.na(mat)) == 0)

This gives you the indices of the rows of your matrix in which all 
values are NA, and you can use it to fine-tune your code. What you 
need to do depends on the result you want, i.e. how the ind matrix 
should look after processing a mat with one or more rows full of NA's.
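
One way to use that index is sketched below. It assumes that a station without any data should get NA for both the maximum and the counts; that is a choice, not something the thread settles. The names `all_na`, `counts`, `n_obs` and `row_max` are invented for this example.

```r
set.seed(1)
mat <- matrix(sample(-15:50, 15 * 10, TRUE), 15, 10)
mat[mat > 45] <- NA
mat[3, ] <- NA                      # one row with no data at all
temps <- c(35, 37, 39)

# TRUE for rows in which every value is missing
all_na <- rowSums(!is.na(mat)) == 0

# one column of counts per threshold, rows stay in station order
counts <- sapply(temps, function(temp) rowSums(mat > temp, na.rm = TRUE))
n_obs  <- rowSums(!is.na(mat))

# take the maximum only where there is something to take it from,
# which avoids the "no finite arguments to max" warning and the -Inf
row_max <- rep(NA_real_, nrow(mat))
row_max[!all_na] <- apply(mat[!all_na, , drop = FALSE], 1, max, na.rm = TRUE)

# assumption: a count of 0 would be misleading for an empty row
counts[all_na, ] <- NA

ind <- cbind(counts, n_obs, row_max)
ind
```

Because the result is filled in place, the output keeps one row per station and no merge with a complete matrix is needed afterwards.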

HTH
Petr


 

Re: [R] Replacing for loop with tapply!?

2005-06-10 Thread Dimitris Rizopoulos
maybe you are looking for something along these lines:

mat <- matrix(sample(-15:50, 365 * 15000, TRUE), 365, 15000)
temps <- c(37, 39, 41)
#
ind <- matrix(0, length(temps), ncol(mat))
for(i in seq(along = temps)) ind[i, ] <- colSums(mat > temps[i])
ind


I hope it helps.

Best,
Dimitris


Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven

Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/16/336899
Fax: +32/16/337015
Web: http://www.med.kuleuven.ac.be/biostat/
 http://www.student.kuleuven.ac.be/~m0390867/dimitris.htm


- Original Message - 
From: Sander Oom [EMAIL PROTECTED]
To: r-help@stat.math.ethz.ch
Sent: Friday, June 10, 2005 10:50 AM
Subject: [R] Replacing for loop with tapply!?


> Dear all,
>
> We have a large data set with temperature data for weather stations
> across the globe (15000 stations).
>
> For each station, we need to calculate the number of days a certain
> temperature is exceeded.
>
> So far we used the following S code, where mat88 is a matrix containing
> rows of 365 daily temperatures for each of 15000 weather stations:
>
> m <- 37
> n <- 2
> outmat88 <- matrix(0, ncol = 4, nrow = nrow(mat88))
> for(i in 1:nrow(mat88)) {
> # i <- 3
> row1 <- as.data.frame(df88[i,  ])
> temprow37 <- select.rows(row1, row1 > m)
> temprow39 <- select.rows(row1, row1 > m + n)
> temprow41 <- select.rows(row1, row1 > m + 2 * n)
> outmat88[i, 1] <- max(row1, na.rm = T)
> outmat88[i, 2] <- count.rows(temprow37)
> outmat88[i, 3] <- count.rows(temprow39)
> outmat88[i, 4] <- count.rows(temprow41)
> }
> outmat88
>
> We have transferred the data to a more potent Linux box running R, but
> still hope to speed up the code.
>
> I know a for loop should be avoided when looking for speed. I also know
> the answer is in something like tapply, but my understanding of these
> commands is still too limited to see the solution. Could someone show me
> the way!?
>
> Thanks in advance,
>
> Sander.


Re: [R] Replacing for loop with tapply!?

2005-06-10 Thread Sander Oom
Thanks Dimitris,

Very impressive! Much faster than before.

Thanks to the newly found R.basic, I can simply rotate the result with 
rotate270{R.basic}:

  mat <- matrix(sample(-15:50, 365 * 15000, TRUE), 365, 15000)
  temps <- c(37, 39, 41)
  #
  #ind <- matrix(0, length(temps), ncol(mat))
  ind <- matrix(0, 4, ncol(mat))
  (startDate <- date())
[1] Fri Jun 10 12:08:01 2005
  for(i in seq(along = temps)) ind[i, ] <- colSums(mat > temps[i])
  ind[4, ] <- colMeans(max(mat))
Error in colMeans(max(mat)) : 'x' must be an array of at least two 
dimensions
  (endDate <- date())
[1] Fri Jun 10 12:08:02 2005
  ind <- rotate270(ind)
  ind[1:10,]
   V4 V3 V2 V1
1   0 56 75 80
2   0 46 53 60
3   0 50 58 67
4   0 60 72 80
5   0 59 68 76
6   0 55 67 74
7   0 62 77 93
8   0 45 57 67
9   0 57 68 75
10  0 61 66 76

However, I have not managed to get the row maximum using your method. It 
should be 50 for most rows, but my first guess at the code gives an error!

Any suggestions?

Sander





-- 

Dr Sander P. Oom
Animal, Plant and Environmental Sciences,
University of the Witwatersrand
Private Bag 3, Wits 2050, South Africa
Tel (work)  +27 (0)11 717 64 04
Tel (home)  +27 (0)18 297 44 51
Fax +27 (0)18 299 24 64
Email   [EMAIL PROTECTED]
Web www.oomvanlieshout.net/sander



Re: [R] Replacing for loop with tapply!?

2005-06-10 Thread Dimitris Rizopoulos
for the maximum you could use something like:

ind[, 1] <- apply(mat, 2, max)
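
The reason `colMeans(max(mat))` fails is that `max()` collapses the whole matrix to a single number, so nothing two-dimensional is left for `colMeans()`. A small sketch of the per-column version follows; it uses a reduced matrix of 20 stations so that it runs instantly, and the column labels are made up for this example. Building the result with stations in rows from the start is one way to do without the rotation step, though `t()` on the original `ind` would serve as well.

```r
set.seed(1)
mat <- matrix(sample(-15:50, 365 * 20, TRUE), 365, 20)   # 20 stations only
temps <- c(37, 39, 41)

# max() over the whole matrix returns one value, not one per column
length(max(mat))

# one maximum per column (station)
col_max <- apply(mat, 2, max)

# stations in rows: maximum first, then one count column per threshold
ind <- cbind(col_max, sapply(temps, function(temp) colSums(mat > temp)))
colnames(ind) <- c("max", paste("above", temps, sep = "_"))
head(ind)
```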

I hope it helps.

Best,
Dimitris


Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven

Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/16/336899
Fax: +32/16/337015
Web: http://www.med.kuleuven.ac.be/biostat/
 http://www.student.kuleuven.ac.be/~m0390867/dimitris.htm




Re: [R] Replacing for loop with tapply!?

2005-06-10 Thread Sander Oom
Dear all,

Dimitris and Andy, thanks for your great help. I have progressed to the 
following code, which runs very fast and effectively:

mat <- matrix(sample(-15:50, 15 * 10, TRUE), 15, 10)
mat[mat > 45] <- NA
mat <- NA
mat
temps <- c(35, 37, 39)
ind <- rbind(
 t(sapply(temps, function(temp)
   rowSums(mat > temp, na.rm=TRUE) )),
 rowSums(!is.na(mat), na.rm=FALSE),
 apply(mat, 1, max, na.rm=TRUE))
ind <- t(ind)
ind

However, some weather stations have missing values for the whole year. 
Unfortunately, the code breaks down (when uncommenting mat <- NA).

I have tried 'ifelse' statements in the functions, but it becomes even 
more of a mess. I could subset the matrix beforehand, but this would 
mean merging with a complete matrix afterwards to make it compatible 
with other years. That would slow things down.

How can I make the code robust for rows containing all missing values?

Thanks for your help,

Sander.



Re: [R] Replacing for loop with tapply!?

2005-06-10 Thread Kjetil Brinchmann Halvorsen
Sander Oom wrote:

> Dear all,
>
> We have a large data set with temperature data for weather stations 
> across the globe (15000 stations).
>
> For each station, we need to calculate the number of days a certain 
> temperature is exceeded.
>
> So far we used the following S code, where mat88 is a matrix containing 
> rows of 365 daily temperatures for each of 15000 weather stations:
>
>    m <- 37
>    n <- 2
>    outmat88 <- matrix(0, ncol = 4, nrow = nrow(mat88))
>    for(i in 1:nrow(mat88)) {
>    # i <- 3
>    row1 <- as.data.frame(df88[i,  ])
>    temprow37 <- select.rows(row1, row1 > m)
>    temprow39 <- select.rows(row1, row1 > m + n)
>    temprow41 <- select.rows(row1, row1 > m + 2 * n)
>    outmat88[i, 1] <- max(row1, na.rm = T)
>    outmat88[i, 2] <- count.rows(temprow37)
>    outmat88[i, 3] <- count.rows(temprow39)
>    outmat88[i, 4] <- count.rows(temprow41)
>    }
>    outmat88

What you need is not tapply but apply. Something like

   apply(mat88, 1, function(x) sum(x > 30))

where your threshold should replace 30 and the `1' refers to rows. For 
multiple thresholds:

   apply(mat88, 1, function(x) c( sum(x > 20), sum(x > 25), sum(x > 30)))
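
One detail worth knowing about the multi-threshold call: when the function returns a vector, apply() puts each row's result in a column, so the output has one row per threshold and one column per station. The sketch below shows this on a made-up 6-station stand-in for mat88 (the real data are not available here) and compares it with a rowSums() version; `thresholds`, `by_apply` and `by_rowsums` are names chosen for this example only.

```r
set.seed(1)
mat88 <- matrix(sample(-15:50, 6 * 365, TRUE), nrow = 6, ncol = 365)
thresholds <- c(20, 25, 30)

# apply over rows: result is thresholds x stations (3 x 6)
by_apply <- apply(mat88, 1, function(x)
  sapply(thresholds, function(th) sum(x > th)))

# rowSums per threshold: result is stations x thresholds (6 x 3)
by_rowsums <- sapply(thresholds, function(th) rowSums(mat88 > th))

dim(by_apply)
dim(by_rowsums)
all(t(by_apply) == by_rowsums)
```

On a matrix of the size described in the question, the rowSums() form would be expected to be the quicker of the two, since it avoids calling an R function once per station.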

Kjetil

> We have transferred the data to a more potent Linux box running R, but 
> still hope to speed up the code.
>
> I know a for loop should be avoided when looking for speed. I also know 
> the answer is in something like tapply, but my understanding of these 
> commands is still too limited to see the solution. Could someone show me 
> the way!?
>
> Thanks in advance,
>
> Sander.
  



-- 

Kjetil Halvorsen.

Peace is the most effective weapon of mass construction.
   --  Mahdi Elmandjra







Re: [R] Replacing for loop with tapply!?

2005-06-10 Thread Adaikalavan Ramasamy
OK, so you want to find some summary statistics for each column, where
some columns could be completely missing. 

Writing a small wrapper should help. When you use apply(), you are
actually applying a function to every column (or row). First, let us
simulate a dataset with 15 days/rows and 10 stations/columns 

### simulate data
set.seed(1)    # for reproducibility 
mat <- matrix(sample(-15:50, 15 * 10, TRUE), 15, 10)  
mat[ mat > 45 ] <- NA  # create some missing values
mat[ ,9 ]       <- NA  # station 9's data is completely missing


Here are two examples of such wrappers:

find.stats1 <- function( data, threshold=c(37,39,41) ){
  
  n   <- length(threshold)
  out <- matrix(  nrow=(n + 1), ncol=ncol(data) ) # initialise

  # maximum per station, NA if the station has no data at all
  out[1, ] <- apply(data, 2, function(x) 
                 ifelse( all(is.na(x)), NA, max(x, na.rm=TRUE) ))

  # number of days above each threshold, per station
  for(i in 1:n) out[ i+1, ] <- colSums( data > threshold[i], na.rm=TRUE )
  
  rownames(out) <- c( "daily_max", paste("above", threshold, sep="_") )
  colnames(out) <- colnames(data)  # name of the stations (stations are columns)
  return( out )
}
  
find.stats2 <- function( data, threshold=c(37,39,41) ){
  
  n      <- length(threshold)
  excess <- numeric( n )
  out    <- matrix(  nrow=(n + 1), ncol=ncol(data) ) # initialise
  good   <- which( apply( data, 2, function(x) !all(is.na(x)) ) )
  # columns that are not completely missing
 
  # drop=FALSE keeps a matrix even if only one column is usable
  out[ , good] <- apply( data[ , good, drop=FALSE], 2, function(x){
    m <- max( x, na.rm=TRUE )
    for(i in 1:n){ excess[i] <- sum( x > threshold[i], na.rm=TRUE ) }
    return( c(m, excess) )
  } ) 
  
  rownames(out) <- c( "daily_max", paste("above", threshold, sep="_") )
  colnames(out) <- colnames(data)  # name of the stations (stations are columns)
  return( out )
}

find.stats1( mat )
          [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
daily_max   44   42   39   41   45   43   42   45   NA    42
above_37     2    1    2    1    3    2    2    1    0     1
above_39     2    1    0    1    3    2    1    1    0     1
above_41     2    1    0    0    2    2    1    1    0     1

find.stats2( mat )
          [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
daily_max   44   42   39   41   45   43   42   45   NA    42
above_37     2    1    2    1    3    2    2    1   NA     1
above_39     2    1    0    1    3    2    1    1   NA     1
above_41     2    1    0    0    2    2    1    1   NA     1


On my laptop 'find.stats1' and 'find.stats2' (which is more flexible)
takes 7 and 6 seconds respectively to execute on a dataset with 1
stations and 365 days.
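
For anyone who wants to repeat a timing of this kind, here is a rough sketch using system.time(). It does not reproduce the two wrappers above; `col_stats` is a hypothetical, more compact variant written for this example, and the 1000 stations are an arbitrary size picked so that it runs quickly. Timings will differ from machine to machine, so none are quoted.

```r
# Hypothetical compact variant: stations in columns, days in rows.
col_stats <- function(data, threshold = c(37, 39, 41)) {
  # columns (stations) without a single observation
  empty <- colSums(!is.na(data)) == 0

  # maximum only for stations that have data, NA otherwise
  daily_max <- rep(NA_real_, ncol(data))
  daily_max[!empty] <- apply(data[, !empty, drop = FALSE], 2, max, na.rm = TRUE)

  # one row of counts per threshold (empty stations get 0, as in find.stats1)
  above <- t(sapply(threshold, function(th) colSums(data > th, na.rm = TRUE)))

  out <- rbind(daily_max, above)
  rownames(out) <- c("daily_max", paste("above", threshold, sep = "_"))
  out
}

set.seed(1)
big <- matrix(sample(-15:50, 365 * 1000, TRUE), 365, 1000)
big[big > 45] <- NA
big[, 9] <- NA                      # station 9 completely missing

timing <- system.time(res <- col_stats(big))
timing[["elapsed"]]
res[, 7:10]
```

Whether an empty station should report counts of 0 (as here and in find.stats1) or NA (as in find.stats2) is a matter of what the downstream analysis expects.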

Regards, Adai



On Fri, 2005-06-10 at 20:05 +0200, Sander Oom wrote:
> Dear all,
> 
> Dimitris and Andy, thanks for your great help. I have progressed to the 
> following code which runs very fast and effective:
> 
> mat <- matrix(sample(-15:50, 15 * 10, TRUE), 15, 10)
> mat[mat > 45] <- NA
> mat <- NA
> mat
> temps <- c(35, 37, 39)
> ind <- rbind(
>  t(sapply(temps, function(temp)
>    rowSums(mat > temp, na.rm=TRUE) )),
>  rowSums(!is.na(mat), na.rm=FALSE),
>  apply(mat, 1, max, na.rm=TRUE))
> ind <- t(ind)
> ind
> 
> However, some weather stations have missing values for the whole year. 
> Unfortunately, the code breaks down (when uncommenting mat <- NA).
> 
> I have tried 'ifelse' statements in the functions, but it becomes even 
> more of a mess. I could subset the matrix beforehand, but this would 
> mean merging with a complete matrix afterwards to make it compatible 
> with other years. That would slow things down.
> 
> How can I make the code robust for rows containing all missing values?
 
 Thanks for your help,
 
 Sander.
 
 Dimitris Rizopoulos wrote:
  for the maximum you could use something like:
  
  ind[, 1] <- apply(mat, 2, max)
  
  I hope it helps.
  
  Best,
  Dimitris
  
  
  Dimitris Rizopoulos
  Ph.D. Student
  Biostatistical Centre
  School of Public Health
  Catholic University of Leuven
  
  Address: Kapucijnenvoer 35, Leuven, Belgium
  Tel: +32/16/336899
  Fax: +32/16/337015
  Web: http://www.med.kuleuven.ac.be/biostat/
   http://www.student.kuleuven.ac.be/~m0390867/dimitris.htm
  
  
  
  - Original Message - 
  From: Sander Oom [EMAIL PROTECTED]
  To: Dimitris Rizopoulos [EMAIL PROTECTED]
  Cc: r-help@stat.math.ethz.ch
  Sent: Friday, June 10, 2005 12:10 PM
  Subject: Re: [R] Replacing for loop with tapply!?
  
  
 Thanks Dimitris,
 
 Very impressive! Much faster than before.
 
 Thanks to new found R.basic, I can simply rotate the result with
 rotate270{R.basic}:
 
 mat <- matrix(sample(-15:50, 365 * 15000, TRUE), 365, 15000)
 temps <- c(37, 39, 41)
 #
 #ind <- matrix(0, length(temps), ncol(mat))
 ind <- matrix(0, 4, ncol(mat))
 (startDate <- date())
 [1] Fri Jun 10 12:08:01 2005
 for(i in seq(along = temps)) ind[i, ] <- colSums(mat > temps[i])
 ind[4, ] <- colMeans(max(mat))
 Error in colMeans(max(mat)) : 'x' must be an array of at least two
 dimensions
 (endDate <- date())
 [1] Fri Jun 10 12:08:02 2005
 ind <- rotate270(ind)
 ind[1:10,]
V4 V3 V2 V1
 1   0 56 75 80
 2   0 46 53 60
 3   0 50 58 67
 4   0 60 72 80
 5   0 59 68 76
 6