Re: [R] Speeding indexing and sub-sectioning of 3d array

2006-08-13 Thread Ray Brownrigg
 Date: Thu, 10 Aug 2006 14:34:27 -0400
 From: Swidan, Firas [EMAIL PROTECTED]
 
 Hi Patrick,
 
 Thanks for the help. The function I listed is just an example. I isolated
 and kept only the problematic part in my code for clarity sake. I ended up
 implementing the functionality in C and now it takes 22 seconds to calculate
 the objective.
 
 Best regards,
 Firas.
 
Interestingly, I was able to develop an algorithm in R that achieves the
same order-of-magnitude speedup as your C code, but at the expense of
greater memory requirements.  However it only works if the function you
are using is really is mean() [your code labels use Median].  It does
this by making use of cumsum() and logical indexing, working with sums
of values rather than calculationg the means and then dividing by the
numbers of values in the hypercube at the end.

If you want to try coding this algorithm in C for even greater
performance improvement (or for interest only), let me know.  I suspect
it will be difficult to code in C because of the vectorisation it takes
advantage of.

In the output below, cK3d() is your algorithm (slightly adjusted to
cover the whole matrix and to return something), and cK3dme is my
equivalent, running on a Pentium IV 3.2GHz, NetBSD system with 1GB
memory.

Regards,
Ray Brownrigg

 x - rnorm(245*175*150)
 dim(x) - c(245, 175, 150)
 unix.time(yme - cK3dme(x, 3))
[1] 13.870  1.690 15.813  0.000  0.000
 unix.time(y - cK3d(x, 3))
[1] 500.206   0.035 505.738   0.000   0.000
 all.equal(y, yme)
[1] TRUE
 


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Speeding indexing and sub-sectioning of 3d array

2006-08-10 Thread Swidan, Firas
Hi Patrick,

Thanks for the help. The function I listed is just an example. I isolated
and kept only the problematic part in my code for clarity sake. I ended up
implementing the functionality in C and now it takes 22 seconds to calculate
the objective.

Best regards,
Firas.

On 8/9/06 4:41 PM, Patrick Burns [EMAIL PROTECTED] wrote:

 First off, I hope that the function you list is just an example
 since it only returns what the last iteration does -- obviously
 the same answer can be arrived at much quicker.
 
 The main principal in speeding up loops is to do as little
 inside the loops as possible.  'fjj1' is essentially the same as
 the listed function, but with one slight cleanup.
 
 fjj1 -
 function(x, radius=3)
 {
 dx - dim(x)
 dx1 - dx[1]
 dx2 - dx[2]
 dx3 - dx[3]
 for(i in (radius + 1):(dx1 - radius - 1)) {
 for(j in (radius + 1):(dx2 - radius - 1)) {
 for(k in (radius + 1):(dx3 - radius -1)) {
 ans - mean(x[(i-radius):(i+radius),
 (j-radius):(j+radius),
 (k-radius):(k+radius)])
 }
 }
 }
 ans
 }
 
 The time to run fjj1(jj, 3) on my machine where jj is a
 245 by 175 by 150 array was 1222 seconds.
 
 'fjj2'  substantially reduces the number of sequences
 created.  It took 975 seconds.
 
 fjj2 -
 function(x, radius=3)
 {
 dx - dim(x)
 dx1 - dx[1]
 dx2 - dx[2]
 dx3 - dx[3]
 rseq - -radius:radius
 for(i in (radius + 1):(dx1 - radius - 1)) {
 for(j in (radius + 1):(dx2 - radius - 1)) {
 for(k in (radius + 1):(dx3 - radius -1)) {
 ans - mean(x[i + rseq, j + rseq, k + rseq])
 }
 }
 }
 ans
 }
 
 
 'fjj3' reduces some of the subscripting (but possibly at the
 expense of using more memory -- I'm not sure if it does or
 not).  It took 936 seconds.
 
 fjj3 -
 function(x, radius=3)
 {
 dx - dim(x)
 dx1 - dx[1]
 dx2 - dx[2]
 dx3 - dx[3]
 rseq - -radius:radius
 for(i in (radius + 1):(dx1 - radius - 1)) {
 A - x[i + rseq, , , drop=FALSE]
 for(j in (radius + 1):(dx2 - radius - 1)) {
 B - A[, j + rseq, , drop=FALSE]
 for(k in (radius + 1):(dx3 - radius -1)) {
 ans - mean(B[ , , k + rseq])
 }
 }
 }
 ans
 }
 
 'fjj4' reverses the order of the loops.  Because of the
 way that arrays are stored, it makes sense that subscripting
 a sequence in the first dimension would be faster than
 subscripting subsequent dimensions.  This does seem to be
 the case.  'fjj4' took 839 seconds.
 
 fjj4 -
 function(x, radius=3)
 {
 dx - dim(x)
 dx1 - dx[1]
 dx2 - dx[2]
 dx3 - dx[3]
 rseq - -radius:radius
 for(i in (radius + 1):(dx3 - radius - 1)) {
 A - x[, ,i + rseq, drop=FALSE]
 for(j in (radius + 1):(dx2 - radius - 1)) {
 B - A[, j + rseq, , drop=FALSE]
 for(k in (radius + 1):(dx1 - radius -1)) {
 ans - mean(B[k + rseq, , ])
 }
 }
 }
 ans
 }
 
 
 Another change that would make a marginal difference
 would be to generate the sequences controlling the inner
 loops once at the outset.
 
 If the computation at the heart of the function really is a
 mean or something similar, then it is possible that there
 will be tricks to update that value more efficiently.
 
 Finally, if this will be used enough that the speed is an
 issue, then rewriting it in C would be a good approach.
 
 
 Patrick Burns
 [EMAIL PROTECTED]
 +44 (0)20 8525 0696
 http://www.burns-stat.com
 (home of S Poetry and A Guide for the Unwilling S User)
 
 Swidan, Firas wrote:
 
 Hi,
 
 I am having a problem with a very slow indexing and sub-sectioning of a 3d
 array:
 
  
 
 dim(arr)

 
 [1] 245 175 150
 
 For each point in the array, I am trying to calculate the mean of the values
 in its surrounding:
 
 
 mean( arr[ (i - radius):(i + radius),
(j - radius):(j + radius),
(k - radius):(k + radius)] )
 
 Putting that code in 3 for loops
 
 calculateKMedian - function( arr, radius){
 
  for( i in (radius + 1):(dim(arr)[1] - radius - 1) ){
for( j in (radius + 1):(dim(arr)[2] - radius - 1) )
  for( k in (radius + 1):(dim(arr)[3] - radius - 1) ){
 
 
mediansArr - mean( arr[ (i - radius):(i + radius),
(j - radius):(j + radius),
(k - radius):(k + radius)] )
 
  }
  }
  return(mediansArr)
 }
 
 Results in a very slow run:
 
  
 
 

[R] Speeding indexing and sub-sectioning of 3d array

2006-08-09 Thread Swidan, Firas
Hi,

I am having a problem with a very slow indexing and sub-sectioning of a 3d
array:

 dim(arr)
[1] 245 175 150

For each point in the array, I am trying to calculate the mean of the values
in its surrounding:


mean( arr[ (i - radius):(i + radius),
(j - radius):(j + radius),
(k - radius):(k + radius)] )

Putting that code in 3 for loops

calculateKMedian - function( arr, radius){

  for( i in (radius + 1):(dim(arr)[1] - radius - 1) ){
for( j in (radius + 1):(dim(arr)[2] - radius - 1) )
  for( k in (radius + 1):(dim(arr)[3] - radius - 1) ){


mediansArr - mean( arr[ (i - radius):(i + radius),
(j - radius):(j + radius),
(k - radius):(k + radius)] )

  }
  }
  return(mediansArr)
}

Results in a very slow run:

 system.time( calculateKMedian( a, 3))
[1] 423.468   0.096 423.631   0.000   0.000

If I replace 

mediansArr - mean( arr[ (i - radius):(i + radius),
(j - radius):(j + radius),
(k - radius):(k + radius)] )

With an access to the (I,j,k) cell's value

 mediansArr - arr[i,j,k]

The running time decreases to

 system.time( calculateKMedian( a, 3))
[1] 14.821  0.005 14.829  0.000  0.000



But 14 seconds are still pretty expensive for just scanning the array.

Is there anything I can do to speed the indexing and the sub-sectioning of
the 3d array in this case?

Thanks for the help,
Firas.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Speeding indexing and sub-sectioning of 3d array

2006-08-09 Thread Patrick Burns
First off, I hope that the function you list is just an example
since it only returns what the last iteration does -- obviously
the same answer can be arrived at much quicker.

The main principal in speeding up loops is to do as little
inside the loops as possible.  'fjj1' is essentially the same as
the listed function, but with one slight cleanup.

fjj1 -
function(x, radius=3)
{
dx - dim(x)
dx1 - dx[1]
dx2 - dx[2]
dx3 - dx[3]
for(i in (radius + 1):(dx1 - radius - 1)) {
for(j in (radius + 1):(dx2 - radius - 1)) {
for(k in (radius + 1):(dx3 - radius -1)) {
ans - mean(x[(i-radius):(i+radius),
(j-radius):(j+radius), 
(k-radius):(k+radius)])
}
}
}
ans
}

The time to run fjj1(jj, 3) on my machine where jj is a
245 by 175 by 150 array was 1222 seconds.

'fjj2'  substantially reduces the number of sequences
created.  It took 975 seconds.

fjj2 -
function(x, radius=3)
{
dx - dim(x)
dx1 - dx[1]
dx2 - dx[2]
dx3 - dx[3]
rseq - -radius:radius
for(i in (radius + 1):(dx1 - radius - 1)) {
for(j in (radius + 1):(dx2 - radius - 1)) {
for(k in (radius + 1):(dx3 - radius -1)) {
ans - mean(x[i + rseq, j + rseq, k + rseq])
}
}
}
ans
}


'fjj3' reduces some of the subscripting (but possibly at the
expense of using more memory -- I'm not sure if it does or
not).  It took 936 seconds.

fjj3 -
function(x, radius=3)
{
dx - dim(x)
dx1 - dx[1]
dx2 - dx[2]
dx3 - dx[3]
rseq - -radius:radius
for(i in (radius + 1):(dx1 - radius - 1)) {
A - x[i + rseq, , , drop=FALSE]
for(j in (radius + 1):(dx2 - radius - 1)) {
B - A[, j + rseq, , drop=FALSE]
for(k in (radius + 1):(dx3 - radius -1)) {
ans - mean(B[ , , k + rseq])
}
}
}
ans
}

'fjj4' reverses the order of the loops.  Because of the
way that arrays are stored, it makes sense that subscripting
a sequence in the first dimension would be faster than
subscripting subsequent dimensions.  This does seem to be
the case.  'fjj4' took 839 seconds.

fjj4 -
function(x, radius=3)
{
dx - dim(x)
dx1 - dx[1]
dx2 - dx[2]
dx3 - dx[3]
rseq - -radius:radius
for(i in (radius + 1):(dx3 - radius - 1)) {
A - x[, ,i + rseq, drop=FALSE]
for(j in (radius + 1):(dx2 - radius - 1)) {
B - A[, j + rseq, , drop=FALSE]
for(k in (radius + 1):(dx1 - radius -1)) {
ans - mean(B[k + rseq, , ])
}
}
}
ans
}


Another change that would make a marginal difference
would be to generate the sequences controlling the inner
loops once at the outset.

If the computation at the heart of the function really is a
mean or something similar, then it is possible that there
will be tricks to update that value more efficiently.

Finally, if this will be used enough that the speed is an
issue, then rewriting it in C would be a good approach.


Patrick Burns
[EMAIL PROTECTED]
+44 (0)20 8525 0696
http://www.burns-stat.com
(home of S Poetry and A Guide for the Unwilling S User)

Swidan, Firas wrote:

Hi,

I am having a problem with a very slow indexing and sub-sectioning of a 3d
array:

  

dim(arr)


[1] 245 175 150

For each point in the array, I am trying to calculate the mean of the values
in its surrounding:


mean( arr[ (i - radius):(i + radius),
(j - radius):(j + radius),
(k - radius):(k + radius)] )

Putting that code in 3 for loops

calculateKMedian - function( arr, radius){

  for( i in (radius + 1):(dim(arr)[1] - radius - 1) ){
for( j in (radius + 1):(dim(arr)[2] - radius - 1) )
  for( k in (radius + 1):(dim(arr)[3] - radius - 1) ){


mediansArr - mean( arr[ (i - radius):(i + radius),
(j - radius):(j + radius),
(k - radius):(k + radius)] )

  }
  }
  return(mediansArr)
}

Results in a very slow run:

  

system.time( calculateKMedian( a, 3))


[1] 423.468   0.096 423.631   0.000   0.000

If I replace 

mediansArr - mean( arr[ (i - radius):(i + radius),
(j - radius):(j + radius),
(k - radius):(k + radius)] )

With an access to the (I,j,k) cell's value

 mediansArr - arr[i,j,k]

The running time decreases to

  

system.time( calculateKMedian( a, 3))


[1] 14.821  0.005 14.829  0.000  0.000



But 14 seconds are