Re: [GRASS-user] r.watershed speed-up

2008-08-01 Thread Markus Metz

Hello list,

there is now a new version of r.watershed.fast where results are even 
more similar to r.watershed. They are still not 100% identical to 
r.watershed, but I can't get it more similar. But it comes with a 
further speed increase. I repeated the test of Moritz with the same 
commands on GRASS 6.4.svn, results are below.


Moritz Lennert wrote:


First test in North Carolina demo data set:

g.region rast=elevation

time r.watershed [EMAIL PROTECTED] accumulation=old_accum 
drainage=old_dir basin=old_sheds stream=old_streams thresh=500


real19m2.744s
user18m41.318s
sys 0m1.884s

time r.watershed.fast [EMAIL PROTECTED] 
accumulation=fast_accum drainage=fast_dir basin=fast_sheds 
stream=fast_streams thresh=500


real0m18.034s
user0m17.833s
sys 0m0.196s
Absolute times are not really comparable between systems, but relative 
differences in time should be similar. The following numbers are 
calculated with real time. In the test Moritz did, r.watershed took 63x 
as long as r.watershed.fast, i.e. r.watershed.fast needed only 1.6% of 
the time of r.watershed.
New version: r.watershed took 127x as long as r.watershed.fast, i.e. 
r.watershed.fast needed only 0.8% of the time of r.watershed.
Of the 2025000 cells in the map, 1991218 show the same direction, i.e. 
98%. Those which have different directions are overwhelmingly low 
slope cells.
New version: 2004480 cells, i.e. 99% of all cells show the same flow 
direction.
1833907 cells have the same accumulation value, i.e. 90%, but I guess 
this is to be expected.
New version: 1921510 cells, i.e. 95% of all cells show the same 
accumulation value.


The idea is that a faster r.watershed can also be used for massive 
grids, where GRASS users frequently gave up using r.watershed because it 
would have taken hours or even days. I resampled elevation in the 
North Carolina demo data set from 10m to 3m with r.resamp.rst using 
default values (after the GRASS book Section 5.3.3, paragraph 
Regularized spline with tension (RST) interpolation) to generate a 
fairly large map and ran the same test on the resampled map.


cells in region : 22,500,000

The results:

Speed:
r.watershed took 5459x as long as r.watershed.fast, i.e. 
r.watershed.fast needed only 0.02% of the time of r.watershed (here 
10h2m55s vs. 1m7s, 10 hours versus 1 minute...).


Flow direction differences:
22288539 cells, i.e. 99% of all cells show the same flow direction.

Flow accumulation differences:
20963653 cells, i.e. 93% of all cells show the same accumulation value.

Memory usage of r.watershed and r.watershed.fast: maximum of about 940MB
I don't understand why memory usage increases after SECTION 1a: 
Initiating Memory is completed.
Assuming that there is no longer a time constraint but only a memory 
constraint (although SECTION 4: Watershed Determination can take some 
time on large maps with a large threshold value), the upper region sizes 
that r.watershed.fast can process in RAM would be *roughly* for

1GB RAM:  14,000,000 cells
2GB RAM:  38,000,000 cells
4GB RAM:  86,000,000 cells
8GB RAM: 181,000,000 cells
after putting 400MB aside for the system and other open applications. 
Estimate based on Linux 64bit.


If you want to repeat and analyse the tests with the North Carolina demo 
data set, the new r.watershed.fast is here 
http://markus.metz.giswork.googlepages.com/r.watershed_fast_version.tar.gz 
and the test script is below.


Regards,

Markus


test script:
g.region rast=elevation
time r.watershed [EMAIL PROTECTED] accumulation=nc_accum_old 
drainage=nc_dir_old basin=nc_sheds_old stream=nc_streams_old thresh=500
time r.watershed.fast [EMAIL PROTECTED] 
accumulation=nc_accum_fast drainage=nc_dir_fast basin=nc_sheds_fast 
stream=nc_streams_fast thresh=500

r.mapcalc nc_dir_dif='if((nc_dir_old - nc_dir_fast != 0),1,0)'
r.mapcalc nc_accum_dif='if((nc_accum_old - nc_accum_fast != 0),1,0)'
r.stats -c [EMAIL PROTECTED]
r.stats -c [EMAIL PROTECTED]

r.resamp.rst [EMAIL PROTECTED] ew_res=3 ns_res=3 
elev=elevation_rst overlap=3 zmult=1.0 tension=40.

g.region rast=elevation_rst
time r.watershed [EMAIL PROTECTED] 
accumulation=nc_rst_accum_old drainage=nc_rst_dir_old 
basin=nc_rst_sheds_old stream=nc_rst_streams_old thresh=500
time r.watershed.fast [EMAIL PROTECTED] 
accumulation=nc_rst_accum_fast drainage=nc_rst_dir_fast 
basin=nc_rst_sheds_fast stream=nc_rst_streams_fast thresh=500
r.mapcalc nc_rst_dir_dif='if((nc_rst_dir_old - nc_rst_dir_fast != 
0),1,0)'
r.mapcalc nc_rst_accum_dif='if((nc_rst_accum_old - nc_rst_accum_fast 
!= 0),1,0)'

r.stats -c [EMAIL PROTECTED]
r.stats -c [EMAIL PROTECTED]

___
grass-user mailing list
grass-user@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-user


Re: [GRASS-user] r.watershed speed-up

2008-08-01 Thread G. Allegri
Great. Things get better and faster!
I've tried on a not so big region. But it was enough:

989861 cells (172286 null cells under MASK, where I have the see)

It has taken about 55 seconds (I don't have time to set up a
profiling), where I asked for drain and visual outputs creation.

The stats are (calculated in OO):

DIFF-VALUES  N.PIXELS PERCENTAGE
-7  56  0,0068
-6  30  0,0037
-5  32  0,0039
-4  49  0,0060
-3  38  0,0046
-2  180 0,0220
-1  914 0,1118
0   814650  99,6422
1   910 0,1113
2   229 0,0280
3   79  0,0097
4   42  0,0051
5   21  0,0026
6   70  0,0086
7   152 0,0186
8   22  0,0027
9   21  0,0026
10  5   0,0006
11  15  0,0018
12  30  0,0037
13  18  0,0022
14  3   0,0004
15  5   0,0006
16  4   0,0005

Things has run better then with the previous version, as 99.6% of
values are equal to r.watershed.

Thanks, a lot, for r.watershed.fast! :-)
Giovanni

2008/8/1 Markus Metz [EMAIL PROTECTED]:
 Hello list,

 there is now a new version of r.watershed.fast where results are even more
 similar to r.watershed. They are still not 100% identical to r.watershed,
 but I can't get it more similar. But it comes with a further speed increase.
 I repeated the test of Moritz with the same commands on GRASS 6.4.svn,
 results are below.

 Moritz Lennert wrote:

 First test in North Carolina demo data set:

 g.region rast=elevation

 time r.watershed [EMAIL PROTECTED] accumulation=old_accum
 drainage=old_dir basin=old_sheds stream=old_streams thresh=500

 real19m2.744s
 user18m41.318s
 sys 0m1.884s

 time r.watershed.fast [EMAIL PROTECTED]
 accumulation=fast_accum drainage=fast_dir basin=fast_sheds
 stream=fast_streams thresh=500

 real0m18.034s
 user0m17.833s
 sys 0m0.196s

 Absolute times are not really comparable between systems, but relative
 differences in time should be similar. The following numbers are calculated
 with real time. In the test Moritz did, r.watershed took 63x as long as
 r.watershed.fast, i.e. r.watershed.fast needed only 1.6% of the time of
 r.watershed.
 New version: r.watershed took 127x as long as r.watershed.fast, i.e.
 r.watershed.fast needed only 0.8% of the time of r.watershed.

 Of the 2025000 cells in the map, 1991218 show the same direction, i.e.
 98%. Those which have different directions are overwhelmingly low slope
 cells.

 New version: 2004480 cells, i.e. 99% of all cells show the same flow
 direction.

 1833907 cells have the same accumulation value, i.e. 90%, but I guess this
 is to be expected.

 New version: 1921510 cells, i.e. 95% of all cells show the same accumulation
 value.

 The idea is that a faster r.watershed can also be used for massive grids,
 where GRASS users frequently gave up using r.watershed because it would have
 taken hours or even days. I resampled elevation in the North Carolina demo
 data set from 10m to 3m with r.resamp.rst using default values (after the
 GRASS book Section 5.3.3, paragraph Regularized spline with tension (RST)
 interpolation) to generate a fairly large map and ran the same test on the
 resampled map.

 cells in region : 22,500,000

 The results:

 Speed:
 r.watershed took 5459x as long as r.watershed.fast, i.e. r.watershed.fast
 needed only 0.02% of the time of r.watershed (here 10h2m55s vs. 1m7s, 10
 hours versus 1 minute...).

 Flow direction differences:
 22288539 cells, i.e. 99% of all cells show the same flow direction.

 Flow accumulation differences:
 20963653 cells, i.e. 93% of all cells show the same accumulation value.

 Memory usage of r.watershed and r.watershed.fast: maximum of about 940MB
 I don't understand why memory usage increases after SECTION 1a: Initiating
 Memory is completed.
 Assuming that there is no longer a time constraint but only a memory
 constraint (although SECTION 4: Watershed Determination can take some time
 on large maps with a large threshold value), the upper region sizes that
 r.watershed.fast can process in RAM would be *roughly* for
 1GB RAM:  14,000,000 cells
 2GB RAM:  38,000,000 cells
 4GB RAM:  86,000,000 cells
 8GB RAM: 181,000,000 cells
 after putting 400MB aside for the system and other open applications.
 Estimate based on Linux 64bit.

 If you want to repeat and analyse the tests with the North Carolina demo
 data set, the new r.watershed.fast is here
 http://markus.metz.giswork.googlepages.com/r.watershed_fast_version.tar.gz
 and the test script is below.

 Regards,

 Markus


 test script:
 g.region rast=elevation
 time r.watershed [EMAIL PROTECTED] accumulation=nc_accum_old
 drainage=nc_dir_old basin=nc_sheds_old stream=nc_streams_old thresh=500
 time r.watershed.fast [EMAIL PROTECTED]
 accumulation=nc_accum_fast drainage=nc_dir_fast basin=nc_sheds_fast
 stream=nc_streams_fast thresh=500
 r.mapcalc nc_dir_dif='if((nc_dir_old - nc_dir_fast != 0),1,0)'
 r.mapcalc 

Re: [GRASS-user] r.watershed speed-up

2008-08-01 Thread G. Allegri
With the fast drainage versions I could reproduce water outlet
basins with unsignificant differences. Apart the time saving :-)

2008/8/1 G. Allegri [EMAIL PROTECTED]:
 Great. Things get better and faster!
 I've tried on a not so big region. But it was enough:

 989861 cells (172286 null cells under MASK, where I have the see)

 It has taken about 55 seconds (I don't have time to set up a
 profiling), where I asked for drain and visual outputs creation.

 The stats are (calculated in OO):

 DIFF-VALUES  N.PIXELS PERCENTAGE
 -7  56  0,0068
 -6  30  0,0037
 -5  32  0,0039
 -4  49  0,0060
 -3  38  0,0046
 -2  180 0,0220
 -1  914 0,1118
 0   814650  99,6422
 1   910 0,1113
 2   229 0,0280
 3   79  0,0097
 4   42  0,0051
 5   21  0,0026
 6   70  0,0086
 7   152 0,0186
 8   22  0,0027
 9   21  0,0026
 10  5   0,0006
 11  15  0,0018
 12  30  0,0037
 13  18  0,0022
 14  3   0,0004
 15  5   0,0006
 16  4   0,0005

 Things has run better then with the previous version, as 99.6% of
 values are equal to r.watershed.

 Thanks, a lot, for r.watershed.fast! :-)
 Giovanni

 2008/8/1 Markus Metz [EMAIL PROTECTED]:
 Hello list,

 there is now a new version of r.watershed.fast where results are even more
 similar to r.watershed. They are still not 100% identical to r.watershed,
 but I can't get it more similar. But it comes with a further speed increase.
 I repeated the test of Moritz with the same commands on GRASS 6.4.svn,
 results are below.

 Moritz Lennert wrote:

 First test in North Carolina demo data set:

 g.region rast=elevation

 time r.watershed [EMAIL PROTECTED] accumulation=old_accum
 drainage=old_dir basin=old_sheds stream=old_streams thresh=500

 real19m2.744s
 user18m41.318s
 sys 0m1.884s

 time r.watershed.fast [EMAIL PROTECTED]
 accumulation=fast_accum drainage=fast_dir basin=fast_sheds
 stream=fast_streams thresh=500

 real0m18.034s
 user0m17.833s
 sys 0m0.196s

 Absolute times are not really comparable between systems, but relative
 differences in time should be similar. The following numbers are calculated
 with real time. In the test Moritz did, r.watershed took 63x as long as
 r.watershed.fast, i.e. r.watershed.fast needed only 1.6% of the time of
 r.watershed.
 New version: r.watershed took 127x as long as r.watershed.fast, i.e.
 r.watershed.fast needed only 0.8% of the time of r.watershed.

 Of the 2025000 cells in the map, 1991218 show the same direction, i.e.
 98%. Those which have different directions are overwhelmingly low slope
 cells.

 New version: 2004480 cells, i.e. 99% of all cells show the same flow
 direction.

 1833907 cells have the same accumulation value, i.e. 90%, but I guess this
 is to be expected.

 New version: 1921510 cells, i.e. 95% of all cells show the same accumulation
 value.

 The idea is that a faster r.watershed can also be used for massive grids,
 where GRASS users frequently gave up using r.watershed because it would have
 taken hours or even days. I resampled elevation in the North Carolina demo
 data set from 10m to 3m with r.resamp.rst using default values (after the
 GRASS book Section 5.3.3, paragraph Regularized spline with tension (RST)
 interpolation) to generate a fairly large map and ran the same test on the
 resampled map.

 cells in region : 22,500,000

 The results:

 Speed:
 r.watershed took 5459x as long as r.watershed.fast, i.e. r.watershed.fast
 needed only 0.02% of the time of r.watershed (here 10h2m55s vs. 1m7s, 10
 hours versus 1 minute...).

 Flow direction differences:
 22288539 cells, i.e. 99% of all cells show the same flow direction.

 Flow accumulation differences:
 20963653 cells, i.e. 93% of all cells show the same accumulation value.

 Memory usage of r.watershed and r.watershed.fast: maximum of about 940MB
 I don't understand why memory usage increases after SECTION 1a: Initiating
 Memory is completed.
 Assuming that there is no longer a time constraint but only a memory
 constraint (although SECTION 4: Watershed Determination can take some time
 on large maps with a large threshold value), the upper region sizes that
 r.watershed.fast can process in RAM would be *roughly* for
 1GB RAM:  14,000,000 cells
 2GB RAM:  38,000,000 cells
 4GB RAM:  86,000,000 cells
 8GB RAM: 181,000,000 cells
 after putting 400MB aside for the system and other open applications.
 Estimate based on Linux 64bit.

 If you want to repeat and analyse the tests with the North Carolina demo
 data set, the new r.watershed.fast is here
 http://markus.metz.giswork.googlepages.com/r.watershed_fast_version.tar.gz
 and the test script is below.

 Regards,

 Markus


 test script:
 g.region rast=elevation
 time r.watershed [EMAIL PROTECTED] accumulation=nc_accum_old
 drainage=nc_dir_old basin=nc_sheds_old stream=nc_streams_old thresh=500
 time r.watershed.fast 

Re: [GRASS-user] r.watershed speed-up

2008-07-29 Thread Thomas Adams

Markus,

Thank you very much! I have not tested it yet, but after I do I'll write 
you with my impressions. I think this could be a very valuable contribution.


Regards,
Tom

Markus Metz wrote:

Hello list,

I'm not sure if this list is the right place or rather the developer 
list.
For the A * Search algorithm in r.watershed (memory version without -m 
flag set, r.watershed.ram), I implemented a binary min-heap instead of 
a linear array to store points in ascending order relative to 
elevation. This improves the speed of SECTION 2: A * Search 
considerably, the larger the map, the larger the speed improvement. A 
region with about 1,800,000 cells is processed about 37 times faster 
than with the original routine. A region with about 11,000,000 cells 
is processed about 170 times faster than with the original routine 
(46.7sec vs. 2h12m9sec on my system). SECTION 4: Watershed 
Determination takes now the longest.


Several tests (different region sizes, different DEMs) showed that the 
results of the binary heap version are very similar but not 100% 
identical to the original r.watershed, because of a slightly different 
treatment of cells with equal elevation in a binary min-heap compared 
to a linear array. Differences are found in difficult areas (equal 
elevation of several neighbouring cells) where there are several 
possible solutions for how water could flow. Still, compared with 
other hydrology analysis methods, e.g. r.terraflow, the results can be 
regarded as identical.


The original code was only altered where necessary, only the sorting 
method is new, everything else is unchanged.
Memory usage increases a bit, because a binary heap needs its own 
index for each analysed point (in addition to the other indices 
already needed by r.watershed.ram).


My question is if there is interest in this faster version of 
r.watershed and if someone wants to test it.
The source code is available on 
http://markus.metz.giswork.googlepages.com/


Regards,

Markus

___
grass-user mailing list
grass-user@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-user



--
Thomas E Adams
National Weather Service
Ohio River Forecast Center
1901 South State Route 134
Wilmington, OH 45177

EMAIL:  [EMAIL PROTECTED]

VOICE:  937-383-0528
FAX:937-383-0033

___
grass-user mailing list
grass-user@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-user


[GRASS-user] r.watershed speed-up

2008-07-29 Thread ivan marchesini
First results of one test I did: excellent!!!

1392776 cells dem with steep slopes but also plain areas

r.watershed.fast: 7 seconds
r.watershed:  4 min and 14 seconds

no difference at all between the two generated network paths...

markus: many thanks..
 

Ivan


Il giorno lun, 28/07/2008 alle 23.14 +0200, Markus Metz ha scritto:
 Hello list,
 
 I'm not sure if this list is the right place or rather the developer list.
 For the A * Search algorithm in r.watershed (memory version without -m 
 flag set, r.watershed.ram), I implemented a binary min-heap instead of a 
 linear array to store points in ascending order relative to elevation. 
 This improves the speed of SECTION 2: A * Search considerably, the 
 larger the map, the larger the speed improvement. A region with about 
 1,800,000 cells is processed about 37 times faster than with the 
 original routine. A region with about 11,000,000 cells is processed 
 about 170 times faster than with the original routine (46.7sec vs. 
 2h12m9sec on my system). SECTION 4: Watershed Determination takes now 
 the longest.
 
 Several tests (different region sizes, different DEMs) showed that the 
 results of the binary heap version are very similar but not 100% 
 identical to the original r.watershed, because of a slightly different 
 treatment of cells with equal elevation in a binary min-heap compared to 
 a linear array. Differences are found in difficult areas (equal 
 elevation of several neighbouring cells) where there are several 
 possible solutions for how water could flow. Still, compared with other 
 hydrology analysis methods, e.g. r.terraflow, the results can be 
 regarded as identical.
 
 The original code was only altered where necessary, only the sorting 
 method is new, everything else is unchanged.
 Memory usage increases a bit, because a binary heap needs its own index 
 for each analysed point (in addition to the other indices already needed 
 by r.watershed.ram).
 
 My question is if there is interest in this faster version of 
 r.watershed and if someone wants to test it.
 The source code is available on http://markus.metz.giswork.googlepages.com/
 
 Regards,
 
 Markus
 
 ___
 grass-user mailing list
 grass-user@lists.osgeo.org
 http://lists.osgeo.org/mailman/listinfo/grass-user
 
-- 
Ti prego di cercare di non inviarmi files .dwg, .doc, .xls, .ppt.
Preferisco formati liberi.
Please try to avoid to send me .dwg, .doc, .xls, .ppt files.
I prefer free formats.
http://it.wikipedia.org/wiki/Formato_aperto
http://en.wikipedia.org/wiki/Open_format

Ivan Marchesini
Department of Civil and Environmental Engineering
University of Perugia
Via G. Duranti 93/a 
06125
Perugia (Italy)
Socio fondatore GFOSS Geospatial Free and Open Source Software 
http://www.gfoss.it
e-mail: [EMAIL PROTECTED]
[EMAIL PROTECTED]
tel: +39(0)755853760
fax (university): +39(0)755853756
fax (home): +39(0)5782830887
jabber: [EMAIL PROTECTED]

___
grass-user mailing list
grass-user@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-user


Re: [GRASS-user] r.watershed speed-up

2008-07-29 Thread G. Allegri
Ivan anticipated me for a bit.
GREAT Markus! I could crunch Sardinia in one shot. I haven't measured
the time but it was less then 2 minutes, while with r.watershed I got
stalled.
Analyzing the differences between the r.watershed and r.watershed.fast
for a narrower region, there was some subtle, sparse differences
(1,-1,7) meaning 45° differences:

The following are the differences values (from r.mapcalc) statistics
(number of pixels for values)

 -7 76
-6 34
-5 30
-4 49
-3 43
-2 194
-1 1165
0 813894
1 1218
2 253
3 83
4 45
5 25
6 93
7 250
8 22
9 21
10 5
11 15
12 30
13 18
14 3
15 5
16 4
* 36917172

2008/7/29 ivan marchesini [EMAIL PROTECTED]:
 First results of one test I did: excellent!!!

 1392776 cells dem with steep slopes but also plain areas

 r.watershed.fast: 7 seconds
 r.watershed:  4 min and 14 seconds

 no difference at all between the two generated network paths...

 markus: many thanks..


 Ivan


 Il giorno lun, 28/07/2008 alle 23.14 +0200, Markus Metz ha scritto:
 Hello list,

 I'm not sure if this list is the right place or rather the developer list.
 For the A * Search algorithm in r.watershed (memory version without -m
 flag set, r.watershed.ram), I implemented a binary min-heap instead of a
 linear array to store points in ascending order relative to elevation.
 This improves the speed of SECTION 2: A * Search considerably, the
 larger the map, the larger the speed improvement. A region with about
 1,800,000 cells is processed about 37 times faster than with the
 original routine. A region with about 11,000,000 cells is processed
 about 170 times faster than with the original routine (46.7sec vs.
 2h12m9sec on my system). SECTION 4: Watershed Determination takes now
 the longest.

 Several tests (different region sizes, different DEMs) showed that the
 results of the binary heap version are very similar but not 100%
 identical to the original r.watershed, because of a slightly different
 treatment of cells with equal elevation in a binary min-heap compared to
 a linear array. Differences are found in difficult areas (equal
 elevation of several neighbouring cells) where there are several
 possible solutions for how water could flow. Still, compared with other
 hydrology analysis methods, e.g. r.terraflow, the results can be
 regarded as identical.

 The original code was only altered where necessary, only the sorting
 method is new, everything else is unchanged.
 Memory usage increases a bit, because a binary heap needs its own index
 for each analysed point (in addition to the other indices already needed
 by r.watershed.ram).

 My question is if there is interest in this faster version of
 r.watershed and if someone wants to test it.
 The source code is available on http://markus.metz.giswork.googlepages.com/

 Regards,

 Markus

 ___
 grass-user mailing list
 grass-user@lists.osgeo.org
 http://lists.osgeo.org/mailman/listinfo/grass-user

 --
 Ti prego di cercare di non inviarmi files .dwg, .doc, .xls, .ppt.
 Preferisco formati liberi.
 Please try to avoid to send me .dwg, .doc, .xls, .ppt files.
 I prefer free formats.
 http://it.wikipedia.org/wiki/Formato_aperto
 http://en.wikipedia.org/wiki/Open_format

 Ivan Marchesini
 Department of Civil and Environmental Engineering
 University of Perugia
 Via G. Duranti 93/a
 06125
 Perugia (Italy)
 Socio fondatore GFOSS Geospatial Free and Open Source Software 
 http://www.gfoss.it
 e-mail: [EMAIL PROTECTED]
[EMAIL PROTECTED]
 tel: +39(0)755853760
 fax (university): +39(0)755853756
 fax (home): +39(0)5782830887
 jabber: [EMAIL PROTECTED]

 ___
 grass-user mailing list
 grass-user@lists.osgeo.org
 http://lists.osgeo.org/mailman/listinfo/grass-user

___
grass-user mailing list
grass-user@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-user


Re: [GRASS-user] r.watershed speed-up

2008-07-29 Thread Markus Metz
Yes, there are some differences. Please look at where these differences 
are located and if they are significantly changing the results. In flat 
areas, there will be several possibilities about how water could flow. 
This might also affect the flowpath further upstream. Both the original 
version and the fast version have a certain degree of randomness on how 
to find a path through flat terrain, but it's a different type 
randomness, so to speak. The fast version can not exactly replicate the 
results, but it is pretty close. If there are only slight shifts in the 
flowpath and these differences can be justified by comparing them to the 
input DEM, the results could be useful. I think of particular interest 
are larger differences in flow accumulation, because they have a big 
influence on other output like basins and flow direction.


Thanks for testing!


G. Allegri wrote:

A correction: as you can see from the stats, the differences are not
only  (1,-1,7) as I was saying...  Now I have to check how much these
differences affect the results. Anyway, above 99.5 % of values are
equal.

2008/7/29 G. Allegri [EMAIL PROTECTED]:
  

Ivan anticipated me for a bit.
GREAT Markus! I could crunch Sardinia in one shot. I haven't measured
the time but it was less then 2 minutes, while with r.watershed I got
stalled.
Analyzing the differences between the r.watershed and r.watershed.fast
for a narrower region, there was some subtle, sparse differences
(1,-1,7) meaning 45° differences:

The following are the differences values (from r.mapcalc) statistics
(number of pixels for values)

 -7 76
-6 34
-5 30
-4 49
-3 43
-2 194
-1 1165
0 813894
1 1218
2 253
3 83
4 45
5 25
6 93
7 250
8 22
9 21
10 5
11 15
12 30
13 18
14 3
15 5
16 4
* 36917172



___
grass-user mailing list
grass-user@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-user


Re: [GRASS-user] r.watershed speed-up

2008-07-29 Thread Michael Barton

We're definitely interested in giving this a try too.

Michael

C. Michael Barton, Professor of Anthropology
Director of Graduate Studies
School of Human Evolution  Social Change
Center for Social Dynamics  Complexity
Arizona State University

Phone: 480-965-6262
Fax: 480-965-7671
www: www.public.asu.edu/~cmbarton



On Jul 29, 2008, at 8:44 AM, [EMAIL PROTECTED] wrote:


Date: Tue, 29 Jul 2008 15:24:20 +0200
From: ivan marchesini [EMAIL PROTECTED]
Subject: [GRASS-user] r.watershed speed-up
To: Markus Metz [EMAIL PROTECTED]
Cc: grass-user@lists.osgeo.org
Message-ID: [EMAIL PROTECTED]
Content-Type: text/plain

First results of one test I did: excellent!!!

1392776 cells dem with steep slopes but also plain areas

r.watershed.fast: 7 seconds
r.watershed:  4 min and 14 seconds

no difference at all between the two generated network paths...

markus: many thanks..


Ivan



___
grass-user mailing list
grass-user@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-user


Re: [GRASS-user] r.watershed speed-up

2008-07-29 Thread Moritz Lennert

On 28/07/08 23:14, Markus Metz wrote:

Hello list,

I'm not sure if this list is the right place or rather the developer list.
For the A * Search algorithm in r.watershed (memory version without -m 
flag set, r.watershed.ram), I implemented a binary min-heap instead of a 
linear array to store points in ascending order relative to elevation. 
This improves the speed of SECTION 2: A * Search considerably, the 
larger the map, the larger the speed improvement. A region with about 
1,800,000 cells is processed about 37 times faster than with the 
original routine. A region with about 11,000,000 cells is processed 
about 170 times faster than with the original routine (46.7sec vs. 
2h12m9sec on my system). SECTION 4: Watershed Determination takes now 
the longest.


Several tests (different region sizes, different DEMs) showed that the 
results of the binary heap version are very similar but not 100% 
identical to the original r.watershed, because of a slightly different 
treatment of cells with equal elevation in a binary min-heap compared to 
a linear array. Differences are found in difficult areas (equal 
elevation of several neighbouring cells) where there are several 
possible solutions for how water could flow. Still, compared with other 
hydrology analysis methods, e.g. r.terraflow, the results can be 
regarded as identical.


The original code was only altered where necessary, only the sorting 
method is new, everything else is unchanged.
Memory usage increases a bit, because a binary heap needs its own index 
for each analysed point (in addition to the other indices already needed 
by r.watershed.ram).


My question is if there is interest in this faster version of 
r.watershed and if someone wants to test it.


I think there definitely is interest (see the number of mails about the 
long running time of r.watershed). The important issue will be the 
quality of results.


First test in North Carolina demo data set:

g.region rast=elevation

time r.watershed [EMAIL PROTECTED] accumulation=old_accum 
drainage=old_dir basin=old_sheds stream=old_streams thresh=500


real19m2.744s
user18m41.318s
sys 0m1.884s

time r.watershed.fast [EMAIL PROTECTED] 
accumulation=fast_accum drainage=fast_dir basin=fast_sheds 
stream=fast_streams thresh=500


real0m18.034s
user0m17.833s
sys 0m0.196s

Impressive !

Comparison of watersheds is a bit difficult because of different cat 
numbers and colors...but the two seem fairly similar.


Of the 2025000 cells in the map, 1991218 show the same direction, i.e. 
98%. Those which have different directions are overwhelmingly low slope 
cells.


1833907 cells have the same accumulation value, i.e. 90%, but I guess 
this is to be expected.


I'll look at the details of the differences later.

Moritz

Moritz
___
grass-user mailing list
grass-user@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-user


RE: [GRASS-user] r.watershed speed-up

2008-07-29 Thread Charles Ehlschlaeger
Dear Markus and other r.watershed enthusiasts,
I've thought of a way to make your faster version of r.watershed give
identical results as the older r.watershed. r.watershed.old uses a breadth
first search in section 2. r.watershed.fast is breadth first when the DEM
values are different. The trick would be to slightly adjust DEM values by
when they were added to the heap. Here is some pseudo-code
cellsAddedToHeap = 0
minCellIncrement = Double.MIN_VALUE # (java) This constant represents 
# the smallest increment a CELL 
# can be, if the CELLs are doubles.
# At the time a cell is added to heap, the DEM value placed 
# in the heap would be:
DemOfCellToHeap = DEM + (minCellIncrement * cellsAddedToHeap++))

Sincerely, chuck

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Markus Metz
Sent: Tuesday, July 29, 2008 10:44 AM
To: G. Allegri
Cc: grass-user@lists.osgeo.org
Subject: Re: [GRASS-user] r.watershed speed-up

Yes, there are some differences. Please look at where these differences 
are located and if they are significantly changing the results. In flat 
areas, there will be several possibilities about how water could flow. 
This might also affect the flowpath further upstream. Both the original 
version and the fast version have a certain degree of randomness on how 
to find a path through flat terrain, but it's a different type 
randomness, so to speak. The fast version can not exactly replicate the 
results, but it is pretty close. If there are only slight shifts in the 
flowpath and these differences can be justified by comparing them to the 
input DEM, the results could be useful. I think of particular interest 
are larger differences in flow accumulation, because they have a big 
influence on other output like basins and flow direction.

Thanks for testing!


G. Allegri wrote:
 A correction: as you can see from the stats, the differences are not
 only  (1,-1,7) as I was saying...  Now I have to check how much these
 differences affect the results. Anyway, above 99.5 % of values are
 equal.

 2008/7/29 G. Allegri [EMAIL PROTECTED]:
   
 Ivan anticipated me for a bit.
 GREAT Markus! I could crunch Sardinia in one shot. I haven't measured
 the time but it was less then 2 minutes, while with r.watershed I got
 stalled.
 Analyzing the differences between the r.watershed and r.watershed.fast
 for a narrower region, there was some subtle, sparse differences
 (1,-1,7) meaning 45° differences:

 The following are the differences values (from r.mapcalc) statistics
 (number of pixels for values)

  -7 76
 -6 34
 -5 30
 -4 49
 -3 43
 -2 194
 -1 1165
 0 813894
 1 1218
 2 253
 3 83
 4 45
 5 25
 6 93
 7 250
 8 22
 9 21
 10 5
 11 15
 12 30
 13 18
 14 3
 15 5
 16 4
 * 36917172

 
___
grass-user mailing list
grass-user@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-user


___
grass-user mailing list
grass-user@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-user


Re: [GRASS-user] r.watershed speed-up

2008-07-29 Thread Markus Metz

Dear Chuck,

r.watershed is a much valued tool in GRASS, for me the best watershed 
analsis tool not only in GRASS, therefore I thought about a a way to 
keep the results identical too. I am also aware that the closer the 
results produced by changes in the algorithm are to the results produced 
by original algorithm, the higher the chances that it will be accepted 
by the community.


With regard to your suggestion, I would not adjust DEM values, because 
in larger regions the minimum possible increment is already there in the 
data, i.e. there are no gaps in the data distribution that can be filled 
with adjusted values. One theoretical way out would be to read in DEMs 
as FCELL or DCELL, but then there is the floating point comparison 
problem. (I tried against better knowledge, it doesn't work). Regarding 
the breadth first search, where do you see breadth first when the DEM 
values are different? You lost me there. I don't see differences in how 
points are searched between the two versions, but maybe I have not fully 
understood the original algorithm. As far as I have understood the 
original algorithm, the list of astar_pts following astar_pts.nxt is 
kept in ascending order using elevation. If there are already points 
with equal elevation, the new point is inserted after all other points 
with the same elevation (line 91 in original do_astar.c), so that the 
point inserted first (of several points with equal elevation) will be 
removed first (line 19 in original do_astar.c). This is still the case 
in the new algorithm (insertion: line 136, removal: lines 31 and 192) 
most of the time. If the binary heap becomes fairly large and there are 
many points with equal elevation, there might be an exception. Please 
let me know if I got something wrong there!


Another possibility to produce the exact same results like in the 
original version would be to go recursively down the heap and pick the 
point added earliest from all points with elevation equal to the root 
point. This is easy to implement, but it would have slowed down the 
search algorithm somewhat and I wanted to get something lightening fast.


I have one main argument why it is not a disaster if the results are not 
100% identical:
The order in which neighbouring cells are added is in both versions, 
with respect to the focus cell:
low, up, left, right, upper right corner, lower left corner, lower right 
corner, upper left corner
This order is always kept, irrespective of the already established flow 
direction, thus it is a random order and there is not really a reason 
why the algorithm should stick to that order. I think a rare replacement 
of that random order (2% difference of flow direction in Moritz 
Lennert's test) with another random order (binary heap shuffling) is not 
a disaster and the result is still valid. I did build in a check to make 
results more similar, but there are still scenarios when this check 
doesn't catch.


So my main question to you, the original author of r.watershed, is, if a 
rare violation to the (in my opinion random) order in which neighbours 
are added to the list would cause the results to be no longer valid.
The other question is if I should provide now a version that really 
produces identical results, or if I first sort out the problem of how 
neighbours are (should be) added to and removed from the list. BTW, I 
tried to change the order of adding neighbours to the list too, taking 
into account the already established flow direction. It produces very 
straight lines in flat terrain, which is ok in hydrological terms, but 
some randomness looked better. Flat terrain in the DEM must not be flat 
in reality because of problems with DEM resolution and accuracy, 
randomness produces there more naturally looking results.


Sorry for the long reply!

Regards,

Markus


Charles Ehlschlaeger wrote:

Dear Markus and other r.watershed enthusiasts,
I've thought of a way to make your faster version of r.watershed give
identical results as the older r.watershed. r.watershed.old uses a breadth
first search in section 2. r.watershed.fast is breadth first when the DEM
values are different. The trick would be to slightly adjust DEM values by
when they were added to the heap. Here is some pseudo-code
cellsAddedToHeap = 0
minCellIncrement = Double.MIN_VALUE # (java) This constant represents 
# the smallest increment a CELL 
# can be, if the CELLs are doubles.
# At the time a cell is added to heap, the DEM value placed 
# in the heap would be:

DemOfCellToHeap = DEM + (minCellIncrement * cellsAddedToHeap++))

Sincerely, chuck
  

___
grass-user mailing list
grass-user@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-user


Re: [GRASS-user] r.watershed speed-up

2008-07-29 Thread Michael Barton
I just want to note something very tactfully mentioned below. That is,  
all such algorithms are estimates of real watershed behavior. AFAICT,  
r.watershed produces a very good estimate of these parameters and  
dynamics. Nevertheless, we should be careful about trying to exactly  
replicate its results--as if it IS the real world. We should not  
assume a priori that because they differ slightly that the new one is  
worse. We should at least consider the possibility that the new  
r.watershed.fast algorithm is equally good as the original in that it  
can replicate it to such a high degree. The small differences between  
the two could simply be where both are less able to model real  
landscapes (flat areas) and produce equally good estimates even though  
they differ slightly. It is also possible that the new algorithm makes  
r.watershed.fast an even closer match to reality than r.watershed and  
the differences are where the new modules is better than the old one.  
These are empirical questions.


Michael

On Jul 29, 2008, at 3:41 PM, [EMAIL PROTECTED] wrote:


Date: Tue, 29 Jul 2008 20:46:02 +0200
From: Markus Metz [EMAIL PROTECTED]
Subject: Re: [GRASS-user] r.watershed speed-up
To: Charles Ehlschlaeger [EMAIL PROTECTED]
Cc: grass-user@lists.osgeo.org
Message-ID: [EMAIL PROTECTED]
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Dear Chuck,

r.watershed is a much valued tool in GRASS, for me the best watershed
analsis tool not only in GRASS, therefore I thought about a a way to
keep the results identical too. I am also aware that the closer the
results produced by changes in the algorithm are to the results  
produced

by original algorithm, the higher the chances that it will be accepted
by the community.

With regard to your suggestion, I would not adjust DEM values, because
in larger regions the minimum possible increment is already there in  
the
data, i.e. there are no gaps in the data distribution that can be  
filled

with adjusted values. One theoretical way out would be to read in DEMs
as FCELL or DCELL, but then there is the floating point comparison
problem. (I tried against better knowledge, it doesn't work).  
Regarding

the breadth first search, where do you see breadth first when the DEM
values are different? You lost me there. I don't see differences in  
how
points are searched between the two versions, but maybe I have not  
fully

understood the original algorithm. As far as I have understood the
original algorithm, the list of astar_pts following astar_pts.nxt is
kept in ascending order using elevation. If there are already points
with equal elevation, the new point is inserted after all other points
with the same elevation (line 91 in original do_astar.c), so that the
point inserted first (of several points with equal elevation) will be
removed first (line 19 in original do_astar.c). This is still the case
in the new algorithm (insertion: line 136, removal: lines 31 and 192)
most of the time. If the binary heap becomes fairly large and there  
are

many points with equal elevation, there might be an exception. Please
let me know if I got something wrong there!

Another possibility to produce the exact same results like in the
original version would be to go recursively down the heap and pick the
point added earliest from all points with elevation equal to the root
point. This is easy to implement, but it would have slowed down the
search algorithm somewhat and I wanted to get something lightening  
fast.


I have one main argument why it is not a disaster if the results are  
not

100% identical:
The order in which neighbouring cells are added is in both versions,
with respect to the focus cell:
low, up, left, right, upper right corner, lower left corner, lower  
right

corner, upper left corner
This order is always kept, irrespective of the already established  
flow

direction, thus it is a random order and there is not really a reason
why the algorithm should stick to that order. I think a rare  
replacement

of that random order (2% difference of flow direction in Moritz
Lennert's test) with another random order (binary heap shuffling) is  
not
a disaster and the result is still valid. I did build in a check to  
make

results more similar, but there are still scenarios when this check
doesn't catch.

So my main question to you, the original author of r.watershed, is,  
if a

rare violation to the (in my opinion random) order in which neighbours
are added to the list would cause the results to be no longer valid.
The other question is if I should provide now a version that really
produces identical results, or if I first sort out the problem of how
neighbours are (should be) added to and removed from the list. BTW, I
tried to change the order of adding neighbours to the list too, taking
into account the already established flow direction. It produces very
straight lines in flat terrain, which is ok in hydrological terms, but
some randomness looked better. Flat

[GRASS-user] r.watershed speed-up

2008-07-28 Thread Markus Metz

Hello list,

I'm not sure if this list is the right place or rather the developer list.
For the A * Search algorithm in r.watershed (memory version without -m 
flag set, r.watershed.ram), I implemented a binary min-heap instead of a 
linear array to store points in ascending order relative to elevation. 
This improves the speed of SECTION 2: A * Search considerably, the 
larger the map, the larger the speed improvement. A region with about 
1,800,000 cells is processed about 37 times faster than with the 
original routine. A region with about 11,000,000 cells is processed 
about 170 times faster than with the original routine (46.7sec vs. 
2h12m9sec on my system). SECTION 4: Watershed Determination takes now 
the longest.


Several tests (different region sizes, different DEMs) showed that the 
results of the binary heap version are very similar but not 100% 
identical to the original r.watershed, because of a slightly different 
treatment of cells with equal elevation in a binary min-heap compared to 
a linear array. Differences are found in difficult areas (equal 
elevation of several neighbouring cells) where there are several 
possible solutions for how water could flow. Still, compared with other 
hydrology analysis methods, e.g. r.terraflow, the results can be 
regarded as identical.


The original code was only altered where necessary, only the sorting 
method is new, everything else is unchanged.
Memory usage increases a bit, because a binary heap needs its own index 
for each analysed point (in addition to the other indices already needed 
by r.watershed.ram).


My question is if there is interest in this faster version of 
r.watershed and if someone wants to test it.

The source code is available on http://markus.metz.giswork.googlepages.com/

Regards,

Markus

___
grass-user mailing list
grass-user@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-user