Re: [GRASS-user] r.watershed speed-up
Hello list, there is now a new version of r.watershed.fast where results are even more similar to r.watershed. They are still not 100% identical to r.watershed, but I can't get it more similar. But it comes with a further speed increase. I repeated the test of Moritz with the same commands on GRASS 6.4.svn, results are below. Moritz Lennert wrote: First test in North Carolina demo data set: g.region rast=elevation time r.watershed [EMAIL PROTECTED] accumulation=old_accum drainage=old_dir basin=old_sheds stream=old_streams thresh=500 real19m2.744s user18m41.318s sys 0m1.884s time r.watershed.fast [EMAIL PROTECTED] accumulation=fast_accum drainage=fast_dir basin=fast_sheds stream=fast_streams thresh=500 real0m18.034s user0m17.833s sys 0m0.196s Absolute times are not really comparable between systems, but relative differences in time should be similar. The following numbers are calculated with real time. In the test Moritz did, r.watershed took 63x as long as r.watershed.fast, i.e. r.watershed.fast needed only 1.6% of the time of r.watershed. New version: r.watershed took 127x as long as r.watershed.fast, i.e. r.watershed.fast needed only 0.8% of the time of r.watershed. Of the 2025000 cells in the map, 1991218 show the same direction, i.e. 98%. Those which have different directions are overwhelmingly low slope cells. New version: 2004480 cells, i.e. 99% of all cells show the same flow direction. 1833907 cells have the same accumulation value, i.e. 90%, but I guess this is to be expected. New version: 1921510 cells, i.e. 95% of all cells show the same accumulation value. The idea is that a faster r.watershed can also be used for massive grids, where GRASS users frequently gave up using r.watershed because it would have taken hours or even days. I resampled elevation in the North Carolina demo data set from 10m to 3m with r.resamp.rst using default values (after the GRASS book Section 5.3.3, paragraph Regularized spline with tension (RST) interpolation) to generate a fairly large map and ran the same test on the resampled map. cells in region : 22,500,000 The results: Speed: r.watershed took 5459x as long as r.watershed.fast, i.e. r.watershed.fast needed only 0.02% of the time of r.watershed (here 10h2m55s vs. 1m7s, 10 hours versus 1 minute...). Flow direction differences: 22288539 cells, i.e. 99% of all cells show the same flow direction. Flow accumulation differences: 20963653 cells, i.e. 93% of all cells show the same accumulation value. Memory usage of r.watershed and r.watershed.fast: maximum of about 940MB I don't understand why memory usage increases after SECTION 1a: Initiating Memory is completed. Assuming that there is no longer a time constraint but only a memory constraint (although SECTION 4: Watershed Determination can take some time on large maps with a large threshold value), the upper region sizes that r.watershed.fast can process in RAM would be *roughly* for 1GB RAM: 14,000,000 cells 2GB RAM: 38,000,000 cells 4GB RAM: 86,000,000 cells 8GB RAM: 181,000,000 cells after putting 400MB aside for the system and other open applications. Estimate based on Linux 64bit. If you want to repeat and analyse the tests with the North Carolina demo data set, the new r.watershed.fast is here http://markus.metz.giswork.googlepages.com/r.watershed_fast_version.tar.gz and the test script is below. Regards, Markus test script: g.region rast=elevation time r.watershed [EMAIL PROTECTED] accumulation=nc_accum_old drainage=nc_dir_old basin=nc_sheds_old stream=nc_streams_old thresh=500 time r.watershed.fast [EMAIL PROTECTED] accumulation=nc_accum_fast drainage=nc_dir_fast basin=nc_sheds_fast stream=nc_streams_fast thresh=500 r.mapcalc nc_dir_dif='if((nc_dir_old - nc_dir_fast != 0),1,0)' r.mapcalc nc_accum_dif='if((nc_accum_old - nc_accum_fast != 0),1,0)' r.stats -c [EMAIL PROTECTED] r.stats -c [EMAIL PROTECTED] r.resamp.rst [EMAIL PROTECTED] ew_res=3 ns_res=3 elev=elevation_rst overlap=3 zmult=1.0 tension=40. g.region rast=elevation_rst time r.watershed [EMAIL PROTECTED] accumulation=nc_rst_accum_old drainage=nc_rst_dir_old basin=nc_rst_sheds_old stream=nc_rst_streams_old thresh=500 time r.watershed.fast [EMAIL PROTECTED] accumulation=nc_rst_accum_fast drainage=nc_rst_dir_fast basin=nc_rst_sheds_fast stream=nc_rst_streams_fast thresh=500 r.mapcalc nc_rst_dir_dif='if((nc_rst_dir_old - nc_rst_dir_fast != 0),1,0)' r.mapcalc nc_rst_accum_dif='if((nc_rst_accum_old - nc_rst_accum_fast != 0),1,0)' r.stats -c [EMAIL PROTECTED] r.stats -c [EMAIL PROTECTED] ___ grass-user mailing list grass-user@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-user
Re: [GRASS-user] r.watershed speed-up
Great. Things get better and faster! I've tried on a not so big region. But it was enough: 989861 cells (172286 null cells under MASK, where I have the see) It has taken about 55 seconds (I don't have time to set up a profiling), where I asked for drain and visual outputs creation. The stats are (calculated in OO): DIFF-VALUES N.PIXELS PERCENTAGE -7 56 0,0068 -6 30 0,0037 -5 32 0,0039 -4 49 0,0060 -3 38 0,0046 -2 180 0,0220 -1 914 0,1118 0 814650 99,6422 1 910 0,1113 2 229 0,0280 3 79 0,0097 4 42 0,0051 5 21 0,0026 6 70 0,0086 7 152 0,0186 8 22 0,0027 9 21 0,0026 10 5 0,0006 11 15 0,0018 12 30 0,0037 13 18 0,0022 14 3 0,0004 15 5 0,0006 16 4 0,0005 Things has run better then with the previous version, as 99.6% of values are equal to r.watershed. Thanks, a lot, for r.watershed.fast! :-) Giovanni 2008/8/1 Markus Metz [EMAIL PROTECTED]: Hello list, there is now a new version of r.watershed.fast where results are even more similar to r.watershed. They are still not 100% identical to r.watershed, but I can't get it more similar. But it comes with a further speed increase. I repeated the test of Moritz with the same commands on GRASS 6.4.svn, results are below. Moritz Lennert wrote: First test in North Carolina demo data set: g.region rast=elevation time r.watershed [EMAIL PROTECTED] accumulation=old_accum drainage=old_dir basin=old_sheds stream=old_streams thresh=500 real19m2.744s user18m41.318s sys 0m1.884s time r.watershed.fast [EMAIL PROTECTED] accumulation=fast_accum drainage=fast_dir basin=fast_sheds stream=fast_streams thresh=500 real0m18.034s user0m17.833s sys 0m0.196s Absolute times are not really comparable between systems, but relative differences in time should be similar. The following numbers are calculated with real time. In the test Moritz did, r.watershed took 63x as long as r.watershed.fast, i.e. r.watershed.fast needed only 1.6% of the time of r.watershed. New version: r.watershed took 127x as long as r.watershed.fast, i.e. r.watershed.fast needed only 0.8% of the time of r.watershed. Of the 2025000 cells in the map, 1991218 show the same direction, i.e. 98%. Those which have different directions are overwhelmingly low slope cells. New version: 2004480 cells, i.e. 99% of all cells show the same flow direction. 1833907 cells have the same accumulation value, i.e. 90%, but I guess this is to be expected. New version: 1921510 cells, i.e. 95% of all cells show the same accumulation value. The idea is that a faster r.watershed can also be used for massive grids, where GRASS users frequently gave up using r.watershed because it would have taken hours or even days. I resampled elevation in the North Carolina demo data set from 10m to 3m with r.resamp.rst using default values (after the GRASS book Section 5.3.3, paragraph Regularized spline with tension (RST) interpolation) to generate a fairly large map and ran the same test on the resampled map. cells in region : 22,500,000 The results: Speed: r.watershed took 5459x as long as r.watershed.fast, i.e. r.watershed.fast needed only 0.02% of the time of r.watershed (here 10h2m55s vs. 1m7s, 10 hours versus 1 minute...). Flow direction differences: 22288539 cells, i.e. 99% of all cells show the same flow direction. Flow accumulation differences: 20963653 cells, i.e. 93% of all cells show the same accumulation value. Memory usage of r.watershed and r.watershed.fast: maximum of about 940MB I don't understand why memory usage increases after SECTION 1a: Initiating Memory is completed. Assuming that there is no longer a time constraint but only a memory constraint (although SECTION 4: Watershed Determination can take some time on large maps with a large threshold value), the upper region sizes that r.watershed.fast can process in RAM would be *roughly* for 1GB RAM: 14,000,000 cells 2GB RAM: 38,000,000 cells 4GB RAM: 86,000,000 cells 8GB RAM: 181,000,000 cells after putting 400MB aside for the system and other open applications. Estimate based on Linux 64bit. If you want to repeat and analyse the tests with the North Carolina demo data set, the new r.watershed.fast is here http://markus.metz.giswork.googlepages.com/r.watershed_fast_version.tar.gz and the test script is below. Regards, Markus test script: g.region rast=elevation time r.watershed [EMAIL PROTECTED] accumulation=nc_accum_old drainage=nc_dir_old basin=nc_sheds_old stream=nc_streams_old thresh=500 time r.watershed.fast [EMAIL PROTECTED] accumulation=nc_accum_fast drainage=nc_dir_fast basin=nc_sheds_fast stream=nc_streams_fast thresh=500 r.mapcalc nc_dir_dif='if((nc_dir_old - nc_dir_fast != 0),1,0)' r.mapcalc
Re: [GRASS-user] r.watershed speed-up
With the fast drainage versions I could reproduce water outlet basins with unsignificant differences. Apart the time saving :-) 2008/8/1 G. Allegri [EMAIL PROTECTED]: Great. Things get better and faster! I've tried on a not so big region. But it was enough: 989861 cells (172286 null cells under MASK, where I have the see) It has taken about 55 seconds (I don't have time to set up a profiling), where I asked for drain and visual outputs creation. The stats are (calculated in OO): DIFF-VALUES N.PIXELS PERCENTAGE -7 56 0,0068 -6 30 0,0037 -5 32 0,0039 -4 49 0,0060 -3 38 0,0046 -2 180 0,0220 -1 914 0,1118 0 814650 99,6422 1 910 0,1113 2 229 0,0280 3 79 0,0097 4 42 0,0051 5 21 0,0026 6 70 0,0086 7 152 0,0186 8 22 0,0027 9 21 0,0026 10 5 0,0006 11 15 0,0018 12 30 0,0037 13 18 0,0022 14 3 0,0004 15 5 0,0006 16 4 0,0005 Things has run better then with the previous version, as 99.6% of values are equal to r.watershed. Thanks, a lot, for r.watershed.fast! :-) Giovanni 2008/8/1 Markus Metz [EMAIL PROTECTED]: Hello list, there is now a new version of r.watershed.fast where results are even more similar to r.watershed. They are still not 100% identical to r.watershed, but I can't get it more similar. But it comes with a further speed increase. I repeated the test of Moritz with the same commands on GRASS 6.4.svn, results are below. Moritz Lennert wrote: First test in North Carolina demo data set: g.region rast=elevation time r.watershed [EMAIL PROTECTED] accumulation=old_accum drainage=old_dir basin=old_sheds stream=old_streams thresh=500 real19m2.744s user18m41.318s sys 0m1.884s time r.watershed.fast [EMAIL PROTECTED] accumulation=fast_accum drainage=fast_dir basin=fast_sheds stream=fast_streams thresh=500 real0m18.034s user0m17.833s sys 0m0.196s Absolute times are not really comparable between systems, but relative differences in time should be similar. The following numbers are calculated with real time. In the test Moritz did, r.watershed took 63x as long as r.watershed.fast, i.e. r.watershed.fast needed only 1.6% of the time of r.watershed. New version: r.watershed took 127x as long as r.watershed.fast, i.e. r.watershed.fast needed only 0.8% of the time of r.watershed. Of the 2025000 cells in the map, 1991218 show the same direction, i.e. 98%. Those which have different directions are overwhelmingly low slope cells. New version: 2004480 cells, i.e. 99% of all cells show the same flow direction. 1833907 cells have the same accumulation value, i.e. 90%, but I guess this is to be expected. New version: 1921510 cells, i.e. 95% of all cells show the same accumulation value. The idea is that a faster r.watershed can also be used for massive grids, where GRASS users frequently gave up using r.watershed because it would have taken hours or even days. I resampled elevation in the North Carolina demo data set from 10m to 3m with r.resamp.rst using default values (after the GRASS book Section 5.3.3, paragraph Regularized spline with tension (RST) interpolation) to generate a fairly large map and ran the same test on the resampled map. cells in region : 22,500,000 The results: Speed: r.watershed took 5459x as long as r.watershed.fast, i.e. r.watershed.fast needed only 0.02% of the time of r.watershed (here 10h2m55s vs. 1m7s, 10 hours versus 1 minute...). Flow direction differences: 22288539 cells, i.e. 99% of all cells show the same flow direction. Flow accumulation differences: 20963653 cells, i.e. 93% of all cells show the same accumulation value. Memory usage of r.watershed and r.watershed.fast: maximum of about 940MB I don't understand why memory usage increases after SECTION 1a: Initiating Memory is completed. Assuming that there is no longer a time constraint but only a memory constraint (although SECTION 4: Watershed Determination can take some time on large maps with a large threshold value), the upper region sizes that r.watershed.fast can process in RAM would be *roughly* for 1GB RAM: 14,000,000 cells 2GB RAM: 38,000,000 cells 4GB RAM: 86,000,000 cells 8GB RAM: 181,000,000 cells after putting 400MB aside for the system and other open applications. Estimate based on Linux 64bit. If you want to repeat and analyse the tests with the North Carolina demo data set, the new r.watershed.fast is here http://markus.metz.giswork.googlepages.com/r.watershed_fast_version.tar.gz and the test script is below. Regards, Markus test script: g.region rast=elevation time r.watershed [EMAIL PROTECTED] accumulation=nc_accum_old drainage=nc_dir_old basin=nc_sheds_old stream=nc_streams_old thresh=500 time r.watershed.fast
Re: [GRASS-user] r.watershed speed-up
Markus, Thank you very much! I have not tested it yet, but after I do I'll write you with my impressions. I think this could be a very valuable contribution. Regards, Tom Markus Metz wrote: Hello list, I'm not sure if this list is the right place or rather the developer list. For the A * Search algorithm in r.watershed (memory version without -m flag set, r.watershed.ram), I implemented a binary min-heap instead of a linear array to store points in ascending order relative to elevation. This improves the speed of SECTION 2: A * Search considerably, the larger the map, the larger the speed improvement. A region with about 1,800,000 cells is processed about 37 times faster than with the original routine. A region with about 11,000,000 cells is processed about 170 times faster than with the original routine (46.7sec vs. 2h12m9sec on my system). SECTION 4: Watershed Determination takes now the longest. Several tests (different region sizes, different DEMs) showed that the results of the binary heap version are very similar but not 100% identical to the original r.watershed, because of a slightly different treatment of cells with equal elevation in a binary min-heap compared to a linear array. Differences are found in difficult areas (equal elevation of several neighbouring cells) where there are several possible solutions for how water could flow. Still, compared with other hydrology analysis methods, e.g. r.terraflow, the results can be regarded as identical. The original code was only altered where necessary, only the sorting method is new, everything else is unchanged. Memory usage increases a bit, because a binary heap needs its own index for each analysed point (in addition to the other indices already needed by r.watershed.ram). My question is if there is interest in this faster version of r.watershed and if someone wants to test it. The source code is available on http://markus.metz.giswork.googlepages.com/ Regards, Markus ___ grass-user mailing list grass-user@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-user -- Thomas E Adams National Weather Service Ohio River Forecast Center 1901 South State Route 134 Wilmington, OH 45177 EMAIL: [EMAIL PROTECTED] VOICE: 937-383-0528 FAX:937-383-0033 ___ grass-user mailing list grass-user@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-user
[GRASS-user] r.watershed speed-up
First results of one test I did: excellent!!! 1392776 cells dem with steep slopes but also plain areas r.watershed.fast: 7 seconds r.watershed: 4 min and 14 seconds no difference at all between the two generated network paths... markus: many thanks.. Ivan Il giorno lun, 28/07/2008 alle 23.14 +0200, Markus Metz ha scritto: Hello list, I'm not sure if this list is the right place or rather the developer list. For the A * Search algorithm in r.watershed (memory version without -m flag set, r.watershed.ram), I implemented a binary min-heap instead of a linear array to store points in ascending order relative to elevation. This improves the speed of SECTION 2: A * Search considerably, the larger the map, the larger the speed improvement. A region with about 1,800,000 cells is processed about 37 times faster than with the original routine. A region with about 11,000,000 cells is processed about 170 times faster than with the original routine (46.7sec vs. 2h12m9sec on my system). SECTION 4: Watershed Determination takes now the longest. Several tests (different region sizes, different DEMs) showed that the results of the binary heap version are very similar but not 100% identical to the original r.watershed, because of a slightly different treatment of cells with equal elevation in a binary min-heap compared to a linear array. Differences are found in difficult areas (equal elevation of several neighbouring cells) where there are several possible solutions for how water could flow. Still, compared with other hydrology analysis methods, e.g. r.terraflow, the results can be regarded as identical. The original code was only altered where necessary, only the sorting method is new, everything else is unchanged. Memory usage increases a bit, because a binary heap needs its own index for each analysed point (in addition to the other indices already needed by r.watershed.ram). My question is if there is interest in this faster version of r.watershed and if someone wants to test it. The source code is available on http://markus.metz.giswork.googlepages.com/ Regards, Markus ___ grass-user mailing list grass-user@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-user -- Ti prego di cercare di non inviarmi files .dwg, .doc, .xls, .ppt. Preferisco formati liberi. Please try to avoid to send me .dwg, .doc, .xls, .ppt files. I prefer free formats. http://it.wikipedia.org/wiki/Formato_aperto http://en.wikipedia.org/wiki/Open_format Ivan Marchesini Department of Civil and Environmental Engineering University of Perugia Via G. Duranti 93/a 06125 Perugia (Italy) Socio fondatore GFOSS Geospatial Free and Open Source Software http://www.gfoss.it e-mail: [EMAIL PROTECTED] [EMAIL PROTECTED] tel: +39(0)755853760 fax (university): +39(0)755853756 fax (home): +39(0)5782830887 jabber: [EMAIL PROTECTED] ___ grass-user mailing list grass-user@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-user
Re: [GRASS-user] r.watershed speed-up
Ivan anticipated me for a bit. GREAT Markus! I could crunch Sardinia in one shot. I haven't measured the time but it was less then 2 minutes, while with r.watershed I got stalled. Analyzing the differences between the r.watershed and r.watershed.fast for a narrower region, there was some subtle, sparse differences (1,-1,7) meaning 45° differences: The following are the differences values (from r.mapcalc) statistics (number of pixels for values) -7 76 -6 34 -5 30 -4 49 -3 43 -2 194 -1 1165 0 813894 1 1218 2 253 3 83 4 45 5 25 6 93 7 250 8 22 9 21 10 5 11 15 12 30 13 18 14 3 15 5 16 4 * 36917172 2008/7/29 ivan marchesini [EMAIL PROTECTED]: First results of one test I did: excellent!!! 1392776 cells dem with steep slopes but also plain areas r.watershed.fast: 7 seconds r.watershed: 4 min and 14 seconds no difference at all between the two generated network paths... markus: many thanks.. Ivan Il giorno lun, 28/07/2008 alle 23.14 +0200, Markus Metz ha scritto: Hello list, I'm not sure if this list is the right place or rather the developer list. For the A * Search algorithm in r.watershed (memory version without -m flag set, r.watershed.ram), I implemented a binary min-heap instead of a linear array to store points in ascending order relative to elevation. This improves the speed of SECTION 2: A * Search considerably, the larger the map, the larger the speed improvement. A region with about 1,800,000 cells is processed about 37 times faster than with the original routine. A region with about 11,000,000 cells is processed about 170 times faster than with the original routine (46.7sec vs. 2h12m9sec on my system). SECTION 4: Watershed Determination takes now the longest. Several tests (different region sizes, different DEMs) showed that the results of the binary heap version are very similar but not 100% identical to the original r.watershed, because of a slightly different treatment of cells with equal elevation in a binary min-heap compared to a linear array. Differences are found in difficult areas (equal elevation of several neighbouring cells) where there are several possible solutions for how water could flow. Still, compared with other hydrology analysis methods, e.g. r.terraflow, the results can be regarded as identical. The original code was only altered where necessary, only the sorting method is new, everything else is unchanged. Memory usage increases a bit, because a binary heap needs its own index for each analysed point (in addition to the other indices already needed by r.watershed.ram). My question is if there is interest in this faster version of r.watershed and if someone wants to test it. The source code is available on http://markus.metz.giswork.googlepages.com/ Regards, Markus ___ grass-user mailing list grass-user@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-user -- Ti prego di cercare di non inviarmi files .dwg, .doc, .xls, .ppt. Preferisco formati liberi. Please try to avoid to send me .dwg, .doc, .xls, .ppt files. I prefer free formats. http://it.wikipedia.org/wiki/Formato_aperto http://en.wikipedia.org/wiki/Open_format Ivan Marchesini Department of Civil and Environmental Engineering University of Perugia Via G. Duranti 93/a 06125 Perugia (Italy) Socio fondatore GFOSS Geospatial Free and Open Source Software http://www.gfoss.it e-mail: [EMAIL PROTECTED] [EMAIL PROTECTED] tel: +39(0)755853760 fax (university): +39(0)755853756 fax (home): +39(0)5782830887 jabber: [EMAIL PROTECTED] ___ grass-user mailing list grass-user@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-user ___ grass-user mailing list grass-user@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-user
Re: [GRASS-user] r.watershed speed-up
Yes, there are some differences. Please look at where these differences are located and if they are significantly changing the results. In flat areas, there will be several possibilities about how water could flow. This might also affect the flowpath further upstream. Both the original version and the fast version have a certain degree of randomness on how to find a path through flat terrain, but it's a different type randomness, so to speak. The fast version can not exactly replicate the results, but it is pretty close. If there are only slight shifts in the flowpath and these differences can be justified by comparing them to the input DEM, the results could be useful. I think of particular interest are larger differences in flow accumulation, because they have a big influence on other output like basins and flow direction. Thanks for testing! G. Allegri wrote: A correction: as you can see from the stats, the differences are not only (1,-1,7) as I was saying... Now I have to check how much these differences affect the results. Anyway, above 99.5 % of values are equal. 2008/7/29 G. Allegri [EMAIL PROTECTED]: Ivan anticipated me for a bit. GREAT Markus! I could crunch Sardinia in one shot. I haven't measured the time but it was less then 2 minutes, while with r.watershed I got stalled. Analyzing the differences between the r.watershed and r.watershed.fast for a narrower region, there was some subtle, sparse differences (1,-1,7) meaning 45° differences: The following are the differences values (from r.mapcalc) statistics (number of pixels for values) -7 76 -6 34 -5 30 -4 49 -3 43 -2 194 -1 1165 0 813894 1 1218 2 253 3 83 4 45 5 25 6 93 7 250 8 22 9 21 10 5 11 15 12 30 13 18 14 3 15 5 16 4 * 36917172 ___ grass-user mailing list grass-user@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-user
Re: [GRASS-user] r.watershed speed-up
We're definitely interested in giving this a try too. Michael C. Michael Barton, Professor of Anthropology Director of Graduate Studies School of Human Evolution Social Change Center for Social Dynamics Complexity Arizona State University Phone: 480-965-6262 Fax: 480-965-7671 www: www.public.asu.edu/~cmbarton On Jul 29, 2008, at 8:44 AM, [EMAIL PROTECTED] wrote: Date: Tue, 29 Jul 2008 15:24:20 +0200 From: ivan marchesini [EMAIL PROTECTED] Subject: [GRASS-user] r.watershed speed-up To: Markus Metz [EMAIL PROTECTED] Cc: grass-user@lists.osgeo.org Message-ID: [EMAIL PROTECTED] Content-Type: text/plain First results of one test I did: excellent!!! 1392776 cells dem with steep slopes but also plain areas r.watershed.fast: 7 seconds r.watershed: 4 min and 14 seconds no difference at all between the two generated network paths... markus: many thanks.. Ivan ___ grass-user mailing list grass-user@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-user
Re: [GRASS-user] r.watershed speed-up
On 28/07/08 23:14, Markus Metz wrote: Hello list, I'm not sure if this list is the right place or rather the developer list. For the A * Search algorithm in r.watershed (memory version without -m flag set, r.watershed.ram), I implemented a binary min-heap instead of a linear array to store points in ascending order relative to elevation. This improves the speed of SECTION 2: A * Search considerably, the larger the map, the larger the speed improvement. A region with about 1,800,000 cells is processed about 37 times faster than with the original routine. A region with about 11,000,000 cells is processed about 170 times faster than with the original routine (46.7sec vs. 2h12m9sec on my system). SECTION 4: Watershed Determination takes now the longest. Several tests (different region sizes, different DEMs) showed that the results of the binary heap version are very similar but not 100% identical to the original r.watershed, because of a slightly different treatment of cells with equal elevation in a binary min-heap compared to a linear array. Differences are found in difficult areas (equal elevation of several neighbouring cells) where there are several possible solutions for how water could flow. Still, compared with other hydrology analysis methods, e.g. r.terraflow, the results can be regarded as identical. The original code was only altered where necessary, only the sorting method is new, everything else is unchanged. Memory usage increases a bit, because a binary heap needs its own index for each analysed point (in addition to the other indices already needed by r.watershed.ram). My question is if there is interest in this faster version of r.watershed and if someone wants to test it. I think there definitely is interest (see the number of mails about the long running time of r.watershed). The important issue will be the quality of results. First test in North Carolina demo data set: g.region rast=elevation time r.watershed [EMAIL PROTECTED] accumulation=old_accum drainage=old_dir basin=old_sheds stream=old_streams thresh=500 real19m2.744s user18m41.318s sys 0m1.884s time r.watershed.fast [EMAIL PROTECTED] accumulation=fast_accum drainage=fast_dir basin=fast_sheds stream=fast_streams thresh=500 real0m18.034s user0m17.833s sys 0m0.196s Impressive ! Comparison of watersheds is a bit difficult because of different cat numbers and colors...but the two seem fairly similar. Of the 2025000 cells in the map, 1991218 show the same direction, i.e. 98%. Those which have different directions are overwhelmingly low slope cells. 1833907 cells have the same accumulation value, i.e. 90%, but I guess this is to be expected. I'll look at the details of the differences later. Moritz Moritz ___ grass-user mailing list grass-user@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-user
RE: [GRASS-user] r.watershed speed-up
Dear Markus and other r.watershed enthusiasts, I've thought of a way to make your faster version of r.watershed give identical results as the older r.watershed. r.watershed.old uses a breadth first search in section 2. r.watershed.fast is breadth first when the DEM values are different. The trick would be to slightly adjust DEM values by when they were added to the heap. Here is some pseudo-code cellsAddedToHeap = 0 minCellIncrement = Double.MIN_VALUE # (java) This constant represents # the smallest increment a CELL # can be, if the CELLs are doubles. # At the time a cell is added to heap, the DEM value placed # in the heap would be: DemOfCellToHeap = DEM + (minCellIncrement * cellsAddedToHeap++)) Sincerely, chuck -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Markus Metz Sent: Tuesday, July 29, 2008 10:44 AM To: G. Allegri Cc: grass-user@lists.osgeo.org Subject: Re: [GRASS-user] r.watershed speed-up Yes, there are some differences. Please look at where these differences are located and if they are significantly changing the results. In flat areas, there will be several possibilities about how water could flow. This might also affect the flowpath further upstream. Both the original version and the fast version have a certain degree of randomness on how to find a path through flat terrain, but it's a different type randomness, so to speak. The fast version can not exactly replicate the results, but it is pretty close. If there are only slight shifts in the flowpath and these differences can be justified by comparing them to the input DEM, the results could be useful. I think of particular interest are larger differences in flow accumulation, because they have a big influence on other output like basins and flow direction. Thanks for testing! G. Allegri wrote: A correction: as you can see from the stats, the differences are not only (1,-1,7) as I was saying... Now I have to check how much these differences affect the results. Anyway, above 99.5 % of values are equal. 2008/7/29 G. Allegri [EMAIL PROTECTED]: Ivan anticipated me for a bit. GREAT Markus! I could crunch Sardinia in one shot. I haven't measured the time but it was less then 2 minutes, while with r.watershed I got stalled. Analyzing the differences between the r.watershed and r.watershed.fast for a narrower region, there was some subtle, sparse differences (1,-1,7) meaning 45° differences: The following are the differences values (from r.mapcalc) statistics (number of pixels for values) -7 76 -6 34 -5 30 -4 49 -3 43 -2 194 -1 1165 0 813894 1 1218 2 253 3 83 4 45 5 25 6 93 7 250 8 22 9 21 10 5 11 15 12 30 13 18 14 3 15 5 16 4 * 36917172 ___ grass-user mailing list grass-user@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-user ___ grass-user mailing list grass-user@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-user
Re: [GRASS-user] r.watershed speed-up
Dear Chuck, r.watershed is a much valued tool in GRASS, for me the best watershed analsis tool not only in GRASS, therefore I thought about a a way to keep the results identical too. I am also aware that the closer the results produced by changes in the algorithm are to the results produced by original algorithm, the higher the chances that it will be accepted by the community. With regard to your suggestion, I would not adjust DEM values, because in larger regions the minimum possible increment is already there in the data, i.e. there are no gaps in the data distribution that can be filled with adjusted values. One theoretical way out would be to read in DEMs as FCELL or DCELL, but then there is the floating point comparison problem. (I tried against better knowledge, it doesn't work). Regarding the breadth first search, where do you see breadth first when the DEM values are different? You lost me there. I don't see differences in how points are searched between the two versions, but maybe I have not fully understood the original algorithm. As far as I have understood the original algorithm, the list of astar_pts following astar_pts.nxt is kept in ascending order using elevation. If there are already points with equal elevation, the new point is inserted after all other points with the same elevation (line 91 in original do_astar.c), so that the point inserted first (of several points with equal elevation) will be removed first (line 19 in original do_astar.c). This is still the case in the new algorithm (insertion: line 136, removal: lines 31 and 192) most of the time. If the binary heap becomes fairly large and there are many points with equal elevation, there might be an exception. Please let me know if I got something wrong there! Another possibility to produce the exact same results like in the original version would be to go recursively down the heap and pick the point added earliest from all points with elevation equal to the root point. This is easy to implement, but it would have slowed down the search algorithm somewhat and I wanted to get something lightening fast. I have one main argument why it is not a disaster if the results are not 100% identical: The order in which neighbouring cells are added is in both versions, with respect to the focus cell: low, up, left, right, upper right corner, lower left corner, lower right corner, upper left corner This order is always kept, irrespective of the already established flow direction, thus it is a random order and there is not really a reason why the algorithm should stick to that order. I think a rare replacement of that random order (2% difference of flow direction in Moritz Lennert's test) with another random order (binary heap shuffling) is not a disaster and the result is still valid. I did build in a check to make results more similar, but there are still scenarios when this check doesn't catch. So my main question to you, the original author of r.watershed, is, if a rare violation to the (in my opinion random) order in which neighbours are added to the list would cause the results to be no longer valid. The other question is if I should provide now a version that really produces identical results, or if I first sort out the problem of how neighbours are (should be) added to and removed from the list. BTW, I tried to change the order of adding neighbours to the list too, taking into account the already established flow direction. It produces very straight lines in flat terrain, which is ok in hydrological terms, but some randomness looked better. Flat terrain in the DEM must not be flat in reality because of problems with DEM resolution and accuracy, randomness produces there more naturally looking results. Sorry for the long reply! Regards, Markus Charles Ehlschlaeger wrote: Dear Markus and other r.watershed enthusiasts, I've thought of a way to make your faster version of r.watershed give identical results as the older r.watershed. r.watershed.old uses a breadth first search in section 2. r.watershed.fast is breadth first when the DEM values are different. The trick would be to slightly adjust DEM values by when they were added to the heap. Here is some pseudo-code cellsAddedToHeap = 0 minCellIncrement = Double.MIN_VALUE # (java) This constant represents # the smallest increment a CELL # can be, if the CELLs are doubles. # At the time a cell is added to heap, the DEM value placed # in the heap would be: DemOfCellToHeap = DEM + (minCellIncrement * cellsAddedToHeap++)) Sincerely, chuck ___ grass-user mailing list grass-user@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-user
Re: [GRASS-user] r.watershed speed-up
I just want to note something very tactfully mentioned below. That is, all such algorithms are estimates of real watershed behavior. AFAICT, r.watershed produces a very good estimate of these parameters and dynamics. Nevertheless, we should be careful about trying to exactly replicate its results--as if it IS the real world. We should not assume a priori that because they differ slightly that the new one is worse. We should at least consider the possibility that the new r.watershed.fast algorithm is equally good as the original in that it can replicate it to such a high degree. The small differences between the two could simply be where both are less able to model real landscapes (flat areas) and produce equally good estimates even though they differ slightly. It is also possible that the new algorithm makes r.watershed.fast an even closer match to reality than r.watershed and the differences are where the new modules is better than the old one. These are empirical questions. Michael On Jul 29, 2008, at 3:41 PM, [EMAIL PROTECTED] wrote: Date: Tue, 29 Jul 2008 20:46:02 +0200 From: Markus Metz [EMAIL PROTECTED] Subject: Re: [GRASS-user] r.watershed speed-up To: Charles Ehlschlaeger [EMAIL PROTECTED] Cc: grass-user@lists.osgeo.org Message-ID: [EMAIL PROTECTED] Content-Type: text/plain; charset=ISO-8859-1; format=flowed Dear Chuck, r.watershed is a much valued tool in GRASS, for me the best watershed analsis tool not only in GRASS, therefore I thought about a a way to keep the results identical too. I am also aware that the closer the results produced by changes in the algorithm are to the results produced by original algorithm, the higher the chances that it will be accepted by the community. With regard to your suggestion, I would not adjust DEM values, because in larger regions the minimum possible increment is already there in the data, i.e. there are no gaps in the data distribution that can be filled with adjusted values. One theoretical way out would be to read in DEMs as FCELL or DCELL, but then there is the floating point comparison problem. (I tried against better knowledge, it doesn't work). Regarding the breadth first search, where do you see breadth first when the DEM values are different? You lost me there. I don't see differences in how points are searched between the two versions, but maybe I have not fully understood the original algorithm. As far as I have understood the original algorithm, the list of astar_pts following astar_pts.nxt is kept in ascending order using elevation. If there are already points with equal elevation, the new point is inserted after all other points with the same elevation (line 91 in original do_astar.c), so that the point inserted first (of several points with equal elevation) will be removed first (line 19 in original do_astar.c). This is still the case in the new algorithm (insertion: line 136, removal: lines 31 and 192) most of the time. If the binary heap becomes fairly large and there are many points with equal elevation, there might be an exception. Please let me know if I got something wrong there! Another possibility to produce the exact same results like in the original version would be to go recursively down the heap and pick the point added earliest from all points with elevation equal to the root point. This is easy to implement, but it would have slowed down the search algorithm somewhat and I wanted to get something lightening fast. I have one main argument why it is not a disaster if the results are not 100% identical: The order in which neighbouring cells are added is in both versions, with respect to the focus cell: low, up, left, right, upper right corner, lower left corner, lower right corner, upper left corner This order is always kept, irrespective of the already established flow direction, thus it is a random order and there is not really a reason why the algorithm should stick to that order. I think a rare replacement of that random order (2% difference of flow direction in Moritz Lennert's test) with another random order (binary heap shuffling) is not a disaster and the result is still valid. I did build in a check to make results more similar, but there are still scenarios when this check doesn't catch. So my main question to you, the original author of r.watershed, is, if a rare violation to the (in my opinion random) order in which neighbours are added to the list would cause the results to be no longer valid. The other question is if I should provide now a version that really produces identical results, or if I first sort out the problem of how neighbours are (should be) added to and removed from the list. BTW, I tried to change the order of adding neighbours to the list too, taking into account the already established flow direction. It produces very straight lines in flat terrain, which is ok in hydrological terms, but some randomness looked better. Flat
[GRASS-user] r.watershed speed-up
Hello list, I'm not sure if this list is the right place or rather the developer list. For the A * Search algorithm in r.watershed (memory version without -m flag set, r.watershed.ram), I implemented a binary min-heap instead of a linear array to store points in ascending order relative to elevation. This improves the speed of SECTION 2: A * Search considerably, the larger the map, the larger the speed improvement. A region with about 1,800,000 cells is processed about 37 times faster than with the original routine. A region with about 11,000,000 cells is processed about 170 times faster than with the original routine (46.7sec vs. 2h12m9sec on my system). SECTION 4: Watershed Determination takes now the longest. Several tests (different region sizes, different DEMs) showed that the results of the binary heap version are very similar but not 100% identical to the original r.watershed, because of a slightly different treatment of cells with equal elevation in a binary min-heap compared to a linear array. Differences are found in difficult areas (equal elevation of several neighbouring cells) where there are several possible solutions for how water could flow. Still, compared with other hydrology analysis methods, e.g. r.terraflow, the results can be regarded as identical. The original code was only altered where necessary, only the sorting method is new, everything else is unchanged. Memory usage increases a bit, because a binary heap needs its own index for each analysed point (in addition to the other indices already needed by r.watershed.ram). My question is if there is interest in this faster version of r.watershed and if someone wants to test it. The source code is available on http://markus.metz.giswork.googlepages.com/ Regards, Markus ___ grass-user mailing list grass-user@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-user