Hello everyone,

Here is the final report for my GSoC 2021 project, Parallelization of Raster
Modules for GRASS GIS.

Abstract
The goal of this project is to introduce parallelization to existing raster
modules in GRASS GIS using OpenMP. This will allow users to take advantage of
more cores in their hardware to reduce computation time, especially for
large raster files with a high computational cost. The key challenge of this
project is to separate the parallelizable components from the sequential part
of the modules without introducing too much overhead in terms of memory, disk
or computation resources.

Milestones

In total, I have introduced OpenMP support to 8 raster modules in GRASS GIS.
The pull requests for each module are as follows:

  *   r.univar - https://github.com/OSGeo/grass/pull/1634
  *   r.neighbors - https://github.com/OSGeo/grass/pull/1724
  *   r.mfilter - https://github.com/OSGeo/grass/pull/1708
  *   r.resamp.filter - https://github.com/OSGeo/grass/pull/1759
  *   r.resamp.interp - https://github.com/OSGeo/grass/pull/1771
  *   r.slope.aspect - https://github.com/OSGeo/grass/pull/1767
  *   r.series - https://github.com/OSGeo/grass/pull/1776
  *   r.patch - https://github.com/OSGeo/grass/pull/1782

At the outset, I greatly underestimated the complexity of the work. Up to 20
modules were initially proposed, but after the second week it became clear
that we had to cut down on the number of target modules and focus more on
designing the algorithms. The modules we targeted behave differently from
some modules that had received OpenMP support in the past, such as r.sun. In
particular, the modules need to keep their low memory footprint even after
the parallelization, unlike r.sun, which loads the entire raster map into
memory.


During the first half of GSoC, through discussion with the mentors, we came
up with three different approaches to introducing parallel support to
r.neighbors. After benchmarking their performance and taking their
memory/disk usage into account, we decided to settle on the last approach,
which requires us to add an extra parameter, memory, to allow users to adjust
their memory consumption. With this approach, the modules have to process the
raster map in chunks. Once we had settled on the design, we started applying
the same approach to other similar modules with low memory footprints.
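
To illustrate the chunk-based idea, here is a minimal sketch in C. It is not
the code from the pull requests: the function names rows_per_chunk and
scale_by_chunks, the in-memory row-major arrays, the per-cell scaling
operation and the interpretation of the memory budget in MiB are all
assumptions made for illustration. In the actual modules, the rows of each
chunk would be read from and written to disk rather than held in arrays.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of raster rows that fit in a memory budget.
 * memory_mb is assumed to be a user-supplied budget in MiB. */
int rows_per_chunk(int memory_mb, int ncols, size_t cell_size, int nrows)
{
    assert(memory_mb > 0 && ncols > 0 && cell_size > 0 && nrows > 0);

    size_t budget = (size_t)memory_mb * 1024 * 1024;
    size_t row_bytes = (size_t)ncols * cell_size;
    size_t rows = budget / row_bytes;

    if (rows < 1)
        rows = 1;               /* always make progress, even if one row is too big */
    if (rows > (size_t)nrows)
        rows = (size_t)nrows;   /* never more rows than the map has */
    return (int)rows;
}

/* Hypothetical example operation: multiply every cell by `factor`.
 * The outer loop walks over chunks sequentially; the rows inside one
 * chunk are independent, so they are shared among OpenMP threads. */
void scale_by_chunks(const double *in, double *out, int nrows, int ncols,
                     int memory_mb, double factor)
{
    int chunk = rows_per_chunk(memory_mb, ncols, sizeof(double), nrows);

    for (int start = 0; start < nrows; start += chunk) {
        int end = (start + chunk < nrows) ? start + chunk : nrows;

#pragma omp parallel for schedule(static)
        for (int row = start; row < end; row++) {
            for (int col = 0; col < ncols; col++) {
                size_t idx = (size_t)row * ncols + col;
                out[idx] = factor * in[idx];
            }
        }
    }
}
```

For example, with a budget of 1 MiB and 1024 columns of 8-byte cells,
rows_per_chunk returns 128 rows per chunk. If compiled without OpenMP
enabled, the pragma is ignored and the same code runs sequentially.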

For more information regarding the implementation, see 
https://grasswiki.osgeo.org/wiki/Raster_Parallelization_with_OpenMP.


Furthermore, test scripts were included in the modules to ensure the
consistency of the results. Benchmark scripts were added to allow users to
easily measure the speedup from parallelization on their own machines. The
user documentation was also updated to include sections detailing how to
make use of the newly added features.

Future Work


In the future, more raster modules can be parallelized using a similar
approach. Then, we can consider tackling more complex modules such as
r.watershed and r.mapcalc. We could also consider exploring 3D raster modules.


Furthermore, when we implemented parallelization for r.univar, we noticed
that modules that produce statistics involving arithmetic can often have
floating-point discrepancies when dealing with large summations. Because of
this, computations using different numbers of threads can now produce
different results, since the order of the arithmetic operations differs. One
idea would be to introduce the Kahan summation algorithm to reduce the
floating-point discrepancies. However, this still would not guarantee the
consistency of results.
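
As a rough illustration of the effect, the following C sketch compares a
plain sum, a sum split into blocks (mimicking how per-thread partial sums are
combined in a reduction) and Kahan summation. It is not taken from r.univar;
the function names naive_sum, blocked_sum and kahan_sum are hypothetical, and
the blocks are summed sequentially here only to keep the example
deterministic.

```c
#include <assert.h>
#include <stddef.h>

/* Plain left-to-right summation, as a single thread would do it. */
double naive_sum(const double *x, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += x[i];
    return sum;
}

/* Mimics a parallel reduction: split x into nparts contiguous blocks,
 * sum each block separately, then add the partial sums together. */
double blocked_sum(const double *x, size_t n, size_t nparts)
{
    assert(nparts > 0);

    double total = 0.0;
    for (size_t p = 0; p < nparts; p++) {
        size_t start = p * n / nparts;
        size_t end = (p + 1) * n / nparts;
        total += naive_sum(x + start, end - start);
    }
    return total;
}

/* Kahan (compensated) summation: c carries the low-order bits that
 * were lost when the previous term was added to sum. */
double kahan_sum(const double *x, size_t n)
{
    double sum = 0.0;
    double c = 0.0;
    for (size_t i = 0; i < n; i++) {
        double y = x[i] - c;    /* apply the stored correction */
        double t = sum + y;     /* low-order bits of y may be lost here */
        c = (t - sum) - y;      /* recover what was lost (negated) */
        sum = t;
    }
    return sum;
}
```

With the input 1.0 followed by ten values of 1e-16 and IEEE 754 double
arithmetic, blocked_sum with one block returns exactly 1.0 because each small
term is rounded away, whereas two blocks or kahan_sum give a value slightly
above 1.0. Note that aggressive compiler options such as -ffast-math may
optimize the compensation term away.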

Permanent Links

For the project overview, please visit
https://summerofcode.withgoogle.com/dashboard/project/6280792767987712/overview/.
For the project timeline and logs, please visit
https://trac.osgeo.org/grass/wiki/GSoC/2021/RasterParallelization.

I would like to give huge thanks to Huidae Cho, Vaclav Petras and Māris
Nartišs for their valuable guidance. I would also like to thank the GRASS
community for their valuable feedback and support. Lastly, I would like to
thank the GSoC team for this opportunity.

Thanks all!

Warmest regards,
Aaron Saw Min Sern


