Re: [GRASS-dev] [SoC] GSoC 2021 - Parallelization of raster modules for GRASS GIS

2021-08-10 Thread Moritz Lennert
Hi Aaron,

I haven't had the chance to test all this yet, but it's great work! One 
suggestion:

On 9 August 2021 at 14:44:43 GMT+02:00, Aaron Saw Min Sern wrote:
>Hi everyone,
>
>Week 9 has concluded and here's my report for this week.
>
>1) What did I get done this week?
>
>r.univar 
>[https://github.com/OSGeo/grass/pull/1634]
>
>  *   Refactor previous implementation
>
>r.series 
>[https://github.com/OSGeo/grass/pull/1776]
>
>  *   Implement parallelization
>
>Implementation for r.patch is yet to be completed.
>
>2) What do I plan on doing next week?
>
>  *   Finish implementing r.patch parallelization
>  *   Write documentation on manual pages for each of the modules that have 
> been implemented
>Specifically, a section titled "Performance" to include user parameters for 
>parallel processing and expected behavior and issues.
> *   r.univar
> *   r.mfilter
> *   r.neighbors
> *   r.slope.aspect
> *   r.resamp.filter
> *   r.resamp.interp
> *   r.series
> *   r.patch
>  *   Include a wiki page on the general OpenMP implementation and the 
> benchmark results of each module
>
>Please let me know if there are any ideas for better documentation. Thanks!


I don't know what you plan on including in this wiki page, but if there is some 
sort of boilerplate code that could be used for parallelization of other 
modules, it would be great to have that somewhere, with detailed comments about 
what each part does and what the potential issues are.

Moritz
___
grass-dev mailing list
grass-dev@lists.osgeo.org
https://lists.osgeo.org/mailman/listinfo/grass-dev


[GRASS-dev] [SoC] GSoC 2021 - Parallelization of raster modules for GRASS GIS

2021-08-09 Thread Aaron Saw Min Sern
Hi everyone,

Week 9 has concluded and here's my report for this week.

1) What did I get done this week?

r.univar [https://github.com/OSGeo/grass/pull/1634]

  *   Refactor previous implementation

r.series [https://github.com/OSGeo/grass/pull/1776]

  *   Implement parallelization

Implementation for r.patch is yet to be completed.

2) What do I plan on doing next week?

  *   Finish implementing r.patch parallelization
  *   Write documentation in the manual pages of each module that has been 
      parallelized, specifically a section titled "Performance" covering the 
      user parameters for parallel processing, expected behavior, and known 
      issues:
      *   r.univar
      *   r.mfilter
      *   r.neighbors
      *   r.slope.aspect
      *   r.resamp.filter
      *   r.resamp.interp
      *   r.series
      *   r.patch
  *   Add a wiki page on the general OpenMP implementation and the 
      benchmark results of each module

Please let me know if there are any ideas for better documentation. Thanks!

3) Am I blocked on anything?
No major issues.

Cheers,
Aaron



Re: [GRASS-dev] [SoC] GSoC 2021 - Parallelization of raster modules for GRASS GIS

2021-08-02 Thread Luca Delucchi
On Mon, 2 Aug 2021 at 15:52, Aaron Saw Min Sern  wrote:
>
> Hi everyone,
>

Dear Aaron,
thanks a lot for your work. I haven't tested any of your pull requests yet,
but the benchmarks look really promising.

>
> 2) What do I plan on doing next week?
>
> Refactor r.univar
> Implement parallelization for r.series, r.patch

r.series is a really important module for temporal analysis, and I'm
looking forward to seeing and testing it in the coming weeks.

>
> 3) Am I blocked on anything?
> No major issues.
>
> Thanks,
> Aaron
>

regards

-- 
ciao
Luca

www.lucadelu.org


[GRASS-dev] [SoC] GSoC 2021 - Parallelization of raster modules for GRASS GIS

2021-08-02 Thread Aaron Saw Min Sern
Hi everyone,

Week 8 has concluded and here's my report for this week.

1) What did I get done this week?

r.resamp.interp [https://github.com/OSGeo/grass/pull/1771]

  *   Implement parallelization

r.slope.aspect [https://github.com/OSGeo/grass/pull/1767]

  *   Implement parallelization

Both implementations above follow the same approach as r.neighbors 
[https://github.com/OSGeo/grass/pull/1724]. r.slope.aspect also keeps track of 
global statistics such as min/max, so an additional reduction over these 
variables is required on top of the map computation. Benchmark results for 
both modules will be added to the PRs.
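
The extra reduction described above can be pictured as follows. This is only a sketch of the pattern in Python, not the module's C/OpenMP code: each worker scans its own block of rows and keeps private min/max values, and the partial results are merged once at the end. The function names and the toy grid are made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_min_max(rows):
    """Scan one block of rows and return its private (min, max)."""
    lo = min(min(row) for row in rows)
    hi = max(max(row) for row in rows)
    return lo, hi


def parallel_min_max(grid, nprocs):
    """Split the grid into row blocks, scan them in parallel, then reduce."""
    step = max(1, (len(grid) + nprocs - 1) // nprocs)
    blocks = [grid[i:i + step] for i in range(0, len(grid), step)]
    with ThreadPoolExecutor(max_workers=nprocs) as pool:
        partials = list(pool.map(chunk_min_max, blocks))
    # Reduction step: combine the per-worker results into global statistics.
    return min(p[0] for p in partials), max(p[1] for p in partials)


# Toy grid, invented for the example.
grid = [[3.0, 7.5, -1.0], [2.0, 9.25, 4.0], [0.5, 8.0, -6.0], [1.0, 1.0, 1.0]]
print(parallel_min_max(grid, nprocs=2))
```

In OpenMP, a comparable effect can be obtained with the min and max reduction clauses on a parallel loop; whether the PR does it that way is not stated in the report.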

2) What do I plan on doing next week?

  *   Refactor r.univar
  *   Implement parallelization for r.series, r.patch
  *   Revisit r.proj to decide on implementation

3) Am I blocked on anything?
No major issues.

Thanks,
Aaron



[GRASS-dev] [SoC] GSoC 2021 - Parallelization of raster modules for GRASS GIS

2021-07-26 Thread Aaron Saw Min Sern
Hi everyone,

Week 7 has concluded and here's my report for this week.

1) What did I get done this week?

  *   Introduce an environment variable that overrides the default value of 
the nprocs parameter, which is currently 1, so that users do not need to pass 
nprocs explicitly.
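
A minimal sketch of how such a fallback could be resolved, written in Python for brevity. The variable name EXAMPLE_NPROCS is hypothetical, and the precedence shown (explicit option first, then environment, then the default of 1) is an assumption; the report only says that the environment variable replaces the default.

```python
import os

ENV_NAME = "EXAMPLE_NPROCS"  # hypothetical name, not taken from the report
DEFAULT_NPROCS = 1


def resolve_nprocs(option_value=None, environ=os.environ):
    """Return the thread count: explicit option > environment > default."""
    if option_value is not None:
        return max(1, int(option_value))
    raw = environ.get(ENV_NAME)
    if raw is None:
        return DEFAULT_NPROCS
    try:
        return max(1, int(raw))
    except ValueError:
        # An unparsable value falls back to the default (assumed behavior).
        return DEFAULT_NPROCS


print(resolve_nprocs(None, {}))
print(resolve_nprocs(None, {ENV_NAME: "8"}))
print(resolve_nprocs(4, {ENV_NAME: "8"}))
```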

r.resamp.filter

  *   Implement parallelization
  *   Add test cases

2) What do I plan on doing next week?

  *   Implement parallelization for r.slope.aspect with testing and benchmarking

3) Am I blocked on anything?
No major issues.

Thanks,
Aaron



[GRASS-dev] [SoC] GSoC 2021 - Parallelization of raster modules for GRASS GIS

2021-07-19 Thread Aaron Saw Min Sern
Hi everyone,

Week 6 has concluded and here's my report for this week.

1) What did I get done this week?
r.neighbors
The main goal I accomplished was a complete rework of the r.neighbors 
implementation (PR: https://github.com/OSGeo/grass/pull/1724). A benchmark 
script is available under the 'benchmark' directory for users to test the 
performance on their local machine. The performance is comparable to the 
previous implementation, which used temporary files (on SSD) instead of 
memory as a buffer. The benchmark results on my local machine (12 cores) are 
as follows:

[Figure: r.neighbors benchmark results; image not preserved in the archive]

r.mfilter
Issues were reported when working on raster files > 2GB (PR: 
https://github.com/OSGeo/grass/pull/1708). These were promptly addressed in 
commit 4caa96; the cause was an overflow in a multiplication. This PR is 
ready, and a benchmark script is provided as well for local benchmarking.
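
To illustrate this kind of bug: once a raster exceeds 2GB, a byte offset or size computed as rows times columns times cell size no longer fits in a 32-bit signed integer. The sketch below uses Python's ctypes to mimic 32-bit and 64-bit C integers. It is an assumption that the affected multiplication was of this form, and the raster dimensions are invented.

```python
import ctypes


def offset_int32(row, ncols, cell_size):
    """Byte offset as a 32-bit signed C int would typically hold it."""
    return ctypes.c_int32(row * ncols * cell_size).value


def offset_int64(row, ncols, cell_size):
    """Same product held in a 64-bit signed integer."""
    return ctypes.c_int64(row * ncols * cell_size).value


# Assumed example raster: 30000 columns of 8-byte cells.
ncols, cell_size = 30000, 8
row = 10000  # true offset is 2,400,000,000 bytes, above 2**31 - 1

print(offset_int32(row, ncols, cell_size))  # wrapped, negative value
print(offset_int64(row, ncols, cell_size))  # correct value
```

In C, the usual remedy is to widen one operand (for example to a 64-bit or offset type) before multiplying, so that the whole product is evaluated in the wider type. Strictly speaking, signed overflow is undefined behavior in C; the wrap-around shown here is just the most common outcome.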

2) What do I plan on doing next week?

  *   Introduce an environment variable that overrides the default value of 
the nprocs parameter, which is currently 1, so that users do not need to pass 
nprocs explicitly.
  *   Implement r.resamp.filter/r.resamp.interp parallelization

3) Am I blocked on anything?
No major issues.

Thanks,
Aaron


Re: [GRASS-dev] [SoC] GSoC 2021 - Parallelization of raster modules for GRASS GIS

2021-07-13 Thread Maris Nartiss
Hello Aaron,

2021-07-12 18:23 GMT+03:00, Aaron Saw Min Sern :
> Hi everyone,

> 2) What do I plan on doing next week?
>
>   *   Complete rework of r.neighbors implementation
>   *   Compare benchmark between the two implementations

To make the most impact with your GSoC, I strongly support your idea of
getting your implementations right. It is better to have fewer modules
parallelized than to have unstable code in them and thus risk
potential removal of the parallel code.
Anna was testing your code in r.mfilter and it was failing for her.
You should look into it, as a similar problem could also affect your work
in r.neighbors.

Good luck & keep up the good work,
Māris.


[GRASS-dev] [SoC] GSoC 2021 - Parallelization of raster modules for GRASS GIS

2021-07-12 Thread Aaron Saw Min Sern
Hi everyone,

Week 5 has concluded and here's my report for this week.

1) What did I get done this week?
To benchmark both the r.mfilter and r.neighbors implementations, I used the 
recently merged benchmark library on rasters randomly generated with 
r.surf.fractal.

The preliminary results for both modules are as follows (y-axis: time in 
seconds, x-axis: nprocs; benchmarked on my local workstation):
[Figures: two benchmark plots; images not preserved in the archive]
Furthermore, I compared the performance on the master branch with the new 
implementation at nprocs = 1, and the results are comparable.

Both implementations rely on extensive disk I/O: results are written to a 
temporary file buffer before being transferred to the final raster file 
format. This is the default behavior in r.mfilter, but it was introduced 
explicitly in r.neighbors to allow for parallelization. After discussion with 
the mentors, we decided that we should make better use of memory rather than 
disk. Ideally, the user will be able to specify how much memory to use for 
the buffer. However, r.mfilter will keep its original temporary file buffer.
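
One way to turn a user-supplied memory size into a buffer layout is to work out how many whole raster rows fit in the budget and then walk the raster chunk by chunk. The sketch below is in Python and is only an illustration; the function names and the example sizes are assumptions, not taken from the modules.

```python
def rows_per_chunk(memory_mb, ncols, cell_size, nrows):
    """Number of whole raster rows that fit in the requested buffer size."""
    budget = memory_mb * 1024 * 1024
    rows = budget // (ncols * cell_size)
    # Always keep at least one row, and never more than the raster has.
    return max(1, min(nrows, rows))


def chunk_ranges(nrows, chunk):
    """Yield (start, end) row ranges covering the raster chunk by chunk."""
    for start in range(0, nrows, chunk):
        yield start, min(start + chunk, nrows)


# Invented example: 300 MB buffer, 10000 x 10000 raster of 8-byte cells.
chunk = rows_per_chunk(300, ncols=10000, cell_size=8, nrows=10000)
print(chunk, list(chunk_ranges(10000, chunk)))
```

A neighborhood-based module would additionally need a few overlapping rows on each side of a chunk to cover its moving window; that detail is left out here.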

2) What do I plan on doing next week?

  *   Complete rework of r.neighbors implementation
  *   Compare benchmark between the two implementations

3) Am I blocked on anything?
No major roadblock, but I need to catch up a bit to rework my r.neighbors 
implementation.

Thanks,
Aaron


[GRASS-dev] [SoC] GSoC 2021 - Parallelization of raster modules for GRASS GIS

2021-07-05 Thread Aaron Saw Min Sern
Hi everyone,

Week 4 has concluded, and this is my report for this week.


1) What did I get done this week?

r.mfilter (PR: https://github.com/OSGeo/grass/pull/1708)

  *   Add test cases for different input options (Sequential/Parallel filters, 
repeated, null_mode)
  *   Add parallel implementations for all options except Sequential filters 
(which inherently cannot be parallelized)

2) What do I plan on doing next week?
After discussion with the mentors, we decided to change the current 
implementation of r.neighbors, which uses the Segment library to provide a 
temporary file buffer that the threads work on before the final raster file 
is produced. We realized that the Segment library does not fit this use case 
well enough to justify the overhead it might add. It was essentially used as 
an API to write to the file buffer, and we were not making good use of its 
caching capabilities. A plain temporary file buffer, to which the threads can 
write their output simultaneously, should fit our use case best (this is the 
current implementation in r.mfilter).
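
The idea of a plain temporary file that several threads write to at once can be sketched as follows. This Python example is an illustration only: every worker opens its own handle, computes its block of rows, and writes each row at the byte offset given by its row number, so no two workers touch the same region. The row contents are a stand-in for a real computation, and all names are assumptions.

```python
import os
import struct
import tempfile
from concurrent.futures import ThreadPoolExecutor

NCOLS = 4
ROW_BYTES = NCOLS * 8  # one row of 8-byte floats


def write_rows(path, start, end):
    """Compute rows [start, end) and write them at their own byte offsets."""
    with open(path, "r+b") as handle:  # each worker has its own file handle
        for row in range(start, end):
            values = [float(row * NCOLS + col) for col in range(NCOLS)]
            handle.seek(row * ROW_BYTES)
            handle.write(struct.pack(f"{NCOLS}d", *values))


def fill_buffer(nrows, nprocs):
    """Fill a temporary file in parallel and read it back in row order."""
    fd, path = tempfile.mkstemp()
    try:
        os.ftruncate(fd, nrows * ROW_BYTES)  # pre-size the file
        os.close(fd)
        step = max(1, (nrows + nprocs - 1) // nprocs)
        with ThreadPoolExecutor(max_workers=nprocs) as pool:
            jobs = [pool.submit(write_rows, path, s, min(s + step, nrows))
                    for s in range(0, nrows, step)]
            for job in jobs:
                job.result()  # re-raise any worker error
        # Final sequential pass, standing in for writing the raster format.
        with open(path, "rb") as handle:
            data = handle.read()
        return list(struct.unpack(f"{nrows * NCOLS}d", data))
    finally:
        os.remove(path)


print(fill_buffer(6, 3))
```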

Next week, I aim to make the necessary changes to r.neighbors and do proper 
benchmarking on large raster files to measure the performance gain from 
parallelization (r.mfilter).


3) Am I blocked on anything?

No, it has been good so far.


Github repo: https://github.com/aaronsms/grass


Any suggestions are welcome. Thanks!


Warmest regards,

Aaron



[GRASS-dev] [SoC] GSoC 2021 - Parallelization of raster modules for GRASS GIS

2021-06-28 Thread Aaron Saw Min Sern
Hi everyone,

Week 3 has concluded, and this is my report for this week.


1) What did I get done this week?

After discussion with the mentors, we decided to explore alternative designs 
to using the Segment library as an intermediate output buffer. Specifically, 
there are two designs in mind: one simply increases the size of the buffer 
and does sequential I/O to fill it and write it out, with parallel 
computation in between; a more complicated one tries to avoid having the 
threads wait for I/O.
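
The first design can be sketched as a loop over chunks with three phases: fill the buffer sequentially, compute on the buffered rows in parallel, and write the results out sequentially. The Python example below only illustrates that control flow; the callbacks and the doubling "computation" are placeholders invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def process_in_chunks(read_row, write_row, nrows, chunk, nprocs, compute):
    """Sequential input, parallel computation, sequential output per chunk."""
    with ThreadPoolExecutor(max_workers=nprocs) as pool:
        for start in range(0, nrows, chunk):
            end = min(start + chunk, nrows)
            # 1) Fill the buffer sequentially.
            buffer = [read_row(r) for r in range(start, end)]
            # 2) Compute all buffered rows in parallel; map keeps row order.
            results = list(pool.map(compute, buffer))
            # 3) Write the results sequentially, in row order.
            for r, row in zip(range(start, end), results):
                write_row(r, row)


# Stand-ins for raster input and output.
source = [[r + c for c in range(3)] for r in range(5)]
output = {}
process_in_chunks(source.__getitem__, output.__setitem__, nrows=5, chunk=2,
                  nprocs=2, compute=lambda row: [2 * v for v in row])
print([output[r] for r in range(5)])
```

In this layout the threads are idle during phases 1 and 3, which is the waiting that the second, more complicated design tries to avoid.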

2) What do I plan on doing next week?
I plan to finalize the design by this week. [1]


3) Am I blocked on anything?

No, it has been good so far.


[1] https://github.com/aaronsms/grass


Any suggestions are welcome. Thanks!


Warmest regards,

Aaron



[GRASS-dev] [SoC] GSoC 2021 - Parallelization of raster modules for GRASS GIS

2021-06-21 Thread Aaron Saw Min Sern
Hi everyone,

Week 2 has concluded, and this is my report for this week.

1) What did I get done this week?

r.univar

  *   Address review comments on the PR [1], e.g. the standard option "nprocs" 
is now the parameter for users to indicate the number of threads

r.neighbors

  *   Write test cases for parallel execution
  *   Drafted a PR alongside its implementation [2]

r.proj

  *   Write new test cases for the module [3]

2) What do I plan on doing next week?

I have come up with a way to parallelize output-based modules like 
r.neighbors. The idea is to use a temporary segment file so that threads can 
perform random write operations, which is not possible directly on a 
compressed raster file without an intermediate cache. With this design in 
mind, I intend to continue parallelizing similar modules next week. There 
may also be discussion about encapsulating a benchmarking framework, possibly 
under grass.benchmark, as it will be used repeatedly in the future to measure 
performance.


3) Am I blocked on anything?

No, it has been good so far.


Warmest regards,

Aaron

[1] https://github.com/OSGeo/grass/pull/1634
[2] https://github.com/OSGeo/grass/pull/1654
[3] https://github.com/OSGeo/grass/pull/1663
[4] https://github.com/OSGeo/grass/pull/1670



[GRASS-dev] [SoC] GSoC 2021 - Parallelization of raster modules for GRASS GIS

2021-06-13 Thread Aaron Saw Min Sern
Hi everyone,

Week 1 has concluded, and this is my report for this week.

1) What did I get done this week?

r.univar

  *   Updated Makefile to include OpenMP dependencies
  *   Wrote multi-threaded test cases to ensure consistency of the program
  *   Wrote benchmarking script to measure speedup
  *   Implemented parallel support
  *   Drafted a PR with the above-mentioned changes [1]
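
One point worth keeping in mind for consistency tests of a parallel statistics module: floating-point addition is not associative, so a sum accumulated in per-thread blocks may differ in the last digits from a serial sum. A test can therefore compare with a tolerance rather than bit for bit. The Python sketch below illustrates the idea; the data and the tolerance are assumptions chosen for the example.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def chunked_sum(values, nprocs):
    """Sum values in nprocs blocks, then add the partial sums."""
    step = max(1, (len(values) + nprocs - 1) // nprocs)
    blocks = [values[i:i + step] for i in range(0, len(values), step)]
    with ThreadPoolExecutor(max_workers=nprocs) as pool:
        return sum(pool.map(sum, blocks))


values = [0.1 * i for i in range(1000)]
serial = sum(values)
for nprocs in (1, 2, 4, 8):
    # Summation order differs between runs, so compare with a tolerance.
    assert math.isclose(chunked_sum(values, nprocs), serial, rel_tol=1e-12)
```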

r.neighbors

  *   Investigated Segment library to support random access and write operations

2) What do I plan on doing next week?

The goal is to come up with a design for output-based modules. The next step 
is to finish the implementation for r.neighbors. Furthermore, I plan to 
investigate the thread-safety of the Raster3D module for the pthread 
implementation of r.mapcalc.


3) Am I blocked on anything?

No, it has been good so far, but I hope to improve on my pace.

[1] https://github.com/OSGeo/grass/pull/1634
