Re: [QGIS-Developer] Running grass algorithms in threads

2018-08-15 Thread Rudi von Staden
2018-08-15T11:47:30 INFO processCommands end. Commands: ['g.proj -c
proj4="+proj=aea +lat_1=-24 +lat_2=-32 +lat_0=0 +lon_0=24 +x_0=0 +y_0=0
+datum=WGS84 +units=m +no_defs"', 'r.external
input="C:\\Users\\rudi\\GIS\\Projects\\LizeSuitableHabitatModel\\HabitatModels\\3800-25.tif"
band=1 output="rast_5b73f6b20afa112" --overwrite -o', 'r.external
input="C:\\Users\\rudi\\GIS\\Projects\\LizeSuitableHabitatModel\\Analysis\\Data\\HS90.tif"
band=1 output="rast_5b73f6b20afa113" --overwrite -o', 'g.region
n=-2332657.7474449594 s=-3803200.178477332 e=984393.5909882308
w=-834195.3236270341 res=92.07578930764342', 'r.stats.zonal
base=rast_5b73f6b20afa112 cover=rast_5b73f6b20afa113 method="average"
output=output2c6300f21c524e0da74e65776d6292cf --overwrite']

2018-08-15T11:48:01 INFO processInputs end. Commands: ['g.proj -c
proj4="+proj=aea +lat_1=-24 +lat_2=-32 +lat_0=0 +lon_0=24 +x_0=0 +y_0=0
+datum=WGS84 +units=m +no_defs"', 'r.external
input="C:\\Users\\rudi\\GIS\\Projects\\LizeSuitableHabitatModel\\HabitatModels\\3800-25.tif"
band=1 output="rast_5b73f6d1a6fa514" --overwrite -o', 'r.external
input="C:\\Users\\rudi\\GIS\\Projects\\LizeSuitableHabitatModel\\Analysis\\Data\\HS14.tif"
band=1 output="rast_5b73f6d1a6fa515" --overwrite -o', 'g.region
n=-2332657.7474449594 s=-3803200.178477332 e=984393.5909882308
w=-834195.3236270341 res=92.07578930764342']

2018-08-15T11:48:01 INFO processCommands end. Commands: ['g.proj -c
proj4="+proj=aea +lat_1=-24 +lat_2=-32 +lat_0=0 +lon_0=24 +x_0=0 +y_0=0
+datum=WGS84 +units=m +no_defs"', 'r.external
input="C:\\Users\\rudi\\GIS\\Projects\\LizeSuitableHabitatModel\\HabitatModels\\3800-25.tif"
band=1 output="rast_5b73f6d1a6fa514" --overwrite -o', 'r.external
input="C:\\Users\\rudi\\GIS\\Projects\\LizeSuitableHabitatModel\\Analysis\\Data\\HS14.tif"
band=1 output="rast_5b73f6d1a6fa515" --overwrite -o', 'g.region
n=-2332657.7474449594 s=-3803200.178477332 e=984393.5909882308
w=-834195.3236270341 res=92.07578930764342', 'r.stats.zonal
base=rast_5b73f6d1a6fa514 cover=rast_5b73f6d1a6fa515 method="average"
output=output744c02ef470a4ddbb2dbaaedf1bda2b0 --overwrite']

On Wed, 15 Aug 2018 at 08:16, Stefan Blumentrath wrote:

> Dear Rudi, Nyall,
>
> GRASS is used on HPC systems for heavy parallelisation. So, in
> principle, the answer is yes: you can certainly run GRASS algorithms in
> parallel.
> On Linux, I often run several commands in parallel using xargs, and it
> works just fine in many cases. GRASS also has some specific Python
> functions for parallel processing. See also:
> https://grasswiki.osgeo.org/wiki/Parallel_GRASS_jobs
> https://grasswiki.osgeo.org/wiki/Parallelizing_Scripts
>
> However, whether GRASS algorithms can be run in parallel in this
> particular case depends on the details.
>
> E.g., if the algorithm in question temporarily modifies the computational
> region, parallel processes can get in each other's way.
> Also, with SQLite as the DB backend, writing several vector maps (and
> attribute tables) in parallel will be a problem (due to SQLite locks).
>
> In addition, whether GRASS commands can be executed in parallel in the
> QGIS Processing framework is probably yet another question, depending on
> how e.g. QGIS handles data management (locations and mapsets), especially
> in more complex workflows / models...
>
> CCing also grass-dev list for more qualified answers...
>
> Cheers
> Stefan

Re: [QGIS-Developer] Running grass algorithms in threads

2018-08-14 Thread Stefan Blumentrath
Dear Rudi, Nyall,

GRASS is used on HPC systems for heavy parallelisation. So, in principle,
the answer is yes: you can certainly run GRASS algorithms in parallel.
On Linux, I often run several commands in parallel using xargs, and it works
just fine in many cases. GRASS also has some specific Python functions for
parallel processing. See also:
https://grasswiki.osgeo.org/wiki/Parallel_GRASS_jobs
https://grasswiki.osgeo.org/wiki/Parallelizing_Scripts
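
For readers who would rather drive this from Python than from xargs, the
same pattern can be sketched roughly as below. This is an illustration only,
not code posted in this thread: run_parallel is a hypothetical helper name,
and the demo commands call the Python interpreter instead of GRASS modules
so that the sketch runs without a GRASS installation.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_parallel(commands, max_workers=4):
    """Run each command as its own OS process, at most max_workers at once.

    Returns the exit codes in the same order as the commands.
    """
    def run_one(cmd):
        # The work happens in a separate process; the thread only waits.
        return subprocess.run(cmd, capture_output=True).returncode

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_one, commands))


if __name__ == "__main__":
    # Stand-in commands (assumption): swap in real GRASS invocations.
    demo = [[sys.executable, "-c", f"print({i})"] for i in range(4)]
    print(run_parallel(demo))
```

Threads are enough here because each job is an external process; the pool
only limits how many of them run at the same time, much like xargs -P.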

However, whether GRASS algorithms can be run in parallel in this particular
case depends on the details.

E.g., if the algorithm in question temporarily modifies the computational
region, parallel processes can get in each other's way.
Also, with SQLite as the DB backend, writing several vector maps (and
attribute tables) in parallel will be a problem (due to SQLite locks).
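
The SQLite locking issue can be reproduced outside GRASS. The sketch below
is a generic demonstration using Python's sqlite3 module, not GRASS's own
database driver; the function name and the table name are made up for the
example. It shows that a second writer is refused while a first connection
holds an open write transaction on the same database file.

```python
import os
import sqlite3
import tempfile


def second_writer_is_blocked(db_path):
    """Return True if a second connection cannot write while the first
    connection holds an open write transaction on the same file."""
    # isolation_level=None: transactions are started explicitly below.
    # timeout=0: fail immediately instead of waiting for the lock.
    first = sqlite3.connect(db_path, timeout=0, isolation_level=None)
    second = sqlite3.connect(db_path, timeout=0, isolation_level=None)
    try:
        first.execute("CREATE TABLE IF NOT EXISTS pts (id INTEGER)")
        first.execute("BEGIN IMMEDIATE")  # take the write lock
        first.execute("INSERT INTO pts VALUES (1)")
        try:
            second.execute("INSERT INTO pts VALUES (2)")
        except sqlite3.OperationalError:  # "database is locked"
            return True
        return False
    finally:
        if first.in_transaction:
            first.execute("ROLLBACK")
        first.close()
        second.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(second_writer_is_blocked(os.path.join(tmp, "demo.db")))
```

With one database file per mapset, parallel jobs in separate mapsets never
compete for the same lock.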

In addition, whether GRASS commands can be executed in parallel in the QGIS
Processing framework is probably yet another question, depending on how e.g.
QGIS handles data management (locations and mapsets), especially in more
complex workflows / models...

CCing also grass-dev list for more qualified answers...

Cheers
Stefan


-Original Message-
From: QGIS-Developer On Behalf Of Nyall Dawson
Sent: Wednesday, 15 August 2018 01:10
To: Rudi von Staden
Cc: qgis-developer
Subject: Re: [QGIS-Developer] Running grass algorithms in threads

On Tue, 14 Aug 2018 at 21:43, Rudi von Staden wrote:
>
> Hi all,
>
> The bottleneck in my script at the moment is the calculation of zonal stats 
> using 'grass7:r.stats.zonal'. I thought I might speed things up by using 
> QgsTask.fromFunction() or QgsProcessingAlgRunnerTask() to run these 
> calculations in parallel. In my tests of both approaches the tasks seem to 
> complete (task.status() == QgsTask.Complete), but the output file is only 
> generated for 1 of 4 parallel tasks (the task that finishes first).
>
> I'm assuming this is because grass algorithms are not thread safe? Or am I 
> missing something in my implementation that could make this work?

I strongly suspect that grass algorithms cannot be run in parallel.
This is why they cannot run in the background in QGIS like the native/GDAL 
algorithms can. But I'd love for confirmation about this and whether there's 
any way to make GRASS multi-thread safe.

Because this is grass related (and not QGIS specific) I'd suggest asking on the 
grass mailing list, and relaying any responses back here.

Nyall


Re: [QGIS-Developer] Running grass algorithms in threads

2018-08-14 Thread Vaclav Petras
On Tue, Aug 14, 2018 at 8:52 PM, Vaclav Petras  wrote:

>
> # and delete the rest
> rm ~/grassdata/nc_spm/par1
> rm ~/grassdata/nc_spm/par2
> rm ~/grassdata/nc_spm/par3
> rm ~/grassdata/nc_spm/par4
>


Of course `rm -r` here.
___
QGIS-Developer mailing list
QGIS-Developer@lists.osgeo.org
List info: https://lists.osgeo.org/mailman/listinfo/qgis-developer
Unsubscribe: https://lists.osgeo.org/mailman/listinfo/qgis-developer

Re: [QGIS-Developer] Running grass algorithms in threads

2018-08-14 Thread Vaclav Petras
On Tue, Aug 14, 2018 at 7:10 PM, Nyall Dawson wrote:

> On Tue, 14 Aug 2018 at 21:43, Rudi von Staden  wrote:
> >
> > Hi all,
> >
> > The bottleneck in my script at the moment is the calculation of zonal
> stats using 'grass7:r.stats.zonal'. I thought I might speed things up by
> using QgsTask.fromFunction() or QgsProcessingAlgRunnerTask() to run these
> calculations in parallel. In my tests of both approaches the tasks seem to
> complete (task.status() == QgsTask.Complete), but the output file is only
> generated for 1 of 4 parallel tasks (the task that finishes first).
> >
> > I'm assuming this is because grass algorithms are not thread safe? Or am
> I missing something in my implementation that could make this work?
>
> I strongly suspect that grass algorithms cannot be run in parallel.
> This is why they cannot run in the background in QGIS like the
> native/GDAL algorithms can. But I'd love for confirmation about this
> and whether there's any way to make GRASS multi-thread safe.
>


In general, it works. You can run GRASS modules in parallel if you set
things up right, which is best achieved by running the parallel processes
in separate GRASS mapsets.

GRASS modules are separate processes, so we are talking about parallel
processes rather than threads, and they are pretty well separated. You can
run, for example (assuming a GRASS GIS session in the NC SPM location, a
new/empty mapset, and a Linux command line, so that & starts a process in
the background):

r.neighbors input=elevation output=elevation_1 size=21 &
r.neighbors input=elevation output=elevation_2 size=21 &
r.neighbors input=elevation output=elevation_3 size=21 &
r.neighbors input=elevation output=elevation_4 size=21 &

However, conflicts may arise if you are changing the computational region
at the same time as doing calculations, or if you are writing vectors using
the default settings for the attribute table, i.e. one SQLite db for all
vector maps in a mapset. You can make these things work, e.g. by passing a
computational region through the environment rather than by g.region, or by
using a different backend for attributes. However, the safest way is to use
separate mapsets, for example (assuming Linux for & and an existing
location called nc_spm):

# create the mapsets
grass -e -c ~/grassdata/nc_spm/par1
grass -e -c ~/grassdata/nc_spm/par2
grass -e -c ~/grassdata/nc_spm/par3
grass -e -c ~/grassdata/nc_spm/par4
# run v.random (just an example which creates vector with attributes)
grass ~/grassdata/nc_spm/par1 --exec v.random output=points_1 column=value \
    npoints=100 &
grass ~/grassdata/nc_spm/par2 --exec v.random output=points_2 column=value \
    npoints=100 &
grass ~/grassdata/nc_spm/par3 --exec v.random output=points_3 column=value \
    npoints=100 &
grass ~/grassdata/nc_spm/par4 --exec v.random output=points_4 column=value \
    npoints=100 &
# wait for the background jobs to finish before merging
wait
# just to finish the example, let's merge the vectors in a new mapset
grass -e -c ~/grassdata/nc_spm/par
grass ~/grassdata/nc_spm/par --exec v.patch \
    input=points_1@par1,points_2@par2,points_3@par3,points_4@par4 output=points
grass ~/grassdata/nc_spm/par --exec v.info map=points -t
# and delete the rest (mapsets are directories, hence -r)
rm -r ~/grassdata/nc_spm/par1
rm -r ~/grassdata/nc_spm/par2
rm -r ~/grassdata/nc_spm/par3
rm -r ~/grassdata/nc_spm/par4
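
The separate-mapset recipe above could be driven from Python along these
lines. This is only a sketch under assumptions: the helper names are
invented for illustration, the executable is assumed to be called "grass"
(it may be e.g. grass74 on some systems, hence the grass_bin parameter), and
only the command lines already shown above (-e -c and --exec) are used. The
demo at the bottom only prints the commands, so it runs without GRASS;
run_jobs is what would actually launch them.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor


def mapset_commands(location, mapset, module_args, grass_bin="grass"):
    """Return two command lines for one job: create the mapset, then run
    a module inside it with --exec."""
    path = os.path.join(location, mapset)
    create = [grass_bin, "-e", "-c", path]
    execute = [grass_bin, path, "--exec"] + list(module_args)
    return create, execute


def run_job(location, mapset, module_args, grass_bin="grass"):
    """Create a private mapset, run one module in it, return its exit code."""
    create, execute = mapset_commands(location, mapset, module_args, grass_bin)
    subprocess.run(create, check=True)
    return subprocess.run(execute).returncode


def run_jobs(location, jobs, grass_bin="grass", max_workers=4):
    """jobs maps mapset name -> module arguments; one process per mapset."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            name: pool.submit(run_job, location, name, args, grass_bin)
            for name, args in jobs.items()
        }
        return {name: future.result() for name, future in futures.items()}


if __name__ == "__main__":
    location = os.path.expanduser("~/grassdata/nc_spm")
    for i in range(1, 5):
        args = ["v.random", f"output=points_{i}", "column=value", "npoints=100"]
        for cmd in mapset_commands(location, f"par{i}", args):
            print(" ".join(cmd))
```

Because every job gets its own mapset, region changes and attribute-table
writes in one job cannot interfere with the others.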

I'm assuming that we are talking about running algorithms in parallel in
QGIS, not parallelism inside the algorithms. Other considerations apply to
that (parallelization is controlled by the modules themselves, see e.g. the
nprocs option for r.sun or v.surf.rst in GRASS 7.4). Note that I'm talking
about (pure) GRASS, so it depends on how QGIS is handling it (I recall it
is using --exec, but I don't know what it is doing with locations and
mapsets). Please also note that I didn't measure whether the v.random
example would actually be more advantageous than a single process.

Best,
Vaclav


> Because this is grass related (and not QGIS specific) I'd suggest
> asking on the grass mailing list, and relaying any responses back
> here.
>

> Nyall
>

Re: [QGIS-Developer] Running grass algorithms in threads

2018-08-14 Thread Nyall Dawson
On Tue, 14 Aug 2018 at 21:43, Rudi von Staden  wrote:
>
> Hi all,
>
> The bottleneck in my script at the moment is the calculation of zonal stats 
> using 'grass7:r.stats.zonal'. I thought I might speed things up by using 
> QgsTask.fromFunction() or QgsProcessingAlgRunnerTask() to run these 
> calculations in parallel. In my tests of both approaches the tasks seem to 
> complete (task.status() == QgsTask.Complete), but the output file is only 
> generated for 1 of 4 parallel tasks (the task that finishes first).
>
> I'm assuming this is because grass algorithms are not thread safe? Or am I 
> missing something in my implementation that could make this work?

I strongly suspect that grass algorithms cannot be run in parallel.
This is why they cannot run in the background in QGIS like the
native/GDAL algorithms can. But I'd love for confirmation about this
and whether there's any way to make GRASS multi-thread safe.

Because this is grass related (and not QGIS specific) I'd suggest
asking on the grass mailing list, and relaying any responses back
here.

Nyall


[QGIS-Developer] Running grass algorithms in threads

2018-08-14 Thread Rudi von Staden
Hi all,

The bottleneck in my script at the moment is the calculation of zonal stats
using 'grass7:r.stats.zonal'. I thought I might speed things up by using
QgsTask.fromFunction() or QgsProcessingAlgRunnerTask() to run these
calculations in parallel. In my tests of both approaches the tasks seem to
complete (task.status() == QgsTask.Complete), but the output file is only
generated for 1 of 4 parallel tasks (the task that finishes first).

I'm assuming this is because grass algorithms are not thread safe? Or am I
missing something in my implementation that could make this work?

Thanks,
Rudi



My code for the QgsTask approach is as below:

def getZonal(task, habitatModelFile, cover):
    tempFile = QgsProcessingUtils.generateTempFilename("output.tif")
    processing.run("grass7:r.stats.zonal", {
        'base': habitatModelFile,
        'cover': cover,
        'method': 5,
        '-c': False,
        '-r': False,
        'output': tempFile,
        'GRASS_REGION_PARAMETER': None,
        'GRASS_REGION_CELLSIZE_PARAMETER': 0,
        'GRASS_RASTER_FORMAT_OPT': '',
        'GRASS_RASTER_FORMAT_META': ''}, context=context, feedback=algFeedback)

    if task.isCanceled():
        deleteFile(tempFile)
        return

    return tempFile

ls90Task = QgsTask.fromFunction('LS90', getZonal, habitatModelFile=hm1,
                                cover=ls90Layer)
QgsApplication.taskManager().addTask(ls90Task)
feedback.pushInfo("Calculating LS14 mean...")
ls14Task = QgsTask.fromFunction('LS14 ', getZonal, habitatModelFile=hm2,
                                cover=ls14Layer)
QgsApplication.taskManager().addTask(ls14Task)
hs90Task = QgsTask.fromFunction('HS90 ', getZonal, habitatModelFile=hm3,
                                cover=hs90Layer)
QgsApplication.taskManager().addTask(hs90Task)
hs14Task = QgsTask.fromFunction('HS14 ', getZonal, habitatModelFile=hm4,
                                cover=hs14Layer)
QgsApplication.taskManager().addTask(hs14Task)

while (len([t for t in [ls90Task.status(), ls14Task.status(),
                        hs90Task.status(), hs14Task.status()]
            if t in [QgsTask.Running, QgsTask.Queued]]) > 0
       and not feedback.isCanceled()):
    sleep(1)

if feedback.isCanceled():
    # some cleanup code (send task.cancel() and wait for tasks to terminate)
    break

ls90Result = ls90Task.returned_values
ls14Result = ls14Task.returned_values
hs90Result = hs90Task.returned_values   # only this file exists
hs14Result = hs14Task.returned_values