Re: [QGIS-Developer] Running grass algorithms in threads
15T11:47:30 INFO processCommands end. Commands:
  ['g.proj -c proj4="+proj=aea +lat_1=-24 +lat_2=-32 +lat_0=0 +lon_0=24 +x_0=0 +y_0=0 +datum=WGS84 +units=m +no_defs"',
   'r.external input="C:\\Users\\rudi\\GIS\\Projects\\LizeSuitableHabitatModel\\HabitatModels\\3800-25.tif" band=1 output="rast_5b73f6b20afa112" --overwrite -o',
   'r.external input="C:\\Users\\rudi\\GIS\\Projects\\LizeSuitableHabitatModel\\Analysis\\Data\\HS90.tif" band=1 output="rast_5b73f6b20afa113" --overwrite -o',
   'g.region n=-2332657.7474449594 s=-3803200.178477332 e=984393.5909882308 w=-834195.3236270341 res=92.07578930764342',
   'r.stats.zonal base=rast_5b73f6b20afa112 cover=rast_5b73f6b20afa113 method="average" output=output2c6300f21c524e0da74e65776d6292cf --overwrite']

2018-08-15T11:48:01 INFO processInputs end. Commands:
  ['g.proj -c proj4="+proj=aea +lat_1=-24 +lat_2=-32 +lat_0=0 +lon_0=24 +x_0=0 +y_0=0 +datum=WGS84 +units=m +no_defs"',
   'r.external input="C:\\Users\\rudi\\GIS\\Projects\\LizeSuitableHabitatModel\\HabitatModels\\3800-25.tif" band=1 output="rast_5b73f6d1a6fa514" --overwrite -o',
   'r.external input="C:\\Users\\rudi\\GIS\\Projects\\LizeSuitableHabitatModel\\Analysis\\Data\\HS14.tif" band=1 output="rast_5b73f6d1a6fa515" --overwrite -o',
   'g.region n=-2332657.7474449594 s=-3803200.178477332 e=984393.5909882308 w=-834195.3236270341 res=92.07578930764342']

2018-08-15T11:48:01 INFO processCommands end. Commands:
  ['g.proj -c proj4="+proj=aea +lat_1=-24 +lat_2=-32 +lat_0=0 +lon_0=24 +x_0=0 +y_0=0 +datum=WGS84 +units=m +no_defs"',
   'r.external input="C:\\Users\\rudi\\GIS\\Projects\\LizeSuitableHabitatModel\\HabitatModels\\3800-25.tif" band=1 output="rast_5b73f6d1a6fa514" --overwrite -o',
   'r.external input="C:\\Users\\rudi\\GIS\\Projects\\LizeSuitableHabitatModel\\Analysis\\Data\\HS14.tif" band=1 output="rast_5b73f6d1a6fa515" --overwrite -o',
   'g.region n=-2332657.7474449594 s=-3803200.178477332 e=984393.5909882308 w=-834195.3236270341 res=92.07578930764342',
   'r.stats.zonal base=rast_5b73f6d1a6fa514 cover=rast_5b73f6d1a6fa515 method="average" output=output744c02ef470a4ddbb2dbaaedf1bda2b0 --overwrite']

On Wed, 15 Aug 2018 at 08:16, Stefan Blumentrath wrote:

> Dear Rudi, Nyall,
>
> GRASS is being used on HPC systems for heavy parallelisation. So, in
> principle, the answer is yes, you can for sure run GRASS algorithms in
> parallel.
> On Linux, I often run several commands in parallel using xargs, and it
> works just fine in many cases. GRASS also has some specific Python
> functions for parallel processing. See also:
> https://grasswiki.osgeo.org/wiki/Parallel_GRASS_jobs
> https://grasswiki.osgeo.org/wiki/Parallelizing_Scripts
>
> However, whether GRASS algorithms can be run in parallel in this
> particular case depends on the details.
>
> E.g., if the algorithm in question temporarily modifies the
> computational region, parallel processes can get in each other's way.
> Also, with SQLite as DB backend, writing several vector maps (and
> attribute tables) in parallel will be a problem (due to SQLite locks).
>
> In addition, whether GRASS commands can be executed in parallel in the
> QGIS Processing framework is probably yet another question, depending
> on how e.g. QGIS handles data management (locations and mapsets), esp.
> in more complex workflows / models...
>
> CCing also grass-dev list for more qualified answers...
>
> Cheers
> Stefan
>
>
> -----Original Message-----
> From: QGIS-Developer On Behalf Of Nyall Dawson
> Sent: Wednesday, 15 August 2018 01:10
> To: Rudi von Staden
> Cc: qgis-developer
> Subject: Re: [QGIS-Developer] Running grass algorithms in threads
>
> On Tue, 14 Aug 2018 at 21:43, Rudi von Staden wrote:
> >
> > Hi all,
> >
> > The bottleneck in my script at the moment is the calculation of zonal
> > stats using 'grass7:r.stats.zonal'. I thought I might speed things up
> > by using QgsTask.fromFunction() or QgsProcessingAlgRunnerTask() to run
> > these calculations in parallel. In my tests of both approaches the
> > tasks seem to complete (task.status() == QgsTask.Complete), but the
> > output file is only generated for 1 of 4 parallel tasks (the task that
> > finishes first).
> >
> > I'm assuming this is because grass algorithms are not thread safe? Or
> > am I missing something in my implementation that could make this work?
>
> I strongly suspect that grass algorithms cannot be run in parallel.
> This is why they cannot run in the background in QGIS like the
> native/GDAL algorithms can. But I'd love for confirmation about t
Re: [QGIS-Developer] Running grass algorithms in threads
Dear Rudi, Nyall,

GRASS is being used on HPC systems for heavy parallelisation. So, in
principle, the answer is yes, you can for sure run GRASS algorithms in
parallel.
On Linux, I often run several commands in parallel using xargs, and it
works just fine in many cases. GRASS also has some specific Python
functions for parallel processing. See also:
https://grasswiki.osgeo.org/wiki/Parallel_GRASS_jobs
https://grasswiki.osgeo.org/wiki/Parallelizing_Scripts

However, whether GRASS algorithms can be run in parallel in this
particular case depends on the details.

E.g., if the algorithm in question temporarily modifies the
computational region, parallel processes can get in each other's way.
Also, with SQLite as DB backend, writing several vector maps (and
attribute tables) in parallel will be a problem (due to SQLite locks).

In addition, whether GRASS commands can be executed in parallel in the
QGIS Processing framework is probably yet another question, depending
on how e.g. QGIS handles data management (locations and mapsets), esp.
in more complex workflows / models...

CCing also grass-dev list for more qualified answers...

Cheers
Stefan

-----Original Message-----
From: QGIS-Developer On Behalf Of Nyall Dawson
Sent: Wednesday, 15 August 2018 01:10
To: Rudi von Staden
Cc: qgis-developer
Subject: Re: [QGIS-Developer] Running grass algorithms in threads

On Tue, 14 Aug 2018 at 21:43, Rudi von Staden wrote:
>
> Hi all,
>
> The bottleneck in my script at the moment is the calculation of zonal
> stats using 'grass7:r.stats.zonal'. I thought I might speed things up
> by using QgsTask.fromFunction() or QgsProcessingAlgRunnerTask() to run
> these calculations in parallel. In my tests of both approaches the
> tasks seem to complete (task.status() == QgsTask.Complete), but the
> output file is only generated for 1 of 4 parallel tasks (the task that
> finishes first).
>
> I'm assuming this is because grass algorithms are not thread safe? Or
> am I missing something in my implementation that could make this work?

I strongly suspect that grass algorithms cannot be run in parallel.
This is why they cannot run in the background in QGIS like the
native/GDAL algorithms can. But I'd love for confirmation about this
and whether there's any way to make GRASS multi-thread safe.

Because this is grass related (and not QGIS specific) I'd suggest
asking on the grass mailing list, and relaying any responses back
here.

Nyall

> Thanks,
> Rudi
>
> My code for the QgsTask approach is as below:
>
> def getZonal(task, habitatModelFile, cover):
>     tempFile = QgsProcessingUtils.generateTempFilename("output.tif")
>     processing.run("grass7:r.stats.zonal", {
>         'base': habitatModelFile,
>         'cover': cover,
>         'method': 5,
>         '-c': False,
>         '-r': False,
>         'output': tempFile,
>         'GRASS_REGION_PARAMETER': None,
>         'GRASS_REGION_CELLSIZE_PARAMETER': 0,
>         'GRASS_RASTER_FORMAT_OPT': '',
>         'GRASS_RASTER_FORMAT_META': ''}, context=context, feedback=algFeedback)
>
>     if task.isCanceled():
>         deleteFile(tempFile)
>         return
>
>     return tempFile
>
> ls90Task = QgsTask.fromFunction('LS90', getZonal, habitatModelFile=hm1, cover=ls90Layer)
> QgsApplication.taskManager().addTask(ls90Task)
> feedback.pushInfo("Calculating LS14 mean...")
> ls14Task = QgsTask.fromFunction('LS14 ', getZonal, habitatModelFile=hm2, cover=ls14Layer)
> QgsApplication.taskManager().addTask(ls14Task)
> hs90Task = QgsTask.fromFunction('HS90 ', getZonal, habitatModelFile=hm3, cover=hs90Layer)
> QgsApplication.taskManager().addTask(hs90Task)
> hs14Task = QgsTask.fromFunction('HS14 ', getZonal, habitatModelFile=hm4, cover=hs14Layer)
> QgsApplication.taskManager().addTask(hs14Task)
>
> while (len([t for t in [ls90Task.status(), ls14Task.status(),
>                         hs90Task.status(), hs14Task.status()]
>             if t in [QgsTask.Running, QgsTask.Queued]]) > 0) and not feedback.isCanceled():
>     sleep(1)
>
> if feedback.isCanceled():
>     # some cleanup code (send task.cancel() and wait for tasks to terminate)
>     break
>
> ls90Result = ls90Task.returned_values
> ls14Result = ls14Task.returned_values
> hs90Result = hs90Task.returned_values  # only this file exists
> hs14Result = hs14Task.returned_values
>
> ___
> QGIS-Developer mailing list
> QGIS-Developer@lists.osgeo.org
> List info: https://lists.osgeo.org/mailman/listinfo/qgis-developer
> Unsubscribe: https://lists.osge
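
A note on the xargs approach Stefan mentions: the same pattern (a bounded
number of independent processes running side by side) can be approximated
in Python. This is only a sketch under assumptions: run_parallel and its
max_workers parameter are hypothetical names, and the demo commands call
the Python interpreter as a stand-in for real GRASS commands. It does not
address the region or SQLite conflicts described above.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_parallel(commands, max_workers=4):
    """Run each argv list as its own process, at most max_workers at a time.

    Returns (returncode, stdout) pairs in the same order as `commands`.
    """
    def run_one(argv):
        done = subprocess.run(argv, capture_output=True, text=True)
        return done.returncode, done.stdout.strip()

    # Threads only wait on child processes here; the work itself happens
    # in the separate processes, as with xargs.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_one, commands))


if __name__ == "__main__":
    # Stand-in commands; real use would pass GRASS command lines instead.
    demo = [[sys.executable, "-c", f"print({n} * {n})"] for n in range(1, 5)]
    for code, out in run_parallel(demo, max_workers=2):
        print(code, out)
```

Checking the return code of every process, rather than only waiting for
completion, makes a silently failed job visible.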
Re: [QGIS-Developer] Running grass algorithms in threads
On Tue, Aug 14, 2018 at 8:52 PM, Vaclav Petras wrote:

>
> # and delete the rest
> rm ~/grassdata/nc_spm/par1
> rm ~/grassdata/nc_spm/par2
> rm ~/grassdata/nc_spm/par3
> rm ~/grassdata/nc_spm/par4
>

Of course `rm -r` here.

___
QGIS-Developer mailing list
QGIS-Developer@lists.osgeo.org
List info: https://lists.osgeo.org/mailman/listinfo/qgis-developer
Unsubscribe: https://lists.osgeo.org/mailman/listinfo/qgis-developer
Re: [QGIS-Developer] Running grass algorithms in threads
On Tue, Aug 14, 2018 at 7:10 PM, Nyall Dawson wrote:

> On Tue, 14 Aug 2018 at 21:43, Rudi von Staden wrote:
> >
> > Hi all,
> >
> > The bottleneck in my script at the moment is the calculation of zonal
> > stats using 'grass7:r.stats.zonal'. I thought I might speed things up
> > by using QgsTask.fromFunction() or QgsProcessingAlgRunnerTask() to run
> > these calculations in parallel. In my tests of both approaches the
> > tasks seem to complete (task.status() == QgsTask.Complete), but the
> > output file is only generated for 1 of 4 parallel tasks (the task that
> > finishes first).
> >
> > I'm assuming this is because grass algorithms are not thread safe? Or
> > am I missing something in my implementation that could make this work?
>
> I strongly suspect that grass algorithms cannot be run in parallel.
> This is why they cannot run in the background in QGIS like the
> native/GDAL algorithms can. But I'd love for confirmation about this
> and whether there's any way to make GRASS multi-thread safe.
>

In general, it works. You can run GRASS modules in parallel if you set
things up right, which is best achieved by running the parallel processes
in separate GRASS mapsets. GRASS modules are separate processes, so we are
talking about parallel processes rather than threads, and they are pretty
well separated. You can run, for example (assuming a GRASS GIS session in
the NC SPM location, a new/empty mapset, and a Linux command line, so that
& starts the process in the background):

    r.neighbors input=elevation output=elevation_1 size=21 &
    r.neighbors input=elevation output=elevation_2 size=21 &
    r.neighbors input=elevation output=elevation_3 size=21 &
    r.neighbors input=elevation output=elevation_4 size=21 &

However, conflicts may arise if you are changing the computational region
at the same time as doing calculations, or if you are writing vectors
using the default settings for the attribute table, i.e. one SQLite db for
all vector maps in a mapset. You can make these things work, e.g. by
passing a computational region through the environment rather than by
g.region, or by using a different backend for attributes. However, the
safest way is to use separate mapsets, for example (assuming Linux for &
and an existing location called nc_spm):

    # create the mapsets
    grass -e -c ~/grassdata/nc_spm/par1
    grass -e -c ~/grassdata/nc_spm/par2
    grass -e -c ~/grassdata/nc_spm/par3
    grass -e -c ~/grassdata/nc_spm/par4
    # run v.random (just an example which creates vector with attributes)
    grass ~/grassdata/nc_spm/par1 --exec v.random output=points_1 column=value npoints=100 &
    grass ~/grassdata/nc_spm/par2 --exec v.random output=points_2 column=value npoints=100 &
    grass ~/grassdata/nc_spm/par3 --exec v.random output=points_3 column=value npoints=100 &
    grass ~/grassdata/nc_spm/par4 --exec v.random output=points_4 column=value npoints=100 &
    # just to finish the example, let's merge the vectors in a new mapset
    grass -e -c ~/grassdata/nc_spm/par
    grass ~/grassdata/nc_spm/par --exec v.patch input=points_1@par1,points_2@par2,points_3@par3,points_4@par4 output=points
    grass ~/grassdata/nc_spm/par --exec v.info map=points -t
    # and delete the rest
    rm ~/grassdata/nc_spm/par1
    rm ~/grassdata/nc_spm/par2
    rm ~/grassdata/nc_spm/par3
    rm ~/grassdata/nc_spm/par4

I'm assuming that we are talking about running algorithms in parallel in
QGIS, not parallelism inside the algorithms. Other considerations apply to
that (parallelization is controlled by the modules themselves, see e.g.
the nprocs option for r.sun or v.surf.rst in GRASS 7.4).

Note that I'm talking about (pure) GRASS, so it depends on how QGIS is
handling it (I recall it is using --exec, but I don't know what it is
doing with locations and mapsets).

Please also note that I didn't measure whether the v.random example would
actually be more advantageous than a single process.

Best,
Vaclav

> Because this is grass related (and not QGIS specific) I'd suggest
> asking on the grass mailing list, and relaying any responses back
> here.
>
> Nyall
>
> >
> > Thanks,
> > Rudi
> >
> > My code for the QgsTask approach is as below:
> >
> > def getZonal(task, habitatModelFile, cover):
> >     tempFile = QgsProcessingUtils.generateTempFilename("output.tif")
> >     processing.run("grass7:r.stats.zonal", {
> >         'base': habitatModelFile,
> >         'cover': cover,
> >         'method': 5,
> >         '-c': False,
> >         '-r': False,
> >         'output': tempFile,
> >         'GRASS_REGION_PARAMETER': None,
> >         'GRASS_REGION_CELLSIZE_PARAMETER': 0,
> >         'GRASS_RASTER_FORMAT_OPT': '',
> >         'GRASS_RASTER_FORMAT_META': ''}, context=context, feedback=algFeedback)
> >
> >     if task.isCanceled():
> >         deleteFile(tempFile)
> >         return
> >
> >     return tempFile
> >
> > ls90Task = QgsTask.fromFunction('LS90', getZonal, habitatModelFile=hm1, cover=ls90Layer)
> > QgsApplication.taskManager().addTask(ls90Task)
> > feedback.pushInfo("Calculating LS14 mean...")
> > ls14Task = QgsTask.fromFunction('LS14 ', getZonal, hab
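
Vaclav's separate-mapset recipe could be driven from Python roughly as
follows. This is a sketch only: mapset_commands, run_in_mapsets, the
runner parameter and the default executable name "grass" are assumptions
made for illustration (on Windows the launcher may be named differently).
The `grass -e -c <mapset>` and `grass <mapset> --exec ...` command lines
are taken from the shell example in the message above.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import PurePosixPath


def mapset_commands(location, jobs, grass="grass"):
    """Build, per job, a 'create mapset' and an '--exec' command line.

    jobs maps a mapset name to the module call (list of strings) to run
    inside that mapset.
    """
    plan = []
    for mapset, module_call in jobs.items():
        path = str(PurePosixPath(location) / mapset)
        create = [grass, "-e", "-c", path]
        execute = [grass, path, "--exec", *module_call]
        plan.append((create, execute))
    return plan


def run_in_mapsets(plan, runner=subprocess.run):
    """Create the mapsets one by one, then run the modules concurrently."""
    for create, _ in plan:
        runner(create, check=True)
    # One worker per mapset, so no two jobs share a mapset.
    with ThreadPoolExecutor(max_workers=max(1, len(plan))) as pool:
        return list(pool.map(lambda step: runner(step[1], check=True), plan))
```

A call might look like
mapset_commands("/home/user/grassdata/nc_spm", {"par1": ["v.random", "output=points_1", "column=value", "npoints=100"]}),
where the path is a placeholder. No shell is involved, so a leading ~ has
to be expanded by the caller, e.g. with os.path.expanduser.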
Re: [QGIS-Developer] Running grass algorithms in threads
On Tue, 14 Aug 2018 at 21:43, Rudi von Staden wrote:
>
> Hi all,
>
> The bottleneck in my script at the moment is the calculation of zonal
> stats using 'grass7:r.stats.zonal'. I thought I might speed things up
> by using QgsTask.fromFunction() or QgsProcessingAlgRunnerTask() to run
> these calculations in parallel. In my tests of both approaches the
> tasks seem to complete (task.status() == QgsTask.Complete), but the
> output file is only generated for 1 of 4 parallel tasks (the task that
> finishes first).
>
> I'm assuming this is because grass algorithms are not thread safe? Or
> am I missing something in my implementation that could make this work?

I strongly suspect that grass algorithms cannot be run in parallel.
This is why they cannot run in the background in QGIS like the
native/GDAL algorithms can. But I'd love for confirmation about this
and whether there's any way to make GRASS multi-thread safe.

Because this is grass related (and not QGIS specific) I'd suggest
asking on the grass mailing list, and relaying any responses back
here.

Nyall

>
> Thanks,
> Rudi
>
> My code for the QgsTask approach is as below:
>
> def getZonal(task, habitatModelFile, cover):
>     tempFile = QgsProcessingUtils.generateTempFilename("output.tif")
>     processing.run("grass7:r.stats.zonal", {
>         'base': habitatModelFile,
>         'cover': cover,
>         'method': 5,
>         '-c': False,
>         '-r': False,
>         'output': tempFile,
>         'GRASS_REGION_PARAMETER': None,
>         'GRASS_REGION_CELLSIZE_PARAMETER': 0,
>         'GRASS_RASTER_FORMAT_OPT': '',
>         'GRASS_RASTER_FORMAT_META': ''}, context=context, feedback=algFeedback)
>
>     if task.isCanceled():
>         deleteFile(tempFile)
>         return
>
>     return tempFile
>
> ls90Task = QgsTask.fromFunction('LS90', getZonal, habitatModelFile=hm1, cover=ls90Layer)
> QgsApplication.taskManager().addTask(ls90Task)
> feedback.pushInfo("Calculating LS14 mean...")
> ls14Task = QgsTask.fromFunction('LS14 ', getZonal, habitatModelFile=hm2, cover=ls14Layer)
> QgsApplication.taskManager().addTask(ls14Task)
> hs90Task = QgsTask.fromFunction('HS90 ', getZonal, habitatModelFile=hm3, cover=hs90Layer)
> QgsApplication.taskManager().addTask(hs90Task)
> hs14Task = QgsTask.fromFunction('HS14 ', getZonal, habitatModelFile=hm4, cover=hs14Layer)
> QgsApplication.taskManager().addTask(hs14Task)
>
> while (len([t for t in [ls90Task.status(), ls14Task.status(),
>                         hs90Task.status(), hs14Task.status()]
>             if t in [QgsTask.Running, QgsTask.Queued]]) > 0) and not feedback.isCanceled():
>     sleep(1)
>
> if feedback.isCanceled():
>     # some cleanup code (send task.cancel() and wait for tasks to terminate)
>     break
>
> ls90Result = ls90Task.returned_values
> ls14Result = ls14Task.returned_values
> hs90Result = hs90Task.returned_values  # only this file exists
> hs14Result = hs14Task.returned_values
[QGIS-Developer] Running grass algorithms in threads
Hi all,

The bottleneck in my script at the moment is the calculation of zonal
stats using 'grass7:r.stats.zonal'. I thought I might speed things up by
using QgsTask.fromFunction() or QgsProcessingAlgRunnerTask() to run these
calculations in parallel. In my tests of both approaches the tasks seem to
complete (task.status() == QgsTask.Complete), but the output file is only
generated for 1 of 4 parallel tasks (the task that finishes first).

I'm assuming this is because grass algorithms are not thread safe? Or am I
missing something in my implementation that could make this work?

Thanks,
Rudi

My code for the QgsTask approach is as below:

def getZonal(task, habitatModelFile, cover):
    tempFile = QgsProcessingUtils.generateTempFilename("output.tif")
    processing.run("grass7:r.stats.zonal", {
        'base': habitatModelFile,
        'cover': cover,
        'method': 5,
        '-c': False,
        '-r': False,
        'output': tempFile,
        'GRASS_REGION_PARAMETER': None,
        'GRASS_REGION_CELLSIZE_PARAMETER': 0,
        'GRASS_RASTER_FORMAT_OPT': '',
        'GRASS_RASTER_FORMAT_META': ''}, context=context, feedback=algFeedback)

    if task.isCanceled():
        deleteFile(tempFile)
        return

    return tempFile

ls90Task = QgsTask.fromFunction('LS90', getZonal, habitatModelFile=hm1, cover=ls90Layer)
QgsApplication.taskManager().addTask(ls90Task)
feedback.pushInfo("Calculating LS14 mean...")
ls14Task = QgsTask.fromFunction('LS14 ', getZonal, habitatModelFile=hm2, cover=ls14Layer)
QgsApplication.taskManager().addTask(ls14Task)
hs90Task = QgsTask.fromFunction('HS90 ', getZonal, habitatModelFile=hm3, cover=hs90Layer)
QgsApplication.taskManager().addTask(hs90Task)
hs14Task = QgsTask.fromFunction('HS14 ', getZonal, habitatModelFile=hm4, cover=hs14Layer)
QgsApplication.taskManager().addTask(hs14Task)

while (len([t for t in [ls90Task.status(), ls14Task.status(),
                        hs90Task.status(), hs14Task.status()]
            if t in [QgsTask.Running, QgsTask.Queued]]) > 0) and not feedback.isCanceled():
    sleep(1)

if feedback.isCanceled():
    # some cleanup code (send task.cancel() and wait for tasks to terminate)
    break

ls90Result = ls90Task.returned_values
ls14Result = ls14Task.returned_values
hs90Result = hs90Task.returned_values  # only this file exists
hs14Result = hs14Task.returned_values
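
Since task.status() == QgsTask.Complete did not guarantee an output file
in the tests described above, one small diagnostic is to check the
returned paths on disk once the tasks have finished. A minimal sketch:
missing_outputs is a hypothetical helper, not part of the QGIS API, and
it assumes each task returns a file path or None.

```python
import os


def missing_outputs(results):
    """Return the names of tasks whose output file is absent or empty.

    results maps a task name to the path that task returned (or None).
    """
    missing = []
    for name, path in results.items():
        # A task counts as failed if it returned nothing, or if the file
        # was never written or has zero length.
        if not path or not os.path.isfile(path) or os.path.getsize(path) == 0:
            missing.append(name)
    return missing
```

With the variables from the script, this could be called as
missing_outputs({'LS90': ls90Result, 'LS14': ls14Result, 'HS90': hs90Result, 'HS14': hs14Result})
and the result reported through feedback.pushInfo().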