On 04/05/17 19:22, Markus Neteler wrote:
> Hi,
>
> in order to parallelize some heavy computation I was wondering how to
> do spatial clustering of vector objects, i.e. building footprints
> (vector polygons).
>
> I have to perform zonal statistics on thousands of buildings and would
> like to split them up into "tiles" and then run the computation in
> parallel for each tile.
>
> The examples in v.cluster look somehow promising
> https://grass.osgeo.org/grass72/manuals/v.cluster.html
>
> but in the best case each "tile" would contain a similar amount of
> buildings in order to balance the computation across the CPUs.

Hi,

I think that you would need to partition space into overlapping tiles, with the amount of overlap depending on the maximum distance parameter of the clustering algorithm. Otherwise you would get a serious edge effect in each tile.

Prior to spatial clustering, you could use a clustering algorithm that aims to produce clusters with a (nearly) equal number of points for "tiling":

https://stats.stackexchange.com/questions/8744/clustering-procedure-where-each-cluster-has-an-equal-number-of-points

You would then select the points for each cluster, buffer their convex hull by the max distance of your spatial clustering algorithm and set the working region for each "tile" to be the bounding box of the buffered convex hull (don't forget to catch all points from all other clusters that fall within the "tile" and add them to the working region's set).

If that works, please make it a GRASS add-on...

Regarding building footprints, I guess another tricky part is how to represent them as points: Centroids? Outer edge vertices? Both?

Oh, by the way: A fellow computer scientist who works a lot with concurrent processing once told me that the frequently used setting of number of processes = number of CPUs/cores is actually not ideal! Apparently, modern CPU schedulers are optimized to handle many more processes than there are CPUs/cores, and if the two counts match, you can get fringe situations where processes keep getting transferred between cores, which incurs a huge performance penalty. His recommendation was to use a factor of about 2.5 (that many times more processes than cores). I never got around to testing his theory, but if you have the time, I'd love to know!

Best,

Ben

>
> Any idea?
>
> thanks,
> Markus
> _______________________________________________
> grass-dev mailing list
> grass-dev@lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/grass-dev
>

--
Dr. Benjamin Ducke
{*} Geospatial Consultant
{*} GIS Developer

Spatial technology for the masses, not the classes:
experience free and open source GIS at http://gvsigce.org
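
For illustration, the tiling steps suggested in the reply above could be sketched roughly as follows. This is only a minimal sketch under several assumptions: buildings are already reduced to one (x, y) point each (e.g. centroids), and the equal-size grouping is done with a simple recursive median split along the longer axis, which is a stand-in and not the method from the linked Stack Exchange thread. The names balanced_tiles, tile_region and points_in_region are hypothetical and not part of GRASS. Since the bounding box of a convex hull buffered by a distance d equals the bounding box of the points grown by d on every side, the hull itself is never computed.

```python
from typing import List, Tuple

Point = Tuple[float, float]
BBox = Tuple[float, float, float, float]  # (west, south, east, north)


def balanced_tiles(points: List[Point], n_tiles: int) -> List[List[Point]]:
    """Split points into groups of nearly equal size (hypothetical helper).

    Cuts recursively along the longer axis, k-d tree style. With at least
    as many points as tiles, group sizes differ by at most one.
    """
    if n_tiles <= 1 or len(points) <= 1:
        return [list(points)]
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    # Cut across the axis with the larger extent to keep tiles compact.
    axis = 0 if (max(xs) - min(xs)) >= (max(ys) - min(ys)) else 1
    ordered = sorted(points, key=lambda p: p[axis])
    left_tiles = n_tiles // 2
    # Give the left side a share of points proportional to its tile count.
    cut = len(ordered) * left_tiles // n_tiles
    return (balanced_tiles(ordered[:cut], left_tiles)
            + balanced_tiles(ordered[cut:], n_tiles - left_tiles))


def tile_region(tile: List[Point], max_dist: float) -> BBox:
    """Bounding box of the tile's points, grown by max_dist on every side.

    This equals the bounding box of the convex hull buffered by max_dist.
    The tile must contain at least one point.
    """
    xs = [p[0] for p in tile]
    ys = [p[1] for p in tile]
    return (min(xs) - max_dist, min(ys) - max_dist,
            max(xs) + max_dist, max(ys) + max_dist)


def points_in_region(points: List[Point], region: BBox) -> List[Point]:
    """All points, from any tile, that fall inside the region."""
    west, south, east, north = region
    return [p for p in points
            if west <= p[0] <= east and south <= p[1] <= north]


if __name__ == "__main__":
    # Toy data: a regular 10 x 10 grid of "building" points.
    pts = [(float(x), float(y)) for x in range(10) for y in range(10)]
    for tile in balanced_tiles(pts, 4):
        region = tile_region(tile, max_dist=1.5)
        work_set = points_in_region(pts, region)
        print(len(tile), region, len(work_set))
```

Each printed region could then serve as the working region for one parallel job, with work_set holding the tile's own points plus the neighbours caught by the overlap.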