On Thu, May 4, 2017 at 2:18 PM, Benjamin Ducke <bendu...@fastmail.fm> wrote:
> On 04/05/17 19:22, Markus Neteler wrote:
>> Hi,
>>
>> in order to parallelize some heavy computation I was wondering how to
>> do spatial clustering of vector objects, i.e. building footprints
>> (vector polygons).
>>
>> I have to perform zonal statistics on thousands of buildings and would
>> like to split them up into "tiles" and then run the computation in
>> parallel for each tile.
>>
>> The examples in v.cluster look somehow promising
>> https://grass.osgeo.org/grass72/manuals/v.cluster.html
>>
>> but in the best case each "tile" would contain a similar amount of
>> buildings in order to balance the computation across the CPUs.
>
> Hi,
>
> I think that you would need to partition
> space into overlapping tiles, with the
> amount of overlap depending on the maximum
> distance parameter of the clustering algorithm.
> Otherwise you would get a serious edge effect
> in each tile.
>
> Prior to spatial clustering, you could use a cluster
> algorithm that aims to produce clusters with
> (nearly) equal number of points for "tiling":
>
> https://stats.stackexchange.com/questions/8744/clustering-procedure-where-each-cluster-has-an-equal-number-of-points
>
> You would then select the points for each
> cluster, buffer their convex hull by the max
> distance of your spatial cluster algorithm
> and set the working region for each "tile" to
> be the bounding box of the buffered convex
> hull (don't forget to catch all points from
> all other clusters that fall within the "tile"
> and add them to the working region's set).
>
> If that works, please make it a GRASS add-on...
>
> Regarding building footprints, I guess another
> tricky part is how to represent them as
> points: Centroids? Outer edge vertices? Both?
>
> Oh, by the way: A fellow computer scientist
> who works a lot with concurrent processing
> once told me that the frequently used
>
> number of processes = number of CPUs/cores
>
> is actually not ideal! Apparently, modern
> CPU schedulers are optimized to handle many
> more processes than there are CPUs/cores,
> and if the two counts match, then you can
> get fringe situations where processes keep
> getting transferred between cores, which
> incurs a huge performance penalty. His
> recommendation was to use a factor of
> about 2.5 (times more processes than cores).
>
> I never got around to testing his theory,
> but if you have the time, I'd love to know!
>
> Best,
>
> Ben
>
>>
>> Any idea?
>>
>> thanks,
>> Markus
>> _______________________________________________
>> grass-dev mailing list
>> grass-dev@lists.osgeo.org
>> https://lists.osgeo.org/mailman/listinfo/grass-dev
>>
>
> --
> Dr. Benjamin Ducke
> {*} Geospatial Consultant
> {*} GIS Developer
>
> Spatial technology for the masses, not the classes:
> experience free and open source GIS at http://gvsigce.org

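To make the tiling idea in the quoted message concrete, here is a minimal sketch in plain Python, not a GRASS add-on. It assumes the buildings have already been reduced to (x, y) points (for example centroids) and that there are at least as many points as tiles. Instead of one of the equal-size clustering procedures from the linked Stack Exchange question, it swaps in a simpler recursive median split, which also yields tiles whose point counts differ by at most one. The names `balanced_tiles`, `buffered_bbox` and `points_in_bbox` are made up for this example. The bounding box of a convex hull buffered by `max_dist` is the same as the bounding box of the points grown by `max_dist` on every side, so no hull is computed.

```python
from typing import List, Tuple

Point = Tuple[float, float]
BBox = Tuple[float, float, float, float]  # (west, south, east, north)


def balanced_tiles(points: List[Point], n_tiles: int) -> List[List[Point]]:
    """Split points into n_tiles groups of (nearly) equal size.

    Assumes len(points) >= n_tiles; group sizes then differ by at most one.
    """
    if n_tiles <= 1 or len(points) < 2:
        return [points]
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    # Cut across the longer side of the extent to keep tiles compact.
    axis = 0 if (max(xs) - min(xs)) >= (max(ys) - min(ys)) else 1
    ordered = sorted(points, key=lambda p: p[axis])
    left_tiles = n_tiles // 2
    # Each side gets a share of the points proportional to its tile count.
    cut = len(ordered) * left_tiles // n_tiles
    return (balanced_tiles(ordered[:cut], left_tiles)
            + balanced_tiles(ordered[cut:], n_tiles - left_tiles))


def buffered_bbox(tile: List[Point], max_dist: float) -> BBox:
    """Bounding box of a tile, grown by max_dist on every side."""
    xs = [p[0] for p in tile]
    ys = [p[1] for p in tile]
    return (min(xs) - max_dist, min(ys) - max_dist,
            max(xs) + max_dist, max(ys) + max_dist)


def points_in_bbox(points: List[Point], bbox: BBox) -> List[Point]:
    """All points inside bbox, including those owned by other tiles."""
    west, south, east, north = bbox
    return [p for p in points
            if west <= p[0] <= east and south <= p[1] <= north]
```

Each tile's own points decide which worker reports results for them, while `points_in_bbox(all_points, buffered_bbox(tile, max_dist))` gives the larger working set that worker would need to see to avoid the edge effect. The four values of the box would then be passed to whatever sets the working region for that tile.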
Not sure if it's applicable here, but you could also try the quadtree segmentation in v.surf.rst; it has an output parameter, treeseg. You need to postprocess the result (v.category, v.type, v.centroids) to get areas.

Anna
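For readers who have not met quadtree segmentation, the sketch below shows the general idea in plain Python. It is not the code used by v.surf.rst, and the function name `quadtree_leaves` and its parameters are made up for this example. A cell is split into four quadrants until it holds no more than `max_points` points, so dense areas end up with many small cells and sparse areas with a few large ones.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
BBox = Tuple[float, float, float, float]  # (west, south, east, north)


def quadtree_leaves(points: List[Point], bbox: BBox, max_points: int,
                    max_depth: int = 16) -> List[Tuple[BBox, List[Point]]]:
    """Return (bbox, points) leaves with at most max_points points each,
    unless max_depth is reached first. Empty quadrants are not returned."""
    if len(points) <= max_points or max_depth == 0:
        return [(bbox, points)]
    west, south, east, north = bbox
    mid_x = (west + east) / 2.0
    mid_y = (south + north) / 2.0
    quadrants: Dict[Tuple[bool, bool], List[Point]] = {}
    for x, y in points:
        # Points on a midline go to the east/north quadrant, so every
        # point lands in exactly one child cell.
        key = (x >= mid_x, y >= mid_y)
        quadrants.setdefault(key, []).append((x, y))
    leaves: List[Tuple[BBox, List[Point]]] = []
    for (is_east, is_north), members in quadrants.items():
        child = (mid_x if is_east else west,
                 mid_y if is_north else south,
                 east if is_east else mid_x,
                 north if is_north else mid_y)
        leaves.extend(
            quadtree_leaves(members, child, max_points, max_depth - 1))
    return leaves
```

Note that this caps the number of points per cell rather than equalizing it, so the resulting tiles are only roughly balanced. Handing out many small cells to a pool of workers may even that out, but that is an assumption that would need testing.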