On Wed, Apr 28, 2010, Soeren Gebbert wrote:
>>
>> Hmm, if I understand the code right, only the innermost for loop [of the
>> tchol solver] can be executed in parallel, because the first two for loops
>> need results of previous runs (it is not possible to run i + 1 before i,
>> same for j). But I don't know that parallel stuff; I would in any case
>> first optimize the code without parallelization, and only then add
>> multithreading.
>
> I fully agree. The band Cholesky solver is well suited for the job but
> not designed for parallelization. Thus I have implemented an iterative
> Conjugate Gradient (CG) solver for band matrices (just by replacing
> the matrix-vector multiplication) to replace the Cholesky band
> solver. Well, the Cholesky solver outperforms the CG solver by a
> factor of 10+. I have partly parallelized the CG solver for band
> matrices, but even the speedup on an 8-core system is too small to
> compete with the Cholesky band solver. Besides that, the created
> band matrices are badly conditioned, so the CG solver may fail for
> subregions.
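To make the loop-dependency argument concrete, here is a minimal pure-Python Cholesky sketch (illustrative only, not the tchol solver's actual code; a dense variant is shown instead of the band-storage version for brevity). Row i of L needs finished rows 0..i-1, and entry (i, j) needs entries (i, 0..j-1) of the same row, so the two outer loops are inherently sequential and only the innermost dot product can run in parallel:

```python
import math

def cholesky(a):
    """Return lower-triangular L with L * L^T == a (a must be SPD)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):            # sequential: needs rows 0..i-1 of L
        for j in range(i + 1):    # sequential: needs L[i][0..j-1]
            # only this reduction is safely parallelizable
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L
```

For a band matrix the structure is the same, only the k and j ranges are clipped to the bandwidth, which is why the dependency pattern (and the parallelization problem) carries over unchanged.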
IMHO, it doesn't really make sense to replace a function with another function that is 10+ times slower and produces worse results... If you need a parallelized version and a 10+ core system just to achieve the same speed, and the results are not as good as with the original tchol solver, I would suggest keeping the original tchol solver until the CG solver is faster and produces identical results (also for asymmetrical matrices).

> Hence, I would suggest using MPI parallelization or something else on a
> process basis to concurrently compute the subregions which are created
> by default. Maybe a Python script using overlapping tiling and
> subprocess can do the job?

The handling of the overlapping tiles is deeply embedded in the module code; at first glance it seems to me that you would have to completely translate the module to Python and then call the interpolation routines from Python? Seems messy to me. The subregion tiling is a bit tricky, and I spent quite some time fixing and optimizing it.

On Fri, Apr 30, Soeren Gebbert wrote:
>
> I have renamed some variables (Italian -> English) in the lidar lib and
> modules, to get a better understanding of what is happening in the
> code. I have committed the changes to svn; in case the changes are
> wrong, please tell me and I will revert them.

Thanks, looks good to me!

>
>>
>>> [...] I would suggest using MPI parallelization or something else on a
>>> process basis to concurrently compute the subregions which are created
>>> by default. Maybe a Python script using overlapping tiling and
>>> subprocess can do the job?
>>
>> I think that should be integrated into the current lidar tools because
>> they make use of the current subregion tiling, and C code is still
>> faster than Python code (right?). Treatment of overlapping zones of
>
> The computation time of overlapping regions should be the same in
> Python as in C.
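For illustration, the "Python script using overlapping tiling" idea could start from a tile generator like the sketch below. All names and parameters are hypothetical; this is not the lidar_lib tiling, which additionally distinguishes two overlap zones with different weighting:

```python
# Split a width x height region into core tiles and grow each tile by an
# overlap margin, clipped to the region bounds. Boxes are (x0, y0, x1, y1)
# with x1/y1 exclusive. Each padded tile could then be handed to a worker
# process (e.g. via the subprocess module) and the cores patched together.

def make_tiles(width, height, tile, overlap):
    """Yield (core, padded) box pairs covering the whole region."""
    for y0 in range(0, height, tile):
        for x0 in range(0, width, tile):
            core = (x0, y0, min(x0 + tile, width), min(y0 + tile, height))
            padded = (max(core[0] - overlap, 0),
                      max(core[1] - overlap, 0),
                      min(core[2] + overlap, width),
                      min(core[3] + overlap, height))
            yield core, padded
```

The core boxes partition the region exactly (no gaps, no double counting), while only the padded boxes overlap, which is what makes process-level parallelism straightforward: workers never write into each other's core zones.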
> IMHO it is easier to program a Python module which computes the tiles,
> splits up the vector points based on the tiles, and distributes the data
> across a grid or cloud for computation. After the distributed
> processes are finished, the script can gather the data and merge it
> together.

As above, IMHO that would be a complete rewrite of the modules in Python. Maybe that is because I am more familiar with C. Changing the C code so that tiles can be processed in parallel should not be too hard.

>
> If you implement this into the lidar modules, each module needs to be
> changed to support MPI. Using MPI is IMHO much more complex than just
> running the modules on different datasets on different nodes and
> gathering and patching the data. There are already modules which
> support this process (g.region, r.tileset, r.patch, ...).

r.tileset does not do the same as the subregion tiling done in the lidar tools because of the overlapping. The outermost overlapping edge is used for interpolation, but the values interpolated there are never used; it exists to avoid edge effects along the borders of the subregion. The interpolation results of the second overlapping zone are used, but weighted according to their distance to the core zone. The interpolation results of the core zone are used as is. The extents of the overlapping zones dramatically influence accuracy and speed, and the current design should be kept in order to avoid edge effects. AFAICT, the interpolated surfaces are seamless; no edge effects whatsoever are visible in the various diagnostic maps calculated.

> Well, is
> there a module which supports the extraction of vector points of the
> current region without topology support?

The only way to extract vector points for the current region without topology is to go through all points and check for each point whether it falls into the current region.

>
>> neighbouring subregions is done sometimes in the lidar_lib, sometimes
>> by the modules themselves.
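The linear scan described above is trivial to sketch. The weighting function is included only to illustrate the idea of the second overlap zone; the source does not state the actual weighting formula, so the linear ramp below is an assumption, as are all names:

```python
def points_in_region(points, west, south, east, north):
    """Without topology: test every (x, y) point against the region bounds."""
    return [(x, y) for (x, y) in points
            if west <= x < east and south <= y < north]

def overlap_weight(dist_to_core, overlap_width):
    """Hypothetical linear weight for the second overlap zone:
    1.0 at the core boundary, falling to 0.0 at the outer edge."""
    return max(0.0, 1.0 - dist_to_core / overlap_width)
```

With such a weight, two neighbouring subregions can blend their second overlap zones (weights summing to 1 across the seam), while the outermost edge is interpolated but discarded, exactly as described for the lidar tiling above.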
>> It took me quite some time to understand it
>> and fix it; therefore, I would advise against code porting/duplication.
>
> Maybe the overlapping algorithm can be simplified in the Python version?

I had accuracy of results in mind when I modified the overlapping algorithm, because the previous algorithm produced bad results; that is the reason why v.surf.bspline was restricted in such a way that only one subregion was needed, i.e. processing everything at once. If there is a way to simplify it without changing the results, great!

> In my vision, the LIDAR computation could be run in an elastic
> computation cloud (EC2) as a WPS backend, using a recursive tiling
> algorithm for data distribution.

For massive LiDAR datasets, an approach like this would be really cool!

Markus M

_______________________________________________
grass-dev mailing list
grass-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-dev