http://www.keittlab.org/
On Wed, Aug 17, 2016 at 3:26 PM, James Ramm <jamessr...@gmail.com> wrote: > Hi, > I wasn't talking about MapReduce per se (although this would also be very > interesting), but rather the 'classic' functional programming functions of > 'map' and 'reduce'. > > A 'map' function essentially takes some user defined function and applies > it to every element of a list/vector or some other iterable. For a raster > dataset, I was proposing that it take a user defined function or class > instance and applies it to every 'block' of a raster dataset - with plenty > of choices about how each block is supplied ('natural' blocks, user defined > sizes, overlaps or 'pixel' buffers etc). > > A 'reduce' function aims to apply a user defined function cumulatively to > the iterable, so as to reduce it to a single value. For raster dataset(s) > then, a 'reduce' function would take in multiple datasets and pass the > block of each dataset to the user-defined function together, expecting a > single block to be returned. I.E it reduces multiple datasets to a single > datasets. Image mosaicking would be an example of this. > > I think a 'map' and 'reduce' function is all that is needed to allow the > user to do pretty much any kind of processing they want, without actually > having to worry about how to apply to a raster dataset of any size and all > that 'boring' boilerplate. Just define the algorithm and hand it over. > > A MapReduce system would be, perhaps, a higher level 'organiser' of how > these tasks are run - perhaps distributing them across multiple machines. > Wikipedia does a much better job of explaining MapReduce vs map & reduce > than me: > https://en.wikipedia.org/wiki/MapReduce > > I started to gather my ideas and snippets of code in a repository here: > https://github.com/JamesRamm/GeoAlg > > It is more of a place to help me form ideas than anything usable atm. > > I would be open to merging relevant parts etc. > > What I think is important is: > > - Different kinds of block 'iterators' so you can support e.g. overlapping > neighbourhoods, mosaicking etc. Iterating in blocks ofc is essential for > handling rasters of any size > - Ability to save state in the callbacks (I decided to support this by > making the callback a class instance conforming to some interface rather > than a function...i guess there are other ways too) > - It would be good to support complex raster 'masks' as well (e.g. > different regions of the mask indicating different processing parameters or > something). I am going to clarify my ideas on this on a wiki page in the > above repo. > > When I have tried parallel processing in the past, I have generally done > it on the block array, which neatly avoids having to have multiple > GDALDataset instances and possible getting in a tangle with that. However, > this limits the usefulness of parallel processing to only the most > complicated of algorithms as most of the time I find processing to be > heavily I/O bound. > I was hoping to at some point come up with a parallel, block based > alternative for gdal Polygonize as it runs quite slowly for large, tiled > geotiffs, where access by scanline is sub optimal. A polygonize function > also seems sufficiently complex to benefit from parallelisation. OpenCV > probably has much of the necessary work already done - of course, work > relying on OpenCV might be better outside gdal, rather than introducing > such a large & specialised dependency. > I would tend to agree. GDAL is really good at translation, but not such a great platform for implementing generic iterator-based algorithms. I would suggest using GDAL to translate and then write algorithms separately, where you could e.g. take advantage of newer C++ features. The Generic Image Library is one possibility. THK > > Ill be in Bonn from late Tues afternoon for the conference... > > I have t > > On 17 August 2016 at 13:28, Ari Jolma <ari.jo...@gmail.com> wrote: > >> Hi, >> >> This relates to RFC 62, raster algebra. >> >> I realized that parallel processing is really an essential element of >> this. I don't have a lot of experience with parallel processing and threads >> so please let me know if I'm writing silly or ignorant things. >> >> James, in your emails you write that map and reduce functions are >> essential. That seems to point to parallel processing - can you elaborate a >> bit more, what's you approach there and are you using some specific >> libraries etc? >> >> Rutger mentioned Dask and Numba, which seem to be a high level solution. >> >> Anyway, I thought I'd make a try with OpenMP and the C++ code I have >> written so far. On a very very simple level it seems that it might be >> enough to add "#pragma omp parallel for" before the for loop, which >> iterates over the (cached) blocks. And then compile the code with -fopenmp. >> Of course this does not work (or it seems to work but not make the code use >> more than one cpu at the same time) since a single GDAL Dataset object >> should not be used by several threads (GDAL FAQ). >> >> There seems to be a solution in a book "Remote Sensing Raster >> Programming", which I found with google and google books shows the relevant >> page. The book suggests adding #pragma omp barrier before GDALRasterIO. To >> me it seems that that would cause all the raster data to accumulate into >> the RAM. I did not try it though. >> >> It seems that I should somehow make the code spawn a new Dataset object >> for each thread. The function for that is GDALOpenShared. Now a simple >> question: What if the raster is created in the code? My test application >> for this is a simple one, which takes an existing raster and returns a 0/1 >> raster, where the cell has 1 if the original raster has value 48 and 0 >> elsewhere. Is the solution to create the dataset, and then open new >> connections to it using OpenShared? >> >> By the way, I'll be at the FOSS4G code sprint Tuesday afternoon and >> Saturday morning if anyone wants to discuss this. >> >> Best, >> >> Ari >> >> >> _______________________________________________ >> gdal-dev mailing list >> gdal-dev@lists.osgeo.org >> http://lists.osgeo.org/mailman/listinfo/gdal-dev > > > > _______________________________________________ > gdal-dev mailing list > gdal-dev@lists.osgeo.org > http://lists.osgeo.org/mailman/listinfo/gdal-dev >
_______________________________________________ gdal-dev mailing list gdal-dev@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/gdal-dev