On Tuesday, September 13, 2016 09:02:09, Rutger wrote:
> Hi Even,
>
> This is really amazing. I'm becoming more and more a fan of Numba for
> number crunching, so this certainly makes my day. As soon as I can find a
> Win x64 dev version on a Conda channel I'll give it a try.
>
> A use case that comes to mind, and which I run into regularly, is when I
> want to do some simple aggregation before using something like gdalwarp.
> For example, when you have a file containing 24 hourly temperature values
> and you are only interested in the daily mean. Currently I either
> aggregate before warping and write the intermediates to disk, or aggregate
> after warping, which is computationally inefficient. Neither is optimal.
>
> A few questions: can you access a file's metadata from within a pixel
> function? This would perhaps allow, for example, interpolating atmospheric
> data to the overpass time of a satellite image.
This is indeed a question I've asked myself: does the prototype of the pixel function contain enough information? I had initially not included the geotransform and the whole raster dimensions, for example, but found they were necessary for correct behaviour when hillshading at different zoom scales. I also added xoff and yoff afterwards to be able to do Mandelbrot generation (I guess there might be some more valid use cases ;-)). Then came the user-provided dictionary, to parametrize the algorithm instead of hardcoding constants in it, so that you can off-load it into a general lib. And I stopped at that point.

For your use case, there are currently 2 possibilities, using the <PixelFunctionArguments> capability:
- either you fetch the metadata items you need at VRT generation time and put them as arguments,
- or you only pass the names of the sources as arguments, and use the GDAL Python bindings themselves inside the pixel function to get access to everything needed. But I'm just thinking it might be a bit inefficient to do that for each RasterIO() request.
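To illustrate the first option, here is a minimal sketch for the daily-mean use case (the function name and the "scale" argument are hypothetical; the parameter list follows the prototype described above, with <PixelFunctionArguments> entries arriving as extra keyword arguments):

```python
import numpy as np

# Hypothetical pixel function for a VRT with 24 hourly sources.
# Each source buffer arrives as a numpy array in in_ar; values from
# <PixelFunctionArguments> (here "scale") arrive as string keyword arguments.
def daily_mean(in_ar, out_ar, xoff, yoff, xsize, ysize,
               raster_xsize, raster_ysize, buf_radius, gt,
               scale="1.0", **kwargs):
    # Aggregate the hourly sources into a daily mean, then apply the
    # user-provided scale factor.
    out_ar[:] = np.mean(np.stack(in_ar), axis=0) * float(scale)
```

In the VRT itself this would be referenced through <PixelFunctionLanguage>Python</PixelFunctionLanguage> and <PixelFunctionType>, and the resulting single-band daily-mean VRT can then be fed directly to gdalwarp, without intermediates on disk.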
<loud_thinking>
Perhaps a compromise would be to allow, in addition to simple functions, specifying a class name, where the constructor would receive the values that don't change from one call to another, and a calc() method would receive the ones that change at each RasterIO() request:

    class MyProcessingClass:
        def __init__(self, raster_xsize, raster_ysize, buf_radius, gt,
                     source_filenames, **kwargs):
            # save the above parameters that you may need in calc()
            # do one-time things, like gdal.Open()'ing the sources to get metadata
        def calc(self, in_ar, out_ar, xoff, yoff, xsize, ysize):
            # do your computation

We could possibly pass the sources as datasets themselves, since they are actually already opened, but that would make the availability of the osgeo.gdal bindings a requirement (well, we could fall back to passing the filenames if the import fails).
</loud_thinking>

> Do the pixel functions also work with @numba.vectorize(), in particular
> when targeting 'parallel' or 'cuda'?

I've only scratched the surface of Numba (I didn't know it 2 days ago). I guess this might work (it might only be interesting if VRTDerivedRasterBand::IRasterIO() is called with a big enough region, which depends on the pixel access pattern of the upper layers). The VRT driver just calls a Python function that takes numpy arrays and a few extra args.

> And would that give parallel
> processing for both IO and calculations?

Only calculations. I/O is done before going to Python and after returning from it. Actually... if you specify zero sources in the VRT, then it is up to you to do the read operations the way you like in Python, so you could possibly parallelize them there.

> You should give the folks at Continuum a heads up, I'm sure they appreciate
> seeing Numba used like this.

I have no connection with them, but yes, their project rocks.
Even

-- 
Spatialys - Geospatial professional services
http://www.spatialys.com
_______________________________________________
gdal-dev mailing list
gdal-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/gdal-dev