Hi Eelco,

Eelco Hoogendoorn <[email protected]> writes:
> I have some code that I would like to contribute to PyCUDA. What would
> the preferred way of doing so be? Create a branch in git?
>
> I have been working on some generalizations of the 'elementwise'
> functionality recently. Specifically, I have made a GriddedKernel class
> that operates on nd-arrays and handles computation of grid indices. As
> a simple example, this allows one to write an outer product kernel along
> the lines of:
>
> GriddedKernel(
>     arguments = 'float32[:,:] out, float32[:] x, float32[:] y',
>     body = 'out[i] = x[xi] * y[yi];')
>
> Furthermore, I have created a StencilKernel class. It takes an arbitrary
> stencil as input and creates an unrolled kernel from that. I am using
> it for a watershed segmentation algorithm at the moment, but a simpler
> example would be something like a Laplacian:
>
> StencilKernel(
>     stencil = laplacian3x3,
>     arguments = 'float32[512,512] out, float32[512,512] in',
>     loop_start = 'float32 o = 0',
>     loop_body = 'o += in[j] * weight[j];',
>     loop_end = 'out[i] = o')
>
> The StencilKernel contains convenience functions for allocating and
> transferring padded memory, so the stencil can always read safely, and
> one can easily implement boundary conditions of one's choice.
>
> As you might have noticed, I have deviated from the elementwise way of
> specifying arguments. NumPy dtypes get translated into C types (I hate
> the implicitness of C types), plus, one can specify size and
> dimensionality constraints on arguments, which by default are translated
> into runtime type/shape checks.
>
> However, there is still lots of work to be done. I am quite new to all
> this CUDA stuff, and I bet my code could be greatly improved and
> expanded to use cases that have not yet occurred to me. For instance, my
> kernel code is still quite naive and does not use shared memory. Also,
> I have only covered 2d and 3d use cases so far, but by looping over the
> larger strides, arbitrary nd-kernels could be created. My template code
> is a mess and I should probably use codepy instead. And so on...
>
> If there is anybody willing to help me advance this, I'd love to create a
> git branch for it and try my best to document and clean up my code, and
> integrate it with PyCUDA style conventions (perhaps create an
> elementwise branch that adheres to the same interface, and so on?). But
> if it's just me being excited about this, I probably won't bother. Even if
> you don't want to help directly, your thoughts and comments are most welcome.
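
To make the argument syntax quoted above concrete, here is a minimal sketch of how a spec string such as 'float32[:,:] out, float32[:] x' might be parsed into C types plus shape constraints and then checked at runtime. Everything in it is an assumption made for illustration: parse_arguments, check_argument, c_signature and the C_TYPES table are hypothetical names, not Eelco's code and not part of PyCUDA, and it assumes that array arguments reach the kernel as plain pointers.

```python
import re

# Hypothetical dtype-name -> C type table (illustration only).
C_TYPES = {"float32": "float", "float64": "double", "int32": "int"}

# Matches e.g. "float32[:,:] out" or a scalar such as "int32 n".
_ARG = re.compile(r"^\s*(\w+)\s*(?:\[([^\]]*)\])?\s+(\w+)\s*$")


def parse_arguments(spec):
    """Parse a spec string into a list of argument descriptions."""
    args = []
    # Split on commas that are not inside a [...] shape constraint.
    for part in re.split(r",(?![^\[]*\])", spec):
        match = _ARG.match(part)
        if match is None:
            raise ValueError("cannot parse argument: %r" % part)
        dtype, dims, name = match.groups()
        if dtype not in C_TYPES:
            raise ValueError("unknown dtype: %r" % dtype)
        if dims is None:
            shape = ()  # scalar argument
        else:
            # ':' means "any extent"; an integer fixes the extent.
            shape = tuple(
                None if d.strip() == ":" else int(d) for d in dims.split(",")
            )
        args.append(
            {"name": name, "dtype": dtype, "ctype": C_TYPES[dtype], "shape": shape}
        )
    return args


def check_argument(arg, dtype, shape):
    """Runtime type/shape check of one actual argument against its spec."""
    if dtype != arg["dtype"]:
        raise TypeError("%s: expected %s, got %s" % (arg["name"], arg["dtype"], dtype))
    if len(shape) != len(arg["shape"]):
        raise ValueError("%s: wrong number of dimensions" % arg["name"])
    for axis, (want, got) in enumerate(zip(arg["shape"], shape)):
        if want is not None and want != got:
            raise ValueError("%s: axis %d has extent %d, expected %d"
                             % (arg["name"], axis, got, want))


def c_signature(args):
    """Build a C parameter list; arrays become pointers, scalars stay values."""
    return ", ".join(
        "%s *%s" % (a["ctype"], a["name"]) if a["shape"]
        else "%s %s" % (a["ctype"], a["name"])
        for a in args
    )


args = parse_arguments("float32[:,:] out, float32[:] x, float32[:] y")
print(c_signature(args))
```

A fixed-size spec such as 'float32[512,512] out' parses to the shape (512, 512), so passing a 256x512 array to check_argument would raise a ValueError before any kernel is launched.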

I agree that in the medium term, we need something more nd-aware than
the current ElementwiseKernel, and your stuff looks like a good first
step. Frédéric Bastien and Arnaud Bergeron have been hard at work
creating an array object that does just that, from (at least) Python and
C, and for CUDA and OpenCL. So, in some sense this is great--we're
spoiled for choice to pick the best approach, and I'm looking forward to
having this discussion.

In the meantime, it would certainly be good if we could take a look at
your code. Given how easy it is to ship add-on packages in Python, I
would suggest that you set up your code to be separate from PyCUDA
(i.e. not as a branch, but as a separate package). For example, I'd be
hesitant to merge something like your StencilKernel--while it's
certainly commonly used, it's still a bit too special purpose to be in a
supposedly general-purpose package.

Thanks for getting the discussion started--looking forward to seeing
what you've got. :)

Andreas
