Keegan,

Keegan Owsley <keeg...@gmail.com> writes:
> I've just slapped together a patch to pycuda that makes most elementwise
> operations work with noncontiguous arrays. There are a bunch of hacks in
> there, and the code needs some reorg before it's ready to be considered for
> upstream (I made these changes while learning the pycuda codebase, so
> there's a bunch of crud that can be cleaned out), but I figure I might as
> well put it out there in its current state and see what you guys think.
> It's also not extremely well-tested (I have no idea if it interferes with
> skcuda, for example), but all of the main functions appear to work.
>
> You can check out the code at https://bitbucket.org/owsleyk_omega/pycuda.
>
> Briefly, this works by adding new parameters into elementwise kernels that
> describe the stride and shape of your arrays, then using a function that
> computes the location in memory from the stride, shape, and index.
> Elementwise kernel ops are modified so that they use the proper indexing.
> See an example of a kernel that's generated below:
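
For anyone following along, here is a minimal host-side sketch of the kind of index computation described above. It is not the code from the patch: the function name flat_index_to_offset is made up for illustration, and it assumes strides counted in elements rather than bytes and C-order traversal. It shows one division and one modulo per axis for every element.

```python
import numpy as np

def flat_index_to_offset(i, shape, strides):
    """Map a flat C-order element index to an element offset in memory.

    Hypothetical helper for illustration only; `strides` are assumed to be
    in units of elements, not bytes.
    """
    offset = 0
    # Peel off axes from the last (fastest varying) to the first.
    # Each axis costs one division and one modulo.
    for dim, stride in zip(reversed(shape), reversed(strides)):
        i, coord = divmod(i, dim)
        offset += coord * stride
    return offset

# A noncontiguous view: every other column of a 4x6 array.
base = np.arange(24, dtype=np.float32).reshape(4, 6)
view = base[:, ::2]

# NumPy reports strides in bytes; convert to element strides.
elem_strides = tuple(s // view.itemsize for s in view.strides)

# Gather the view's elements through the computed offsets into the
# underlying flat buffer, as an elementwise kernel would.
flat = base.ravel()
gathered = np.array(
    [flat[flat_index_to_offset(i, view.shape, elem_strides)]
     for i in range(view.size)],
    dtype=np.float32,
)
```

Here gathered should match view.ravel(). On a GPU the same arithmetic would run once per thread and per array argument, so its cost grows with the number of axes.
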

Thanks for putting this together and sharing it! I have one main question
about this, regarding performance: Modulo (especially variable-denominator
modulo) has a habit of being fantastically slow on GPUs. Could you time
contiguous vs. noncontiguous for various levels of "gappiness" and number of
axes? I'm asking this because I'd be OK with a 50% slowdown, but not
necessarily a factor of 5 slowdown on actual GPU hardware.

Thanks!
Andreas