Hi Tomasz,

On Tue, 21 Dec 2010 23:21:59 +0100, Tomasz Rybak <bogom...@post.pl> wrote:
> I have written the attached code, which calculates a parallel prefix sum.
> There are two variants - the first one (exclusive) is based on an article
> by Mark Harris from NVIDIA, the second one (inclusive) is based on a diagram
> from Wikipedia.
> The inclusive scan could be optimised with regard to shared memory access
> conflicts, similarly to the exclusive version.
> Also it seems that the inclusive scan is less stable numerically - results
> differ from the CPU version when calculating the scan of large arrays.
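For reference, here is a serial Python sketch of the two algorithms as I read the description above. It assumes the Harris article is the work-efficient up-sweep/down-sweep (Blelloch) scan and that the Wikipedia diagram shows the Hillis-Steele scan. Each inner loop stands for one parallel step on the GPU, and the function names are made up for illustration; they are not PyCUDA API.

```python
def exclusive_scan_blelloch(values):
    """Work-efficient exclusive scan (up-sweep + down-sweep), simulated serially.

    Every iteration of an inner `for` loop is independent, i.e. one parallel
    step on the GPU. Assumes len(values) is a power of two.
    """
    n = len(values)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    a = list(values)
    # Up-sweep (reduce): build partial sums in place in a balanced binary tree.
    d = 1
    while d < n:
        for i in range(2 * d - 1, n, 2 * d):
            a[i] += a[i - d]
        d *= 2
    # Clear the root, then down-sweep to push prefix sums back down the tree.
    a[n - 1] = 0
    d = n // 2
    while d >= 1:
        for i in range(2 * d - 1, n, 2 * d):
            left = a[i - d]
            a[i - d] = a[i]   # left child receives the parent's prefix
            a[i] += left      # right child receives parent prefix + left sum
        d //= 2
    return a


def inclusive_scan_hillis_steele(values):
    """Hillis-Steele inclusive scan, simulated serially; works for any length."""
    n = len(values)
    a = list(values)
    offset = 1
    while offset < n:
        # Build a new list so each step reads only the previous step's values,
        # as a GPU kernel would with double buffering or a barrier.
        a = [a[i] + a[i - offset] if i >= offset else a[i] for i in range(n)]
        offset *= 2
    return a
```

With the input `[3, 1, 7, 0, 4, 1, 6, 3]`, the first function gives `[0, 3, 4, 11, 11, 15, 16, 22]` and the second gives `[3, 4, 11, 11, 15, 16, 22, 25]`. On the numerical question: the Blelloch version performs O(n) additions and Hillis-Steele O(n log n), and both group the additions differently from a sequential CPU loop. Since floating-point addition is not associative, some disagreement with a CPU reference on large float arrays is expected from either variant; whether one is systematically worse would need to be measured.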
Again, thanks for your contribution! Here are a few comments on the scan:

- The inclusive and exclusive versions don't seem all that dissimilar, so I don't think we should duplicate code between them. If necessary, I'd even prefer to make this code depend on Mako or Jinja (template engines) to avoid code duplication.
- Use warnings.warn, not plain print, for warnings.
- Tests should go into tests/test_gpu_array or some such.
- Formal nitpicks: Please indent comments along with the rest of the code. PEP 8 says a=value (no spaces) for keyword arguments. Camel case in C is yucky, too. :)
- 'Sum' is poor wording for the general associative operation that the scan uses; perhaps use scan_op instead.

Andreas
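To make the first and last points concrete, here is a serial Python sketch (not PyCUDA code and not a proposed API) of how a single exclusive implementation, parameterized by a hypothetical `scan_op`/`identity` pair, could yield the inclusive variant with one extra elementwise pass, out[i] = scan_op(excl[i], x[i]), instead of a second copy of the algorithm. It also reports the padding of non-power-of-two inputs through `warnings.warn` rather than `print`. On the GPU, the same idea would presumably mean splicing `scan_op` into one kernel template (e.g. via Mako or Jinja); that part is not shown.

```python
import operator
import warnings


def exclusive_scan(values, scan_op=operator.add, identity=0):
    """Blelloch-style exclusive scan for any associative scan_op (serial simulation).

    `identity` must satisfy scan_op(identity, x) == x. Operand order is kept
    left-to-right, so non-commutative operations also work.
    """
    n = len(values)
    if n == 0:
        return []
    size = 1
    while size < n:
        size *= 2
    if size != n:
        # Report through the warnings module so callers can filter or escalate it.
        warnings.warn("input length %d is not a power of two; padding to %d"
                      % (n, size), stacklevel=2)
    a = list(values) + [identity] * (size - n)
    # Up-sweep: combine left and right partial results in place.
    d = 1
    while d < size:
        for i in range(2 * d - 1, size, 2 * d):
            a[i] = scan_op(a[i - d], a[i])
        d *= 2
    # Down-sweep: distribute prefixes from the root back to the leaves.
    a[size - 1] = identity
    d = size // 2
    while d >= 1:
        for i in range(2 * d - 1, size, 2 * d):
            left = a[i - d]
            a[i - d] = a[i]
            a[i] = scan_op(a[i], left)
        d //= 2
    return a[:n]


def inclusive_scan(values, scan_op=operator.add, identity=0):
    """Inclusive scan derived from the exclusive one: out[i] = excl[i] op x[i]."""
    excl = exclusive_scan(values, scan_op=scan_op, identity=identity)
    return [scan_op(e, x) for e, x in zip(excl, values)]
```

For example, `inclusive_scan([5, 2, 8, 1], scan_op=max, identity=float("-inf"))` gives a running maximum `[5, 5, 8, 8]`, and string concatenation with `identity=""` checks that operand order is preserved. Keyword arguments follow the PEP 8 `a=value` style mentioned above.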
_______________________________________________
PyCUDA mailing list
PyCUDA@tiker.net
http://lists.tiker.net/listinfo/pycuda