Hi Tomasz,

On Tue, 21 Dec 2010 23:21:59 +0100, Tomasz Rybak <bogom...@post.pl> wrote:
> I have written the attached code, which calculates a parallel prefix sum.
> There are two variants - the first one (exclusive) is based on an article
> by Mark Harris from NVIDIA, the second one (inclusive) is based on a diagram
> from Wikipedia.
> Inclusive scan could be optimised with regard to shared memory access
> conflicts, similarly to the exclusive version.
> Also it seems that inclusive scan is less stable numerically - results
> differ from CPU version when calculating scan of large arrays.

Again, thanks for your contribution! Here are a few comments on the
scan:

- The inclusive and the exclusive versions don't seem all that
  dissimilar. As such, I don't think we should duplicate code for the
  two. If necessary, I'd even prefer to make this code depend on Mako or
  Jinja (template engines) to avoid code duplication.

- Use warnings.warn, not plain print, for warnings.

- Tests should go into tests/test_gpu_array or some such.

- Formal nitpicks: Please indent comments with the rest of the code.
  PEP 8 says a=value (no spaces around the '=') for keyword arguments.
  Camel case in C is yucky, too. :)

- 'Sum' is poor wording for the general associative operation that the
  scan uses--use scan_op perhaps.
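
To make the points about code sharing, warnings.warn and scan_op a bit
more concrete, here is a minimal CPU-side sketch in plain Python. It is
not the GPU kernel from the attachment and not part of PyCUDA; the names
scan, neutral and inclusive are hypothetical and chosen only for
illustration. It assumes a list as input and that neutral is the
identity element of scan_op.

```python
import operator
import warnings


def scan(values, scan_op=operator.add, neutral=0, inclusive=True):
    """Sequential reference scan over an associative scan_op.

    Hypothetical helper: `neutral` must be the identity element of
    scan_op (0 for addition, 1 for multiplication).
    """
    if not values:
        # Report through the warnings module instead of a plain print.
        warnings.warn("scan called on an empty sequence")
        return []

    result = []
    acc = neutral
    for v in values:
        if inclusive:
            # Inclusive: output i contains input i.
            acc = scan_op(acc, v)
            result.append(acc)
        else:
            # Exclusive: output i contains only inputs before i.
            result.append(acc)
            acc = scan_op(acc, v)
    return result


data = [1, 2, 3, 4]
inclusive_sum = scan(data)
exclusive_sum = scan(data, inclusive=False)
exclusive_product = scan(data, scan_op=operator.mul, neutral=1, inclusive=False)
```

The two variants differ only in whether the accumulator is written
before or after it is updated, so a single code path with a flag (or a
template parameter in the generated kernel) should be enough. The
exclusive result is the inclusive one shifted right by one element with
the neutral element in front.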

Andreas


_______________________________________________
PyCUDA mailing list
PyCUDA@tiker.net
http://lists.tiker.net/listinfo/pycuda
