Hi Tomasz,

Tomasz Rybak writes:
> Sorry if there are two copies of this message.
> I sent it to the list but received no confirmation
> (nor any error), and I checked that the archive does not show
> any message from January.

Greylisting.

> I can see that there is already a new version (2013.1) in the docs,
> marked "in development". I would prefer that it not be released
> before the problems with the parallel prefix scan are fixed.

Agreed.

> The problems with scan are only visible on the Loveland APU. They do not
> occur on an ION, nor on a GTX 460. I do not have access to a machine
> with an NVIDIA GPU of compute capability 3.x, so I cannot test prefix scan there.
> I first encountered them in August and mentioned them in an email
> to the list on 2012-08-08 ("Python3 test failures").
> Only recently have I had the time and eagerness to look into them more closely.
> The tests still fail on the recent git version c31944d1e81a.

I've run a larger set of tests across different machines. Here's what
I found:

- AMD CPU (13.1):
  OK

- Intel CPU (2012):
  OK

- Nvidia Fermi (CUDA 4.2):
  OK

- Nvidia GT200 (CUDA 4.2):
  OK

- AMD Cypress GPU (12.4 driver):
  (this is as close to Loveland as I can get)

  - Two spurious complex-number failures in test_array; I'll attribute
    those to a broken compiler in the old driver. I've asked for the
    driver to be upgraded.

  - "out of resources" in test_sort

  - Everything else fine.
      
- AMD Devastator APU (13.1 driver, Northern Islands):

  - "out of resources" in test_sort

  - Everything else fine.

- AMD Tahiti GPU (13.1 driver, Southern Islands):
  OK

I haven't yet figured out what's behind these "out of resources" errors,
but I'll keep poking. I'd be glad to receive clues. On the whole, I find
these results pretty encouraging, and I'd like to get 2013.1 out as soon
as I can (before I get a chance to go back and break stuff again).

> The failing tests are now in test_algorithm.py, in the third group (marked
> scan-related, starting at line 418). I'll describe my observations
> of the test_scan function.
> My APU has 2 compute units. GenericScanKernel chooses
> k_group_size to be 4096, max_scan_wg_size to be 256,
> and max_intervals to be 6.
>
> The first error occurs when there is enough work to fill two compute
> units, in my case 2**12+5 elements. It looks like there is a problem with
> passing the partial result from the computation on the first CU to the
> second one. The prefix sum is computed correctly on the second half of the
> array, but starting from the wrong value. I have printed the interval_results
> array and observed that the error (the difference between the correct value
> of the interval's first element and the actual one) is not the value
> of any of the elements of interval_results, nor is it a difference
> between interval_results elements. On the other hand, the difference
> between the actual and the expected value is similar (i.e. in the same range)
> to the difference between interval_results[4] and interval_results[3].
> In the test I have just run, the error is 10724571 and
> the difference is 10719275; I am not sure if this is relevant, though.
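To make the symptom concrete, here is a toy model in plain Python of an interval-based scan. It is a sketch of the general technique, not PyOpenCL's actual GenericScanKernel code: each interval is scanned locally, the interval totals are scanned to give each interval a carry-in, and the carry-in is added back. The function name and the carry_error parameter are hypothetical, used only to inject a wrong carry-in.

```python
from itertools import accumulate


def two_level_inclusive_scan(data, interval_size, carry_error=0):
    """Toy model of an interval-based inclusive prefix sum (plain Python).

    Each interval is scanned on its own (think: one interval per compute
    unit), the interval totals are scanned to get each interval's carry-in,
    and the carry-in is added to the interval's local scan. carry_error is
    a hypothetical knob added to the carry-in of every interval after the
    first, to mimic a wrong partial result being handed along.
    """
    intervals = [data[i:i + interval_size]
                 for i in range(0, len(data), interval_size)]
    # Stage 1: local scan inside each interval, and each interval's total.
    local_scans = [list(accumulate(chunk)) for chunk in intervals]
    totals = [scan[-1] for scan in local_scans]
    # Stage 2: exclusive scan of the totals gives each interval's carry-in.
    carries = [0] + list(accumulate(totals))[:-1]
    result = []
    for idx, (scan, carry) in enumerate(zip(local_scans, carries)):
        if idx > 0:
            carry += carry_error  # injected fault, 0 by default
        # Stage 3: add the carry-in to every element of the interval.
        result.extend(x + carry for x in scan)
    return result


data = list(range(1, 11))
expected = list(accumulate(data))
faulty = two_level_inclusive_scan(data, interval_size=5, carry_error=7)
print([f - e for f, e in zip(faulty, expected)])  # [0, 0, 0, 0, 0, 7, 7, 7, 7, 7]
```

The local scan of the second interval is still right; only its carry-in is wrong, so every element there is off by the same amount. That matches the shape of the symptom described above, but the model says nothing about where in the real kernel the wrong value comes from.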
>
> The errors are not reproducible: sometimes they occur for small arrays
> (e.g. 2**12+5), sometimes for larger ones (the test I just ran
> failed for ExclusiveScan of size 2**24+5). The failures also
> depend on the order of the tests: after changing the order of the
> elements of the scan_test_counts array, I got failures for different
> sizes, but always for sizes larger than 2**12. It might be
> some race condition, but I do not fully understand the new scan
> and cannot put my finger on one place.
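Since the failures depend on the order of the sizes, one option is to shuffle the sizes with a recorded seed, so that a failing order (and its inputs) can be replayed exactly, and to report for each failing size the first wrong index and the distinct offsets after it. The sketch below does this in plain Python. The names run_sizes_in_seeded_order, error_profile and buggy_scan are hypothetical; in a real test, scan would wrap the device scan kernel plus the copy back to the host.

```python
import random
from itertools import accumulate


def error_profile(actual, expected):
    """Return None if the scans match, else (first_bad_index, offsets).

    offsets is the set of distinct values actual[i] - expected[i] from the
    first bad index to the end; a single nonzero value means the whole tail
    is shifted by a constant, as a wrong carry-in would do.
    """
    for i, (a, e) in enumerate(zip(actual, expected)):
        if a != e:
            offsets = {x - y for x, y in zip(actual[i:], expected[i:])}
            return i, offsets
    return None


def run_sizes_in_seeded_order(sizes, scan, seed):
    """Check `scan` on random inputs of each size, in a shuffled order.

    The same seed gives the same order and the same inputs, so a failing
    order can be replayed. `scan` is any callable taking and returning a
    list (hypothetical stand-in for a device scan plus copy back to host).
    """
    rng = random.Random(seed)
    order = list(sizes)
    rng.shuffle(order)
    failures = {}
    for n in order:
        data = [rng.randrange(100) for _ in range(n)]
        profile = error_profile(scan(data), list(accumulate(data)))
        if profile is not None:
            failures[n] = profile
    return order, failures


def buggy_scan(data):
    # Toy fault for exercising the harness: correct up to index 4095,
    # then every element is off by 3.
    out = list(accumulate(data))
    return out[:4096] + [x + 3 for x in out[4096:]]


order, failures = run_sizes_in_seeded_order([17, 2**12 + 5, 2**13], buggy_scan, seed=1)
print("order:", order)
print("failures:", failures)
```

The 4096 in buggy_scan only echoes the k_group_size from the report to give the toy fault a plausible position; it is not a claim about where the real kernel goes wrong.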
>
> If there is any additional test I can perform, please let me know.
> I'll try to investigate it further, but I am not sure whether
> it'll work.

I did not observe the failure mode that you saw. What version of the
driver were you using? Any chance you could upgrade to 13.1 to see if
that changes anything?

Andreas

_______________________________________________
PyOpenCL mailing list
http://lists.tiker.net/listinfo/pyopencl
