Re: [PyOpenCL] More good news

Tomasz Rybak Wed, 08 Aug 2012 13:53:32 -0700

On Tue, 2012-07-31 at 02:16 -0400, Andreas Kloeckner wrote:
> Hi all,
> 
> now that we have a flexible scan, a lot of stuff becomes quite easy:
> 
> http://documen.tician.de/pyopencl/array.html#sorting
> 
> :)
> 
> Performance isn't a dream yet, but I've also done exactly zero
> tuning. It manages 34 MKeys/s on Fermi and 42 MKeys/s on Tahiti. For
> comparison, numpy does about 10 MKeys/s on a CPU with a decent memory
> system. The CL code on the CPU achieves about 10 MKeys/s on 4+ cores,
> with the AMD implementation being 50% faster than Intel. (All this is on
> 32-bit integers.) If you've got some time to help tune this... :P
> 
> But the real good news here is that a) this was pretty easy to put
> together on top of the existing scan primitive, and b) it actually
> yields code that works on quite a bunch of CL implementations.
> 
> Hope you're finding this as exciting as me. :)


Yes, it seems very interesting - thanks.

But to worsen your mood a bit ( ;-) ), a few tests fail.
All tests were run on Loveland (E-350). I started out trying
to deal with sorting, but decided to put all the errors
I found into this email.

test_array.py fails at line 1074, in the call
sort = RadixSort(context, "int *ary", key_expr="ary[i]",
sort_arg_names=["ary"])

It seems that each RadixSort object tries to register its own
dtype (through _make_sort_scan_type) without alias_ok, which
raises RuntimeError("dtype '%s' already registered ...")
in compyte/dtypes.py, line 50.
I am not sure how to deal with that: whether to allow
aliases, to cache the RadixSort objects used to sort the same
data type, or something else. But IMO this is a design
decision.
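
To make the two options concrete, here is a minimal sketch in plain
Python. Everything in it is hypothetical: DtypeRegistry, Sorter and
get_sorter are stand-ins invented for illustration and are not part of
PyOpenCL or compyte. They only mimic the behaviour described above
(a second registration under the same name raises RuntimeError).

```python
# Hypothetical stand-ins: none of these names exist in PyOpenCL or compyte.
# They only mimic the "already registered" behaviour described in the text.


class DtypeRegistry:
    """Toy registry that refuses to register the same name twice."""

    def __init__(self):
        self._names = {}

    def register(self, name, dtype, alias_ok=False):
        if name in self._names:
            # Option 1: tolerate an identical re-registration on request.
            if alias_ok and self._names[name] == dtype:
                return
            raise RuntimeError("dtype '%s' already registered" % name)
        self._names[name] = dtype


class Sorter:
    """Toy sorter that registers a helper dtype when it is constructed."""

    def __init__(self, registry, key_dtype):
        registry.register("sort_scan_%s" % key_dtype, key_dtype)
        self.key_dtype = key_dtype


_sorter_cache = {}


def get_sorter(registry, key_dtype):
    """Option 2: build one Sorter per (registry, key dtype) and reuse it."""
    cache_key = (registry, key_dtype)
    if cache_key not in _sorter_cache:
        _sorter_cache[cache_key] = Sorter(registry, key_dtype)
    return _sorter_cache[cache_key]
```

With the cache, a second request for the same key type returns the
existing object, so the registration happens only once. Constructing a
second Sorter directly still fails, which is the situation the test
appears to run into.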

A similar failure occurs in test_struct_reduce (line 738).

There is something wrong with test_scan (line 880).
For n=1048577 (2**20+1) and ExclusiveScanKernel I got a series of
lines like
26707443 (want: 3173493, got: 26707443, orig: 5)
where the first number (outside the parentheses) is the same as
the third one ("got"), while "got" should be equal to the
second one ("want").
InclusiveScanKernel works OK for all sizes.
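
For reference, this is a small host-side sketch of what the two scans
should produce and how a want/got/orig comparison can be made. It
assumes a plain sum with identity 0, which may not match the actual
test setup; the function names are made up for illustration and only
the want/got/orig labels are borrowed from the output line above.

```python
from itertools import accumulate


def inclusive_scan(values):
    """out[i] = values[0] + ... + values[i]"""
    return list(accumulate(values))


def exclusive_scan(values):
    """out[i] = values[0] + ... + values[i-1], with out[0] = 0 (assumed identity)."""
    if not values:
        return []
    return [0] + list(accumulate(values))[:-1]


def find_mismatches(orig, got, is_exclusive):
    """Return (index, want, got, orig) for every position where got is wrong."""
    want = exclusive_scan(orig) if is_exclusive else inclusive_scan(orig)
    return [
        (i, want[i], got[i], orig[i])
        for i in range(len(orig))
        if want[i] != got[i]
    ]
```

Passing an inclusive result where an exclusive one is expected shows
up as a mismatch at every position, so the pattern of reported indices
can hint at which kind of error the device code makes.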

I am also attaching a patch fixing some small problems with this test file.
Sometimes there are still problems with
"RuntimeError: clEnqueueNDRangeKernel failed: out of resources"
even though I've added gc.collect() calls to the tests - so I am not
sure whether all of those calls are required.
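
As background on why an explicit gc.collect() can matter at all, here
is a tiny CPython-only sketch. Resource is a hypothetical stand-in for
an object that holds something scarce; whether the objects in the
tests actually end up in reference cycles is an assumption, not
something I have verified.

```python
import gc
import weakref


class Resource:
    """Hypothetical stand-in for an object holding a scarce device resource."""

    def __init__(self):
        self.owner = None


def make_cycle():
    a = Resource()
    b = Resource()
    a.owner = b
    b.owner = a  # reference cycle: refcounts cannot drop to zero on their own
    return weakref.ref(a)


gc.disable()  # keep the automatic collector out of the way for the demo
ref = make_cycle()
alive_before = ref() is not None  # cycle is unreachable but not yet freed
gc.collect()  # the cycle collector reclaims both objects
alive_after = ref() is not None
gc.enable()
```

Objects that are not part of a cycle are freed by reference counting
as soon as the last reference goes away, so gc.collect() only helps
for the cyclic case.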

Best regards.

-- 
Tomasz Rybak  GPG/PGP key ID: 2AD5 9860
Fingerprint A481 824E 7DD3 9C0E C40A  488E C654 FB33 2AD5 9860
http://member.acm.org/~tomaszrybak

diff --git a/test/test_array.py b/test/test_array.py
index 36fb90b..fdf9991 100644
--- a/test/test_array.py
+++ b/test/test_array.py
@@ -589,7 +589,7 @@ def test_astype(ctx_factory):
 
     if not has_double_support(context.devices[0]):
         from py.test import skip
-        skip("double precision not supported on %s" % device)
+        skip("double precision not supported on %s" % context.devices[0])
 
     a_gpu = clrand(queue, (2000,), dtype=np.float32)
 
@@ -898,6 +898,8 @@ def test_copy_if(ctx_factory):
         selected_dev, count_dev = copy_if(a_dev, "ary[i] > myval", [("myval", crit)])
 
         assert (selected_dev.get()[:count_dev.get()] == selected).all()
+        from gc import collect
+        collect()
 
 @pytools.test.mark_test.opencl
 def test_partition(ctx_factory):
@@ -941,6 +943,8 @@ def test_unique(ctx_factory):
         count_unique_dev = count_unique_dev.get()
 
         assert (a_unique_dev.get()[:count_unique_dev] == a_unique_host).all()
+        from gc import collect
+        collect()
 
 @pytools.test.mark_test.opencl
 def test_index_preservation(ctx_factory):
@@ -969,6 +973,8 @@ def test_index_preservation(ctx_factory):
             knl(out)
 
             assert (out.get() == np.arange(n)).all()
+            from gc import collect
+            collect()
 
 @pytools.test.mark_test.opencl
 def test_segmented_scan(ctx_factory):
@@ -1058,6 +1064,8 @@ def test_segmented_scan(ctx_factory):
                     print(n, list(seg_boundaries))
 
                 assert is_correct
+                from gc import collect
+                collect()
 
             print("%d excl:%s done" % (n, is_exclusive))


_______________________________________________
PyOpenCL mailing list
[email protected]
http://lists.tiker.net/listinfo/pyopencl
