Hi Andreas,
I'm currently working on different ideas for the C++ bindings. It should be
noted that none of the KernelFunctor/kernel.bind stuff made it into the OpenCL
1.1 C++ specification so don't base your design decisions on it. Leaving it out
was on purpose since a good solution wasn't obvious. Only a bare minimum
wrapper for the C is in the spec, none of the sugar coating. Two key issues
with the KernelFunctor approach:
- It implies an atomic object in regards to setting arguments on a kernel.
However, setting kernel arguments will never be atomic, and therefore, never
thread safe. It is better if the API doesn't lead people into thinking
something is safe if it is not.
- All the launch parameters are actually optional. There are four parameters to
think about: global size, local size, offset, and wait event list.
To address the former the first idea was to merge the operator() methods from
cl::KernelFunctor into cl::Kernel. To address the latter operator() will only
handle kernel arguments, and the other four parameters would have setters on
the cl::Kernel object. (Note, Python does have more flexibility in this regard
than C++. In C++ the wait event list argument is fundamentally broken as an
operator() argument in a templated C++ method.) So accounting for the these
fixes the API would look something like this:
cl::Kernel kernel(....);
kernel.setGlobalRange(...);
kernel.setOffsetRange(...);
kernel.setLocalRange(...);
kernel.setWaitList(...);
cl::Event e = kernel(arg1, arg2, arg3);
e.wait();
Any of the setters could be omitted as there are reasonable defaults for all of
them. Not a terrible solution, and maybe one for PyOpenCL to consider. However,
there is something for the C++ bindings to consider. Currently, the
sizeof(cl::Kernel) == sizeof(cl_kernel), and this is true for every C++ OpenCL
type. Where this may matter is that there is currently ideas on the table that
kernels can be passed as arguments to a kernel. (Think if you wanted to set up
a rendering pipeline and be able to interchange different pieces at runtime. Or
a numerical optimizer where the objective function can be tweaked with
different components.) So this has lead me to think that we should can the
whole idea of a function like kernel object and just go with something a little
more explicit:
cl::CommandQueue queue(....);
cl::Kernel kernel(....);
kernel.setArgs(arg1, arg2, arg3);
cl::Event event;
queue.enqueueNDRangeKernel(kernel,
cl::NullRange, // offset
cl::NDRange(...), // global
cl::NDRange(...), // local
NULL, // wait
list
&event); // new
event
event.wait();
Definitely not pretty, but it does adhere to the Zen of Python, "Explicit is
better than implicit". Thats the second statement of the Zen of Python, the
previous API (the one with the setters) may better adhere to the first
statement of the Zen of Python, "Beautiful is better than ugly". And to be
fair, the C++ wrapper could workaround the kernels as kernel arguments using a
template specialization of the kernel argument handler. So I'm really split
50/50, waiting for the wind to push me one way or the other.
-Brian
On Jun 26, 2010, at 12:51 PM, Andreas Kloeckner wrote:
> On Sat, 26 Jun 2010 12:30:17 -0600, Cyrus Omar <[email protected]> wrote:
>> I think it makes more sense to force both global and local size to be
>> keyword arguments, as folks are used to the order of positional
>> arguments in a call being the same as those in a definition and it's
>> not clear on first glance why e.g. a.shape is being passed instead.
>> Similarly with queue, these are sort of meta-arguments and should
>> probably be distinguished as such. This seems more clear:
>>
>> sum(a, b, dest, queue=queue, global_size=a.shape, local_size=(256,))
>>
>> Perhaps shorter names could be used (gsize, lsize, q) to minimize
>> typing? Or I wrote some extensions that implicitly pass a default
>> queue around and allow kernels to specify how to calculate sizes in
>> their definitions to mostly eliminate the issue as well.
>
> I'm -1 on your proposal, since it's a lot of typing.
>
> Here's a counterproposal, partially based on how the C++ wrapper does
> things:
>
> bound_sum = sum.bind(queue, None, (256,))
> bound_sum(a, b, dest, global_size=a.shape)
>
> I.e. create an instance of a 'BoundKernel' type, which represents a
> binding of a kernel to a queue, and which may also hold defaults for
> local and global size, all of which *can* be overriden by kwargs at
> invocation time. This provides the clear separation of meta- and actual
> arguments that you're after.
>
> Opinions? (Just to be clear--whatever gets decided here will be
> backward-compatible, although it might deprecate current behavior.)
>
> Andreas
>
> PS: It'd be good if we had a consensus on this by Tuesday, because
> that's when I'll teach the pyopencl tutorial at SciPy'10. :)
>
> <ATT00001><ATT00002..txt>
_______________________________________________
PyOpenCL mailing list
[email protected]
http://lists.tiker.net/listinfo/pyopencl