Hi Andreas, 

I'm currently working on different ideas for the C++ bindings. It should be 
noted that none of the KernelFunctor/kernel.bind stuff made it into the OpenCL 
1.1 C++ specification so don't base your design decisions on it. Leaving it out 
was on purpose since a good solution wasn't obvious. Only a bare minimum 
wrapper for the C is in the spec, none of the sugar coating. Two key issues 
with the KernelFunctor approach:

- It implies an atomic object in regards to setting arguments on a kernel. 
However, setting kernel arguments will never be atomic, and therefore, never 
thread safe. It is better if the API doesn't lead people into thinking 
something is safe if it is not. 
- All the launch parameters are actually optional. There are four parameters to 
think about: global size, local size, offset, and wait event list. 

To address the former the first idea was to merge the operator() methods from 
cl::KernelFunctor into cl::Kernel. To address the latter operator() will only 
handle kernel arguments, and the other four parameters would have setters on 
the cl::Kernel object. (Note, Python does have more flexibility in this regard 
than C++. In C++ the wait event list argument is fundamentally broken as an 
operator() argument in a templated C++ method.) So accounting for the these 
fixes the API would look something like this:

cl::Kernel kernel(....);
kernel.setGlobalRange(...);
kernel.setOffsetRange(...);
kernel.setLocalRange(...);
kernel.setWaitList(...);

cl::Event e = kernel(arg1, arg2, arg3);
e.wait();

Any of the setters could be omitted as there are reasonable defaults for all of 
them. Not a terrible solution, and maybe one for PyOpenCL to consider. However, 
there is something for the C++ bindings to consider. Currently, the 
sizeof(cl::Kernel) == sizeof(cl_kernel), and this is true for every C++ OpenCL 
type. Where this may matter is that there is currently ideas on the table that 
kernels can be passed as arguments to a kernel. (Think if you wanted to set up 
a rendering pipeline and be able to interchange different pieces at runtime. Or 
a numerical optimizer where the objective function can be tweaked with 
different components.) So this has lead me to think that we should can the 
whole idea of a function like kernel object and just go with something a little 
more explicit: 

cl::CommandQueue queue(....);
cl::Kernel kernel(....);

kernel.setArgs(arg1, arg2, arg3);

cl::Event event;
queue.enqueueNDRangeKernel(kernel, 
                                                cl::NullRange, // offset
                                                cl::NDRange(...), // global
                                                cl::NDRange(...), // local
                                                NULL,                 // wait 
list
                                                &event);             // new 
event
event.wait();

Definitely not pretty, but it does adhere to the Zen of Python, "Explicit is 
better than implicit". Thats the second statement of the Zen of Python, the 
previous API (the one with the setters) may better adhere to the first 
statement of the Zen of Python, "Beautiful is better than ugly". And to be 
fair, the C++ wrapper could workaround the kernels as kernel arguments using a 
template specialization of the kernel argument handler. So I'm really split 
50/50, waiting for the wind to push me one way or the other. 

-Brian

On Jun 26, 2010, at 12:51 PM, Andreas Kloeckner wrote:

> On Sat, 26 Jun 2010 12:30:17 -0600, Cyrus Omar <[email protected]> wrote:
>> I think it makes more sense to force both global and local size to be  
>> keyword arguments, as folks are used to the order of positional  
>> arguments in a call being the same as those in a definition and it's  
>> not clear on first glance why e.g. a.shape is being passed instead.  
>> Similarly with queue, these are sort of meta-arguments and should  
>> probably be distinguished as such. This seems more clear:
>> 
>> sum(a, b, dest, queue=queue, global_size=a.shape, local_size=(256,))
>> 
>> Perhaps shorter names could be used (gsize, lsize, q) to minimize  
>> typing? Or I wrote some extensions that implicitly pass a default  
>> queue around and allow kernels to specify how to calculate sizes in  
>> their definitions to mostly eliminate the issue as well.
> 
> I'm -1 on your proposal, since it's a lot of typing.
> 
> Here's a counterproposal, partially based on how the C++ wrapper does
> things:
> 
> bound_sum = sum.bind(queue, None, (256,))
> bound_sum(a, b, dest, global_size=a.shape)
> 
> I.e. create an instance of a 'BoundKernel' type, which represents a
> binding of a kernel to a queue, and which may also hold defaults for
> local and global size, all of which *can* be overriden by kwargs at
> invocation time. This provides the clear separation of meta- and actual
> arguments that you're after.
> 
> Opinions? (Just to be clear--whatever gets decided here will be
> backward-compatible, although it might deprecate current behavior.)
> 
> Andreas
> 
> PS: It'd be good if we had a consensus on this by Tuesday, because
> that's when I'll teach the pyopencl tutorial at SciPy'10. :)
> 
> <ATT00001><ATT00002..txt>


_______________________________________________
PyOpenCL mailing list
[email protected]
http://lists.tiker.net/listinfo/pyopencl

Reply via email to