On Oct 28, CRV§ADER//KY modulated:

> ... Individual instruments
> of the same type may have more or less sample points for their curve
> attribute.
>

OK, I understand that it would be awkward to pack multiple
instruments' worth of attributes into one NDarray.  So you'll need at
least 20K OpenCL jobs, each of which evaluates 2M scenarios.  Each job
will parallelize across scenarios for the same instrument.
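Just to make sure we mean the same structure, here is a minimal sketch
of that dispatch loop in plain Python (the names evaluate_instrument,
instruments, and scenarios are placeholders I made up; the real inner
call would be an OpenCL kernel launch, not a list comprehension):

```python
def evaluate_instrument(attrs, scenarios):
    # Stand-in for one OpenCL job: it parallelizes over scenarios
    # for a single instrument.  Here it is just a toy calculation.
    return [attrs["scale"] * s for s in scenarios]

def run_all(instruments, scenarios):
    # ~20K jobs, one per instrument, each over all ~2M scenarios.
    results = {}
    for name, attrs in instruments.items():
        results[name] = evaluate_instrument(attrs, scenarios)
    return results

# toy inputs, purely illustrative
instruments = {"swap_1": {"scale": 2.0}, "bond_7": {"scale": 0.5}}
scenarios = [1.0, 2.0, 3.0]
out = run_all(instruments, scenarios)
```

The point is only the shape of the outer loop: the per-instrument
attributes vary, while the scenario array is shared across jobs.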

This is a very different operating regime than I am familiar with.  I
might run about 5 jobs to process one whole image, or perhaps 50-100
if I am using sub-block decomposition (repeating the 5 jobs in a
Python loop with different NDarrays as inputs).


>     Can the instrument attributes fit in local device memory?
> 
> Yes, as I said it's 50MB after compressing, more like 300-500MB
> uncompressed.
> I can optimize though.
> 

I don't mean node memory in a cluster, but OpenCL "local memory",
which is shared by the work items in a work group.  So the question is
about the attributes for a single instrument.  Ideally, you'd want one
leader work item to fetch the attributes from OpenCL global memory
into local memory, and then let all work group members reuse them.

The scenario values will be distinct for each work item, so those can
be fetched directly from global memory by each work item.  Hopefully
you can vectorize these global loads, whether explicitly in your
kernel or just by auto-vectorization in the OpenCL compiler.
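To make the leader-fetch pattern concrete, here is a rough kernel
sketch.  All of the names, the curve layout, and the MAX_CURVE bound
are my assumptions, and the valuation line is a placeholder for your
real pricing logic:

```c
#define MAX_CURVE 512   /* assumed upper bound on curve sample points */

__kernel void eval_instrument(
    __global const float *curve,      /* one instrument's attributes  */
    const int n_curve,                /* its sample point count       */
    __global const float4 *scenarios, /* packed 4-wide for vector loads */
    __global float4 *out)
{
    __local float lcurve[MAX_CURVE];

    /* leader fetch into local memory; async_work_group_copy()
     * is an alternative that spreads the copy over the group */
    if (get_local_id(0) == 0)
        for (int i = 0; i < n_curve; ++i)
            lcurve[i] = curve[i];
    barrier(CLK_LOCAL_MEM_FENCE);

    /* per-work-item: explicit vectorized load from global memory */
    size_t gid = get_global_id(0);
    float4 s = scenarios[gid];

    /* placeholder valuation -- substitute the real curve math here */
    out[gid] = s * lcurve[0];
}
```

Declaring the scenario buffer as float4 gets you the vectorized global
loads explicitly, rather than hoping the compiler does it for you.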


> Didn't start writing the code yet... I'm still in an exploration phase.
>  

I don't know how complex one instrument calculation is, but it seems
to me that you should try writing kernels for a couple of instrument
sub-classes so you can measure their run times.  E.g. start with the
simplest kernel (in both programming effort and runtime) and then try
a more costly one if it still seems viable to proceed.
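For the measurement itself, a trivial host-side timing harness is
enough.  The sketch below times an arbitrary callable; in your case
run_job would wrap one kernel enqueue plus the wait for completion
(run_job and the toy workload are placeholders, not real API):

```python
import time

def time_kernel(run_job, n_repeat=5):
    """Return the best wall-clock time over a few repeats.

    run_job stands in for whatever launches one instrument's kernel
    over all scenarios and blocks until it finishes.  Taking the best
    of several repeats filters out warm-up and scheduling noise.
    """
    best = float("inf")
    for _ in range(n_repeat):
        t0 = time.perf_counter()
        run_job()
        best = min(best, time.perf_counter() - t0)
    return best

# toy stand-in for "the simplest kernel"
t = time_kernel(lambda: sum(range(100_000)))
```

Remember to exclude the first run (or take the minimum, as above),
since the first launch typically pays one-time compilation costs.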

With a rough idea of the OpenCL run time for one instrument on all 2M
scenarios, you'll have to return to the per-job preparation cost.  I
would be concerned that this cost might dominate the total run time if
the Python interpreter is doing too much work to marshal the OpenCL
input and output buffers and manage file I/O.
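A back-of-envelope model shows why this matters at 20K jobs.  The
numbers below are made-up placeholders to illustrate the concern, not
measurements:

```python
def total_runtime(n_jobs, prep_s, kernel_s):
    """Total = jobs * (host-side prep + device compute), in seconds."""
    return n_jobs * (prep_s + kernel_s)

n_jobs = 20_000
kernel_s = 0.010      # say 10 ms of device time per job (assumed)
cheap_prep = 0.001    # 1 ms of Python marshalling per job
heavy_prep = 0.050    # 50 ms if buffer setup / file IO dominates

fast = total_runtime(n_jobs, cheap_prep, kernel_s)   # ~220 s total
slow = total_runtime(n_jobs, heavy_prep, kernel_s)   # ~1200 s total
```

With heavy per-job overhead the host work swamps the device time, so
it's worth measuring the prep path as early as the kernels themselves.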


Karl


_______________________________________________
PyOpenCL mailing list
[email protected]
http://lists.tiker.net/listinfo/pyopencl
