On Oct 28, CRV§ADER//KY modulated:
> ... Individual instruments
> of the same type may have more or less sample points for their curve
> attribute.
>

OK, I understand that it would be awkward to pack multiple instruments'
worth of attributes into one NDarray. So you'll need at least 20K
OpenCL jobs, each of which evaluates 2M scenarios. Each job will
parallelize the scenarios for the same instrument.

This is a very different operating regime from the one I am familiar
with. I might run about 5 jobs to process one whole image, or perhaps
50-100 if I am using sub-block decomposition (repeating the 5 jobs in a
Python loop with different NDarrays as inputs).

> Can the instrument attributes fit in local device memory?
>
> Yes, as I said it's 50MB after compressing, more like 300-500MB
> uncompressed.
> I can optimize though.
>

I don't mean node memory in a cluster, but OpenCL "local memory" that
is shared by compute units in a work group. So, the question is about
the attributes for a single instrument.

Ideally, you'd want one leader work item to fetch the attributes from
OpenCL global memory into local memory, and then let all work group
members reuse these attributes. The scenario values will be distinct
for each work item, so those can be fetched directly from global memory
by each work item. Hopefully you can vectorize these global loads,
whether explicitly in your kernel or just by auto-vectorization in the
OpenCL compiler.

> Didn't start writing the code yet... I'm still in an exploration phase.
>

I don't know how complex one instrument calculation is, but it seems to
me that you should try to write kernels for a couple of instrument
sub-classes so you can measure their run times. E.g. try the simplest
kernel (in both programming effort and runtime) and then a more costly
one if it still seems viable to proceed.

With a rough idea of the OpenCL run time for one instrument on all 2M
scenarios, you'll have to return to the per-job preparation cost. I
would be concerned that this cost might dominate the total run time, if
the Python interpreter is doing too much work to marshal the OpenCL
input and output buffers and manage file IO.
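
To put rough numbers on the per-job cost concern, here is a
back-of-envelope sketch in plain Python. Only the 20K job count comes
from this thread. The kernel and preparation timings are made-up
placeholders, not measurements, and the function names are just for
illustration. It also assumes jobs run strictly one after another, with
no overlap between host-side preparation and device execution.

```python
def estimate_total_seconds(n_jobs, kernel_s, prep_s):
    """Wall time if every job pays kernel_s on the device plus prep_s
    of host-side work (buffer marshalling, file IO), with no overlap."""
    return n_jobs * (kernel_s + prep_s)


def prep_fraction(kernel_s, prep_s):
    """Share of each job's time spent on host-side preparation."""
    return prep_s / (kernel_s + prep_s)


N_JOBS = 20_000  # from the thread: at least 20K jobs, one per instrument

# Hypothetical (kernel seconds, prep seconds) pairs. Replace these with
# numbers measured from a first trial kernel.
trial_timings = [(0.010, 0.001), (0.010, 0.010), (0.002, 0.020)]

for kernel_s, prep_s in trial_timings:
    total = estimate_total_seconds(N_JOBS, kernel_s, prep_s)
    share = prep_fraction(kernel_s, prep_s)
    print(f"kernel {kernel_s * 1e3:.0f} ms, prep {prep_s * 1e3:.0f} ms: "
          f"total {total:.0f} s, prep share {share:.0%}")
```

Once you have timed a first kernel, plugging in the real numbers should
show quickly whether the effort belongs in the kernel or in the Python
code around it.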
Karl

_______________________________________________
PyOpenCL mailing list
[email protected]
http://lists.tiker.net/listinfo/pyopencl
