Hi,
> > 1. circa 20,000 Python objects of class "Instrument". Each instrument
> > is defined by a subclass and a dict of attributes, which will be a
> > handful of scalars most of the time, or a couple of MB worth of numpy
> > arrays in the worst case.
> > Total size: 50MB after compressing the numpy arrays
>
> If I understand correctly, each attribute may be a scalar or a 1D
> Numpy array of variable length? Does the attribute shape vary by
> individual instrument or by instrument sub-class?

The attribute shape varies by instrument sub-class. Some instrument
types have attributes which are sampled curves. Individual instruments
of the same type may have more or fewer sample points for their curve
attribute.

> > 2. 120 risk factors, each of which is a numpy 1D array of 2 million
> > doubles (the risk factor values for each simulation scenario)
> > Total size: 1.8GB
>
> So, each logical kernel needs as input one 120-scalar row of risk
> factors and one instrument attribute set, and it outputs one scalar?

It needs a subset of the 120 scalar risk factors (between 0 and 20) and
a subset of the output of other instruments (between 0 and 150), and it
produces one scalar.

> > My calculation happens in two phases:
> > 1.
> > Simulation: for every one of the 20,000 instruments, calculate the
> > instrument value as a function of the instrument scalar settings and
> > a subset of the risk factor vectors. There will be different
> > functions (kernels) depending on the instrument subclass. The output
> > is always a 1D array of 2 million doubles per instrument - or if you
> > prefer, a 2D array of 20,000 x 2,000,000 doubles. Some instruments
> > require as input the output value of other instruments, in a
> > recursive dependency tree.
> > Total output size: 300GB
>
> Have you already micro-benchmarked any mappings of this to OpenCL?

I didn't start writing the code yet... I'm still in an exploration
phase.

> It seems to me worth checking:
>
> A.
> K OpenCL jobs of shape (N,), with each worker evaluating one of K
> instruments for one of N scenarios.
>
> B. B OpenCL jobs of shape (K/B, N), with each worker evaluating one
> of K instruments for one of N scenarios in one of B blocks.
>
> Can the instrument attributes fit in local device memory?

Yes, as I said it's 50MB after compressing, more like 300-500MB
uncompressed. I can optimize that, though.

> If so, this can easily benefit (A) and may also help in (B) if you
> can structure the global shape (K/B, N) into smaller (W, N)
> workgroups that share a single instrument...?
>
> These are of course for K instruments in a single class, so they can
> share the same kernel. The output should be a K x N array of scalars,
> if I understand your problem statement.

Theoretically yes, but it's probably simpler to just invoke the same
kernel multiple times, once per instrument. With out-of-order execution
enabled, I would not expect too much of a penalty for that.

> I'd limit the numbers K and N for testing, before worrying about
> further decomposition to fit the device and driver limits, which
> probably cannot cope with a 20K, much less 2M, job shape axis.
>
> I'd test on both GPU and CPU devices, including existing devices in
> your cluster. If your cluster isn't the latest generation of CPUs
> and/or GPUs, I'd also try to test on newer equipment; there could be
> dramatic performance improvements that would allow a much smaller
> number of new devices to meet or exceed a large pool of older ones...

> > 2.
> > Vertical aggregation:
> > I calculate the value of circa 150 nodes, each of which is a vector
> > of 2 million doubles defined as a weighted sum of the value of up to
> > 8,000 instruments (with the weights being scalar):
> > node_value = instr_value1 * k1 + instr_value2 * k2 + ... + instr_valueN * kn
> > Each of the 20,000 instruments can contribute to more than one of
> > the 150 output nodes.
> This phase seems trivially parallelizable and vectorizable. You can
> almost dismiss it while optimizing the phase 1 work and overall data
> transfers.
>
> Karl
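Back on phase 1: since some instruments read the outputs of others in a
recursive dependency tree, the per-instrument kernel launches have to
happen in dependency order. A minimal sketch of grouping instruments
into batches by topological level, so that every kernel in a batch can
be enqueued together (the `deps` mapping and instrument ids are
hypothetical, just to illustrate the idea):

```python
def batch_by_dependency(deps):
    """Group instrument ids into batches: each batch depends only on
    outputs computed in earlier batches, so all kernels in one batch
    can be enqueued back to back (e.g. on an out-of-order queue).

    `deps` maps instrument id -> iterable of instrument ids it reads.
    """
    remaining = {k: set(v) for k, v in deps.items()}
    batches = []
    while remaining:
        # instruments whose inputs are all already computed
        ready = sorted(k for k, v in remaining.items() if not v)
        if not ready:
            raise ValueError("cyclic instrument dependency")
        for k in ready:
            del remaining[k]
        for v in remaining.values():
            v.difference_update(ready)
        batches.append(ready)
    return batches

# e.g. D needs B and C, which both need A:
# batch_by_dependency({"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]})
# -> [["A"], ["B", "C"], ["D"]]
```

Batches also give a natural place to group instruments by sub-class, so
each batch can still share a kernel per class.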
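For phase 2, note that the weighted sum node_value = instr_value1 * k1
+ instr_value2 * k2 + ... is exactly a matrix product: stack the
instrument outputs as a K x N array V and the weights as a 150 x K
matrix W, and the node values are W @ V. A numpy sketch with toy sizes
(the variable names and the particular weights are made up):

```python
import numpy as np

K, N, NODES = 5, 8, 3           # toy sizes; real ones are 20_000, 2_000_000, 150
rng = np.random.default_rng(0)

V = rng.standard_normal((K, N))  # phase-1 output: one row per instrument
W = np.zeros((NODES, K))         # weight matrix; mostly zero in practice
W[0, [0, 2]] = [0.5, 1.5]        # node 0 = 0.5*instr0 + 1.5*instr2
W[1, [1, 2, 4]] = [1.0, -2.0, 0.25]
W[2, 3] = 3.0

node_values = W @ V              # (NODES, N): one row per node

# the same thing spelled out for node 0:
assert np.allclose(node_values[0], 0.5 * V[0] + 1.5 * V[2])
```

With at most 8,000 nonzeros per row of a 150 x 20,000 matrix, a
scipy.sparse representation of W may pay off, and since the full 300GB
V won't fit in memory, the same product can be accumulated block by
block over the scenario axis (columns of V) without changing the math.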
_______________________________________________
PyOpenCL mailing list
[email protected]
http://lists.tiker.net/listinfo/pyopencl
