Hi Roger,
1. I'm still new to reading J. Is this basically: multiply two degree N = 2^n - 1 polynomials p & q. Brute force = N^2 ops. Using the DFT, we can evaluate p(1), p(w), ..., p(w^(N-1)) in O(N log N). [***] Thus we get p(w^i)*q(w^i) for 0 <= i <= N-1 in O(N log N). Using one more O(N log N) pass, we do the inverse DFT and recover the coefficients of p*q.

[***] The computation of p(1), p(w), ..., p(w^(N-1)) can be formulated as a specialized matrix-vector multiply. This (2k by 2k) * 2k MV can be reduced to two (k by k) * k MVs plus some post-processing work. This is where the O(N log N) comes in.

Is this specialized MV multiply where the "rank n binary hypercube" comes in -- as J's way of formulating the O(N log N) algorithm? If so, it appears the rank-n binary hypercube is cute but unnecessary (and offers no additional parallelism), as it merely encodes the recursive specialized MV.

2. I'm a big fan of your book AIOJ. Thanks for writing it!

3. Most of the work I'm interested in is of the form:
3a. We have some model with some parameters, say 1 GB worth of parameters.
3b. We have lots of data, too much to fit in GPU memory (or even CPU memory).
3c. We randomly sample a batch from the data.
3d. We evaluate the model, calculate derivatives, and do some variant of gradient descent.
3e. Repeat.

4. cudnn: http://docs.nvidia.com/deeplearning/sdk/cudnn-developer-guide/index.html#four-D-tensor-descriptor only supports dim-3 through dim-8 tensors, with a focus on dim-4 and dim-5 tensors.

5. I'm trying to find a subset of J to target cuda/cudnn. Deciding to only support dim-4 tensors would dramatically simplify targeting cuda/cudnn functions, but it's not clear to me how much of the power/expressiveness of J (with regard to numerical optimization of the above form) would be lost in doing so.

Thanks,
--TongKe

On Fri, Dec 29, 2017 at 11:15 PM, Roger Hui <[email protected]> wrote:
> I don't know how often, but there is an application of higher-ranked
> arrays: FFT.  http://code.jsoftware.com/wiki/Essays/FFT .
> For argument vectors of length 2^n, the algorithm creates rank n binary
> hypercubes, shape n$2.  (Subfunction "cube".)
>
> On Fri, Dec 29, 2017 at 8:10 PM, TongKe Xue <[email protected]> wrote:
>
> > Hi,
> >
> > 1. I agree that the concept of rank + cells + frames + implicit
> > parallelism is very cool.
> >
> > 2. 5 dimensions are commonly defined as N, C, D, H, W:
> > N = number of images
> > C = channels
> > D = depth
> > H = height
> > W = width
> >
> > In practice, how often do we use algorithms with > 5 dim tensors?
> >
> > 3. In particular, in the case of GPU, if we consider a float array with
> > 10,000 image samples, 10 features, each being 100x100, we're already at
> > 10^(4+1+2+2) = 10^9 floats. At 4 bytes/float, that is 4 GB, on consumer
> > video cards that generally have <= 12 GB.
> >
> > Going back to the original question -- for numerical computing in J,
> > how often do we use algorithms with > 5 dimension tensors?
> >
> > Thanks,
> > --TongKe
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
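The recursion described in point 1 -- one size-2k evaluation reduced to two size-k evaluations plus a combine step -- can be sketched in Python. This is a minimal illustration, not taken from the J essay; the function names and structure are mine:

```python
import cmath

def fft(a):
    # Recursive radix-2 FFT: evaluates the polynomial with coefficients a
    # at the n n-th roots of unity, where n = len(a) is a power of two.
    n = len(a)
    if n == 1:
        return a[:]
    even = fft(a[0::2])          # (k by k) * k subproblem on even coeffs
    odd = fft(a[1::2])           # (k by k) * k subproblem on odd coeffs
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(-2j * cmath.pi * k / n)  # twiddle factor
        out[k] = even[k] + w * odd[k]          # the "post-processing" combine
        out[k + n // 2] = even[k] - w * odd[k]
    return out

def ifft(a):
    # Inverse DFT via the conjugation trick: ifft(a) = conj(fft(conj(a))) / n.
    n = len(a)
    return [x.conjugate() / n for x in fft([x.conjugate() for x in a])]

def polymul(p, q):
    # Multiply coefficient vectors p and q by pointwise multiplication
    # of their DFTs, padded to a power of two >= deg(p*q) + 1.
    n = 1
    while n < len(p) + len(q) - 1:
        n *= 2
    pf = fft(p + [0] * (n - len(p)))
    qf = fft(q + [0] * (n - len(q)))
    prod = ifft([x * y for x, y in zip(pf, qf)])
    return [round(c.real) for c in prod[:len(p) + len(q) - 1]]

# (1 + 2x)(3 + 4x) = 3 + 10x + 8x^2
print(polymul([1, 2], [3, 4]))  # -> [3, 10, 8]
```

The two recursive calls are independent, which is where the parallelism (if any) would live; the hypercube formulation in the essay is one way of laying out exactly this recursion tree.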
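The loop in points 3a-3e can be sketched as follows. All names and hyperparameters here are illustrative (a noiseless linear model standing in for "some model", a Python list standing in for out-of-core data), not anything from the thread:

```python
import random

random.seed(0)
data = [(x, 3.0 * x) for x in range(1000)]    # stand-in dataset; true w = 3.0

w = 0.0            # 3a: model parameter(s)
lr = 1e-7          # learning rate (assumed hyperparameter)
for step in range(2000):                      # 3e: repeat
    batch = random.sample(data, 32)           # 3c: random minibatch
    # 3d: evaluate model, compute gradient of mean squared error, descend
    grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
    w -= lr * grad

print(round(w, 2))  # -> 3.0
```

Everything in 3d is dense tensor arithmetic over the batch, which is why the question of which tensor ranks the target (cuda/cudnn) supports matters.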
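Point 2 of the quoted message names the axes; for the dim-4 case, the dense row-major (N, C, H, W) layout has the flat offset below. A sketch only -- the helper name is mine, not a cudnn API:

```python
def nchw_index(n, c, h, w, C, H, W):
    # Flat offset of element (n, c, h, w) in a dense row-major
    # N x C x H x W tensor (the dim-4 layout cudnn focuses on).
    return ((n * C + c) * H + h) * W + w

# 10,000 images x 10 channels x 100 x 100 = 10^9 floats, as in point 3
print(nchw_index(9999, 9, 99, 99, 10, 100, 100))  # -> 999999999
```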
