Hi Roger,

  1. I'm still new to reading J. Is this basically: multiply two
polynomials p and q of degree N = 2^n - 1?

  Brute force = O(N^2) ops.

  Using the DFT, we can evaluate p(1), p(w), ..., p(w^(N-1)) in O(N log N). [***]
  Thus, we get p(w^i)*q(w^i) for 0 <= i < N in O(N log N).
  Using one more O(N log N) pass (the inverse DFT), we recover the
coefficients of p*q. (Strictly, p and q must first be zero-padded so the
number of sample points exceeds deg(p*q); otherwise the recovered
coefficients wrap around, i.e., we get the circular convolution.)
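
  For concreteness, here is a minimal sketch of those three steps in
Python/numpy (my illustration, not taken from the FFT essay; the name
poly_mul_fft and the test vectors are made up):

    import numpy as np

    def poly_mul_fft(p, q):
        m = len(p) + len(q) - 1           # coefficient count of p*q
        size = 1 << (m - 1).bit_length()  # next power of two >= m (zero-pad)
        P = np.fft.fft(p, size)           # evaluate p at the roots of unity
        Q = np.fft.fft(q, size)           # evaluate q at the same points
        return np.fft.ifft(P * Q)[:m].real  # pointwise multiply, invert, trim

    p = np.array([1.0, 2.0, 3.0, 4.0])    # 1 + 2x + 3x^2 + 4x^3
    q = np.array([5.0, 6.0, 7.0, 8.0])
    print(np.round(poly_mul_fft(p, q), 6))  # matches np.convolve(p, q)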

  [***] The computation of p(1), p(w), ..., p(w^(N-1)) can be formulated as
a specialized matrix-vector multiply. This (2k x 2k) * 2k MV can be reduced
to two (k x k) * k MVs plus some post-processing work. That recursion is
where the O(N log N) comes from.
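
  To make the reduction in [***] concrete, here is a sketch of the
recursive split in Python/numpy (again just an illustration, not the J
essay's formulation): the length-2k DFT splits into two length-k DFTs on
the even- and odd-indexed coefficients, plus O(k) combine work, so
T(N) = 2T(N/2) + O(N) = O(N log N).

    import numpy as np

    def fft_rec(x):
        N = len(x)                   # length assumed to be a power of two
        if N == 1:
            return x.astype(complex)
        even = fft_rec(x[0::2])      # first (k x k) * k MV, k = N/2
        odd = fft_rec(x[1::2])       # second (k x k) * k MV
        w = np.exp(-2j * np.pi * np.arange(N // 2) / N)  # twiddle factors
        return np.concatenate([even + w * odd, even - w * odd])  # combine

    x = np.arange(8, dtype=float)
    print(np.allclose(fft_rec(x), np.fft.fft(x)))  # True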

  Is this specialized MV multiply where the "rank n binary hypercube" comes
in, i.e., as J's way of formulating the O(N log N) algorithm?

  If this is the case, it appears the rank-n binary hypercube is cute but
unnecessary (and offers no additional parallelism), as it's encoding the
recursive specialized MV.


  2. I'm a big fan of your book AIOJ. Thanks for writing it!


  3. Most of the work I'm interested in is of the form (a code sketch
follows the list):

  3a. We have some model, with some parameters, say 1 GB worth of
parameters.
  3b. We have lots of data, too much to fit in GPU memory (or even CPU
memory).
  3c. We randomly sample a batch from the data.
  3d. We evaluate the model + calculate derivatives + do some variant of
gradient descent.
  3e. Repeat.
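
  A rough sketch of that loop in Python/numpy (the linear model, batch
size, and learning rate are hypothetical stand-ins, just to pin down the
shape of 3a-3e):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100_000, 10))       # 3b: the full dataset
    true_w = rng.normal(size=10)
    y = X @ true_w + 0.01 * rng.normal(size=100_000)

    w = np.zeros(10)                         # 3a: model parameters
    lr, batch_size = 0.1, 64
    for step in range(1_000):                # 3e: repeat
        idx = rng.integers(0, len(X), batch_size)  # 3c: random batch
        Xb, yb = X[idx], y[idx]
        err = Xb @ w - yb                    # 3d: evaluate model
        grad = Xb.T @ err / batch_size       # 3d: derivatives
        w -= lr * grad                       # 3d: gradient step

    print(np.allclose(w, true_w, atol=0.05))  # recovers the parameters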


  4. cudnn:
http://docs.nvidia.com/deeplearning/sdk/cudnn-developer-guide/index.html#four-D-tensor-descriptor
only supports tensors of dimension 3 through 8, with a focus on 4-D and 5-D
tensors.


  5. I'm trying to find a subset of J to target cuda/cudnn. Deciding to
support only dim-4 tensors would dramatically simplify targeting cuda/cudnn
functions, but it's not clear to me how much of the power/expressiveness of
J (with regard to numerical optimization of the above form) would be lost
in doing so.


Thanks,
--TongKe


On Fri, Dec 29, 2017 at 11:15 PM, Roger Hui <[email protected]>
wrote:

> I don't know how often, but there is an application of higher-ranked
> arrays: FFT.  http://code.jsoftware.com/wiki/Essays/FFT .  For argument
> vectors of length 2^n, the algorithm creates rank n binary hypercubes,
> shape n$2.  (Subfunction "cube".)
>
>
>
> On Fri, Dec 29, 2017 at 8:10 PM, TongKe Xue <[email protected]> wrote:
>
> > Hi,
> >
> >
> >   1. I agree that the concept of rank + cell + frames + implicit
> > parallelism is very cool.
> >
> >
> >   2. 5 dimensions are commonly defined as N, C, D, H, W:
> >   N = number of images
> >   C = channels
> >   D = depth
> >   H = height
> >   W = width
> >
> >   In practice, how often do we use algorithms with > 5 dim tensors?
> >
> >
> >   3. In particular, in the case of GPU, if we consider a float array with
> > 10,000 image samples, 10 features, each being 100x100, we're already at:
> >
> > 10^(4+1+2+2) = 10^9 floats. At 4 bytes/float, we're at 4GB already, on
> > consumer video cards that generally have <= 12GB.
> >
> >   Going back to the original question -- for numerical computing in J,
> how
> > often do we use algorithms with > 5 dimension tensors?
> >
> >
> > Thanks,
> > --TongKe