Actually, I think matrix multiplication IS exceptional: at least it's different from vector addition or multiplication. The operation takes O(n^3) arithmetic operations to produce a result of O(n^2) atoms, so even if the data transfer is relatively expensive, a big reduction in time spent on arithmetic may outweigh the cost of data management. For n>100 I'd expect the GPU to be a winner.
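To make the O(n^3)-versus-O(n^2) argument concrete, here is a back-of-envelope count of arithmetic operations per array element transferred. It assumes the naive triple-loop algorithm and counts each input and output element as moved exactly once (no caching, no reuse of data already on the card). It is a rough model, not a measurement, and the function names are made up for illustration.

```python
def matmul_ops_per_element(n: int) -> float:
    """Arithmetic ops per array element transferred for an n x n product.

    Naive algorithm: n^3 multiplies plus n^2 * (n - 1) adds.
    Transfer: two n x n inputs to the GPU, one n x n result back.
    """
    ops = n**3 + n**2 * (n - 1)
    elements_moved = 3 * n**2
    return ops / elements_moved


def vector_add_ops_per_element(n: int) -> float:
    """Same ratio for elementwise addition of two length-n vectors."""
    ops = n                 # one add per result element
    elements_moved = 3 * n  # two inputs in, one result out
    return ops / elements_moved


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, matmul_ops_per_element(n), vector_add_ops_per_element(n))
```

The matrix-product ratio works out to (2n-1)/3, so it grows linearly with n (about 66 at n=100), while vector addition stays at 1/3 however long the vectors are.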

Henry Rich


On 6/21/2016 9:43 PM, bill lam wrote:
The main difficulty in using a GPU is memory: not just memory
bandwidth, but also how to pipe data into the GPU and how to
fine-tune the block size so that memory references stay localized
within each core. Matrix multiplication is no exception.
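The block-size point can be illustrated on the CPU with blocked (tiled) matrix multiplication. This is a minimal Python sketch; the function name and the default block size of 32 are arbitrary choices for illustration, not anything ArrayFire or Futhark uses. On a GPU the same idea is applied with tiles staged in fast on-chip memory, and picking the tile size is the tuning step described above.

```python
def blocked_matmul(a, b, block=32):
    """Multiply square matrices a and b (lists of lists) tile by tile.

    Each (ii, kk, jj) step only touches block x block tiles of a, b and c,
    which is the locality that block-size tuning is after.
    """
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, block):          # tile rows of c
        for kk in range(0, n, block):      # tile of the shared dimension
            for jj in range(0, n, block):  # tile columns of c
                for i in range(ii, min(ii + block, n)):
                    row_a = a[i]
                    row_c = c[i]
                    for k in range(kk, min(kk + block, n)):
                        aik = row_a[k]
                        row_b = b[k]
                        for j in range(jj, min(jj + block, n)):
                            row_c[j] += aik * row_b[j]
    return c
```

Setting `block` to n or more reduces this to the plain triple loop. The sketch only shows the access pattern; don't expect a speed-up from it in pure Python.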

Wed, 22 Jun 2016, JGeneral wrote:
In my tests with ArrayFire (bindings here:
https://github.com/Pascal-J/Jfire), what I found annoying was the JIT
compilation step. I think Futhark does away with this step, or at least
provides a saveable version.

All recent Intel/AMD chips have decent built-in GPUs with low latency.

Even with faster dedicated cards, though, you can keep data and results on
the card if there is further processing to do.

Matrix multiplication and similar tasks are 10x to 100x faster (IIRC),
including the round trip back to the CPU.




----- Original Message -----
From: bill lam <[email protected]>
To: 'Pascal Jasmin' via General <[email protected]>
Sent: Tuesday, June 21, 2016 8:26 PM
Subject: Re: [Jgeneral] GPU APL compiler work

IMO the benefit of using a GPU to implement APL (and J) primitives
is questionable. Most primitives are simple, and the efficiency
of APL/J comes from processing large arrays. The time needed to
read/write GPU memory for a large array is not justified
unless the job is highly looped, e.g. JPEG encoding/decoding.
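The trade-off can be written as a crude break-even test: ship the data over, compute on the GPU, and compare with computing on the CPU. Everything here is an assumption for illustration: the function name is hypothetical, the default rates (10 GB/s link, 10 Gop/s CPU, 1 Top/s GPU) are round placeholder numbers rather than measurements, and the model ignores latency, kernel-launch overhead and the option of keeping data resident on the card.

```python
def gpu_pays_off(ops, elements, bytes_per_element=8,
                 link_bytes_per_s=10e9,   # hypothetical host<->GPU bandwidth
                 cpu_ops_per_s=10e9,      # hypothetical CPU throughput
                 gpu_ops_per_s=1000e9):   # hypothetical GPU throughput
    """Return True if transfer + GPU compute beats CPU compute alone.

    ops: arithmetic operations the job performs.
    elements: array elements sent to the GPU plus elements sent back.
    """
    transfer_s = elements * bytes_per_element / link_bytes_per_s
    gpu_s = transfer_s + ops / gpu_ops_per_s
    cpu_s = ops / cpu_ops_per_s
    return gpu_s < cpu_s
```

Under these placeholder numbers a simple primitive such as adding two 10^8-element vectors (`ops=10**8, elements=3 * 10**8`) loses, because one add per element cannot pay for moving the element; a highly looped job doing about 1000 operations per element on 10^6 elements wins, and so does a 1000 x 1000 matrix product.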


Mon, 20 Jun 2016, JGeneral wrote:
Interesting recent projects:

TAIL - typed array intermediate language
http://www.elsman.com/pdf/array14_final.pdf

It uses structures very similar to J's internal noun format (all of the items
are the same, anyway, though it may have only int and double data types).

Semantics for core operations are similar to J's (e.g. take with a negative
argument takes from the end).
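For readers who don't know J, this is a minimal Python sketch of J's take (`{.`) on a one-dimensional list. The name `take` and the `fill` parameter are illustrative; TAIL's own definition, as given in the paper, may differ in details such as padding when more items are requested than exist.

```python
def take(count, items, fill=0):
    """J-style take on a 1-D list.

    A positive count takes from the front, a negative count from the end.
    Asking for more items than exist pads with fill (J's "overtake").
    """
    n = len(items)
    k = abs(count)
    if k <= n:
        return items[:k] if count >= 0 else items[n - k:]
    pad = [fill] * (k - n)
    return items + pad if count >= 0 else pad + items
```

So `take(-2, [1, 2, 3, 4])` gives `[3, 4]`, matching J's `_2 {. 1 2 3 4`, and `take(-5, [1, 2, 3])` gives `[0, 0, 1, 2, 3]`.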


TAIL is used with an APL-to-TAIL compiler written in SML:

https://github.com/melsman/apltail/

A more interesting project is the Futhark language, which leverages the
above two projects to target GPUs and extends the data types to char, bool,
and tuples.

Futhark feels higher level and cleaner than TAIL.


Spec paper: http://futhark-lang.org/publications/fhpc16.pdf

More general overview/benchmark/example site:

http://futhark-lang.org/index.html

Pretty much every link there is interesting.
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
--
regards,
====================================================
GPG key 1024D/4434BAB3 2008-08-24
gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3
gpg --keyserver subkeys.pgp.net --armor --export 4434BAB3
