Hi Deron,

Good points. I vote that we keep JCuda (and any other accelerators we add) as an external dependency. This means the user will have to ensure that JCuda.jar is on the classpath and that JCuda.dll/JCuda.so is on the LD_LIBRARY_PATH.
I don't think JCuda.jar is platform-specific.

Thanks,

Niketan Pansare
IBM Almaden Research Center
E-mail: npansar At us.ibm.com
http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar

From: Deron Eriksson <deroneriks...@gmail.com>
To: dev@systemml.incubator.apache.org
Date: 05/18/2016 10:51 AM
Subject: Re: Discussion on GPU backend

Hi,

I'm wondering what would be a good way to handle JCuda in terms of the build release packages. Currently we have 11 artifacts that we are building:

systemml-0.10.0-incubating-SNAPSHOT-inmemory.jar
systemml-0.10.0-incubating-SNAPSHOT-javadoc.jar
systemml-0.10.0-incubating-SNAPSHOT-sources.jar
systemml-0.10.0-incubating-SNAPSHOT-src.tar.gz
systemml-0.10.0-incubating-SNAPSHOT-src.zip
systemml-0.10.0-incubating-SNAPSHOT-standalone.jar
systemml-0.10.0-incubating-SNAPSHOT-standalone.tar.gz
systemml-0.10.0-incubating-SNAPSHOT-standalone.zip
systemml-0.10.0-incubating-SNAPSHOT.jar
systemml-0.10.0-incubating-SNAPSHOT.tar.gz
systemml-0.10.0-incubating-SNAPSHOT.zip

It looks like JCuda is platform-specific, so you typically need different jars/dlls/sos/etc. for each platform. If I'm understanding things correctly, if we generated Windows/Linux/LinuxPowerPC/MacOS-specific SystemML artifacts for JCuda, we'd potentially have an enormous number of artifacts.

Could this be handled by platform-specific profiles in the pom, so that a user could run something like "mvn clean package -P jcuda-windows" and be responsible for building the platform-specific SystemML jar for JCuda? Or could it be handled differently, by putting the platform-specific JCuda jar on the classpath and any dlls or other needed libraries on the path?

Deron

On Tue, May 17, 2016 at 10:50 PM, Niketan Pansare <npan...@us.ibm.com> wrote:

> Hi Luciano,
>
> Like all our backends, there is no change in the programming model.
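[Editor's note: one possible shape for the pom-profile idea Deron raises above is sketched below. This is only an illustration; the profile ids, property names, artifact coordinates, and platform classifiers are hypothetical and are not the actual JCuda Maven coordinates.]

```xml
<!-- Hypothetical sketch: one Maven profile per target platform, selected
     with e.g. "mvn clean package -P jcuda-windows". Coordinates and
     classifiers are illustrative placeholders. -->
<profiles>
  <profile>
    <id>jcuda-windows</id>
    <dependencies>
      <dependency>
        <groupId>org.jcuda</groupId>
        <artifactId>jcuda</artifactId>
        <version>${jcuda.version}</version>
        <classifier>windows-x86_64</classifier>
      </dependency>
    </dependencies>
  </profile>
  <!-- analogous jcuda-linux, jcuda-linux-ppc64le, jcuda-macos profiles -->
</profiles>
```

Each profile would pull in only that platform's JCuda artifact, so the release itself stays platform-neutral and the user who needs GPU support builds the one platform-specific jar locally.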
> The user submits a DML script and specifies whether she wants to use an
> accelerator. Assuming that we compile the JCuda jars into SystemML.jar, the
> user can use the GPU backend with the following command:
>
> spark-submit --master yarn-client ... -f MyAlgo.dml -accelerator -exec hybrid_spark
>
> The user also needs to set LD_LIBRARY_PATH so that it points to the JCuda
> DLL or .so files. Please see https://issues.apache.org/jira/browse/SPARK-1720
> ... For example, the user can add the following to spark-env.sh:
>
> export LD_LIBRARY_PATH=<path to jcuda so>:$LD_LIBRARY_PATH
>
> The first version of the GPU backend will only accelerate CP. In this case,
> we have four types of instructions:
> 1. CP
> 2. GPU (requires a GPU on the driver)
> 3. SPARK
> 4. MR
>
> Note that the first version will require the CUDA/JCuda dependency to be
> installed on the driver only.
>
> The next version will accelerate our distributed instructions as well. In
> this case, we will have six types of instructions:
> 1. CP
> 2. GPU
> 3. SPARK
> 4. MR
> 5. SPARK-GPU (requires a GPU cluster)
> 6. MR-GPU (requires a GPU cluster)
>
> Thanks,
>
> Niketan Pansare
> IBM Almaden Research Center
> E-mail: npansar At us.ibm.com
> http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar
>
> From: Luciano Resende <luckbr1...@gmail.com>
> To: dev@systemml.incubator.apache.org
> Date: 05/17/2016 09:13 PM
> Subject: Re: Discussion on GPU backend
>
> Great to see detailed information on this topic Niketan, I guess I have
> missed when you posted it initially.
> Could you elaborate a little more on what the programming model is when the
> user wants to leverage the GPU? Also, today I can submit a job to Spark
> using --jars and it will handle copying the dependencies to the worker
> nodes. If my application wants to leverage the GPU, what extra dependencies
> will be required on the worker nodes, and how are they going to be
> installed/updated on the Spark cluster?
>
> On Tue, May 3, 2016 at 1:26 PM, Niketan Pansare <npan...@us.ibm.com> wrote:
>
> > Hi all,
> >
> > I have updated the design document for our GPU backend in the JIRA
> > https://issues.apache.org/jira/browse/SYSTEMML-445. The implementation
> > details are based on the prototype I created, which is available in PR
> > https://github.com/apache/incubator-systemml/pull/131. Once we are done
> > with the discussion, I can clean up and separate out the GPU backend into
> > a separate PR for easier review :)
> >
> > Here are the key design points:
> >
> > A GPU backend would implement two abstract classes:
> > 1. GPUContext
> > 2. GPUObject
> >
> > The GPUContext is responsible for GPU memory management and gets
> > callbacks from SystemML's bufferpool on the following methods:
> > 1. void acquireRead(MatrixObject mo)
> > 2. void acquireModify(MatrixObject mo)
> > 3. void release(MatrixObject mo, boolean isGPUCopyModified)
> > 4. void exportData(MatrixObject mo)
> > 5. void evict(MatrixObject mo)
> >
> > A GPUObject (like RDDObject and BroadcastObject) is stored in a
> > CacheableData object. It contains the following methods, which are called
> > back from the corresponding GPUContext:
> > 1. void allocateMemoryOnDevice()
> > 2. void deallocateMemoryOnDevice()
> > 3. long getSizeOnDevice()
> > 4. void copyFromHostToDevice()
> > 5. void copyFromDeviceToHost()
> >
> > In the initial implementation, we will add JCudaContext and JCudaPointer,
> > which will extend the above abstract classes, respectively.
> > The JCudaContext will be created by ExecutionContextFactory depending on
> > the user-specified accelerator. Analogous to MR/SPARK/CP, we will add a
> > new ExecType, GPU, and implement GPU instructions.
> >
> > The above design is general enough that other people can implement custom
> > accelerators (for example, OpenCL), and it also follows the design
> > principles of our CP bufferpool.
> >
> > Thanks,
> >
> > Niketan Pansare
> > IBM Almaden Research Center
> > E-mail: npansar At us.ibm.com
> > http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar
>
> --
> Luciano Resende
> http://twitter.com/lresende1975
> http://lresende.blogspot.com/
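[Editor's note: the two abstract classes discussed in the thread can be sketched roughly as follows. MatrixObject is stubbed, and the toy subclass (the `allocated` flag, the dense-double size accounting) is illustrative only; the actual SystemML classes differ.]

```java
// Sketch of the GPUContext/GPUObject split described in the design above.

class MatrixObject {
    final long rows, cols;
    MatrixObject(long rows, long cols) { this.rows = rows; this.cols = cols; }
}

// Owns GPU memory management; gets callbacks from SystemML's bufferpool.
abstract class GPUContext {
    abstract void acquireRead(MatrixObject mo);
    abstract void acquireModify(MatrixObject mo);
    abstract void release(MatrixObject mo, boolean isGPUCopyModified);
    abstract void exportData(MatrixObject mo);
    abstract void evict(MatrixObject mo);
}

// Per-matrix device handle (analogous to RDDObject/BroadcastObject),
// called back from the corresponding GPUContext.
abstract class GPUObject {
    abstract void allocateMemoryOnDevice();
    abstract void deallocateMemoryOnDevice();
    abstract long getSizeOnDevice();
    abstract void copyFromHostToDevice();
    abstract void copyFromDeviceToHost();
}

// Toy stand-in for a JCuda-backed GPUObject: tracks allocation state and
// reports the dense double-precision size instead of calling cudaMalloc.
class ToyGPUObject extends GPUObject {
    private final MatrixObject mo;
    boolean allocated = false;
    ToyGPUObject(MatrixObject mo) { this.mo = mo; }
    void allocateMemoryOnDevice()   { allocated = true; }            // would call cudaMalloc
    void deallocateMemoryOnDevice() { allocated = false; }           // would call cudaFree
    long getSizeOnDevice()          { return mo.rows * mo.cols * 8; } // dense doubles
    void copyFromHostToDevice()     { /* would copy host -> device */ }
    void copyFromDeviceToHost()     { /* would copy device -> host */ }
}

public class GpuBackendSketch {
    public static void main(String[] args) {
        ToyGPUObject obj = new ToyGPUObject(new MatrixObject(1000, 1000));
        obj.allocateMemoryOnDevice();
        obj.copyFromHostToDevice();
        System.out.println(obj.getSizeOnDevice()); // 8000000 bytes on "device"
        obj.deallocateMemoryOnDevice();
    }
}
```

The point of the split is that the bufferpool only talks to the GPUContext, while each matrix carries its own GPUObject handle, so an OpenCL backend could plug in by providing a different pair of subclasses.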