[ 
https://issues.apache.org/jira/browse/SPARK-3785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14163712#comment-14163712
 ] 

RJ Nowling commented on SPARK-3785:
-----------------------------------

Part of my graduate work involved implementing physics simulations on GPUs and 
managing multi-user GPU clusters.  

From a performance perspective, we saw 100x+ speed-ups on a single machine 
with a GPU vs. multiple cores using specialized GPU implementations such as 
OpenMM or Gromacs.  But this was with hand-optimized GPU implementations that 
were pipelined to prevent unnecessary host/GPU copies and do as much work as 
possible on the GPU.

For clusters, we'd see only a 2-5x speed-up because of communication overhead 
between the host and GPU and between nodes.  In those cases, you could only run 
a few iterations on the GPU before you had to communicate with other nodes.

Thus, GPUs are great if you're doing computation that runs in hand-optimized 
GPU implementations for long periods of time before communicating outside the 
GPU.  But I don't think you'll get much of a performance improvement from 
simple operations (like RDD operations) without explicit (and challenging) 
pipeline-optimization work.

I think the most practical case for Spark/GPU integration is jobs involving 
large chunks of image processing, rendering, linear algebra, etc. that can be 
done independently in each task.  For example, Naive Bayes where the feature 
set fits on the GPUs of a single node but there are many, many samples to 
classify.  In that case, you may be able to use a GPU linear-algebra library 
for the GPU operations and move data asynchronously and in large chunks to 
reduce the performance issues.
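To make the batched pattern concrete, here is a minimal sketch in plain Python 
of scoring Naive Bayes samples as one matrix multiply per large batch -- the 
kind of operation a GPU BLAS library could accelerate.  NumPy stands in for the 
GPU library, and the function name and batch size are illustrative, not from 
any Spark or GPU API:

```python
import numpy as np

def nb_classify_batched(X, log_theta, log_prior, batch_size=1024):
    """Multinomial Naive Bayes scoring in large batches.

    X          : (n_samples, n_features) count matrix
    log_theta  : (n_classes, n_features) log feature likelihoods
    log_prior  : (n_classes,) log class priors
    """
    preds = []
    for start in range(0, X.shape[0], batch_size):
        chunk = X[start:start + batch_size]
        # One big matrix multiply per batch: log P(c|x) up to a constant.
        # On a GPU this is where a cuBLAS/OpenCL GEMM would run.
        scores = chunk @ log_theta.T + log_prior
        preds.append(scores.argmax(axis=1))
    return np.concatenate(preds)
```

The point is that the per-sample work is folded into a single dense operation 
per batch, so host/device transfers happen in large chunks rather than once per 
record.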

Further, GPU scheduling is immature.  There is very little isolation, GPUs 
often get into bad states that require machine reboots, and there is no OS 
support, so scheduling is mostly done by each application.  It's like MacOS 9 
-- you have to hope each process is a responsible citizen.  I think that would 
end up being a huge distraction for Spark's developers.

I think [~srowen]'s point about calling GPU libraries from your Spark driver is 
probably the most practical solution.


> Support off-loading computations to a GPU
> -----------------------------------------
>
>                 Key: SPARK-3785
>                 URL: https://issues.apache.org/jira/browse/SPARK-3785
>             Project: Spark
>          Issue Type: Brainstorming
>          Components: MLlib
>            Reporter: Thomas Darimont
>            Priority: Minor
>
> Are there any plans to adding support for off-loading computations to the 
> GPU, e.g. via an open-cl binding? 
> http://www.jocl.org/
> https://code.google.com/p/javacl/
> http://lwjgl.org/wiki/index.php?title=OpenCL_in_LWJGL



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
