Unfortunately, the answer you got from the forum is correct.  The current spark-rapids package does not accelerate RDD operations; it only accelerates Spark SQL and DataFrame operations.  Please see https://nvidia.github.io/spark-rapids/docs/FAQ.html#what-parts-of-apache-spark-are-accelerated

To use spark-rapids, one option would be to convert Hail to use the DataFrame API instead of the RDD API.  Hope this helps...
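To make the RDD vs. DataFrame difference concrete, here is a minimal PySpark sketch, not Hail code.  The function names and the (key, value) rows are hypothetical, made up for illustration.  The two configuration keys are the ones spark-rapids documents for loading the plugin; I am assuming the plugin jar is already on the classpath, and the jar path is deployment-specific so it is not shown.

```python
# Settings that load the RAPIDS Accelerator plugin (assumes the plugin jar
# is already on the driver/executor classpath; that part is not shown here).
RAPIDS_CONF = {
    "spark.plugins": "com.nvidia.spark.SQLPlugin",
    "spark.rapids.sql.enabled": "true",
}


def build_session(app_name="rapids-sketch"):
    """Create a SparkSession with the plugin settings applied (hypothetical helper)."""
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in RAPIDS_CONF.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


def mean_by_key_rdd(sc, rows):
    """RDD version: Python lambdas over an RDD, which the plugin does not accelerate."""
    pairs = sc.parallelize(rows).mapValues(lambda v: (v, 1))
    sums = pairs.reduceByKey(lambda a, b: (a[0] + b[0], a[1] + b[1]))
    return dict(sums.mapValues(lambda s: s[0] / s[1]).collect())


def mean_by_key_df(spark, rows):
    """DataFrame version: a SQL plan that the plugin can consider for the GPU."""
    from pyspark.sql import functions as F

    df = spark.createDataFrame(rows, ["key", "value"])
    out = df.groupBy("key").agg(F.avg("value").alias("mean"))
    return {r["key"]: r["mean"] for r in out.collect()}
```

Both functions compute the same per-key mean; only the second goes through the SQL planner where the plugin hooks in.  Calling explain() on the DataFrame before collecting should show whether the plan was actually placed on the GPU, though that depends on the data types and operators involved.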

-- ND

On 9/21/21 1:38 PM, Abhishek Shakya wrote:


I am currently trying to run genomic analysis pipelines using Hail (a library for genomic analysis written in Python and Scala). Recently, Apache Spark 3 was released, and it supports GPU usage.

I tried using the spark-rapids library to start an on-premise Slurm cluster with GPU nodes. I was able to initialise the cluster. However, when I tried running Hail tasks, the executors kept getting killed.

When I asked on the Hail forum, I got this response:

"That’s a GPU code generator for Spark-SQL, and Hail doesn’t use any Spark-SQL interfaces, only the RDD interfaces."

So, does Spark 3 not support GPU usage for RDD interfaces?

PS: The question is also posted on Stack Overflow: <https://stackoverflow.com/questions/69273205/does-apache-spark-3-support-gpu-usage-for-spark-rdds>


Abhishek Shakya
Senior Data Scientist 1,
Contact: +919002319890 | Email ID: abhishek.sha...@aganitha.ai <mailto:abhishek.sha...@aganitha.ai>
Aganitha Cognitive Solutions <https://aganitha.ai/>
