Unfortunately, the answer you got from the forum is correct. The current
spark-rapids package does not accelerate RDD operations; it only
accelerates the Spark SQL and DataFrame APIs. Please see
https://nvidia.github.io/spark-rapids/docs/FAQ.html#what-parts-of-apache-spark-are-accelerated
To use spark-rapids, one option would be to convert Hail to use the
DataFrame API instead of RDDs. Hope this helps...
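To give a rough idea of what "use the DataFrame API" means in practice,
here is a minimal PySpark sketch. It is not Hail code and says nothing about
how Hail is structured internally. The helper names (rapids_conf,
mean_depth_by_sample, run_demo), the column names, and the sample rows are
made up for illustration. It also assumes the spark-rapids jar is already on
the driver and executor classpath.

```python
# Sketch only: helper names, column names, and sample rows are hypothetical.
from typing import Dict


def rapids_conf() -> Dict[str, str]:
    """Spark settings that load the RAPIDS SQL plugin (jar must be on the classpath)."""
    return {
        "spark.plugins": "com.nvidia.spark.SQLPlugin",
        "spark.rapids.sql.enabled": "true",
    }


def mean_depth_by_sample(spark, rows):
    """DataFrame version of an aggregation one might otherwise write with
    rdd.map(...).reduceByKey(...). Only this form is visible to the SQL plugin."""
    from pyspark.sql import functions as F  # lazy import; requires pyspark

    df = spark.createDataFrame(rows, ["sample", "depth"])
    return df.groupBy("sample").agg(F.avg("depth").alias("mean_depth"))


def run_demo():
    """Build a session with the plugin settings and print the physical plan."""
    from pyspark.sql import SparkSession  # lazy import; requires pyspark

    builder = SparkSession.builder.appName("rapids-dataframe-sketch")
    for key, value in rapids_conf().items():
        builder = builder.config(key, value)
    spark = builder.getOrCreate()

    rows = [("s1", 30.0), ("s1", 34.0), ("s2", 28.0)]
    # explain() prints the physical plan, which shows where each operator runs.
    mean_depth_by_sample(spark, rows).explain()
    spark.stop()
```

Calling run_demo() on a cluster where the plugin loads should print a plan
you can inspect to see which operators, if any, were placed on the GPU. The
same aggregation written against the RDD API would not be touched by the
plugin.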
-- ND
On 9/21/21 1:38 PM, Abhishek Shakya wrote:
Hi,
I am currently trying to run genomic analysis pipelines using
Hail (a library for genomic analysis written in Python and Scala).
Apache Spark 3 was released recently, and it supports GPU usage.
I tried the spark-rapids library to start an on-premise Slurm cluster with
GPU nodes. I was able to initialise the cluster. However, when I tried
running Hail tasks, the executors kept getting killed.
When I asked on the Hail forum, I got this response:
"That’s a GPU code generator for Spark-SQL, and Hail doesn’t use any
Spark-SQL interfaces, only the RDD interfaces."
So, does Spark 3 not support GPU usage for RDD interfaces?
PS: The question is also posted on Stack Overflow:
<https://stackoverflow.com/questions/69273205/does-apache-spark-3-support-gpu-usage-for-spark-rdds>
Regards,
-----------------------------
Abhishek Shakya
Senior Data Scientist 1,
Contact: +919002319890 | Email ID: abhishek.sha...@aganitha.ai
<mailto:abhishek.sha...@aganitha.ai>
Aganitha Cognitive Solutions <https://aganitha.ai/>