Re: GPU Acceleration for spark-3.0.0

2020-06-17 Thread charles_cai
Bobby, thanks for your answer. It seems that I misunderstood this paragraph on the website: *"GPU-accelerate your Apache Spark 3.0 data science pipelines—without code changes—and speed up data processing and model training while substantially lowering infrastructure costs."* So if I am

GPU Acceleration for spark-3.0.0

2020-06-12 Thread charles_cai
Hi, I have configured GPU scheduling for spark-3.0.0 on YARN following the official documentation, but the job does not seem to be running on the GPU. Do I need to modify my code to invoke CUDA? Is there a tutorial that can be shared? Running logs: ... 2020-06-13 10:58:01,938 INFO spark.SparkContext: Running

Re: NoClassDefFoundError: scala/Product$class

2020-06-07 Thread charles_cai
org.bdgenomics.adam is one of the components of GATK, and I just downloaded the release version from its GitHub page. However, when I build a new Docker image with Spark 2.4.5 and Scala 2.12.4, it works well, which confuses me. root@master2:~# pyspark Python 2.7.17 (default,

Re: NoClassDefFoundError: scala/Product$class

2020-06-05 Thread charles_cai
Hi Pol, thanks for your suggestion. I am going to use Spark-3.0.0 for GPU acceleration, so I updated Scala to *version 2.12.11* and to the latest *2.13*, but the error is still there. By the way, the Spark version is *spark-3.0.0-preview2-bin-without-hadoop*. Caused by:

NoClassDefFoundError: scala/Product$class

2020-06-02 Thread charles_cai
Hi, I ran GATK MarkDuplicates in Spark mode and it throws a *NoClassDefFoundError: scala/Product$class*. The GATK versions are 4.1.7 and 4.0.0; the environment is spark-3.0.0, scala-2.11.12. *GATK commands:* gatk MarkDuplicatesSpark \ -I