Bobby
Thanks for your answer; it seems that I misunderstood this paragraph on
the website: *"GPU-accelerate your Apache Spark 3.0 data science
pipelines—without code changes—and speed up data processing and model
training while substantially lowering infrastructure costs."* So if I am
Hi,
I have configured GPU scheduling for spark-3.0.0 on YARN following the
official documentation, but the job does not seem to be running on the GPU.
Do I need to modify my code to invoke CUDA? Is there a tutorial that can be shared?
Running logs:
...
2020-06-13 10:58:01,938 INFO spark.SparkContext: Running
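For what it's worth, with the RAPIDS Accelerator the SQL/DataFrame code itself does not need changes, but the job must be submitted with the plugin and GPU resource configs enabled; otherwise it silently runs on CPU. A minimal sketch of a submission on YARN (the jar names, paths, and resource amounts below are example assumptions, not your exact setup):

```shell
# Sketch: enabling the RAPIDS Accelerator plugin when submitting to YARN.
# Jar names/paths and the GPU amounts are placeholders for illustration.
spark-submit \
  --master yarn \
  --conf spark.plugins=com.nvidia.spark.SQLPlugin \
  --conf spark.executor.resource.gpu.amount=1 \
  --conf spark.task.resource.gpu.amount=0.25 \
  --conf spark.executor.resource.gpu.discoveryScript=/opt/spark/getGpusResources.sh \
  --jars rapids-4-spark.jar,cudf.jar \
  your_app.py
```

If the plugin is active, the SQL UI / explain output shows `Gpu`-prefixed operators; plain YARN GPU scheduling alone only reserves the device, it does not accelerate the query.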
org.bdgenomics.adam is one of the components of GATK, and I just
downloaded the release version from its GitHub website. However, when I build
a new Docker image with spark2.4.5 and scala 2.12.4, it works well, which
confuses me.
root@master2:~# pyspark
Python 2.7.17 (default,
Hi Pol,
Thanks for your suggestion. I am going to use Spark-3.0.0 for GPU
acceleration, so I updated Scala to *version 2.12.11* and to the latest
*2.13*, but the error is still there. By the way, the Spark version is
*spark-3.0.0-preview2-bin-without-hadoop*.
Caused by:
Hi,
I run GATK MarkDuplicates in Spark mode and it throws a
*NoClassDefFoundError: scala/Product$class*. The GATK versions are 4.1.7 and
4.0.0; the environment is: spark-3.0.0, scala-2.11.12
*GATK commands:*
gatk MarkDuplicatesSpark \
-I