Re: GPU Acceleration for spark-3.0.0

2020-06-17 Thread charles_cai
Bobby

Thanks for your answer; it seems I misunderstood this paragraph on the website: *"GPU-accelerate your Apache Spark 3.0 data science
pipelines—without code changes—and speed up data processing and model
training while substantially lowering infrastructure costs."* So if I want to use GPUs in a job running on Spark, I still need to write the map and reduce functions in CUDA or C++ and then invoke them through JNI or something like GPUEnabler, is that right?
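(As I understand it now, the "without code changes" claim refers to the RAPIDS Accelerator plugin, which rewrites Spark SQL/DataFrame operations to run on the GPU; plain RDD map/reduce code is not accelerated automatically. A rough sketch of the submit-time configuration, where the jar names and versions are illustrative assumptions, not tested values:)

```shell
# Sketch: enabling the RAPIDS Accelerator for Apache Spark.
# Only SQL/DataFrame operations are offloaded; RDD code runs on CPU.
# Jar file names/versions below are placeholders, not verified values.
spark-submit \
  --master yarn \
  --conf spark.plugins=com.nvidia.spark.SQLPlugin \
  --conf spark.executor.resource.gpu.amount=1 \
  --conf spark.task.resource.gpu.amount=1 \
  --jars rapids-4-spark_2.12-<version>.jar,cudf-<version>.jar \
  my_job.py
```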

thanks
Charles



--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



GPU Acceleration for spark-3.0.0

2020-06-12 Thread charles_cai
Hi,

I have configured GPU scheduling for spark-3.0.0 on YARN following the official documentation, but the job does not seem to run on the GPU. Do I need to modify my code to invoke CUDA? Is there a tutorial that can be shared?

running logs:
...
2020-06-13 10:58:01,938 INFO spark.SparkContext: Running Spark version 3.0.0-preview2
2020-06-13 10:58:04,101 INFO resource.ResourceUtils: ==
2020-06-13 10:58:04,105 INFO resource.ResourceUtils: Resources for spark.driver:
gpu -> [name: gpu, addresses: 0]


spark-default.conf:
...
spark.executor.resource.gpu.amount          1
spark.worker.resource.gpu.amount            1
spark.driver.resource.gpu.amount            1
spark.driver.resource.gpu.discoveryScript   /usr/local/spark-3.0.0/examples/src/main/scripts/getGpusResources.sh
spark.worker.resource.gpu.discoveryScript   /usr/local/spark-3.0.0/examples/src/main/scripts/getGpusResources.sh
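(One thing worth checking: the spark.worker.* settings are only read by the standalone cluster manager, not by YARN. On YARN the executor-side equivalents, plus a per-task amount, are what matter. A sketch of the missing fragment, reusing the discovery-script path from this conf; treat it as an assumption to verify against the Spark 3.0 docs:)

```shell
# spark-defaults.conf fragment (sketch): executor-side GPU settings for YARN.
# spark.worker.* applies to standalone mode only.
spark.executor.resource.gpu.amount           1
spark.task.resource.gpu.amount               1
spark.executor.resource.gpu.discoveryScript  /usr/local/spark-3.0.0/examples/src/main/scripts/getGpusResources.sh
```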
 

nodemanager log:
...
2020-06-13 10:55:07,702 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.ResourcePluginManager: Found Resource plugins from configuration: [yarn.io/gpu]
2020-06-13 10:55:07,745 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.gpu.GpuDiscoverer: Trying to discover GPU information ...
2020-06-13 10:55:10,601 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.gpu.GpuDiscoverer: Discovered GPU information: === GPUs in the system ===
Driver Version:440.82
ProductName=GeForce GTX 950M, MinorNumber=0, TotalMemory=2004MiB, Utilization=2.0%


Thanks
charles






Re: NoClassDefFoundError: scala/Product$class

2020-06-07 Thread charles_cai
org.bdgenomics.adam is one of the components of GATK, and I just downloaded the release version from its GitHub website. However, when I build a new docker image with Spark 2.4.5 and Scala 2.12.4, it works well, and that confuses me.


root@master2:~# pyspark 
Python 2.7.17 (default, Apr 15 2020, 17:20:14) 
[GCC 7.5.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
20/06/08 01:44:16 WARN NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile:
org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use
setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.4.5
      /_/

Using Python version 2.7.17 (default, Apr 15 2020 17:20:14)
SparkSession available as 'spark'.


root@master2:~# scala -version
Scala code runner version 2.12.4 -- Copyright 2002-2017, LAMP/EPFL and
Lightbend, Inc.







Re: NoClassDefFoundError: scala/Product$class

2020-06-05 Thread charles_cai
Hi Pol,

Thanks for your suggestion. I am going to use Spark 3.0.0 for GPU acceleration, so I updated Scala to *version 2.12.11* and then to the latest *2.13*, but the error is still there. By the way, the Spark version is *spark-3.0.0-preview2-bin-without-hadoop*.

Caused by: java.lang.ClassNotFoundException: scala.Product$class
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
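(For context: `scala.Product$class` exists only in Scala 2.11 bytecode; from 2.12 on, traits are compiled differently, so this error always means a jar built for 2.11 is being loaded on a 2.12/2.13 classpath. Upgrading the Scala runner does not help; the jar itself must be a `_2.12` build. Scala artifacts encode their target version in the filename, so mismatches can be spotted by listing the suffixes. A self-contained demo with hypothetical jar names; in practice point it at `$SPARK_HOME/jars` or the GATK lib directory:)

```shell
# Demo directory with hypothetical jar names; replace with a real jars dir.
mkdir -p /tmp/jars_demo
touch /tmp/jars_demo/adam-core_2.11-0.28.0.jar \
      /tmp/jars_demo/spark-sql_2.12-3.0.0.jar

# List the distinct Scala-version suffixes on the "classpath".
# Seeing both _2.11 and _2.12 here is exactly this error's signature.
ls /tmp/jars_demo | grep -oE '_2\.[0-9]+' | sort -u
```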

Charles cai






NoClassDefFoundError: scala/Product$class

2020-06-02 Thread charles_cai
Hi,

I run GATK MarkDuplicates in Spark mode and it throws a *NoClassDefFoundError: scala/Product$class*. The GATK versions are 4.1.7 and 4.0.0; the environment is spark-3.0.0, scala-2.11.12.

*GATK commands:*

gatk MarkDuplicatesSpark \
-I hdfs://master2:9000/Drosophila/output/Drosophila.sorted.bam \
-O hdfs://master2:9000/Drosophila/output/Drosophila.sorted.markdup.bam \
-M hdfs://master2:9000/Drosophila/output/Drosophila.sorted.markdup_metrics.txt \
-- \
--spark-runner SPARK --spark-master spark://master2:7077
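(Note: Spark 3.0.0 binaries are built against Scala 2.12, while this environment pairs them with scala-2.11.12, and the GATK bundle ships `_2.11` ADAM jars; that mismatch is what the trace below points at. One way to confirm which Scala a Spark installation was compiled against, a sketch to run against your own install:)

```shell
# Print the Scala line the installed Spark was built with; every Scala
# dependency on the classpath must target the same 2.x line.
spark-submit --version 2>&1 | grep -i scala
```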

*error logs:*

Exception in thread "main" java.lang.NoClassDefFoundError: scala/Product$class
    at org.bdgenomics.adam.serialization.InputStreamWithDecoder.<init>(ADAMKryoRegistrator.scala:35)
    at org.bdgenomics.adam.serialization.AvroSerializer.<init>(ADAMKryoRegistrator.scala:45)
    at org.bdgenomics.adam.models.VariantContextSerializer.<init>(VariantContext.scala:94)
    at org.bdgenomics.adam.serialization.ADAMKryoRegistrator.registerClasses(ADAMKryoRegistrator.scala:179)
    at org.broadinstitute.hellbender.engine.spark.GATKRegistrator.registerClasses(GATKRegistrator.java:78)
    at org.apache.spark.serializer.KryoSerializer.$anonfun$newKryo$8(KryoSerializer.scala:170)
    at org.apache.spark.serializer.KryoSerializer.$anonfun$newKryo$8$adapted(KryoSerializer.scala:170)
    at scala.Option.foreach(Option.scala:407)
    at org.apache.spark.serializer.KryoSerializer.$anonfun$newKryo$5(KryoSerializer.scala:170)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    at org.apache.spark.util.Utils$.withContextClassLoader(Utils.scala:221)
    at org.apache.spark.serializer.KryoSerializer.newKryo(KryoSerializer.scala:161)
    at org.apache.spark.serializer.KryoSerializer$$anon$1.create(KryoSerializer.scala:102)
    at com.esotericsoftware.kryo.pool.KryoPoolQueueImpl.borrow(KryoPoolQueueImpl.java:48)
    at org.apache.spark.serializer.KryoSerializer$PoolWrapper.borrow(KryoSerializer.scala:109)
    at org.apache.spark.serializer.KryoSerializerInstance.borrowKryo(KryoSerializer.scala:336)
    at org.apache.spark.serializer.KryoSerializationStream.<init>(KryoSerializer.scala:256)
    at org.apache.spark.serializer.KryoSerializerInstance.serializeStream(KryoSerializer.scala:422)
    at org.apache.spark.broadcast.TorrentBroadcast$.blockifyObject(TorrentBroadcast.scala:309)
    at org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:137)
    at org.apache.spark.broadcast.TorrentBroadcast.<init>(TorrentBroadcast.scala:91)
    at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:35)
    at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:77)
    at org.apache.spark.SparkContext.broadcast(SparkContext.scala:1494)
    at org.apache.spark.rdd.NewHadoopRDD.<init>(NewHadoopRDD.scala:80)
    at org.apache.spark.SparkContext.$anonfun$newAPIHadoopFile$2(SparkContext.scala:1235)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
    at org.apache.spark.SparkContext.withScope(SparkContext.scala:771)
    at org.apache.spark.SparkContext.newAPIHadoopFile(SparkContext.scala:1221)
    at org.apache.spark.api.java.JavaSparkContext.newAPIHadoopFile(JavaSparkContext.scala:484)
    at org.broadinstitute.hellbender.engine.spark.datasources.ReadsSparkSource.getParallelReads(ReadsSparkSource.java:112)
    at org.broadinstitute.hellbender.engine.spark.GATKSparkTool.getUnfilteredReads(GATKSparkTool.java:254)
    at org.broadinstitute.hellbender.engine.spark.GATKSparkTool.getReads(GATKSparkTool.java:220)
    at org.broadinstitute.hellbender.tools.spark.transforms.markduplicates.MarkDuplicatesSpark.runTool(MarkDuplicatesSpark.java:72)
    at org.broadinstitute.hellbender.engine.spark.GATKSparkTool.runPipeline(GATKSparkTool.java:387)
    at org.broadinstitute.hellbender.engine.spark.SparkCommandLineProgram.doWork(SparkCommandLineProgram.java:30)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:136)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:179)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:198)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:152)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:195)
    at org.broadinstitute.hellbender.Main.main(Main.java:275)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at