I don't see anything in this job that would use a GPU?
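
One way to check where the plan actually runs is to have the RAPIDS plugin log its placement decisions. The sketch below assumes the plugin's spark.rapids.sql.explain setting (NOT_ON_GPU or ALL), which is not in the original command. rapids_explain_flag is a hypothetical helper that only builds the extra flag.

```shell
# Assumption: spark.rapids.sql.explain is read by the RAPIDS plugin.
# NOT_ON_GPU logs only operators that stay on the CPU; ALL logs every operator.
# rapids_explain_flag is a hypothetical helper, not part of Spark or the plugin.
rapids_explain_flag() {
    mode=${1:-NOT_ON_GPU}
    printf '%s\n' "--conf spark.rapids.sql.explain=$mode"
}
```

The output can be appended to the existing invocation, for example: $SPARK_HOME/bin/spark-shell ... $(rapids_explain_flag ALL)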

On Fri, Apr 9, 2021 at 11:19 AM Martin Somers <sono...@gmail.com> wrote:

>
> Hi Everyone !!
>
> I'm trying to get an on-premise GPU instance of Spark 3 running on my
> Ubuntu box, and I am following:
>
> https://nvidia.github.io/spark-rapids/docs/get-started/getting-started-on-prem.html#example-join-operation
>
> Does anyone have any insight into why a Spark job isn't being run on the
> GPU? It appears to run entirely on the CPU. The Hadoop binary is installed
> and appears to be functioning fine:
>
> export SPARK_DIST_CLASSPATH=$(/usr/local/hadoop/bin/hadoop classpath)
>
> Here is my setup on Ubuntu 20.10:
>
>
> ▶ nvidia-smi
>
>
> +-----------------------------------------------------------------------------+
> | NVIDIA-SMI 460.39       Driver Version: 460.39       CUDA Version: 11.2     |
> |-------------------------------+----------------------+----------------------+
> | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
> | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
> |                               |                      |               MIG M. |
> |===============================+======================+======================|
> |   0  GeForce RTX 3090    Off  | 00000000:21:00.0  On |                  N/A |
> |  0%   38C    P8    19W / 370W |    478MiB / 24265MiB |      0%      Default |
> |                               |                      |                  N/A |
> +-------------------------------+----------------------+----------------------+
>
> /opt/sparkRapidsPlugin
>
>
> ▶ ls
> cudf-0.18.1-cuda11.jar  getGpusResources.sh  rapids-4-spark_2.12-0.4.1.jar
>
> ▶ scalac --version
> Scala compiler version 2.13.0 -- Copyright 2002-2019, LAMP/EPFL and Lightbend, Inc.
>
>
> ▶ spark-shell --version
> 2021-04-09 17:05:36,158 WARN util.Utils: Your hostname, studio resolves to a loopback address: 127.0.1.1; using 192.168.0.221 instead (on interface wlp71s0)
> 2021-04-09 17:05:36,159 WARN util.Utils: Set SPARK_LOCAL_IP if you need to bind to another address
> WARNING: An illegal reflective access operation has occurred
> WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/opt/spark/jars/spark-unsafe_2.12-3.1.1.jar) to constructor java.nio.DirectByteBuffer(long,int)
> WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
> WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
> WARNING: All illegal access operations will be denied in a future release
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /___/ .__/\_,_/_/ /_/\_\   version 3.1.1
>       /_/
>
> Using Scala version 2.12.10, OpenJDK 64-Bit Server VM, 11.0.10
> Branch HEAD
> Compiled by user ubuntu on 2021-02-22T01:04:02Z
> Revision 1d550c4e90275ab418b9161925049239227f3dc9
> Url https://github.com/apache/spark
> Type --help for more information.
>
>
> Here is how I am calling Spark prior to adding the test job:
>
> $SPARK_HOME/bin/spark-shell \
>        --master local \
>        --num-executors 1 \
>        --conf spark.executor.cores=16 \
>        --conf spark.rapids.sql.concurrentGpuTasks=1 \
>        --driver-memory 10g \
>        --conf spark.executor.extraClassPath=${SPARK_CUDF_JAR}:${SPARK_RAPIDS_PLUGIN_JAR} \
>        --conf spark.rapids.memory.pinnedPool.size=16G \
>        --conf spark.locality.wait=0s \
>        --conf spark.sql.files.maxPartitionBytes=512m \
>        --conf spark.sql.shuffle.partitions=10 \
>        --conf spark.plugins=com.nvidia.spark.SQLPlugin \
>        --files $SPARK_RAPIDS_DIR/getGpusResources.sh \
>        --jars ${SPARK_CUDF_JAR},${SPARK_RAPIDS_PLUGIN_JAR}
>
>
> The test job is from the join-operation example:
>
> val df = sc.makeRDD(1 to 10000000, 6).toDF
> val df2 = sc.makeRDD(1 to 10000000, 6).toDF
> df.select($"value" as "a").join(df2.select($"value" as "b"), $"a" === $"b").count
>
>
> I just noticed that the Scala versions are out of sync (scalac 2.13.0 vs.
> Scala 2.12.10 in Spark). That shouldn't affect it, should it?
>
>
> Is there anything else I can try in the --conf options, or are there any
> logs that would show what might be failing behind the scenes? Any
> suggestions?
>
>
> Thanks
> Martin
>
>
> --
> M
>
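
Since the launch command above depends on SPARK_CUDF_JAR and SPARK_RAPIDS_PLUGIN_JAR being set in the shell, a quick pre-flight check may be worth running first. This is only a sketch: check_jar is a hypothetical helper, not part of Spark or the RAPIDS plugin, and it assumes the variables are meant to hold paths to the jars listed under /opt/sparkRapidsPlugin.

```shell
# check_jar NAME PATH
# Hypothetical helper: reports whether PATH is non-empty and names a readable file.
# Returns 0 when the file is usable, 1 otherwise.
check_jar() {
    name=$1
    path=$2
    if [ -z "$path" ]; then
        echo "$name is not set"
        return 1
    fi
    if [ ! -r "$path" ]; then
        echo "$name points to a missing or unreadable file: $path"
        return 1
    fi
    echo "$name ok: $path"
}
```

It could be called as check_jar SPARK_CUDF_JAR "$SPARK_CUDF_JAR" and check_jar SPARK_RAPIDS_PLUGIN_JAR "$SPARK_RAPIDS_PLUGIN_JAR" before starting spark-shell. An unset variable would leave --jars with nothing but a comma.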
