[ 
https://issues.apache.org/jira/browse/SPARK-32411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17173944#comment-17173944
 ] 

Chitral Verma edited comment on SPARK-32411 at 8/9/20, 6:38 PM:
----------------------------------------------------------------

[~vinhdiesal] were you able to resolve this issue?

I'm also facing the same issue; my Spark config is as below. The Spark session
initializes, but no tasks execute; they stay in a waiting state.

 
{code:python}
spark = SparkSession \
    .builder \
    .master("local") \
    .config("spark.ui.port", spark_ui_port) \
    .config("spark.jars", ",".join(jars)) \
    .config("spark.plugins", "com.nvidia.spark.SQLPlugin") \
    .config("spark.sql.shuffle.partitions", "10") \
    .config("spark.driver.resource.gpu.discoveryScript", "/content/sparkRapidsPlugin/getGpusResources.sh") \
    .config("spark.driver.resource.gpu.amount", "1") \
    .config("spark.rapids.memory.pinnedPool.size", "2G") \
    .getOrCreate()
{code}
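For readability, the same builder chain can be flattened into explicit key/value pairs so each setting is easy to inspect. This is a sketch only: `spark_ui_port` and `jars` are hypothetical placeholders for values defined elsewhere in my notebook, and note that every GPU key here is driver-side.

```python
# Sketch of the builder chain as explicit key/value pairs.
# spark_ui_port and jars are hypothetical placeholders for the
# values defined elsewhere in the notebook.
spark_ui_port = "4050"                       # assumption: any free port
jars = ["rapids-4-spark.jar", "cudf.jar"]    # assumption: RAPIDS plugin jars

conf = {
    "spark.ui.port": spark_ui_port,
    "spark.jars": ",".join(jars),
    "spark.plugins": "com.nvidia.spark.SQLPlugin",
    "spark.sql.shuffle.partitions": "10",
    "spark.driver.resource.gpu.discoveryScript":
        "/content/sparkRapidsPlugin/getGpusResources.sh",
    "spark.driver.resource.gpu.amount": "1",
    "spark.rapids.memory.pinnedPool.size": "2G",
}

# With pyspark available, the session would then be built like so:
# builder = SparkSession.builder.master("local")
# for key, value in conf.items():
#     builder = builder.config(key, value)
# spark = builder.getOrCreate()
print(conf["spark.jars"])  # rapids-4-spark.jar,cudf.jar
```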



> GPU Cluster Fail
> ----------------
>
>                 Key: SPARK-32411
>                 URL: https://issues.apache.org/jira/browse/SPARK-32411
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, Web UI
>    Affects Versions: 3.0.0
>            Environment: I have an Apache Spark 3.0 cluster consisting of 
> machines with multiple NVIDIA GPUs, and I connect my Jupyter notebook to the 
> cluster using PySpark.
>            Reporter: Vinh Tran
>            Priority: Major
>
> I'm having a difficult time getting a GPU cluster started on Apache Spark 
> 3.0. It was hard to find documentation on this, but I stumbled on an NVIDIA 
> GitHub page for RAPIDS, which suggested the following additions to 
> spark-defaults.conf:
> {code:java}
> spark.task.resource.gpu.amount 0.25
> spark.executor.resource.gpu.discoveryScript ./usr/local/spark/getGpusResources.sh
> {code}
> I have an Apache Spark 3.0 cluster consisting of machines with multiple 
> NVIDIA GPUs, and I connect my Jupyter notebook to the cluster using PySpark; 
> however, this results in the following error:
> {code:java}
> Py4JJavaError: An error occurred while calling 
> None.org.apache.spark.api.java.JavaSparkContext.
> : org.apache.spark.SparkException: You must specify an amount for gpu
>       at 
> org.apache.spark.resource.ResourceUtils$.$anonfun$parseResourceRequest$1(ResourceUtils.scala:142)
>       at scala.collection.immutable.Map$Map1.getOrElse(Map.scala:119)
>       at 
> org.apache.spark.resource.ResourceUtils$.parseResourceRequest(ResourceUtils.scala:142)
>       at 
> org.apache.spark.resource.ResourceUtils$.$anonfun$parseAllResourceRequests$1(ResourceUtils.scala:159)
>       at 
> scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
>       at scala.collection.mutable.ArraySeq.foreach(ArraySeq.scala:75)
>       at scala.collection.TraversableLike.map(TraversableLike.scala:238)
>       at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
>       at scala.collection.AbstractTraversable.map(Traversable.scala:108)
>       at 
> org.apache.spark.resource.ResourceUtils$.parseAllResourceRequests(ResourceUtils.scala:159)
>       at 
> org.apache.spark.SparkContext$.checkResourcesPerTask$1(SparkContext.scala:2773)
>       at 
> org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2884)
>       at org.apache.spark.SparkContext.<init>(SparkContext.scala:528)
>       at 
> org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
>       at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>       at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>       at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>       at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>       at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
>       at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
>       at py4j.Gateway.invoke(Gateway.java:238)
>       at 
> py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
>       at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
>       at py4j.GatewayConnection.run(GatewayConnection.java:238)
>       at java.lang.Thread.run(Thread.java:748)
> {code}
> After this, I tried adding another line to the conf per the instructions, 
> which results in no errors; however, when I log in to the Web UI at 
> localhost:8080, the state under Running Applications remains at WAITING.
> {code:java}
> spark.task.resource.gpu.amount                  2
> spark.executor.resource.gpu.discoveryScript     ./usr/local/spark/getGpusResources.sh
> spark.executor.resource.gpu.amount              1
> {code}
>  

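One possible explanation for the waiting state (an assumption on my part, not confirmed anywhere in this thread): in the last config, `spark.task.resource.gpu.amount` (2) exceeds `spark.executor.resource.gpu.amount` (1), so no executor can ever satisfy a single task's GPU request. A quick sanity check of the resource math:

```python
# Hedged sketch: how many concurrent tasks one executor can serve,
# given the executor-level and task-level GPU amounts from
# spark-defaults.conf.
def gpu_slots_per_executor(executor_gpu_amount: float,
                           task_gpu_amount: float) -> int:
    """Number of tasks an executor can run at once, GPU-wise."""
    return int(executor_gpu_amount // task_gpu_amount)

# The second config above: 1 GPU per executor, 2 GPUs per task.
print(gpu_slots_per_executor(1, 2))     # 0 -> no task can ever be scheduled
# The RAPIDS-suggested fractional amount: 1 GPU shared by 4 tasks.
print(gpu_slots_per_executor(1, 0.25))  # 4
```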


--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
