Re: [CONNECT] Why Can't We Specify Cluster Deploy Mode for Spark Connect?

2024-09-09 Thread Prabodh Agarwal
Oh, this issue is actually pretty straightforward to solve, particularly in
spark-3.5.2.

Just download the `spark-connect` Maven jar and place it in
`$SPARK_HOME/jars`, then rebuild the Docker image. I see that I had posted a
comment on that Jira as well. At least for a standalone cluster, I was able
to fix it this way.
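
For reference, this is roughly the step I mean, assuming a Scala 2.12 build
of 3.5.2 (adjust the version and paths to match your image):

```
# rough sketch: fetch the spark-connect jar from Maven Central and bake it
# into the image's $SPARK_HOME/jars before rebuilding the docker image
curl -fL -o "$SPARK_HOME/jars/spark-connect_2.12-3.5.2.jar" \
  https://repo1.maven.org/maven2/org/apache/spark/spark-connect_2.12/3.5.2/spark-connect_2.12-3.5.2.jar
```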

On Mon, Sep 9, 2024 at 7:04 PM Nagatomi Yasukazu 
wrote:

> Hi Prabodh,
>
> Thank you for your response.
>
> As you can see from the following JIRA issue, it is possible to run the
> Spark Connect Driver on Kubernetes:
>
> https://issues.apache.org/jira/browse/SPARK-45769
>
> However, this issue describes a problem that occurs when the Driver and
> Executors are running on different nodes. This could potentially be the
> reason why only Standalone mode is currently supported, but I am not
> certain about it.
>
> Thank you for your attention.
>
>
> 2024年9月9日(月) 12:40 Prabodh Agarwal :
>
>> My 2 cents regarding my experience with using spark connect in cluster
>> mode.
>>
>> 1. Create a spark cluster of 2 or more nodes. Make 1 node the master and
>> the other nodes workers. Deploy spark connect pointing to the master node.
>> This works well. The approach is not well documented, but I could figure
>> it out by trial and error.
>> 2. In k8s, by default, we can actually get the executors to run on
>> kubernetes itself. That is pretty straightforward, but the driver continues
>> to run on a local machine. But yeah, I agree as well: making the driver
>> run on k8s itself would be slick.
>>
>> Thank you.
>>
>>
>> On Mon, Sep 9, 2024 at 6:17 AM Nagatomi Yasukazu 
>> wrote:
>>
>>> Hi All,
>>>
>>> Why is it not possible to specify cluster as the deploy mode for Spark
>>> Connect?
>>>
>>> As discussed in the following thread, it appears that there is an
>>> "arbitrary decision" within spark-submit that "Cluster mode is not
>>> applicable" to Spark Connect.
>>>
>>> GitHub Issue Comment:
>>>
>>> https://github.com/kubeflow/spark-operator/issues/1801#issuecomment-2000494607
>>>
>>> > This will circumvent the submission error you may have gotten if you
>>> tried to just run the SparkConnectServer directly. From my investigation,
>>> that looks to be an arbitrary decision within spark-submit that Cluster
>>> mode is "not applicable" to SparkConnect. Which is sort of true except when
>>> using this operator :)
>>>
>>> I have reviewed the following commit and pull request, but I could not
>>> find any discussion or reason explaining why cluster mode is not available:
>>>
>>> Related Commit:
>>>
>>> https://github.com/apache/spark/commit/11260310f65e1a30f6b00b380350e414609c5fd4
>>>
>>> Related Pull Request:
>>> https://github.com/apache/spark/pull/39928
>>>
>>> This restriction poses a significant obstacle when trying to use Spark
>>> Connect with the Spark Operator. If there is a technical reason for this, I
>>> would like to know more about it. Additionally, if this issue is being
>>> tracked on JIRA or elsewhere, I would appreciate it if you could provide a
>>> link.
>>>
>>> Thank you in advance.
>>>
>>> Best regards,
>>> Yasukazu Nagatomi
>>>
>>


Re: [spark connect] unable to utilize stand alone cluster

2024-08-06 Thread Prabodh Agarwal
Glad to help!

On Tue, 6 Aug, 2024, 17:37 Ilango,  wrote:

>
> Thanks Prabodh. Passing the --master attr in the spark connect command
> worked like a charm. I am able to submit spark connect to my existing
> stand-alone cluster.
>
> Thanks for saving my day once again :)
>
> Thanks,
> Elango
>
>
> On Tue, 6 Aug 2024 at 6:08 PM, Prabodh Agarwal 
> wrote:
>
>> Do you get an error when passing the master option to your spark connect
>> command?
>>
>> On Tue, 6 Aug, 2024, 15:36 Ilango,  wrote:
>>
>>>
>>>
>>>
>>> Thanks Prabodh. I'm having an issue with the Spark Connect connection: the
>>> `spark.master` value is set to `local[*]` in the Spark Connect UI, whereas
>>> the actual master node for our Spark standalone cluster is different. I am
>>> passing that master node IP in the Spark Connect connection, but it is
>>> still not set correctly. Could you please help me update this configuration
>>> to reflect the correct master node value?
>>>
>>>
>>>
>>> This is my spark connect connection
>>>
>>>
>>>
>>> spark = SparkSession.builder\
>>>
>>> .remote("sc://:15002")\
>>>
>>> .getOrCreate()
>>>
>>>
>>> Thanks,
>>> Elango
>>>
>>>
>>> On Tue, 6 Aug 2024 at 5:45 PM, Prabodh Agarwal 
>>> wrote:
>>>
>>>> There is an Executors tab in the Spark Connect UI. Its contents are
>>>> generally similar to the Workers section of the spark master UI.
>>>>
>>>> You might need to specify the --master option in your spark connect
>>>> command if you haven't done so yet.
>>>>
>>>> On Tue, 6 Aug, 2024, 14:19 Ilango,  wrote:
>>>>
>>>>>
>>>>> Hi all,
>>>>>
>>>>> I am evaluating the use of Spark Connect with my Spark stand-alone
>>>>> cluster, which has a master node and 3 worker nodes. I have successfully
>>>>> created a Spark Connect connection. However, when submitting Spark SQL
>>>>> queries, the jobs are being executed only on the master node, and I do not
>>>>> observe any executors running on the worker nodes, despite requesting 4
>>>>> executors.
>>>>>
>>>>>
>>>>>
>>>>> I would appreciate clarification on whether Spark stand-alone cluster
>>>>> is supported for use with Spark Connect.
>>>>>
>>>>> If so, how can I leverage the existing Spark stand-alone cluster's
>>>>> worker nodes?
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Thanks,
>>>>> Elango
>>>>>
>>>>


Re: [spark connect] unable to utilize stand alone cluster

2024-08-06 Thread Prabodh Agarwal
Do you get an error when passing the master option to your spark connect
command?

On Tue, 6 Aug, 2024, 15:36 Ilango,  wrote:

>
>
>
> Thanks Prabodh. I'm having an issue with the Spark Connect connection: the
> `spark.master` value is set to `local[*]` in the Spark Connect UI, whereas
> the actual master node for our Spark standalone cluster is different. I am
> passing that master node IP in the Spark Connect connection, but it is
> still not set correctly. Could you please help me update this configuration
> to reflect the correct master node value?
>
>
>
> This is my spark connect connection
>
>
>
> spark = SparkSession.builder\
>
> .remote("sc://:15002")\
>
>     .getOrCreate()
>
>
> Thanks,
> Elango
>
>
> On Tue, 6 Aug 2024 at 5:45 PM, Prabodh Agarwal 
> wrote:
>
>> There is an Executors tab in the Spark Connect UI. Its contents are
>> generally similar to the Workers section of the spark master UI.
>>
>> You might need to specify the --master option in your spark connect command
>> if you haven't done so yet.
>>
>> On Tue, 6 Aug, 2024, 14:19 Ilango,  wrote:
>>
>>>
>>> Hi all,
>>>
>>> I am evaluating the use of Spark Connect with my Spark stand-alone
>>> cluster, which has a master node and 3 worker nodes. I have successfully
>>> created a Spark Connect connection. However, when submitting Spark SQL
>>> queries, the jobs are being executed only on the master node, and I do not
>>> observe any executors running on the worker nodes, despite requesting 4
>>> executors.
>>>
>>>
>>>
>>> I would appreciate clarification on whether Spark stand-alone cluster is
>>> supported for use with Spark Connect.
>>>
>>> If so, how can I leverage the existing Spark stand-alone cluster's
>>> worker nodes?
>>>
>>>
>>>
>>>
>>>
>>>
>>> Thanks,
>>> Elango
>>>
>>


Re: [spark connect] unable to utilize stand alone cluster

2024-08-06 Thread Prabodh Agarwal
There is an Executors tab in the Spark Connect UI. Its contents are generally
similar to the Workers section of the spark master UI.

You might need to specify the --master option in your spark connect command if
you haven't done so yet.
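
For example, something along these lines (a minimal sketch; the master URL is
an assumption, so substitute your actual standalone master):

```
# point spark connect at the standalone master instead of the default local[*]
$SPARK_HOME/sbin/start-connect-server.sh --master spark://<master-host>:7077
```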

On Tue, 6 Aug, 2024, 14:19 Ilango,  wrote:

>
> Hi all,
>
> I am evaluating the use of Spark Connect with my Spark stand-alone
> cluster, which has a master node and 3 worker nodes. I have successfully
> created a Spark Connect connection. However, when submitting Spark SQL
> queries, the jobs are being executed only on the master node, and I do not
> observe any executors running on the worker nodes, despite requesting 4
> executors.
>
>
>
> I would appreciate clarification on whether Spark stand-alone cluster is
> supported for use with Spark Connect.
>
> If so, how can I leverage the existing Spark stand-alone cluster's worker
> nodes?
>
>
>
>
>
>
> Thanks,
> Elango
>


Re: [Spark Connect] connection issue

2024-07-29 Thread Prabodh Agarwal
Glad it worked!

On Tue, 30 Jul, 2024, 11:12 Ilango,  wrote:

>
> Thanks Prabodh. I copied the spark connect jar to the $SPARK_HOME/jars
> folder and passed its location as the --jars attr. It's working now. I can
> submit spark jobs via spark connect.
>
> Really appreciate the help.
>
>
>
> Thanks,
> Elango
>
>
> On Tue, 30 Jul 2024 at 11:05 AM, Prabodh Agarwal 
> wrote:
>
>> Yeah, I understand the problem. One of the ways is to actually place the
>> spark connect jar in the $SPARK_HOME/jars folder; that is how we run spark
>> connect. Using the `--packages` or the `--jars` option is flaky in the case
>> of spark connect.
>>
>> You can instead manually place the relevant spark connect jar file in the
>> `$SPARK_HOME/jars` directory and remove the `--packages` or the `--jars`
>> option from your start command.
>>
>> On Mon, Jul 29, 2024 at 7:01 PM Ilango  wrote:
>>
>>>
>>> Thanks Prabodh, yes, I can see the spark connect logs in the
>>> $SPARK_HOME/logs path. It seems like a spark connect dependency issue. My
>>> spark node is an air-gapped node, so no internet access is allowed. Can I
>>> download the spark connect jar and pom files locally and point to the
>>> local paths? How can I provide the local jars?
>>>
>>> Error message:
>>>
>>> :: problems summary ::
>>>
>>>  WARNINGS
>>>
>>> module not found:
>>> org.apache.spark#spark-connect_2.12;3.5.1
>>>
>>>
>>>
>>>  local-m2-cache: tried
>>>
>>>
>>>
>>>
>>> file:/root/.m2/repository/org/apache/spark/spark-connect_2.12/3.5.1/spark-connect_2.12-3.5.1.pom
>>>
>>>
>>>
>>>   -- artifact
>>> org.apache.spark#spark-connect_2.12;3.5.1!spark-connect_2.12.jar:
>>>
>>>
>>>
>>>
>>> file:/root/.m2/repository/org/apache/spark/spark-connect_2.12/3.5.1/spark-connect_2.12-3.5.1.jar
>>>
>>>
>>>
>>>  local-ivy-cache: tried
>>>
>>>
>>>
>>>
>>> /root/.ivy2/local/org.apache.spark/spark-connect_2.12/3.5.1/ivys/ivy.xml
>>>
>>>
>>>
>>>   -- artifact
>>> org.apache.spark#spark-connect_2.12;3.5.1!spark-connect_2.12.jar:
>>>
>>>
>>>
>>>
>>> /root/.ivy2/local/org.apache.spark/spark-connect_2.12/3.5.1/jars/spark-connect_2.12.jar
>>>
>>>
>>>
>>>  central: tried
>>>
>>>
>>>
>>>
>>> https://repo1.maven.org/maven2/org/apache/spark/spark-connect_2.12/3.5.1/spark-connect_2.12-3.5.1.pom
>>>
>>>
>>>
>>>   -- artifact
>>> org.apache.spark#spark-connect_2.12;3.5.1!spark-connect_2.12.jar:
>>>
>>>
>>>
>>>
>>> https://repo1.maven.org/maven2/org/apache/spark/spark-connect_2.12/3.5.1/spark-connect_2.12-3.5.1.jar
>>>
>>>
>>>
>>>  spark-packages: tried
>>>
>>>
>>>
>>>
>>> https://repos.spark-packages.org/org/apache/spark/spark-connect_2.12/3.5.1/spark-connect_2.12-3.5.1.pom
>>>
>>>
>>>
>>>   -- artifact
>>> org.apache.spark#spark-connect_2.12;3.5.1!spark-connect_2.12.jar:
>>>
>>>
>>>
>>>
>>> https://repos.spark-packages.org/org/apache/spark/spark-connect_2.12/3.5.1/spark-connect_2.12-3.5.1.jar
>>>
>>>
>>>
>>> ::
>>>
>>>
>>>
>>> ::  UNRESOLVED DEPENDENCIES ::
>>>
>>>
>>>
>>> ::
>>>
>>>
>>>
>>> :: org.apache.spark#spark-connect_2.12;3.5.1: not found
>>>
>>>
>>>
>>> ::
>>>
>>>
>>>
>>>
>>>
>>>  ERRORS
>>>
>>> Server access error at url
>>> https://repo1.maven.org/maven2/org/apache/spark/spark-connect_2.12/3.5.1/spark-connect_2.12-3.5.1.pom
>>>  (java.net.ConnectException:
>>> Connection timed out (Connection timed out))
>>>
>>>
>>>
>>> Server access error at url
>>> https://repo1.maven.org/maven2/org/apache/spark/spark-connect_2.12/3.5.1/spark-co

Re: [Spark Connect] connection issue

2024-07-29 Thread Prabodh Agarwal
Yeah, I understand the problem. One of the ways is to actually place the
spark connect jar in the $SPARK_HOME/jars folder; that is how we run spark
connect. Using the `--packages` or the `--jars` option is flaky in the case of
spark connect.

You can instead manually place the relevant spark connect jar file in the
`$SPARK_HOME/jars` directory and remove the `--packages` or the `--jars`
option from your start command.
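
Roughly something like this, assuming Spark 3.5.1 with Scala 2.12 and that the
jar was already downloaded on a machine with internet access:

```
# copy the pre-downloaded spark-connect jar into the Spark distribution
cp spark-connect_2.12-3.5.1.jar $SPARK_HOME/jars/

# then start the server without the --packages / --jars options
$SPARK_HOME/sbin/start-connect-server.sh
```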

On Mon, Jul 29, 2024 at 7:01 PM Ilango  wrote:

>
> Thanks Prabodh, yes, I can see the spark connect logs in the
> $SPARK_HOME/logs path. It seems like a spark connect dependency issue. My
> spark node is an air-gapped node, so no internet access is allowed. Can I
> download the spark connect jar and pom files locally and point to the
> local paths? How can I provide the local jars?
>
> Error message:
>
> :: problems summary ::
>
>  WARNINGS
>
> module not found: org.apache.spark#spark-connect_2.12;3.5.1
>
>
>
>  local-m2-cache: tried
>
>
>
>
> file:/root/.m2/repository/org/apache/spark/spark-connect_2.12/3.5.1/spark-connect_2.12-3.5.1.pom
>
>
>
>   -- artifact
> org.apache.spark#spark-connect_2.12;3.5.1!spark-connect_2.12.jar:
>
>
>
>
> file:/root/.m2/repository/org/apache/spark/spark-connect_2.12/3.5.1/spark-connect_2.12-3.5.1.jar
>
>
>
>  local-ivy-cache: tried
>
>
>
>
> /root/.ivy2/local/org.apache.spark/spark-connect_2.12/3.5.1/ivys/ivy.xml
>
>
>
>   -- artifact
> org.apache.spark#spark-connect_2.12;3.5.1!spark-connect_2.12.jar:
>
>
>
>
> /root/.ivy2/local/org.apache.spark/spark-connect_2.12/3.5.1/jars/spark-connect_2.12.jar
>
>
>
>  central: tried
>
>
>
>
> https://repo1.maven.org/maven2/org/apache/spark/spark-connect_2.12/3.5.1/spark-connect_2.12-3.5.1.pom
>
>
>
>   -- artifact
> org.apache.spark#spark-connect_2.12;3.5.1!spark-connect_2.12.jar:
>
>
>
>
> https://repo1.maven.org/maven2/org/apache/spark/spark-connect_2.12/3.5.1/spark-connect_2.12-3.5.1.jar
>
>
>
>  spark-packages: tried
>
>
>
>
> https://repos.spark-packages.org/org/apache/spark/spark-connect_2.12/3.5.1/spark-connect_2.12-3.5.1.pom
>
>
>
>   -- artifact
> org.apache.spark#spark-connect_2.12;3.5.1!spark-connect_2.12.jar:
>
>
>
>
> https://repos.spark-packages.org/org/apache/spark/spark-connect_2.12/3.5.1/spark-connect_2.12-3.5.1.jar
>
>
>
> ::
>
>
>
> ::  UNRESOLVED DEPENDENCIES ::
>
>
>
> ::
>
>
>
> :: org.apache.spark#spark-connect_2.12;3.5.1: not found
>
>
>
> ::
>
>
>
>
>
>  ERRORS
>
> Server access error at url
> https://repo1.maven.org/maven2/org/apache/spark/spark-connect_2.12/3.5.1/spark-connect_2.12-3.5.1.pom
>  (java.net.ConnectException:
> Connection timed out (Connection timed out))
>
>
>
> Server access error at url
> https://repo1.maven.org/maven2/org/apache/spark/spark-connect_2.12/3.5.1/spark-connect_2.12-3.5.1.jar(java.net.ConnectException:
> Connection timed out (Connection timed out))
>
>
>
> Server access error at url
> https://repos.spark-packages.org/org/apache/spark/spark-connect_2.12/3.5.1/spark-connect_2.12-3.5.1.pom
>  (java.net.ConnectException:
> Connection timed out (Connection timed out))
>
>
>
> Server access error at url
> https://repos.spark-packages.org/org/apache/spark/spark-connect_2.12/3.5.1/spark-connect_2.12-3.5.1.jar(java.net.ConnectException:
> Connection timed out (Connection timed out))
>
>
>
>
>
> :: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
>
> Exception in thread "main" java.lang.RuntimeException: [unresolved
> dependency: org.apache.spark#spark-connect_2.12;3.5.1: not found]
>
> at
> org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:1608)
>
> at
> org.apache.spark.util.DependencyUtils$.resolveMavenDependencies(DependencyUtils.scala:185)
>
> at
> org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:334)
>
> at org.apache.spark.deploy.SparkSubmit.org
> $apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:964)
>
> at
> org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:194)
>
> at
> org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:217)
>
> at
> org.apache.spark.deploy.Spark

Re: [Spark Connect] connection issue

2024-07-29 Thread Prabodh Agarwal
The spark connect startup prints the log location. Is that not showing up for
you?
For me, the logs go to $SPARK_HOME/logs.
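
Something like this should locate it (a sketch; the exact file name depends on
your user and host):

```
ls $SPARK_HOME/logs/
# the connect server log is usually the *SparkConnectServer*.out file
tail -f $SPARK_HOME/logs/*SparkConnectServer*.out
```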

On Mon, 29 Jul, 2024, 15:30 Ilango,  wrote:

>
> Hi all,
>
>
> I am facing issues with a Spark Connect application running on a Spark
> standalone cluster (without YARN and HDFS). After executing the
> start-connect-server.sh script with the specified packages, I observe a
> process ID for a short period but am unable to see the corresponding port
> (default 15002) associated with that PID. The process automatically stops
> after around 10 minutes.
>
> Since the Spark History server is not enabled, I am unable to locate the
> relevant logs or error messages. The logs for currently running Spark
> applications are accessible from the Spark UI, but I am unsure where to
> find the logs for the Spark Connect application and service.
>
> Could you please advise on where to find the logs or error messages
> related to Spark Connect?
>
>
>
>
> Thanks,
> Elango
>


running snowflake query using spark connect on a standalone cluster

2024-07-07 Thread Prabodh Agarwal
I have configured a spark standalone cluster as follows:

```
# start spark master
$SPARK_HOME/sbin/start-master.sh

# start 2 spark workers
SPARK_WORKER_INSTANCES=2 $SPARK_HOME/sbin/start-worker.sh spark://localhost:7077

# start spark connect
$SPARK_HOME/sbin/start-connect-server.sh --properties-file ./connect.properties --master spark://localhost:7077
```

My properties file is defined as follows:

```
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.plugins=io.dataflint.spark.SparkDataflintPlugin
spark.jars.packages org.apache.spark:spark-connect_2.12:3.5.1,org.apache.hadoop:hadoop-aws:3.3.4,org.apache.hudi:hudi-aws:0.15.0,org.apache.hudi:hudi-spark3.5-bundle_2.12:0.15.0,org.apache.spark:spark-avro_2.12:3.5.1,software.amazon.awssdk:sso:2.18.40,io.dataflint:spark_2.12:0.2.2,net.snowflake:spark-snowflake_2.12:2.16.0-spark_3.4,net.snowflake:snowflake-jdbc:3.16.1
spark.driver.extraJavaOptions=-verbose:class
spark.executor.extraJavaOptions=-verbose:class
```
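
For reference, the client side I use is roughly equivalent to the following
(a sketch; the host and port are assumptions, 15002 being the default connect
port):

```
# open a pyspark shell against the remote connect server and query from there
$SPARK_HOME/bin/pyspark --remote "sc://localhost:15002"
```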

Now I start my pyspark job which connects with this remote instance and
then tries to query the table. The snowflake query is fired correctly. It
shows up in my Snowflake query history, but then I start getting failures.

```
24/07/07 22:37:26 INFO ErrorUtils: Spark Connect error during: execute.
UserId: pbd. SessionId: 462444eb-82d3-475b-8dd0-ce35d5047405.
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0
in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage
0.0 (TID 3) (192.168.29.6 executor 1): java.lang.ClassNotFoundException:
net.snowflake.spark.snowflake.io.SnowflakeResultSetPartition
at
org.apache.spark.executor.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:124)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:594)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:527)
at java.base/java.lang.Class.forName0(Native Method)
at java.base/java.lang.Class.forName(Class.java:398)
at
org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:71)
at
java.base/java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:2003)
at
java.base/java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1870)
at
java.base/java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2201)
at
java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1687)
at
java.base/java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2496)
at
java.base/java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2390)
at
java.base/java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2228)
at
java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1687)
at
java.base/java.io.ObjectInputStream.readObject(ObjectInputStream.java:489)
at
java.base/java.io.ObjectInputStream.readObject(ObjectInputStream.java:447)
at
org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:87)
at
org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:129)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:579)
at
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.ClassNotFoundException:
net.snowflake.spark.snowflake.io.SnowflakeResultSetPartition
at java.base/java.lang.ClassLoader.findClass(ClassLoader.java:724)
at
org.apache.spark.util.ParentClassLoader.findClass(ParentClassLoader.java:35)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:594)
at
org.apache.spark.util.ParentClassLoader.loadClass(ParentClassLoader.java:40)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:527)
at
org.apache.spark.executor.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:109)
... 21 more

Driver stacktrace:
at
org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2856)
at
org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2792)
at
org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2791)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2791)
at
org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1247)
at
org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1247)
at scala.Option.foreach(Option.scala:407)
at
org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1247)
at
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:3060)
at
org.