I am not sure where you got the property name "spark.memory.offHeap.use". The correct one is "spark.memory.offHeap.enabled". See https://spark.apache.org/docs/latest/configuration.html#spark-properties for details.
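For reference, a corrected submit command, keeping all of your other flags unchanged (the <master-host>, class name, and jar name placeholders are copied from your command), might look like the sketch below. Note also that the documented property for the broadcast join threshold is "spark.sql.autoBroadcastJoinThreshold", so you may want to double-check "spark.sql.autoBroadcastThreshold" as well.

```shell
# Only intended change: spark.memory.offHeap.use -> spark.memory.offHeap.enabled
/tmp/spark-3.3.1-bin-hadoop3/bin/spark-submit \
  --master "spark://<master-host>:7077" \
  --conf spark.submit.deployMode=client \
  --conf spark.executor.instances=4 \
  --conf spark.executor.memory=3g \
  --conf spark.driver.memory=4g \
  --conf spark.memory.offHeap.enabled=true \
  --conf spark.memory.offHeap.size=3g \
  --conf spark.sql.broadcastTimeout=300s \
  --conf spark.sql.autoBroadcastThreshold=1g \
  --class <application-class-name> ./<application-jar-name>.jar
```

You can confirm the setting actually took effect at runtime via spark.conf.get("spark.memory.offHeap.enabled") in the application, or in the Environment tab of the Spark UI.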

On 1/30/23 10:12 AM, Jain, Sanchi wrote:

I am not sure if this is the intended DL for reaching out for help. Please redirect me to the right DL if not.

*From: *Jain, Sanchi <sanchi_j...@comcast.com>
*Date: *Monday, January 30, 2023 at 10:10 AM
*To: *priv...@spark.apache.org <priv...@spark.apache.org>
*Subject: *Request for access to create a jira account- Comcast

Hello there

I am a principal engineer at Comcast, and my team is currently building a standalone Spark cluster in a 5-node Linux environment. We are running into roadblocks due to the following error, observed when a Spark streaming application is submitted to a remote master.

org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output location for shuffle 0 partition 11
    at org.apache.spark.MapOutputTracker$.validateStatus(MapOutputTracker.scala:1705)
    at org.apache.spark.MapOutputTracker$.$anonfun$convertMapStatuses$10(MapOutputTracker.scala:1652)
    at org.apache.spark.MapOutputTracker$.$anonfun$convertMapStatuses$10$adapted(MapOutputTracker.scala:1651)
    at scala.collection.Iterator.foreach(Iterator.scala:943)
    at scala.collection.Iterator.foreach$(Iterator.scala:943)

Here are the other details of the environment configuration –

Software version – spark-3.3.1-bin-hadoop3

Scala version – 2.12.15

Total memory assigned to the worker nodes – 14.5 GB (2 GB used)

CPU/Memory assigned to each node – 4 cores/16 GB

Driver memory – 4 GB

Executor memory – 3 GB

Spark-submit command used –

/tmp/spark-3.3.1-bin-hadoop3/bin/spark-submit \
  --master "spark://<master-host>:7077" \
  --conf spark.submit.deployMode=client \
  --conf spark.executor.instances=4 \
  --conf spark.executor.memory=3g \
  --conf spark.driver.memory=4g \
  --conf spark.memory.offHeap.use=true \
  --conf spark.memory.offHeap.size=3g \
  --conf spark.sql.broadcastTimeout=300s \
  --conf spark.sql.autoBroadcastThreshold=1g \
  --class <application-class-name> ./<application-jar-name>.jar

We would really appreciate it if we could be assigned a Jira account for submitting an issue in this regard, or if we could reach out to the ASF community for help.

Thanks

Sanchita Jain

sanchita_j...@comcast.com
