Hi,

First, identify the cause of the shuffle failure: `Missing an output location for shuffle` usually means the executor that wrote the map output was lost before the reduce side could fetch it, so check the executor logs and the master UI for lost or killed executors. Also, how are you using HDFS here?
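If executors are indeed being lost mid-shuffle, one hedged first step is to re-submit with the shuffle fetch made more tolerant of transient failures. The sketch below reuses your own command; the three added `--conf` values are illustrative assumptions, not recommendations, and the placeholders (`<master-host>`, class and jar names) are taken from your original command:

```shell
# Illustrative sketch only -- the retry/timeout values are assumptions.
# spark.shuffle.io.maxRetries / spark.shuffle.io.retryWait: retry shuffle
# fetches rather than failing the stage on the first miss.
# spark.network.timeout: allow slow executors more time before they are
# marked lost (which would invalidate their shuffle output).
/tmp/spark-3.3.1-bin-hadoop3/bin/spark-submit \
  --master "spark://<master-host>:7077" \
  --conf spark.submit.deployMode=client \
  --conf spark.shuffle.io.maxRetries=10 \
  --conf spark.shuffle.io.retryWait=15s \
  --conf spark.network.timeout=300s \
  --class <application-class-name> \
  ./<application-jar-name>.jar
```

If the error persists with these settings, that points away from transient network issues and towards executors being killed outright, e.g. by running out of memory.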

https://community.cloudera.com/t5/Support-Questions/Spark-Metadata-Fetch-Failed-Exception-Missing-an-output/td-p/203771

HTH





On Mon, 30 Jan 2023 at 15:15, Jain, Sanchi <sanchi_j...@comcast.com.invalid>
wrote:

> I am not sure whether this is the intended DL for reaching out for help. If
> not, please redirect me to the right one.
>
>
>
> *From: *Jain, Sanchi <sanchi_j...@comcast.com>
> *Date: *Monday, January 30, 2023 at 10:10 AM
> *To: *priv...@spark.apache.org <priv...@spark.apache.org>
> *Subject: *Request for access to create a jira account- Comcast
>
> Hello there
>
>
>
> I am a principal engineer at Comcast, and my team is currently building a
> standalone Spark cluster on a 5-node Linux environment. We are blocked by
> the following error, observed when a Spark streaming application is
> submitted to a remote master.
>
>
>
> org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output
> location for shuffle 0 partition 11
>
>                 at
> org.apache.spark.MapOutputTracker$.validateStatus(MapOutputTracker.scala:1705)
>
>                 at
> org.apache.spark.MapOutputTracker$.$anonfun$convertMapStatuses$10(MapOutputTracker.scala:1652)
>
>                 at
> org.apache.spark.MapOutputTracker$.$anonfun$convertMapStatuses$10$adapted(MapOutputTracker.scala:1651)
>
>                 at scala.collection.Iterator.foreach(Iterator.scala:943)
>
>                 at scala.collection.Iterator.foreach$(Iterator.scala:943)
>
>
>
> Here are the other details of the environment configuration –
>
> Software version - spark-3.3.1-bin-hadoop3
>
> Scala version – scala_2.12.15
>
> Total memory assigned to the worker nodes – 14.5 GB (2 GB used)
>
> CPU/Memory assigned to each node – 4 cores/16 GB
>
> Driver memory – 4 G
>
> Executor memory – 3G
>
>
>
> Spark-submit command used –
>
>
>
> /tmp/spark-3.3.1-bin-hadoop3/bin/spark-submit --master
> "spark://<master-host>:7077" --conf spark.submit.deployMode=client --conf
> spark.executor.instances=4 --conf spark.executor.memory=3g --conf
> spark.driver.memory=4g --conf spark.memory.offHeap.use=true --conf
> spark.memory.offHeap.size=3g --conf spark.sql.broadcastTimeout=300s --conf
> spark.sql.autoBroadcastThreshold=1g  --class <application-class-name>
> ./<application-jar-name>.jar
>
>
>
> We would really appreciate it if we could be assigned a Jira account for
> submitting an issue in this regard, or if we could reach out to the ASF
> community for help.
>
>
>
> Thanks
>
> Sanchita Jain
>
> sanchita_j...@comcast.com
>
>
>
>
>
