[ 
https://issues.apache.org/jira/browse/SPARK-48309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-48309:
-----------------------------------
    Labels: pull-request-available  (was: )

> Stop AM retry in situations where, for some errors, a retry cannot 
> succeed
> --------------------------------------------------------------------------------
>
>                 Key: SPARK-48309
>                 URL: https://issues.apache.org/jira/browse/SPARK-48309
>             Project: Spark
>          Issue Type: Improvement
>          Components: YARN
>    Affects Versions: 4.0.0
>            Reporter: guihuawen
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>
> In YARN cluster mode, spark.yarn.maxAppAttempts is configured. In our 
> production environment it is set to 2, so if the first attempt fails, the 
> AM retries. However, in some scenarios the second attempt is likely to fail 
> as well.
> For example:
> org.apache.spark.sql.AnalysisException: Table or view not found: 
> test.testxxxx_xxxxx; line 1 pos 14;
> Project
> +- UnresolvedRelation [bigdata_qa, testxxxxx_xxxxx], [], false
>  
> Another example:
> Caused by: org.apache.hadoop.hdfs.protocol.NSQuotaExceededException: The 
> NameSpace quota (directories and files) of directory /tmp/xxx_file/xxxx is 
> exceeded: quota=1000000 file count=1000001
> Would it be more appropriate to catch these exceptions and skip the 
> retry?
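
The proposal in the description could be sketched roughly as follows. This is only an illustration of the idea, not Spark's ApplicationMaster code: the function name `should_retry`, the `NON_RETRYABLE` list, and the plain substring match on the error text are all assumptions made for this example. The two exception class names are the ones quoted in the description.

```python
# Hypothetical sketch of "skip the retry for errors that cannot succeed".
# Nothing here is Spark's actual API; names are illustrative only.

# Exception classes from the description for which a second attempt is
# assumed to fail the same way as the first.
NON_RETRYABLE = (
    "org.apache.spark.sql.AnalysisException",
    "org.apache.hadoop.hdfs.protocol.NSQuotaExceededException",
)


def should_retry(error_text: str, attempt: int, max_attempts: int = 2) -> bool:
    """Return True if another application attempt seems worthwhile.

    error_text   -- failure message or stack trace of the failed attempt
    attempt      -- 1-based number of the attempt that just failed
    max_attempts -- stands in for spark.yarn.maxAppAttempts (2 in the report)
    """
    if attempt >= max_attempts:
        # No attempts left, regardless of the error type.
        return False
    # Retry only when no known non-retryable exception appears in the error.
    return not any(name in error_text for name in NON_RETRYABLE)
```

With these assumptions, a failure whose message contains `org.apache.spark.sql.AnalysisException` would not be retried on the first attempt, while an unrelated failure such as a transient I/O error still would be.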


