[ https://issues.apache.org/jira/browse/SPARK-33668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prashant Sharma updated SPARK-33668:
------------------------------------
    Description: 
The test is flaky and has failed in multiple runs. Each failure looks similar to:
{code:java}
  The code passed to eventually never returned normally. Attempted 109 times 
over 3.0079882413999997 minutes. Last failure message: Failure executing: GET 
at: 
https://192.168.39.167:8443/api/v1/namespaces/b37fc72a991b49baa68a2eaaa1516463/pods/spark-pi-97a9bc76308e7fe3-exec-1/log?pretty=false.
 Message: pods "spark-pi-97a9bc76308e7fe3-exec-1" not found. Received status: 
Status(apiVersion=v1, code=404, details=StatusDetails(causes=[], group=null, 
kind=pods, name=spark-pi-97a9bc76308e7fe3-exec-1, retryAfterSeconds=null, 
uid=null, additionalProperties={}), kind=Status, message=pods 
"spark-pi-97a9bc76308e7fe3-exec-1" not found, metadata=ListMeta(_continue=null, 
remainingItemCount=null, resourceVersion=null, selfLink=null, 
additionalProperties={}), reason=NotFound, status=Failure, 
additionalProperties={}).. (KubernetesSuite.scala:402)
{code}

https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/36854/console
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/36852/console
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/36850/console
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/36848/console

From the above failures, it appears that the executor finishes too quickly and is
removed by Spark before the test can fetch its log, so the GET on the pod's log
endpoint returns 404 NotFound.

One way to mitigate this is to set the following flag to false (its default is
true), so that Spark keeps executor pods after they terminate:

{code}
spark.kubernetes.executor.deleteOnTermination=false
{code}
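A minimal sketch of applying this in a test, assuming the test builds a `SparkConf` with spark-core on the classpath. The object and method names are hypothetical and are not the suite's actual code. Setting `spark.kubernetes.executor.deleteOnTermination` to `false` (Spark's default is `true`) keeps executor pods around after they exit, so their logs remain readable through the Kubernetes API. With spark-submit the equivalent is `--conf spark.kubernetes.executor.deleteOnTermination=false`.

```scala
import org.apache.spark.SparkConf

// Hypothetical sketch: the real KubernetesSuite configuration code is not
// shown in this issue. Assumes spark-core is on the classpath.
object KeepExecutorPodsSketch {
  // Configuration key named in this issue; Spark's default value is "true",
  // which deletes executor pods once they terminate.
  val DeleteOnTermination = "spark.kubernetes.executor.deleteOnTermination"

  // Builds a conf that keeps executor pods after termination, so a test can
  // still GET .../pods/<executor-pod>/log instead of receiving 404 NotFound.
  def confForLogTest(): SparkConf =
    new SparkConf(loadDefaults = false)
      .set(DeleteOnTermination, "false")

  def main(args: Array[String]): Unit = {
    val conf = confForLogTest()
    // Print the effective value of the flag.
    println(conf.get(DeleteOnTermination))
  }
}
```

The trade-off is that Spark no longer removes finished executor pods, so they remain until something else, such as cleanup of the test's namespace, deletes them.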

  was:
The test is flaking, and at more than one instance and the reason for the 
failure is
{code:java}
  The code passed to eventually never returned normally. Attempted 109 times 
over 3.0079882413999997 minutes. Last failure message: Failure executing: GET 
at: 
https://192.168.39.167:8443/api/v1/namespaces/b37fc72a991b49baa68a2eaaa1516463/pods/spark-pi-97a9bc76308e7fe3-exec-1/log?pretty=false.
 Message: pods "spark-pi-97a9bc76308e7fe3-exec-1" not found. Received status: 
Status(apiVersion=v1, code=404, details=StatusDetails(causes=[], group=null, 
kind=pods, name=spark-pi-97a9bc76308e7fe3-exec-1, retryAfterSeconds=null, 
uid=null, additionalProperties={}), kind=Status, message=pods 
"spark-pi-97a9bc76308e7fe3-exec-1" not found, metadata=ListMeta(_continue=null, 
remainingItemCount=null, resourceVersion=null, selfLink=null, 
additionalProperties={}), reason=NotFound, status=Failure, 
additionalProperties={}).. (KubernetesSuite.scala:402)
{code}

From the above failure, it seems, that executor finishes too quickly and is
removed by spark before the test can complete.

So, in order to mitigate this situation, one way is to turn on the flag

{code}
   "spark.kubernetes.executor.deleteOnTermination"
{code}


> Fix flaky test "Verify logging configuration is picked from the provided 
> SPARK_CONF_DIR/log4j.properties."
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-33668
>                 URL: https://issues.apache.org/jira/browse/SPARK-33668
>             Project: Spark
>          Issue Type: Bug
>          Components: Kubernetes, Tests
>    Affects Versions: 3.1.0
>            Reporter: Prashant Sharma
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
