[ 
https://issues.apache.org/jira/browse/SPARK-42090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated SPARK-42090:
---------------------------
    Description: 
Previously a boolean variable, saslTimeoutSeen, was used in 
RetryingBlockTransferor. However, the boolean variable wouldn't cover the 
following scenario:

1. SaslTimeoutException
2. IOException
3. SaslTimeoutException
4. IOException

Even though the IOException at step #2 is retried (which increments 
retryCount), retryCount would be reset to zero at step #4, so that increment 
is lost.
Since the intention of saslTimeoutSeen is to undo the increments caused by 
retrying SaslTimeoutException, we should keep a counter of 
SaslTimeoutException retries and subtract the value of this counter from 
retryCount.
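
The scenario above can be illustrated with a minimal sketch. This is not the 
actual RetryingBlockTransferor code: the class RetryCounterSketch, the Failure 
enum, the two method names and the saslRetryCount field name are hypothetical, 
every failure is assumed to be retried, and the maximum retry limit is ignored. 
It only compares the two bookkeeping strategies on the four-step sequence.

```java
import java.util.List;

/** Hypothetical stand-in for the retry bookkeeping; not the real Spark class. */
class RetryCounterSketch {
    enum Failure { SASL_TIMEOUT, IO }

    /** Boolean-flag bookkeeping: the first non-SASL failure after a SASL timeout zeroes retryCount. */
    static int withBooleanFlag(List<Failure> failures) {
        int retryCount = 0;
        boolean saslTimeoutSeen = false;
        for (Failure f : failures) {
            boolean isSasl = (f == Failure.SASL_TIMEOUT);
            if (!isSasl && saslTimeoutSeen) {
                retryCount = 0;          // also discards earlier IOException retries
                saslTimeoutSeen = false;
            }
            if (isSasl) {
                saslTimeoutSeen = true;
            }
            retryCount++;                // assumption: every failure is retried
        }
        return retryCount;
    }

    /** Counter bookkeeping: subtract only the retries caused by SASL timeouts. */
    static int withSaslCounter(List<Failure> failures) {
        int retryCount = 0;
        int saslRetryCount = 0;
        for (Failure f : failures) {
            boolean isSasl = (f == Failure.SASL_TIMEOUT);
            if (!isSasl && saslRetryCount > 0) {
                retryCount -= saslRetryCount;  // undo SASL increments only
                saslRetryCount = 0;
            }
            if (isSasl) {
                saslRetryCount++;
            }
            retryCount++;                // assumption: every failure is retried
        }
        return retryCount;
    }

    public static void main(String[] args) {
        List<Failure> scenario = List.of(
            Failure.SASL_TIMEOUT, Failure.IO, Failure.SASL_TIMEOUT, Failure.IO);
        System.out.println("boolean flag: " + withBooleanFlag(scenario)); // prints "boolean flag: 1"
        System.out.println("sasl counter: " + withSaslCounter(scenario)); // prints "sasl counter: 2"
    }
}
```

With the flag, the retry of the IOException at step #2 is forgotten at step #4, 
so only one retry is counted. With the counter, both IOException retries stay 
in retryCount.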

  was:
Previously a boolean variable, saslTimeoutSeen, was used. However, the boolean 
variable wouldn't cover the following scenario:

1. SaslTimeoutException
2. IOException
3. SaslTimeoutException
4. IOException

Even though IOException at #2 is retried (resulting in increment of 
retryCount), the retryCount would be cleared at step #4.
Since the intention of saslTimeoutSeen is to undo the increment due to retrying 
SaslTimeoutException, we should keep a counter for SaslTimeoutException retries 
and subtract the value of this counter from retryCount.


> Introduce sasl retry count in RetryingBlockTransferor
> -----------------------------------------------------
>
>                 Key: SPARK-42090
>                 URL: https://issues.apache.org/jira/browse/SPARK-42090
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.4.0
>            Reporter: Ted Yu
>            Priority: Major
>
> Previously a boolean variable, saslTimeoutSeen, was used in 
> RetryingBlockTransferor. However, the boolean variable wouldn't cover the 
> following scenario:
> 1. SaslTimeoutException
> 2. IOException
> 3. SaslTimeoutException
> 4. IOException
> Even though the IOException at step #2 is retried (which increments 
> retryCount), retryCount would be reset to zero at step #4, so that 
> increment is lost.
> Since the intention of saslTimeoutSeen is to undo the increments caused by 
> retrying SaslTimeoutException, we should keep a counter of 
> SaslTimeoutException retries and subtract the value of this counter from 
> retryCount.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
