[ 
https://issues.apache.org/jira/browse/KAFKA-19981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamal Chandraprakash updated KAFKA-19981:
-----------------------------------------
    Description: 
RemoteLogManager does not distinguish between RemoteStorageException and 
RetriableRemoteStorageException. 

Plugin implementors of RemoteStorageManager might implement circuit 
breakers and throw RetriableRemoteStorageException when the object storage 
degrades. We may have to handle the retriable error gracefully in the 
segment copy / deletion path so that it does not pollute the broker logs. 
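
As an illustration of the plugin side, here is a minimal sketch of a circuit breaker that surfaces a retriable error while the object storage is degraded. Everything in it is an assumption made for the example: the two exception classes are local stand-ins named after the Kafka types (with the retriable type assumed to be a subclass), and `Breaker`, `threshold`, and `call` are hypothetical names, not part of any Kafka API.

```java
import java.util.concurrent.Callable;

final class CircuitBreakerSketch {
    // Local stand-ins for the Kafka exception types so the sketch compiles on its own.
    static class RemoteStorageException extends Exception {
        RemoteStorageException(String message) { super(message); }
        RemoteStorageException(String message, Throwable cause) { super(message, cause); }
    }
    static class RetriableRemoteStorageException extends RemoteStorageException {
        RetriableRemoteStorageException(String message) { super(message); }
    }

    /** Hypothetical breaker: opens after `threshold` consecutive failures. */
    static final class Breaker {
        private final int threshold;
        private int consecutiveFailures;

        Breaker(int threshold) { this.threshold = threshold; }

        synchronized boolean isOpen() { return consecutiveFailures >= threshold; }
        synchronized void reset() { consecutiveFailures = 0; }
        private synchronized void recordFailure() { consecutiveFailures++; }

        <T> T call(Callable<T> storageCall) throws RemoteStorageException {
            if (isOpen()) {
                // Fail fast without touching the storage; the caller may retry later.
                throw new RetriableRemoteStorageException("circuit open: object storage degraded");
            }
            try {
                T result = storageCall.call();
                reset(); // any success closes the breaker again
                return result;
            } catch (Exception e) {
                recordFailure();
                throw new RemoteStorageException("storage call failed", e);
            }
        }
    }
}
```

A production breaker would normally also have a timed half-open state; the sketch leaves that out and relies on an explicit `reset()`.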

1. When a segment copy fails, the RemoteLogManager issues a [deletion 
call|https://sourcegraph.com/github.com/apache/kafka/-/blob/storage/src/main/java/org/apache/kafka/server/log/remote/storage/RemoteLogManager.java?L1048]
 to delete the segment whose upload failed. When RemoteStorageManager throws 
RetriableRemoteStorageException, the deletion call can be avoided. 

2. When RemoteStorageManager throws RetriableRemoteStorageException during 
deletion, we can skip marking the 
[failedRemoteDeleteRequestRate|https://sourcegraph.com/github.com/apache/kafka/-/blob/storage/src/main/java/org/apache/kafka/server/log/remote/storage/RemoteLogManager.java?L1517]
 metric. This approach is similar to the [copy failure behaviour 
|https://sourcegraph.com/github.com/apache/kafka/-/blob/storage/src/main/java/org/apache/kafka/server/log/remote/storage/RemoteLogManager.java?L1005]
 on a retriable error. 

3. It would also be good to update the Javadoc of 
[RemoteStorageManager|https://sourcegraph.com/github.com/apache/kafka/-/blob/storage/api/src/main/java/org/apache/kafka/server/log/remote/storage/RemoteStorageManager.java]
 to distinguish between RemoteStorageException and 
RetriableRemoteStorageException.
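
The handling proposed in items 1 and 2 could look roughly like the sketch below. It is not the RemoteLogManager code: the exception classes are local stand-ins named after the Kafka types (with the retriable type assumed to be a subclass, so it is caught first), `Storage` is a hypothetical two-method slice of a storage plugin keyed by a plain string id, and the `failedDeletes` counter only stands in for the failedRemoteDeleteRequestRate metric.

```java
import java.util.concurrent.atomic.AtomicLong;

final class RetriableHandlingSketch {
    // Local stand-ins for the Kafka exception types so the sketch compiles on its own.
    static class RemoteStorageException extends Exception {
        RemoteStorageException(String message) { super(message); }
    }
    static class RetriableRemoteStorageException extends RemoteStorageException {
        RetriableRemoteStorageException(String message) { super(message); }
    }

    /** Hypothetical, simplified view of the storage plugin. */
    interface Storage {
        void copySegment(String segmentId) throws RemoteStorageException;
        void deleteSegment(String segmentId) throws RemoteStorageException;
    }

    // Stand-in for the failedRemoteDeleteRequestRate metric.
    final AtomicLong failedDeletes = new AtomicLong();
    private final Storage storage;

    RetriableHandlingSketch(Storage storage) { this.storage = storage; }

    /** Returns true when the segment was copied. */
    boolean copy(String segmentId) {
        try {
            storage.copySegment(segmentId);
            return true;
        } catch (RetriableRemoteStorageException e) {
            // Item 1: the storage is degraded, so skip the cleanup deletion and retry later.
            return false;
        } catch (RemoteStorageException e) {
            // Non-retriable failure: clean up the partially uploaded segment.
            delete(segmentId);
            return false;
        }
    }

    /** Returns true when the segment was deleted. */
    boolean delete(String segmentId) {
        try {
            storage.deleteSegment(segmentId);
            return true;
        } catch (RetriableRemoteStorageException e) {
            // Item 2: a retriable error does not mark the failure metric.
            return false;
        } catch (RemoteStorageException e) {
            failedDeletes.incrementAndGet();
            return false;
        }
    }
}
```

With this shape, a degraded storage produces neither an extra deletion call nor a failure-metric update, while non-retriable errors keep the existing cleanup and accounting.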


> Handle retriable remote storage exception in RemoteLogManager
> -------------------------------------------------------------
>
>                 Key: KAFKA-19981
>                 URL: https://issues.apache.org/jira/browse/KAFKA-19981
>             Project: Kafka
>          Issue Type: Task
>          Components: Tiered-Storage
>            Reporter: Kamal Chandraprakash
>            Assignee: Lan Ding
>            Priority: Minor



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
