[ 
https://issues.apache.org/jira/browse/HBASE-29630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Mattingly updated HBASE-29630:
----------------------------------
    Description: 
h3. Summary

{{RetryingCallerInterceptor}} should be invoked for *batch operations* 
({{{}callWithoutRetries(...){}}}) in addition to single operations 
({{{}callWithRetries(...){}}}). This will ensure consistent behavior for 
failure handling, fast-fail logic, and custom metrics across all client 
operations.
h3. Problem
 * *Single operations* ({{{}HTable.get(Get){}}}, etc.) run through 
{{{}RpcRetryingCaller.callWithRetries(){}}}, which wires in interceptor hooks 
({{{}intercept(){}}}, {{{}handleFailure(){}}}, {{{}updateFailureInfo(){}}}).

 * *Batch operations* ({{{}HTable.batch(){}}}, {{{}HTable.get(List<Get>){}}}) 
go through {{{}RpcRetryingCaller.callWithoutRetries(){}}}, which currently 
bypasses interceptor hooks.

 * As a result:

 * 
 ** Custom interceptors (e.g., handling {{{}RpcThrottlingException{}}}) do not 
run for batch calls.

 * 
 ** Metrics and fast-fail logic only apply to single operations, not batch 
operations.

 * 
 ** Behavior is inconsistent between single and batch APIs.

h3. Proposed Change

Update {{RpcRetryingCallerImpl.callWithoutRetries(...)}} to use the same 
interceptor lifecycle as {{{}callWithRetries(...){}}}:
 # Create an interceptor context at call start.
 # Call {{interceptor.intercept()}} before executing the RPC.
 # On exception:
 ## Translate the throwable.
 ## Pass the translated exception into {{{}interceptor.handleFailure(){}}}.
 ## Preserve {{PreemptiveFastFailException}} behavior.
 # Call {{interceptor.updateFailureInfo()}} in the {{finally}} block when 
failures occur.
 # Add an {{AttemptType.NO_RETRY}} marker to the context for clarity in 
logs/metrics.
 # Optionally enrich the context when {{RetriesExhaustedWithDetailsException}} 
(REWDE) is thrown, so interceptors can see per-item failures.

h3. Benefits
 * *Consistency:* Single and batch operations follow the same interceptor 
pattern.

 * *Extensibility:* Interceptors can handle throttling and emit metrics for 
batch calls without custom forks.

 * *Maintainability:* Small, contained change in {{{}RpcRetryingCallerImpl{}}}; 
no invasive changes to {{{}AsyncProcess{}}}.

 * *Compatibility:* No breaking API changes. No-op interceptors behave as 
before.

 * *Community value:* Opens the door to richer failure handling for all client 
operations.

h3. Example Use Case

At my day job we use interceptors to detect {{RpcThrottlingException}} and fail 
fast while reporting metrics. This works for single gets/puts but is currently 
bypassed in batch calls, limiting visibility and resilience. With this change, 
batch requests can also fast-fail and report throttling metrics.
h3. Risks & Mitigation
 * *Performance regression:* Interceptor calls are lightweight; validate with 
before/after microbenchmarks.

 * *Behavior change for existing interceptors:* Only makes interceptors {_}more 
consistent{_}. Default no-op interceptor ensures compatibility.

 * *Naming confusion (“RetryingCaller” on no-retry path):* Clarify with 
Javadoc: interceptors apply to both retrying and no-retry paths.

  was:
h3. Summary

{{RetryingCallerInterceptor}} should be invoked for *batch operations* 
({{{}callWithoutRetries(...){}}}) in addition to single operations 
({{{}callWithRetries(...){}}}). This will ensure consistent behavior for 
failure handling, fast-fail logic, and custom metrics across all client 
operations.
h3. Problem
 * *Single operations* ({{{}HTable.get(Get){}}}, etc.) run through 
{{{}RpcRetryingCaller.callWithRetries(){}}}, which wires in interceptor hooks 
({{{}intercept(){}}}, {{{}handleFailure(){}}}, {{{}updateFailureInfo(){}}}).

 * *Batch operations* ({{{}HTable.batch(){}}}, {{{}HTable.get(List<Get>){}}}) 
go through {{{}RpcRetryingCaller.callWithoutRetries(){}}}, which currently 
bypasses interceptor hooks.

 * As a result:

 ** Custom interceptors (e.g., handling {{{}RpcThrottlingException{}}}) do not 
run for batch calls.

 ** Metrics and fast-fail logic only apply to single operations, not batch 
operations.

 ** Behavior is inconsistent between single and batch APIs.

h3. Proposed Change

Update {{RpcRetryingCallerImpl.callWithoutRetries(...)}} to use the same 
interceptor lifecycle as {{{}callWithRetries(...){}}}:
 # Create an interceptor context at call start.

 # Call {{interceptor.intercept()}} before executing the RPC.

 # On exception:

 ** Translate the throwable.

 ** Pass the translated exception into {{{}interceptor.handleFailure(){}}}.

 ** Preserve {{PreemptiveFastFailException}} behavior.

 # Call {{interceptor.updateFailureInfo()}} in the {{finally}} block when 
failures occur.

 # Add an {{AttemptType.NO_RETRY}} marker to the context for clarity in 
logs/metrics.

 # Optionally enrich the context when {{RetriesExhaustedWithDetailsException}} 
(REWDE) is thrown, so interceptors can see per-item failures.

h3. Benefits
 * *Consistency:* Single and batch operations follow the same interceptor 
pattern.

 * *Extensibility:* Interceptors can handle throttling and emit metrics for 
batch calls without custom forks.

 * *Maintainability:* Small, contained change in {{{}RpcRetryingCallerImpl{}}}; 
no invasive changes to {{{}AsyncProcess{}}}.

 * *Compatibility:* No breaking API changes. No-op interceptors behave as 
before.

 * *Community value:* Opens the door to richer failure handling for all client 
operations.

h3. Example Use Case

At my day job we use interceptors to detect {{RpcThrottlingException}} and fail 
fast while reporting metrics. This works for single gets/puts but is currently 
bypassed in batch calls, limiting visibility and resilience. With this change, 
batch requests can also fast-fail and report throttling metrics.
h3. Risks & Mitigation
 * *Performance regression:* Interceptor calls are lightweight; validate with 
before/after microbenchmarks.

 * *Behavior change for existing interceptors:* Only makes interceptors {_}more 
consistent{_}. Default no-op interceptor ensures compatibility.

 * *Naming confusion (“RetryingCaller” on no-retry path):* Clarify with 
Javadoc: interceptors apply to both retrying and no-retry paths.


> RetryingCallerInterceptor should be wired up for batch calls
> ------------------------------------------------------------
>
>                 Key: HBASE-29630
>                 URL: https://issues.apache.org/jira/browse/HBASE-29630
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 2.6.3
>            Reporter: Ray Mattingly
>            Assignee: Ray Mattingly
>            Priority: Major
>
> h3. Summary
> {{RetryingCallerInterceptor}} should be invoked for *batch operations* 
> ({{{}callWithoutRetries(...){}}}) in addition to single operations 
> ({{{}callWithRetries(...){}}}). This will ensure consistent behavior for 
> failure handling, fast-fail logic, and custom metrics across all client 
> operations.
> h3. Problem
>  * *Single operations* ({{{}HTable.get(Get){}}}, etc.) run through 
> {{{}RpcRetryingCaller.callWithRetries(){}}}, which wires in interceptor hooks 
> ({{{}intercept(){}}}, {{{}handleFailure(){}}}, {{{}updateFailureInfo(){}}}).
>  * *Batch operations* ({{{}HTable.batch(){}}}, {{{}HTable.get(List<Get>){}}}) 
> go through {{{}RpcRetryingCaller.callWithoutRetries(){}}}, which currently 
> bypasses interceptor hooks.
>  * As a result:
>  * 
>  ** Custom interceptors (e.g., handling {{{}RpcThrottlingException{}}}) do 
> not run for batch calls.
>  * 
>  ** Metrics and fast-fail logic only apply to single operations, not batch 
> operations.
>  * 
>  ** Behavior is inconsistent between single and batch APIs.
> h3. Proposed Change
> Update {{RpcRetryingCallerImpl.callWithoutRetries(...)}} to use the same 
> interceptor lifecycle as {{{}callWithRetries(...){}}}:
>  # Create an interceptor context at call start.
>  # Call {{interceptor.intercept()}} before executing the RPC.
>  # On exception:
>  ## Translate the throwable.
>  ## Pass the translated exception into {{{}interceptor.handleFailure(){}}}.
>  ## Preserve {{PreemptiveFastFailException}} behavior.
>  # Call {{interceptor.updateFailureInfo()}} in the {{finally}} block when 
> failures occur.
>  # Add an {{AttemptType.NO_RETRY}} marker to the context for clarity in 
> logs/metrics.
>  # Optionally enrich the context when 
> {{RetriesExhaustedWithDetailsException}} (REWDE) is thrown, so interceptors 
> can see per-item failures.
> h3. Benefits
>  * *Consistency:* Single and batch operations follow the same interceptor 
> pattern.
>  * *Extensibility:* Interceptors can handle throttling and emit metrics for 
> batch calls without custom forks.
>  * *Maintainability:* Small, contained change in 
> {{{}RpcRetryingCallerImpl{}}}; no invasive changes to {{{}AsyncProcess{}}}.
>  * *Compatibility:* No breaking API changes. No-op interceptors behave as 
> before.
>  * *Community value:* Opens the door to richer failure handling for all 
> client operations.
> h3. Example Use Case
> At my day job we use interceptors to detect {{RpcThrottlingException}} and 
> fail fast while reporting metrics. This works for single gets/puts but is 
> currently bypassed in batch calls, limiting visibility and resilience. With 
> this change, batch requests can also fast-fail and report throttling metrics.
> h3. Risks & Mitigation
>  * *Performance regression:* Interceptor calls are lightweight; validate with 
> before/after microbenchmarks.
>  * *Behavior change for existing interceptors:* Only makes interceptors 
> {_}more consistent{_}. Default no-op interceptor ensures compatibility.
>  * *Naming confusion (“RetryingCaller” on no-retry path):* Clarify with 
> Javadoc: interceptors apply to both retrying and no-retry paths.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to