Ray Mattingly created HBASE-29630:
-------------------------------------
Summary: RetryingCallerInterceptor should be wired up for batch
calls
Key: HBASE-29630
URL: https://issues.apache.org/jira/browse/HBASE-29630
Project: HBase
Issue Type: Improvement
Affects Versions: 2.6.3
Reporter: Ray Mattingly
Assignee: Ray Mattingly
h3. Summary
{{RetryingCallerInterceptor}} should be invoked for *batch operations*
({{{}callWithoutRetries(...){}}}) in addition to single operations
({{{}callWithRetries(...){}}}). This will ensure consistent behavior for
failure handling, fast-fail logic, and custom metrics across all client
operations.
h3. Problem
* *Single operations* ({{{}HTable.get(Get){}}}, etc.) run through
{{{}RpcRetryingCaller.callWithRetries(){}}}, which wires in interceptor hooks
({{{}intercept(){}}}, {{{}handleFailure(){}}}, {{{}updateFailureInfo(){}}}).
* *Batch operations* ({{{}HTable.batch(){}}}, {{{}HTable.get(List<Get>){}}})
go through {{{}RpcRetryingCaller.callWithoutRetries(){}}}, which currently
bypasses interceptor hooks.
* As a result:
** Custom interceptors (e.g., handling {{{}RpcThrottlingException{}}}) do not
run for batch calls.
** Metrics and fast-fail logic only apply to single operations, not batch
operations.
** Behavior is inconsistent between single and batch APIs.
h3. Proposed Change
Update {{RpcRetryingCallerImpl.callWithoutRetries(...)}} to use the same
interceptor lifecycle as {{{}callWithRetries(...){}}}:
# Create an interceptor context at call start.
# Call {{interceptor.intercept()}} before executing the RPC.
# On exception:
** Translate the throwable.
** Pass the translated exception into {{{}interceptor.handleFailure(){}}}.
** Preserve {{PreemptiveFastFailException}} behavior.
# Call {{interceptor.updateFailureInfo()}} in the {{finally}} block when
failures occur.
# Add an {{AttemptType.NO_RETRY}} marker to the context for clarity in
logs/metrics.
# Optionally enrich the context when {{RetriesExhaustedWithDetailsException}}
(REWDE) is thrown, so interceptors can see per-item failures.
h3. Benefits
* *Consistency:* Single and batch operations follow the same interceptor
pattern.
* *Extensibility:* Interceptors can handle throttling and emit metrics for
batch calls without custom forks.
* *Maintainability:* Small, contained change in {{{}RpcRetryingCallerImpl{}}};
no invasive changes to {{{}AsyncProcess{}}}.
* *Compatibility:* No breaking API changes. No-op interceptors behave as
before.
* *Community value:* Opens the door to richer failure handling for all client
operations.
h3. Example Use Case
At my day job we use interceptors to detect {{RpcThrottlingException}} and fail
fast while reporting metrics. This works for single gets/puts but is currently
bypassed in batch calls, limiting visibility and resilience. With this change,
batch requests can also fast-fail and report throttling metrics.
h3. Risks & Mitigation
* *Performance regression:* Interceptor calls are lightweight; validate with
before/after microbenchmarks.
* *Behavior change for existing interceptors:* Only makes interceptors {_}more
consistent{_}. Default no-op interceptor ensures compatibility.
* *Naming confusion (“RetryingCaller” on no-retry path):* Clarify with
Javadoc: interceptors apply to both retrying and no-retry paths.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)