adriangb opened a new issue, #705:
URL: https://github.com/apache/arrow-rs-object-store/issues/705

   ## Motivation
   
   `object_store` already has the seams for HTTP wrapping — `HttpService`, 
`HttpConnector`, `HttpBuilder::with_http_connector` — but every consumer who 
wants distributed tracing, header injection, custom retry/timeout policy, or a 
non-`reqwest` transport ends up writing the same boilerplate. Concrete pain 
points pulled from one downstream (a DataFusion-backed query engine running 
against both cloud durable storage and an HTTP-based local cache):
   
   1. **No support for `reqwest_middleware::ClientWithMiddleware`.** 
`HttpService` is implemented for `reqwest::Client` but not its middleware-aware 
sibling. Anyone who wants to plug in `reqwest-tracing` (or any other 
`reqwest_middleware::Middleware`) has to write a 30-line adapter that mirrors 
the existing reqwest impl.
   2. **`HttpService` is bespoke, not `tower::Service`.** Cannot directly use 
`tower_http::trace::TraceLayer`, `tower::timeout`, `tower::retry`, 
OpenTelemetry tower layers, or anything else from the tower ecosystem.
   3. **Operation context is gone by the time `HttpService::call` runs.** A 
wrapper sees `GET https://.../{bucket}/{key}` with a `Range:` header. It 
doesn't know whether the originating `ObjectStore` call was `get_opts` vs 
`head` vs `put`, the user's `Path`, or which backend (S3 vs HTTP) issued the 
request. Distinguishing reads from head probes for span attributes has to be 
done by sniffing method + headers + URL shape — brittle and backend-specific.
   4. **No built-in W3C trace-context injection.** Every consumer doing 
distributed tracing reimplements `opentelemetry_http::HeaderInjector` plumbing. 
(This becomes one existing middleware/layer once any of (1)/(2) lands.)
   
   ## Proposal at a glance
   
   Three independent additions, each behind its own opt-in feature, no breaking 
change to existing API:
   
   - **#702** — `impl HttpService for reqwest_middleware::ClientWithMiddleware` 
(high-level shortcut, ~30 lines mirroring the existing reqwest impl). Lands the 
most common use case (`reqwest_tracing::TracingMiddleware`) immediately.
   - **#704** — `TowerHttpConnector` (general primitive). Adapts any 
`tower::Service<http::Request<HttpRequestBody>, Response = 
http::Response<HttpResponseBody>>` into an `HttpConnector`, unlocking 
`tower-http`, `tower-otel-*`, and non-reqwest transports (`hyper`, `ureq`, 
`tower::service_fn` mocks, wasm-friendly clients).
   - **#703** — `ObjectStoreOperation` extension on outbound requests. Each 
backend inserts an `ObjectStoreOperation { kind, location, backend }` into 
`http::Request::extensions` so a wrapping `HttpService` (tower layer or reqwest 
middleware) can produce useful trace spans without sniffing URLs and headers. 
`http::Extensions` is in-process only, so this is invisible on the wire.
   
   ## These PRs are independent
   
   The three PRs have no logical dependency on each other and can be reviewed 
and merged in any order. PR #703 makes the wrappers from #702 and #704 *more 
useful* (operation context becomes visible) but is not a prerequisite. The only 
conflicts between the three are trivial textual merges:
   
   - All three add a `mod` line to `src/client/http/mod.rs`.
   - #702 and #704 each add an optional dep + feature stanza to `Cargo.toml`.
   
   ## Scope notes
   
   - **#702 pinned to `reqwest-middleware = \"0.4\"`** because object_store's 
`reqwest = \"0.12\"` is incompatible with `reqwest-middleware = \"0.5\"` (which 
requires `reqwest 0.13`). Bump together when object_store moves to reqwest 0.13.
   - **#703 instruments only the HTTP (WebDAV) backend** in this PR, to keep 
review surface tractable. AWS, GCP, and Azure each have their own 
request-construction style and are best landed as separate follow-up PRs (one 
per backend) once the mechanism is in place. The `ObjectStoreOperation` types 
and `HttpRequestBuilder::operation` helper are landed in #703 and ready for 
those follow-ups to use directly.
   - **#702 is non-wasm only.** `reqwest-middleware`'s `Middleware` trait 
requires `Send + Sync` while the wasm `HttpService` plumbing uses `?Send`, and 
the channel-based wasm body logic in the existing reqwest impl isn't worth 
duplicating just for this convenience layer.
   - **Tower error mapping is intentionally simple** in #704: `poll_ready` 
failures → `Connect`, `call` failures → `Request`. Can iterate later to mirror 
`HttpError::reqwest`'s richer classification if needed.
   
   ## Out of scope for this initiative
   
   - Built-in OpenTelemetry exporter integration. Once any of #702/#704 lands, 
every existing OTel-aware middleware/layer composes; the crate doesn't need a 
direct `opentelemetry` dependency.
   - Replacing `RetryConfig`. Retries already happen at a higher layer in 
`object_store` and that should remain so.
   - Flipping the `tower` and `reqwest-middleware` features to default-on. 
That's a follow-up after the three PRs soak.
   
   ## Status
   
   All three drafts are CI-green at the time of this filing.
   
   🤖 Generated with [Claude Code](https://claude.com/claude-code)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to