Dear both,

Thank you for your thoughts.

The key point I'd like to highlight on Alex's phased plan: fixing (1) — the
encoding side — breaks no deployments and, importantly, immediately lets
operators support literal + in identifiers across engines in environments
where they know no old Java clients are present.
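
To make (1) concrete, here is a minimal sketch of what an RFC 3986-safe
path-segment encoder could look like in Java. The names `PathEncodingSketch`
and `encodePathSegment` are hypothetical and are not the proposed RESTUtil
API, and the sketch assumes Java 10+ for the `Charset` overload of
`URLEncoder.encode`. It reuses the form encoder and rewrites `+` to `%20`,
which over-encodes a few characters such as `~` but still produces valid
percent-encoding.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of Iceberg's RESTUtil.
final class PathEncodingSketch {
  private PathEncodingSketch() {}

  // Encodes a single path segment so that a space is sent as %20, never as "+".
  static String encodePathSegment(String segment) {
    // URLEncoder emits "+" only for a space (a literal "+" is already %2B),
    // so every "+" in its output can safely be rewritten to %20.
    return URLEncoder.encode(segment, StandardCharsets.UTF_8).replace("+", "%20");
  }
}
```

With this, `my ns` is sent as `my%20ns` and `a+b` as `a%2Bb`; both old
form-decoding servers and RFC 3986-compliant servers decode these to the
same identifier.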

Steps (2) and (3) could also be scheduled or made configurable on the
server side, so each deployment can migrate on its own timeline via a
feature flag rather than waiting for a single coordinated release.
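
As a rough illustration of the feature-flag idea, a server-side decoder
could switch between the two interpretations of `+`. This is only a sketch
under assumptions: `PathDecodingSketch` and its `rfc3986Decoding` flag are
hypothetical names, not existing Iceberg configuration, and Java 10+ is
assumed for the `Charset` overload of `URLDecoder.decode`.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical server-side helper for illustration only.
final class PathDecodingSketch {
  // Assumed per-deployment feature flag; false keeps today's behavior.
  private final boolean rfc3986Decoding;

  PathDecodingSketch(boolean rfc3986Decoding) {
    this.rfc3986Decoding = rfc3986Decoding;
  }

  String decodePathSegment(String encoded) {
    if (rfc3986Decoding) {
      // Protect "+" before decoding so it survives as a literal plus.
      return URLDecoder.decode(encoded.replace("+", "%2B"), StandardCharsets.UTF_8);
    }
    // Legacy form decoding: "+" is read as a space (old Java clients).
    return URLDecoder.decode(encoded, StandardCharsets.UTF_8);
  }
}
```

Either mode decodes `my%20ns` to `my ns`, so clients that already send
`%20` are unaffected by the flag; only a bare `+` is read differently.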

On the web-framework point — fully agree. We hit the same issue in Rust:
frameworks do proper RFC 3986 path decoding by default, so supporting
spaces from Java clients required raw extractors and manual parsing, which
added significant complexity.
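
The same effect can be reproduced in Java. In the sketch below,
`java.net.URI.getPath()` stands in for a framework's default decoding (an
assumption for illustration; real frameworks differ in detail), and the
class name, method names and URL are hypothetical.

```java
import java.net.URI;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of default path decoding vs. raw-path handling.
final class RawPathSketch {
  private RawPathSketch() {}

  // Percent-decodes the path; a "+" is left untouched, as RFC 3986 expects.
  static String frameworkStylePath(String url) {
    return URI.create(url).getPath();
  }

  // Legacy handling: take the raw (undecoded) path and form-decode the last
  // segment, so "+" from an old Java client is read as a space.
  static String legacyLastSegment(String url) {
    String rawPath = URI.create(url).getRawPath();
    String last = rawPath.substring(rawPath.lastIndexOf('/') + 1);
    return URLDecoder.decode(last, StandardCharsets.UTF_8);
  }
}
```

For a request path ending in `my+ns`, the default decoding yields `my+ns`,
so recovering `my ns` for old Java clients requires going back to the raw
path, which is the extra work described above.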

Does anyone see objections to proceeding with (1) as a first step?

Best, Christian

On Tue, 14 Apr 2026 at 18:13, Alexandre Dutra <[email protected]> wrote:

> Hi Christian,
>
> Thanks for raising this topic! I agree this is a significant source of
> compatibility problems between Java and non-Java clients.
>
> On the encoding side (client side), the fix is straightforward: encode
> spaces as per RFC 3986, that is, encode a space as `%20`. Since all
> servers, old and new, would correctly decode `%20` as a space, this
> part of the fix is benign.
>
> On the decoding side (server side), however, things are a bit complex.
> A new server can't distinguish between:
>
>   - "+" meaning space (from an old Java client)
>   - "+" meaning literal "+" (from any RFC 3986-compliant client)
>
> Interpreting it as a space breaks non-Java clients (as happens today),
> but interpreting it as a literal "+" would break all Java clients that
> have not been updated.
>
> I therefore suggest a phased approach:
>
> 1) Release N: Introduce a new encoding method in RESTUtil and "wire it
> up" immediately to ResourcePaths and RESTUtil.encodeNamespace. This
> would immediately benefit non-Java servers.
>
> 2) Release N: On the decoding side, introduce a new decoding method
> in RESTUtil, but leave it "unwired" for now. In particular,
> RESTUtil.decodeNamespace should keep using the old decode method.
> Java-based servers should also keep using the old method in order to
> not disrupt existing Java clients.
>
> 3) Release N+M: A few releases later, change RESTUtil.decodeNamespace
> to use the new decoding method (behavioral change). At this point,
> each Java-based server should also adopt the new decoding method and
> become RFC compliant. Servers are of course free to adopt the new
> decoding method earlier if that's acceptable for them, e.g. by using a
> feature flag.
>
> As a side note, RESTUtil.decodeString and RESTUtil.decodeNamespace are
> hard for modern Java servers to use because the path is generally
> decoded by the REST framework. Apache Polaris, for instance, is forced
> to go through a whole convoluted process to be able to use this
> method. We could maybe seize the opportunity to also provide a better
> way of decoding namespaces.
>
> Finally, the tests you pointed at clearly need to be revisited as they
> are implicitly validating the wrong decoding behavior.
>
> Thanks,
> Alex
>
>
> On Tue, Apr 14, 2026 at 7:58 AM Eduard Tudenhöfner
> <[email protected]> wrote:
> >
> > Thanks for bringing this up Christian. I support fixing this in the Java
> client, provided the fix is fully backward-compatible with older clients
> and servers. We should probably add a separate method in RESTUtil to encode
> path segments and be more explicit about when to use x-www-form-urlencoded
> vs RFC 3986 path encoding.
> >
> >
> >
> > On Sat, Apr 11, 2026 at 10:13 PM Christian Thiel <
> [email protected]> wrote:
> >>
> >> Dear all,
> >>
> >> I believe the Java Iceberg REST client encodes namespace and table
> identifiers slightly incorrectly when constructing request URLs. Path
> segments are built with `java.net.URLEncoder.encode(...)`, which implements
> `application/x-www-form-urlencoded` — not RFC 3986 path encoding. The
> visible symptom is that a space becomes `+` instead of `%20`, and a literal
> `+` becomes `%2B` (whereas a bare `+` sent by an RFC 3986 client is
> indistinguishable from an encoded space after form-decoding).
> >>
> >> Root cause: `RESTUtil.encodeString(String)` wraps `URLEncoder.encode`.
> It has two kinds of callers with incompatible requirements:
> >>
> >> 1. OAuth2 form bodies (RFC 6749) — current behavior is correct.
> >> 2. URL path segments in `ResourcePaths` (table / view / metrics / plan
> / task) and per-level namespace encoding in `RESTUtil.encodeNamespace` —
> current behavior is wrong per RFC 3986.
> >>
> >> Non-Java engines get this right. DuckDB, for example, sends `%20` for a
> space in a namespace or table name, so a spec-compliant server that
> correctly percent-decodes path segments sees a different identifier
> depending on which client issued the request.
> >>
> >> We are already using the now-customizable separator (`\u001f`) to join
> multi-level namespaces in path segments, which is itself a deviation from a
> pure "one segment per level" RFC approach. That's fine as a deliberate
> choice, but I believe we should still respect RFC 3986 for encoding the
> level contents themselves.
> >>
> >> Impact:
> >> - Any namespace or table identifier containing a space, `+`, or other
> characters where form-urlencoded and RFC 3986 path encoding disagree (I
> believe space is by far the most important one) is sent on the wire with
> the wrong encoding from the Java client.
> >> - A server that correctly decodes path segments sees `my+ns` instead of
> `my ns` — leading to 404s, silent access of the wrong object, or catalog
> inconsistency if two identifiers collide after decoding (`"a b"` vs
> `"a+b"`).
> >> - Cross-engine interop breaks: an object created by a non-Java client
> with a space in the name is not addressable from the Java client, and vice
> versa.
> >> - At Lakekeeper we have for some time now prohibited creation of
> objects with `+` in their name and interpret `+` in path segments as space
> on read, as a pragmatic workaround. Creation is unambiguous because the
> identifier arrives in the request body, not the path, so we can reject it
> there. Read/update/drop paths are the ones where ambiguity bites. In other
> catalogs, some clients simply can't load or write to affected tables.
> >> - The OAuth2 test in `TestRESTUtil` pins form-encoding behavior, and
> `TestResourcePaths` even asserts `"plan with spaces"` →
> `"plan+with+spaces"` in a path — so the current behavior is locked in by
> tests. No tests cover namespace/table identifiers containing spaces or `+`.
> >>
> >> Does anyone see a problem with fixing this in the Java client? I'd like
> to understand whether anyone is relying on the current encoding (servers
> that form-decode path segments, proxies, intermediate tooling) before
> opening an issue/PR. If it turns out there are too many compatibility
> concerns to fix it outright, I think we should at the very least document
> the current encoding behavior explicitly in the REST spec, so server
> implementers and other clients can interoperate deliberately. Related to
> that, we should also disallow affected identifiers from being routed
> through generic OpenAPI code generation for path parameters — a
> standards-compliant generated client will encode per RFC 3986, and silently
> round-tripping names through such a client against a form-decoding server
> permanently loses the distinction between space and `+` (and the original
> name with it).
> >>
> >> Thanks,
> >> Christian
> >>
> >> References (permalinks on `main` @ `7e4aa89`):
> >> - `RESTUtil.encodeString`:
> https://github.com/apache/iceberg/blob/7e4aa89d9900a52620afd1456152b63b47f2223b/core/src/main/java/org/apache/iceberg/rest/RESTUtil.java#L154-L157
> >> - `RESTUtil.encodeNamespace` per-level encoding:
> https://github.com/apache/iceberg/blob/7e4aa89d9900a52620afd1456152b63b47f2223b/core/src/main/java/org/apache/iceberg/rest/RESTUtil.java#L288-L300
> >> - `ResourcePaths` path-segment callers:
> https://github.com/apache/iceberg/blob/7e4aa89d9900a52620afd1456152b63b47f2223b/core/src/main/java/org/apache/iceberg/rest/ResourcePaths.java#L111
> >> - `TestResourcePaths` pinning `+` for space in a path:
> https://github.com/apache/iceberg/blob/7e4aa89d9900a52620afd1456152b63b47f2223b/core/src/test/java/org/apache/iceberg/rest/TestResourcePaths.java#L321-L330
> >> - `TestRESTUtil.testOAuth2URLEncoding`:
> https://github.com/apache/iceberg/blob/7e4aa89d9900a52620afd1456152b63b47f2223b/core/src/test/java/org/apache/iceberg/rest/TestRESTUtil.java#L143-L149
> >>
>
