Thanks for bringing this up, Christian. I support fixing this in the Java
client, provided the fix is fully backward-compatible with older clients
and servers. We should probably add a separate method in RESTUtil to encode
path segments and be more explicit about when to use x-www-form-urlencoded
vs RFC 3986 path encoding.



On Sat, Apr 11, 2026 at 10:13 PM Christian Thiel wrote:

> Dear all,
>
> I believe the Java Iceberg REST client encodes namespace and table
> identifiers slightly incorrectly when constructing request URLs. Path
> segments are built with `java.net.URLEncoder.encode(...)`, which implements
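> `application/x-www-form-urlencoded`, not RFC 3986 path encoding. The
> visible symptom is that a space becomes `+` instead of `%20`, while a
> literal `+` becomes `%2B`. Because `+` has no special meaning in an RFC 3986
> path, a server that percent-decodes the segment correctly turns both into
> `+`, so an encoded space is indistinguishable from an encoded plus.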
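Here is a small sketch that reproduces the symptom with the JDK alone. It assumes Java 10+, and `EncodingSymptom` and its methods are hypothetical names. It form-encodes `a b` and `a+b` the way `URLEncoder` does. It then decodes each result twice: once as an RFC 3986 server would, via `java.net.URI#getPath`, and once with form decoding via `URLDecoder`.

```java
import java.net.URI;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; only JDK APIs are used.
final class EncodingSymptom {
  private EncodingSymptom() {}

  // What the Java client effectively does today for a path segment.
  static String formEncode(String value) {
    return URLEncoder.encode(value, StandardCharsets.UTF_8);
  }

  // RFC 3986 decoding of one path segment: URI#getPath() decodes %XX escapes
  // but leaves '+' as a literal plus.
  static String rfc3986Decode(String segment) {
    return URI.create("/" + segment).getPath().substring(1);
  }

  // application/x-www-form-urlencoded decoding: '+' becomes a space.
  static String formDecode(String segment) {
    return URLDecoder.decode(segment, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    String space = formEncode("a b"); // "a+b"
    String plus = formEncode("a+b");  // "a%2Bb"

    // A spec-compliant server sees the same identifier for both requests.
    System.out.println(rfc3986Decode(space)); // a+b
    System.out.println(rfc3986Decode(plus));  // a+b

    // Only a server that form-decodes path segments recovers both names.
    System.out.println(formDecode(space)); // a b
    System.out.println(formDecode(plus));  // a+b
  }
}
```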
>
> Root cause: `RESTUtil.encodeString(String)` wraps `URLEncoder.encode`. It
> has two kinds of callers with incompatible requirements:
>
> 1. OAuth2 form bodies (RFC 6749) — current behavior is correct.
> 2. URL path segments in `ResourcePaths` (table / view / metrics / plan /
> task) and per-level namespace encoding in `RESTUtil.encodeNamespace` —
> current behavior is wrong per RFC 3986.
>
> Non-Java engines get this right. DuckDB, for example, sends `%20` for a
> space in a namespace or table name, so a spec-compliant server that
> correctly percent-decodes path segments sees a different identifier
> depending on which client issued the request.
>
> We are already using the now-customizable separator (`\u001f`) to join
> multi-level namespaces in path segments, which is itself a deviation from a
> pure "one segment per level" RFC approach. That's fine as a deliberate
> choice, but I believe we should still respect RFC 3986 for encoding the
> level contents themselves.
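Here is a sketch of per-level RFC 3986 encoding combined with the separator. It assumes the default `\u001f` separator and uses hypothetical names (`NamespacePathSketch`, `encodeNamespace`), so it is not the current `RESTUtil.encodeNamespace` code. The sketch also percent-encodes the separator itself, because a raw U+001F control character is not allowed in a URI. The separator therefore appears as `%1F`.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.stream.Collectors;

// Hypothetical sketch, not the current RESTUtil.encodeNamespace implementation.
final class NamespacePathSketch {
  private NamespacePathSketch() {}

  // RFC 3986 path-segment encoding: form-encode, then turn the '+' that
  // URLEncoder emits for a space into "%20" (a literal '+' is already "%2B").
  static String encodePathSegment(String segment) {
    return URLEncoder.encode(segment, StandardCharsets.UTF_8).replace("+", "%20");
  }

  // Encodes each namespace level on its own and joins the encoded levels with
  // the encoded separator, so the whole namespace stays one path segment.
  static String encodeNamespace(String[] levels, String separator) {
    String encodedSeparator = encodePathSegment(separator);
    return Arrays.stream(levels)
        .map(NamespacePathSketch::encodePathSegment)
        .collect(Collectors.joining(encodedSeparator));
  }

  public static void main(String[] args) {
    String[] levels = {"sales data", "q1+q2"};
    // prints sales%20data%1Fq1%2Bq2
    System.out.println(encodeNamespace(levels, "\u001f"));
  }
}
```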
>
> Impact:
> - Any namespace or table identifier containing a space, `+`, or other
> characters where form-urlencoded and RFC 3986 path encoding disagree (I
> believe space is by far the most important one) is sent on the wire with
> the wrong encoding from the Java client.
> - A server that correctly decodes path segments sees `my+ns` instead of
> `my ns`, leading to 404s, silently accessing the wrong object, or catalog
> inconsistency if two identifiers collide after decoding (`"a b"` vs
> `"a+b"`).
> - Cross-engine interop breaks: an object created by a non-Java client with
> a space in the name is not addressable from the Java client, and vice versa.
> - At Lakekeeper, as a pragmatic workaround, we have for some time
> prohibited creating objects with `+` in their name and interpret `+` in
> path segments as a space on read. Creation is unambiguous because the
> identifier arrives in the request body, not the path, so we can reject it
> there. Read/update/drop paths are where the ambiguity bites. With other
> catalogs, some clients simply can't load or write to affected tables.
> - The OAuth2 test in `TestRESTUtil` pins form-encoding behavior, and
> `TestResourcePaths` even asserts `"plan with spaces"` →
> `"plan+with+spaces"` in a path — so the current behavior is locked in by
> tests. No tests cover namespace/table identifiers containing spaces or `+`.
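To make the Lakekeeper workaround described above concrete, here is a hypothetical server-side sketch in Java. It is not Lakekeeper's actual code, and `PlusAsSpaceWorkaround` and its methods are illustrative names. The sketch validates names on create, where they arrive in the request body. On read, it RFC 3986-decodes the path segment and treats any remaining `+` as a space.

```java
import java.net.URI;

// Hypothetical sketch of the "reject '+' on create, read '+' as space" rule.
final class PlusAsSpaceWorkaround {
  private PlusAsSpaceWorkaround() {}

  // Create: the name arrives in the request body, so it can be checked as-is.
  static void validateNewName(String name) {
    if (name.indexOf('+') >= 0) {
      throw new IllegalArgumentException("'+' is not allowed in names: " + name);
    }
  }

  // Read/update/drop: percent-decode the path segment per RFC 3986, then map
  // any remaining '+' to a space. Sound only because no stored name has '+'.
  static String decodeNameFromPath(String rawSegment) {
    String decoded = URI.create("/" + rawSegment).getPath().substring(1);
    return decoded.replace('+', ' ');
  }

  public static void main(String[] args) {
    System.out.println(decodeNameFromPath("my+ns"));   // my ns (Java client today)
    System.out.println(decodeNameFromPath("my%20ns")); // my ns (RFC 3986 clients)
  }
}
```

The read-side mapping is only sound because creation rejects `+`. Without that rule, `my%2Bns` and `my%20ns` would resolve to the same object.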
>
> Does anyone see a problem with fixing this in the Java client? I'd like to
> understand whether anyone is relying on the current encoding (servers that
> form-decode path segments, proxies, intermediate tooling) before opening an
> issue/PR. If it turns out there are too many compatibility concerns to fix
> it outright, I think we should at the very least document the current
> encoding behavior explicitly in the REST spec, so server implementers and
> other clients can interoperate deliberately. Related to that, the spec
> should also say that affected identifiers must not be passed as path
> parameters through clients built with generic OpenAPI code generation: a
> standards-compliant generated client encodes per RFC 3986, so
> round-tripping such names through it against a form-decoding server
> silently and permanently loses the distinction between space and `+` (and
> the original name with it).
>
> Thanks,
> Christian
>
> References (permalinks on `main` @ `7e4aa89`):
> - `RESTUtil.encodeString`:
> https://github.com/apache/iceberg/blob/7e4aa89d9900a52620afd1456152b63b47f2223b/core/src/main/java/org/apache/iceberg/rest/RESTUtil.java#L154-L157
> - `RESTUtil.encodeNamespace` per-level encoding:
> https://github.com/apache/iceberg/blob/7e4aa89d9900a52620afd1456152b63b47f2223b/core/src/main/java/org/apache/iceberg/rest/RESTUtil.java#L288-L300
> - `ResourcePaths` path-segment callers:
> https://github.com/apache/iceberg/blob/7e4aa89d9900a52620afd1456152b63b47f2223b/core/src/main/java/org/apache/iceberg/rest/ResourcePaths.java#L111
> - `TestResourcePaths` pinning `+` for space in a path:
> https://github.com/apache/iceberg/blob/7e4aa89d9900a52620afd1456152b63b47f2223b/core/src/test/java/org/apache/iceberg/rest/TestResourcePaths.java#L321-L330
> - `TestRESTUtil.testOAuth2URLEncoding`:
> https://github.com/apache/iceberg/blob/7e4aa89d9900a52620afd1456152b63b47f2223b/core/src/test/java/org/apache/iceberg/rest/TestRESTUtil.java#L143-L149
>
>
