Thanks for bringing this up, Christian. I support fixing this in the Java client, provided the fix is fully backward-compatible with older clients and servers. We should probably add a separate method in RESTUtil for encoding path segments, and be more explicit about when to use `application/x-www-form-urlencoded` versus RFC 3986 path encoding.
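A minimal sketch of what such a separate path-segment method could look like is below. The class `PathEncoding` and its methods are hypothetical and not part of Iceberg's `RESTUtil`. The sketch assumes the default `\u001f` namespace separator is sent percent-encoded as `%1F`. It does not address the backward-compatibility requirement above. It only shows the encoding rule: percent-encode every UTF-8 byte that is not an RFC 3986 "unreserved" character, and encode each namespace level on its own before joining.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper for illustration only; not part of Iceberg's RESTUtil. */
final class PathEncoding {
  private static final char[] HEX = "0123456789ABCDEF".toCharArray();
  // Assumption: the default namespace separator (\u001f) travels percent-encoded.
  private static final String ENCODED_SEPARATOR = "%1F";

  private PathEncoding() {}

  /** Percent-encodes every UTF-8 byte that is not an RFC 3986 "unreserved" character. */
  static String encodePathSegment(String segment) {
    byte[] bytes = segment.getBytes(StandardCharsets.UTF_8);
    StringBuilder out = new StringBuilder(bytes.length);
    for (byte b : bytes) {
      int c = b & 0xFF; // treat the byte as unsigned
      if (isUnreserved(c)) {
        out.append((char) c);
      } else {
        out.append('%').append(HEX[c >> 4]).append(HEX[c & 0x0F]);
      }
    }
    return out.toString();
  }

  /** Encodes each namespace level separately, then joins the levels with the encoded separator. */
  static String encodeNamespace(List<String> levels) {
    return levels.stream()
        .map(PathEncoding::encodePathSegment)
        .collect(Collectors.joining(ENCODED_SEPARATOR));
  }

  // RFC 3986, section 2.3: unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
  private static boolean isUnreserved(int c) {
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9')
        || c == '-' || c == '.' || c == '_' || c == '~';
  }

  public static void main(String[] args) {
    System.out.println(encodePathSegment("my ns")); // my%20ns
    System.out.println(encodePathSegment("a+b")); // a%2Bb
    System.out.println(encodeNamespace(List.of("db", "my schema"))); // db%1Fmy%20schema
  }
}
```

OAuth2 form bodies would keep using the existing form encoder. Only the path-building callers would switch to the new method. A shorter alternative is `URLEncoder.encode(s, StandardCharsets.UTF_8).replace("+", "%20")`. This is safe because `URLEncoder` already escapes a literal `+` as `%2B`, so any remaining `+` must have come from a space. Its output differs from the loop above only for `*` (left literal) and `~` (sent as `%7E`), and both forms decode to the same identifier. The explicit loop just makes the permitted character set obvious.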
On Sat, Apr 11, 2026 at 10:13 PM Christian Thiel wrote:

> Dear all,
>
> I believe the Java Iceberg REST client encodes namespace and table identifiers slightly incorrectly when constructing request URLs. Path segments are built with `java.net.URLEncoder.encode(...)`, which implements `application/x-www-form-urlencoded`, not RFC 3986 path encoding. The visible symptom is that a space becomes `+` instead of `%20`, while a literal `+` becomes `%2B`. A server that percent-decodes path segments per RFC 3986 turns both `+` and `%2B` into `+`, so an encoded space becomes indistinguishable from a literal `+`.
>
> Root cause: `RESTUtil.encodeString(String)` wraps `URLEncoder.encode`. It has two kinds of callers with incompatible requirements:
>
> 1. OAuth2 form bodies (RFC 6749): current behavior is correct.
> 2. URL path segments in `ResourcePaths` (table / view / metrics / plan / task) and per-level namespace encoding in `RESTUtil.encodeNamespace`: current behavior is wrong per RFC 3986.
>
> Non-Java engines get this right. DuckDB, for example, sends `%20` for a space in a namespace or table name. As a result, a spec-compliant server that correctly percent-decodes path segments sees a different identifier depending on which client issued the request.
>
> We already use the now-customizable separator (`\u001f`) to join multi-level namespaces in path segments. That is itself a deviation from a pure "one segment per level" RFC approach. It is fine as a deliberate choice, but I believe we should still respect RFC 3986 when encoding the contents of each level.
>
> Impact:
> - The Java client sends any namespace or table identifier with the wrong encoding if it contains a space, `+`, or another character on which form-urlencoding and RFC 3986 path encoding disagree. I believe space is by far the most important case.
> - A server that correctly decodes path segments sees `my+ns` instead of `my ns`. This leads to 404s, silent access of the wrong object, or catalog inconsistency if two identifiers collide after decoding (`"a b"` vs `"a+b"`).
> - Cross-engine interop breaks: an object created by a non-Java client with a space in its name is not addressable from the Java client, and vice versa.
> - At Lakekeeper, we have for some time prohibited creating objects with `+` in their names. As a pragmatic workaround, we also interpret `+` in path segments as a space on read. Creation is unambiguous because the identifier arrives in the request body, not the path, so we can reject it there. Read/update/drop paths are where the ambiguity bites. With other catalogs, some clients simply cannot load or write to affected tables.
> - The OAuth2 test in `TestRESTUtil` pins form-encoding behavior, and `TestResourcePaths` even asserts `"plan with spaces"` → `"plan+with+spaces"` in a path. The current behavior is therefore locked in by tests. No tests cover namespace/table identifiers containing spaces or `+`.
>
> Does anyone see a problem with fixing this in the Java client? Before opening an issue/PR, I'd like to understand whether anyone relies on the current encoding (servers that form-decode path segments, proxies, intermediate tooling). There may turn out to be too many compatibility concerns to fix it outright. In that case, I think we should at the very least document the current encoding behavior explicitly in the REST spec, so that server implementers and other clients can interoperate deliberately. Related to that, we should also disallow routing affected identifiers through generic OpenAPI-generated code for path parameters. A standards-compliant generated client encodes per RFC 3986. Silently round-tripping names through such a client against a form-decoding server permanently loses the distinction between space and `+`, and the original name with it.
>
> Thanks,
> Christian
>
> References (permalinks on `main` @ `7e4aa89`):
> - `RESTUtil.encodeString`: https://github.com/apache/iceberg/blob/7e4aa89d9900a52620afd1456152b63b47f2223b/core/src/main/java/org/apache/iceberg/rest/RESTUtil.java#L154-L157
> - `RESTUtil.encodeNamespace` per-level encoding: https://github.com/apache/iceberg/blob/7e4aa89d9900a52620afd1456152b63b47f2223b/core/src/main/java/org/apache/iceberg/rest/RESTUtil.java#L288-L300
> - `ResourcePaths` path-segment callers: https://github.com/apache/iceberg/blob/7e4aa89d9900a52620afd1456152b63b47f2223b/core/src/main/java/org/apache/iceberg/rest/ResourcePaths.java#L111
> - `TestResourcePaths` pinning `+` for space in a path: https://github.com/apache/iceberg/blob/7e4aa89d9900a52620afd1456152b63b47f2223b/core/src/test/java/org/apache/iceberg/rest/TestResourcePaths.java#L321-L330
> - `TestRESTUtil.testOAuth2URLEncoding`: https://github.com/apache/iceberg/blob/7e4aa89d9900a52620afd1456152b63b47f2223b/core/src/test/java/org/apache/iceberg/rest/TestRESTUtil.java#L143-L149
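The `+`/`%2B` collision described in the quoted message can be reproduced with the JDK alone. The sketch below uses `URLEncoder` as a stand-in for `RESTUtil.encodeString`, which the message says wraps it. It uses `java.net.URI#getPath` as a stand-in for a server that percent-decodes path segments per RFC 3986. The class name and the `http://localhost/v1/namespaces/` base URL are hypothetical and chosen only for illustration.

```java
import java.net.URI;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical demo class; the base URL below is illustrative only. */
final class EncodingCollisionDemo {
  private static final String BASE = "http://localhost";
  private static final String PREFIX = "/v1/namespaces/";

  private EncodingCollisionDemo() {}

  /** Stand-in for RESTUtil.encodeString, which (per the message) wraps URLEncoder.encode. */
  static String formEncode(String name) {
    return URLEncoder.encode(name, StandardCharsets.UTF_8);
  }

  /** What an RFC 3986 server sees: URI#getPath percent-decodes but leaves '+' untouched. */
  static String rfc3986Decode(String rawSegment) {
    String path = URI.create(BASE + PREFIX + rawSegment).getPath();
    return path.substring(PREFIX.length());
  }

  /** What a form-decoding server sees: '+' becomes a space, %2B becomes '+'. */
  static String formDecode(String rawSegment) {
    return URLDecoder.decode(rawSegment, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    for (String name : new String[] {"a b", "a+b"}) {
      String wire = formEncode(name);
      System.out.println(
          "\"" + name + "\" -> wire: " + wire
              + ", RFC 3986 decode: \"" + rfc3986Decode(wire) + "\""
              + ", form decode: \"" + formDecode(wire) + "\"");
    }
  }
}
```

With this sketch, both names arrive as `a+b` on the RFC 3986 side, while form-decoding still tells them apart. This is why the current client behavior only round-trips correctly against servers that form-decode path segments.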
