Dear all,

I believe the Java Iceberg REST client encodes namespace and table identifiers slightly incorrectly when constructing request URLs. Path segments are built with `java.net.URLEncoder.encode(...)`, which implements `application/x-www-form-urlencoded`, not RFC 3986 path encoding. The visible symptom is that a space becomes `+` instead of `%20`. A literal `+` becomes `%2B`, so a server that percent-decodes path segments per RFC 3986 maps both `"a b"` and `"a+b"` to the same name.
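To make the difference concrete, here is a minimal sketch that compares the two encodings. `formEncode` mirrors what is described above (a thin wrapper over `URLEncoder.encode`); `encodePathSegment` is a hypothetical helper written for this mail, not an existing Iceberg method. It is deliberately conservative: it percent-encodes everything outside the RFC 3986 "unreserved" set, which is stricter than the RFC requires for path segments. It assumes Java 10+ for the `Charset` overload of `URLEncoder.encode`.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class PathSegmentEncoding {

  // Form encoding (application/x-www-form-urlencoded), as used today.
  static String formEncode(String value) {
    return URLEncoder.encode(value, StandardCharsets.UTF_8);
  }

  // Hypothetical RFC 3986 path-segment encoder (illustration only).
  // Keeps ALPHA / DIGIT / "-" / "." / "_" / "~" and percent-encodes
  // every other byte of the UTF-8 representation.
  static String encodePathSegment(String value) {
    StringBuilder out = new StringBuilder();
    for (byte b : value.getBytes(StandardCharsets.UTF_8)) {
      int c = b & 0xFF;
      boolean unreserved =
          (c >= 'A' && c <= 'Z')
              || (c >= 'a' && c <= 'z')
              || (c >= '0' && c <= '9')
              || c == '-' || c == '.' || c == '_' || c == '~';
      if (unreserved) {
        out.append((char) c);
      } else {
        out.append('%').append(String.format("%02X", c));
      }
    }
    return out.toString();
  }

  public static void main(String[] args) {
    System.out.println(formEncode("my ns"));        // my+ns
    System.out.println(encodePathSegment("my ns")); // my%20ns
    System.out.println(formEncode("a+b"));          // a%2Bb
    System.out.println(encodePathSegment("a+b"));   // a%2Bb
  }
}
```

The two functions agree on a literal `+` and disagree on the space, which is the case that matters most in practice.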
Root cause: `RESTUtil.encodeString(String)` wraps `URLEncoder.encode`. It has two kinds of callers with incompatible requirements:

1. OAuth2 form bodies (RFC 6749): current behavior is correct.
2. URL path segments in `ResourcePaths` (table / view / metrics / plan / task) and per-level namespace encoding in `RESTUtil.encodeNamespace`: current behavior is wrong per RFC 3986.

Non-Java engines get this right. DuckDB, for example, sends `%20` for a space in a namespace or table name, so a spec-compliant server that correctly percent-decodes path segments sees a different identifier depending on which client issued the request.

We are already using the now-customizable separator (`\u001f`) to join multi-level namespaces in path segments, which is itself a deviation from a pure "one segment per level" RFC approach. That's fine as a deliberate choice, but I believe we should still respect RFC 3986 for encoding the level contents themselves.

Impact:

- Any namespace or table identifier containing a space, `+`, or other characters where form-urlencoded and RFC 3986 path encoding disagree (I believe space is by far the most important one) is sent on the wire with the wrong encoding from the Java client.
- A server that correctly decodes path segments sees `my+ns` instead of `my ns`, leading to 404s, silent access of the wrong object, or catalog inconsistency if two identifiers collide after decoding (`"a b"` vs `"a+b"`).
- Cross-engine interop breaks: an object created by a non-Java client with a space in the name is not addressable from the Java client, and vice versa.
- At Lakekeeper we have for some time now prohibited creation of objects with `+` in their name and interpret `+` in path segments as a space on read, as a pragmatic workaround. Creation is unambiguous because the identifier arrives in the request body, not the path, so we can reject it there. Read/update/drop paths are the ones where the ambiguity bites. In other catalogs, some clients simply can't load or write to affected tables.
- The OAuth2 test in `TestRESTUtil` pins form-encoding behavior, and `TestResourcePaths` even asserts `"plan with spaces"` → `"plan+with+spaces"` in a path, so the current behavior is locked in by tests. No tests cover namespace/table identifiers containing spaces or `+`.

Does anyone see a problem with fixing this in the Java client? I'd like to understand whether anyone is relying on the current encoding (servers that form-decode path segments, proxies, intermediate tooling) before opening an issue/PR.

If it turns out there are too many compatibility concerns to fix it outright, I think we should at the very least document the current encoding behavior explicitly in the REST spec, so server implementers and other clients can interoperate deliberately. Related to that, we should also disallow affected identifiers from being routed through generic OpenAPI code generation for path parameters: a standards-compliant generated client will encode per RFC 3986, and silently round-tripping names through such a client against a form-decoding server permanently loses the distinction between space and `+` (and the original name with it).
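The collision and round-trip concerns can be reproduced with a small sketch. All three helpers are hypothetical and written only for this mail: `formEncode` stands in for the current Java client behavior, `formDecode` for a server that form-decodes path segments, and `percentDecode` for a server that decodes per RFC 3986 (implemented here by shielding `+` before reusing `URLDecoder`). It assumes Java 10+ for the `Charset` overloads.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class DecodingCollision {

  // Stand-in for the current Java client encoding of a path segment.
  static String formEncode(String value) {
    return URLEncoder.encode(value, StandardCharsets.UTF_8);
  }

  // Stand-in for a server that form-decodes: "+" becomes a space.
  static String formDecode(String segment) {
    return URLDecoder.decode(segment, StandardCharsets.UTF_8);
  }

  // Stand-in for a server that percent-decodes per RFC 3986:
  // "+" is a literal plus, so it is shielded before decoding.
  static String percentDecode(String segment) {
    return URLDecoder.decode(segment.replace("+", "%2B"), StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    // Java client -> RFC 3986 server: two distinct names collide.
    System.out.println(percentDecode(formEncode("a b"))); // a+b
    System.out.println(percentDecode(formEncode("a+b"))); // a+b

    // A client that sends "+" unencoded (allowed in an RFC 3986 path
    // segment) -> form-decoding server: the plus turns into a space.
    System.out.println(formDecode("a+b")); // a b
  }
}
```

Only the pairing of a form-encoding client with a form-decoding server round-trips both names; every mixed pairing loses the distinction between space and `+`.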
Thanks,
Christian

References (permalinks on `main` @ `7e4aa89`):

- `RESTUtil.encodeString`: https://github.com/apache/iceberg/blob/7e4aa89d9900a52620afd1456152b63b47f2223b/core/src/main/java/org/apache/iceberg/rest/RESTUtil.java#L154-L157
- `RESTUtil.encodeNamespace` per-level encoding: https://github.com/apache/iceberg/blob/7e4aa89d9900a52620afd1456152b63b47f2223b/core/src/main/java/org/apache/iceberg/rest/RESTUtil.java#L288-L300
- `ResourcePaths` path-segment callers: https://github.com/apache/iceberg/blob/7e4aa89d9900a52620afd1456152b63b47f2223b/core/src/main/java/org/apache/iceberg/rest/ResourcePaths.java#L111
- `TestResourcePaths` pinning `+` for space in a path: https://github.com/apache/iceberg/blob/7e4aa89d9900a52620afd1456152b63b47f2223b/core/src/test/java/org/apache/iceberg/rest/TestResourcePaths.java#L321-L330
- `TestRESTUtil.testOAuth2URLEncoding`: https://github.com/apache/iceberg/blob/7e4aa89d9900a52620afd1456152b63b47f2223b/core/src/test/java/org/apache/iceberg/rest/TestRESTUtil.java#L143-L149
