Re: [DISCUSS] URL path decoding issues in Polaris

Dmitri Bourlatchkov Wed, 15 Apr 2026 10:47:55 -0700

Hi All,

Slashes in namespace names seem to work fine (Spark 3.5 + Iceberg 1.10.0):


spark-sql ()> create namespace `n/s`;
Time taken: 0.335 seconds
spark-sql ()> show namespaces;
`n/s`
Time taken: 0.232 seconds, Fetched 1 row(s)
spark-sql ()> use `n/s`;
Time taken: 0.028 seconds
spark-sql (`n/s`)> create table t1 (n string);
Time taken: 0.702 seconds

The URLs appear to be encoded properly, e.g. (from Polaris log):

2026-04-15 13:41:17,594 INFO  [io.qua.htt.access-log]
[dee1505c-ec1d-4f90-a9de-154eac66a40c_0000000000000000013,POLARIS] [,,,]
(executor-thread-1) 127.0.0.1 - root [15/Apr/2026:13:41:17 -0400] "GET
/api/catalog/v1/polaris/namespaces/n%2Fs/tables?pageToken= HTTP/1.1" 200 74
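
For reference, a minimal Java sketch of the kind of client-side encoding
that would produce the n%2Fs segment seen in the log, and of what the
trickier characters would look like on the wire. PathSegmentEncoder is a
hypothetical name, not a Polaris or Iceberg class, and going through
java.net.URLEncoder is just one convenient way to get percent-encoding; this
is not a claim about what the Iceberg client does internally.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for illustration; not part of Polaris or Iceberg. */
final class PathSegmentEncoder {
  private PathSegmentEncoder() {}

  /** Percent-encodes one path segment so reserved characters cannot change the URL structure. */
  static String encode(String segment) {
    // URLEncoder implements form encoding: a space becomes "+", a literal "+" becomes "%2B".
    // After encoding, any remaining "+" therefore stands for a space and is rewritten as "%20".
    return URLEncoder.encode(segment, StandardCharsets.UTF_8).replace("+", "%20");
  }

  public static void main(String[] args) {
    // Names containing characters that are significant in URLs.
    for (String name : new String[] {"n/s", "a&b", "a?b", "a#b", "a b", "a+b"}) {
      System.out.println(name + " -> " + encode(name));
    }
  }
}
```

With this sketch, "n/s" comes out as "n%2Fs", while "&", "?" and "#" come
out as "%26", "%3F" and "%23", so none of them can be mistaken for a path,
query or fragment delimiter.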

I did not test trickier chars, but adding CI coverage for them would be
good.

However, the S3 location in this case is split into nested directories at
the slash:

$ mc ls rustfs/pol/n/s
[2026-04-15 13:44:37 EDT]     0B t1/

... and table metadata has: "location":"s3://pol/n/s/t1"

... but that is probably a different issue.
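
To illustrate the location breakdown: if a location is built by plain string
joining (as Alex describes for IcebergCatalog.defaultWarehouseLocation()),
the single-level namespace "n/s" and the two-level namespace ("n", "s") end
up at the same path. A rough sketch, with hypothetical names
(LocationJoinSketch, joinRaw and joinEncoded are not Polaris methods), and
with no claim that percent-encoding is the right answer for object storage
locations:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical illustration; not the actual Polaris location logic. */
final class LocationJoinSketch {
  private LocationJoinSketch() {}

  /** Joins identifier levels as-is, which is the behavior described in the thread. */
  static String joinRaw(String base, List<String> levels) {
    return base + "/" + String.join("/", levels);
  }

  /** Percent-encodes each level before joining, so a "/" inside a name stays inside its level. */
  static String joinEncoded(String base, List<String> levels) {
    return base + "/"
        + levels.stream().map(LocationJoinSketch::encodeLevel).collect(Collectors.joining("/"));
  }

  private static String encodeLevel(String level) {
    // Form-encode, then turn the "+" that URLEncoder emits for spaces into "%20".
    return URLEncoder.encode(level, StandardCharsets.UTF_8).replace("+", "%20");
  }

  public static void main(String[] args) {
    System.out.println(joinRaw("s3://pol", List.of("n/s", "t1")));     // s3://pol/n/s/t1
    System.out.println(joinRaw("s3://pol", List.of("n", "s", "t1")));  // s3://pol/n/s/t1
    System.out.println(joinEncoded("s3://pol", List.of("n/s", "t1"))); // s3://pol/n%2Fs/t1
  }
}
```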

Cheers,
Dmitri.



On Wed, Apr 15, 2026 at 10:35 AM Robert Stupp wrote:

> Thanks Alex for the thorough investigation!
>
> URL en/decoding is really not that easy.
> I wonder how namespace elements and table/view names with a slash ('/')
> character in the middle behave. Or other characters like '&' or '?' or '#'.
>
> Overall, I agree with your idea to implement correct URL encoding/decoding
> in the Polaris code base to protect Polaris from upstream behavior changes
> that can seriously break or even corrupt things.
>
> What's your take on leveraging jakarta.ws.rs.ext.ParamConverterProvider
> / jakarta.ws.rs.ext.ParamConverter for the path parameters and having
> centralized helpers that deal with "proper" URL encoding/decoding?
>
> I also agree that the "configurable namespace separator" must never change.
> Is my assumption correct, that it must always be the same character as it
> is today?
>
> Best,
> Robert
>
>
> On Wed, Apr 15, 2026 at 3:48 PM Alexandre Dutra wrote:
>
> > Hi all,
> >
> > FYI I created a first PR to address the double-decoding issue:
> >
> > https://github.com/apache/polaris/pull/4210
> >
> > Thanks,
> > Alex
> >
> > On Tue, Apr 14, 2026 at 9:56 PM Alexandre Dutra wrote:
> > >
> > > Hi all,
> > >
> > > I would also point out that Polaris uses RESTUtil.encodeNamespace and
> > > RESTUtil.decodeNamespace for encoding and decoding the parent
> > > namespace within a NamespaceEntity [1].
> > >
> > > These methods also exhibit the faulty space encoding behavior.
> > > Therefore, we must exercise **extreme caution** regarding any upcoming
> > > Iceberg project fixes for space-encoding issues. If these methods are
> > > modified, it is imperative that we retain the legacy versions
> > > specifically for encoding and decoding NamespaceEntity properties –
> > > otherwise we could end up with a corrupted database.
> > >
> > > The same goes for the future namespace separator coming with Iceberg
> > > 1.11: for the sake of encoding and decoding NamespaceEntity
> > > properties, the separator must never change.
> > >
> > > I would actually be in favor of proactively internalizing the
> > > encoding/decoding algorithm used in NamespaceEntity. What do you
> > > think?
> > >
> > > Thanks,
> > > Alex
> > >
> > > [1]: https://github.com/apache/polaris/blob/8ad8f74f62258ab6238190271603e4d4c8a75998/polaris-core/src/main/java/org/apache/polaris/core/entity/NamespaceEntity.java#L92
> > >
> > >
> > > On Tue, Apr 14, 2026 at 7:43 PM Alexandre Dutra wrote:
> > > >
> > > > Hi all,
> > > >
> > > > A discussion on the Iceberg ML [1] recently highlighted that URL path
> > > > segments are not being decoded correctly according to RFC 3986,
> > > > specifically regarding space encoding.
> > > >
> > > > I investigated the situation in Polaris, and found many problems:
> > > >
> > > > TLDR
> > > >
> > > > - Table names with the + sign can be created but cannot be retrieved
> > > > - Namespace names with the + sign are OK (can be created and
> > > >   retrieved)
> > > > - Table names with spaces cannot be created
> > > > - Namespace names with spaces cannot be created
> > > >
> > > > DISCUSSION
> > > >
> > > > Table names such as "foo+bar" can be created (via POST, where the name
> > > > is in the request body). But they cannot be retrieved: when reading
> > > > tables, the name is part of the URL path. Polaris incorrectly performs
> > > > a second decoding step using RESTUtil.decodeString(table), even though
> > > > the REST framework has already decoded it. Consequently, a client
> > > > sends "foo%2Bbar" which is first decoded to "foo+bar" by the framework
> > > > (correct) and then re-decoded by Polaris to "foo bar" (incorrect),
> > > > resulting in a "not found" error.
> > > >
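
The double-decoding sequence described here can be reproduced with a small
Java sketch. The class and method names are hypothetical stand-ins: 
decodePathSegment plays the role of the REST framework, and decodeForm plays
the role of the second decoding step, under the assumption that it has
form-style semantics like java.net.URLDecoder (which is consistent with the
"+" to space behavior reported in this thread).

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical illustration of the double-decoding problem; not Polaris code. */
final class DoubleDecodingSketch {
  private DoubleDecodingSketch() {}

  /** Path-style decoding: "%XX" sequences are decoded, a literal "+" is left alone. */
  static String decodePathSegment(String raw) {
    // Shield literal "+" from URLDecoder's form semantics before decoding.
    return URLDecoder.decode(raw.replace("+", "%2B"), StandardCharsets.UTF_8);
  }

  /** Form-style decoding: "%XX" sequences are decoded and "+" becomes a space. */
  static String decodeForm(String value) {
    return URLDecoder.decode(value, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    String once = decodePathSegment("foo%2Bbar"); // what the framework hands over
    String twice = decodeForm(once);              // what a redundant second decoding yields
    System.out.println(once);  // foo+bar
    System.out.println(twice); // foo bar
  }
}
```

Dropping the second call leaves the name as "foo+bar", which matches the
name stored at creation time.
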
> > > > Table and namespace names like "foo bar" simply cannot be created at
> > > > all. This is because in IcebergCatalog.defaultWarehouseLocation() and
> > > > other similar places, we create locations merely by joining
> > > > identifiers together, without any form of URL encoding: see [2] [3].
> > > >
> > > > And even if tables like "foo bar" could be created, they couldn't be
> > > > retrieved by Java clients. This occurs because current Java clients
> > > > incorrectly encode that name as "foo+bar", which the REST framework
> > > > does not modify. Consequently, Polaris would look for a table named
> > > > "foo+bar" instead and throw a "not found" error. (Other clients would
> > > > send "foo%20bar" which would be correctly decoded by the framework as
> > > > "foo bar", and thus it would succeed.)
> > > >
> > > > PROPOSAL
> > > >
> > > > To resolve the issue with the + sign in table names, we simply need to
> > > > eliminate the redundant decoding step. I can open a PR for that
> > > > shortly.
> > > >
> > > > To resolve the issue with spaces in table and namespace names, we
> > > > could fix all the methods that incorrectly join together identifiers
> > > > without proper URL encoding.
> > > >
> > > > Finally, addressing the Java clients encoding problem is complex, but
> > > > we could consider implementing a workaround as follows:
> > > >
> > > > 1) If the client is Java and lacks the upcoming Iceberg fix for space
> > > > encoding, manually replace "+" with a space to correct the client's
> > > > faulty encoding.
> > > >
> > > > 2) For non-Java clients or those with the fix, no workaround would be
> > > > required.
> > > >
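
A minimal Java sketch of how the two-branch workaround could look. All names
are hypothetical, and how the server would decide that a request comes from
a Java client without the fix is left out entirely (it is passed in as a
boolean). The sketch works on the raw, still-encoded segment, because after
percent-decoding a legacy "foo+bar" (meaning "foo bar") and a "foo%2Bbar"
(meaning "foo+bar") are indistinguishable; whether Polaris has access to the
raw segment at that point is an assumption.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of the proposed workaround; not Polaris code. */
final class LegacyClientWorkaroundSketch {
  private LegacyClientWorkaroundSketch() {}

  /**
   * Decodes a raw path segment.
   *
   * @param rawSegment the segment exactly as it appears in the request URL
   * @param legacyJavaClient true if the client is assumed to encode spaces as "+"
   */
  static String decodeSegment(String rawSegment, boolean legacyJavaClient) {
    // Legacy clients: keep URLDecoder's form semantics, so "+" turns into a space.
    // Everyone else: shield literal "+" first, so only "%XX" sequences are decoded.
    String prepared = legacyJavaClient ? rawSegment : rawSegment.replace("+", "%2B");
    return URLDecoder.decode(prepared, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    System.out.println(decodeSegment("foo+bar", true));    // foo bar
    System.out.println(decodeSegment("foo%2Bbar", true));  // foo+bar
    System.out.println(decodeSegment("foo%20bar", false)); // foo bar
    System.out.println(decodeSegment("foo+bar", false));   // foo+bar
  }
}
```
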
> > > > What are your thoughts on this?
> > > >
> > > > Thanks,
> > > > Alex
> > > >
> > > > [1]: https://lists.apache.org/thread/c498svln0x18vvm42998b9nm9j6ck5yh
> > > > [2]: https://github.com/apache/polaris/blob/e94fdff63852dc41635c9e7eb62b3627ba562b85/runtime/service/src/main/java/org/apache/polaris/service/catalog/iceberg/IcebergCatalog.java#L379
> > > > [3]: https://github.com/apache/polaris/blob/e94fdff63852dc41635c9e7eb62b3627ba562b85/runtime/service/src/main/java/org/apache/polaris/service/catalog/iceberg/IcebergCatalog.java#L571
> >
>
