Re: [DISCUSS] URL path decoding issues in Polaris

Robert Stupp Thu, 16 Apr 2026 03:57:48 -0700

Hi,

> spark-sql ()> create namespace `n/s`;
> However, the S3 location in this case gets a proper directory breakdown:
> ... and table metadata has: "location":"s3://pol/n/s/t1"
> ... but that is probably a different issue.


Yea, it's different from the URL en/decoding topic. Do you think it's worth
having a separate discussion about guardrails for namespace elements and
table/view names? For example, disallowing '/', disallowing empty/blank
namespace elements and table/view names, disallowing leading/trailing
whitespaces? Sure, some of these checks already happen, but not at every
level/layer (defense-in-depth).
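
To make the idea concrete, here is a minimal sketch of such a guardrail in Java. The class name, method name and the exact set of rules are hypothetical and only mirror the three examples listed above; nothing here is existing Polaris code.

```java
import java.util.List;

/** Hypothetical validator; the names and rules are illustrative, not Polaris code. */
final class IdentifierGuardrails {

  /** Returns a rejection reason, or null when the name passes all checks. */
  static String check(String name) {
    if (name == null || name.isBlank()) {
      return "empty or blank";
    }
    if (!name.equals(name.strip())) {
      return "leading or trailing whitespace";
    }
    if (name.indexOf('/') >= 0) {
      return "contains '/'";
    }
    return null;
  }

  public static void main(String[] args) {
    for (String name : List.of("ns1", "n/s", " t1", "")) {
      System.out.println("'" + name + "' -> " + check(name));
    }
  }
}
```

The same check could be called from each layer (REST, catalog, persistence), which is the defense-in-depth part.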

> when Iceberg itself will introduce configurable separators, we MAY ask
> ourselves if Polaris should allow them to be configurable or not. [...]
> separator is just a REST layer thing

True, the separator is primarily a REST-layer namespace en/decoding
thing. What worries me slightly is that (existing) namespace elements with
the configured separator character could become inaccessible. However,
"configurable separator" is IMO a different discussion.

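The concern can be illustrated with a deliberately simplified model: namespace levels are joined with a separator and split again on the receiving side. This is not how Iceberg or Polaris actually encode namespaces (percent-encoding is left out), and the class and method names are made up for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Simplified model of separator-based joining; not Iceberg's or Polaris's actual code. */
final class SeparatorCollisionDemo {

  static String join(List<String> levels, String separator) {
    return String.join(separator, levels);
  }

  static List<String> split(String joined, String separator) {
    // Pattern.quote keeps characters such as '.' from being read as regex syntax;
    // the -1 limit keeps trailing empty elements.
    return Arrays.asList(joined.split(Pattern.quote(separator), -1));
  }

  public static void main(String[] args) {
    List<String> original = List.of("a.b", "c"); // created while the separator was 0x1F
    String joined = join(original, ".");         // separator later changed to "."
    System.out.println(split(joined, "."));      // [a, b, c]: the two-level namespace is lost
  }
}
```

With the unit separator the round trip is lossless for these levels; with "." the element "a.b" can no longer be addressed as a single level.
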
Best,
Robert


On Wed, Apr 15, 2026 at 8:20 PM Dmitri Bourlatchkov <[email protected]>
wrote:

> Hi All,
>
> My understanding of the need to make namespace separators configurable is
> that there exists a rather narrow set of deployment cases where the ASCII
> "0x1F" (unit separator) character is not permitted in URL paths by some
> infrastructure components.
>
> It might be worth allowing users to define a different separator, but since
> no one has brought this up yet, I assume it is not a priority.
>
> In any case, using a different separator is completely a REST API
> concern and should not affect how Polaris stores data internally.
>
> Cheers,
> Dmitri.
>
> On Wed, Apr 15, 2026 at 2:03 PM Alexandre Dutra <[email protected]> wrote:
>
> > Hi all,
> >
> > > I wonder how namespace elements and table/view names with a slash ('/')
> > > character in the middle behave. Or other characters like '&' or '?' or
> > > '#'.
> >
> > For the REST layer, these will be percent-encoded, and with my PR to
> > fix a double-decoding issue, these characters "survive" the REST layer
> > just fine.
> >
> > The issue now is in some layers beneath: as I pointed out and as
> > Dmitri demonstrated, we are unfortunately concatenating identifiers
> > together to create storage locations, without proper escaping. This
> > currently results in corrupted storage locations.
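
A small Java sketch of the problem described above. The method names are hypothetical, and percent-encoding each identifier is only one possible escaping scheme, chosen here for illustration; the thread has not settled on a scheme for storage locations.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Illustration only; the method names and the escaping scheme are assumptions. */
final class LocationJoinDemo {

  /** Joins identifiers as-is, so a '/' inside a name becomes a directory boundary. */
  static String naiveLocation(String base, List<String> levels, String table) {
    List<String> parts = new ArrayList<>(levels);
    parts.add(table);
    return base + "/" + String.join("/", parts);
  }

  /** Escapes each identifier before joining, so each one stays a single path segment. */
  static String escapedLocation(String base, List<String> levels, String table) {
    List<String> parts = new ArrayList<>();
    for (String level : levels) {
      parts.add(escape(level));
    }
    parts.add(escape(table));
    return base + "/" + String.join("/", parts);
  }

  private static String escape(String s) {
    // URLEncoder is form-style (space -> '+'); rewrite '+' to "%20" for path-style output.
    return URLEncoder.encode(s, StandardCharsets.UTF_8).replace("+", "%20");
  }

  public static void main(String[] args) {
    System.out.println(naiveLocation("s3://pol", List.of("n/s"), "t1"));    // s3://pol/n/s/t1
    System.out.println(naiveLocation("s3://pol", List.of("n", "s"), "t1")); // s3://pol/n/s/t1
    System.out.println(escapedLocation("s3://pol", List.of("n/s"), "t1"));  // s3://pol/n%2Fs/t1
  }
}
```

With the naive join, the single-level namespace `n/s` and the two-level namespace `n`, `s` map to the same location.
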
> >
> > I'm trying to fix the REST layer first, then I'll move to the
> > storage layer.
> >
> > > What's your take on leveraging jakarta.ws.rs.ext.ParamConverterProvider
> > > / jakarta.ws.rs.ext.ParamConverter for the path parameters and have
> > > centralized helpers that deal with "proper" URL encoding/decoding?
> >
> > For now I don't see a valid usage in Polaris for that, since Jersey
> > handles decoding path parameters already.
> >
> > > I also agree that the "configurable namespace separator" must never
> > > change. Is my assumption correct, that it must always be the same
> > > character as it is today?
> >
> > In Polaris, we are using the namespace separator in two different use
> > cases:
> >
> > 1) For path parameters in the REST layer
> > 2) For storing namespaces in Polaris entities
> >
> > What is clear is that in the second use case, the namespace separator
> > must NEVER change. I just opened a PR for that:
> > https://github.com/apache/polaris/pull/4214
> >
> > Regarding the first use case, once we solve all our encoding/decoding
> > issues, and when Iceberg itself will introduce configurable
> > separators, we MAY ask ourselves if Polaris should allow them to be
> > configurable or not. I don't have strong opinions, but if the
> > separator is just a REST layer thing, it should be possible to change
> > it without breaking the storage layer or the metastore.
> >
> > Thanks,
> > Alex
> >
> > On Wed, Apr 15, 2026 at 7:47 PM Dmitri Bourlatchkov <[email protected]>
> > wrote:
> > >
> > > Hi All,
> > >
> > > Slashes in namespace seem to work fine (Spark 3.5 + Iceberg 1.10.0):
> > >
> > > spark-sql ()> create namespace `n/s`;
> > > Time taken: 0.335 seconds
> > > spark-sql ()> show namespaces;
> > > `n/s`
> > > Time taken: 0.232 seconds, Fetched 1 row(s)
> > > spark-sql ()> use `n/s`;
> > > Time taken: 0.028 seconds
> > > spark-sql (`n/s`)> create table t1 (n string);
> > > Time taken: 0.702 seconds
> > >
> > > The URLs appear to be encoded properly, e.g. (from Polaris log):
> > >
> > > > 2026-04-15 13:41:17,594 INFO  [io.qua.htt.access-log]
> > > > [dee1505c-ec1d-4f90-a9de-154eac66a40c_0000000000000000013,POLARIS] [,,,]
> > > > (executor-thread-1) 127.0.0.1 - root [15/Apr/2026:13:41:17 -0400] "GET
> > > > /api/catalog/v1/polaris/namespaces/n%2Fs/tables?pageToken= HTTP/1.1" 200 74
> > >
> > > I did not test trickier chars, but adding CI coverage for them would be
> > > good.
> > >
> > > > However, the S3 location in this case gets a proper directory breakdown:
> > >
> > > $ mc ls rustfs/pol/n/s
> > > [2026-04-15 13:44:37 EDT]     0B t1/
> > >
> > > ... and table metadata has: "location":"s3://pol/n/s/t1"
> > >
> > > ... but that is probably a different issue.
> > >
> > > Cheers,
> > > Dmitri.
> > >
> > >
> > >
> > > On Wed, Apr 15, 2026 at 10:35 AM Robert Stupp <[email protected]> wrote:
> > >
> > > > Thanks Alex for the thorough investigation!
> > > >
> > > > URL en/decoding is really not that easy.
> > > > > I wonder how namespace elements and table/view names with a slash ('/')
> > > > > character in the middle behave. Or other characters like '&' or '?' or
> > > > > '#'.
> > > > >
> > > > > Overall, I agree with your idea to implement correct URL encoding/decoding
> > > > > in the Polaris code base to protect Polaris from upstream behavior changes
> > > > > that can seriously break or even corrupt things.
> > > > >
> > > > > What's your take on leveraging jakarta.ws.rs.ext.ParamConverterProvider
> > > > > / jakarta.ws.rs.ext.ParamConverter for the path parameters and have
> > > > > centralized helpers that deal with "proper" URL encoding/decoding?
> > > > >
> > > > > I also agree that the "configurable namespace separator" must never change.
> > > > > Is my assumption correct, that it must always be the same character as it
> > > > > is today?
> > > >
> > > > Best,
> > > > Robert
> > > >
> > > >
> > > > > On Wed, Apr 15, 2026 at 3:48 PM Alexandre Dutra <[email protected]> wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > FYI I created a first PR to address the double-decoding issue:
> > > > >
> > > > > https://github.com/apache/polaris/pull/4210
> > > > >
> > > > > Thanks,
> > > > > Alex
> > > > >
> > > > > > On Tue, Apr 14, 2026 at 9:56 PM Alexandre Dutra <[email protected]> wrote:
> > > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > > I would also point out that Polaris uses RESTUtil.encodeNamespace and
> > > > > > > RESTUtil.decodeNamespace for encoding and decoding the parent
> > > > > > > namespace within a NamespaceEntity [1].
> > > > > > >
> > > > > > > These methods also exhibit the faulty space encoding behavior.
> > > > > > > Therefore, we must exercise **extreme caution** regarding any upcoming
> > > > > > > Iceberg project fixes for space-encoding issues. If these methods are
> > > > > > > modified, it is imperative that we retain the legacy versions
> > > > > > > specifically for encoding and decoding NamespaceEntity properties;
> > > > > > > otherwise we could end up with a corrupted database.
> > > > > > >
> > > > > > > The same goes for the future namespace separator coming with Iceberg
> > > > > > > 1.11: for the sake of encoding and decoding NamespaceEntity
> > > > > > > properties, the separator must never change.
> > > > > >
> > > > > > I would actually be in favor of proactively internalizing the
> > > > > > encoding/decoding algorithm used in NamespaceEntity. What do you
> > > > > > think?
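
As a rough sketch of what "internalizing" could look like: a codec with a hard-coded separator that does not depend on any upstream utility. This assumes the persisted format is form-style encoding of each level, joined by the percent-encoded unit separator; that assumption would have to be checked against what is actually stored. The class name is hypothetical.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical internal codec with a hard-coded separator; not existing Polaris code. */
final class FrozenNamespaceCodec {
  // Percent-encoded ASCII unit separator (0x1F); deliberately never read from configuration.
  private static final String ESCAPED_SEPARATOR = "%1F";

  static String encode(List<String> levels) {
    List<String> encoded = new ArrayList<>();
    for (String level : levels) {
      // Form-style encoding on purpose: a space becomes '+', matching the assumed legacy format.
      encoded.add(URLEncoder.encode(level, StandardCharsets.UTF_8));
    }
    return String.join(ESCAPED_SEPARATOR, encoded);
  }

  static List<String> decode(String stored) {
    List<String> levels = new ArrayList<>();
    for (String part : stored.split(ESCAPED_SEPARATOR, -1)) {
      levels.add(URLDecoder.decode(part, StandardCharsets.UTF_8));
    }
    return levels;
  }

  public static void main(String[] args) {
    String stored = encode(List.of("a b", "c"));
    System.out.println(stored);         // a+b%1Fc
    System.out.println(decode(stored)); // [a b, c]
  }
}
```

Because the behavior is frozen in one place, later fixes to the REST-layer encoding would not change how already-persisted values are read.
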
> > > > > >
> > > > > > Thanks,
> > > > > > Alex
> > > > > >
> > > > > > [1]:
> > > > >
> > > >
> >
> https://github.com/apache/polaris/blob/8ad8f74f62258ab6238190271603e4d4c8a75998/polaris-core/src/main/java/org/apache/polaris/core/entity/NamespaceEntity.java#L92
> > > > > >
> > > > > >
> > > > > > > On Tue, Apr 14, 2026 at 7:43 PM Alexandre Dutra <[email protected]> wrote:
> > > > > > >
> > > > > > > Hi all,
> > > > > > >
> > > > > > > > A discussion on the Iceberg ML [1] recently highlighted that URL path
> > > > > > > > segments are not being decoded correctly according to RFC 3986,
> > > > > > > > specifically regarding space encoding.
> > > > > > > >
> > > > > > > > I investigated the situation in Polaris, and found many problems:
> > > > > > > >
> > > > > > > > TLDR
> > > > > > > >
> > > > > > > > - Table names with the + sign can be created but cannot be retrieved
> > > > > > > > - Namespace names with the + sign are OK (can be created and retrieved)
> > > > > > > > - Table names with spaces cannot be created
> > > > > > > > - Namespace names with spaces cannot be created
> > > > > > >
> > > > > > > DISCUSSION
> > > > > > >
> > > > > > > > Table names such as "foo+bar" can be created (via POST, where the name
> > > > > > > > is in the request body). But they cannot be retrieved: when reading
> > > > > > > > tables, the name is part of the URL path. Polaris incorrectly performs
> > > > > > > > a second decoding step using RESTUtil.decodeString(table), even though
> > > > > > > > the REST framework has already decoded it. Consequently, a client
> > > > > > > > sends "foo%2Bbar" which is first decoded to "foo+bar" by the framework
> > > > > > > > (correct) and then re-decoded by Polaris to "foo bar" (incorrect),
> > > > > > > > resulting in a "not found" error.
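
The two decoding steps can be reproduced with JDK classes alone. In this sketch java.net.URI stands in for the REST framework's path decoding and java.net.URLDecoder stands in for the redundant form-style decoding; neither the framework nor RESTUtil is actually called, and the class name is made up.

```java
import java.net.URI;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

/** Demonstrates double decoding with JDK stand-ins; does not call Polaris or Iceberg code. */
final class DoubleDecodingDemo {

  /** Stand-in for the framework: URI.getPath() decodes percent-escapes and leaves '+' alone. */
  static String frameworkDecode(String rawSegment) {
    return URI.create("http://localhost/" + rawSegment).getPath().substring(1);
  }

  /** Stand-in for the second step: form-style decoding turns '+' into a space. */
  static String formDecode(String value) {
    return URLDecoder.decode(value, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    String once = frameworkDecode("foo%2Bbar");
    String twice = formDecode(once);
    System.out.println(once);  // foo+bar
    System.out.println(twice); // foo bar
  }
}
```
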
> > > > > > >
> > > > > > > > Table and namespace names like "foo bar" simply cannot be created at
> > > > > > > > all. This is because in IcebergCatalog.defaultWarehouseLocation() and
> > > > > > > > other similar places, we create locations merely by joining
> > > > > > > > identifiers together, without any form of URL encoding: see [2] [3].
> > > > > > > >
> > > > > > > > And even if tables like "foo bar" could be created, they couldn't be
> > > > > > > > retrieved by Java clients. This occurs because current Java clients
> > > > > > > > incorrectly encode that name as "foo+bar", which the REST framework
> > > > > > > > does not modify. Consequently, Polaris would look for a table named
> > > > > > > > "foo+bar" instead and throw a "not found" error. (Other clients would
> > > > > > > > send "foo%20bar" which would be correctly decoded by the framework as
> > > > > > > > "foo bar", and thus it would succeed.)
> > > > > > >
> > > > > > > PROPOSAL
> > > > > > >
> > > > > > > > To resolve the issue with the + sign in table names, we simply need to
> > > > > > > > eliminate the redundant decoding step. I can open a PR for that
> > > > > > > > shortly.
> > > > > > > >
> > > > > > > > To resolve the issue with spaces in table and namespace names, we
> > > > > > > > could fix all the methods that incorrectly join together identifiers
> > > > > > > > without proper URL encoding.
> > > > > > > >
> > > > > > > > Finally, addressing the Java clients encoding problem is complex, but
> > > > > > > > we could consider implementing a workaround as follows:
> > > > > > > >
> > > > > > > > 1) If the client is Java and lacks the upcoming Iceberg fix for space
> > > > > > > > encoding, manually replace "+" with a space to correct the client's
> > > > > > > > faulty encoding.
> > > > > > > >
> > > > > > > > 2) For non-Java clients or those with the fix, no workaround would be
> > > > > > > > required.
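
A possible shape for that workaround, as a sketch only. It assumes the raw (still encoded) path segment is available and that some mechanism, not specified in this thread, tells us whether the client has the legacy encoding; the boolean flag and all names are hypothetical. java.net.URI stands in for the framework's decoding.

```java
import java.net.URI;

/** Hypothetical workaround sketch; client detection is left to the caller. */
final class PlusWorkaroundSketch {

  /**
   * Decodes one raw path segment.
   *
   * @param rawSegment the segment exactly as sent by the client, before any decoding
   * @param legacyJavaClient hypothetical flag: true if the client encodes spaces as '+'
   */
  static String decodeSegment(String rawSegment, boolean legacyJavaClient) {
    // Only a literal '+' is rewritten; an encoded plus ("%2B") is left untouched,
    // so names that really contain '+' still decode correctly.
    String raw = legacyJavaClient ? rawSegment.replace("+", "%20") : rawSegment;
    return URI.create("http://localhost/" + raw).getPath().substring(1);
  }

  public static void main(String[] args) {
    System.out.println(decodeSegment("foo+bar", true));    // foo bar
    System.out.println(decodeSegment("foo%2Bbar", true));  // foo+bar
    System.out.println(decodeSegment("foo+bar", false));   // foo+bar
    System.out.println(decodeSegment("foo%20bar", false)); // foo bar
  }
}
```

The replacement is applied to the raw segment rather than to the decoded name: once "%2B" has been decoded to "+", a real plus sign can no longer be told apart from a mis-encoded space.
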
> > > > > > >
> > > > > > > What are your thoughts on this?
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Alex
> > > > > > >
> > > > > > > [1]:
> > > > https://lists.apache.org/thread/c498svln0x18vvm42998b9nm9j6ck5yh
> > > > > > > [2]:
> > > > >
> > > >
> >
> https://github.com/apache/polaris/blob/e94fdff63852dc41635c9e7eb62b3627ba562b85/runtime/service/src/main/java/org/apache/polaris/service/catalog/iceberg/IcebergCatalog.java#L379
> > > > > > > [3]:
> > > > >
> > > >
> >
> https://github.com/apache/polaris/blob/e94fdff63852dc41635c9e7eb62b3627ba562b85/runtime/service/src/main/java/org/apache/polaris/service/catalog/iceberg/IcebergCatalog.java#L571
> > > > >
> > > >
> >
>
