Hi again,

I have been reviewing the documentation for various storage providers
[1] [2] [3] to identify which characters they restrict or advise
against using.

The slash remains the most prominent issue: all storage providers
accept it, but it has a special meaning for Polaris-created locations,
where it acts as a path separator.
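
As a rough illustration (Python, purely a sketch; `default_location` is
a hypothetical helper standing in for the plain string concatenation
discussed earlier in this thread, not the actual Polaris code), a slash
in a name makes two distinct identifiers resolve to the same location:

```python
def default_location(warehouse: str, namespace: list[str], name: str) -> str:
    # Naive join: identifiers are concatenated with no escaping at all.
    return "/".join([warehouse.rstrip("/"), *namespace, name])


# Table "my/table1" in namespace db1 ...
a = default_location("s3://bucket123", ["db1"], "my/table1")
# ... versus table "table1" in namespace db1.my.
b = default_location("s3://bucket123", ["db1", "my"], "table1")

print(a)
print(b)
print(a == b)
```

Both calls yield the same string, so the storage layer cannot tell the
two entities apart.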

That said, other characters may cause trouble as well. I wonder if we
shouldn't add them to the list of forbidden chars:

  - Control characters
  - Backslash `\`
  - Path segments equal to `.` or `..`
  - Commonly discouraged symbols: * ? " < > | #

Given that most storage providers already reject or discourage these,
formalizing their exclusion seems like a safe step. Prohibiting these
characters explicitly prevents issues with invalid locations that
could hinder client access, while simultaneously addressing potential
security vulnerabilities.
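
A minimal sketch of what such a check could look like, in Python for
brevity. The function name, the error strings, and the exact symbol set
are assumptions for illustration only; this is not what the PR
implements:

```python
import unicodedata

# Symbols from the list above, plus the slash; illustrative only.
FORBIDDEN_SYMBOLS = set('/\\*?"<>|#')


def find_name_problems(name: str) -> list[str]:
    """Return the reasons a name would be rejected (empty list if none)."""
    problems = []
    if name in (".", ".."):
        problems.append("equal to '.' or '..'")
    # Unicode category "Cc" covers the C0/C1 control characters and DEL.
    if any(unicodedata.category(ch) == "Cc" for ch in name):
        problems.append("control character")
    bad = sorted(set(name) & FORBIDDEN_SYMBOLS)
    if bad:
        problems.append("forbidden symbol(s): " + " ".join(bad))
    return problems


print(find_name_problems("table1"))
print(find_name_problems("my/table1"))
print(find_name_problems("a\x1fb"))
```

Returning every problem found, rather than stopping at the first one,
would let the server put them all in a single error message.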

What do you all think?

Thanks,
Alex

[1]: https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-keys.html
[2]: https://docs.cloud.google.com/storage/docs/objects#naming
[3]: https://learn.microsoft.com/en-us/rest/api/storageservices/naming-and-referencing-containers--blobs--and-metadata

On Sun, Apr 26, 2026 at 5:15 PM Alexandre Dutra <[email protected]> wrote:
>
> Hi Yufei,
>
> >  the name is persisted verbatim in Polaris's catalog entity and baked as a 
> > directory boundary in the S3 location (s3://bucket123/db1/my/table1/…)
>
> While your research suggests this is a positive outcome, in fact this
> is *exactly* why I am concerned about using slashes. It introduces a
> prefix hierarchy - /db1/my/table1/ in your example - that doesn't
> exist conceptually.
>
> I'm also finding the conclusion of your research a bit unclear.
> Although it mentions the slash is "worth considering," it then
> provides three arguments against it before ultimately suggesting it's
> "not worth fighting." And among the 3 action items your research
> recommends, the first two are already implemented in the PR.
>
> About the feature flag idea: in my opinion, a feature flag is only
> viable if we also strengthen the URL construction logic; otherwise, I
> believe slashes should be prohibited unconditionally.
>
> Thanks,
> Alex
>
>
> On Fri, Apr 24, 2026 at 8:43 PM Yufei Gu <[email protected]> wrote:
> >
> > Thanks for the PR, Alex! I researched whether we should block the slash in
> > the table name.
> >
> > Here is what I tested. Created db1.my/table1 in a
> > Polaris quickstart catalog (RustFS-backed, in-memory metastore) and
> > exercised it against three client surfaces. All three surfaces work well:
> >   1. Iceberg REST API via curl. Create, list, and load all worked. The
> > slash must be percent-encoded as %2F in the path (e.g.
> > .../tables/my%2Ftable1); the name is persisted verbatim in
> > Polaris's catalog entity and baked as a directory boundary in the S3
> > location (s3://bucket123/db1/my/table1/…).
> >   2. PyIceberg (RestCatalog). list_namespaces, list_tables, load_table, and
> > scan().to_arrow() round-tripped the slash correctly end-to-end, including
> > fetching metadata JSON from storage with vended credentials.
> >   3. Spark SQL. The name is addressable via single-part backticks:
> > polaris.db1.`my/table1`. Other engines need their own quoting (Trino:
> > double quotes, etc.).
> >
> > Why the slash is still worth considering:
> >
> >    - URI-level fragility. %2F is the percent-encoding of a reserved
> >    character (/); intermediaries routinely reject it (Apache default
> >    `AllowEncodedSlashes Off` results in a 404, ALB results in a 400) or
> >    silently normalize it to / (some nginx configs, API Gateway REST,
> >    CloudFront), which would dispatch the request to a different
> >    namespace/table entirely. These failures surface only once a
> >    proxy/WAF/CDN is in the call path.
> >    - Storage-layout collision. Polaris builds default locations as
> >    <warehouse>/<namespace>/<name>. A table named my/table1 shares a prefix
> >    with a hypothetical future namespace db1.my, which could let vended
> >    credentials for one leak into the blast radius of another.
> >    - Engine quoting drift and bad UX. Every downstream engine has its own
> >    identifier-quoting rules. Slashes survive in Spark with backticks and in
> >    Trino with double quotes, but tools, dashboards, and DDL generators
> >    frequently drop or mangle them. Users have to think about which quote to
> >    use.
> >
> > *My recommendation: Not worth fighting.* The features work today in
> > isolated testing, but keeping them working requires every future hop, like
> > proxy, WAF, CDN, ingress, engine, and SDK to handle URLs exactly
> > right, forever. The upside is purely cosmetic (the slash in the name). I
> > suggest putting the restriction behind a feature flag, defaulted to reject.
> > Here are action items:
> >
> >    - Validate table and namespace names server-side at create time which
> >    the PR does already.
> >    - Reject with a clear 400 and an error message pointing to the flag.
> >    - Flag can be flipped on per realm for teams that genuinely need exotic
> >    names, with a documented warning about proxy-chain testing.
> >
> > This gets us the robustness benefits immediately, keeps the door open for
> > backward compatibility and niche use cases, and avoids a long tail of "it
> > works on my laptop, fails in prod" tickets. WDYT?
> >
> > Yufei
> >
> >
> > On Thu, Apr 23, 2026 at 6:07 AM Alexandre Dutra <[email protected]> wrote:
> >
> > > Hi Yufei,
> > >
> > > Yes, I think we can view storage location sanitizing as a parallel effort.
> > >
> > > With that, here is a simple PR that aims at forbidding slashes and a
> > > few other pathological cases for Iceberg and Generic Tables entities
> > > at creation time:
> > >
> > > https://github.com/apache/polaris/pull/4282
> > >
> > > Thanks,
> > > Alex
> > >
> > > On Thu, Apr 23, 2026 at 1:14 AM Yufei Gu <[email protected]> wrote:
> > > >
> > > > Hi Alex, it's a good point that the storage location build is also
> > > > affected, but it feels less controversial and somewhat separate from the
> > > > main question here.
> > > >
> > > > The immediate discussion, at least from my perspective, is about entity
> > > > naming guardrails and externally visible behavior, for example
> > > > preventing names that are ambiguous or likely to break REST access and
> > > > cross-client behavior.
> > > >
> > > > Storage location construction is important too, but that feels more like
> > > > an internal implementation hardening task than a spec or user-facing
> > > > semantics question. I would view it as a parallel track rather than
> > > > something that should block agreement on the narrower entity name issue.
> > > > I'm also fine if someone wants to tackle the location building issue
> > > > first. That could provide useful context for resolving the user-facing
> > > > naming questions.
> > > >
> > > > Yufei
> > > >
> > > >
> > > > On Wed, Apr 22, 2026 at 8:28 AM Alexandre Dutra <[email protected]>
> > > wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > Disallowing the most problematic cases seems the right way to go. I
> > > > > can provide a PR to quickly implement that.
> > > > >
> > > > > However, we must keep in mind that disallowing a few chars will not
> > > > > solve all our problems. IMHO we need to consistently replace all
> > > > > string concatenations that we use today for creating storage locations
> > > > > with a proper location builder that will take care of proper path
> > > > > escaping and sanitization. That part of the job is way more complex,
> > > > > due to the blast radius.
> > > > >
> > > > > Thanks,
> > > > > Alex
> > > > >
> > > > >
> > > > > On Wed, Apr 22, 2026 at 2:07 AM Yufei Gu <[email protected]> wrote:
> > > > > >
> > > > > > Sorry for jumping into this thread a bit late.
> > > > > >
> > > > > > > I’m supportive of introducing some guardrails for namespace and table
> > > > > > > or view names. Specifically, I think we should disallow a few
> > > > > > > problematic cases to avoid ambiguity and downstream issues:
> > > > > > >
> > > > > > >    - Disallow the slash character “/”
> > > > > > >    - Disallow empty strings
> > > > > > >    - Disallow leading or trailing whitespace
> > > > > > >
> > > > > > > These constraints seem reasonable given the interactions across REST,
> > > > > > > storage paths, and different client behaviors. Adding clear guardrails
> > > > > > > early can prevent subtle bugs and inconsistencies later on. Curious to
> > > > > > > hear if others see any concerns or edge cases with this approach.
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Yufei
> > > > > >
> > > > > >
> > > > > > On Thu, Apr 16, 2026 at 9:11 AM Alexandre Dutra <[email protected]>
> > > > > wrote:
> > > > > >
> > > > > > > > Do you think it's worth having a separate discussion about
> > > > > guardrails for
> > > > > > > namespace elements and table/view names? [...]
> > > > > > >
> > > > > > > Completely agree here. I think the slash character in particular
> > > > > > > should definitely be banned.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Alex
> > > > > > >
> > > > > > > On Thu, Apr 16, 2026 at 6:03 PM Dmitri Bourlatchkov <
> > > [email protected]>
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Do you think it's worth having a separate discussion about
> > > > > guardrails
> > > > > > > for
> > > > > > > > namespace elements and table/view names? [...]
> > > > > > > >
> > > > > > > > Definitely!
> > > > > > > >
> > > > > > > > Cheers,
> > > > > > > > Dmitri.
> > > > > > > >
> > > > > > > > On Thu, Apr 16, 2026 at 6:57 AM Robert Stupp <[email protected]>
> > > wrote:
> > > > > > > >
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > > > > spark-sql ()> create namespace `n/s`;
> > > > > > > > > > > However, the S3 location in this case gets a proper directory
> > > > > > > > > > > breakdown:
> > > > > > > > > > > ... and table metadata has: "location":"s3://pol/n/s/t1"
> > > > > > > > > > > ... but that is probably a different issue.
> > > > > > > > > >
> > > > > > > > > > Yea, it's different from the URL en/decoding topic. Do you think
> > > > > > > > > > it's worth having a separate discussion about guardrails for
> > > > > > > > > > namespace elements and table/view names? For example, disallowing
> > > > > > > > > > '/', disallowing empty/blank namespace elements and table/view
> > > > > > > > > > names, disallowing leading/trailing whitespace? Sure, some of
> > > > > > > > > > these checks already happen, but not at every level/layer
> > > > > > > > > > (defense-in-depth).
> > > > > > > > > >
> > > > > > > > > > > when Iceberg itself will introduce configurable separators,
> > > > > > > > > > > we MAY ask ourselves if Polaris should allow them to be
> > > > > > > > > > > configurable or not. [...] separator is just a REST layer
> > > > > > > > > > > thing
> > > > > > > > > >
> > > > > > > > > > True, the separator is primarily a REST-layer namespace
> > > > > > > > > > en/decoding thing. What worries me slightly is that (existing)
> > > > > > > > > > namespace elements with the configured separator character
> > > > > > > > > > could become inaccessible. However, "configurable separator"
> > > > > > > > > > is IMO a different discussion.
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > Robert
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Wed, Apr 15, 2026 at 8:20 PM Dmitri Bourlatchkov <
> > > > > [email protected]>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hi All,
> > > > > > > > > >
> > > > > > > > > > My understanding of the need to make namespace separators
> > > > > > > configurable is
> > > > > > > > > > that there exist a rather narrow set of deployment cases
> > > where
> > > > > the
> > > > > > > ASCII
> > > > > > > > > > "0x1F" (unit separator) character is not permitted in URL
> > > paths
> > > > > by
> > > > > > > some
> > > > > > > > > > infrastructure components.
> > > > > > > > > >
> > > > > > > > > > It might be worth allowing users to define a different
> > > > > separator, but
> > > > > > > > > since
> > > > > > > > > > no one has brought this up yet, I assume it is not a
> > > priority.
> > > > > > > > > >
> > > > > > > > > > In any case, using a different separator is completely a
> > > REST API
> > > > > > > > > > concern and should not affect how Polaris stores data
> > > internally.
> > > > > > > > > >
> > > > > > > > > > Cheers,
> > > > > > > > > > Dmitri.
> > > > > > > > > >
> > > > > > > > > > On Wed, Apr 15, 2026 at 2:03 PM Alexandre Dutra <
> > > > > [email protected]>
> > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi all,
> > > > > > > > > > >
> > > > > > > > > > > > I wonder how namespace elements and table/view names
> > > with a
> > > > > slash
> > > > > > > > > ('/')
> > > > > > > > > > > character in the middle behave. Or other characters like
> > > '&' or
> > > > > > > '?' or
> > > > > > > > > > '#'.
> > > > > > > > > > >
> > > > > > > > > > > For the REST layer, these will be percent-encoded, and
> > > with my
> > > > > PR
> > > > > > > to
> > > > > > > > > > > fix a double-decoding issue, these characters "survive" 
> > > > > > > > > > > the
> > > > > REST
> > > > > > > layer
> > > > > > > > > > > just fine.
> > > > > > > > > > >
> > > > > > > > > > > The issue now is in some layers beneath: as I pointed out
> > > and
> > > > > as
> > > > > > > > > > > Dmitri demonstrated, we are unfortunately concatenating
> > > > > identifiers
> > > > > > > > > > > together to create storage locations, without proper
> > > escaping.
> > > > > This
> > > > > > > > > > > currently results in corrupted storage locations.
> > > > > > > > > > >
> > > > > > > > > > > > I'm trying to fix the REST layer first, then I'll move to the
> > > > > > > > > > > > storage layer.
> > > > > > > > > > >
> > > > > > > > > > > > What's your take on leveraging
> > > > > > > > > jakarta.ws.rs.ext.ParamConverterProvider
> > > > > > > > > > > / jakarta.ws.rs.ext.ParamConverter for the path parameters
> > > and
> > > > > have
> > > > > > > > > > > centralized helpers that deal with "proper" URL
> > > > > encoding/decoding?
> > > > > > > > > > >
> > > > > > > > > > > For now I don't see a valid usage in Polaris for that,
> > > since
> > > > > Jersey
> > > > > > > > > > > handles decoding path parameters already.
> > > > > > > > > > >
> > > > > > > > > > > > I also agree that the "configurable namespace separator"
> > > must
> > > > > > > never
> > > > > > > > > > > change. Is my assumption correct, that it must always be
> > > the
> > > > > same
> > > > > > > > > > character
> > > > > > > > > > > as it is today?
> > > > > > > > > > >
> > > > > > > > > > > In Polaris, we are using the namespace separator in two
> > > > > different
> > > > > > > use
> > > > > > > > > > > cases:
> > > > > > > > > > >
> > > > > > > > > > > 1) For path parameters in the REST layer
> > > > > > > > > > > 2) For storing namespaces in Polaris entities
> > > > > > > > > > >
> > > > > > > > > > > > What is clear is that in the second use case, the namespace
> > > > > > > > > > > > separator must NEVER change. I just opened a PR for that:
> > > > > > > > > > > > https://github.com/apache/polaris/pull/4214
> > > > > > > > > > >
> > > > > > > > > > > Regarding the first use case, once we solve all our
> > > > > > > encoding/decoding
> > > > > > > > > > > issues, and when Iceberg itself will introduce 
> > > > > > > > > > > configurable
> > > > > > > > > > > separators, we MAY ask ourselves if Polaris should allow
> > > them
> > > > > to be
> > > > > > > > > > > configurable or not. I don't have strong opinions, but if
> > > the
> > > > > > > > > > > separator is just a REST layer thing, it should be
> > > possible to
> > > > > > > change
> > > > > > > > > > > it without breaking the storage layer or the metastore.
> > > > > > > > > > >
> > > > > > > > > > > Thanks,
> > > > > > > > > > > Alex
> > > > > > > > > > >
> > > > > > > > > > > On Wed, Apr 15, 2026 at 7:47 PM Dmitri Bourlatchkov <
> > > > > > > [email protected]>
> > > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > Hi All,
> > > > > > > > > > > >
> > > > > > > > > > > > Slashes in namespace seem to work fine (Spark 3.5 +
> > > Iceberg
> > > > > > > 1.10.0):
> > > > > > > > > > > >
> > > > > > > > > > > > spark-sql ()> create namespace `n/s`;
> > > > > > > > > > > > Time taken: 0.335 seconds
> > > > > > > > > > > > spark-sql ()> show namespaces;
> > > > > > > > > > > > `n/s`
> > > > > > > > > > > > Time taken: 0.232 seconds, Fetched 1 row(s)
> > > > > > > > > > > > spark-sql ()> use `n/s`;
> > > > > > > > > > > > Time taken: 0.028 seconds
> > > > > > > > > > > > spark-sql (`n/s`)> create table t1 (n string);
> > > > > > > > > > > > Time taken: 0.702 seconds
> > > > > > > > > > > >
> > > > > > > > > > > > > The URLs appear to be encoded properly, e.g. (from Polaris log):
> > > > > > > > > > > > >
> > > > > > > > > > > > > 2026-04-15 13:41:17,594 INFO  [io.qua.htt.access-log]
> > > > > > > > > > > > > [dee1505c-ec1d-4f90-a9de-154eac66a40c_0000000000000000013,POLARIS] [,,,]
> > > > > > > > > > > > > (executor-thread-1) 127.0.0.1 - root [15/Apr/2026:13:41:17 -0400]
> > > > > > > > > > > > > "GET /api/catalog/v1/polaris/namespaces/n%2Fs/tables?pageToken= HTTP/1.1" 200 74
> > > > > > > > > > > >
> > > > > > > > > > > > I did not test trickier chars, but adding CI coverage 
> > > > > > > > > > > > for
> > > > > them
> > > > > > > would
> > > > > > > > > be
> > > > > > > > > > > > good.
> > > > > > > > > > > >
> > > > > > > > > > > > However, the S3 location in this case gets a proper
> > > directory
> > > > > > > > > > breakdown:
> > > > > > > > > > > >
> > > > > > > > > > > > $ mc ls rustfs/pol/n/s
> > > > > > > > > > > > [2026-04-15 13:44:37 EDT]     0B t1/
> > > > > > > > > > > >
> > > > > > > > > > > > ... and table metadata has: "location":"s3://pol/n/s/t1"
> > > > > > > > > > > >
> > > > > > > > > > > > ... but that is probably a different issue.
> > > > > > > > > > > >
> > > > > > > > > > > > Cheers,
> > > > > > > > > > > > Dmitri.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > On Wed, Apr 15, 2026 at 10:35 AM Robert Stupp <
> > > > > [email protected]>
> > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Thanks Alex for the thorough investigation!
> > > > > > > > > > > > >
> > > > > > > > > > > > > URL en/decoding is really not that easy.
> > > > > > > > > > > > > I wonder how namespace elements and table/view names
> > > with a
> > > > > > > slash
> > > > > > > > > > ('/')
> > > > > > > > > > > > > character in the middle behave. Or other characters
> > > like
> > > > > '&'
> > > > > > > or '?'
> > > > > > > > > > or
> > > > > > > > > > > '#'.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Overall, I agree with your idea to implement correct
> > > URL
> > > > > > > > > > > encoding/decoding
> > > > > > > > > > > > > in the Polaris code base to protect Polaris from
> > > upstream
> > > > > > > behavior
> > > > > > > > > > > changes
> > > > > > > > > > > > > that can seriously break or even corrupt things.
> > > > > > > > > > > > >
> > > > > > > > > > > > > What's your take on leveraging
> > > > > > > > > > jakarta.ws.rs.ext.ParamConverterProvider
> > > > > > > > > > > > > / jakarta.ws.rs.ext.ParamConverter for the path
> > > parameters
> > > > > and
> > > > > > > have
> > > > > > > > > > > > > centralized helpers that deal with "proper" URL
> > > > > > > encoding/decoding?
> > > > > > > > > > > > >
> > > > > > > > > > > > > I also agree that the "configurable namespace
> > > separator"
> > > > > must
> > > > > > > never
> > > > > > > > > > > change.
> > > > > > > > > > > > > Is my assumption correct, that it must always be the
> > > same
> > > > > > > character
> > > > > > > > > > as
> > > > > > > > > > > it
> > > > > > > > > > > > > is today?
> > > > > > > > > > > > >
> > > > > > > > > > > > > Best,
> > > > > > > > > > > > > Robert
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Wed, Apr 15, 2026 at 3:48 PM Alexandre Dutra <
> > > > > > > [email protected]
> > > > > > > > > >
> > > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Hi all,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > FYI I created a first PR to address the
> > > double-decoding
> > > > > > > issue:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > https://github.com/apache/polaris/pull/4210
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > Alex
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Tue, Apr 14, 2026 at 9:56 PM Alexandre Dutra <
> > > > > > > > > [email protected]
> > > > > > > > > > >
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Hi all,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I would also point out that Polaris uses RESTUtil.encodeNamespace and
> > > > > > > > > > > > > > > > RESTUtil.decodeNamespace for encoding and decoding the parent
> > > > > > > > > > > > > > > > namespace within a NamespaceEntity [1].
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > These methods also exhibit the faulty space encoding behavior.
> > > > > > > > > > > > > > > > Therefore, we must exercise **extreme caution** regarding any upcoming
> > > > > > > > > > > > > > > > Iceberg project fixes for space-encoding issues. If these methods are
> > > > > > > > > > > > > > > > modified, it is imperative that we retain the legacy versions
> > > > > > > > > > > > > > > > specifically for encoding and decoding NamespaceEntity properties –
> > > > > > > > > > > > > > > > otherwise we could end up with a corrupted database.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > The same goes for the future namespace separator coming with Iceberg
> > > > > > > > > > > > > > > > 1.11: for the sake of encoding and decoding NamespaceEntity
> > > > > > > > > > > > > > > > properties, the separator must never change.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I would actually be in favor of proactively internalizing the
> > > > > > > > > > > > > > > > encoding/decoding algorithm used in NamespaceEntity. What do you
> > > > > > > > > > > > > > > > think?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > Alex
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > [1]:
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > >
> > > https://github.com/apache/polaris/blob/8ad8f74f62258ab6238190271603e4d4c8a75998/polaris-core/src/main/java/org/apache/polaris/core/entity/NamespaceEntity.java#L92
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Tue, Apr 14, 2026 at 7:43 PM Alexandre Dutra <
> > > > > > > > > > [email protected]
> > > > > > > > > > > >
> > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Hi all,
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > A discussion on the Iceberg ML [1] recently highlighted that URL path
> > > > > > > > > > > > > > > > > segments are not being decoded correctly according to RFC 3986,
> > > > > > > > > > > > > > > > > specifically regarding space encoding.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > I investigated the situation in Polaris, and found many problems:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > TLDR
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > - Table names with the + sign can be created but cannot be retrieved
> > > > > > > > > > > > > > > > > - Namespace names with the + sign are OK (can be created and retrieved)
> > > > > > > > > > > > > > > > > - Table names with spaces cannot be created
> > > > > > > > > > > > > > > > > - Namespace names with spaces cannot be created
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > DISCUSSION
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Table names such as "foo+bar" can be created (via POST, where the
> > > > > > > > > > > > > > > > > name is in the request body). But they cannot be retrieved: when
> > > > > > > > > > > > > > > > > reading tables, the name is part of the URL path. Polaris incorrectly
> > > > > > > > > > > > > > > > > performs a second decoding step using RESTUtil.decodeString(table),
> > > > > > > > > > > > > > > > > even though the REST framework has already decoded it. Consequently,
> > > > > > > > > > > > > > > > > a client sends "foo%2Bbar" which is first decoded to "foo+bar" by the
> > > > > > > > > > > > > > > > > framework (correct) and then re-decoded by Polaris to "foo bar"
> > > > > > > > > > > > > > > > > (incorrect), resulting in a "not found" error.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Table and namespace names like "foo bar" simply cannot be created at
> > > > > > > > > > > > > > > > > all. This is because in IcebergCatalog.defaultWarehouseLocation() and
> > > > > > > > > > > > > > > > > other similar places, we create locations merely by joining
> > > > > > > > > > > > > > > > > identifiers together, without any form of URL encoding: see [2] [3].
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > And even if tables like "foo bar" could be created, they couldn't be
> > > > > > > > > > > > > > > > > retrieved by Java clients. This occurs because current Java clients
> > > > > > > > > > > > > > > > > incorrectly encode that name as "foo+bar", which the REST framework
> > > > > > > > > > > > > > > > > does not modify. Consequently, Polaris would look for a table named
> > > > > > > > > > > > > > > > > "foo+bar" instead and throw a "not found" error. (Other clients would
> > > > > > > > > > > > > > > > > send "foo%20bar" which would be correctly decoded by the framework as
> > > > > > > > > > > > > > > > > "foo bar", and thus it would succeed.)
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > PROPOSAL
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > To resolve the issue with the + sign in table names, we simply need to
> > > > > > > > > > > > > > > > > eliminate the redundant decoding step. I can open a PR for that
> > > > > > > > > > > > > > > > > shortly.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > To resolve the issue with spaces in table and namespace names, we
> > > > > > > > > > > > > > > > > could fix all the methods that incorrectly join together identifiers
> > > > > > > > > > > > > > > > > without proper URL encoding.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Finally, addressing the Java clients encoding problem is complex, but
> > > > > > > > > > > > > > > > > we could consider implementing a workaround as follows:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > 1) If the client is Java and lacks the upcoming Iceberg fix for space
> > > > > > > > > > > > > > > > > encoding, manually replace "+" with a space to correct the client's
> > > > > > > > > > > > > > > > > faulty encoding.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > 2) For non-Java clients or those with the fix, no workaround would be
> > > > > > > > > > > > > > > > > required.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > What are your thoughts on this?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > Alex
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > [1]:
> > > > > > > > > > > > >
> > > > > > > https://lists.apache.org/thread/c498svln0x18vvm42998b9nm9j6ck5yh
> > > > > > > > > > > > > > > > [2]:
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > >
> > > https://github.com/apache/polaris/blob/e94fdff63852dc41635c9e7eb62b3627ba562b85/runtime/service/src/main/java/org/apache/polaris/service/catalog/iceberg/IcebergCatalog.java#L379
> > > > > > > > > > > > > > > > [3]:
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > >
> > > https://github.com/apache/polaris/blob/e94fdff63852dc41635c9e7eb62b3627ba562b85/runtime/service/src/main/java/org/apache/polaris/service/catalog/iceberg/IcebergCatalog.java#L571
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > >
> > >
