Hi again,

I have been reviewing the documentation for various storage providers
[1] [2] [3] to identify which characters they restrict or advise
against using.

The slash remains the most prominent issue because obviously, it's
accepted by all storage providers, but it has a special meaning for
Polaris-created locations.

That said, other characters may cause trouble as well. I wonder if we
shouldn't add them to the list of forbidden chars:

- Control characters
- Backslash `\`
- Path segments equal to `.` or `..`
- Commonly discouraged symbols: * ? " < > | #

Given that most storage providers already reject or discourage these,
formalizing their exclusion seems like a safe step. Prohibiting these
characters explicitly prevents issues with invalid locations that
could hinder client access, while simultaneously addressing potential
security vulnerabilities.

What do you all think?

Thanks,
Alex

[1]: https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-keys.html
[2]: https://docs.cloud.google.com/storage/docs/objects#naming
[3]: https://learn.microsoft.com/en-us/rest/api/storageservices/naming-and-referencing-containers--blobs--and-metadata

On Sun, Apr 26, 2026 at 5:15 PM Alexandre Dutra <[email protected]> wrote:
>
> Hi Yufei,
>
> > the name is persisted verbatim in Polaris's catalog entity and baked as a
> > directory boundary in the S3 location (s3://bucket123/db1/my/table1/…)
>
> While your research suggests this is a positive outcome, in fact this
> is *exactly* why I am concerned about using slashes. It introduces a
> prefix hierarchy - /db1/my/table1/ in your example - that doesn't
> exist conceptually.
>
> I'm also finding the conclusion of your research a bit unclear.
> Although it mentions the slash is "worth considering," it then
> provides three arguments against it before ultimately suggesting it's
> "not worth fighting." And among the 3 action items your research
> recommends, the first two are already implemented in the PR.
>
> About the feature flag idea: in my opinion, a feature flag is only
> viable if we also strengthen the URL construction logic; otherwise, I
> believe slashes should be prohibited unconditionally.
>
> Thanks,
> Alex
>
>
> On Fri, Apr 24, 2026 at 8:43 PM Yufei Gu <[email protected]> wrote:
> >
> > Thanks for the PR, Alex! I researched whether we should block slashes in
> > table names.
> >
> > Here is what I tested. Created *db1.my/table1* in a Polaris quickstart
> > catalog (RustFS-backed, in-memory metastore) and exercised it against
> > three client surfaces. All three surfaces work well:
> >
> > 1. Iceberg REST API via curl. Create, list, and load all worked. The
> >    slash must be percent-encoded as %2F in the path (e.g.
> >    .../tables/my%2Ftable1); the name is persisted verbatim in
> >    Polaris's catalog entity and baked as a directory boundary in the S3
> >    location (s3://bucket123/db1/my/table1/…).
> > 2. PyIceberg (RestCatalog). list_namespaces, list_tables, load_table, and
> >    scan().to_arrow() round-tripped the slash correctly end-to-end,
> >    including fetching metadata JSON from storage with vended credentials.
> > 3. Spark SQL. The name is addressable via single-part backticks:
> >    polaris.db1.`my/table1`. Other engines need their own quoting (Trino:
> >    double quotes, etc.).
> >
> > Why the slash is still worth considering:
> >
> > - URI-level fragility. %2F encodes a reserved character; intermediaries
> >   routinely reject it (Apache default `AllowEncodedSlashes Off` results
> >   in a 404, ALB results in a 400) or silently normalize it to / (some
> >   nginx configs, API Gateway REST, CloudFront), which would dispatch the
> >   request to a different namespace/table entirely. These failures surface
> >   only once a proxy/WAF/CDN is in the call path.
> > - Storage-layout collision. Polaris builds default locations as
> >   <warehouse>/<namespace>/<name>. A table named my/table1 shares a prefix
> >   with a hypothetical future namespace db1.my, which could let vended
> >   credentials for one leak into the blast radius of another.
> > - Engine quoting drift and bad UX. Every downstream engine has its own
> >   identifier-quoting rules. Slashes survive in Spark with backticks and in
> >   Trino with double quotes, but tools, dashboards, and DDL generators
> >   frequently drop or mangle them. Users have to think about which quote
> >   to use.
> >
> > *My recommendation: Not worth fighting.* The features work today in
> > isolated testing, but keeping them working requires every future hop,
> > like proxy, WAF, CDN, ingress, engine, and SDK, to handle URLs exactly
> > right, forever. The upside is purely cosmetic (the slash in the name). I
> > suggest putting the restriction behind a feature flag, defaulted to reject.
> > Here are action items:
> >
> > - Validate table and namespace names server-side at create time, which
> >   the PR does already.
> > - Reject with a clear 400 and an error message pointing to the flag.
> > - Flag can be flipped on per realm for teams that genuinely need exotic
> >   names, with a documented warning about proxy-chain testing.
> >
> > This gets us the robustness benefits immediately, keeps the door open for
> > backward compatibility and niche use cases, and avoids a long tail of "it
> > works on my laptop, fails in prod" tickets. WDYT?
> >
> > Yufei
> >
> >
> > On Thu, Apr 23, 2026 at 6:07 AM Alexandre Dutra <[email protected]> wrote:
> >
> > > Hi Yufei,
> > >
> > > Yes, I think we can view storage location sanitizing as a parallel effort.
> > >
> > > With that, here is a simple PR that aims at forbidding slashes and a
> > > few other pathological cases for Iceberg and Generic Tables entities
> > > at creation time:
> > >
> > > https://github.com/apache/polaris/pull/4282
> > >
> > > Thanks,
> > > Alex
> > >
> > > On Thu, Apr 23, 2026 at 1:14 AM Yufei Gu <[email protected]> wrote:
> > >
> > > > Hi Alex, it's a good point that the storage location build is also
> > > > affected, but it feels less controversial and somewhat separate from
> > > > the main question here.
> > > >
> > > > The immediate discussion, at least from my perspective, is about entity
> > > > naming guardrails and externally visible behavior, for example
> > > > preventing names that are ambiguous or likely to break REST access and
> > > > cross client behavior.
> > > >
> > > > Storage location construction is important too, but that feels more
> > > > like an internal implementation hardening task than a spec or
> > > > user-facing semantics question. I would view it as a parallel track
> > > > rather than something that should block agreement on the narrower
> > > > entity name issue. I'm also fine if someone wants to tackle the
> > > > location building issue first. That could provide useful context for
> > > > resolving the user-facing naming questions.
> > > >
> > > > Yufei
> > > >
> > > >
> > > > On Wed, Apr 22, 2026 at 8:28 AM Alexandre Dutra <[email protected]> wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > Disallowing the most problematic cases seems the right way to go. I
> > > > > can provide a PR to quickly implement that.
> > > > >
> > > > > However, we must keep in mind that disallowing a few chars will not
> > > > > solve all our problems. IMHO we need to consistently replace all
> > > > > string concatenations that we use today for creating storage locations
> > > > > with a proper location builder that will take care of proper path
> > > > > escaping and sanitization. That part of the job is way more complex,
> > > > > due to the blast radius.
> > > > >
> > > > > Thanks,
> > > > > Alex
> > > > >
> > > > >
> > > > > On Wed, Apr 22, 2026 at 2:07 AM Yufei Gu <[email protected]> wrote:
> > > > >
> > > > > > Sorry for jumping into this thread a bit late.
> > > > > >
> > > > > > I’m supportive of introducing some guardrails for namespace and
> > > > > > table or view names. Specifically, I think we should disallow a few
> > > > > > problematic cases to avoid ambiguity and downstream issues:
> > > > > >
> > > > > > - Disallow the slash character “/”
> > > > > > - Disallow empty strings
> > > > > > - Disallow leading or trailing whitespace
> > > > > >
> > > > > > These constraints seem reasonable given the interactions across
> > > > > > REST, storage paths, and different client behaviors. Adding clear
> > > > > > guardrails early can prevent subtle bugs and inconsistencies later
> > > > > > on. Curious to hear if others see any concerns or edge cases with
> > > > > > this approach.
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Yufei
> > > > > >
> > > > > >
> > > > > > On Thu, Apr 16, 2026 at 9:11 AM Alexandre Dutra <[email protected]> wrote:
> > > > > >
> > > > > > > > Do you think it's worth having a separate discussion about
> > > > > > > > guardrails for namespace elements and table/view names? [...]
> > > > > > >
> > > > > > > Completely agree here. I think the slash character in particular
> > > > > > > should definitely be banned.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Alex
> > > > > > >
> > > > > > > On Thu, Apr 16, 2026 at 6:03 PM Dmitri Bourlatchkov <[email protected]> wrote:
> > > > > > >
> > > > > > > > > Do you think it's worth having a separate discussion about
> > > > > > > > > guardrails for namespace elements and table/view names? [...]
> > > > > > > >
> > > > > > > > Definitely!
> > > > > > > >
> > > > > > > > Cheers,
> > > > > > > > Dmitri.
> > > > > > > >
> > > > > > > > On Thu, Apr 16, 2026 at 6:57 AM Robert Stupp <[email protected]> wrote:
> > > > > > > >
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > > > spark-sql ()> create namespace `n/s`;
> > > > > > > > > > However, the S3 location in this case gets a proper directory
> > > > > > > > > > breakdown:
> > > > > > > > > > ... and table metadata has: "location":"s3://pol/n/s/t1"
> > > > > > > > > > ... but that is probably a different issue.
> > > > > > > > >
> > > > > > > > > Yea, it's different from the URL en/decoding topic. Do you think
> > > > > > > > > it's worth having a separate discussion about guardrails for
> > > > > > > > > namespace elements and table/view names? For example, disallowing
> > > > > > > > > '/', disallowing empty/blank namespace elements and table/view
> > > > > > > > > names, disallowing leading/trailing whitespace? Sure, some of
> > > > > > > > > these checks already happen, but not at every level/layer
> > > > > > > > > (defense-in-depth).
> > > > > > > > >
> > > > > > > > > > when Iceberg itself will introduce configurable separators, we
> > > > > > > > > > MAY ask ourselves if Polaris should allow them to be
> > > > > > > > > > configurable or not. [...] separator is just a REST layer thing
> > > > > > > > >
> > > > > > > > > True, the separator is primarily a REST-layer namespace
> > > > > > > > > en/decoding thing. What worries me slightly is that (existing)
> > > > > > > > > namespace elements with the configured separator character could
> > > > > > > > > become inaccessible. However, "configurable separator" is IMO a
> > > > > > > > > different discussion.
> > > > > > > > > > > > > > > > > > Best, > > > > > > > > > Robert > > > > > > > > > > > > > > > > > > > > > > > > > > > On Wed, Apr 15, 2026 at 8:20 PM Dmitri Bourlatchkov < > > > > > [email protected]> > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > Hi All, > > > > > > > > > > > > > > > > > > > > My understanding of the need to make namespace separators > > > > > > > configurable is > > > > > > > > > > that there exist a rather narrow set of deployment cases > > > where > > > > > the > > > > > > > ASCII > > > > > > > > > > "0x1F" (unit separator) character is not permitted in URL > > > paths > > > > > by > > > > > > > some > > > > > > > > > > infrastructure components. > > > > > > > > > > > > > > > > > > > > It might be worth allowing users to define a different > > > > > separator, but > > > > > > > > > since > > > > > > > > > > no one has brought this up yet, I assume it is not a > > > priority. > > > > > > > > > > > > > > > > > > > > In any case, using a different separator is completely a > > > REST API > > > > > > > > > > concern and should not affect how Polaris stores data > > > internally. > > > > > > > > > > > > > > > > > > > > Cheers, > > > > > > > > > > Dmitri. > > > > > > > > > > > > > > > > > > > > On Wed, Apr 15, 2026 at 2:03 PM Alexandre Dutra < > > > > > [email protected]> > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > Hi all, > > > > > > > > > > > > > > > > > > > > > > > I wonder how namespace elements and table/view names > > > with a > > > > > slash > > > > > > > > > ('/') > > > > > > > > > > > character in the middle behave. Or other characters like > > > '&' or > > > > > > > '?' or > > > > > > > > > > '#'. 
> > > > > > > > > > >
> > > > > > > > > > > For the REST layer, these will be percent-encoded, and with my PR
> > > > > > > > > > > to fix a double-decoding issue, these characters "survive" the
> > > > > > > > > > > REST layer just fine.
> > > > > > > > > > >
> > > > > > > > > > > The issue now is in some layers beneath: as I pointed out and as
> > > > > > > > > > > Dmitri demonstrated, we are unfortunately concatenating identifiers
> > > > > > > > > > > together to create storage locations, without proper escaping. This
> > > > > > > > > > > currently results in corrupted storage locations.
> > > > > > > > > > >
> > > > > > > > > > > I'm trying to fix the REST layer first, then I'll move to the
> > > > > > > > > > > storage layer.
> > > > > > > > > > >
> > > > > > > > > > > > What's your take on leveraging
> > > > > > > > > > > > jakarta.ws.rs.ext.ParamConverterProvider /
> > > > > > > > > > > > jakarta.ws.rs.ext.ParamConverter for the path parameters and
> > > > > > > > > > > > have centralized helpers that deal with "proper" URL
> > > > > > > > > > > > encoding/decoding?
> > > > > > > > > > >
> > > > > > > > > > > For now I don't see a valid usage in Polaris for that, since Jersey
> > > > > > > > > > > handles decoding path parameters already.
> > > > > > > > > > >
> > > > > > > > > > > > I also agree that the "configurable namespace separator" must
> > > > > > > > > > > > never change. Is my assumption correct, that it must always be
> > > > > > > > > > > > the same character as it is today?
> > > > > > > > > > > > > > > > > > > > > > In Polaris, we are using the namespace separator in two > > > > > different > > > > > > > use > > > > > > > > > > > cases: > > > > > > > > > > > > > > > > > > > > > > 1) For path parameters in the REST layer > > > > > > > > > > > 2) For storing namespaces in Polaris entities > > > > > > > > > > > > > > > > > > > > > > What is clear is that in the second use case, the > > > > > > > > > > > namespace > > > > > must > > > > > > > NEVER > > > > > > > > > > > change. I just opened a PR for that: > > > > > > > > > > > https://github.com/apache/polaris/pull/4214 > > > > > > > > > > > > > > > > > > > > > > Regarding the first use case, once we solve all our > > > > > > > encoding/decoding > > > > > > > > > > > issues, and when Iceberg itself will introduce > > > > > > > > > > > configurable > > > > > > > > > > > separators, we MAY ask ourselves if Polaris should allow > > > them > > > > > to be > > > > > > > > > > > configurable or not. I don't have strong opinions, but if > > > the > > > > > > > > > > > separator is just a REST layer thing, it should be > > > possible to > > > > > > > change > > > > > > > > > > > it without breaking the storage layer or the metastore. 
> > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > Alex > > > > > > > > > > > > > > > > > > > > > > On Wed, Apr 15, 2026 at 7:47 PM Dmitri Bourlatchkov < > > > > > > > [email protected]> > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > Hi All, > > > > > > > > > > > > > > > > > > > > > > > > Slashes in namespace seem to work fine (Spark 3.5 + > > > Iceberg > > > > > > > 1.10.0): > > > > > > > > > > > > > > > > > > > > > > > > spark-sql ()> create namespace `n/s`; > > > > > > > > > > > > Time taken: 0.335 seconds > > > > > > > > > > > > spark-sql ()> show namespaces; > > > > > > > > > > > > `n/s` > > > > > > > > > > > > Time taken: 0.232 seconds, Fetched 1 row(s) > > > > > > > > > > > > spark-sql ()> use `n/s`; > > > > > > > > > > > > Time taken: 0.028 seconds > > > > > > > > > > > > spark-sql (`n/s`)> create table t1 (n string); > > > > > > > > > > > > Time taken: 0.702 seconds > > > > > > > > > > > > > > > > > > > > > > > > The URLs appear to be encoded properly, e.g. (from > > > Polaris > > > > > log): > > > > > > > > > > > > > > > > > > > > > > > > 2026-04-15 13:41:17,594 INFO [io.qua.htt.access-log] > > > > > > > > > > > > > > > > > > > [dee1505c-ec1d-4f90-a9de-154eac66a40c_0000000000000000013,POLARIS] > > > > > > > > > > [,,,] > > > > > > > > > > > > (executor-thread-1) 127.0.0.1 - root > > > [15/Apr/2026:13:41:17 > > > > > -0400] > > > > > > > > > "GET > > > > > > > > > > > > > > > /api/catalog/v1/polaris/namespaces/n%2Fs/tables?pageToken= > > > > > > > HTTP/1.1" > > > > > > > > > > 200 > > > > > > > > > > > 74 > > > > > > > > > > > > > > > > > > > > > > > > I did not test trickier chars, but adding CI coverage > > > > > > > > > > > > for > > > > > them > > > > > > > would > > > > > > > > > be > > > > > > > > > > > > good. 
> > > > > > > > > > > > > > > > > > > > > > > > However, the S3 location in this case gets a proper > > > directory > > > > > > > > > > breakdown: > > > > > > > > > > > > > > > > > > > > > > > > $ mc ls rustfs/pol/n/s > > > > > > > > > > > > [2026-04-15 13:44:37 EDT] 0B t1/ > > > > > > > > > > > > > > > > > > > > > > > > ... and table metadata has: "location":"s3://pol/n/s/t1" > > > > > > > > > > > > > > > > > > > > > > > > ... but that is probably a different issue. > > > > > > > > > > > > > > > > > > > > > > > > Cheers, > > > > > > > > > > > > Dmitri. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Wed, Apr 15, 2026 at 10:35 AM Robert Stupp < > > > > > [email protected]> > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > Thanks Alex for the thorough investigation! > > > > > > > > > > > > > > > > > > > > > > > > > > URL en/decoding is really not that easy. > > > > > > > > > > > > > I wonder how namespace elements and table/view names > > > with a > > > > > > > slash > > > > > > > > > > ('/') > > > > > > > > > > > > > character in the middle behave. Or other characters > > > like > > > > > '&' > > > > > > > or '?' > > > > > > > > > > or > > > > > > > > > > > '#'. > > > > > > > > > > > > > > > > > > > > > > > > > > Overall, I agree with your idea to implement correct > > > URL > > > > > > > > > > > encoding/decoding > > > > > > > > > > > > > in the Polaris code base to protect Polaris from > > > upstream > > > > > > > behavior > > > > > > > > > > > changes > > > > > > > > > > > > > that can seriously break or even corrupt things. 
> > > > > > > > > > > > > > > > > > > > > > > > > > What's your take on leveraging > > > > > > > > > > jakarta.ws.rs.ext.ParamConverterProvider > > > > > > > > > > > > > / jakarta.ws.rs.ext.ParamConverter for the path > > > parameters > > > > > and > > > > > > > have > > > > > > > > > > > > > centralized helpers that deal with "proper" URL > > > > > > > encoding/decoding? > > > > > > > > > > > > > > > > > > > > > > > > > > I also agree that the "configurable namespace > > > separator" > > > > > must > > > > > > > never > > > > > > > > > > > change. > > > > > > > > > > > > > Is my assumption correct, that it must always be the > > > same > > > > > > > character > > > > > > > > > > as > > > > > > > > > > > it > > > > > > > > > > > > > is today? > > > > > > > > > > > > > > > > > > > > > > > > > > Best, > > > > > > > > > > > > > Robert > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Wed, Apr 15, 2026 at 3:48 PM Alexandre Dutra < > > > > > > > [email protected] > > > > > > > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > Hi all, > > > > > > > > > > > > > > > > > > > > > > > > > > > > FYI I created a first PR to address the > > > double-decoding > > > > > > > issue: > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://github.com/apache/polaris/pull/4210 > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > Alex > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Tue, Apr 14, 2026 at 9:56 PM Alexandre Dutra < > > > > > > > > > [email protected] > > > > > > > > > > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Hi all, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I would also point out that Polaris uses > > > > > > > > > RESTUtil.encodeNamespace > > > > > > > > > > > and > > > > > > > > > > > > > > > RESTUtil.decodeNamespace for encoding and decoding > > > the > > > > > > > parent > 
> > > > > > > > > > > > > > namespace within a NamespaceEntity [1]. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > These methods also exhibit the faulty space > > > encoding > > > > > > > behavior. > > > > > > > > > > > > > > > Therefore, we must exercise **extreme caution** > > > > > regarding > > > > > > > any > > > > > > > > > > > upcoming > > > > > > > > > > > > > > > Iceberg project fixes for space-encoding issues. > > > > > > > > > > > > > > > If > > > > > these > > > > > > > > > methods > > > > > > > > > > > are > > > > > > > > > > > > > > > modified, it is imperative that we retain the > > > legacy > > > > > > > versions > > > > > > > > > > > > > > > specifically for encoding and decoding > > > NamespaceEntity > > > > > > > > > > properties – > > > > > > > > > > > > > > > otherwise we could end up with a corrupted > > > database. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > The same goes for the future namespace separator > > > coming > > > > > > > with > > > > > > > > > > > Iceberg > > > > > > > > > > > > > > > 1.11: for the sake of encoding and decoding > > > > > NamespaceEntity > > > > > > > > > > > > > > > properties, the separator must never change. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I would actually be in favor of proactively > > > > > internalizing > > > > > > > the > > > > > > > > > > > > > > > encoding/decoding algorithm used in > > > NamespaceEntity. > > > > > What > > > > > > > do > > > > > > > > > you > > > > > > > > > > > > > > > think? 
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > Alex > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > [1]: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://github.com/apache/polaris/blob/8ad8f74f62258ab6238190271603e4d4c8a75998/polaris-core/src/main/java/org/apache/polaris/core/entity/NamespaceEntity.java#L92 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Tue, Apr 14, 2026 at 7:43 PM Alexandre Dutra < > > > > > > > > > > [email protected] > > > > > > > > > > > > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Hi all, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > A discussion on the Iceberg ML [1] recently > > > > > highlighted > > > > > > > that > > > > > > > > > > URL > > > > > > > > > > > path > > > > > > > > > > > > > > > > segments are not being decoded correctly > > > according > > > > > to RFC > > > > > > > > > 3986, > > > > > > > > > > > > > > > > specifically regarding space encoding. 
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I investigated the situation in Polaris, and found many problems:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > TLDR
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > - Table names with the + sign can be created but cannot be retrieved
> > > > > > > > > > > > > > > > - Namespace names with the + sign are OK (can be created and retrieved)
> > > > > > > > > > > > > > > > - Table names with spaces cannot be created
> > > > > > > > > > > > > > > > - Namespace names with spaces cannot be created
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > DISCUSSION
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Table names such as "foo+bar" can be created (via POST, where the name
> > > > > > > > > > > > > > > > is in the request body). But they cannot be retrieved: when reading
> > > > > > > > > > > > > > > > tables, the name is part of the URL path. Polaris incorrectly performs
> > > > > > > > > > > > > > > > a second decoding step using RESTUtil.decodeString(table), even though
> > > > > > > > > > > > > > > > the REST framework has already decoded it. Consequently, a client
> > > > > > > > > > > > > > > > sends "foo%2Bbar" which is first decoded to "foo+bar" by the framework
> > > > > > > > > > > > > > > > (correct) and then re-decoded by Polaris to "foo bar" (incorrect),
> > > > > > > > > > > > > > > > resulting in a "not found" error.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Table and namespace names like "foo bar" simply cannot be created at
> > > > > > > > > > > > > > > > all. This is because in IcebergCatalog.defaultWarehouseLocation() and
> > > > > > > > > > > > > > > > other similar places, we create locations merely by joining
> > > > > > > > > > > > > > > > identifiers together, without any form of URL encoding: see [2] [3].
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > And even if tables like "foo bar" could be created, they couldn't be
> > > > > > > > > > > > > > > > retrieved by Java clients. This occurs because current Java clients
> > > > > > > > > > > > > > > > incorrectly encode that name as "foo+bar", which the REST framework
> > > > > > > > > > > > > > > > does not modify. Consequently, Polaris would look for a table named
> > > > > > > > > > > > > > > > "foo+bar" instead and throw a "not found" error. (Other clients would
> > > > > > > > > > > > > > > > send "foo%20bar" which would be correctly decoded by the framework as
> > > > > > > > > > > > > > > > "foo bar", and thus it would succeed.)
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > PROPOSAL
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > To resolve the issue with the + sign in table names, we simply need to
> > > > > > > > > > > > > > > > eliminate the redundant decoding step. I can open a PR for that
> > > > > > > > > > > > > > > > shortly.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > To resolve the issue with spaces in table and namespace names, we
> > > > > > > > > > > > > > > > could fix all the methods that incorrectly join together identifiers
> > > > > > > > > > > > > > > > without proper URL encoding.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Finally, addressing the Java clients encoding problem is complex, but
> > > > > > > > > > > > > > > > we could consider implementing a workaround as follows:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > 1) If the client is Java and lacks the upcoming Iceberg fix for space
> > > > > > > > > > > > > > > > encoding, manually replace "+" with a space to correct the client's
> > > > > > > > > > > > > > > > faulty encoding.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > 2) For non-Java clients or those with the fix, no workaround would be
> > > > > > > > > > > > > > > > required.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > What are your thoughts on this?
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > > Alex > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > [1]: > > > > > > > > > > > > > > > > > > > > https://lists.apache.org/thread/c498svln0x18vvm42998b9nm9j6ck5yh > > > > > > > > > > > > > > > > [2]: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://github.com/apache/polaris/blob/e94fdff63852dc41635c9e7eb62b3627ba562b85/runtime/service/src/main/java/org/apache/polaris/service/catalog/iceberg/IcebergCatalog.java#L379 > > > > > > > > > > > > > > > > [3]: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://github.com/apache/polaris/blob/e94fdff63852dc41635c9e7eb62b3627ba562b85/runtime/service/src/main/java/org/apache/polaris/service/catalog/iceberg/IcebergCatalog.java#L571 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
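
To make the double-decoding failure from the original message at the bottom of this thread concrete, here is a minimal Java sketch. It is not Polaris or Iceberg code: `pathDecode` is a hypothetical stand-in for the REST framework's path-segment decoding, and `formDecode` is a hypothetical stand-in for the redundant second pass, assumed here to behave like `java.net.URLDecoder` (which also turns "+" into a space). All class and method names are made up for illustration.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; none of these names exist in Polaris or Iceberg.
class DoubleDecodeDemo {

    // Stand-in for the framework: percent-decodes a path segment and leaves '+' alone.
    static String pathDecode(String segment) {
        // Protect literal '+' first, because URLDecoder would turn it into a space.
        return URLDecoder.decode(segment.replace("+", "%2B"), StandardCharsets.UTF_8);
    }

    // Stand-in for the redundant second pass: form-style decoding, where '+' means space.
    static String formDecode(String value) {
        return URLDecoder.decode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String once = pathDecode("foo%2Bbar");  // the name the client meant
        String twice = formDecode(once);        // the name a second decode would look up
        System.out.println(once + " -> " + twice);

        // A client that encodes "foo bar" as "foo+bar" is taken literally by the path decoder.
        System.out.println(pathDecode("foo+bar"));
        // A client that sends "foo%20bar" gets the intended name back.
        System.out.println(pathDecode("foo%20bar"));
    }
}
```

Under these assumptions, decoding once keeps the "+" while decoding twice silently changes the name, which matches the "not found" symptom described for "foo+bar".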

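
For the name guardrails proposed in the newest message of this thread (control characters, backslash, `.` and `..`, the discouraged symbols, plus the slash, empty names, and leading or trailing whitespace mentioned earlier), a create-time check could look roughly like the sketch below. This is only an illustration under my own assumptions, not the code in the PR: `EntityNameGuard`, `violation`, and the error messages are hypothetical names, and the exact character set is still up for discussion.

```java
import java.util.Optional;

// Hypothetical helper; not the validation implemented in the PR discussed in this thread.
final class EntityNameGuard {

    // Slash, backslash, and the commonly discouraged symbols: * ? " < > | #
    private static final String FORBIDDEN = "/\\*?\"<>|#";

    private EntityNameGuard() {}

    // Returns a description of the first problem found, or empty if the name passes.
    static Optional<String> violation(String name) {
        if (name == null || name.isEmpty()) {
            return Optional.of("name is empty");
        }
        if (!name.equals(name.strip())) {
            return Optional.of("name has leading or trailing whitespace");
        }
        if (name.equals(".") || name.equals("..")) {
            return Optional.of("name is a relative path segment");
        }
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            if (Character.isISOControl(c)) {
                return Optional.of("control character at index " + i);
            }
            if (FORBIDDEN.indexOf(c) >= 0) {
                return Optional.of("forbidden character '" + c + "' at index " + i);
            }
        }
        return Optional.empty();
    }
}
```

A caller would reject the create request with a 400 whenever `violation(name)` is present. Note that this sketch checks a single name or namespace element at a time and deliberately leaves "+" and inner spaces alone, since those are encoding problems rather than naming problems.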