FANNG1 opened a new issue, #11588:
URL: https://github.com/apache/gravitino/issues/11588

   ### Version
   
   main branch
   
   ### Describe what's wrong
   
   The Lance REST server's `ListTables` endpoint returns **fully-qualified** 
table names (`catalog{delimiter}schema{delimiter}table`) instead of the **leaf 
table names** required by the Lance Namespace spec.
   
   As a result, a Spark client using the Lance namespace connector shows 
polluted table names in `SHOW TABLES` — each table name embeds the catalog and 
schema, e.g. `catalog.schema.my_table` instead of `my_table`.
   
   ### Root cause
   
   
   `lance/lance-common/src/main/java/org/apache/gravitino/lance/common/ops/gravitino/GravitinoLanceNameSpaceOperations.java` (`listTables`):
   
   ```java
   List<String> tables =
       Arrays.stream(catalog.asTableCatalog().listTables(Namespace.of(schemaName)))
           // <-- returns the fully qualified name instead of ident.name()
           .map(ident -> Joiner.on(delimiter).join(catalogName, schemaName, ident.name()))
           .sorted()
           .collect(Collectors.toList());
   ```
   
   The parent namespace is already conveyed by the request `id` (`catalog`, 
`schema`), so the response must only contain the child table names.
   
   ### Why it surfaces in Spark
   
   The Lance Spark connector (`BaseLanceNamespaceSparkCatalog.listTables`) 
trusts the response strings as leaf names and wraps them directly:
   
   ```java
   for (String table : response.getTables()) {
     identifiers.add(Identifier.of(namespace, table)); // table = "catalog.schema.tbl"
   }
   ```
   
   Spark then renders `Identifier.name()`, i.e. the full string returned by the 
server, producing the `catalog.schema.table` display.
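
   The effect can be reproduced without Spark. The sketch below is only an illustration: `Ident` is a hypothetical stand-in for the Spark identifier type, `wrap` mimics the connector loop quoted above, and the `.` delimiter and sample names are assumptions taken from the example in this issue.

   ```java
   import java.util.ArrayList;
   import java.util.List;

   public class LeafNameDisplaySketch {
     /** Hypothetical stand-in for a Spark identifier: a namespace plus a name. */
     public static final class Ident {
       private final List<String> namespace;
       private final String name;

       public Ident(List<String> namespace, String name) {
         this.namespace = namespace;
         this.name = name;
       }

       public List<String> namespace() { return namespace; }
       public String name() { return name; }
     }

     /** Mimics the connector loop: each response string becomes the name verbatim. */
     public static List<Ident> wrap(List<String> namespace, List<String> responseTables) {
       List<Ident> identifiers = new ArrayList<>();
       for (String table : responseTables) {
         identifiers.add(new Ident(namespace, table));
       }
       return identifiers;
     }

     public static void main(String[] args) {
       List<String> namespace = List.of("catalog", "schema");
       // Current server behavior: the fully qualified string ends up as the name.
       System.out.println(wrap(namespace, List.of("catalog.schema.my_table")).get(0).name());
       // Spec-conformant behavior: only the leaf name is returned.
       System.out.println(wrap(namespace, List.of("my_table")).get(0).name());
     }
   }
   ```

   Because the client never strips a prefix, whatever the server puts in the response is what the user sees as the table name.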
   
   ### Spec & reference implementations
   
   - Spec `ListTables` (`docs/src/spec.yaml`): *"List all child table **names** 
of the parent namespace `id`."*
   - Reference impls return only the leaf name, e.g. Glue: `.forEach(t -> 
tables.add(t.name()))`.
   
   (Note: the shared `ListTablesResponse` schema description mentioning a "full 
identifier in string form" refers to the recursive `/v1/table` 
(list-all-tables) endpoint, not the per-namespace `ListTables`.)
   
   ### Secondary issues in the same method
   
   1. Results are sorted with `.sorted()` and then collected into a `HashSet` 
via `Sets.newHashSet(page.items())`, which discards the ordering. A 
`ListNamespacesResponse` object is also reused to carry table results, which is 
misleading.
   2. The method hard-asserts a 2-level namespace (`nsId.levels() == 2`); worth 
verifying this matches the connector's `parent` / `single_level_ns` 
configuration expectations.
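
   To illustrate item 1, here is a small standalone sketch using only the JDK (the class name `PageOrderSketch`, the helper `sortedPage`, and the table names are made up for the example). It shows why sorting before copying into a `HashSet` gives no ordering guarantee, while keeping the page as a `List` does.

   ```java
   import java.util.ArrayList;
   import java.util.Arrays;
   import java.util.Collections;
   import java.util.HashSet;
   import java.util.List;
   import java.util.Set;

   public class PageOrderSketch {
     /** Returns a sorted copy of the names; a List keeps this order when iterated. */
     public static List<String> sortedPage(List<String> names) {
       List<String> sorted = new ArrayList<>(names);
       Collections.sort(sorted);
       return sorted;
     }

     public static void main(String[] args) {
       List<String> sorted = sortedPage(Arrays.asList("orders", "customers", "lineitem"));
       // HashSet iteration order depends on hashing, so the sort is not guaranteed to survive.
       Set<String> asHashSet = new HashSet<>(sorted);
       System.out.println("list:    " + sorted);
       System.out.println("hashset: " + asHashSet);
     }
   }
   ```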
   
   ### How to reproduce
   
   1. Start the Gravitino Lance REST server over a lakehouse catalog with a 
schema containing one or more tables.
   2. Configure a Spark `lance` namespace catalog (`impl=rest`) pointing at the 
server.
   3. Run `SHOW TABLES`.
   4. Observe table names appear as `catalog.schema.table` instead of `table`.
   
   ### Expected behavior
   
   `ListTables` should return only the leaf table names, so `SHOW TABLES` shows 
`table`.
   
   ### Additional context
   
   Suggested fix: change the mapping to `.map(ident -> ident.name())`, and keep 
the paged result in an ordered list rather than a `HashSet` so the sort order is 
preserved.
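
   A minimal sketch of the suggested mapping, not a patch against the real class: `TableIdent` is a hypothetical stand-in for the identifier type returned by `listTables`, and `leafTableNames` is an illustrative helper name that does not exist in the codebase.

   ```java
   import java.util.Arrays;
   import java.util.List;
   import java.util.stream.Collectors;

   public class ListTablesFixSketch {
     /** Hypothetical stand-in for the table identifier; only name() matters here. */
     public static final class TableIdent {
       private final String name;

       public TableIdent(String name) {
         this.name = name;
       }

       public String name() { return name; }
     }

     /** Maps identifiers to leaf names only and returns them as a sorted List. */
     public static List<String> leafTableNames(TableIdent[] idents) {
       return Arrays.stream(idents)
           .map(TableIdent::name) // leaf name only; the parent namespace comes from the request id
           .sorted()
           .collect(Collectors.toList());
     }

     public static void main(String[] args) {
       TableIdent[] idents = {new TableIdent("orders"), new TableIdent("customers")};
       System.out.println(leafTableNames(idents));
     }
   }
   ```

   Returning a `List` end to end avoids the set conversion, so the order produced by `.sorted()` is what the client receives.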
   

