+1 for an engine-agnostic catalog with unified metadata and discovery, multi-tenancy, granular ACLs, and catalog federation.
It's important to consider how the engine will consume the metadata before we start. As the catalog is engine-agnostic, I would like to add a plugin service between the Metastore client and the engine; this plugin would translate the Hive metadata into whatever form the engine wants.

On 2026/03/23 19:45:44 Sai Hemanth Gantasala wrote:
> +1 to Denys's and Butao's suggestions.
>
> Lisoda,
> 1) I agree that relying on external permission systems for basic table
> visibility can be complex and error-prone. However, introducing capability
> filtering, even based on format type, still moves HMS away from its core
> role as an engine-agnostic metadata service. We need a solution that
> addresses the operational complexity without compromising HMS neutrality.
> 2) I see your point on operational complexity, but the need for external
> permissions goes beyond format support, it is essential for multi-tenancy
> and granular security. We must be able to hide a sensitive Iceberg table
> from a user, even if their engine is capable of reading Iceberg. Separating
> the security policy (ACLs) from the metadata definition (HMS) remains the
> correct architectural approach IMO.
>
> Thanks,
> Sai
>
> On Mon, Mar 23, 2026 at 3:18 AM Butao Zhang <[email protected]> wrote:
>
> > I mostly agree with Denys's viewpoint. That is, when querying Iceberg and
> > Hudi tables in HMS, engines need to implement and configure their own
> > connectors. These connectors are specific to each engine and have nothing
> > to do with HMS itself. HMS serves as a neutral, unified metadata management
> > service, responsible only for managing the lifecycle of catalogs (such as
> > creation and deletion) and providing unified metadata authorization
> > services.
> >
> >
> > Add some extra information to respond to lisoda:
> >
> > 1) Q1: HMS may store various types of tables (e.g., Iceberg, Hudi), and
> > some engines may not be able to query certain types of tables stored in HMS.
> > First, this issue seems unrelated to the multi-catalog or federated
> > catalog approach I proposed. This is essentially a problem where multiple
> > table formats (Iceberg, Hudi, etc.) are mixed within a single HMS catalog.
> > When a compute engine is configured with this HMS catalog, it may be able
> > to see all tables via `SHOW TABLES`, but it may only be able to query a
> > subset of them. This issue should be handled at the compute engine level.
> > For example, the engine can determine whether a table should be visible or
> > whether it can be queried based on table attributes like `table_type`.
> > For instance, StarRocks provides a catalog/connector called the Unified
> > Catalog (
> > https://docs.starrocks.io/docs/data_source/catalog/unified_catalog/),
> > which can query multiple table formats (such as Iceberg and Hudi) stored in
> > the same HMS.
> >
> > If users only want to query a specific type of table stored in the same
> > HMS, such as Iceberg tables, they can create a dedicated catalog/connector,
> > like the Iceberg Catalog (
> > https://docs.starrocks.io/docs/data_source/catalog/iceberg/iceberg_catalog/).
> > This catalog/connector allows users to see only Iceberg tables when running
> > `SHOW TABLES`, and any other table formats will be invisible.
> >
> > Additionally, based on my tests, when using
> > `org.apache.iceberg.spark.SparkSessionCatalog`, Spark should be able to
> > query both Hive tables and Iceberg tables through the HMS catalog.
> >
> > 2) Q2: Regarding the issue of circular catalogs, I believe this does not
> > exist. When a compute engine is configured with an HMS catalog, that HMS
> > catalog can only see its own catalog namespace (databases and tables). The
> > engine cannot see information from other catalogs through this HMS catalog.
> >
> >
> > Thanks,
> > Butao Zhang
> > ---- Replied Message ----
> > From lisoda<[email protected]> <[email protected]>
> > Date 3/20/2026 22:53
> > To dev<[email protected]> <[email protected]>
> > Subject Re: [Discuss][HIVE-28879] Federated Catalog Support in Apache Hive
> > I understand your concern, but I may not have expressed myself clearly—I
> > don't intend to tightly couple the catalog with specific engine runtime
> > configurations either. What I'm suggesting is a lightweight convention
> > mechanism, not deep integration.
> > My idea is actually quite simple: engines could report just a few boolean
> > flags upon connection (e.g., supports_iceberg: true/false), or we could
> > push the filtering logic down to the engine side via an SDK. This is less
> > about "coupling" and more about a declarative contract.
> > From an engineering perspective, convention over configuration is
> > generally the better path:
> >
> > Convention (auto-reporting/filtering): The engine declares its
> > capabilities → HMS or the SDK automatically masks incompatible metadata.
> > This maintains a single source of truth—the physical properties of the
> > table (format, location) directly determine its visibility.
> >
> > Configuration (manual access control): Administrators manually maintain a
> > separate set of ACL rules outside of HMS to hide certain tables. This
> > essentially creates duplicate definition—the metadata layer already defines
> > "this is an Iceberg table," and then the permission layer has to define
> > "this engine shouldn't see this Iceberg table." As the number of tables or
> > engines scales, this manual synchronization overhead becomes unmanageable.
> > In other words, I'm not asking HMS to understand "what connectors Spark
> > 3.4 has installed." I'm simply suggesting that the physical properties of
> > the metadata (the format type) should automatically determine its
> > distribution scope.
> > If HMS remains completely agnostic and relies on
> > external permission systems to retroactively hide visibility, doesn't that
> > actually increase operational complexity?
> >
> >
> > ---- Replied Message ----
> > From Denys Kuzmenko<[email protected]> <[email protected]>
> > Date 03/20/2026 19:12
> > To [email protected]
> > Cc
> > Subject Re: [Discuss][HIVE-28879] Federated Catalog Support in Apache Hive
> > I don’t think tying catalog behavior to engine capabilities is a good
> > direction. A catalog should remain engine-agnostic and focus purely on
> > metadata management and discovery, not on the execution capabilities of
> > specific query engines.
> >
> > Hive Metastore is intentionally designed as a neutral metadata service. It
> > exposes table definitions, while each engine (e.g., Apache Spark, Trino,
> > etc.) decides whether it can actually process those tables based on its
> > configured connectors or format support. Introducing capability negotiation
> > would effectively couple the catalog to specific engines and their runtime
> > configuration, which breaks that separation of concerns and makes the
> > catalog responsible for execution-layer logic.
> >
> > If a particular engine does not support a given format or catalog (for
> > example, it does not have the appropriate client/connector installed), the
> > cleaner solution is access control, not metadata filtering. In practice,
> > permissions can simply be removed for users of that engine on catalogs or
> > tables they are not expected to query.
> >
> > Keeping the catalog engine-agnostic preserves interoperability and avoids
> > embedding engine-specific behavior into the metadata layer.
> >
>
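
P.S. To make the plugin idea at the top of this mail a bit more concrete, here is a minimal sketch of an engine-side plugin that sits between the Metastore client and the engine. It filters on the `table_type` attribute mentioned in the thread and translates each table into a shape the engine might want, so HMS itself never learns about engine capabilities. Everything named here (`HmsTable`, `EngineMetadataPlugin`, `visible_tables`, `translate`, the output dictionary, and the assumption that a missing `table_type` means a plain Hive table) is hypothetical and is not an existing Hive or HMS API.

```python
from dataclasses import dataclass, field


@dataclass
class HmsTable:
    """Hypothetical, simplified stand-in for a table record fetched from HMS."""
    db: str
    name: str
    parameters: dict = field(default_factory=dict)


def table_format(table):
    """Read the format from the `table_type` parameter.

    Assumption for this sketch: a table without `table_type` is a plain Hive table.
    """
    return table.parameters.get("table_type", "HIVE").upper()


class EngineMetadataPlugin:
    """Hypothetical engine-side plugin; HMS stays unaware of engine capabilities."""

    def __init__(self, supported_formats):
        # Formats this particular engine deployment can read, e.g. ["iceberg", "hive"].
        self.supported_formats = {f.upper() for f in supported_formats}

    def visible_tables(self, tables):
        """Keep only the tables whose format the engine declares it can read."""
        return [t for t in tables if table_format(t) in self.supported_formats]

    def translate(self, table):
        """Turn the HMS record into an (illustrative) engine-specific descriptor."""
        return {
            "identifier": f"{table.db}.{table.name}",
            "format": table_format(table),
        }


if __name__ == "__main__":
    tables = [
        HmsTable("sales", "orders", {"table_type": "ICEBERG"}),
        HmsTable("sales", "clicks", {"table_type": "HUDI"}),
        HmsTable("sales", "legacy"),
    ]
    plugin = EngineMetadataPlugin(["iceberg", "hive"])
    for t in plugin.visible_tables(tables):
        print(plugin.translate(t))
```

Because the filtering lives on the engine side, access control in HMS is still needed for the multi-tenancy case Sai describes; this sketch only covers format compatibility.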
