Hi All,

One big emerging enterprise use case, coming up as more people consolidate Data Lakehouses and Catalogs, is something commonly known as "Data Sharing", and more specifically, with the adoption of open table formats, "Open Data Sharing".

Examples of existing managed service providers' Data Sharing features:

https://www.databricks.com/product/delta-sharing
https://docs.snowflake.com/en/user-guide/data-sharing-intro
https://docs.aws.amazon.com/redshift/latest/dg/datashare-overview.html
https://docs.cloud.google.com/bigquery/docs/analytics-hub-introduction
https://learn.microsoft.com/en-us/fabric/governance/external-data-sharing-overview

The basic idea is that when you share data between different companies, you need a first-class governance/management layer and extra bells and whistles that are distinct from the basic capabilities of RBAC or generalized access control (i.e., if you're sharing across partially-untrusted org boundaries, you don't just let the consumer organization log into your data lake like one of your own employees).

JB and I put together this high-level proposal for supporting Open Sharing in Polaris:

https://docs.google.com/document/d/1Y0yQi5iWbmuTHPkFiIs7WjIiC3EXJTl1PzZ-wtoRnZ0/edit?usp=sharing

Tentatively, it means adding ~5 logical data model constructs, some of which may be first-class PolarisEntity types, others subtypes of existing entities, and others just nested constructs:

- ShareEntity (would behave similarly to a Catalog)
- ExternalConsumer (mostly inherits from Principal)
- Listing (similar to a "role grant" but with different metadata)
- EndpointConfig (nested config under Listing)
- ShareMembership (similar to a "securable grant" but with different metadata)

Feedback/comments welcome! I'll also bring it up for live discussion if there's time in the community sync.

Cheers,
Dennis
