Hi all,

Following discussion on PR #3699
<https://github.com/apache/polaris/pull/3699> and the broader SPI-surface
thread <https://lists.apache.org/thread/0nj24zro7kyctqfnlml08ppo7zs9xcqs>,
I'd like to open a discussion on a set of related changes to the storage
credential-vending SPI surface in polaris-core. The latest state of the PR
reflects the changes as well.

The PR changes bunch of stuff around credential vending flow, but I wanted
to specifically highlight here the changes that touch on the SPI surface.
Not sure about the correct procedure, but I can initiate a vote as well if
necessary.

Background
----------

Today, credential vending is split across two SPIs that mirror each other:

   - PolarisCredentialVendor (an interface mixed into
   PolarisMetaStoreManager)
   - PolarisStorageIntegrationProvider, which produces
   PolarisStorageIntegration instances that themselves expose a
   getSubscopedCreds method.


In addition, IntegrationPersistence carries storage-integration lifecycle
methods (createStorageIntegration, persistStorageIntegrationIfNeeded,
loadPolarisStorageIntegration) intended for "lease/commit/use"
customizations backing stateful StorageIntegration objects.

The OSS code has converged such that PolarisCredentialVendor and
PolarisStorageIntegration have effectively the same shape, and the OSS path
no longer calls loadPolarisStorageIntegration. The proposal below collapses
the duplication and makes PolarisStorageIntegrationProvider the single
first-class SPI for storage credential vending, while keeping the lease/
commit semantics available to custom implementations through the
IntegrationPersistence methods that are still in use.

Proposal
--------

(1) Remove PolarisCredentialVendor as an SPI.
    - Drop it from the PolarisMetaStoreManager interface aggregate.
    - Move the OSS implementation behind PolarisStorageIntegrationProvider.

(2) Remove StorageCredentialsVendor.
    - Functionality is fully subsumed by PolarisStorageIntegration.

(3) Promote PolarisStorageIntegrationProvider to a first-class SPI with a
single method:

        PolarisStorageIntegration getStorageIntegration(
            List<PolarisEntity> resolvedEntityPath);

    - OSS impl: resolves storage config from the entity path and returns a
per-config cached PolarisStorageIntegration.
    - Custom impls: free to call
IntegrationPersistence.loadPolarisStorage-Integration (or any lease/commit
hook they own) directly.

(4) Remove IntegrationPersistence.loadPolarisStorageIntegration from the
    polaris-core SPI.
    - Rationale: it has no remaining callers in OSS. Custom
MetaStore-Manager implementations that rely on a "lease/commit/use" model
can keep an equivalent method on their concrete persistence class and call
it from their own PolarisStorageIntegrationProvider impl.
    - createStorageIntegration and persistStorageIntegrationIfNeeded remain
in IntegrationPersistence for the same lease/commit use case.

(5) Slim PolarisStorageIntegration down to a single method,
    getStorageAccessConfig(...). Specifically, remove:
    - validateAccessToLocations (unused; intended caller never materialized)
    - getStorageIdentifierOrId (unused dead code in OSS)

(6) Replace the boolean+two-set form
        boolean allowListOperation,
        Set<String> allowedReadLocations,
        Set<String> allowedWriteLocations
    with a per-location action-set form. After discussion on the PR, the
proposed signature is:

        StorageAccessConfig getStorageAccessConfig(
>             List<LocationGrant> grants,
>             Optional<String> refreshEndpoint,
>             CredentialVendingContext context);
>         record LocationGrant(
>             Set<String> locations,
>             Set<PolarisStorageActions> actions);


    Rationale:

   - PolarisStorageActions already carries DELETE, which today is
   folded into "write" and cannot be requested independently.
   - The per-location form keeps the door open for legitimate use
   cases that need different actions on different prefixes (e.g.
   legacy read-only locations on cloned tables, snapshot-expiry workers
   that need DELETE on data/ but not READ, metadata-only writers, etc.). PR
   comment
   <https://github.com/apache/polaris/pull/3699#issuecomment-4393945081> by
   Dennis explains more.
   - Today's two-set form is always called with identical read and
   write sets in OSS, so collapsing the duplication is safe in practice
   while the LocationGrant form preserves the expressive power.
   - caveat: seems like Azure token generation is not expressive enough to
   carry different sets of privileges for different locations, but that's an
   issue even for the current signature which might be invoked with different
   read/write locations.

please comment here or on the PR thread if there are concerns with these
changes.

Thanks,
Tornike

Reply via email to