Re: [DISCUSS][REST] Granularity of referenced-by context in loadTable calls

Ryan Blue Fri, 06 Feb 2026 14:29:47 -0800

Yeah, I think it's a good idea to keep the entire chain. That's more
flexible and accounts for cases where we don't need to constrain the
catalog's decisions about its authorization model. Thanks for the
discussion here, I'm glad to double check this!


On Fri, Feb 6, 2026 at 2:27 PM Prashant Singh <[email protected]>
wrote:

> Thank you for the feedback, everyone! I also agree that we don't need the
> entire reference chain to be secure.
>
> That said, I can understand how having the whole reference chain in the
> catalog can help with AuthZ, since the security models and guarantees a
> catalog provides can be very complex. I believe this is what Christian and
> Ryan are suggesting, and I feel we should send complete reference chains
> from the client to the server to support these use cases.
>
> I checked offline with Russell too (he also hinted in the message above
> that he doesn't have strong feelings either way), and we are good!
>
> I believe we have consensus in the thread to keep the complete chain. It
> would be nice to move forward with the vote!
>
> Best,
> Prashant Singh
>
> On Wed, Feb 4, 2026 at 2:14 PM Russell Spitzer <[email protected]>
> wrote:
>
>> To me
>>
>> Otherwise, B would not have been provided to the engine. Are there cases
>> where an engine might load B but not intend to allow access to the
>> tables it references?
>>
>> This sounds like the definition of an invoker view. A user is able to
>> load the view definition, but the table load itself is on a per user basis
>> so we don't really have DEFINER behavior imho.
>>
>> I honestly don't have strong feelings either way here. If we want to move
>> forward with the full chain that's fine with me since I feel like Catalogs
>> will get to make these decisions on what their particular permission
>> structures allow. Personally, I wouldn't want to give someone permission to
>> modify a view that is run-as another user if they don't have the
>> permissions as that user to access the underlying tables ;)
>>
>> On Wed, Feb 4, 2026 at 3:49 PM Ryan Blue <[email protected]> wrote:
>>
>>> The DEFINER view referenced by a DEFINER view is a good case to think
>>> about, but I don’t think that it requires the entire reference chain in
>>> order to be secure.
>>>
>>> Using the object names from Russell’s response, when view B is loaded
>>> and referenced-by is A, the catalog must trust that the engine is
>>> setting referenced-by correctly. It trusts that the engine will not lie
>>> and say that B is referenced from A instead of another view, and it
>>> trusts that projections, filters, etc. from A will be applied to data
>>> from B.
>>>
>>> I think the question here is whether the first guarantee, that A was
>>> loaded and referenced B, is sufficient when deciding whether the query
>>> has access to B and the tables it references. The catalog *could*
>>> assume that because B is the referenced-by for C from a trusted engine,
>>> that the query must have access to B. Otherwise, B would not have been
>>> provided to the engine. Are there cases where an engine might load B
>>> but not intend to allow access to the tables it references?
>>>
>>> I think there’s a fair argument that those cases exist. When tables or
>>> views are loaded, there’s no intent included. The catalog doesn’t know
>>> whether a view was loaded for a SHOW HISTORY command or because it is
>>> being updated or being run. So a view could be loaded because a user has
>>> some other permission, like MODIFY, but not SELECT. Or maybe a
>>> permission to audit the view but not see data. If the catalog allows those
>>> cases, then being able to load B doesn’t necessarily mean the query has
>>> access to the data that B produces. In that case, you would need to
>>> check the permissions that A has on B to determine whether to load/vend
>>> credentials for C.
>>>
>>> In writing this email, I think I’ve been convinced that Christian is
>>> correct and that it is best to keep the reference chain. Russell and
>>> Prashant, what do you think?
>>>
>>> Ryan
>>>
>>> On Wed, Feb 4, 2026 at 1:12 PM Russell Spitzer <
>>> [email protected]> wrote:
>>>
>>>> I understand the logging concern but not the correctness one.
>>>>
>>>> Are you saying we have to re-check to make sure nothing has changed
>>>> since we started?
>>>>
>>>> I would assume in this auth chain we could get by with a referenced_by
>>>> in the view request as well?
>>>> A (View) => B (View) => C (Table)
>>>> LoadView(A)                     gets the first view
>>>> LoadView(B, referenced_by A)    gets the second view, using
>>>>                                 "referenced_by" the first view
>>>> LoadTable(C, referenced_by B)   finally requests the table, using
>>>>                                 "referenced_by" the second view
>>>>
>>>> Do we need the full chain in this case?
>>>>
>>>> I'm kind of convinced though by the logging argument since that would
>>>> be useful information to have, although I'm not
>>>> sure the Catalog couldn't piece this back together. It would definitely
>>>> be simpler to have it just always present.
>>>>
>>>> On Wed, Feb 4, 2026 at 2:34 PM Christian Thiel <
>>>> [email protected]> wrote:
>>>>
>>>>> Your assumption is correct—the 1st DEFINER view is authorized before
>>>>> the query engine retrieves its content and learns it references the 2nd
>>>>> DEFINER.
>>>>>
>>>>> Let me clarify the setup I had in mind: Query engines increasingly
>>>>> support passing user tokens to the catalog for authorization. Examples
>>>>> include Starburst's OAuth2 Token Passthrough [1] and StarRocks' JWT
>>>>> authentication [2].
>>>>>
>>>>> In such setups, the second request to the 2nd DEFINER view becomes
>>>>> problematic: the catalog receives a request from a user / invoker lacking
>>>>> direct access. Using the hypothetical "referenced-by" field—and assuming a
>>>>> trust relationship with the engine guaranteeing correctness—we must
>>>>> validate both:
>>>>>
>>>>> 1. The authorization decision for the 1st DEFINER still holds
>>>>> 2. The 1st DEFINER's owner has access to the 2nd
>>>>>
>>>>> While catalogs could issue short-lived authorization proof when
>>>>> returning the 1st DEFINER, re-authorizing is equally valid and arguably
>>>>> preferable, as the information is more current.
>>>>>
>>>>> Extending this to the TABLE level: we can either provide authorization
>>>>> proof with the 2nd DEFINER (presented when querying the TABLE), or
>>>>> re-authorize the entire chain.
>>>>>
>>>>> Without carrying client-side trust between requests, having the full
>>>>> (trusted) chain is the only way to authorize TABLE access (again requiring
>>>>> correctness guarantees through other trust mechanisms). Therefore,
>>>>> authorizing table access can only be seamlessly explained with the
>>>>> complete chain. Providing this information explicitly is preferable to
>>>>> reconstructing it from the TABLE metadata plus all prior authorization
>>>>> requests, in my opinion, if only for audit logging.
>>>>>
>>>>> Does that make my thoughts clear?
>>>>>
>>>>> [1]
>>>>> https://docs.starburst.io/latest/object-storage/metastores.html#oauth-2-0-token-pass-through
>>>>> [2]
>>>>> https://docs.starrocks.io/docs/data_source/catalog/iceberg/iceberg_rest_security/#security-mechanisms
>>>>>
>>>>> Best,
>>>>>
>>>>> Christian
>>>>>
>>>>> On Wed, 4 Feb 2026 at 20:20, Prashant Singh <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Thank you for the feedback, Christian!
>>>>>> I agree that having the full context could help for audit purposes.
>>>>>>
>>>>>> However, I don't fully understand your feedback from the AuthZ point
>>>>>> of view. Can you please elaborate?
>>>>>> IIUC, in your example 1st DEFINER => 2nd DEFINER => TABLE, the user's
>>>>>> access to the 1st DEFINER view would have been authorized before the
>>>>>> query engine could learn that the 1st DEFINER references the 2nd
>>>>>> DEFINER. I am assuming it succeeded in getting the view definition?
>>>>>> All it needs to know, when it's authorizing the loadTable, is what the
>>>>>> view is referencing.
>>>>>>
>>>>>> Regarding referenced-by in loadView, that's a good recommendation.
>>>>>> Let me think about it more.
>>>>>>
>>>>>> Best,
>>>>>> Prashant Singh
>>>>>>
>>>>>>
>>>>>> On Tue, Feb 3, 2026 at 11:28 AM Christian Thiel <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> I prefer to keep the full chain.
>>>>>>>
>>>>>>> Consider this scenario:
>>>>>>> 1st DEFINER => 2nd DEFINER => TABLE
>>>>>>>
>>>>>>> When a user has access only to the outer view and the load table
>>>>>>> endpoint is called, the following authorization conditions must be
>>>>>>> ensured:
>>>>>>>
>>>>>>>    1. Owners of the DEFINER views still have access to their
>>>>>>>    referenced objects
>>>>>>>    2. The querying User has access to his entrypoint - the 1st
>>>>>>>    DEFINER View
>>>>>>>
>>>>>>> If the load table endpoint receives only the immediate parent in
>>>>>>> referenced-by, we lose critical information for check (2). This
>>>>>>> means the request data alone, even if trusted, is insufficient to make a
>>>>>>> complete authorization decision unless the server internally correlates
>>>>>>> the call to the 2nd DEFINER load with the load table request, as we can't
>>>>>>> trace it back to the 1st DEFINER otherwise. To make this work consistently
>>>>>>> we would require referenced-by also for the load View endpoint.
>>>>>>>
>>>>>>> Additionally, knowing the user's entry point is valuable for
>>>>>>> auditing purposes, particularly in DEFINER-heavy implementations.
>>>>>>>
>>>>>>> I kind of disagree that postgres DEFINER views don't require deeply
>>>>>>> nested context.
>>>>>>>
>>>>>>> Postgres just handles this chain internally:
>>>>>>> 1. User is allowed to query 1st DEFINER
>>>>>>> 2. thus 2nd DEFINER may be used to respond to the query
>>>>>>> 3. thus TABLE may be used to respond to the query
>>>>>>> But propagating this trust relationship in Iceberg REST is more
>>>>>>> complex as objects are queried individually, so we can't just validate
>>>>>>> the full plan, but instead need to be able to validate access to each
>>>>>>> individual component it requires.
>>>>>>>
>>>>>>> Best,
>>>>>>> Christian
>>>>>>>
>>>>>>> On Mon, 2 Feb 2026 at 19:44, Russell Spitzer <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> Just to re-up my comments from the discussion.
>>>>>>>>
>>>>>>>> I'm in favor of Immediate Parent only. Full chain seems to be for
>>>>>>>> situations where we want to be able to "override" the security
>>>>>>>> definition of an inner nested view. For users who want to
>>>>>>>> do this, I would encourage them to just make a brand new definer
>>>>>>>> view without referencing the "invoker" view.
>>>>>>>>
>>>>>>>> For example
>>>>>>>>
>>>>>>>> DEFINER => INVOKER => TABLE
>>>>>>>>
>>>>>>>> The "definer" should not be able to remove the "invoked" nature of
>>>>>>>> access to the table. If a user really
>>>>>>>> wants that behavior they should construct
>>>>>>>>
>>>>>>>> DEFINER (Combined with INVOKER SQL) => TABLE
>>>>>>>>
>>>>>>>> I'd rather we didn't encourage more complicated constructions
>>>>>>>>
>>>>>>>> On Mon, Feb 2, 2026 at 12:34 PM Prashant Singh <
>>>>>>>> [email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi everyone,
>>>>>>>>>
>>>>>>>>> I’m currently working on passing additional context via the
>>>>>>>>> referenced-by parameter in loadTable calls. This is a
>>>>>>>>> foundational step toward enabling catalogs to make authorization
>>>>>>>>> decisions based on query execution context.
>>>>>>>>>
>>>>>>>>> While the broader trust relationships and AuthZ constructs are
>>>>>>>>> outside the scope of IRC, I’d like to align on the level of detail we
>>>>>>>>> should provide. Specifically: *Should we send the entire view
>>>>>>>>> reference chain, or only the immediate parent view on nested views?*
>>>>>>>>>
>>>>>>>>> The following are trade-offs:
>>>>>>>>>
>>>>>>>>>    - *Full Chain:* Provides maximum flexibility for the server to
>>>>>>>>>      make complex AuthZ decisions but increases client-side overhead for
>>>>>>>>>      tracking nested references.
>>>>>>>>>    - *Immediate Parent:* Simpler for the client to implement but
>>>>>>>>>      provides limited context for sophisticated authorization policies.
>>>>>>>>>
>>>>>>>>> *Prior Art & Research:* As noted in this discussion
>>>>>>>>> <https://github.com/apache/iceberg/pull/13810#discussion_r2747121401>
>>>>>>>>> (thanks Ryan and Russell), Postgres handles this via DEFINER
>>>>>>>>> (owner permissions) and INVOKER (query permissions) without
>>>>>>>>> requiring deeply nested context. My research into other engines hasn't
>>>>>>>>> yielded a standard "gold level" approach yet, as some platforms simply
>>>>>>>>> restrict nested view complexity.
>>>>>>>>>
>>>>>>>>> I’d love to hear your thoughts on which approach aligns better.
>>>>>>>>>
>>>>>>>>> Best regards,
>>>>>>>>>
>>>>>>>>> Prashant Singh
>>>>>>>>>
>>>>>>>>
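
The two checks Christian lists in the thread (the querying user has access
to the entry point, and each DEFINER view's owner still has access to what
it references) can be sketched as a walk over the referenced-by chain. This
is only an illustration under stated assumptions, not the Iceberg REST spec
or any catalog's implementation: CatalogObject, authorize_load, the
grants set, and the principals alice, bob, and carol are all hypothetical,
and the chain is assumed to be ordered from the user's entry point down to
the immediate parent. INVOKER views keep the current principal, DEFINER
views switch to the view owner.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CatalogObject:
    # Hypothetical catalog entry; not an Iceberg REST type.
    name: str
    kind: str                       # "view" or "table"
    security: Optional[str] = None  # "DEFINER" or "INVOKER" (views only)
    owner: Optional[str] = None     # principal that owns a DEFINER view


def authorize_load(user, target, referenced_by, objects, grants):
    """Return True if `user` may load `target` through `referenced_by`.

    `referenced_by` is assumed to run from the entry point to the
    immediate parent. `grants` is a set of (principal, object_name)
    pairs, each meaning "principal may read object".
    """
    principal = user
    for name in [*referenced_by, target]:
        # The current principal must be allowed to read this object.
        if (principal, name) not in grants:
            return False
        obj = objects[name]
        # A DEFINER view runs as its owner; an INVOKER view (or a table)
        # leaves the effective principal unchanged.
        if obj.kind == "view" and obj.security == "DEFINER":
            principal = obj.owner
    return True


# A (DEFINER view) => B (DEFINER view) => C (table), as in the thread.
objects = {
    "A": CatalogObject("A", "view", "DEFINER", owner="alice"),
    "B": CatalogObject("B", "view", "DEFINER", owner="bob"),
    "C": CatalogObject("C", "table"),
}
grants = {("carol", "A"), ("alice", "B"), ("bob", "C")}

full_chain = authorize_load("carol", "C", ["A", "B"], objects, grants)
parent_only = authorize_load("carol", "C", ["B"], objects, grants)
direct = authorize_load("carol", "C", [], objects, grants)
```

Under these assumptions the full chain ["A", "B"] is allowed, while the
same walk given only the immediate parent ["B"] is denied, because carol
has no grant on B and the request no longer says she entered through A. A
catalog that receives only the immediate parent would have to trust the
engine or correlate earlier requests instead, which is the trade-off the
thread discusses.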
