Hi Jack,
I think this is an interesting idea, but there are some practical
concerns (I posted them inline).

> - general access patterns, like read-only, read-write, admin full access,
> etc.

Is this intended to be informational only?  I would hope the tokens vended
to clients and the REST API itself would enforce these settings, so it seems
like this would mostly be for debugging purposes (e.g. if only read access is
available, only tokens with "read" privileges are vended, and without full
admin access, updates to the catalog would not be allowed).
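
As a rough sketch of what I mean by catalog-side enforcement (all names
below are hypothetical and not part of the REST spec; the access levels are
just the ones from the proposal):

```python
from enum import Enum


class Access(Enum):
    READ_ONLY = "read-only"
    READ_WRITE = "read-write"
    ADMIN = "admin"


# Hypothetical mapping: which privileges a vended storage token carries.
TOKEN_PRIVILEGES = {
    Access.READ_ONLY: frozenset({"read"}),
    Access.READ_WRITE: frozenset({"read", "write"}),
    Access.ADMIN: frozenset({"read", "write"}),
}

# Hypothetical mapping: which catalog operations each level may perform.
ALLOWED_OPERATIONS = {
    Access.READ_ONLY: frozenset({"load-table"}),
    Access.READ_WRITE: frozenset({"load-table", "commit-table"}),
    Access.ADMIN: frozenset({"load-table", "commit-table", "update-catalog"}),
}


def vend_token_privileges(access):
    """Return the privileges the catalog would attach to a vended token."""
    return TOKEN_PRIVILEGES[access]


def authorize(access, operation):
    """Reject the request on the catalog side, regardless of the engine."""
    if operation not in ALLOWED_OPERATIONS[access]:
        raise PermissionError(
            f"{operation} is not allowed with {access.value} access"
        )
```

With enforcement like this in the catalog, a policy field in the response
would describe what the caller can do rather than control it.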

> - columns that the specific caller has access to for read or write
> - filters (maybe expressed in Iceberg expression) that should be applied
> by the engine on behalf of the caller during a table scan

I have a few concerns here:
1.  I worry a little bit about putting security features into the REST API
that require the execution engine and catalog to agree on semantics and
execution.  All it takes is one engine ignoring these fields for the security
they provide to no longer apply.  For more tightly controlled environments
this is viable, but the consequences could be very large if users make
the wrong choice of engine, or even if an engine is using a stale REST
API client (i.e. we would need to be very careful with
compatibility guarantees).
2.  The row-level security feature linked is designed so that end-users are
not aware of which, if any, filters were applied during the query.  I think
replicating this would be challenging, since it requires distinguishing
between direct user access to the catalog and a query engine working on a
user's behalf.
3.  In terms of dialect, I imagine it would probably make sense to be
agnostic here and follow a model similar to the one views are taking, by
allowing multiple dialects (or at least wait to see how the view approach
works out in practice).
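
To make point 3 a bit more concrete, here is a minimal sketch of a filter
carried in several dialects, with a cooperating engine failing closed when it
understands none of them.  The class, function, and dialect names are
hypothetical and only loosely modeled on the idea of multiple
representations; nothing here is from the REST spec.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FilterRepresentation:
    dialect: str      # e.g. "spark" or "trino" (illustrative values)
    expression: str   # the filter text in that dialect


def select_filter(representations, supported_dialects):
    """Pick the first representation the engine understands.

    Fails closed: if no dialect is supported, refuse the scan instead of
    reading the table unfiltered.
    """
    for rep in representations:
        if rep.dialect in supported_dialects:
            return rep
    raise PermissionError(
        "no filter representation in a supported dialect; refusing to scan"
    )


# Example policy carrying the same row filter in two dialects.
row_filter = [
    FilterRepresentation("spark", "region = 'EU'"),
    FilterRepresentation("trino", "region = 'EU'"),
]
chosen = select_filter(row_filter, {"trino"})
```

Note this only helps with engines that cooperate; it does nothing about the
concern in point 1, where an engine simply ignores the policy.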


For points 1 and 2 a different approach would be to introduce a new
standard based on something like Apache Arrow's Flight or Flight SQL
protocol that acts as a layer of abstraction between physical storage and
security controls.

> - constraints (again, maybe expressed in Iceberg expression) that should
> trigger the table scan or table commit to be rejected


It feels like this should probably be part of the table spec, as in
general, it affects the commit protocol (IIUC it is already covered
partially with identifier-field IDs).

Thanks,
Micah



On Tue, Feb 13, 2024 at 10:42 AM Jack Ye <yezhao...@gmail.com> wrote:

> Hi everyone,
>
> I would like to get some initial thoughts about the possibility to add
> some permission control constructs to the Iceberg REST spec. Do we think it
> is valuable? If so, how do we imagine its shape and form?
>
> The background of this idea is that, today Iceberg already supports loading
> credentials to a table through the config field
> <https://github.com/apache/iceberg/blob/main/open-api/rest-catalog-open-api.yaml#L2714-L2719>
> in LoadTableResponse, as a basic way to control data access. We heard that
> users really like this feature and want more regarding data access control
> and permission configuration in Iceberg.
>
> For example, we could consider adding a *policy* field in the REST
> LoadTableResponse, where a policy has sub-fields that describe:
> - general access patterns, like read-only, read-write, admin full access,
> etc.
> - columns that the specific caller has access to for read or write
> - filters (maybe expressed in Iceberg expression) that should be applied
> by the engine on behalf of the caller during a table scan
> - constraints (again, maybe expressed in Iceberg expression) that should
> trigger the table scan or table commit to be rejected
>
> This could be the solution to some topics we discussed in the past. For
> example, we can use this as a solution to the EXTERNAL database semantics
> support discussion
> <https://lists.apache.org/thread/ohqfvhf4wofzkhrvff1lxl58blh432o6> by
> saying an external table has read-only access. We can also let the REST
> service decide access to columns, which solves some governance issues
> raised during the column tagging discussion
> <https://lists.apache.org/thread/yflg8w1h87qgwc4s3qtog4l8nx8nk8m0>.
>
> Outside existing discussions, this can also work pretty well with popular
> engine vendor features like row-level security
> <https://cloud.google.com/bigquery/docs/row-level-security-intro>, check
> constraint <https://docs.databricks.com/en/tables/constraints.html>, etc.
>
> In general, permission control and data governance is an important aspect
> for enterprise data warehousing. I think having these constructs in the
> REST spec and related engine integration could increase enterprise adoption
> and help our vision of standardizing access through the REST interface.
>
> Would appreciate any thoughts in this domain! And if we have some general
> interest in this direction, I can put up a more detailed design doc.
>
> Best,
> Jack Ye
>
