singhpk234 commented on PR #14502: URL: https://github.com/apache/iceberg/pull/14502#issuecomment-3771588110
Thank you for the feedback, @rdblue!

> the right solution is to stop embedding partition information in the snapshot summary and instead capture that data (if it is needed) using the metrics reporting framework and REST endpoint

Agreed. I think it's an anti-pattern for us to leak this information, especially since there are multiple ways to achieve the same thing. I am not sure we have a clear way to ban such writers, though. Maybe the end user built a dashboard on top of it because it's convenient for them?

> I'd recommend solving that problem more directly with something like a catalog override that suppresses them.

IIUC, there can be cases where a table was unprotected when a snapshot containing partition stats was added, but is protected now (we could enforce never adding the partition summary, irrespective of whether the table is protected). Maybe this is a check we would then need to do as part of policy (RAP) attachment, so that the attachment fails. But I think policies are sometimes attached via tags, so the failure might instead surface at runtime: "this table is protected, but it has sensitive info that the catalog can't hide", and we throw a 403 and prompt the user to fix it. Would expiring the snapshot be the only solution then? Or do we expect the user to rewrite the `metadata.json` without such a summary and then do a force register?

> Or just drop them at the catalog level when processing AddSnapshot changes.

My understanding is that unless we spec this out, it would be hard to enforce across catalogs. Take federation, for example: if one defines a policy on a federated table (catalog C1 federating to catalog C2), we will run into cases where the AddSnapshot in C2 didn't enforce this. The table then can't be queried, and we fail at runtime when it is queried from C1, since the policies are defined in C1.

Hence I thought having something like metadata projection would give catalogs the flexibility to properly redact this info (since the snapshot summary is optional) without burdening the end user.
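The two catalog-side options above are dropping partition summaries when processing AddSnapshot and projecting them away when metadata is loaded for a protected table. The minimal Python sketch below makes them concrete. It works on plain dicts shaped like the REST catalog's `add-snapshot` update and the table-metadata JSON. The function names and the `protected` flag are hypothetical. The two key names are an assumption: they are meant to match what Iceberg's Java `SnapshotSummary` writes when per-partition summaries are enabled.

```python
"""Hypothetical catalog-side redaction of partition-level snapshot summary entries."""
from copy import deepcopy
from typing import Any

# ASSUMPTION: key names mirror those written by Iceberg's Java SnapshotSummary
# when per-partition summaries are enabled; verify against your Iceberg version.
PARTITION_SUMMARY_FLAG = "partition-summaries-included"
PARTITION_SUMMARY_PREFIX = "partitions."


def strip_partition_summaries(summary: dict[str, str]) -> dict[str, str]:
    """Return a copy of a snapshot summary without per-partition entries."""
    return {
        key: value
        for key, value in summary.items()
        if key != PARTITION_SUMMARY_FLAG and not key.startswith(PARTITION_SUMMARY_PREFIX)
    }


def filter_add_snapshot(update: dict[str, Any]) -> dict[str, Any]:
    """Write-time option (hypothetical): clean an add-snapshot update before commit."""
    if update.get("action") != "add-snapshot":
        return update  # other update types pass through unchanged
    cleaned = deepcopy(update)  # never mutate the caller's request
    snapshot = cleaned["snapshot"]
    if "summary" in snapshot:
        snapshot["summary"] = strip_partition_summaries(snapshot["summary"])
    return cleaned


def project_metadata(metadata: dict[str, Any], protected: bool) -> dict[str, Any]:
    """Read-time option (hypothetical): redact every snapshot summary of a protected table."""
    if not protected:
        return metadata
    projected = deepcopy(metadata)  # stored metadata.json stays untouched
    for snapshot in projected.get("snapshots", []):
        if "summary" in snapshot:
            snapshot["summary"] = strip_partition_summaries(snapshot["summary"])
    return projected
```

The projection variant never rewrites the stored `metadata.json`, so it also covers snapshots committed before the table became protected. Write-time filtering alone cannot do that.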
Please let me know your thoughts on the above.
