pvary commented on issue #5339:
URL: https://github.com/apache/iceberg/issues/5339#issuecomment-1193063396
> Yea I think there was an old similar discussion here: #3064. I think we
> can do a per check of all files added in same transaction, but anything beyond
> that involves an expensive spark call to check for duplicates in the table
> itself?
Thanks @szehon-ho, I was not aware of the old thread. It seems like a
reasonable compromise to accept duplicated files if we are not parsing the
whole table metadata anyway.
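To make the per-transaction idea concrete, here is a minimal sketch of such a check, independent of the Iceberg API: collect the paths of the files added in one transaction and flag any path seen twice. The class and method names are illustrative only, and `filePaths` stands in for the `DataFile` paths an actual commit would accumulate.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical per-transaction duplicate check, not Iceberg API:
// scan the file paths added in a single transaction and report
// any path that appears more than once.
public class DuplicateFileCheck {
    static List<String> findDuplicates(List<String> filePaths) {
        Set<String> seen = new HashSet<>();
        List<String> duplicates = new ArrayList<>();
        for (String path : filePaths) {
            // HashSet#add returns false when the element is already present
            if (!seen.add(path)) {
                duplicates.add(path);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        List<String> added = Arrays.asList(
            "s3://bucket/data/a.parquet",
            "s3://bucket/data/b.parquet",
            "s3://bucket/data/a.parquet");
        // prints [s3://bucket/data/a.parquet]
        System.out.println(findDuplicates(added));
    }
}
```

Such a check is cheap because it only touches the files staged in the current transaction; catching duplicates already committed to the table would require scanning its manifests, which is the expensive part discussed above.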
What level of metadata is parsed when we have a `Table` object at hand?
Which metadata files do we read when we commit something? Does anyone have a
quick answer to this, or shall I check?
Thanks everyone for the answers!
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]