wypoon commented on issue #6042:
URL: https://github.com/apache/iceberg/issues/6042#issuecomment-1316220819
I'm trying to understand the proposed behavior.
To go back to @ajantha-bhat's example: Suppose you have a partition `{A}`
with `record_count`=6 and `file_count`=2 (3 records in each file). Suppose you
now delete 3 records in one file. I understand that `pos_delete_file_count`
will be 1 and `pos_delete_record_count` will be 3. But what about
`record_count` and `file_count`? Will `file_count` be 3 (i.e., is it supposed to be
the total number of files, including delete files)? And what will `record_count` be?
When is it possible to correctly compute the `record_count` using metadata
alone (without applying delete files)?
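To make the arithmetic in the first example concrete, here is a toy Python model. All class and function names are hypothetical and are not Iceberg or PyIceberg APIs. It contrasts the counts readable from per-file metadata with the true surviving row count obtained by applying position deletes. In this model the metadata-only subtraction is exact only when every delete entry names a distinct row.

```python
from __future__ import annotations
from dataclasses import dataclass, field

# Hypothetical toy model of one partition; these names are illustrative and
# are NOT Iceberg's Java API or PyIceberg.

@dataclass
class DataFile:
    path: str
    record_count: int

@dataclass
class PosDeleteFile:
    # (data file path, row position) pairs this delete file marks as deleted
    positions: list[tuple[str, int]]

@dataclass
class PartitionFiles:
    data_files: list[DataFile] = field(default_factory=list)
    pos_delete_files: list[PosDeleteFile] = field(default_factory=list)

def metadata_stats(p: PartitionFiles) -> dict[str, int]:
    """Counts readable from per-file metadata alone, without applying deletes."""
    return {
        "record_count": sum(f.record_count for f in p.data_files),
        "file_count": len(p.data_files),
        "pos_delete_file_count": len(p.pos_delete_files),
        "pos_delete_record_count": sum(len(d.positions) for d in p.pos_delete_files),
    }

def live_rows(p: PartitionFiles) -> int:
    """True surviving row count; requires reading the delete files' contents."""
    deleted = {pos for d in p.pos_delete_files for pos in d.positions}
    return sum(f.record_count for f in p.data_files) - len(deleted)

# Partition {A}: two data files with 3 rows each; then delete all 3 rows of file-1.
a = PartitionFiles(
    data_files=[DataFile("file-1", 3), DataFile("file-2", 3)],
    pos_delete_files=[PosDeleteFile([("file-1", i) for i in range(3)])],
)
s = metadata_stats(a)
print(s)  # {'record_count': 6, 'file_count': 2, 'pos_delete_file_count': 1, 'pos_delete_record_count': 3}
print(s["file_count"] + s["pos_delete_file_count"])  # 3, if delete files were counted too
print(s["record_count"] - s["pos_delete_record_count"], live_rows(a))  # 3 3

# Hypothetical: a second delete file naming the same positions again makes the
# metadata-only subtraction disagree with the true count.
a.pos_delete_files.append(PosDeleteFile([("file-1", i) for i in range(3)]))
s = metadata_stats(a)
print(s["record_count"] - s["pos_delete_record_count"], live_rows(a))  # 0 3
```

Whether overlapping position deletes like the last case can arise in practice is left open here; the model only shows what extra assumption a metadata-only `record_count` would depend on.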
Another example: Suppose you have two partitions `{A}` and `{B}`. Let's say
`record_count`=1000 and `file_count`=1 for partition `{B}`. Suppose you rename
`B` to `C` (using `UPDATE <table> SET <partition column> = 'C' WHERE
<partition column> = 'B'` with merge-on-read, resulting in 1 delete file
and 1 new pure data file). If you run `SELECT * FROM <table>.partitions`
today, you will get an entry for each of `{A}`, `{B}` and `{C}`. What
should the behavior be? Should there be an entry for `{B}`, and if so, what
should be shown for it? And what should be shown for `{C}`?
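The second scenario can be sketched with a similar toy model (hypothetical names, not an Iceberg API) that emits one row per partition the way a partitions metadata table would. It assumes the position delete file is recorded under `{B}`, the partition of the data file it targets, while the rewritten rows land in `{C}`.

```python
from __future__ import annotations
from dataclasses import dataclass, field

# Hypothetical toy model; names are illustrative, not an Iceberg API.
# Each list entry stands for one file and holds that file's record count.

@dataclass
class PartitionFiles:
    data_files: list[int] = field(default_factory=list)
    pos_delete_files: list[int] = field(default_factory=list)

def partitions_rows(table: dict[str, PartitionFiles]) -> list[dict]:
    """One row per partition value present in the model, ordered by value."""
    return [
        {
            "partition": value,
            "record_count": sum(p.data_files),
            "file_count": len(p.data_files),
            "pos_delete_file_count": len(p.pos_delete_files),
            "pos_delete_record_count": sum(p.pos_delete_files),
        }
        for value, p in sorted(table.items())
    ]

# {A}'s size is arbitrary (the example does not give it); {B} has 1 file of 1000 rows.
table = {"A": PartitionFiles([10]), "B": PartitionFiles([1000])}

# Merge-on-read UPDATE moving every row from B to C, assuming the position
# delete file is recorded under {B}, the partition of the file it targets.
table["B"].pos_delete_files.append(1000)
table.setdefault("C", PartitionFiles()).data_files.append(1000)

for row in partitions_rows(table):
    print(row)
```

In this model `{B}` still gets a row with `record_count`=1000 and `pos_delete_record_count`=1000 even though none of its rows are live, which is the case the question about `{B}` is asking how to present.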