gaborkaszab commented on PR #15979: URL: https://github.com/apache/iceberg/pull/15979#issuecomment-4371995087
Thank you for sharing your view, @ajantha-bhat! Let me sum up how I understand the situation so far:

- Calculating total_record_count as a derived value on the write path probably doesn't add much value, because it a) can't work for all use cases (eq-deletes, v2 pos-deletes), and b) could easily be calculated by the engines themselves if the partition has no deletes or only v3 DVs.
- Calculating total_record_count for all use cases requires a full data scan that applies the deletes. Engines might be able to do this while performing a full scan for other purposes, but this is very speculative; I don't think any engine would do it.

So I'd lean towards dropping it from the spec, because TBH I don't see any motivation to write it. Keeping it there just so that some engine might write it in the future doesn't make sense to me. WDYT @ajantha-bhat, @pvary?

Additionally, we could consider whether it's still valuable to calculate it as a derived value on the read path.
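For the metadata-only case in the first point, the derivation is simple arithmetic. Below is a minimal Python sketch. It models a partition's files with hypothetical `DataFile`/`DeleteFile` classes and a hypothetical `derive_total_record_count` helper, none of which are Iceberg's actual API. It sums the data files' record counts, subtracts the deleted-row count of each v3 DV, and gives up (returns `None`) as soon as an eq-delete or v2 pos-delete file is present, because those cases would need a scan that applies the deletes. It also assumes each DV's count is the number of distinct deleted positions and that no two DVs cover the same data file, so the subtraction never double-counts.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Sequence


class DeleteKind(Enum):
    # Hypothetical labels for the delete-file flavours discussed in the comment.
    EQUALITY = "equality"
    POSITION_V2 = "position_v2"
    DELETION_VECTOR = "deletion_vector"  # v3 DV


@dataclass(frozen=True)
class DataFile:
    record_count: int  # rows physically written to the data file


@dataclass(frozen=True)
class DeleteFile:
    kind: DeleteKind
    record_count: int  # assumption: for a DV, the number of distinct deleted positions


def derive_total_record_count(
    data_files: Sequence[DataFile], delete_files: Sequence[DeleteFile]
) -> Optional[int]:
    """Return the live row count of one partition, or None if metadata alone is not enough."""
    if any(d.kind is not DeleteKind.DELETION_VECTOR for d in delete_files):
        # Eq-deletes and v2 pos-deletes: a scan applying the deletes would be required.
        return None
    total_rows = sum(f.record_count for f in data_files)
    deleted_rows = sum(d.record_count for d in delete_files)
    return total_rows - deleted_rows
```

With no deletes the result is just the sum of the data files' record counts; this is the kind of computation an engine could do itself on the read path.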
