[GitHub] [iceberg] RussellSpitzer commented on issue #6424: The size estimation formula for spark task is incorrect

GitBox Wed, 14 Dec 2022 07:21:57 -0800


RussellSpitzer commented on issue #6424:
URL: https://github.com/apache/iceberg/issues/6424#issuecomment-1351621767


   The current code is slightly different from this:
   
   
https://github.com/apache/iceberg/blob/33217abf7f88c6c22a8c43b320f9de48de998b94/api/src/main/java/org/apache/iceberg/ContentScanTask.java#L65-L70
   
   That said, I'm not sure I follow your math.
   
   Why would Full Row Count have to be greater than the record count for a 
split? If we scan only part of a file, the split should cover only a portion of 
the rows in the file. When we sum over all of our partial scans, we should get 
the full amount (if the full file is read).
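   
   To make the proportionality argument concrete, here is a minimal sketch, 
assuming the per-split estimate scales the file's record count by the fraction 
of the file's bytes that the split scans. The class and method names 
(`SplitRowEstimate`, `estimateRows`, `sumEstimates`) are hypothetical and are 
not taken from the Iceberg code base; the numbers are made up for illustration.

```java
// Hypothetical sketch; names and values are illustrative, not Iceberg's API.
final class SplitRowEstimate {
  private SplitRowEstimate() {}

  // Estimate the rows in one split as
  // (bytes scanned / total file size) * record count of the whole file.
  static long estimateRows(long splitLength, long fileSizeInBytes, long fileRecordCount) {
    double scannedFraction = ((double) splitLength) / fileSizeInBytes;
    return (long) (scannedFraction * fileRecordCount);
  }

  // Sum the per-split estimates for a file cut into the given split lengths.
  static long sumEstimates(long[] splitLengths, long fileSizeInBytes, long fileRecordCount) {
    long total = 0;
    for (long len : splitLengths) {
      total += estimateRows(len, fileSizeInBytes, fileRecordCount);
    }
    return total;
  }

  public static void main(String[] args) {
    long fileSize = 1000L;    // assumed file size in bytes
    long recordCount = 100L;  // assumed record count of the whole file
    long[] splits = {250L, 250L, 250L, 250L};

    System.out.println(estimateRows(250L, fileSize, recordCount));   // 25
    System.out.println(sumEstimates(splits, fileSize, recordCount)); // 100
  }
}
```

   Under this assumption each split gets a share of the file's rows, and the 
shares add back up to the file's record count when the splits cover the whole 
file. Because each estimate is truncated to a `long`, the sum can fall slightly 
below the true count when the split fractions do not divide evenly.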
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


