kbendick commented on pull request #3784: URL: https://github.com/apache/iceberg/pull/3784#issuecomment-1058822726
cc @dongjoon-hyun This is a proposed PR for estimating the size of ORC files, to support rolling file writers that use ORC in Iceberg. Right now the feature is disabled entirely, because there is no way to estimate the size of an open ORC file that is still being written to. Adding this would bring ORC much closer to parity with Parquet in Iceberg.

@openinx has summarized their thoughts and the current situation pretty well on the dev list here: https://lists.apache.org/thread/g6yo7m46mr86ov1vkm9wnmshgw7hcl6b

If you have time, could you or somebody else from the ORC community provide feedback on a better approach to estimating file size, so that ORC might have support equivalent to Parquet in this regard? Thanks in advance for any guidance you might be able to provide 🙂

Also cc @marton-bod and other ORC developers.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
