RussellSpitzer opened a new issue, #15622: URL: https://github.com/apache/iceberg/issues/15622
### Feature Request / Improvement

The reason why we care about this is when we have parquet manifests we cannot re-use the immutable list returned by the "get" method from base file. That means we leak an object for every manifest. Not a huge deal but we should probably do something there.

--

BaseFile stores split offsets internally as a long[], but splitOffsets() wraps it in a new List<Long> via ArrayUtil.toUnmodifiableLongList on every invocation. When file metadata is being read and rewritten (e.g., during manifest rewriting or format conversion), this means each entry needlessly allocates a list that is immediately consumed and discarded. Other fields like partitionData are stored and returned as-is. Split offsets could similarly cache or reuse the List<Long> representation, or callers within the core module could use the existing package-private splitOffsetArray() to pass the raw long[] through without conversion.

### Query engine

None

### Willingness to contribute

- [x] I can contribute this improvement/feature independently
- [ ] I would be willing to contribute this improvement/feature with guidance from the Iceberg community
- [ ] I cannot contribute this improvement/feature at this time
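
For illustration, the "cache or reuse the List<Long> representation" option from the description could look roughly like the sketch below. This is not the actual BaseFile code: `SplitOffsetsSketch` and `LongArrayView` are hypothetical stand-ins, and the array-backed view only approximates what ArrayUtil.toUnmodifiableLongList returns. The accessor builds the view once and hands out the same instance on later calls.

```java
import java.util.AbstractList;
import java.util.List;

// Hypothetical stand-in for BaseFile; names and structure are assumptions.
final class SplitOffsetsSketch {
  private final long[] splitOffsets;    // raw storage, as described in the issue
  private List<Long> splitOffsetsView;  // created lazily, then reused

  SplitOffsetsSketch(long[] splitOffsets) {
    this.splitOffsets = splitOffsets;
  }

  // Returns the same unmodifiable view on every call instead of a new list.
  List<Long> splitOffsets() {
    if (splitOffsets == null) {
      return null;
    }
    if (splitOffsetsView == null) {
      splitOffsetsView = new LongArrayView(splitOffsets);
    }
    return splitOffsetsView;
  }

  // Read-only List<Long> backed directly by the array (no copy).
  // AbstractList rejects set/add/remove with UnsupportedOperationException.
  private static final class LongArrayView extends AbstractList<Long> {
    private final long[] values;

    LongArrayView(long[] values) {
      this.values = values;
    }

    @Override
    public Long get(int index) {
      return values[index];
    }

    @Override
    public int size() {
      return values.length;
    }
  }

  public static void main(String[] args) {
    SplitOffsetsSketch file = new SplitOffsetsSketch(new long[] {4L, 1024L, 2048L});
    // Both calls return the identical list instance.
    System.out.println(file.splitOffsets() == file.splitOffsets());
    System.out.println(file.splitOffsets());
  }
}
```

The lazy field is not synchronized; under a race two threads might each build a view, which is harmless here because the views are read-only and equal. Whether a cached view or passing the raw long[] through splitOffsetArray() is the better fit depends on the callers involved.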
