RussellSpitzer opened a new issue, #15622:
URL: https://github.com/apache/iceberg/issues/15622

   ### Feature Request / Improvement
   
   We care about this because, with Parquet manifests, we 
cannot reuse the immutable list returned by the "get" method of BaseFile. 
That means we leak an object for every manifest. Not a huge deal, but we should 
probably do something about it.
   
   --
   
   BaseFile stores split offsets internally as a long[], but splitOffsets() 
wraps it in a new List<Long> via ArrayUtil.toUnmodifiableLongList on every 
invocation. When file metadata is being read and rewritten (e.g., during 
manifest rewriting or format conversion), this means each entry needlessly 
allocates a list that is immediately consumed and discarded.
   
   Other fields like partitionData are stored and returned as-is. Split offsets 
could similarly cache or reuse the List<Long> representation, or callers within 
the core module could use the existing package-private splitOffsetArray() to 
pass the raw long[] through without conversion.
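
As a rough illustration of the caching option, here is a minimal sketch. It is not Iceberg's actual BaseFile code: the class `SplitOffsetsSketch`, the field `splitOffsetsView`, and the helper `LongArrayView` are hypothetical names invented for this example, and `splitOffsetArray()` here only mirrors the name mentioned above. The idea is to build the unmodifiable `List<Long>` view once and return the same instance on later calls.

```java
import java.util.AbstractList;
import java.util.List;

/** Hypothetical stand-in for BaseFile; only the split-offset handling is modeled. */
final class SplitOffsetsSketch {
  // Raw storage, as described in the issue.
  private final long[] splitOffsets;
  // Lazily created wrapper, reused across calls (assumption: not in the real class).
  private List<Long> splitOffsetsView;

  SplitOffsetsSketch(long[] splitOffsets) {
    this.splitOffsets = splitOffsets == null ? null : splitOffsets.clone();
  }

  /** Returns the same unmodifiable list on every call instead of a new wrapper. */
  List<Long> splitOffsets() {
    if (splitOffsets == null) {
      return null;
    }
    if (splitOffsetsView == null) {
      splitOffsetsView = new LongArrayView(splitOffsets);
    }
    return splitOffsetsView;
  }

  /** Raw-array accessor for callers that can skip the List conversion entirely. */
  long[] splitOffsetArray() {
    return splitOffsets;
  }

  /** Read-only List view over a long[]; AbstractList rejects all mutations by default. */
  private static final class LongArrayView extends AbstractList<Long> {
    private final long[] values;

    LongArrayView(long[] values) {
      this.values = values;
    }

    @Override
    public Long get(int index) {
      return values[index];
    }

    @Override
    public int size() {
      return values.length;
    }
  }

  public static void main(String[] args) {
    SplitOffsetsSketch file = new SplitOffsetsSketch(new long[] {4L, 1024L});
    // Both calls return the identical list instance, so no per-call wrapper is allocated.
    System.out.println(file.splitOffsets() == file.splitOffsets());
  }
}
```

The lazy initialization is not synchronized; two threads could each build a view, which is harmless here because the views are equivalent and read-only. If the backing array can ever be replaced (for example when an instance is reused as a container), the cached view would need to be reset at the same time.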
   
   
   ### Query engine
   
   None
   
   ### Willingness to contribute
   
   - [x] I can contribute this improvement/feature independently
   - [ ] I would be willing to contribute this improvement/feature with 
guidance from the Iceberg community
   - [ ] I cannot contribute this improvement/feature at this time


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

