[ https://issues.apache.org/jira/browse/HUDI-2950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vinoth Chandar updated HUDI-2950:
---------------------------------
    Sprint: Hudi-Sprint-Jan-3

> Address high small objects churn in Bulk Insert/Layout Optimization
> -------------------------------------------------------------------
>
>                 Key: HUDI-2950
>                 URL: https://issues.apache.org/jira/browse/HUDI-2950
>             Project: Apache Hudi
>          Issue Type: Task
>            Reporter: Alexey Kudinkin
>            Assignee: Alexey Kudinkin
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 0.11.0
>
>
> Based on the findings in HUDI-2949, the following need to be addressed to
> reduce pressure on GC and improve performance:
> * Remove unnecessary `ArrayList` resizing (during Hilbert Curve mapping)
> * Avoid unnecessary boxing (during Hilbert Curve mapping)
> * (In Parquet) Avoid allocating `ByteBuffer`s in the `compareTo` method invoked
> from the `BinaryStatistics.updateStats` method (on every write to Parquet's
> `ColumnWriterBase`)
> * Avoid the {{bytesToAvro}} / {{avroToBytes}} ser-de loop (due to the use of
> {{OverwriteWithLatestAvroPayload}}, to be replaced with
> {{RewriteAvroPayload}})
> * Avoid re-allocating substrings (cache them instead) when fetching
> {{Path.getName}} (from {{HoodieWrapperFileSystem.getBytesWritten}})
> * Avoid allocating large deques in {{DefaultSizeEstimator.sizeEstimate}}
> (currently allocates a 16 x 1024 default internal `ArrayDeque`)

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
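The first two items in the issue description (`ArrayList` resizing and boxing) can be illustrated with a minimal sketch. This is not the Hudi code or the actual fix: the class `AllocationSketch`, its method names, and the `int[] columnValues` input are hypothetical and only show the general pattern of pre-sizing a list or switching to a primitive array.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names come from Hudi.
final class AllocationSketch {
    private AllocationSketch() {}

    // Unsized list: the backing array is re-allocated and copied as the list
    // grows, and each value is autoboxed into a Long.
    static List<Long> boxedUnsized(int[] columnValues) {
        List<Long> out = new ArrayList<>();
        for (int v : columnValues) {
            out.add((long) v);
        }
        return out;
    }

    // Pre-sized list: a single backing-array allocation, but still boxed.
    static List<Long> boxedPresized(int[] columnValues) {
        List<Long> out = new ArrayList<>(columnValues.length);
        for (int v : columnValues) {
            out.add((long) v);
        }
        return out;
    }

    // Primitive array: one allocation in total, no boxing and no resizing.
    static long[] primitive(int[] columnValues) {
        long[] out = new long[columnValues.length];
        for (int i = 0; i < columnValues.length; i++) {
            out[i] = columnValues[i];
        }
        return out;
    }
}
```

All three methods return the same values; they differ only in how many short-lived objects they create per call, which is what matters on a per-record hot path.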