William Lo created GOBBLIN-1918:
-----------------------------------
Summary: Optimize smart resizing for ORC Writer converter buffer
Key: GOBBLIN-1918
URL: https://issues.apache.org/jira/browse/GOBBLIN-1918
Project: Apache Gobblin
Issue Type: Improvement
Components: gobblin-core
Reporter: William Lo
Assignee: Abhishek Tiwari
The GobblinOrcWriter contains a converter and a buffer rowbatch. The buffer
holds the records converted from Avro to ORC before they are added to the
native ORC writer.
Because the buffer holds multiple records, the columns of the rowbatch must be
resized repeatedly as records accumulate. Resizing hurts performance when it
happens too often (the enlarge factor is too low) and hurts memory when it
happens too rarely (the enlarge factor is too high, so the buffer dominates
the container memory).
Because only a bounded number of records can sit in the buffer before it is
flushed, the resizing algorithm should become less aggressive as more records
are processed.
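
A rough sketch of what "less aggressive over time" could look like, not the
actual GobblinOrcWriter change: the class, method names, parameters, and decay
formula below are all hypothetical and chosen only for illustration. The
enlarge factor starts at an initial value and decays toward 1.0 as the number
of processed records grows, so early resizes over-allocate generously and later
ones allocate close to what is strictly required.

```java
// Hypothetical illustration only; none of these names exist in Gobblin.
final class DecayingResizeSketch {

    // Enlarge factor that shrinks toward 1.0 as more records are processed.
    // Assumed formula: 1 + (initialFactor - 1) / (1 + decay * recordsProcessed).
    static double enlargeFactor(double initialFactor, double decay, long recordsProcessed) {
        return 1.0 + (initialFactor - 1.0) / (1.0 + decay * recordsProcessed);
    }

    // Capacity to allocate for a column that must hold requiredCapacity elements.
    // Returns the current capacity unchanged when no resize is needed.
    static int newCapacity(int currentCapacity, int requiredCapacity,
                           double initialFactor, double decay, long recordsProcessed) {
        if (requiredCapacity <= currentCapacity) {
            return currentCapacity;
        }
        double factor = enlargeFactor(initialFactor, decay, recordsProcessed);
        long grown = (long) Math.ceil(requiredCapacity * factor);
        // Never allocate less than what is required, and stay within int range.
        return (int) Math.min(Integer.MAX_VALUE, Math.max(requiredCapacity, grown));
    }

    public static void main(String[] args) {
        // Print how the factor decays for a few record counts (values are illustrative).
        for (long records : new long[] {0, 4, 36, 400}) {
            System.out.println(records + " records -> factor "
                + enlargeFactor(3.0, 0.25, records));
        }
    }
}
```

With such a scheme the first few resizes would amortize allocation cost, while
resizes late in a batch would add little slack, since the buffer is about to be
flushed anyway.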
--
This message was sent by Atlassian Jira
(v8.20.10#820010)