William Lo created GOBBLIN-1918:
-----------------------------------

             Summary: Optimize smart resizing for ORC Writer converter buffer
                 Key: GOBBLIN-1918
                 URL: https://issues.apache.org/jira/browse/GOBBLIN-1918
             Project: Apache Gobblin
          Issue Type: Improvement
          Components: gobblin-core
            Reporter: William Lo
            Assignee: Abhishek Tiwari


The GobblinOrcWriter contains a converter and a buffer row batch. The buffer
holds records that have been converted from Avro to ORC before they are added
to the native ORC writer.
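A minimal sketch of this write path, assuming a plain Java model: 
`BufferedConvertingWriter`, `NativeWriter` and `maxBufferedRows` are 
hypothetical names, and a `List` of converted rows stands in for the ORC row 
batch. Records are converted one at a time, buffered, and handed to the native 
writer once the buffer reaches its row limit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-in for the native ORC writer; not Gobblin's or ORC's API.
interface NativeWriter<R> {
  void addBatch(List<R> rows);
}

// Hypothetical sketch of the converter + buffer arrangement described above.
class BufferedConvertingWriter<A, R> {
  private final Function<A, R> converter;   // e.g. Avro record -> ORC row
  private final NativeWriter<R> nativeWriter;
  private final int maxBufferedRows;        // bound on rows held before a flush
  private final List<R> buffer = new ArrayList<>();

  BufferedConvertingWriter(Function<A, R> converter, NativeWriter<R> nativeWriter, int maxBufferedRows) {
    this.converter = converter;
    this.nativeWriter = nativeWriter;
    this.maxBufferedRows = maxBufferedRows;
  }

  // Convert one record, buffer it, and flush once the row limit is reached.
  void write(A record) {
    buffer.add(converter.apply(record));
    if (buffer.size() >= maxBufferedRows) {
      flush();
    }
  }

  // Hand a copy of the buffered rows to the native writer and empty the buffer.
  void flush() {
    if (!buffer.isEmpty()) {
      nativeWriter.addBatch(new ArrayList<>(buffer));
      buffer.clear();
    }
  }
}
```

Writing seven records with a limit of three produces two full batches; a 
final `flush()` hands over the remaining record.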

Because the buffer holds multiple records, the columns of the row batch must 
be resized repeatedly as records are added. Resizing hurts both performance 
and memory when it happens too often (the enlarge factor is too low, so 
columns are reallocated many times) or too rarely (the enlarge factor is too 
high, so columns are overallocated and the buffer dominates the container's 
memory).
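To illustrate the trade-off, here is a sketch of resizing with a fixed enlarge 
factor. `FixedFactorResize` and the growth rule 
`max(required, ceil(capacity * enlargeFactor))` are assumptions for 
illustration, not Gobblin's code.

```java
// Hypothetical helper: fixed-factor column resizing, used only to illustrate
// the trade-off between reallocation count and overallocated capacity.
class FixedFactorResize {
  // New capacity once a column needs `required` slots but only has `capacity`.
  static int nextCapacity(int capacity, int required, double enlargeFactor) {
    int enlarged = (int) Math.ceil(capacity * enlargeFactor);
    return Math.max(required, enlarged);
  }

  // Appends n values one at a time; returns {number of resizes, final capacity}.
  static long[] simulate(int initialCapacity, int n, double enlargeFactor) {
    int capacity = initialCapacity;
    long resizes = 0;
    for (int size = 1; size <= n; size++) {
      if (size > capacity) {
        capacity = nextCapacity(capacity, size, enlargeFactor);
        resizes++;
      }
    }
    return new long[] {resizes, capacity};
  }
}
```

For example, appending 10,000 values to a column that starts with 1,024 slots 
takes two reallocations with a factor of 4.0 but leaves 16,384 slots (6,384 
unused), while a factor of 1.05 ends much closer to 10,000 slots at the cost 
of dozens of reallocations.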

Because only a bounded number of records can accumulate in the buffer before 
it is flushed, the resizing algorithm should become less aggressive as more 
records are processed.
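One way to make the resizing less aggressive over time, sketched under 
assumptions: the enlarge factor starts at `initialFactor` and decays toward a 
floor `minFactor` as records are processed, following 
`minFactor + (initialFactor - minFactor) / (1 + n / decayRecords)`. 
`AdaptiveResizer`, its parameters and this decay curve are hypothetical; the 
ticket does not specify a formula.

```java
// Hypothetical sketch of a decaying enlarge factor; the formula and names are
// illustrative assumptions, not Gobblin's implementation.
class AdaptiveResizer {
  private final double initialFactor;   // factor used before any records are seen
  private final double minFactor;       // floor the factor decays toward (must be > 1)
  private final long decayRecords;      // records after which the excess over the floor is halved
  private long recordsProcessed = 0;

  AdaptiveResizer(double initialFactor, double minFactor, long decayRecords) {
    if (minFactor <= 1.0 || initialFactor < minFactor || decayRecords <= 0) {
      throw new IllegalArgumentException("need initialFactor >= minFactor > 1 and decayRecords > 0");
    }
    this.initialFactor = initialFactor;
    this.minFactor = minFactor;
    this.decayRecords = decayRecords;
  }

  void recordProcessed() {
    recordsProcessed++;
  }

  // minFactor + (initialFactor - minFactor) / (1 + recordsProcessed / decayRecords)
  double currentFactor() {
    double excess = initialFactor - minFactor;
    return minFactor + excess / (1.0 + (double) recordsProcessed / decayRecords);
  }

  // New capacity for a column that must hold at least `required` values.
  int nextCapacity(int capacity, int required) {
    int enlarged = (int) Math.ceil(capacity * currentFactor());
    return Math.max(required, enlarged);
  }
}
```

With `initialFactor = 3.0`, `minFactor = 1.25` and `decayRecords = 1000`, the 
factor is 3.0 at first, 2.125 after 1,000 records, and about 1.2517 after 
1,000,000 records. Early resizes grow columns quickly while their eventual 
size is unknown; later resizes overshoot less, which limits how much memory 
the buffer can claim.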



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
