[ https://issues.apache.org/jira/browse/HUDI-5516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Raymond Xu updated HUDI-5516:
-----------------------------
    Sprint: 0.13.0 Final Sprint 2, 0.13.0 Final Sprint 3  (was: 0.13.0 Final Sprint 2)

> Reduce memory footprint on workload with thousand active partitions
> -------------------------------------------------------------------
>
>                 Key: HUDI-5516
>                 URL: https://issues.apache.org/jira/browse/HUDI-5516
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: flink
>            Reporter: Alexander Trushev
>            Assignee: Danny Chen
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.13.0
>
>
> We can reduce the memory footprint of workloads that keep thousands of
> partitions active between checkpoints; such workloads arise when the
> checkpoint interval is wide. More specifically, an active partition here is
> a special case of an active fileId: the write client holds a map of write
> handles so that it can create a ReplaceHandle between checkpoints, and
> because each write handle is a large object, this map leads to an
> OutOfMemoryError under a workload like the following.
> {code:sql}
> create table source (
>   `id` int,
>   `data` string
> ) with (
>   'connector' = 'datagen',
>   'rows-per-second' = '100',
>   'fields.id.kind' = 'sequence',
>   'fields.id.start' = '0',
>   'fields.id.end' = '3000'
> );
>
> create table sink (
>   `id` int primary key,
>   `data` string,
>   `part` string
> ) partitioned by (`part`) with (
>   'connector' = 'hudi',
>   'path' = '/tmp/sink',
>   'write.batch.size' = '0.001',      -- 1024 bytes
>   'write.task.max.size' = '101.001', -- 101.001 MB
>   'write.merge.max_memory' = '1'     -- 1024 bytes
> );
>
> insert into sink
> select `id`, `data`, concat('part', cast(`id` as string)) as `part`
> from source;
> {code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
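[Editorial sketch, not part of the original report.] To make the memory issue concrete, here is a minimal, hypothetical Java illustration of the direction such a fix can take. The class names (WriteHandle, HandleMeta) and the cached fields are assumptions for illustration, not Hudi's actual internals: instead of retaining the full write handle, which buffers data and is large, the client retains only the small immutable metadata needed later, so one entry per active partition costs bytes rather than megabytes.

```java
import java.util.HashMap;
import java.util.Map;

public class HandleCacheSketch {
    // Stand-in for a heavyweight write handle: it owns a large write buffer.
    static class WriteHandle {
        final String fileId;
        final byte[] buffer = new byte[1 << 20]; // ~1 MiB per handle
        WriteHandle(String fileId) { this.fileId = fileId; }
        HandleMeta toMeta() { return new HandleMeta(fileId, "/tmp/sink/" + fileId); }
    }

    // Lightweight descriptor: a couple of strings instead of a buffer.
    record HandleMeta(String fileId, String path) {}

    public static void main(String[] args) {
        // Cache metadata, not handles, between checkpoints.
        Map<String, HandleMeta> cache = new HashMap<>();
        for (int i = 0; i < 3000; i++) {
            WriteHandle handle = new WriteHandle("part" + i);
            cache.put(handle.fileId, handle.toMeta()); // the handle itself becomes garbage
        }
        System.out.println(cache.size());
    }
}
```

With 3000 active partitions (as in the datagen repro above), 3000 handle-sized objects would approach gigabytes of heap, while 3000 descriptors stay in the kilobyte range.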