[ 
https://issues.apache.org/jira/browse/HUDI-5516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-5516:
-----------------------------
    Sprint: 0.13.0 Final Sprint 2, 0.13.0 Final Sprint 3  (was: 0.13.0 Final Sprint 2)

> Reduce memory footprint on workload with thousand active partitions
> -------------------------------------------------------------------
>
>                 Key: HUDI-5516
>                 URL: https://issues.apache.org/jira/browse/HUDI-5516
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: flink
>            Reporter: Alexander Trushev
>            Assignee: Danny Chen
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.13.0
>
>
> We can reduce the memory footprint of workloads with thousands of active 
> partitions between checkpoints. Such workloads arise with a wide checkpoint 
> interval; more precisely, an active partition here is a special case of an 
> active fileId.
> Between checkpoints, the write client holds a map of write handles so that it 
> can create a ReplaceHandle later. Because each write handle is a huge object, 
> this map leads to OutOfMemoryError on such workloads.
> {code:sql}
> create table source (
>     `id` int,
>     `data` string
> ) with (
>     'connector' = 'datagen',
>     'rows-per-second' = '100',
>     'fields.id.kind' = 'sequence',
>     'fields.id.start' = '0',
>     'fields.id.end' = '3000'
> );
> create table sink (
>     `id` int primary key not enforced,
>     `data` string,
>     `part` string
> ) partitioned by (`part`) with (
>     'connector' = 'hudi',
>     'path' = '/tmp/sink',
>     'write.batch.size' = '0.001',  -- 1024 bytes
>     'write.task.max.size' = '101.001',  -- 101.001MB
>     'write.merge.max_memory' = '1'  -- 1 MB
> );
> insert into sink select `id`, `data`, concat('part', cast(`id` as string)) as 
> `part` from source;
> {code} 
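The handle-retention problem described above can be illustrated with a minimal Java sketch. All names here (`WriteHandle`, `HandleMeta`, `trackActive`) are illustrative, not Hudi's actual API: the idea is that retaining one heavy handle per active fileId between checkpoints grows the heap without bound, whereas retaining only the lightweight metadata needed to build the ReplaceHandle lets each heavy handle be garbage collected after flush.

```java
import java.util.HashMap;
import java.util.Map;

public class HandleCacheSketch {

    // Stand-in for a heavy write handle: in-flight buffers, a file writer, etc.
    static class WriteHandle {
        final byte[] inFlightState = new byte[1024]; // placeholder payload
        final String filePath;
        WriteHandle(String filePath) { this.filePath = filePath; }
    }

    // Lightweight record: just enough to create a ReplaceHandle at checkpoint.
    static class HandleMeta {
        final String filePath;
        HandleMeta(String filePath) { this.filePath = filePath; }
    }

    // The problem pattern keeps a Map<String, WriteHandle> alive between
    // checkpoints; this sketch keeps only HandleMeta per fileId, so each
    // WriteHandle becomes unreachable (and collectible) after its flush.
    public static int trackActive(int activeFileIds) {
        Map<String, HandleMeta> light = new HashMap<>();
        for (int i = 0; i < activeFileIds; i++) {
            WriteHandle handle = new WriteHandle("/tmp/sink/part" + i);
            // ... write and flush records through `handle` ...
            light.put("file-" + i, new HandleMeta(handle.filePath));
            // `handle` goes out of scope here and can be garbage collected
        }
        return light.size();
    }

    public static void main(String[] args) {
        // Mirrors the datagen workload above: ~3000 distinct partitions.
        System.out.println("tracked fileIds: " + trackActive(3000));
    }
}
```

With 3000 active partitions (as in the SQL workload above), retained heap is bounded by the small metadata entries rather than by 3000 full write handles.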



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
