Re: [DISCUSS] FLIP-430: Remote Compaction For Disaggregated State

Han Yin Thu, 04 Sep 2025 21:50:20 -0700

Hi Zakelly,

Thanks for your feedback! I address your comments below:


1. Good point! I’ve updated the REST API in the FLIP to make the URL more 
specific. 
2. The FLIP omits deployment details to keep the focus on the core 
interaction between Flink subtasks and the compaction service, rather than on 
the service implementation. The proposed design follows a distributed 
deployment model in which components communicate through RPC. The dispatcher 
and workers can run in different processes or containers, so the service can 
be scaled independently of the Flink jobs. Deployment is flexible: workers can 
be placed inside the Flink cluster, outside it, or even colocated with DFS 
data nodes. The service configuration is also customizable, e.g. tailored 
scheduling policies or specialized IO settings for compaction workloads.
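
To make the dispatcher/worker split a bit more concrete, here is a rough, 
in-process sketch. It is not the FLIP's API: every name in it (CompactionJob, 
Worker, Dispatcher, least_loaded) is hypothetical, the RPC layer is replaced 
by a direct method call, and the "compaction" is a stand-in string. It only 
illustrates that workers register with a dispatcher independently of any 
Flink job and that the scheduling policy is pluggable.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class CompactionJob:
    """Hypothetical request that a Flink subtask might submit."""
    job_id: str
    input_files: List[str]


@dataclass
class Worker:
    """Hypothetical compaction worker; `location` is only a label here."""
    name: str
    location: str  # e.g. "flink-cluster", "standalone", "dfs-datanode"
    assigned: List[CompactionJob] = field(default_factory=list)

    def compact(self, job: CompactionJob) -> str:
        # Stand-in for the real merge. In a distributed deployment this
        # would be reached over RPC rather than a direct method call.
        self.assigned.append(job)
        return f"{job.job_id}-compacted-by-{self.name}"


# A scheduling policy picks one worker for a given job.
Policy = Callable[[List[Worker], CompactionJob], Worker]


def least_loaded(workers: List[Worker], job: CompactionJob) -> Worker:
    """Example policy: choose the worker with the fewest assigned jobs."""
    return min(workers, key=lambda w: len(w.assigned))


class Dispatcher:
    """Hypothetical dispatcher that routes jobs to registered workers."""

    def __init__(self, policy: Policy = least_loaded) -> None:
        self.workers: List[Worker] = []
        self.policy = policy

    def register(self, worker: Worker) -> None:
        # Workers are added independently of any Flink job.
        self.workers.append(worker)

    def submit(self, job: CompactionJob) -> str:
        if not self.workers:
            raise RuntimeError("no compaction workers registered")
        return self.policy(self.workers, job).compact(job)
```

Swapping in a different policy function (for example, one that prefers 
workers colocated with DFS data nodes) would be one way to express the 
"tailored scheduling policies" mentioned above, under these assumptions.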

Best regards,
Han Yin

> On Aug 28, 2025, at 19:28, Zakelly Lan <[email protected]> wrote:
> 
> Hi Han,
> 
> Thanks for your proposal! Remote compaction decouples compaction from
> computation, which is another great step toward cloud-native architecture.
> I have a few questions:
> 
> 1. I’d suggest including 'forst' in the URL used to update the remote
> compaction service endpoint, since this functionality is specific to
> ForStStateBackend.
> 2. What is the deployment model for the compaction service components
> (e.g., dispatcher and workers)? Do they run in the same process or
> container? How could we customize the setup of that service?
> 
> Best,
> Zakelly
> 
> On Thu, Aug 28, 2025 at 12:25 PM Han Yin <[email protected]> wrote:
> 
>> Hi everyone,
>> 
>> I would like to open a discussion on introducing remote compaction for
>> disaggregated state[1].
>> 
>> Flink state backends rely on LSM-Trees for large-scale storage, with file
>> compaction executed locally in TaskManager background threads. This
>> co-location creates local resource contention, causing latency spikes and
>> resource instability.
>> 
>> Flink 2.0 introduces disaggregated state management through the ForSt
>> StateBackend[2], employing a shared DFS as primary storage. This allows
>> ForSt to implement compaction-as-a-service (Remote Compaction) through
>> dedicated compaction workers.
>> 
>> This approach clearly separates the responsibilities of computing and
>> storage nodes, and thereby further complements Flink's disaggregated
>> architecture. Introducing a compaction service aligns with the pooling
>> concept prevalent in the cloud-native era, and can significantly improve
>> the resource efficiency and elasticity of Flink stateful jobs.
>> 
>> Looking forward to your comments and feedback.
>>
>> Best regards,
>> Han Yin
>> 
>> [1]
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-430%3A+Remote+Compaction+For+Disaggregated+State
>> [2] https://cwiki.apache.org/confluence/x/R4p3EQ
>> 
>>
