Rongtong Jin created COMDEV-513:
-----------------------------------
Summary: RocketMQ TieredStore Integration with High Availability
Architecture
Key: COMDEV-513
URL: https://issues.apache.org/jira/browse/COMDEV-513
Project: Community Development
Issue Type: Task
Components: Comdev, GSoC/Mentoring ideas
Reporter: Rongtong Jin
*Apache RocketMQ*
Apache RocketMQ is a distributed messaging and streaming platform with low
latency, high performance and reliability, trillion-level capacity and flexible
scalability.
Page: [https://rocketmq.apache.org/]
*Background*
With the official release of RocketMQ 5.1.0, tiered storage is available as a
new independent module at the Technical Preview milestone. It allows users to
offload messages from local disks to cheaper storage, extending message
retention time at a lower cost.
Reference RIP-57:
[https://github.com/apache/rocketmq/wiki/RIP-57-Tiered-storage-for-RocketMQ]
In addition, RocketMQ introduced a new high availability architecture in
version 5.0.
Reference RIP-44:
[https://github.com/apache/rocketmq/wiki/RIP-44-Support-DLedger-Controller]
However, RocketMQ tiered storage currently supports only a single replica.
*Task*
Because tiered storage supports only a single replica, the following issues
remain open in its integration with the high availability architecture:
* Metadata synchronization: how to reliably synchronize metadata between
master and slave nodes.
* Disallowing message uploads beyond the confirm offset: to avoid uploading
messages that may later be rolled back, the maximum uploaded offset must not
exceed the confirm offset.
* Starting tiered storage upload when the slave becomes the master, and
stopping tiered storage upload when the master becomes the slave: only the
master node has write and delete permissions, and after a slave node is
promoted, it needs to quickly resume uploading from the point where the
previous master stopped (breakpoint resumption).
* Design of the slave pull protocol: how a newly launched empty slave can
properly synchronize data through the tiered storage architecture. (If
synchronization starts from the first or the last file, breakpoint resumption
may not be possible after another master-slave switch.)
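
As a rough illustration of the confirm-offset constraint in the list above, the
following minimal Java sketch clamps the end of the next upload batch to the
confirm offset. All names here (UploadBoundSketch, nextUploadEnd, maxBatch) are
hypothetical and are not taken from the RocketMQ code base; it is only meant to
make the rule concrete, not to prescribe a design.

```java
/** Hypothetical sketch; none of these names exist in RocketMQ. */
class UploadBoundSketch {

    /**
     * Returns the exclusive end offset of the next batch that may be uploaded.
     *
     * @param uploadedOffset offset up to which data is already in tiered storage
     * @param localMaxOffset highest offset written to the local store
     * @param confirmOffset  highest offset confirmed by the replica set
     * @param maxBatch       assumed upper bound on the size of one upload batch
     */
    static long nextUploadEnd(long uploadedOffset, long localMaxOffset,
                              long confirmOffset, long maxBatch) {
        // Never upload data that is not yet confirmed: it could be truncated
        // after a master-slave switch.
        long safeEnd = Math.min(localMaxOffset, confirmOffset);
        if (safeEnd <= uploadedOffset) {
            return uploadedOffset; // nothing new is safe to upload yet
        }
        return Math.min(safeEnd, uploadedOffset + maxBatch);
    }

    public static void main(String[] args) {
        // Local store is at 500, but only 300 is confirmed: stop at 300.
        System.out.println(nextUploadEnd(100, 500, 300, 1000)); // 300
        // Batch size limits the upload before the confirm offset does.
        System.out.println(nextUploadEnd(100, 500, 300, 50));   // 150
        // Already caught up with the confirm offset: nothing to upload.
        System.out.println(nextUploadEnd(300, 500, 300, 1000)); // 300
    }
}
```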
You therefore need to provide a complete plan that solves the issues above and
ultimately completes the integration of tiered storage with the high
availability architecture, and to verify it against the existing tiered storage
file version and with OpenChaos testing.
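
One possible shape for the role-switch handling described in the task list is
sketched below in Java. It assumes that the uploaded offset is part of the
metadata synchronized from master to slave, so that a promoted slave can resume
from it. The class, method and field names are hypothetical and do not
correspond to existing RocketMQ APIs.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch; none of these names exist in RocketMQ. */
class TieredUploadRoleSketch {
    enum Role { MASTER, SLAVE }

    // True only while this node is master and allowed to write to tiered storage.
    private final AtomicBoolean uploading = new AtomicBoolean(false);
    // Assumed replicated metadata: offset already present in tiered storage.
    private final AtomicLong uploadedOffset = new AtomicLong(0);
    // Offset from which this node resumed uploading after its last promotion.
    private volatile long resumeOffset = -1;

    /** Called on a slave when it receives metadata from the master. */
    void onMetadataSynced(long masterUploadedOffset) {
        // Keep the largest value seen so that stale updates are ignored.
        uploadedOffset.accumulateAndGet(masterUploadedOffset, Math::max);
    }

    /** Called when the high availability layer changes this node's role. */
    void onRoleChange(Role newRole) {
        if (newRole == Role.MASTER) {
            // Start uploading once, resuming from the synchronized breakpoint.
            if (uploading.compareAndSet(false, true)) {
                resumeOffset = uploadedOffset.get();
            }
        } else {
            // A slave has no write or delete permission on tiered storage.
            uploading.set(false);
        }
    }

    boolean isUploading() { return uploading.get(); }

    long resumeOffset() { return resumeOffset; }

    public static void main(String[] args) {
        TieredUploadRoleSketch node = new TieredUploadRoleSketch();
        node.onMetadataSynced(4096);      // slave learns the master's progress
        node.onRoleChange(Role.MASTER);   // slave is promoted
        System.out.println(node.isUploading() + " " + node.resumeOffset()); // true 4096
        node.onRoleChange(Role.SLAVE);    // node is demoted again
        System.out.println(node.isUploading()); // false
    }
}
```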
*Relevant Skills*
* Interest in messaging middleware and distributed storage systems
* Java development skills
* Good understanding of RocketMQ tiered storage and high availability
architecture