Heesung, thanks. That's a good point for multi-step work. But I'm afraid it will increase the pressure on ZooKeeper, and it also needs to handle some corner cases. I prefer to use a system topic to handle it.
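For contrast, the system-topic approach can be reduced to the sketch below, with an in-memory queue standing in for the system topic. This is only a sketch of the idea, not Pulsar code; every class and method name here is hypothetical. The key property is that an event is acknowledged (removed) only after the ledger is actually deleted, so a crash mid-deletion causes redelivery rather than an orphan ledger.

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class SystemTopicDeletion {
    // Stand-in for the system topic: pending-delete events are published
    // here, and a broker-side subscriber consumes them.
    static final Queue<Long> pendingDeleteTopic = new ArrayDeque<>();

    static void publishPendingDelete(long ledgerId) {
        pendingDeleteTopic.add(ledgerId);
    }

    // Consume pending-delete events; the event is removed (acked) only
    // after the delete succeeds, so a failure leads to redelivery.
    static int consumeAndDelete() {
        int deleted = 0;
        while (pendingDeleteTopic.peek() != null) {
            Long ledgerId = pendingDeleteTopic.peek();
            // ... delete ledger `ledgerId` from storage here ...
            pendingDeleteTopic.remove(); // ack only after the delete succeeded
            deleted++;
        }
        return deleted;
    }

    public static void main(String[] args) {
        publishPendingDelete(1L);
        publishPendingDelete(2L);
        System.out.println("deleted: " + consumeAndDelete());
    }
}
```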
On 2023/01/31 19:05:34 Heesung Sohn wrote:
> On Tue, Jan 31, 2023 at 6:43 AM Yan Zhao <horizo...@apache.org> wrote:
> >
> > > - Have we considered a metadata store to persist and dedup deletion
> > > requests instead of the system topic? Why is the system topic the better
> > > choice than a metadata store for this problem?
> > If we use the metadata store to store the middle step ledger, we need to
> > operate the metadata store after deletion every time.
> >
> > And we need a trigger to trigger deletion. In the broker, it may have lots
> > of topics, and there are also many ledger deletions. Using the metadata
> > store to store it may be a bottleneck.
> > Using pub/sub is easy to implement, and it is a good trigger to trigger
> > deletion.
>
> We can group the multiple resource deletions into a single record in the
> metadata store. Also, we can use the metadata store watcher to trigger the
> deletion.
>
> I can see that a similar transactional operation (using the metadata store)
> can be done like the following.
>
> Alternatively,
> 1. A broker receives a resource (ledger) deletion request from a client.
> 2. If the target resource is available, the broker persists a transaction
> lock (/transactions/broker-id/delete_ledger/ledger_id) into the metadata
> store (state:pending, createdAt:now).
> 2.1 If there is no target resource, error out (ResourceDoesNotExistException).
> 2.2 If the lock already exists, error out (OperationInProgressException).
> 3. The broker returns success to the client.
> 4. The transaction watcher (metadata store listener) on the same broker-id
> is notified.
> 5. The transaction watcher runs the deletion process with an x min timeout.
> 5.1 The transaction watcher updates the lock state (state: running,
> startedAt: now).
> 5.2 Run steps 1 ... n (periodically update the lock state and
> updatedAt:now every x secs).
> 5.3 Delete the lock.
> 6. The orphan transaction monitor retries any orphan jobs by retrying step 5.
> (If the watcher fails in the middle at step 5, the lock state will be
> orphan (state:running and startedAt > x min).)
> 7. The leader monitor (on the leader broker) manages orphan jobs if brokers
> are gone or unavailable.
>
> We can have multiple types of transaction locks (or a generic lock) depending
> on the operation types. This will reduce the number of locks to
> create/update if there are multiple target resources to operate on for a
> single transaction.
>
> - Single-ledger deletion: /transactions/broker-id/delete_ledger/ledger_id
> - Multi-ledger deletion: /transactions/broker-id/delete_ledgers/ledgers :
> {ledger_ids[a,b,c,d], last_deleted_ledger_index:3}
> // last_deleted_ledger_index could be periodically updated every min. This
> can help to resume the deletion when retrying.
> - Topic deletion: /transactions/broker-id/delete_topic/topic_name
>
> > > - How does Pulsar deduplicate deletion requests (error out to users)
> > > while the deletion request is running?
> > The user can only invoke `truncateTopic`; it's not for a particular
> > ledger. The note: "The truncate operation will move all cursors to the end
> > of the topic and delete all inactive ledgers."
> > It's just a trigger for the user.
>
> What if the admin concurrently requests `truncateTopic` many times for the
> same topic while one truncation job is running? How does Pulsar currently
> deduplicate these requests? And how does this proposal handle this
> situation?
>
> > > - How do users track async deletion flow status? (Do we expose any
> > > describeDeletion API to show the deletion status?)
> > Why do we need to track the async deletion flow status? The ledger
> > deletion is transparent to the pulsarClient. In the broker, deleting a
> > ledger will print the log `delete ledger xx successfully`.
> > If the deletion failed, it prints the log `delete ledger xxx failed.`
>
> IMHO, relying on logs to check the system state is not a good practice.
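Heesung's locking flow quoted above, including the dedup error and the resumable multi-ledger record, might look roughly like the sketch below. An in-memory map stands in for the metadata store, and every class and method name is hypothetical; the record shape mirrors the example {ledger_ids[a,b,c,d], last_deleted_ledger_index:3}.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class MetadataStoreDeletionTxn {
    // Stand-in for the metadata store; keys are lock paths such as
    // /transactions/broker-1/delete_ledgers/ledgers
    static final Map<String, DeleteLedgersLock> store = new ConcurrentHashMap<>();

    // Mirrors the example record {ledger_ids[a,b,c,d], last_deleted_ledger_index:3}
    static class DeleteLedgersLock {
        final List<Long> ledgerIds;
        volatile String state = "pending";        // pending -> running -> (lock deleted)
        volatile int lastDeletedLedgerIndex = -1; // periodically persisted for resume

        DeleteLedgersLock(List<Long> ledgerIds) { this.ledgerIds = ledgerIds; }
    }

    // Step 2: persist the lock; a second concurrent request errors out.
    static String requestDeletion(String brokerId, List<Long> ledgerIds) {
        String path = "/transactions/" + brokerId + "/delete_ledgers/ledgers";
        if (store.putIfAbsent(path, new DeleteLedgersLock(ledgerIds)) != null) {
            throw new IllegalStateException("OperationInProgressException: " + path);
        }
        return path; // step 3: success is returned to the client here
    }

    // Steps 4-5: the watcher marks the lock running, deletes each ledger,
    // records progress, and finally deletes the lock. On retry after a
    // crash it resumes from lastDeletedLedgerIndex + 1.
    static void runDeletion(String path) {
        DeleteLedgersLock lock = store.get(path);
        lock.state = "running";                                 // step 5.1
        for (int i = lock.lastDeletedLedgerIndex + 1; i < lock.ledgerIds.size(); i++) {
            // ... delete ledger lock.ledgerIds.get(i) from storage here ...
            lock.lastDeletedLedgerIndex = i;                    // step 5.2
        }
        store.remove(path);                                     // step 5.3
    }

    public static void main(String[] args) {
        String path = requestDeletion("broker-1", List.of(10L, 11L, 12L, 13L));
        try {
            requestDeletion("broker-1", List.of(20L)); // duplicate is rejected
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        runDeletion(path);
        System.out.println("locks left: " + store.size());
    }
}
```

The putIfAbsent call is what gives the dedup behavior asked about for `truncateTopic`: a second request for the same path fails fast instead of starting a second job.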
> Generally, every async user/admin API (long-running async workflow API)
> needs the corresponding describe* API to return the current running state.
>
> Regards,
> Heesung
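The describe* pattern suggested in the last paragraph can be as small as reading the persisted lock state back out. This is a hypothetical sketch, not an existing Pulsar admin API; the map again stands in for the metadata store.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

public class DeletionStatus {
    // Stand-in for the metadata store holding transaction locks.
    static final Map<String, String> store = new ConcurrentHashMap<>();

    // Returns the current state of an async deletion, or empty if no
    // deletion is in flight (i.e. it finished or was never started).
    static Optional<String> describeDeletion(String brokerId, long ledgerId) {
        String path = "/transactions/" + brokerId + "/delete_ledger/" + ledgerId;
        return Optional.ofNullable(store.get(path));
    }

    public static void main(String[] args) {
        store.put("/transactions/broker-1/delete_ledger/42", "state:running");
        System.out.println(describeDeletion("broker-1", 42L).orElse("not running"));
        System.out.println(describeDeletion("broker-1", 99L).orElse("not running"));
    }
}
```

Because the lock already encodes state, startedAt, and updatedAt, the describe call needs no extra bookkeeping beyond what the deletion flow persists anyway.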