czy006 commented on PR #3937:
URL: https://github.com/apache/amoro/pull/3937#issuecomment-4182395523

   I think the safer fix here is to separate optimizer lifecycle management 
from local task recovery.
   
   At the moment, the follower path only syncs optimizer additions and removals 
from the DB, but it no longer performs the local queue recovery that used to 
happen in `OptimizerKeeper.processTask()`. As a result, follower-local 
`SCHEDULED`/`ACKED` tasks may stay stuck after the master deletes an optimizer 
from the DB.
   
   My suggestion is:
   1. keep optimizer expiry / DB deletion as a leader-only responsibility
   2. extract the local task recovery logic (`collectTasks(...)` + 
`retryTask(...)`) into a separate helper
   3. invoke that helper not only in the leader path, but also in the follower's 
`onFollowerTick()` after syncing optimizers from DB
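
A minimal sketch of the split proposed above, not the actual Amoro code. The class, its fields, and `onLeaderTick(...)` are hypothetical stand-ins; only `onFollowerTick()` is a name taken from this PR. The DB is modelled as an in-memory set, and the recovery check here only looks at stale tokens to keep the example short.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

public class OptimizerSyncSketch {
  /** Stand-in for the optimizer table in the DB (hypothetical). */
  public final Set<String> dbTokens = new HashSet<>();
  /** Node-local snapshot of authenticated optimizer tokens. */
  public final Set<String> authOptimizers = new HashSet<>();
  /** Tokens of tasks currently assigned in this node's local queues. */
  public final List<String> localAssignedTokens = new ArrayList<>();
  /** Tokens whose tasks were handed back for retry by this node. */
  public final List<String> retried = new ArrayList<>();

  /** Leader path: the only place that deletes optimizers from the DB. */
  public void onLeaderTick(Set<String> expiredTokens) {
    dbTokens.removeAll(expiredTokens);
    syncOptimizersFromDb();
    recoverLocalTasks();
  }

  /** Follower path: never deletes, but still recovers its own local tasks. */
  public void onFollowerTick() {
    syncOptimizersFromDb();
    recoverLocalTasks();
  }

  /** Refresh the local optimizer snapshot from the DB. */
  public void syncOptimizersFromDb() {
    authOptimizers.clear();
    authOptimizers.addAll(dbTokens);
  }

  /** Shared helper: collect local tasks that need recovery and retry them. */
  public void recoverLocalTasks() {
    Iterator<String> it = localAssignedTokens.iterator();
    while (it.hasNext()) {
      String token = it.next();
      // Simplified check: the optimizer holding this task is no longer known.
      if (!authOptimizers.contains(token)) {
        it.remove();
        retried.add(token);
      }
    }
  }
}
```

The point of the shape is that both tick methods end in the same `recoverLocalTasks()` call, while DB deletion appears only in the leader method.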
   
   Importantly, I would not limit the fix to “retry tasks for removed tokens 
only”, because the existing recovery predicate also handles:
   - stale token tasks
   - SCHEDULED tasks that exceeded ack timeout
   - ACKED tasks that exceeded execute timeout
   
   So the follower should periodically recover local tasks for all local queues 
based on the current `authOptimizers` snapshot, instead of only updating the 
optimizer map.
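
To make the three cases concrete, here is a rough sketch of such a recovery predicate. It assumes a simplified task model: the `Task` class, the `PLANNED` reset state, and the timeout parameters are illustrative assumptions, not the real Amoro types or the exact predicate in `OptimizerKeeper`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class LocalTaskRecoverySketch {
  public enum Status { PLANNED, SCHEDULED, ACKED }

  /** Hypothetical, simplified task record. */
  public static final class Task {
    public final String id;
    public Status status;
    public String token;         // optimizer token the task is assigned to
    public long statusSinceMs;   // when the task entered its current status

    public Task(String id, Status status, String token, long statusSinceMs) {
      this.id = id;
      this.status = status;
      this.token = token;
      this.statusSinceMs = statusSinceMs;
    }
  }

  /** True if a locally queued task should be handed back for retry. */
  public static boolean needsRecovery(
      Task task, Set<String> authOptimizers, long nowMs,
      long ackTimeoutMs, long executeTimeoutMs) {
    if (task.status != Status.SCHEDULED && task.status != Status.ACKED) {
      return false; // only in-flight tasks are candidates
    }
    if (!authOptimizers.contains(task.token)) {
      return true; // stale token: the optimizer is no longer in the snapshot
    }
    long elapsedMs = nowMs - task.statusSinceMs;
    if (task.status == Status.SCHEDULED) {
      return elapsedMs > ackTimeoutMs; // never acknowledged in time
    }
    return elapsedMs > executeTimeoutMs; // acknowledged but never finished
  }

  /** Reset every task that needs recovery; returns the tasks that were reset. */
  public static List<Task> recover(
      List<Task> queue, Set<String> authOptimizers, long nowMs,
      long ackTimeoutMs, long executeTimeoutMs) {
    List<Task> retried = new ArrayList<>();
    for (Task task : queue) {
      if (needsRecovery(task, authOptimizers, nowMs, ackTimeoutMs, executeTimeoutMs)) {
        task.status = Status.PLANNED; // assumption: retry means back to PLANNED
        task.token = null;
        task.statusSinceMs = nowMs;
        retried.add(task);
      }
    }
    return retried;
  }
}
```

A follower that only retried tasks for removed tokens would cover the first branch and miss the two timeout branches, which is why the helper should run the full predicate over all local queues.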
   
   I'm not sure how serious this problem is in practice. It affects how 
operations resume under special circumstances.
   WDYT? @xxubai @wardlican 
   


