lyl2008dsg commented on issue #403: URL: https://github.com/apache/shardingsphere-elasticjob/issues/403#issuecomment-2009529318
最近在生产环境中,我们遭遇了一个问题,经过调查发现是由于ZooKeeper(ZK)发生故障引起的。在这次故障中,所有节点尝试连接ZK时均超时,这直接导致了计划中的任务未能按时触发。幸运的是,ZK在2分钟后自动恢复了正常,但遗憾的是,期间错过的任务并未得到补偿执行。 ```log [ERROR][2024-03-14T00:39:56.834+0800][ConnectionState.java:228] …… _msg=Connection timed out for connection string (ip:port) and timeout (1000) / elapsed (1000) ``` 至于为什么没有被ReconcileService reshard,怀疑是因为 hasShardingInfoInOfflineServers = true 所以没有被修复; 短期解决办法是感知到应触发而未触发的,补偿调度; 期待大佬帮忙跟进更合适的应对办法。 respect~ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
