zwZjut opened a new issue #7361: URL: https://github.com/apache/dolphinscheduler/issues/7361
### Search before asking

- [X] I had searched in the [issues](https://github.com/apache/dolphinscheduler/issues?q=is%3Aissue) and found no similar issues.

### What happened

The MasterServer process is still running, but it should have stopped. Currently it has to be restarted manually.

logs:

[INFO] 2021-12-13 12:14:51.254 org.apache.dolphinscheduler.server.master.MasterServer:[196] - master server is stopping ..., cause : i was judged to death, release resources and stop myself

/opt/apache-dolphinscheduler-2.0.1-bin/logs # grep -A 10000 'master server is stopping' dolphinscheduler-master.log
[INFO] 2021-12-13 12:14:51.254 org.apache.dolphinscheduler.server.master.MasterServer:[196] - master server is stopping ..., cause : i was judged to death, release resources and stop myself
[WARN] 2021-12-13 12:14:51.570 org.apache.dolphinscheduler.server.master.dispatch.host.LowerWeightHostManager:[161] - worker honghuo-worker-1.honghuo-worker-headless:1234 current cpu load average 11.18 is too high or available memory 8.08G is too low
[WARN] 2021-12-13 12:14:51.571 org.apache.dolphinscheduler.server.master.dispatch.host.LowerWeightHostManager:[161] - worker honghuo-worker-0.honghuo-worker-headless:1234 current cpu load average 9.54 is too high or available memory 6.27G is too low
[WARN] 2021-12-13 12:14:52.571 org.apache.dolphinscheduler.server.master.dispatch.host.LowerWeightHostManager:[161] - worker honghuo-worker-1.honghuo-worker-headless:1234 current cpu load average 10.77 is too high or available memory 8.17G is too low
[WARN] 2021-12-13 12:14:52.572 org.apache.dolphinscheduler.server.master.dispatch.host.LowerWeightHostManager:[161] - worker honghuo-worker-0.honghuo-worker-headless:1234 current cpu load average 9.54 is too high or available memory 6.27G is too low
[WARN] 2021-12-13 12:14:53.572 org.apache.dolphinscheduler.server.master.dispatch.host.LowerWeightHostManager:[161] - worker honghuo-worker-1.honghuo-worker-headless:1234 current cpu load average 10.77 is too high or available memory 8.17G is too low
[WARN] 2021-12-13 12:14:53.573 org.apache.dolphinscheduler.server.master.dispatch.host.LowerWeightHostManager:[161] - worker honghuo-worker-0.honghuo-worker-headless:1234 current cpu load average 9.54 is too high or available memory 6.27G is too low
[INFO] 2021-12-13 12:14:54.260 org.apache.dolphinscheduler.remote.NettyRemotingClient:[390] - netty client closed
[INFO] 2021-12-13 12:14:54.263 org.apache.dolphinscheduler.server.master.runner.MasterSchedulerService:[159] - master schedule service stopped...
[INFO] 2021-12-13 12:14:54.265 org.apache.dolphinscheduler.remote.NettyRemotingServer:[243] - netty server closed
[INFO] 2021-12-13 12:14:54.270 org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient:[437] - master node : honghuo-master-1.honghuo-master-headless:5678 unRegistry to register center.
[INFO] 2021-12-13 12:14:54.270 org.apache.dolphinscheduler.server.master.registry.ServerNodeManager:[286] - master node : /nodes/master/honghuo-master-1.honghuo-master-headless:5678 down.
[INFO] 2021-12-13 12:14:54.271 org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient:[439] - heartbeat executor shutdown
[INFO] 2021-12-13 12:14:54.274 org.apache.curator.framework.imps.CuratorFrameworkImpl:[955] - backgroundOperationsLoop exiting
[ERROR] 2021-12-13 12:14:54.274 org.apache.dolphinscheduler.server.master.registry.ServerNodeManager:[307] - update master nodes error
org.apache.dolphinscheduler.registry.api.RegistryException: zookeeper release lock error
	at org.apache.dolphinscheduler.plugin.registry.zookeeper.ZookeeperRegistry.acquireLock(ZookeeperRegistry.java:214)
	at org.apache.dolphinscheduler.service.registry.RegistryClient.getLock(RegistryClient.java:237)
	at org.apache.dolphinscheduler.server.master.registry.ServerNodeManager.updateMasterNodes(ServerNodeManager.java:302)
	at org.apache.dolphinscheduler.server.master.registry.ServerNodeManager.access$700(ServerNodeManager.java:67)
	at org.apache.dolphinscheduler.server.master.registry.ServerNodeManager$MasterDataListener.notify(ServerNodeManager.java:287)
	at org.apache.dolphinscheduler.plugin.registry.zookeeper.ZookeeperRegistry.lambda$subscribe$1(ZookeeperRegistry.java:127)
	at org.apache.curator.framework.recipes.cache.TreeCache$2.apply(TreeCache.java:760)
	at org.apache.curator.framework.recipes.cache.TreeCache$2.apply(TreeCache.java:754)
	at org.apache.curator.framework.listen.ListenerContainer$1.run(ListenerContainer.java:100)
	at org.apache.curator.shaded.com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30)
	at org.apache.curator.framework.listen.ListenerContainer.forEach(ListenerContainer.java:92)
	at org.apache.curator.framework.recipes.cache.TreeCache.callListeners(TreeCache.java:753)
	at org.apache.curator.framework.recipes.cache.TreeCache.access$1900(TreeCache.java:75)
	at org.apache.curator.framework.recipes.cache.TreeCache$4.run(TreeCache.java:865)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Lost connection while trying to acquire lock: /lock/masters
	at org.apache.curator.framework.recipes.locks.InterProcessMutex.acquire(InterProcessMutex.java:91)
	at org.apache.dolphinscheduler.plugin.registry.zookeeper.ZookeeperRegistry.acquireLock(ZookeeperRegistry.java:203)
	... 18 common frames omitted
[ERROR] 2021-12-13 12:14:54.275 org.apache.dolphinscheduler.server.master.registry.ServerNodeManager:[291] - MasterNodeListener capture data change and get data failed.
java.lang.NullPointerException: null
	at org.apache.dolphinscheduler.plugin.registry.zookeeper.ZookeeperRegistry.releaseLock(ZookeeperRegistry.java:221)
	at org.apache.dolphinscheduler.service.registry.RegistryClient.releaseLock(RegistryClient.java:241)
	at org.apache.dolphinscheduler.server.master.registry.ServerNodeManager.updateMasterNodes(ServerNodeManager.java:309)
	at org.apache.dolphinscheduler.server.master.registry.ServerNodeManager.access$700(ServerNodeManager.java:67)
	at org.apache.dolphinscheduler.server.master.registry.ServerNodeManager$MasterDataListener.notify(ServerNodeManager.java:287)
	at org.apache.dolphinscheduler.plugin.registry.zookeeper.ZookeeperRegistry.lambda$subscribe$1(ZookeeperRegistry.java:127)
	at org.apache.curator.framework.recipes.cache.TreeCache$2.apply(TreeCache.java:760)
	at org.apache.curator.framework.recipes.cache.TreeCache$2.apply(TreeCache.java:754)
	at org.apache.curator.framework.listen.ListenerContainer$1.run(ListenerContainer.java:100)
	at org.apache.curator.shaded.com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30)
	at org.apache.curator.framework.listen.ListenerContainer.forEach(ListenerContainer.java:92)
	at org.apache.curator.framework.recipes.cache.TreeCache.callListeners(TreeCache.java:753)
	at org.apache.curator.framework.recipes.cache.TreeCache.access$1900(TreeCache.java:75)
	at org.apache.curator.framework.recipes.cache.TreeCache$4.run(TreeCache.java:865)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
[INFO] 2021-12-13 12:14:54.279 org.apache.zookeeper.ZooKeeper:[693] - Session: 0x20001523448001d closed
[INFO] 2021-12-13 12:14:54.280 org.apache.zookeeper.ClientCnxn:[522] - EventThread shut down for session: 0x20001523448001d
[INFO] 2021-12-13 12:14:54.280 org.quartz.core.QuartzScheduler:[666] - Scheduler DolphinScheduler_$_honghuo-master-1.honghuo-master-headless.honghuo.svc.cluster.local1639368841996 shutting down.
[INFO] 2021-12-13 12:14:54.281 org.quartz.core.QuartzScheduler:[585] - Scheduler DolphinScheduler_$_honghuo-master-1.honghuo-master-headless.honghuo.svc.cluster.local1639368841996 paused.
[INFO] 2021-12-13 12:14:54.283 com.zaxxer.hikari.HikariDataSource:[350] - DolphinScheduler - Shutdown initiated...
[INFO] 2021-12-13 12:14:54.294 com.zaxxer.hikari.HikariDataSource:[352] - DolphinScheduler - Shutdown completed.
[INFO] 2021-12-13 12:14:54.296 org.quartz.core.QuartzScheduler:[740] - Scheduler DolphinScheduler_$_honghuo-master-1.honghuo-master-headless.honghuo.svc.cluster.local1639368841996 shutdown complete.
[INFO] 2021-12-13 12:14:54.296 org.apache.dolphinscheduler.service.quartz.QuartzExecutors:[210] - Quartz service stopped, and halt all tasks
[INFO] 2021-12-13 12:14:54.297 org.apache.dolphinscheduler.server.master.MasterServer:[214] - Quartz service stopped
[INFO] 2021-12-13 12:14:54.302 org.quartz.core.QuartzScheduler:[585] - Scheduler quartzScheduler_$_NON_CLUSTERED paused.
[INFO] 2021-12-13 12:14:54.307 org.apache.dolphinscheduler.remote.NettyRemotingClient:[390] - netty client closed
[INFO] 2021-12-13 12:14:54.307 org.apache.dolphinscheduler.service.log.LogClientService:[74] - logger client closed
[INFO] 2021-12-13 12:14:54.307 org.springframework.scheduling.quartz.SchedulerFactoryBean:[845] - Shutting down Quartz Scheduler
[INFO] 2021-12-13 12:14:54.308 org.quartz.core.QuartzScheduler:[666] - Scheduler quartzScheduler_$_NON_CLUSTERED shutting down.
[INFO] 2021-12-13 12:14:54.308 org.quartz.core.QuartzScheduler:[585] - Scheduler quartzScheduler_$_NON_CLUSTERED paused.
[INFO] 2021-12-13 12:14:54.309 org.quartz.core.QuartzScheduler:[740] - Scheduler quartzScheduler_$_NON_CLUSTERED shutdown complete.
[INFO] 2021-12-13 12:14:54.314 org.apache.dolphinscheduler.server.master.processor.queue.TaskResponseService:[139] - StateEventResponseWorker stopped
[WARN] 2021-12-13 12:14:54.314 org.apache.dolphinscheduler.server.master.processor.queue.StateEventResponseService:[115] - persist task error
java.lang.InterruptedException: null
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2048)
	at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
	at org.apache.dolphinscheduler.server.master.processor.queue.StateEventResponseService$StateEventResponseWorker.run(StateEventResponseService.java:112)
[INFO] 2021-12-13 12:14:54.315 org.apache.dolphinscheduler.server.master.processor.queue.StateEventResponseService:[120] - StateEventResponseWorker stopped

### What you expected to happen

Once the master has been judged dead, it should stop and its process should exit.

### How to reproduce

Kill the master while it still has processes to schedule.

### Anything else

_No response_

### Version

dev

### Are you willing to submit PR?

- [ ] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
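A note on the symptom: a JVM exits on its own only after every non-daemon thread has finished, or when `System.exit` is called. The log shows each component reporting that it stopped while the process stayed up, so a lingering non-daemon thread is one plausible cause; this is an assumption, since the issue does not identify any such thread. A thread dump (for example `jstack <pid>`) of the stuck process would show what is left. The hypothetical helper below, which is not part of DolphinScheduler, lists the remaining non-daemon threads from inside the process using only `java.lang.Thread`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic helper, not DolphinScheduler code: reports the live
// non-daemon threads that would keep the JVM running after "stop" completes.
class NonDaemonThreads {

    static List<String> liveNonDaemonThreadNames() {
        List<String> names = new ArrayList<>();
        // Thread.getAllStackTraces() returns a snapshot of all live threads.
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            // Daemon threads do not block JVM exit; skip them and the caller itself.
            if (t.isAlive() && !t.isDaemon() && t != Thread.currentThread()) {
                names.add(t.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // In a service, this could be logged after all components report "stopped".
        System.out.println("Non-daemon threads still alive: " + liveNonDaemonThreadNames());
    }
}
```

If the list is non-empty after shutdown, those threads either need to be stopped or marked as daemon threads, or the shutdown path has to end with an explicit `System.exit`.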

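The log also shows that after `acquireLock` failed with `Lost connection while trying to acquire lock: /lock/masters`, `ServerNodeManager.updateMasterNodes` still went on to call `releaseLock`, which then threw a `NullPointerException` in `ZookeeperRegistry.releaseLock`. That suggests the release path assumes the lock was obtained; whether it is related to the process not exiting is not established by the issue. The sketch below is a hypothetical, in-process stand-in (a `ReentrantLock` table, not Curator's `InterProcessMutex`) showing the guard pattern: release in `finally` only when the lock was actually acquired, and make `release` tolerate a missing lock. The `connected` flag is an illustrative assumption that simulates the lost ZooKeeper connection.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch, not DolphinScheduler code: a lock table whose release()
// is safe to call even when acquire() failed.
class SafeLockTable {
    private final Map<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    // Returns true only if the lock was actually obtained. "connected" is a
    // hypothetical stand-in for the registry connection; false simulates
    // "Lost connection while trying to acquire lock".
    boolean acquire(String path, boolean connected) {
        if (!connected) {
            return false;
        }
        ReentrantLock lock = locks.computeIfAbsent(path, p -> new ReentrantLock());
        lock.lock();
        return true;
    }

    // Unlocks only if this thread holds the lock; otherwise returns false instead
    // of throwing (NullPointerException for a missing entry,
    // IllegalMonitorStateException for a lock this thread does not hold).
    boolean release(String path) {
        ReentrantLock lock = locks.get(path);
        if (lock == null || !lock.isHeldByCurrentThread()) {
            return false;
        }
        lock.unlock();
        return true;
    }

    // Hypothetical caller: works under the lock and releases in finally,
    // but only when the lock was acquired.
    static String refreshUnderLock(SafeLockTable table, boolean connected) {
        boolean acquired = false;
        try {
            acquired = table.acquire("/lock/masters", connected);
            if (!acquired) {
                return "skipped";
            }
            // ... refresh the master node list here ...
            return "updated";
        } finally {
            if (acquired) {
                table.release("/lock/masters");
            }
        }
    }

    public static void main(String[] args) {
        SafeLockTable table = new SafeLockTable();
        System.out.println(refreshUnderLock(table, false)); // acquisition fails, no release attempted
        System.out.println(refreshUnderLock(table, true));  // acquired, then released in finally
    }
}
```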