zwZjut opened a new issue #7361:
URL: https://github.com/apache/dolphinscheduler/issues/7361


   ### Search before asking
   
   - [X] I had searched in the 
[issues](https://github.com/apache/dolphinscheduler/issues?q=is%3Aissue) and 
found no similar issues.
   
   
   ### What happened
   
   the process of MasterServer is still active, but  it should be stopped.
   now it need  manual restart.
   
   logs:
   [INFO] 2021-12-13 12:14:51.254 
org.apache.dolphinscheduler.server.master.MasterServer:[196] - master server is 
stopping ..., cause : i was judged to death, release resources and stop myself
   /opt/apache-dolphinscheduler-2.0.1-bin/logs # grep -A 10000 'master server 
is stopping' dolphinscheduler-master.log
   [INFO] 2021-12-13 12:14:51.254 
org.apache.dolphinscheduler.server.master.MasterServer:[196] - master server is 
stopping ..., cause : i was judged to death, release resources and stop myself
   [WARN] 2021-12-13 12:14:51.570 
org.apache.dolphinscheduler.server.master.dispatch.host.LowerWeightHostManager:[161]
 - worker honghuo-worker-1.honghuo-worker-headless:1234 current cpu load 
average 11.18 is too high or available memory 8.08G is too low
   [WARN] 2021-12-13 12:14:51.571 
org.apache.dolphinscheduler.server.master.dispatch.host.LowerWeightHostManager:[161]
 - worker honghuo-worker-0.honghuo-worker-headless:1234 current cpu load 
average 9.54 is too high or available memory 6.27G is too low
   [WARN] 2021-12-13 12:14:52.571 
org.apache.dolphinscheduler.server.master.dispatch.host.LowerWeightHostManager:[161]
 - worker honghuo-worker-1.honghuo-worker-headless:1234 current cpu load 
average 10.77 is too high or available memory 8.17G is too low
   [WARN] 2021-12-13 12:14:52.572 
org.apache.dolphinscheduler.server.master.dispatch.host.LowerWeightHostManager:[161]
 - worker honghuo-worker-0.honghuo-worker-headless:1234 current cpu load 
average 9.54 is too high or available memory 6.27G is too low
   [WARN] 2021-12-13 12:14:53.572 
org.apache.dolphinscheduler.server.master.dispatch.host.LowerWeightHostManager:[161]
 - worker honghuo-worker-1.honghuo-worker-headless:1234 current cpu load 
average 10.77 is too high or available memory 8.17G is too low
   [WARN] 2021-12-13 12:14:53.573 
org.apache.dolphinscheduler.server.master.dispatch.host.LowerWeightHostManager:[161]
 - worker honghuo-worker-0.honghuo-worker-headless:1234 current cpu load 
average 9.54 is too high or available memory 6.27G is too low
   [INFO] 2021-12-13 12:14:54.260 
org.apache.dolphinscheduler.remote.NettyRemotingClient:[390] - netty client 
closed
   [INFO] 2021-12-13 12:14:54.263 
org.apache.dolphinscheduler.server.master.runner.MasterSchedulerService:[159] - 
master schedule service stopped...
   [INFO] 2021-12-13 12:14:54.265 
org.apache.dolphinscheduler.remote.NettyRemotingServer:[243] - netty server 
closed
   [INFO] 2021-12-13 12:14:54.270 
org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient:[437] - 
master node : honghuo-master-1.honghuo-master-headless:5678 unRegistry to 
register center.
   [INFO] 2021-12-13 12:14:54.270 
org.apache.dolphinscheduler.server.master.registry.ServerNodeManager:[286] - 
master node : /nodes/master/honghuo-master-1.honghuo-master-headless:5678 down.
   [INFO] 2021-12-13 12:14:54.271 
org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient:[439] - 
heartbeat executor shutdown
   [INFO] 2021-12-13 12:14:54.274 
org.apache.curator.framework.imps.CuratorFrameworkImpl:[955] - 
backgroundOperationsLoop exiting
   [ERROR] 2021-12-13 12:14:54.274 
org.apache.dolphinscheduler.server.master.registry.ServerNodeManager:[307] - 
update master nodes error
   org.apache.dolphinscheduler.registry.api.RegistryException: zookeeper 
release lock error
        at 
org.apache.dolphinscheduler.plugin.registry.zookeeper.ZookeeperRegistry.acquireLock(ZookeeperRegistry.java:214)
        at 
org.apache.dolphinscheduler.service.registry.RegistryClient.getLock(RegistryClient.java:237)
        at 
org.apache.dolphinscheduler.server.master.registry.ServerNodeManager.updateMasterNodes(ServerNodeManager.java:302)
        at 
org.apache.dolphinscheduler.server.master.registry.ServerNodeManager.access$700(ServerNodeManager.java:67)
        at 
org.apache.dolphinscheduler.server.master.registry.ServerNodeManager$MasterDataListener.notify(ServerNodeManager.java:287)
        at 
org.apache.dolphinscheduler.plugin.registry.zookeeper.ZookeeperRegistry.lambda$subscribe$1(ZookeeperRegistry.java:127)
        at 
org.apache.curator.framework.recipes.cache.TreeCache$2.apply(TreeCache.java:760)
        at 
org.apache.curator.framework.recipes.cache.TreeCache$2.apply(TreeCache.java:754)
        at 
org.apache.curator.framework.listen.ListenerContainer$1.run(ListenerContainer.java:100)
        at 
org.apache.curator.shaded.com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30)
        at 
org.apache.curator.framework.listen.ListenerContainer.forEach(ListenerContainer.java:92)
        at 
org.apache.curator.framework.recipes.cache.TreeCache.callListeners(TreeCache.java:753)
        at 
org.apache.curator.framework.recipes.cache.TreeCache.access$1900(TreeCache.java:75)
        at 
org.apache.curator.framework.recipes.cache.TreeCache$4.run(TreeCache.java:865)
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
   Caused by: java.io.IOException: Lost connection while trying to acquire 
lock: /lock/masters
        at 
org.apache.curator.framework.recipes.locks.InterProcessMutex.acquire(InterProcessMutex.java:91)
        at 
org.apache.dolphinscheduler.plugin.registry.zookeeper.ZookeeperRegistry.acquireLock(ZookeeperRegistry.java:203)
        ... 18 common frames omitted
   [ERROR] 2021-12-13 12:14:54.275 
org.apache.dolphinscheduler.server.master.registry.ServerNodeManager:[291] - 
MasterNodeListener capture data change and get data failed.
   java.lang.NullPointerException: null
        at 
org.apache.dolphinscheduler.plugin.registry.zookeeper.ZookeeperRegistry.releaseLock(ZookeeperRegistry.java:221)
        at 
org.apache.dolphinscheduler.service.registry.RegistryClient.releaseLock(RegistryClient.java:241)
        at 
org.apache.dolphinscheduler.server.master.registry.ServerNodeManager.updateMasterNodes(ServerNodeManager.java:309)
        at 
org.apache.dolphinscheduler.server.master.registry.ServerNodeManager.access$700(ServerNodeManager.java:67)
        at 
org.apache.dolphinscheduler.server.master.registry.ServerNodeManager$MasterDataListener.notify(ServerNodeManager.java:287)
        at 
org.apache.dolphinscheduler.plugin.registry.zookeeper.ZookeeperRegistry.lambda$subscribe$1(ZookeeperRegistry.java:127)
        at 
org.apache.curator.framework.recipes.cache.TreeCache$2.apply(TreeCache.java:760)
        at 
org.apache.curator.framework.recipes.cache.TreeCache$2.apply(TreeCache.java:754)
        at 
org.apache.curator.framework.listen.ListenerContainer$1.run(ListenerContainer.java:100)
        at 
org.apache.curator.shaded.com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30)
        at 
org.apache.curator.framework.listen.ListenerContainer.forEach(ListenerContainer.java:92)
        at 
org.apache.curator.framework.recipes.cache.TreeCache.callListeners(TreeCache.java:753)
        at 
org.apache.curator.framework.recipes.cache.TreeCache.access$1900(TreeCache.java:75)
        at 
org.apache.curator.framework.recipes.cache.TreeCache$4.run(TreeCache.java:865)
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
   [INFO] 2021-12-13 12:14:54.279 org.apache.zookeeper.ZooKeeper:[693] - 
Session: 0x20001523448001d closed
   [INFO] 2021-12-13 12:14:54.280 org.apache.zookeeper.ClientCnxn:[522] - 
EventThread shut down for session: 0x20001523448001d
   [INFO] 2021-12-13 12:14:54.280 org.quartz.core.QuartzScheduler:[666] - 
Scheduler 
DolphinScheduler_$_honghuo-master-1.honghuo-master-headless.honghuo.svc.cluster.local1639368841996
 shutting down.
   [INFO] 2021-12-13 12:14:54.281 org.quartz.core.QuartzScheduler:[585] - 
Scheduler 
DolphinScheduler_$_honghuo-master-1.honghuo-master-headless.honghuo.svc.cluster.local1639368841996
 paused.
   [INFO] 2021-12-13 12:14:54.283 com.zaxxer.hikari.HikariDataSource:[350] - 
DolphinScheduler - Shutdown initiated...
   [INFO] 2021-12-13 12:14:54.294 com.zaxxer.hikari.HikariDataSource:[352] - 
DolphinScheduler - Shutdown completed.
   [INFO] 2021-12-13 12:14:54.296 org.quartz.core.QuartzScheduler:[740] - 
Scheduler 
DolphinScheduler_$_honghuo-master-1.honghuo-master-headless.honghuo.svc.cluster.local1639368841996
 shutdown complete.
   [INFO] 2021-12-13 12:14:54.296 
org.apache.dolphinscheduler.service.quartz.QuartzExecutors:[210] - Quartz 
service stopped, and halt all tasks
   [INFO] 2021-12-13 12:14:54.297 
org.apache.dolphinscheduler.server.master.MasterServer:[214] - Quartz service 
stopped
   [INFO] 2021-12-13 12:14:54.302 org.quartz.core.QuartzScheduler:[585] - 
Scheduler quartzScheduler_$_NON_CLUSTERED paused.
   [INFO] 2021-12-13 12:14:54.307 
org.apache.dolphinscheduler.remote.NettyRemotingClient:[390] - netty client 
closed
   [INFO] 2021-12-13 12:14:54.307 
org.apache.dolphinscheduler.service.log.LogClientService:[74] - logger client 
closed
   [INFO] 2021-12-13 12:14:54.307 
org.springframework.scheduling.quartz.SchedulerFactoryBean:[845] - Shutting 
down Quartz Scheduler
   [INFO] 2021-12-13 12:14:54.308 org.quartz.core.QuartzScheduler:[666] - 
Scheduler quartzScheduler_$_NON_CLUSTERED shutting down.
   [INFO] 2021-12-13 12:14:54.308 org.quartz.core.QuartzScheduler:[585] - 
Scheduler quartzScheduler_$_NON_CLUSTERED paused.
   [INFO] 2021-12-13 12:14:54.309 org.quartz.core.QuartzScheduler:[740] - 
Scheduler quartzScheduler_$_NON_CLUSTERED shutdown complete.
   [INFO] 2021-12-13 12:14:54.314 
org.apache.dolphinscheduler.server.master.processor.queue.TaskResponseService:[139]
 - StateEventResponseWorker stopped
   [WARN] 2021-12-13 12:14:54.314 
org.apache.dolphinscheduler.server.master.processor.queue.StateEventResponseService:[115]
 - persist task error
   java.lang.InterruptedException: null
        at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
        at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2048)
        at 
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
        at 
org.apache.dolphinscheduler.server.master.processor.queue.StateEventResponseService$StateEventResponseWorker.run(StateEventResponseService.java:112)
   [INFO] 2021-12-13 12:14:54.315 
org.apache.dolphinscheduler.server.master.processor.queue.StateEventResponseService:[120]
 - StateEventResponseWorker stopped
   
   ### What you expected to happen
   
   but  it should by stopped and process killed when  it judged to death
   
   ### How to reproduce
   
   kill master when it has processes to schedule
   
   ### Anything else
   
   _No response_
   
   ### Version
   
   dev
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to