[ https://issues.apache.org/jira/browse/STORM-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jungtaek Lim updated STORM-2901:
--------------------------------
    Fix Version/s: 1.2.0

> Reuse ZK connection for getKeySequenceNumber
> --------------------------------------------
>
>                 Key: STORM-2901
>                 URL: https://issues.apache.org/jira/browse/STORM-2901
>             Project: Apache Storm
>          Issue Type: Improvement
>          Components: storm-server
>    Affects Versions: 2.0.0, 1.2.0
>            Reporter: Yuzhao Chen
>            Assignee: Yuzhao Chen
>            Priority: Major
>              Labels: patch, pull-request-available
>             Fix For: 2.0.0, 1.2.0
>
>          Time Spent: 5h
>  Remaining Estimate: 0h
>
> When our nimbus restarts, it opens a large number of ZooKeeper connections
> within a few minutes, which puts real pressure on our ZooKeeper servers.
> I checked the log and the code and found that when nimbus restarts, in order
> to sync the local storm keys (in effect, the valid storms), it will:
> # Check the diff between the local storm keys and those in remote ZK.
> # Set up a path for each valid storm key, with a keySequenceNumber.
> # To get the keySequenceNumber, the blobstore currently creates a new
> ZK client and connects to the ZK server.
> This is why thousands of connections are made. Our cluster has 800+
> topologies running, which means at least 800 connections are made where a
> connection could simply be reused.
>
> This is part of the nimbus restart log:
> 2018-01-18 12:51:57.031 o.a.s.s.o.a.c.f.i.CuratorFrameworkImpl [INFO] Starting
> 2018-01-18 12:51:57.032 o.a.s.s.o.a.z.ZooKeeper [INFO] Initiating client connection, connectString=dx-data-rt-zk01:2181,dx-data-rt-zk02:2181,dx-data-rt-zk04:2181/mtstorm_101_dx_storm01 sessionTimeout=30000 watcher=org.apache.storm.shade.org.apache.curator.ConnectionState@76513a57
> 2018-01-18 12:51:57.032 o.a.s.s.o.a.z.ClientCnxn [INFO] Opening socket connection to server dx-data-rt-zk04.dx.sankuai.com/10.32.157.254:2181. Will not attempt to authenticate using SASL (unknown error)
> 2018-01-18 12:51:57.033 o.a.s.s.o.a.z.ClientCnxn [INFO] Socket connection established to dx-data-rt-zk04.dx.sankuai.com/10.32.157.254:2181, initiating session
> 2018-01-18 12:51:57.034 o.a.s.s.o.a.z.ClientCnxn [INFO] Session establishment complete on server dx-data-rt-zk04.dx.sankuai.com/10.32.157.254:2181, sessionid = 0x45cd92f0cc7e938, negotiated timeout = 30000
> 2018-01-18 12:51:57.034 o.a.s.s.o.a.c.f.s.ConnectionStateManager [INFO] State change: CONNECTED
> 2018-01-18 12:51:57.037 o.a.s.s.o.a.c.f.i.CuratorFrameworkImpl [INFO] backgroundOperationsLoop exiting
> 2018-01-18 12:51:57.039 o.a.s.s.o.a.z.ZooKeeper [INFO] Session: 0x45cd92f0cc7e938 closed
> 2018-01-18 12:51:57.039 o.a.s.s.o.a.z.ClientCnxn [INFO] EventThread shut down
> 2018-01-18 12:51:57.040 o.a.s.cluster [INFO] setup-path/blobstore/app_waimairank_wm_recsys_user_block-4-1504509174-stormconf.ser/dx-data-rt-nimbus05.dx.sankuai.com:9827-1
> 2018-01-18 12:51:57.051 o.a.s.s.o.a.c.f.i.CuratorFrameworkImpl [INFO] Starting
> 2018-01-18 12:51:57.051 o.a.s.s.o.a.z.ZooKeeper [INFO] Initiating client connection, connectString=dx-data-rt-zk01:2181,dx-data-rt-zk02:2181,dx-data-rt-zk04:2181/mtstorm_101_dx_storm01 sessionTimeout=30000 watcher=org.apache.storm.shade.org.apache.curator.ConnectionState@69c222d6
> 2018-01-18 12:51:57.052 o.a.s.s.o.a.z.ClientCnxn [INFO] Opening socket connection to server dx-data-rt-zk02.dx.sankuai.com/10.32.108.46:2181. Will not attempt to authenticate using SASL (unknown error)
> 2018-01-18 12:51:57.053 o.a.s.s.o.a.z.ClientCnxn [INFO] Socket connection established to dx-data-rt-zk02.dx.sankuai.com/10.32.108.46:2181, initiating session
> 2018-01-18 12:51:57.055 o.a.s.s.o.a.z.ClientCnxn [INFO] Session establishment complete on server dx-data-rt-zk02.dx.sankuai.com/10.32.108.46:2181, sessionid = 0x25cd386f245eb72, negotiated timeout = 30000
> 2018-01-18 12:51:57.055 o.a.s.s.o.a.c.f.s.ConnectionStateManager [INFO] State change: CONNECTED
> 2018-01-18 12:51:57.058 o.a.s.s.o.a.c.f.i.CuratorFrameworkImpl [INFO] backgroundOperationsLoop exiting
> 2018-01-18 12:51:57.061 o.a.s.s.o.a.z.ZooKeeper [INFO] Session: 0x25cd386f245eb72 closed
> 2018-01-18 12:51:57.061 o.a.s.s.o.a.z.ClientCnxn [INFO] EventThread shut down
> 2018-01-18 12:51:57.061 o.a.s.cluster [INFO] setup-path/blobstore/app_waimairank_waimai_rank_rt_pipeline_user_feature-12-1507516853-stormconf.ser/dx-data-rt-nimbus05.dx.sankuai.com:9827-1
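
The difference between the two connection patterns described in the issue can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual patch: FakeZkClient, syncWithClientPerKey and syncWithSharedClient are hypothetical names (not Storm or Curator APIs), the sequence-number lookup is a placeholder, and the only thing measured is how many clients get opened.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

final class ZkReuseSketch {

    /** Hypothetical stand-in for a ZK client; bumps a counter each time one is opened. */
    static final class FakeZkClient implements AutoCloseable {
        FakeZkClient(AtomicInteger opened) {
            opened.incrementAndGet();
        }

        /** Placeholder for the real lookup; always returns 1 in this sketch. */
        int keySequenceNumber(String key) {
            return 1;
        }

        @Override
        public void close() {
            // A real client would end its ZooKeeper session here.
        }
    }

    /** Pattern from the issue: a fresh client per key. Returns the number of clients opened. */
    static int syncWithClientPerKey(List<String> keys) {
        AtomicInteger opened = new AtomicInteger();
        for (String key : keys) {
            try (FakeZkClient zk = new FakeZkClient(opened)) {
                zk.keySequenceNumber(key);
            }
        }
        return opened.get();
    }

    /** Proposed pattern: one client reused for all keys. Returns the number of clients opened. */
    static int syncWithSharedClient(List<String> keys) {
        AtomicInteger opened = new AtomicInteger();
        try (FakeZkClient zk = new FakeZkClient(opened)) {
            for (String key : keys) {
                zk.keySequenceNumber(key);
            }
        }
        return opened.get();
    }

    public static void main(String[] args) {
        // Hypothetical blob keys, one per topology.
        List<String> keys = List.of("topo-a-stormconf.ser", "topo-b-stormconf.ser", "topo-c-stormconf.ser");
        System.out.println("client per key: " + syncWithClientPerKey(keys) + " connections");
        System.out.println("shared client:  " + syncWithSharedClient(keys) + " connections");
    }
}
```

With one key per topology, the first pattern opens as many clients as there are keys (800+ on the cluster described above), while the second opens a single client regardless of the key count.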