I was not advocating an avoidance of the issue. I was suggesting that it isn't a stop-ship issue.

On Thu, Jan 23, 2020 at 11:05 AM Michael K. Edwards <[email protected]> wrote:

> While I agree that this is not a very production-like configuration, I
> think it's good to recognize that there are plenty of clusters out there
> where more than five zookeeper nodes are called for. I run systems
> routinely with seven voting members plus three or more observers, for
> reasons having to do with system behavior during network split scenarios in
> AWS EC2. Mac OS specific issues aside, it would be unfortunate if there
> were an artificial cap on the number of nodes in a machine-local test
> cluster, especially if it were related to an ICMP storm scenario.
>
> On Thu, Jan 23, 2020, 8:11 AM Ted Dunning <[email protected]> wrote:
>
> > I think that this is far outside the normal operation bounds and has an
> > easy work-around.
> >
> > First, it is very uncommon to run more than 5 ZK nodes. Running 23 on a
> > single host is bizarre (viewed from an operational lens).
> >
> > Second, there is a simple configuration change that makes the strange
> > configuration work anyway.
> >
> > A third point, unrelated to operational considerations, is that there is
> > risk in making last-minute changes to code. That risk is borne by normal
> > configurations as well as these unusual ones.
> >
> > In sum,
> >
> > - this might look like a P1 (system down) issue, but there is a
> >   workaround, so it is certainly no more than P2
> >
> > - even P2 is unwarranted because this is a non-production configuration
> >
> > - a P3 issue isn't a stop-ship issue.
> >
> > On Fri, Jan 17, 2020 at 5:17 AM Szalay-Bekő Máté <[email protected]>
> > wrote:
> >
> > > TLDR:
> > > During testing of the RC for 3.6.0, we found that a ZooKeeper cluster
> > > with a large number of ensemble members (e.g. 23) cannot start
> > > properly. This issue seems to happen only on mac, and a workaround is
> > > to disable the ICMP throttling. The question is whether this
> > > workaround is enough for the RC, or if we should change the code in
> > > ZooKeeper to limit the number of ICMP requests.
> > >
> > > The problem:
> > >
> > > On linux, I haven't been able to reproduce the problem. I tried with
> > > 5, 9, 15 and 23 ensemble members and the quorum always seems to start
> > > properly in a few seconds. (I used OpenJDK 1.8.232 on Ubuntu 18.04)
> > >
> > > On mac, the problem happens consistently for large ensembles. The
> > > server is very slow to start and we see a lot of warnings in the log
> > > like these:
> > >
> > > 2020-01-15 20:02:13,431 [myid:13] - WARN
> > > [ListenerHandler-phunt-MBP13.local/192.168.1.91:4193:QuorumCnxManager@691]
> > > - None of the addresses (/192.168.1.91:4190) are reachable for sid 10
> > > java.net.NoRouteToHostException: No valid address among
> > > [/192.168.1.91:4190]
> > >
> > > 2020-01-17 11:02:26,177 [myid:4] - WARN
> > > [Thread-2531:QuorumCnxManager$SendWorker@1269] - destination address
> > > /127.0.0.1 not reachable anymore, shutting down the SendWorker for sid 6
> > >
> > > The exception happens when the new MultiAddress feature tries to
> > > filter the unreachable hosts from the address list while deciding
> > > which election address to connect to. This involves calling the
> > > InetAddress.isReachable method with a default timeout of 500ms, which
> > > goes down to a native call in Java and basically tries to do a ping
> > > (an ICMP echo request) to the host. Naturally, the localhost should
> > > always be reachable. This call times out on mac if we have many
> > > ensemble members. I tested with 9 members and the cluster started
> > > properly. With 11, 13 and 15 members it took more and more time to get
> > > the cluster to start, and the "NoRouteToHostException" started to
> > > appear in the logs. After around 1 minute the 15-member cluster
> > > started, but obviously this is way too long.
> > >
> > > On mac, we have the ICMP rate limit set to 250 by default. You can
> > > turn this off using the following command:
> > > sudo sysctl -w net.inet.icmp.icmplim=0
> > > (see https://krypted.com/mac-os-x/disable-icmp-rate-limiting-os-x/)
> > >
> > > Using the above command before starting the 23-member cluster locally
> > > seems to solve the problem for me. (can someone verify?) The question
> > > is whether this workaround is enough or not.
> > >
> > > As far as I can tell, the current code will generate 2*A*(M-1) ICMP
> > > calls in each ZooKeeper server during startup, if 'M' is the number of
> > > ensemble members and 'A' is the number of election addresses provided
> > > for each member. This is not that high if each ZooKeeper server is
> > > started on a different machine, but if we start a lot of ZooKeeper
> > > servers on a single machine, then it can quickly go beyond the
> > > predefined limit of 250 for mac.
> > >
> > > OPTION 1: we keep the code as it is. We might change the documentation
> > > for zkconf to mention this mac-specific issue and how to disable the
> > > ICMP rate limit.
> > >
> > > OPTION 2: we change the code not to filter the list of election
> > > addresses if the list has only a single element. This seems to be a
> > > logical way to decrease the ICMP requests. However, if we ran a large
> > > number of ZooKeeper servers on a single machine using multiple
> > > election addresses for each server, we would get the same problem
> > > (most probably even quicker).
> > >
> > > OPTION 3: make the address filtering configurable and change zkconf to
> > > disable it by default. (But disabling it will make the quorum
> > > potentially unable to recover during network failures, so it is not
> > > recommended in production.)
> > >
> > > OPTION 4: refactor the MultiAddress feature and remove the ICMP calls
> > > from the ZooKeeper code. However, it clearly helps with quick recovery
> > > during network failures... at the moment I can't think of any good
> > > solution to avoid it.
> > >
> > > Kind regards,
> > > Mate
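
For reference, here is a rough Java sketch of the 2*A*(M-1) estimate from the original report, to show how quickly the count grows when every server runs on one machine. The class and method names are made up for illustration and are not ZooKeeper APIs. The single-host total assumes all M servers start at about the same time, which is my assumption and not something measured in the thread. The Option 2 variant only models "no filtering when there is a single election address".

```java
// Rough estimate of ICMP echo requests at startup, based on the
// 2*A*(M-1) figure quoted in the thread.
// All names here are illustrative; none of this is ZooKeeper code.
final class IcmpCallEstimate {

    // Default macOS ICMP limit quoted in the thread (net.inet.icmp.icmplim).
    static final int MAC_DEFAULT_ICMP_LIMIT = 250;

    // ICMP calls issued by one server during startup: 2 * A * (M - 1).
    static int perServer(int electionAddresses, int members) {
        return 2 * electionAddresses * (members - 1);
    }

    // Assumption: all M servers run on one host and start together,
    // so their probes count against the same limit.
    static int singleHostTotal(int electionAddresses, int members) {
        return members * perServer(electionAddresses, members);
    }

    // OPTION 2 from the thread: skip the reachability filter when a member
    // has a single election address, so no probes are sent in that case.
    static int perServerWithOption2(int electionAddresses, int members) {
        return electionAddresses == 1 ? 0 : perServer(electionAddresses, members);
    }

    public static void main(String[] args) {
        int a = 1; // one election address per member
        for (int m : new int[] {5, 9, 11, 13, 15, 23}) {
            int total = singleHostTotal(a, m);
            System.out.printf("M=%d A=%d perServer=%d total=%d overLimit=%b%n",
                    m, a, perServer(a, m), total, total > MAC_DEFAULT_ICMP_LIMIT);
        }
    }
}
```

With A=1 this gives 44 calls per server and 1012 in total for 23 members on one host, and the estimated total first exceeds 250 between 11 and 13 members. The limit is a rate and startup is not instantaneous, so treat these totals as a rough indicator only.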
