I was not advocating an avoidance of the issue. I was suggesting that it
isn't a stop-ship issue.



On Thu, Jan 23, 2020 at 11:05 AM Michael K. Edwards <m.k.edwa...@gmail.com>
wrote:

> While I agree that this is not a very production-like configuration, I
> think it's good to recognize that there are plenty of clusters out there
> where more than five zookeeper nodes are called for.  I run systems
> routinely with seven voting members plus three or more observers, for
> reasons having to do with system behavior during network split scenarios in
> AWS EC2.  Mac OS specific issues aside, it would be unfortunate if there
> were an artificial cap on the number of nodes in a machine-local test
> cluster, especially if it were related to an ICMP storm scenario.
>
> On Thu, Jan 23, 2020, 8:11 AM Ted Dunning <ted.dunn...@gmail.com> wrote:
>
> > I think that this is far outside the normal operation bounds and has an
> > easy work-around.
> >
> > First, it is very uncommon to run more than 5 ZK nodes. Running 23 on a
> > single host is bizarre (viewed from an operational lens).
> >
> > Second, there is a simple configuration change that makes the strange
> > configuration work anyway.
> >
> > A third point, unrelated to operational considerations, is that there is
> > risk in making last-minute changes to code. That risk is borne by normal
> > configurations as well as these unusual ones.
> >
> > In sum,
> >
> > - this might look like a P1 (system down) issue, but there is a
> > workaround, so it is certainly no more than P2
> >
> > - even P2 is unwarranted because this is a non-production configuration
> >
> > - a P3 issue isn't a stop-ship issue.
> >
> >
> >
> > On Fri, Jan 17, 2020 at 5:17 AM Szalay-Bekő Máté <
> > szalay.beko.m...@gmail.com>
> > wrote:
> >
> > > TLDR:
> > > During testing of the RC for 3.6.0, we found that a ZooKeeper cluster with
> > > a large number of ensemble members (e.g. 23) cannot start properly. This
> > > issue seems to happen only on Mac, and a workaround is to disable the ICMP
> > > throttling. The question is whether this workaround is enough for the RC,
> > > or whether we should change the code in ZooKeeper to limit the number of
> > > ICMP requests.
> > >
> > >
> > > The problem:
> > >
> > > On Linux, I haven't been able to reproduce the problem. I tried with 5, 9,
> > > 15 and 23 ensemble members and the quorum always seems to start properly
> > > in a few seconds. (I used OpenJDK 1.8.232 on Ubuntu 18.04)
> > >
> > > On Mac, the problem happens consistently for large ensembles. The
> > > server is very slow to start and we see a lot of warnings in the log
> > > like these:
> > >
> > > 2020-01-15 20:02:13,431 [myid:13] - WARN  [ListenerHandler-phunt-MBP13.local/192.168.1.91:4193:QuorumCnxManager@691] - None of the addresses (/192.168.1.91:4190) are reachable for sid 10
> > > java.net.NoRouteToHostException: No valid address among [/192.168.1.91:4190]
> > >
> > > 2020-01-17 11:02:26,177 [myid:4] - WARN  [Thread-2531:QuorumCnxManager$SendWorker@1269] - destination address /127.0.0.1 not reachable anymore, shutting down the SendWorker for sid 6
> > >
> > > The exception happens when the new MultiAddress feature tries to filter
> > > the unreachable hosts from the address list while deciding which election
> > > address to connect to. This involves calling the InetAddress.isReachable
> > > method with a default timeout of 500ms, which goes down to a native call
> > > in Java and basically tries to ping the host (an ICMP echo request).
> > > Naturally, localhost should always be reachable, but on Mac this call
> > > times out if we have many ensemble members. I tested with 9 members and
> > > the cluster started properly. With 11, 13 and 15 members it took more and
> > > more time to get the cluster to start, and the "NoRouteToHostException"
> > > started to appear in the logs. The 15-member cluster started after around
> > > 1 minute, but obviously this is way too long.
> > >
> > > On Mac, the ICMP rate limit is set to 250 by default. You can turn it
> > > off using the following command:
> > > sudo sysctl -w net.inet.icmp.icmplim=0
> > > (see https://krypted.com/mac-os-x/disable-icmp-rate-limiting-os-x/)
> > >
> > > Using the above command before starting the 23 ensemble members cluster
> > > locally seems to solve the problem for me. (can someone verify?) The
> > > question is if this workaround is enough or not.
> > >
> > > As far as I can tell, the current code will generate 2*A*(M-1) ICMP calls
> > > in each ZooKeeper server during startup, where 'M' is the number of
> > > ensemble members and 'A' is the number of election addresses provided for
> > > each member. This is not that high if each ZooKeeper server is started on
> > > a different machine, but if we start a lot of ZooKeeper servers on a
> > > single machine, then it can quickly go beyond the predefined limit of 250
> > > on Mac.
> > >
> > > OPTION 1: we keep the code as it is. We might change the documentation
> > > for zkconf to mention this Mac-specific issue and how to disable the
> > > ICMP rate limit.
> > >
> > > OPTION 2: we change the code not to filter the list of election addresses
> > > if the list has only a single element. This seems to be a logical way to
> > > decrease the ICMP requests. However, if we ran a large number of
> > > ZooKeeper servers on a single machine using multiple election addresses
> > > for each server, we would get the same problem (most probably even
> > > quicker).
> > >
> > > OPTION 3: make the address filtering configurable and change zkconf to
> > > disable it by default. (But disabling it will make the quorum potentially
> > > unable to recover during network failures, so it is not recommended in
> > > production.)
> > >
> > > OPTION 4: refactor the MultiAddress feature and remove the ICMP calls
> > > from the ZooKeeper code. However, they clearly help with quick recovery
> > > during network failures... at the moment I can't think of any good way
> > > to avoid them.
> > >
> > > Kind regards,
> > > Mate
> > >
> >
>
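
To put numbers on the 2*A*(M-1) estimate quoted above, here is a small sketch of the arithmetic. The class and method names are made up for illustration and are not part of ZooKeeper. The single-host total simply multiplies the per-server figure by M, which assumes every server on the host issues its full share of probes; how tightly those probes cluster in time is not something this calculation captures.

```java
// Illustration only: names are hypothetical, not ZooKeeper code.
final class IcmpProbeEstimate {

    // Per-server estimate from the thread: 2 * A * (M - 1), where
    // A = election addresses per member and M = ensemble members.
    static long probesPerServer(int electionAddresses, int members) {
        return 2L * electionAddresses * (members - 1);
    }

    // Assumption: all M servers run on the same host, so the host sees
    // M times the per-server count during startup.
    static long probesOnSingleHost(int electionAddresses, int members) {
        return members * probesPerServer(electionAddresses, members);
    }

    public static void main(String[] args) {
        int[] ensembleSizes = {5, 9, 15, 23};
        for (int m : ensembleSizes) {
            System.out.printf("M=%d, A=1: %d per server, %d on one host%n",
                    m, probesPerServer(1, m), probesOnSingleHost(1, m));
        }
    }
}
```

With one election address per member, 9 members give 144 probes on a single host and 23 members give 1012, so under these assumptions the larger local ensembles are well past the limit of 250 mentioned in the message.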

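
For readers who want to see what OPTION 2 from the original message could look like, here is a minimal sketch. It is not the ZooKeeper implementation: the class, the method names and the pluggable probe are assumptions made for the example. The only pieces taken from the thread are the use of InetAddress.isReachable and the 500ms default timeout.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch of OPTION 2; not taken from the ZooKeeper source.
final class ElectionAddressFilter {

    // Default timeout mentioned in the thread.
    static final int DEFAULT_TIMEOUT_MS = 500;

    // Reachability probe based on InetAddress.isReachable.
    static boolean isReachable(InetSocketAddress address) {
        try {
            return address.getAddress() != null
                    && address.getAddress().isReachable(DEFAULT_TIMEOUT_MS);
        } catch (IOException e) {
            return false; // treat probe errors as "not reachable"
        }
    }

    // Returns the addresses that pass the probe. With zero or one
    // candidate there is nothing to choose between, so the probe is
    // skipped entirely and no reachability check is issued.
    static List<InetSocketAddress> filter(List<InetSocketAddress> candidates,
                                          Predicate<InetSocketAddress> probe) {
        if (candidates.size() <= 1) {
            return new ArrayList<>(candidates);
        }
        List<InetSocketAddress> reachable = new ArrayList<>();
        for (InetSocketAddress candidate : candidates) {
            if (probe.test(candidate)) {
                reachable.add(candidate);
            }
        }
        return reachable;
    }
}
```

A caller would pass `ElectionAddressFilter::isReachable` as the probe. As the message points out, this only helps when each server has a single election address; with several addresses per server the probes are still sent.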