We don't shuffle IPs after the initial resolution of IP addresses. With DNS RR, the client resolves the name to a list of IPs, shuffles them, and then round-robins through them trying to connect. If you re-resolve on every round robin, you have to add logic to track which IPs have changed and somehow maintain that shuffle order, or you aren't doing a fair back-end round robin, which people using the ZK client against DNS RR are relying on today.
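A minimal sketch of the resolve-once / shuffle-once / round-robin behavior described above. The class and method names are illustrative (this is not the actual ZooKeeper client code), and DNS resolution is stubbed out as a plain list of IP strings so the fairness property is easy to see:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Illustrative sketch: the list of resolved IPs is shuffled exactly once
// at construction, then every (re)connect attempt takes the next entry in
// that fixed order. Each server is offered once per full cycle, which is
// what makes the back-end round robin "fair".
public class ShuffledRoundRobin {
    private final List<String> shuffled; // shuffled once, then never reordered
    private int next = 0;

    public ShuffledRoundRobin(List<String> resolvedIps, Random rnd) {
        shuffled = new ArrayList<>(resolvedIps);
        Collections.shuffle(shuffled, rnd); // one-time shuffle at "resolution"
    }

    // Next server to try; wraps around so no server is tried twice
    // before every server has been tried once.
    public String next() {
        String ip = shuffled.get(next);
        next = (next + 1) % shuffled.size();
        return ip;
    }
}
```

Re-resolving on every reconnect would rebuild (and implicitly reshuffle) this list each time, which is why a naive re-resolve breaks the fairness guarantee.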
If you just have machine names in a list that you pass in, then yes, we could re-resolve on every reconnect and you could just re-alias that name to a new IP. But you'll have to put in logic that will do that but not break people using DNS RR. I realize that moving machines is difficult when you have lots of clients. I'm a bit surprised your admins can't maintain machine IP addresses on a machine move given a cluster of that complexity, though. I also think that if we're going to be putting special cases like this in, we might just want to go all the way to a pluggable reconnection scheme, but maybe that is too aggressive.

C

On Mon, Jan 9, 2012 at 1:51 PM, Neha Narkhede <neha.narkh...@gmail.com> wrote:
> Maybe I didn't express myself clearly. When I said DNS RR, I meant its
> simplest implementation which resolves a hostname to multiple IPs.
>
> Whatever method you use to map host names to IPs, the problem is that
> the zookeeper client code will always cache the IPs. So to be able to
> swap out a machine, all clients would have to be restarted, which if
> you have 100s of clients, is a major pain. If you want to move the
> entire cluster to new machines, this becomes even harder.
>
> I don't see why re-resolving host names to IPs in the reconnect logic
> is a problem for zookeeper, since you shuffle the list of IPs anyways.
>
> Thanks,
> Neha
>
>
> On Mon, Jan 9, 2012 at 10:31 AM, Camille Fournier <cami...@apache.org> wrote:
> > You can't sensibly round robin within the client code if you re-resolve
> > on every reconnect, if you're using dns rr. If that's your goal you'd
> > want a list of dns alias names and re-resolve each hostname when you hit
> > it on reconnect. But that will break people using dns rr.
> > You can look into writing a pluggable reconnect logic into the zk client,
> > that's what would be required to do this but at the end of the day you'll
> > have to give your users special clients to make that work.
> >
> > C
> > On Jan 9, 2012 1:16 PM, "Neha Narkhede" <neha.narkh...@gmail.com> wrote:
> >
> >> I was reading through the client code and saw that zookeeper client
> >> caches the server IPs during startup and maintains it for the rest of
> >> its lifetime. If we go with the DNS RR approach or a load balancer
> >> approach, and later swap out a server with a new one (with a new IP),
> >> all clients would have to be restarted to be able to "forget" the
> >> old IP and see the new one. That doesn't look like a clean approach to
> >> such upgrades. One way of getting around this problem, is adding the
> >> resolution of host names to IPs in the "reconnect" logic in addition
> >> to the constructor. So when such upgrades happen and the client
> >> reconnects, it will see the new list of IPs, and wouldn't require to
> >> be restarted.
> >>
> >> Does this approach sound good or am I missing something here?
> >>
> >> Thanks,
> >> Neha
> >>
> >> On Wed, Dec 21, 2011 at 7:21 PM, Camille Fournier <cami...@apache.org> wrote:
> >> > DNS RR is good. I had good experiences using that for my client
> >> > configs for exactly the reasons you are listing.
> >> >
> >> > On Wed, Dec 21, 2011 at 8:43 PM, Neha Narkhede <neha.narkh...@gmail.com> wrote:
> >> >> Thanks for the responses!
> >> >>
> >> >>>> How are your clients configured to find the zks now?
> >> >>
> >> >> Our clients currently use the list of hostnames and ports that
> >> >> comprise the zookeeper cluster. For example,
> >> >> zoo1:port1,zoo2:port2,zoo3:port3
> >> >>
> >> >>> - switch DNS,
> >> >>> - wait for caches to die,
> >> >>
> >> >> This is something we thought about however, if I understand it
> >> >> correctly, doesn't JVM cache DNS entries forever until it is restarted?
> >> >> We haven't specifically turned DNS caching off on our clients. So
> >> >> this solution would require us to restart the clients to see the new
> >> >> list of zookeeper hosts.
> >> >>
> >> >> Another thought is to use DNS RR and have the client zk url have one
> >> >> name that resolves to and returns a list of IPs to the zookeeper
> >> >> client. This has the advantage of being able to perform hardware
> >> >> migration without changing the client connection url, in the future.
> >> >> Do people have thoughts about using a DNS RR?
> >> >>
> >> >> Thanks,
> >> >> Neha
> >> >>
> >> >> On Tue, Dec 20, 2011 at 1:06 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
> >> >>> In particular, aren't you using DNS names? If you are, then you can
> >> >>>
> >> >>> - expand the quorum with the new hardware on new IP addresses,
> >> >>> - switch DNS,
> >> >>> - wait for caches to die,
> >> >>> - restart applications without reconfig or otherwise force new connections,
> >> >>> - decrease quorum size again
> >> >>>
> >> >>> On Tue, Dec 20, 2011 at 12:26 PM, Camille Fournier <cami...@apache.org> wrote:
> >> >>>
> >> >>>> How are your clients configured to find the zks now? How many clients
> >> >>>> do you have?
> >> >>>>
> >> >>>> From my phone
> >> >>>> On Dec 20, 2011 3:14 PM, "Neha Narkhede" <neha.narkh...@gmail.com> wrote:
> >> >>>>
> >> >>>> > Hi,
> >> >>>> >
> >> >>>> > As part of upgrading to Zookeeper 3.3.4, we also have to migrate our
> >> >>>> > zookeeper cluster to new hardware. I'm trying to figure out the best
> >> >>>> > strategy to achieve that with no downtime.
> >> >>>> > Here are some possible solutions I see at the moment, I could have
> >> >>>> > missed a few though -
> >> >>>> >
> >> >>>> > 1. Swap each machine out with a new machine, but with the same host/IP.
> >> >>>> >
> >> >>>> > Pros: No client side config needs to be changed.
> >> >>>> > Cons: Relatively tedious task for Operations
> >> >>>> >
> >> >>>> > 2. Add new machines, with different host/IPs to the existing cluster,
> >> >>>> > and remove the older machines, taking care to maintain the quorum at
> >> >>>> > all times
> >> >>>> >
> >> >>>> > Pros: Easier for Operations
> >> >>>> > Cons: Client side configs need to be changed and clients need to be
> >> >>>> > restarted/bounced. Another problem is having a large quorum for
> >> >>>> > some time (potentially 9 nodes).
> >> >>>> >
> >> >>>> > 3. Hide the new cluster behind either a hardware load balancer or a
> >> >>>> > DNS server resolving to all host IPs.
> >> >>>> >
> >> >>>> > Pros: Makes it easier to move hardware around in the future
> >> >>>> > Cons: Possible timeout issues with load balancers messing with
> >> >>>> > zookeeper functionality or performance
> >> >>>> >
> >> >>>> > Read this and found it helpful -
> >> >>>> > http://apache.markmail.org/message/44tbj53q2jufplru?q=load+balancer+list:org%2Eapache%2Ehadoop%2Ezookeeper-user&page=1
> >> >>>> > But would like to hear from the authors and the users who might have
> >> >>>> > tried this in a real production setup.
> >> >>>> >
> >> >>>> > I'm very interested in finding a long term solution for masking the
> >> >>>> > zookeeper host names. Any inputs here are appreciated!
> >> >>>> >
> >> >>>> > In addition to this, it will also be great to know what people think
> >> >>>> > about options 1 and 2, as a solution for hardware changes in
> >> >>>> > Zookeeper.
> >> >>>> >
> >> >>>> > Thanks,
> >> >>>> > Neha
> >> >>>> >
> >> >>>>
> >> >
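On the JVM DNS caching question raised in the quoted thread: that caching lives in the JVM's networking layer, not in the ZooKeeper client, and it is controlled by the `networkaddress.cache.ttl` security property. With a security manager installed, successful lookups are cached forever by default (otherwise the default TTL is implementation-specific), so an application that wants DNS changes to be visible must set a TTL before the first lookup happens. A small illustrative sketch; the 60-second value is just an example choice:

```java
import java.security.Security;

// Illustrative sketch: cap the JVM's positive DNS cache so that a
// re-aliased zookeeper hostname is eventually re-resolved. This must
// run before the first InetAddress lookup, because the networking
// layer reads the policy when it is first used.
public class DnsCacheTtl {
    public static void main(String[] args) {
        // Cache successful lookups for at most 60 seconds (example value).
        Security.setProperty("networkaddress.cache.ttl", "60");
        // ...construct ZooKeeper clients after this point...
        System.out.println(Security.getProperty("networkaddress.cache.ttl"));
    }
}
```

Note this only helps once the client actually performs a new lookup, which is exactly why the thread's other suggestion, re-resolving hostnames in the reconnect path rather than only in the constructor, matters.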