I have a problem with BIND 9.7.x on Ubuntu. I have two servers that are running 9.7.3. They slave 332 zones, and they also master 213,750 malware/spyware zones that we have defined to reroute these domains to a local machine.
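Some background on the serial-number symptoms described below. A slave decides whether to pull a new copy of a zone by comparing SOA serials. That comparison uses RFC 1982 serial-number arithmetic, and the slave transfers only when the master's serial is "greater" than its own. The following is a minimal Python sketch of that comparison, assuming 32-bit serials as RFC 1982 specifies for DNS. `serial_gt` and `slave_would_transfer` are hypothetical helper names, not BIND functions. The sketch shows why a master whose serial drops from 1239 to 1238 looks older to a slave that still holds 1239, so the slave does not transfer.

```python
"""Sketch of RFC 1982 serial-number comparison, as applied to SOA serials."""

SERIAL_BITS = 32                 # DNS SOA serials are 32-bit unsigned (RFC 1982)
MODULUS = 2 ** SERIAL_BITS
HALF = 2 ** (SERIAL_BITS - 1)


def serial_gt(s1: int, s2: int) -> bool:
    """Return True if s1 is 'greater than' s2 in RFC 1982 serial arithmetic.

    RFC 1982 leaves the result undefined when the values differ by exactly
    HALF; this sketch simply returns False in that case.
    """
    if not (0 <= s1 < MODULUS and 0 <= s2 < MODULUS):
        raise ValueError("serials must be 32-bit unsigned integers")
    return (s1 < s2 and s2 - s1 > HALF) or (s1 > s2 and s1 - s2 < HALF)


def slave_would_transfer(master_serial: int, slave_serial: int) -> bool:
    """Hypothetical helper: a slave pulls a new copy only if the master is newer."""
    return serial_gt(master_serial, slave_serial)


if __name__ == "__main__":
    # Serials taken from the _tcp zone report below.
    print(slave_would_transfer(1239, 1238))     # master incremented: True
    print(slave_would_transfer(1238, 1239))     # master went backwards: False
    # Wrap-around: 0 counts as newer than 4294967295 under RFC 1982.
    print(slave_would_transfer(0, MODULUS - 1))  # True
```

This sketch compares serials only. A real slave also applies the SOA refresh, retry and expire timers, and the sketch does not model them.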
When I upgraded BIND to 9.7.3-P1 yesterday, an "./rndc stop" command ran for over 8 minutes, and named did not stop. A plain "kill" did not work either; I had to resort to "kill -9". What was BIND doing? Gracefully closing all of the zones?

BIND 9.7.3-P1 came up fine, but two things concern me:

1) After BIND began responding to queries, it used 100% of the CPU for about three minutes. I am not sure what BIND was doing. This is not a major issue, because BIND was handling customer queries the whole time, and after the three minutes the CPU usage dropped to a normal 1%.

2) Two zones reported serial number decreases. This is bad. I did some research on the two zones. Both are Microsoft Active Directory zones (one _tcp and one _udp) that are mastered on a Windows Domain Controller and slaved on my BIND boxes. I slave around 44 AD zones, and only these two reported problems, on my two internal Ubuntu slaves and on my two Solaris 10 slaves. The two Solaris 10 slaves do not run the spyware zones, so I had no problem with "./rndc stop" on them. I am therefore not sure that the serial number problems are due to the "kill -9".

I looked at the serial number issue on these two zones in detail; I capture the serial numbers on all the AD zones each morning at 6:10. Here is the information for the _tcp zone:

Date         Zone   Master  Slaves                  Notes
20 Oct 2010  _tcp.  1233    1233  1233
21 Oct 2010  _tcp.  1239    1239  1239              The master incremented the serial.
...
09 Nov 2010  _tcp.  1239    1239  1239
10 Nov 2010  _tcp.  1238    1239  1239              Master decreased due to MS patch
11 Nov 2010  _tcp.  1238    1238  1238
...
03 Dec 2010  _tcp.  1238    1238  1238
04 Dec 2010  _tcp.  1238    1238  1239              ??
05 Dec 2010  _tcp.  1238    1239  1238              ??
06 Dec 2010  _tcp.  1238    1238  1238
...
09 Dec 2010  _tcp.  1238    1238  1238
10 Dec 2010  _tcp.  1238    1238  1239              ??
11 Dec 2010  _tcp.  1238    1239  1238              ??
12 Dec 2010  _tcp.  1238    1238  1238
...
05 Jan 2011  _tcp.  1238    1238  1238
06 Jan 2011  _tcp.  1238    1239  1239              ??
07 Jan 2011  _tcp.  1238    1238  1238
...
02 Mar 2011  _tcp.  1238    1238  1238              Upgrade 9.7.2-P3 to 9.7.3
03 Mar 2011  _tcp.  1238    1239  1239
04 Mar 2011  _tcp.  1238    1238  1238
...
16 Apr 2011  _tcp.  1238    1238  1238
17 Apr 2011  _tcp.  1238    1238  1238  1238  1238  Two Sol10 slaves added.
...
02 Jun 2011  _tcp.  1238    1238  1238  1238  1238  Upgrade 9.7.3 to 9.7.3-P1
03 Jun 2011  _tcp.  1238    1239  1239  1239  1239

Both Ubuntu slaves have been up for 149 days (rebooted around Jan 15). The zone serial was 1239 until an MS patch run on the Domain Controller decreased the serial by one on the evening of Nov 9. I did nothing to correct the problem; I waited for the two zones to expire, and then fresh copies of the zones were transferred from the Windows master server. After that, the serial number was 1238 on the master and the slaves. On a few days (marked ?? above), the serial on one or both slaves increased by one, and I am not sure what happened on those days.

On Mar 02 I upgraded BIND from 9.7.2-P3 to 9.7.3, and the serial numbers on the two upgraded BIND slaves reverted to the higher 1239 serial. Again, I did no fixup, and on Mar 04 the serials were all back at the lower value. I think that the serial number decrease was temporary during the patch run.

On Apr 17 I added the two Solaris 10 slaves to my morning report, and all five serials were constant at 1238 until I upgraded BIND on Tuesday (on the Solaris 10 boxes) and yesterday (on the Ubuntu boxes). Immediately after the upgrade, BIND reported the serial number problem on these two zones. The other AD zones have had no serial number problems.

I have no idea why BIND would remember the higher 1239 serial number when the serial for the zone has been constant at 1238 since Mar 04. I have to assume that between Mar 04 and Jun 03 BIND would have written the zone to disk, either in the base zone file or in a .jnl file.

-- 
----------------------------------------------------------------------
Barry S. Finkel
Computing and Information Systems Division
Argonne National Laboratory          Phone:    +1 (630) 252-7277
9700 South Cass Avenue               Facsimile:+1 (630) 252-4601
Building 240, Room 5.B.8             Internet: bsfin...@anl.gov
Argonne, IL 60439-4828               IBMMAIL:  I1004994
_______________________________________________
bind-users mailing list
bind-users@lists.isc.org
https://lists.isc.org/mailman/listinfo/bind-users