Joao Roscoe wrote:
> > Seems reasonable.  I still use the broadcast protocol instead.  But
> > what you are doing is supposed to work okay and I can only assume that
> > it does.
>
> Tried the broadcast protocol. Unfortunately, no deal :-(
Don't know.  It works for me.  I like it because any of the servers can
go down or come up and the client will bind to whichever is available.
That combination gives a nice bit of failover redundancy.  (Shrug.)

> I have around 20 boxes here. All of them were built as images from a
> reference machine, which received a clean squeeze install.
> For each machine, the image was dumped (with partimage), the hostname
> was changed, and the file /etc/udev/rules.d/70-persistent-net.rules
> was removed.

Seems reasonable.  I do a little more than that, but mostly things
specific to what I have installed, such as configuring Postfix for the
new hostname.  Both /etc/hostname and /etc/mailname get updated.  I
assign static addresses, so /etc/network/interfaces is updated as well.
I use a single ssh server key across the collective because the machines
are intended to be identical, so I make sure the /etc/ssh/ssh_host_*_key*
files are updated appropriately.  I think that is sufficient.

> So, all of them should behave the same way. However, some
> of them boot ok most of the times, others present NIS serve bind
> timeout everytime. Quite confusing...

If the hardware isn't completely identical, then it is reasonable to see
differences in the parallel boot timings.  With the new parallel boot
there are forks and joins of the process flow during boot.  IIRC it is
implemented using 'make -jX' to achieve parallel operation where
possible.  And since the behavior is new, there are bound to be bugs
that affect people using it outside the mainstream paths.  Using it with
NIS/YP is not so common, so I think it is not unlikely that there is a
bug related to it there.

In particular, I think I have seen cases, unverified, where even though
an init.d script completed, the service it started wasn't yet ready to
serve.  For example, I am pretty sure I have seen bind start up without
being ready to serve immediately.  I can't confirm this, but it seems
suspicious given your symptoms.
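For what it's worth, the post-clone fixups described above can be
sketched as a short script run against the cloned filesystem tree.
This is only a sketch of my own: ROOT, NEWHOST, and example.com are all
demo placeholders, not anything from the actual setup; on a real image
ROOT would be the mount point of the dumped partition.

```shell
# Sketch of post-clone fixups.  ROOT and NEWHOST default to safe demo
# values so this can be run as-is; example.com is a placeholder domain.
ROOT=${ROOT:-$(mktemp -d)}
NEWHOST=${NEWHOST:-client01}
mkdir -p "$ROOT/etc/udev/rules.d" "$ROOT/etc/ssh"

echo "$NEWHOST" > "$ROOT/etc/hostname"              # new hostname
echo "$NEWHOST.example.com" > "$ROOT/etc/mailname"  # for Postfix

# Remove the cached NIC naming so udev regenerates it on first boot.
rm -f "$ROOT/etc/udev/rules.d/70-persistent-net.rules"

# If all clones share one ssh server identity, copy the reference keys
# in here; otherwise delete them so ssh regenerates per-host keys.
#   cp /path/to/reference/ssh_host_*_key* "$ROOT/etc/ssh/"
```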
Or nis starting up may be similar.

> > In either case, I use the following configuration line for hosts in
> > /etc/nsswitch.conf.
>
> Tried that also. No improvement. In fact, I started getting some DNS
> trouble with a few older hosts. Looks like our DNS infrastructure is
> completely messed up

That seems like a completely separate issue.  You should probably
separate the two problems and address each one individually.  I would be
happy to help with the DNS configuration too.  Describe how it is set up
and the list can provide feedback on how to improve it.

DNS is a marvelously designed distributed database system.  It isn't
perfect; there are a few problems, and they didn't think of everything
when it was designed.  But it is a huge improvement over the previous
system.  It is, however, only as good as the configured network around
it.

> Now, what really puzzles me: as I told before, "Restarting nis and
> autofs, in this order *does* solve the issue", and that's quite fast!
> Why doesn't it work at boot time?

Try this experiment.  At the last point in the /etc/init.d/nis startup
script, add a short sleep.  That will give the daemons time to finish
initializing and get ready to go.  It is possible that they are not
quite ready yet, so immediately after the end of the script the next
one to run hits them too early.

I suggest changing this in /etc/init.d/nis:

    case "$1" in
      start)
            do_start
            ;;
      stop)

To this, as an experiment:

    case "$1" in
      start)
            do_start
            sleep 5   # <-- Add this sleep to give things more time.
            ;;
      stop)

I would do the same thing for /etc/init.d/bind9 too.  Then see if that
resolves the problem.  I am not proposing this as a full solution, nor
even saying that this must be the problem.  But I would definitely try
it as an experiment to gain data and characterize the problem.  And if
it works, it might be a good enough workaround for you until the problem
really is resolved.  (Or the fix might be the 'allow-hotplug' change
described below.)
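If the fixed sleep helps, a slightly more robust experiment is to poll
for readiness instead of sleeping blindly.  This is my own sketch, not
part of the nis package: wait_for_nis is a name I made up, and it relies
on ypwhich, which exits non-zero until ypbind has bound a server.  The
check command is a parameter only so the sketch can be exercised without
a live NIS domain.

```shell
# Alternative to a fixed "sleep 5": poll until NIS is actually bound.
wait_for_nis() {
    tries=${1:-10}          # seconds to wait at most
    check=${2:-ypwhich}     # command that succeeds once NIS is ready
    while [ "$tries" -gt 0 ]; do
        if $check >/dev/null 2>&1; then
            return 0        # bound; safe to let the next script run
        fi
        sleep 1
        tries=$((tries - 1))
    done
    return 1                # still unbound after the timeout; give up
}
```

In the experiment above you would call wait_for_nis right after do_start
in place of the sleep, so the boot only pauses as long as it has to.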
> > I...sounds like
> > some incorrectly specified dependency in the /etc/init.d/* scripts.
>
> I agree with you, but I took a look at the scripts, and they look fine
> - autofs seems to depend on nis (I'm afraid I don't know this new init
> scheme very well, however).

Traditionally, Sun systems would store automount maps in nis, making
them available through nis/yp to client machines, for example through
'ypcat -k auto.master' and the other map files.  The autofs startup
script obtains the configuration this way, dynamically, at start time.
This is optional, not required; you may have configured it either with
real files on disk or with networked nis/yp maps.  If the maps are in
nis/yp, then the autofs script will try to use them from nis.

> Anyway, this kind of issue would probably
> break things for a lot of people...

I have something else to try that I have learned in the year since your
first note. :-)  In /etc/network/interfaces it probably says:

    allow-hotplug eth0

Change that to:

    auto eth0

The allow-hotplug keyword enables the event-driven startup; auto enables
the traditional startup.  I have had similar issues with the
event-driven startup, where things block for a long time at boot waiting
for various events to happen, specifically when using nfs mounts in
/etc/fstab.  Using auto instead forces the previously hard-set flow and
avoids the problem.

Again, as an experiment, I would switch to 'auto' for the network
startup.  That by itself might be your solution.  (Or it might be the
startup sleep delay described above.)

> > But because it is so annoying before too long someone
> > will have debugged it and gotten the offenders removed from the
> > mailing list.
>
> Got a probe email a few days ago - someone worked on it. Hope the
> issue is already solved.

Unfortunately the problem persists.  I conversed briefly with the
listmasters; they are aware of it, but no one has been able to deduce
the offender.
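Returning to the /etc/network/interfaces change above: for reference, a
minimal static-address stanza using 'auto' might look like the sketch
below.  All addresses are placeholders of mine, not values from your
network.

```
# /etc/network/interfaces -- sketch only; addresses are placeholders
auto lo
iface lo inet loopback

# 'auto' instead of 'allow-hotplug' forces the traditional boot-time
# network startup rather than the event-driven hotplug path.
auto eth0
iface eth0 inet static
    address 192.168.1.20
    netmask 255.255.255.0
    gateway 192.168.1.1
```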
The joe1assistly spam has also affected some of the Cygwin mailing
lists.  I have examined the spam coming my direction and I can't deduce
a clear solution to it.  Of course I could block it for myself by
blocking any Message-Id: containing joegiglio.org, but that wouldn't
help the mailing list at large.

Bob