Jan Setje-Eilers writes:
> James Carlson wrote:
> > The current man page seems to be pretty clear that you're not supposed
> > to call it unless you really know what you're doing (which would
> > preclude calling it when the system is unstable).
>
> I did quote that to the customer and then explained it to them
> further. They did understand, and weren't opposed to changing their
> tools, but had no method to roll out updated tools company wide.
> Interestingly they do have rather tight controls on system
> configuration, making configurable behavior a viable solution. The
> ability to configure this behavior does not as far as I can tell violate
> the definition of the uadmin interface, and may benefit more than just
> this one customer.

As it doesn't cover the other boot-time issues, and it currently is
just for one customer, I still think it's a hack. And I'm very much
concerned that providing a tunable here will drive customers in
exactly the wrong direction.

If you wanted a private interface for this (say, an undocumented /etc
file or /etc/default entry or /etc/system variable that is written up
in an infodoc article and explained to this one customer), then I'd be
more supportive of the change. I'd still think that you're putting
_way_ too many moving parts into the uadmin(2) system call interface
(how exactly does a syscall invoke a user-space archive rebuild
anyway? or did you mean uadmin(1M)?), but making that one customer
happy sounds like a good trade-off.

However, you're proposing it as a public interface, and as something
we're committing to for the long term. Given that it doesn't actually
solve the underlying problem, and that it mistakenly tells customers
that Solaris isn't safe to use, and that the default is to be
"unsafe," I can't agree with that.

> > Is it at all plausible that someone might fix this problem by some
> > means that do not include just "bootadm update-archive"? If so, then
> > what exactly is that scenario? Or is it ever possible that someone
> > might want to continue running despite the obvious problem? Again, if
> > so, why?
>
> If the administrator knows why the archive is out of date and is for
> instance willing to move forward using the older driver (if it has
> already been loaded) they can simply clear the service and drive on.

It may have already been loaded, but the fact that it's out of sync
with the one on disk means:

  - If it happens to unload, then the next load will cause a
    *different* copy of the driver to be loaded, with possibly
    unexpected results. It's all timing dependent and hard to
    predict.

  - The fact that it's out of date with respect to the disk is a
    likely indication that this isn't the only problem.
    There may well be applications that depend on that driver
    (drivers usually aren't too interesting without at least some
    applications that use them), and the fact that the driver has
    been updated on disk likely indicates that the
    non-archive-resident applications have *also* been updated by
    the same patching process.

For the normal administrator -- one who hasn't yet memorized the
source code for the drivers -- I suspect that the behavior is just
unpredictable. I can't see how anyone would accept that as a
reasonable risk for running the system, when the alternative is to
spend a couple of minutes rebuilding the archive and rebooting to get
a stable and predictable system.

Perhaps more importantly: if someone actually did this, and then later
ran into a problem, what would our support people say when they got
the call?

> If
> whatever files that are out of sync were not used yet (say a driver
> that's not part of the boot path), then it's safe to drive on and the
> fact that the test failed is really a bug.

Can we fix that bug? At the point when the real root is mounted, is
it possible to remember the files that have been used, so that when we
later check the archive, we know whether the out-of-date files have
been used by accident?

In any event, this is just a hard-to-predict corner case. As with the
other one, rebuilding is safer and easier.

It's possible that driver developers and others hacking around in the
kernel may know when skipping an archive rebuild is ok, but I'm not
seeing a good argument for providing that sort of functionality to
regular system administrators.

> > If there are no realistic cases where the user can do anything but
> > update the archive based on the current disk contents, then this looks
> > to me like the same sort of "please hang up and dial 1" annoyance
> > features that we ought to be avoiding.
> >
> > Especially so given the annoying regularity of the problem ...
>
> We have some plans to address these issues from multiple directions,
> but that is a separate case.

I don't agree that it's a separate case as long as we're talking about
a committed interface and a higher-level statement of direction
regarding uadmin.

-- 
James Carlson, Solaris Networking              <james.d.carlson at sun.com>
Sun Microsystems / 35 Network Drive        71.232W   Vox +1 781 442 2084
MS UBUR02-212 / Burlington MA 01803-2757   42.496N   Fax +1 781 442 1677