2009]

Jan Setje-Eilers Wed, 20 May 2009 14:57:29 -0700

James Carlson wrote:
> Jan Setje-Eilers writes:
>>   The previous users of uadmin(3c) that we had encountered were 
                            ^^ correction:  uadmin(2), the syscall

>> clustering applications that require the ability to tear down a node as 
>> quickly as possible. The current customer has an update/configuration 
>> deployment framework that calls uadmin(3c) after applying the changes to 
>> the system.
> 
> Haven't we always documented uadmin(2) as the wrong way to do that?

  I suspect you looked at the page, but for the record, the language is:

        "This function is tightly coupled to the system
        administrative procedures and is not intended for
        general use."

> The current man page seems to be pretty clear that you're not supposed
> to call it unless you really know what you're doing (which would
> preclude calling it when the system is unstable).

  I did quote that to the customer and then explained it to them 
further.  They understood, and weren't opposed to changing their 
tools, but had no way to roll out updated tools company-wide. 
Interestingly, they do have rather tight controls on system 
configuration, which makes configurable behavior a viable solution. The 
ability to configure this behavior does not, as far as I can tell, 
violate the definition of the uadmin interface, and may benefit more 
than just this one customer.

> Jan Setje-Eilers writes:
>>> +1.  (I will also need to reproduce a similar problem with smf which also
>>> needs to be fixed by the system)
>>>
>>> Casper
>>   The risk there is that the system just identified itself as 
>> potentially unstable. Prior to zfs root we were very careful to perform 
>> this check before any fs is mounted rw, in order to avoid the 
>> possibility of corrupting data if the system were to run into issues 
>> with incompatible modules.
>>
>>   If you all don't see a problem with this, I'm happy to implement the 
>> automated re-build during boot, it's not like it's much code.
> 
> At least to me, one of the basic questions is: what the heck can the
> administrator realistically do when the archive is out of date?
> 
> Is it at all plausible that someone might fix this problem by some
> means that do not include just "bootadm update-archive"?  If so, then
> what exactly is that scenario?  Or is it ever possible that someone
> might want to continue running despite the obvious problem?  Again, if
> so, why?

  If the administrator knows why the archive is out of date and is, for 
instance, willing to move forward using the older driver (if it has 
already been loaded), they can simply clear the service and drive on. If 
the files that are out of sync have not been used yet (say, a driver 
that's not part of the boot path), then it's safe to drive on, and the 
fact that the test failed is really a bug.

> If there are no realistic cases where the user can do anything but
> update the archive based on the current disk contents, then this looks
> to me like the same sort of "please hang up and dial 1" annoyance
> features that we ought to be avoiding.
> 
> Especially so given the annoying regularity of the problem ...

  We have some plans to address these issues from multiple directions, 
but that is a separate case.

-jan

Configurable Boot Archive Updates [2009/312 05/26/2009]
