Peter,

Peter Tribble wrote:
> On Mon, Jun 8, 2009 at 2:51 PM, William
> Schumann<William.Schumann at sun.com> wrote:
>   
>> Peter Tribble wrote:
>>     
>>> On Thu, Jun 4, 2009 at 8:01 PM, Susan Sohn<Susan.Sohn at sun.com> wrote:
>>>
>>>       
>>>> There will be a review meeting on Monday, June 8, to discuss the
>>>> functional spec for the Client redesign. The spec can be found here:
>>>>
>>>> http://www.opensolaris.org/os/project/caiman/auto_install/ai_client_func_spec_0604.pdf
>>>>         
>>> Some comments on the spec:
>>>
>>> 5.1.1 I would much prefer, and believe it would be easier, if the
>>> default would be
>>> to fail immediately if no disk was specified. The user could
>>> explicitly specify "default"
>>> as the disk selection, at which point the guessing algorithm would
>>> come into play.
>>>       
>> It is assumed that in the near future, the AI will be used mostly in trial
>> situations,
>>     
>
> That worries me. I would expect that we're building a system for production
> deployment. Certainly if I was trialling it now, I would expect its behaviour 
> to
> be exactly what I would get in a real deployment in a year's time.
>   
This only applies to the default, out-of-the-box manifest.  Once the 
default manifest has been modified, the modified manifest will produce 
identical behavior in the future.  In the redesign, particular 
consideration has been given to ensuring that the same manifest 
produces the same results over time and across repeated installs.
>> so an out-of-the-box configuration (i.e., without any user
>> modification) that results in an installed system has some attractive
>> points. Unfortunately, it is in conflict with the principle of protecting
>> user data, since it is hard to come up with a useful algorithm that has no
>> chance of deleting user data. For example, we want to use the boot disk, so
>> that when the system reboots, the target OpenSolaris is booted
>> automatically, but the boot disk probably isn't going to be an unformatted,
>> out-of-the-box disk, and historically, when Solaris is installed, the usual
>> default behavior is to use an existing Solaris partition.
>>
>> So this is still up for debate. Other input on this would be appreciated.
>>     
>>> 5.1.1 What if there are multiple bootable drives? Does "boot disk"
>>> mean the first
>>> bootable device? For x86, are you looking at what the BIOS thinks are
>>> bootable
>>> devices, or following into grub?
>>>       
>> At this point, it would not involve looking into grub menus.
>>     
>>> For sparc, are you looking at OBP for the list and,
>>>       
>> Yes, OBP variables are used.
>>     
>
> Does it resolve device aliases? For instance, some of my machines specify
> "disk3:d" as the boot device. What would it make of that?
>   
I think that the alias would be resolved, and the dkio disk media info 
ioctl would indicate to AI that it is the boot device.
>   
>>> if so, are you checking whether the listed devices are actually bootable?
>>>       
>> Not as yet. Do you see an issue here?
>>     
>
> Well, yes. I'm fairly sure there are machines on my network, and it's common
> to see this, that have a list of boot devices (some have been cleaned up, but
> not all) to go through. Sometimes the devices aren't actually bootable (they 
> may
> have been in the past but the configuration has been changed and they're now
> used for storing data). This doesn't do any harm because the non-bootable
> devices can't actually be booted from.
>   
I think it is worthwhile to try to determine what devices are bootable.  
Obviously, the only real test is to boot them, and even that may be 
dependent on circumstances.  But there are several things that might be 
checked:
- Solaris 10
-- presence of critical system files in a manner now done by Target 
Discovery
- OpenSolaris
-- presence of Boot Environments and an active BE

Determining whether non-Solaris disks are bootable is probably beyond 
what is feasible at this stage.

> I think you're pretty safe looking at the first device. But consider
> the following:
> a system has a pair of 18G drives that it boots from and a pair of 300G drives
> used for data. Unfortunately in a previous life they were all 18G drives and
> due to various upgrade activities all 4 drives are still listed in the
> boot list. Along
> comes AI, regards all 4 disks as candidates, throws away the real boot drives
> as being too small...
>
>   
We have looked at a variety of use cases and it has been difficult to 
devise a default algorithm that:
- is simple and easily understandable
- absolutely ensures protection of user data in all circumstances
- handles single or multiple disk cases
- results in a successful reboot into the new OpenSolaris without 
intervention

> I'm fairly sure that guessing is something that should be avoided. It 
> certainly
> shouldn't be default behaviour. (Unless the user explicitly requests it, of
> course.)
>   
Installation based on guessing is problematic.  The algorithms must 
become complex and sophisticated to handle a wide variety of 
configurations and use cases, and in the end may still choose a target 
that is far from user expectations.
> The two cases where I use the 'rootdisk' specifier now are for single-disk
> workstations where there's no possible ambiguity, and if I get a new box
> where I don't know what the disk layout will look like and I just jumpstart it
> cheaply the once to work out what the actual configuration will be and then
> jumpstart it properly.
>   
A dry-run mode for AI should provide this configuration information.
>>> 5.1.2.1 If the largest is chosen, what happens if all valid disks are
>>> the same size?
>>>       
>> This was considered a level of detail beyond what is required in the
>> functional spec, but it would be according to a deterministic algorithm that
>> would be the same if the AI was repeated with the same manifest.
>>     
>>> 5.1.2.1 For target_device_select_unformatted_disk, what does without data
>>> slices mean? If it's got a label, how do you define what that means?
>>>       
>> This can be more clearly described than it is currently in the spec. Without
>> data slices can mean that the VTOC is uninitialized or perhaps initialized,
>> but all slices are of zero length. Label should probably be removed from the
>> spec, since it is not necessary for a disk to have a label, but there is an
>> existing defect (6260) that is of concern, so labeling is mentioned here.
>>     
>
> Ah. Yes, I've been burnt by 6260 for jumpstart. I suspect that trying to
> interpret the partition table is unlikely to be of benefit.
>
>   
>>> 5.1.2.1 I read the qualifiers as adding disks to the list of valid
>>> targets. What about
>>> using a qualifier to remove a device from the list?
>>>       
>> This was mentioned in a discussion, but did not make it into the functional
>> spec. It seems a reasonable and useful idea and should be mentioned.
>>     
>
>   
Proposal:
Users should be able to explicitly exclude devices from selection.  The 
determinism of device identification over time and across multiple 
installs should be maintained.  devid and phys_path are good criteria 
for this, but they require specific knowledge of the target.  ctd (and 
perhaps mpxio) device names should probably not be accepted as 
exclusion criteria, since they can vary and lead the user to believe, 
incorrectly, that a device is being protected.

New elements to specify devices to exclude:
target_device_exclude_devid - user supplies devid
target_device_exclude_phys_path - user supplies path in /devices

These elements might also accept partial matches, "glob" wildcarding, 
or regular expressions: devids and physical paths are quite long, and 
pattern matching would also give the user a way to specify devices as a 
group.  A rough sketch of how these elements might look follows.

Some of the target selection criteria may also be useful as exclusion 
criteria: controller type, min/max size, vendor, and others could be 
supported, but user community interest in these should first be 
determined.
> Indeed. "Keep your fingers away from my SAN" seems a reasonable rule
> in many circumstances.
>   
It sounds useful to identify SAN disks for these circumstances.

How might a SAN disk be identified generally?  (Also, is this a 
consideration in the AI microroot?)
>   
>>> 5.3.2 If only file systems are managed, how are swap and dump managed?
>>>       
>> Currently, AI provides swap and dump definitions that should suffice. There
>> is currently no plan for managing them through AI. Are you proposing that
>> the user should be able to manage them through AI?
>>     
>
> I would expect to be able to specify dump and swap locations (possibly 
> multiple
> - although that's clearly going to need a lot more thought if they're
> zvols, because
> how can you say "8G swap on each of the 4 internal drives" if all you have is 
> a
> pool?) and sizes. 
We currently have a scheme for defining zfs file systems.  This could 
be expanded to allow zfs volumes, which could be used for swap or dump.  
The manifest's volume definitions might then be labeled so that the 
volumes are added to the swap pool.  Volume support could eventually be 
extended to iSCSI as well.  A sketch of what such a definition might 
look like appears below.
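
Purely as a sketch (the element and attribute names here are invented 
for illustration; the current schema defines file systems only), a 
labeled volume definition might look like:

  <ai_target_device>
    <!-- hypothetical: an 8g zfs volume tagged for use as swap -->
    <target_device_install_zvol name="swapvol" size="8g" use="swap"/>
    <!-- hypothetical: a 4g zfs volume tagged for use as the dump device -->
    <target_device_install_zvol name="dumpvol" size="4g" use="dump"/>
  </ai_target_device>

The installer might then create these volumes in the root pool and 
configure them with the equivalent of swap(1M) and dumpadm(1M).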
> How much does AI currently handle?
>   
Currently in AI, depending on memory size, the swap volume size is 
restricted according to this table:

  memory size    swap size
  < 1G           0.5G
  1G - 64G       1/2 of available memory
  > 64G          32G

Thanks,
William
