On 3/24/06, Dave Miner <Dave.Miner at sun.com> wrote:
> Mike Gerdts wrote:
> ...
> > I'm *so* glad to see that this is an area of focus.  My comments on
> > the document and installation related tasks follow.
> >
> > Page 6, Bullet 2, sub-item 2: SUNWCXall no longer does the trick...
> > when SUNWCXall is installed on a sun4u box (15k domain) the sun4v
> > platform support is not added.  This implies that in addition to my
> > 15k domain used primarily for image development (that ain't cheap), I
> > now need to have a T1000 or T2000 sitting around for the same purpose.
> >  In a globally distributed jumpstart environment, I now need to
> > distribute three ~2 GB flash archives to get x86-64, sun4u, and sun4v
> > support.
> >
>
> Thanks for pointing this out, as I hadn't noticed it.

Is this a bug or accepted limitation for some reason?  Has pointing it
out caused it to be noted in an updated version of the document, a bug
filed, or both?  I can file the bug through OpenSolaris if this is not
a conscious design decision.

> > Page 8 - Live Upgrade is also hampered by the following in my environment:
> >
> > 1) It uses a version of cpio which does not support sparse files.
> > This causes files like /var/adm/lastlog to balloon in size when large
> > UID's (100,000,000 - 999,999,999) are used.  Similar issues likely
> > exist if a quotas file happens to be in a partition used for live
> > upgrade.

Bug 4480319.  I'm not sure if this is the one that I filed or not, but
it's been out there for a while.  I've discussed this a bit on
zones-discuss as well because "zoneadm clone" now has the same problem
as live upgrade and flash archives.
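
For anyone who wants to see the effect, here is a rough sketch (the
file names are made up) showing a sparse file ballooning when copied
with a cpio that does not preserve holes:

# mkfile -n 1024m /var/tmp/sparsetest      (sparse; few blocks allocated)
# du -sk /var/tmp/sparsetest
# mkdir /var/tmp/copydest
# cd /var/tmp && echo sparsetest | cpio -pdm /var/tmp/copydest
# du -sk /var/tmp/copydest/sparsetest      (balloons to the full size)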

> > 2) It has spotty support for upgrading to metadevices.

I am pretty sure that there is a bug filed on this one, but I am
having trouble finding it.  Essentially, it boils down to the following
blowing up:

lucreate -s - -n newbe -m /:d30:ufs,preserve
luupgrade -f -n newbe -s $osmedia -J 'archive_location nfs://somewhere'

To work around this, I have done:

# cp $osmedia/Solaris_10/Tools/Boot/usr/sbin/install.d/pfinstall \
    /var/tmp/pfinstall.orig
# mount -F lofs -O /dir/pfinstall-wrapper \
    $osmedia/Solaris_10/Tools/Boot/usr/sbin/install.d/pfinstall

The wrapper causes the following change in the profile before calling
/var/tmp/pfinstall.orig:

< filesys d30 existing /
---
> filesys mirror:d30 c0t0d0s3 c0t1d0s3 existing /
> metadb c0t0d0s7
> metadb c0t1d0s7

Note that this has worked for me on only one machine, and I just got
it working in the past 24 hours.  By no means am I convinced that it is
a robust workaround yet.
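
In case it helps anyone else, the wrapper is roughly like the
following.  This is a from-memory sketch rather than the exact script;
it assumes the profile is passed as the last argument to pfinstall, and
the md device names obviously match my configuration:

#!/bin/sh
# pfinstall-wrapper: rewrite the LU-generated profile so that root is
# described as an SVM mirror, then call the real pfinstall.

# Assume the profile is the last argument (adjust if that is wrong).
for arg in "$@"; do
    profile="$arg"
done

sed -e 's|^filesys d30 existing /|filesys mirror:d30 c0t0d0s3 c0t1d0s3 existing /|' \
    "$profile" > "$profile.new"
cat >> "$profile.new" <<EOF
metadb c0t0d0s7
metadb c0t1d0s7
EOF
cp "$profile.new" "$profile"

exec /var/tmp/pfinstall.orig "$@"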

Beyond that, I found the following problems going from S9 to S10:

1) netgroup entries were missing from /etc/shadow but present in /etc/passwd.
2) Solaris 10 should have more default passwd entries than Solaris 9
(gdm, webservd, etc.).  These were lost.
3) swap and other metadevices were commented out of vfstab.
4) mount points for lofs file systems were missing.
5) Complaints about svc:/system/cvc:default being in maintenance mode
when it was not appropriate for the platform (it should not have been
enabled there).
6) SVM-related services were not enabled.
7) Applying JASS to the new boot environment looked kinda scary when it
started out by complaining about shared library problems while calling
zonename.

> > 3) It sometimes requires applying patches and new packages to a running 
> > system
>
> I hope you've filed bug reports on the first two.  Using ZFS will make
> both less of an issue, since we don't need to copy or use metadevices.

#2 is an ongoing issue that a co-worker has a case open on right now.

> #3 is kind of a hard problem; I've got some ideas mentioned in the paper
> about perhaps using VM's to provide the environment in which the upgrade
> would run, which would limit the need for patches specifically for the
> upgrade.  I need to kick those around with the experts a bit to see if
> they're actually feasible.

I really like this idea.  I suspect that it will be hard to achieve
(lots of dom0 support) if Xen cannot be nested arbitrarily deep.

> I expect we'll fix the fragmentation between x86 and SPARC by going to
> GRUB on SPARC as well.  Long term, I think the model is better for most
> people, but the transition could have been handled better, I agree.

This oughta be interesting...  Is this part of making ZFS bootable?
(That is, is it easier to write the bootstrap code for GRUB than it is
for OpenBoot?)

> > It is really hard to find a reference to anyone other than Sun using
> > Sun's DHCP server for jumpstart.  Perhaps Sun should consider
> > migrating to ISC dhcpd where there is much more mindshare (and support
> > for vendor options that exceed 255 bytes).
> >
>
> There are some postings on comp.unix.solaris on how to use the ISC
> server if you wish, and I also have a script that one customer was kind
> enough to provide which does a similar setup for the Microsoft DHCP server.

The point here is that I need the DHCP server to be supported.  I wish
that I could support my Solaris jumpstart environment using a
Sun-supported ISC DHCP server.  My next best option is ISC on Red Hat.

> > When debugging jumpstart installations, it is common to spend more
> > time waiting for the system to reset than it does to figure out that
> > the jumpstart is just going to bomb out again.  I saved a lot of time
> > when I learned about using two or more of the following commands:
> >
> > rm /tmp/.jumpstart
> > ifconfig bge0 dhcp renew
> > suninstall
> >
>
> So, extrapolating a requirement, you want a supported restart of an
> installation without a reboot, right?  Sounds good to me.

Yes, that would be a good summary.  It is especially important when
trying to debug 15k-specific problems because of the extremely long
POST.

> > Section 2.2.6 - Installing from flash archives whacks all sysidcfg
> > information.  In a disaster recovery scenario, you likely don't want
> > that to happen.  Integration with flash archives and a custom
> > netbackup agent would be nice...
> >
>
> Agreed about the recovery requirement; can you elaborate on what you're
> looking for in the integration with a backup system?

Many backup programs have the ability to do one or more of the following:

1) Call a custom module that will generate a data stream to be backed up.
2) Call a custom script as a "pre-backup" script.

It may (or may not) be useful to tie those mechanisms in with the
flash tools to allow the backup system to manage the retention of old
flash archives (full or differential).  Then, in a restore situation,
the "special flash" tools on a live CD or network boot would be able to
load the appropriate data from tape, using the backup system as an
intermediary.

Simply writing flash archives to disk, then taking those to tape as
required, may be more practical.
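
For example, a pre-backup hook could be as simple as something along
these lines (the archive name, paths, and options are only
illustrative), with the backup software then picking up /backup/flash
on its normal schedule:

#!/bin/sh
# pre-backup hook: create a fresh flash archive before the scheduled
# backup runs, so the backup system ends up managing archive retention.
DIR=/backup/flash
NAME=`hostname`-`date +%Y%m%d`
mkdir -p "$DIR"
# -n: archive name, -c: compress, -x: exclude the archive area itself
flarcreate -n "$NAME" -c -x "$DIR" "$DIR/$NAME.flar"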

> > Optimization of network performance is sometimes a matter of
> > optimizing the size of installation media.  If something like flash
> > archives continues to exist, they should use a better compression tool
> > than compress(1).
>
> Sure, providing options here seems reasonable.

A key here may be to devise a file format that chunks a data stream
into lots of reasonably large pieces that are individually compressed,
using whatever compression algorithm gives the right mix of speed and
size.  When the data stream is extracted, the various chunks could be
decompressed in parallel on multiple hardware threads.
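
A crude approximation of this with existing tools might look like the
following (the paths are made up, and gzip is just a stand-in for
whatever algorithm gives the right trade-off):

# split an existing archive into independent chunks, then compress
# each chunk on its own hardware thread; extraction could decompress
# the chunks in parallel the same way
split -b 256m /export/flash/mybe.flar /var/tmp/mybe.chunk.
for f in /var/tmp/mybe.chunk.*; do gzip "$f" & done
wait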

With ZFS promoting compressed file systems, perhaps compression is as
interesting a use for a customized core as an FPU or encryption
accelerator.

> > Other ramblings...
> >
> > - Options used for debugging the installation process should be part
> > of the public interface.  Custom installation modules should be able
> > to hook into the debugging framework through a public interface.
> >
>
> Can you tell me more about what you're after?

Typical scenarios include:

1) Jumpstart tells me it can't find a matching rule in rules.ok.  This
may mean that it could not find the SjumpsCF directory, that I
misspelled a hostname, or a host of other things.  To debug this today,
I have to exit the installation (ok), see what is mounted, and run my
custom "catdhcp" script to show me what DHCP really sent me.  Decoding
vendor options using dhcpinfo is non-trivial (see the snoop sketch
after these two scenarios).

2) When trying to debug jumpstart installations, it often comes down
to trying to understand what is really happening, because there is not
a lot of documentation about how jumpstart really works.  If jumpstart
were not running, I would typically use tools like truss, snoop, etc.
to figure out what is going on.  Obviously, Sun has seen this need too,
because in various Solaris releases you can see options in the code
that parses the boot command line to take debugging arguments.
However, those are not documented and appear to be private.
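
For the first case, in the absence of something like catdhcp, a rough
way to see exactly what the server sent is to snoop the DHCP exchange
from the boot server while the client requests its lease (the interface
name is just an example); the verbose decode is far more readable than
decoding the options by hand:

# snoop -d bge0 -v udp and port 68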

Tools that would be useful:

1) Create an option that enables sshd during installation.
2) Create boot options that specify a dtrace script to run during
installation.
3) Include dtrace in the miniroot (I haven't checked SPARC; it is
missing from newboot).
4) Consider having the ability to syslog installation progress.
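
For the syslog idea, even something as simple as the installer calling
logger at each interesting step, with the miniroot's syslog.conf
pointing at a loghost, would go a long way.  The facility and loghost
below are just placeholders:

# in the install scripts, at each phase boundary:
logger -p local0.info "jumpstart: starting pfinstall"

# in the miniroot's /etc/syslog.conf (fields are tab-separated):
local0.info     @loghost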

One thing that I have found very nice with various Linux distros and
Nexenta is that I have virtual consoles (or approximations thereof)
that let me observe the installation process beyond just watching a
progress bar.  This is very helpful when getting to know a new
installer or debugging changes.

Mike

--
Mike Gerdts
http://mgerdts.blogspot.com/
