Hi Jan,

Great writeup of the issues! Thank you for taking the time to 
investigate this so thoroughly. I have some comments and questions inline.
>
> Agreed. This is the scenario which I think we implicitly assumed to
> work before 8130 was reported. As you pointed out it defines two
> install service scopes or bindings:
>
> [1] per-client binding - explicitly established by 'create-client'
>    - max one per client, total number not limited
>
> [2] 'default' - serving clients without per-client binding
>    - only one is active at a time
>
> Right now, [1] works correctly for both x86 & Sparc.
>
> When investigating 8130, I realized that [2] doesn't seem to work
> correctly for either x86 or Sparc.
>
> Sparc is more problematic, since if more than one install service is
> created, a mismatch between the boot archive and the compressed
> archives appears. This is what was initially assumed (when this
> discussion started) to be addressed by the fix for 8130.
>
> As the discussion evolved, it appeared that fixing 8130 in such a way
> that [1]+[2] continues to work is problematic and risky at this point.
>
> Then two approaches were suggested:
>
> * support only [1] for Sparc
>
>  - it solves problem with mismatch, since mismatch only occurs for [2]
>  - it is not acceptable
>
> * allow only one install service to be available at a time and
>  use it as default - which is [2]
>
>  - it seems to be too restrictive - only one install service for
>    all Sparc clients
>  - it doesn't solve mismatch problem
>
> If my understanding is correct, we are now back in the situation where
> [1]+[2] is required to work?
>
> Let me try to verify if I understand what problems we are trying to solve
> (and how) before I get completely lost :-)
>
> [A] mismatch problem - tracked by bug 8130
> ------------------------------------------
> affects only Sparc AI client when it uses [2]
>
> Since it is caused by the fact that Sparc client obtains location
> of AI image pieces from two places, I think the solution (at least
> for now) would be to query one common source for obtaining
> location for both boot archive as well as solaris*.zlib archives.
>
> We can't do too much about boot archive stage at this point,
> so let's take a look at the stage when solaris*.zlib archives
> are obtained.
>
> Since full path to boot archive is defined by 'root_file' option
> in wanboot.conf file, if we have this info available in AI client,
> I think it would help to solve the problem.
>
> I was investigating what wanboot offers at this point and it seems we
> can easily obtain the particular wanboot.conf. During the wanboot process,
> wanboot.conf along with other files is bundled as HSFS filesystem
> and provided by kernel as /devices/ramdisk-bootfs:a block device
> (I finally realized what the wanbootfs file downloaded by client
> can be used for :-).
>
> Looking at how the legacy installer makes it available, it is
> straightforward:
> ...
> BOOTFS_DISK="/devices/ramdisk-bootfs:a"
>
> if [ -b "$BOOTFS_DISK" ] ; then
>    mount -r -F hsfs "$BOOTFS_DISK" /etc/netboot > /dev/null 2>&1
> ...
>
> Then we have wanboot.conf available in /etc/netboot/ directory in
> microroot and we could determine location of solaris*.zlib archives
> by inspecting 'root_file' option - it is created by installadm(1M)
> in following format:
>
> root_file=<ai_image>/boot/boot_archive
>
> and solaris*.zlib archives are in
>
> <ai_image>/solaris*.zlib
>
> I haven't tried this yet, but it seems promising.
>
>
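A rough sketch of what I think you are describing, untested and with a
made-up function name (ai_image_from_wanboot_conf is not an existing
helper). It assumes wanboot.conf is already readable, for example after
the hsfs mount shown above, and that root_file has exactly the format
installadm writes. Whether that path is relative to the web server's
document root is not handled here.

```shell
#!/bin/sh
# Sketch only: derive <ai_image> from the 'root_file' entry of a
# wanboot.conf file. The function name is hypothetical.
#
# Expects a line of the form
#   root_file=<ai_image>/boot/boot_archive
# Prints <ai_image> and returns 0 on success; prints nothing and
# returns non-zero if the file is unreadable or has no such entry.
ai_image_from_wanboot_conf() {
    [ -r "$1" ] || return 1
    # Strip the 'root_file=' prefix and the '/boot/boot_archive' suffix.
    # 'grep .' makes the pipeline fail when sed printed nothing.
    sed -n 's|^root_file=\(.*\)/boot/boot_archive$|\1|p' "$1" | grep .
}
```

The client could then look for the solaris*.zlib archives directly under
the printed directory, e.g. by calling
ai_image_from_wanboot_conf /etc/netboot/wanboot.conf.
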
> [B] correctly working [2] scope
> -------------------------------
> Assuming [A] is addressed, then [2] would work in following way:
>
> x86 - first created service has 'default' scope
> -----------------------------------------------
> When local DHCP server is to be used, the first install service is
> created along with -i -c options and IP pool is associated with
> service-specific DHCP macro. Thus it is used as default, since
> client loads menu.lst file defined in service-specific macro.
>
> Any subsequent service (if created without -i -c) doesn't update IP pool
> by calling pntadm(1M) leaving the first service used as 'default'.
>
> Issues:
> - It is a little bit problematic to change the default service
>  (e.g. delete old/create new), since appropriate IP pool has to be
>  updated with new dhcp service-specific macro. This is now only done
>  if '-i -c' options are provided and user has to know which IP pool
>  is to be updated.
So, when we delete a service, we don't delete the service-specific macro
on the dhcp server? Is the data for this service still kept?
>
> - If default service is deleted, client is still served with menu.lst
>  file referring to it, as IP pool is still associated with service
>  specific DHCP macro.
>
We don't delete the dhcp macro when we delete the service? Isn't the
image gone at this point, though, so even if the client is served the
menu.lst, it will fail?
> - First one is always set as a default - this could be addressed by
>  explicitly specifying which one should be set by default (e.g.
>  by providing '-d' option as suggested).
>
>
> Sparc - last created service has 'default' scope
> ------------------------------------------------
> The reason is that each time new service is created,
> /etc/netboot/wanboot.conf is updated. Specifying -i -c doesn't
> affect this behavior, since DHCP server would be queried only for
> network info (client IP, server IP, ...) and boot file (which is
> wanboot-cgi common for all clients). Once [A] is fixed, 'RootPath'
> option would be no longer utilized.
>
> Issues:
> - Last one is always set as default - this could be addressed by
>  explicitly specifying which one should be set by default (e.g.
>  by providing '-d' option as suggested).
>
> I think we would need to clarify
> * desired behavior for [2] along with corner cases and possible states
A couple of questions:

1. If we used -d, could we update the ip pool for the user in the case 
of x86? Do we have the data to do this? Seems like -d for sparc is 
straightforward. Not so much for x86.

2. If the dhcp server is on a separate machine, we do not currently 
update the macro or pool data, correct?

These are some issues I see with default behavior that we need to think
about (off the top of my head):

-We need to ensure that we clean up everything associated with a default
service: IP pool, wanboot.conf, images, etc.
I am not sure what the state of our cleanup code in installadm is when
we delete a service.

-If we cannot do the correct cleanup for a deleted default service, we
fail, and don't allow the user to create another default. Not sure how
we would note this 'failure' and keep track of it so we could fail the
creation of another default service.

-If the dhcp server is on a different machine than the install server, I
think we run into issues if we rely on the users to delete the
associated macros. We cannot enforce this at this time.

-If we implement -d (default) for this release, what are the implications
of using the new command on services set up prior to us actually
explicitly defining a default service?

-My take on what is desired behavior and attributes for a 'default'
service (high level thoughts):

    1. It has all components necessary for the specific architecture
    defined.

        x86: appropriate service-specific dhcp_macro and ip pools set up
        on dhcp server (even if it is a remote server), complete service
        image
        sparc: appropriate wanboot.conf data available in
        /etc/netboot/wanboot.conf, service-specific dhcp macro set up on
        dhcp server, complete service image

    2. There is only 1 default service for each supported architecture
    at any time on an install server. And, the service must explicitly
    be set up as the 'default' service via installadm -d.

        This raises the question: is it required to have a default
        service on an install server for each architecture? In the case
        of sparc, if we don't have a 'default', this is problematic for
        all sparc clients, right? Even those that are set up with
        create-client?

    3. If we cannot set up all components of a default service, we fail.
    Or we provide an override flag if the user wants to go ahead and have
    us set up what we can.

        We need to figure out what is allowable as user controlled setup
        in this scenario.

    I'll think on this more and reply again.

> * what is appropriate/desirable to address for the current release
>
Based on the questions above, it is clear we have a lot of thinking to do
about the design of a 'default' service. What I think we can do for this
release is the following:

    1. Fix the mismatch issue for sparc, perhaps as you have defined
    above with the wanboot changes.

    2. With regard to -d, the explicit setting of the default service, I
    think we have a couple of choices:

        -We do not implement -d. I think the design of this is going to
        take some careful thought.

            Part of the problem we have now is that the design doesn't
            hold for default services. And, we are missing some key
            functionality, such as setup of dhcp macros and pools on
            remote dhcp servers.  We allow the implicit behavior that is
            currently in the implementation to stand.

        Or we implement -d with the bare minimum. What I consider bare
        minimum is:

            -We overwrite /etc/netboot/wanboot.conf for sparc

            -We clean up the dhcp macro and ip pool for the users on x86
            if we can. If not, we output messages telling them what they
            must do.

            -We ensure that when we delete a service, we delete all the
            pieces we need to, so we don't get clients booting from old
            menu.lst files or wanboot.conf. If we cannot delete all the
            pieces, we output messages telling them what they must do.

    3. We update the docs and manpage to explicitly let users know:

        If we do not implement -d:

            -The last sparc service created on an install server is the
            default
            -The first x86 service created on an install server is the
            default
            -To change the x86 default, they need to ensure the dhcp
            macro and IP pool data is updated
            -If they don't want to use whatever 'default' is on the
            server, they must explicitly do a create-client
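
On the cleanup point in item 2 (not leaving sparc clients booting from
an old wanboot.conf after a service is deleted), one cheap check might
look like the sketch below. It is untested; the function name and the
"docroot" argument are made up for illustration, and I am assuming the
root_file path is resolved relative to some directory on the install
server (pass an empty string if it is an absolute filesystem path).

```shell
#!/bin/sh
# Sketch only: detect a wanboot.conf that still points at a boot
# archive which no longer exists. Names here are hypothetical.
#
# usage: wanboot_conf_is_stale <wanboot.conf> <docroot>
# returns 0 if root_file names a missing boot archive (stale),
#         1 if the boot archive exists,
#         2 if the file is unreadable or has no root_file entry.
wanboot_conf_is_stale() {
    conf="$1"
    docroot="$2"
    [ -r "$conf" ] || return 2
    # Take the value of the first root_file= line, if any.
    root_file=`sed -n 's|^root_file=||p' "$conf" | sed -n '1p'`
    [ -n "$root_file" ] || return 2
    # Stale means the referenced boot archive is not a regular file.
    [ ! -f "$docroot$root_file" ]
}
```

delete-service could run something like this against
/etc/netboot/wanboot.conf and print a message telling the user what is
left to fix when it reports a stale entry.
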

Do you have any sense of the amount of work it would take to do the bare 
minimum for the -d support? Do you have any ideas on what you think the 
bare minimum might be? We have to try to be sure whatever bare minimum 
we implement, if we decide to go this way, doesn't cause other design 
issues later.

thanks,
sarah


> Thank you,
> Jan
>
> _______________________________________________
> caiman-discuss mailing list
> caiman-discuss at opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/caiman-discuss

