I'm submitting this fast-track for Cathy Zhou; it times out on
07/17/2009.  This case depends on PSARC/2008/693.  2008/693 requested
"micro" release binding, but no incompatible changes are introduced, and
"patch" would be more appropriate (although no backport is planned).  As
such, this case both requests "patch" binding, and updates 2008/693 to
have "patch" binding as well.

The materials directory contains this specification (vrrp_psarc) as well
as the documents listed in the References section.

VRRP Update:

Summary
=======

   This case describes several design issues of the original VRRP case
   (PSARC/2008/693 VRRP) and the proposals to address those issues.

Problem area
============

   Specifically, the problems with the existing VRRP design are:

   1) Incorrect false accept_mode support

      According to the VRRP protocol, accept_mode can be set to either
      true or false by an administrator over a non-address-owner VRRP
      virtual router. If the accept_mode is set to be false, when the
      VRRP virtual router becomes the master (non-backup), this
      virtual router must not accept packets destined to the virtual
      IP addresses which are configured on this router. But the router
      must respond to the ARP request/ND solicitations for the virtual
      IP addresses.

      The existing VRRP design described two approaches to support the
      false accept_mode:

        a. For non-link-local virtual IP addresses, add the ARP/ND
           cache using the SIOCSXARP/SIOCLIFSETND ioctl in order to
           respond to the ARP request/ND solicitations. Note that
           these virtual IP addresses are not brought up on the VRRP
           interfaces.

        b. For link-local virtual IP addresses, "a new interface flag
           IFF_LL_NOACCEPT is introduced to mark a VNIC as non-accept
           mode. On receiving ioctl SIOCSLIFFLAGS request for
           IFF_LL_NOACCEPT, the IRE entry for the link-local address
           of the interface will be marked as IRE_MARK_NOACCEPT. If an
           ire which is looked up for a link local destination address
           turns out to have this flag marked, the input packets will
           be dropped in ip_rput_data_v6()".

      The above approaches do not work. With approach (a), because no
      IP address is brought up on the specific interface (in the IPv4
      case), the ill of this interface will not be "bound" and the
      SIOCSXARP ioctl will simply fail. Furthermore, both approaches
      would drop all the unicast packets, including unicast Neighbor
      Solicitations, which makes the Neighbor Unreachability Detection
      mechanism unusable.

   2) Duplication of administrative interfaces and system services

      The administration model proposed by PSARC/2008/693 assumes
      that the VRRP service has full management over the VRRP IP
      interfaces, the primary IP addresses and the virtual IP addresses
      used by the specific VRRP router. The vrrpd daemon plumbs/unplumbs
      the VRRP interfaces and configures/unconfigures the associated
      IP addresses based on the current state of the VRRP router. But
      the design does not consider the interaction with the other
      existing IP interface configuration tools (network/physical
      service, ifconfig etc.), and the potential impact of such
      interaction is unknown.

      More importantly, since the existing design assumes complete
      control over the VRRP interface IP addresses, it makes
      vrrpadm another administrative tool to configure IP addresses,
      which will never be as flexible as the existing tools. E.g.,
      do we need to extend vrrpadm to configure one specific IP
      address to be the "preferred" address?

      Likewise, since vrrpd internally creates/destroys the special
      VRRP VNICs, we will lose the flexibility provided by the
      existing "dladm" command and the features it provides (flows,
      bandwidth, priority etc.).

   3) No exclusive-zone support

      With the old VRRP PSARC case, the administrator specifies the
      name of the interface that will be managed by VRRP, and vrrpd
      would internally create and plumb a special VRRP VNIC over
      that interface and configure virtual IP addresses over that VNIC.

      This makes VRRP support in an exclusive zone problematic,
      since creating a VNIC in a non-global zone is not supported.

   4) No VLAN support

      The same design makes VRRP support over a VLAN problematic as
      well, since the current VRRP design blindly tries to create a
      VNIC over the specified interface, and creating a VNIC over a
      VLAN is not allowed either.

Proposal Overview
=================

- vrrpadm changes

  1) Add "-router" to each vrrpadm subcommand

     Following the precedent of "dladm", all the vrrpadm subcommands will
     contain the object of the operation - the "router". For example,
     "vrrpadm create" will be changed to "vrrpadm create-router", and
     "vrrpadm show" will be changed to "vrrpadm show-router".

  2) Change the "startup/shutdown" subcommands to "enable-router/disable-router"

     "startup" sounds like something that needs to be done on each reboot,
     which is not the case.
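
     The two renaming rules above can be summarized in a short
     sketch. Only the create, show, startup and shutdown renames are
     stated in this case; applying the "-router" suffix to any other
     subcommand is an assumption that follows from rule 1, and the
     helper name new_subcommand() is purely illustrative.

```python
# Renames stated explicitly in this case. Every other subcommand is
# assumed (per rule 1) to simply gain the "-router" suffix.
SPECIAL_RENAMES = {
    "startup": "enable-router",
    "shutdown": "disable-router",
}


def new_subcommand(old: str) -> str:
    """Map an old vrrpadm subcommand name to its proposed new name."""
    return SPECIAL_RENAMES.get(old, old + "-router")
```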
      
- MAC_CAPAB_VRRP, DL_CAPAB_VRRP, IFF_VRRP and SO_VRRP

  For each VRRP router, a special VRRP VNIC is created with the
  special VRRP virtual MAC address. All the IP addresses that reside
  on this VNIC are regarded as virtual IP addresses protected by the
  VRRP router. The vrrpd daemon brings up those addresses when the
  router becomes master and brings down the addresses when the router
  becomes backup. In other words, vrrpd has full management of the
  up/down state of the virtual IP addresses, and no other application
  or service is allowed to change the up/down state of those IP
  addresses.
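
  This case does not spell out the layout of the VRRP virtual MAC
  address. As a hedged illustration, the sketch below derives it the
  way the VRRP RFCs (RFC 3768, RFC 5798) define it: 00-00-5E-00-01-XX
  for IPv4 and 00-00-5E-00-02-XX for IPv6, where XX is the VRID. The
  function name and the "inet"/"inet6" family strings are
  illustrative assumptions, not part of any proposed interface.

```python
# Fifth octet of the VRRP virtual MAC address as defined by the VRRP
# RFCs: 0x01 for IPv4 routers, 0x02 for IPv6 routers.
FAMILY_OCTET = {"inet": 0x01, "inet6": 0x02}


def vrrp_virtual_mac(vrid: int, family: str) -> str:
    """Return the virtual MAC address for a VRID and address family."""
    if not 1 <= vrid <= 255:
        raise ValueError("VRID must be in the range 1-255")
    if family not in FAMILY_OCTET:
        raise ValueError("family must be 'inet' or 'inet6'")
    # The last octet is the VRID itself.
    return "00:00:5e:00:%02x:%02x" % (FAMILY_OCTET[family], vrid)
```

  For instance, the IPv4 router with VRID 12 used in the examples
  later in this case would map to 00:00:5e:00:01:0c.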

  A new MAC capability MAC_CAPAB_VRRP will be introduced and the
  special VRRP VNICs will have this capability. The VNICs will then
  advertise a new DL_CAPAB_VRRP DLPI capability as part of the
  DL_CAPABILITY_REQ/ACK negotiation with IP. IP will learn that
  the corresponding ill is VRRP capable and mark each IP address
  configured over such an ill with the IFF_VRRP flag, to indicate
  that it is a VRRP virtual IP address.

  To make the VRRP service the only entity authorized to bring the
  virtual IP addresses up and down, a new SO_VRRP socket option will
  be introduced. A socket that has the SO_VRRP socket option set is a
  VRRP control socket, and only VRRP control sockets are allowed to
  change the IFF_UP flag of a VRRP virtual IP address (with IFF_VRRP
  set). Other attempts to change the IFF_UP flag will fail.

  The priv_sys_ip_config privilege is required to set the SO_VRRP
  socket option.

  For now, the vrrpd daemon will be the only application to set the
  SO_VRRP socket option on the socket changing the IFF_UP flag of the
  virtual IP addresses, in order to manage their up/down state based
  on the state of the VRRP router.
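
  The gating rule described above can be modeled as follows. This is
  only a sketch of the intended policy, not the kernel
  implementation: the IFF_VRRP bit value is a placeholder, and the
  two function names are hypothetical.

```python
IFF_UP = 0x1           # conventional IFF_UP bit
IFF_VRRP = 1 << 40     # placeholder bit; the real value is not given here


def can_set_so_vrrp(privileges: set) -> bool:
    """Setting SO_VRRP requires the priv_sys_ip_config privilege."""
    return "priv_sys_ip_config" in privileges


def can_change_up(addr_flags: int, sock_has_so_vrrp: bool) -> bool:
    """Decide whether a socket may change IFF_UP on an address.

    Addresses marked IFF_VRRP may only be brought up or down through
    a VRRP control socket; other addresses are unaffected.
    """
    if addr_flags & IFF_VRRP:
        return sock_has_so_vrrp
    return True
```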

- IFF_NOACCEPT

  When a VRRP router becomes master, if its accept_mode is false, all
  the virtual IP addresses will be brought up but with the new
  IFF_NOACCEPT flag set by vrrpd. IP will mark all the local IREs
  associated with these IFF_NOACCEPT IP addresses to be "no_accept",
  and all the received unicast local packets will be dropped, with
  the exception of the Neighbor Solicitation packets and Neighbor
  Advertisement packets. This allows the Neighbor Unreachability
  Detection mechanism to work as expected in the false accept_mode.

  The IFF_NOACCEPT flag can only be set on an IP address if its
  IFF_VRRP flag is set. Further, the same SO_VRRP socket option must
  be set to change the IFF_NOACCEPT flag on an IP address.
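
  As a rough model of the behavior described in this section, the
  sketch below shows which flags vrrpd would want on a virtual IP
  address in each router state, and how IP would then treat a
  received unicast packet. The flag bit values are placeholders, the
  function names are hypothetical, and clearing IFF_NOACCEPT on the
  transition to backup is an assumption. The ICMPv6 type numbers
  (135 for Neighbor Solicitation, 136 for Neighbor Advertisement)
  are the standard ones.

```python
IPPROTO_ICMPV6 = 58
ND_NEIGHBOR_SOLICIT = 135
ND_NEIGHBOR_ADVERT = 136

IFF_UP = 0x1              # conventional IFF_UP bit
IFF_VRRP = 1 << 40        # placeholder bit value
IFF_NOACCEPT = 1 << 41    # placeholder bit value


def flags_for_state(is_master: bool, accept_mode: bool) -> int:
    """Flags vrrpd would want on a virtual IP address."""
    flags = IFF_VRRP
    if is_master:
        flags |= IFF_UP
        if not accept_mode:
            flags |= IFF_NOACCEPT
    return flags


def accept_unicast(addr_flags, next_header, icmp6_type=None) -> bool:
    """Model the input-side decision for a local unicast packet."""
    if not addr_flags & IFF_NOACCEPT:
        return True
    # In no_accept mode only NS/NA packets are let through, so that
    # Neighbor Unreachability Detection keeps working.
    return (next_header == IPPROTO_ICMPV6 and
            icmp6_type in (ND_NEIGHBOR_SOLICIT, ND_NEIGHBOR_ADVERT))
```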

- Interaction with the existing data-link and IP administrative tools
 
  To address issue 2, VRRP configuration will be integrated with the
  existing IP administrative model seamlessly: the existing data-link
  and IP administrative tools and services can be used to
  create/delete the VRRP special VNICs, plumb/unplumb the physical
  interfaces (which own the primary IP address) and VRRP VNICs (which
  own the virtual IP addresses), and configure the primary and
  virtual IP addresses needed by a VRRP router.

  The vrrpadm command will still be needed to create/delete a VRRP
  router, and configure the primary arguments required by the VRRP
  protocol: the VRID, the address family, the interface over which
  the VRRP router is created, the priority, the advertisement
  interval etc.

  The "dladm create-vnic" subcommand will be extended to create the
  VRRP special VNICs on which the virtual IP addresses reside. A
  new MAC address keyword "vrrp" will be introduced and will be
  used to create VRRP VNICs[2]. A new vnic_mac_addr_type_t
  VNIC_MAC_ADDR_TYPE_VRID will be added and the dladm_vnic_create()
  API will be extended accordingly:

     dladm_status_t dladm_vnic_create(dladm_handle_t handle,
         const char *vnic, datalink_id_t linkid,
         vnic_mac_addr_type_t type, uchar_t * mac_addr,
         uint_t mac_len, int *mac_slot, uint_t mac_prefix_len,
         uint16_t vid, vrid_t vrid, int af, datalink_id_t *vnic_id_out,
         dladm_arg_list_t *proplist, uint32_t flags);

  To track the VRRP virtual IP addresses, the vrrpd daemon will
  determine the VRRP special VNIC used by a specific VRRP router
  based on the VRID, the IP address family (IPv4 or IPv6) and the
  physical interface (including VLANs and aggregations) the router
  is created on, and regard all the IP addresses configured over
  the VNIC as the virtual IP addresses associated with this VRRP router.

  The existing PF_ROUTE event opcodes only report changes to the
  set of "UP" IP addresses, but the vrrpd daemon needs to track all
  the virtual IP addresses configured over each VRRP router,
  regardless of whether an address is brought up or not. Therefore,
  two new PF_ROUTE event opcodes will be introduced to report changes
  to the IP address configuration, including the IP addresses that
  have not been brought up:

     - RTM_CHGADDR

       The RTM_CHGADDR event will be generated when an IP address
       is newly configured (added or updated).

     - RTM_FREEADDR

       The RTM_FREEADDR event will be generated when an IP address is
       removed from the configuration.

  The message format of the above routing socket events will be the
  same as the format of the RTM_NEWADDR/RTM_DELADDR messages.

  Note that neither event will report the unspecified (all-zero)
  IP addresses.
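
  To illustrate how a consumer such as vrrpd might combine the new
  events with the existing RTM_NEWADDR/RTM_DELADDR events, here is a
  minimal sketch. It models events as (name, address) pairs rather
  than parsing real routing socket messages; the class name is
  hypothetical, and the assumption that RTM_NEWADDR/RTM_DELADDR
  track only the "UP" set is taken from the description above.

```python
import ipaddress


class VirtualAddrTracker:
    """Track configured and "UP" addresses from routing socket events."""

    def __init__(self):
        self.configured = set()   # all configured addresses, up or not
        self.up = set()           # addresses currently brought up

    def handle(self, event: str, addr: str) -> None:
        ip = ipaddress.ip_address(addr)
        if ip.is_unspecified:
            # The new events never report all-zero addresses; ignore
            # them defensively for the existing events as well.
            return
        if event == "RTM_CHGADDR":
            self.configured.add(ip)
        elif event == "RTM_FREEADDR":
            self.configured.discard(ip)
            self.up.discard(ip)
        elif event == "RTM_NEWADDR":
            self.up.add(ip)
        elif event == "RTM_DELADDR":
            self.up.discard(ip)
```

  With this split, an address that is configured but not yet brought
  up (the normal state of a virtual IP address on a backup router)
  remains visible in the configured set.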

  Further, vrrpd will also track all the "UP" IP addresses configured
  over the physical interface, and select one as the primary IP address
  which will be used to send the VRRP advertisement.

  The example below creates an IPv4 VRRP router (vrrp1)
  with VRID 12, priority 100 and false accept_mode over the bge1
  data-link. The virtual IP address is 11.1.1.1/24, and the
  primary IP address in the VRRP advertisement is 11.1.1.100.
  Note that the IP addresses in this example could be configured
  by any IP configuration tool/service other than "ifconfig".

    # dladm create-vnic -m vrrp -V 12 -A inet -l bge1 vrrp_vnic1
    # vrrpadm create-router -V 12 -l bge1 -A inet -p 100 -o no_accept vrrp1
    # ifconfig vrrp_vnic1 plumb 11.1.1.1/24
    # ifconfig bge1 plumb 11.1.1.100/24 up

  In some cases, the VRRP configuration is simple enough that the
  administrator would prefer to have one single administrative
  tool to configure everything related to a VRRP router. Therefore,
  we give the administrator the option to do that using only
  vrrpadm: the optional "-a" option can be used to specify a list
  of virtual IP addresses when creating a VRRP router, and the "-P"
  option can be used to specify the primary IP address that
  is used to send VRRP advertisement packets. Further, if the "-f"
  option is specified, vrrpadm will create/plumb the VNIC if that
  has not been done:

    # vrrpadm create-router -V 12 -l bge1 -A inet -p 100 -o no_accept \
      -a 11.1.1.1/24 -P 11.1.1.100/24 -f vrrp1

  In this case, vrrpadm will create and plumb the VNIC, configure the
  virtual IP address over it and bring up the primary IP address
  over bge1. From this point on, the system will behave exactly
  the same as if the VNIC and the virtual IP addresses were
  configured by the other tools/services.

  Note that the administrator has to unconfigure all the virtual IP
  addresses and delete the VNIC using other tools to completely
  clean up the VRRP configuration.

  More details of the vrrpadm and dladm configuration changes are
  discussed in [1] and [2].

- VLAN support

  The administrative model described above allows the administrator
  to configure the VRRP router over a VLAN. Note that the VRRP
  special VNIC has to be created using the "-v" option, which
  specifies the VLAN ID. For example:
   
    # dladm create-vnic -m vrrp -V 14 -A inet -l bge1 -v 2 vrrp_vnic2
    # dladm create-vlan -l bge1 -v 2 vlan1
    # vrrpadm create-router -V 14 -l vlan1 -A inet -p 100 -o no_accept vrrp2
    # ifconfig vrrp_vnic2 plumb 12.1.1.1/24
    # ifconfig vlan1 plumb 12.1.1.100/24 up

- Exclusive zone support

  In the new design, VNICs will no longer be created internally by
  the vrrpd daemon. Instead, one can create the VRRP special VNIC
  in the global zone and assign the VNIC to the non-global zone
  where the VRRP router is configured.

  The VRRP (network/vrrp/default) service will be started in the
  non-global zone and it will start the vrrpd daemon.

- Other miscellaneous changes

  * Least privileges of the vrrpd daemon

    In the VRRP SMF manifest, vrrpd will be set to run as the
    "root" user, and its privilege property will be set to only
    include the "basic" privilege and the following privileges:

      - priv_sys_config
 
        Required to post VRRP sysevents. Note that this privilege
        is only needed in the global zone since sysevents are not
        supported in the non-global zone.

      - priv_net_rawaccess

        Required to hold the physical data-link (which owns the
        primary IP address) and the vnic (which owns the virtual
        IP addresses) open to prevent them from being deleted.

      - priv_net_icmpaccess

        Required to open the RAW socket.

      - priv_sys_ip_config

        Required to bring up/down the virtual IP addresses and set
        the SO_VRRP socket option.

  * solaris.network.vrrp authorization

    A new solaris.network.vrrp authorization will be introduced and
    will be required to configure the VRRP service. Note that it will
    only be needed by "write" operations, not by "read-only"
    operations (e.g., "vrrpadm show-router").

    The solaris.network.vrrp authorization will be added to the
    "Network Management" profile. 

  * Interaction with the existing IP address autoconfiguration tools

    in.ndpd (IPv6 autoconfiguration) and dhcpagent (DHCP client) are
    the two major IP address autoconfiguration tools in Solaris.
    Because the master and the backup VRRP routers (VNICs) share the
    same MAC address, running them over a VRRP VNIC would confuse
    in.ndpd and dhcpagent and eventually cause unexpected results.
    Therefore, IPv6 autoconfiguration and DHCP configuration will
    not be supported over VRRP VNICs.

    If an administrator configures either IPv6 autoconfiguration or
    DHCP over a VRRP VNIC, since neither in.ndpd nor dhcpagent sets
    the SO_VRRP socket option, the attempt to bring up the
    auto-configured IP address will fail, which causes the
    autoconfiguration operation to fail.

    An ongoing project (ipadm) will make static IPv6 address
    configuration and IPv6 autoconfiguration exclusive to each other
    on a specific IPv6 interface. Since VRRP configuration requires
    static IP address configuration, it will prevent in.ndpd from
    trying to autoconfigure over a VRRP VNIC from the beginning.
   
  * Service dependencies

    The vrrpd daemon is started by the svc:/network/vrrp:default
    service, which depends on the svc:/network/physical service and
    the svc:/system/filesystem/usr service. The former service is
    required to configure the data-links and the IP addresses needed
    by the specific VRRP router, and the latter service is required
    since the vrrpd, vrrpadm and libvrrpadm.so.1 binaries reside
    under the /usr directory.

  * Removing the protected service support

    In the old VRRP design, there is a notion of protected services:
    an administrator can specify a set of SMF services to be
    protected by a VRRP router. To support that, vrrpd tracks the
    state of those SMF services, and when vrrpd finds a service
    offline/disabled, it changes the state of the VRRP
    router so the router will no longer be the master. Another
    backup router becomes the master and the SMF service can be
    started there.

    After discussion with the team that potentially would consume
    this feature (the ILB team), we found that this is an
    incomplete solution which does not simplify the management
    complexity. Ideally, the HA support configuration solution
    would be: 

    configuring the following on the owner of the virtual IP addresses: 
         - the virtual IP address(es)
         - the VRRP VRID
         - the IP addresses of the backup(s)
         - the ILB server groups, rules, hc, etc
         - some security configuration (ssh keys) that allow
           automatic synchronization of things with the backups

    and all the other configurations and sync up between master and
    backup should be automatic.

    Clearly, the protected service support proposed by the old VRRP
    design does not help achieve this ideal. Therefore, we have
    decided to remove the protected service support.

Interface Table
===============

    Interface                   Classification   Comments
    ==================          ==============   ========
    IFF_NOACCEPT                Committed        no_accept VRRP mode
    IFF_VRRP                    Committed        VRRP virtual IP address
    SO_VRRP                     Committed        VRRP control socket option

    RTM_CHGADDR                 Committed        New PF_ROUTE event
    RTM_FREEADDR                Committed        New PF_ROUTE event

    solaris.network.vrrp        Committed        authorization required by
    authorization                                VRRP configuration

    /usr/sbin/vrrpadm           Committed        VRRP administration tool
    vrrpadm show-router output  Uncommitted

    dladm create-vnic -m vrrp   Committed        create VRRP vnics

    dladm_vnic_create()         Consolidation
                                Private
    VNIC_MAC_ADDR_TYPE_VRID     Consolidation    New vnic_mac_addr_type_t
                                Private

    MAC_CAPAB_VRRP              Project Private  MAC layer capability
    DL_CAPAB_VRRP               Project Private  DLPI capability

    svc:/network/vrrp:default   Project Private  The service that starts vrrpd
    /usr/sbin/vrrpd             Project Private  VRRP daemon
    /usr/lib/libvrrpadm.so      Project Private  VRRP library
    /etc/vrrp.conf              Project Private

References
==========

    [1] VRRP design specification
    [2] vrrpadm(1M), vrrpd(1M), dladm(1M) manpages


