I'm submitting this fast-track for Cathy Zhou; it times out on 07/17/2009. This case depends on PSARC/2008/693. PSARC/2008/693 requested "micro" release binding, but no incompatible changes are introduced, so "patch" binding would be more appropriate (although no backport is planned). As such, this case requests "patch" binding for itself and also updates 2008/693 to have "patch" binding.
The materials directory contains this specification (vrrp_psarc) as well as the documents listed in the References section.

VRRP Update:

Summary
=======

This case describes several design issues in the original VRRP case (PSARC/2008/693 VRRP) and the proposals to address those issues.

Problem area
============

Specifically, the problems with the existing VRRP design are:

1) Incorrect false accept_mode support

According to the VRRP protocol, an administrator can set accept_mode to either true or false on a VRRP virtual router that is not the address owner. If accept_mode is false, then when the VRRP virtual router becomes the master (non-backup), it must not accept packets destined to the virtual IP addresses configured on this router. However, the router must still respond to ARP requests and ND solicitations for the virtual IP addresses.

The existing VRRP design described two approaches to support the false accept_mode:

a. For non-link-local virtual IP addresses, add ARP/ND cache entries using the SIOCSXARP/SIOCLIFSETND ioctls in order to respond to ARP requests and ND solicitations. Note that these virtual IP addresses are not brought up on the VRRP interfaces.

b. For the link-local virtual IP address, "a new interface flag IFF_LL_NOACCEPT is introduced to mark a VNIC as non-accept mode. On receiving ioctl SIOCSLIFFLAGS request for IFF_LL_NOACCEPT, the IRE entry for the link-local address of the interface will be marked as IRE_MARK_NOACCEPT. If an ire which is looked up for a link local destination address turns out to have this flag marked, the input packets will be dropped in ip_rput_data_v6()".

The above approaches do not work. With approach (a), no IP address is brought up on the specific interface; in the IPv4 case, the ill of this interface is therefore never "bound", and the SIOCSXARP ioctl simply fails.
Furthermore, both approaches would drop all unicast packets, including unicast Neighbor Solicitations, which makes the Neighbor Unreachability Detection mechanism unusable.

2) Duplication of administrative interfaces and system services

The administration model proposed by PSARC/2008/693 assumes that the VRRP service has full management over the VRRP IP interfaces, the primary IP addresses and the virtual IP addresses used by the specific VRRP router. The vrrpd daemon plumbs/unplumbs the VRRP interfaces and configures/unconfigures the associated IP addresses based on the current state of the VRRP router. But the design does not consider the interaction with the other existing IP interface configuration tools (the network/physical service, ifconfig, etc.), and the potential impact of such interaction is unknown.

More importantly, since the existing design assumes entire control over the VRRP interface IP addresses, it makes vrrpadm yet another administrative tool for configuring IP addresses, and one that will never be as flexible as the existing tools. For example, would vrrpadm need to be extended to configure a specific IP address as the "preferred" address? Likewise, since vrrpd internally creates/destroys the special VRRP VNICs, we lose the flexibility of the existing "dladm" command and the features it provides (flows, bandwidth, priority, etc.).

3) No exclusive-zone support

With the old VRRP PSARC case, the administrator specifies the name of the interface to be managed by VRRP, and vrrpd internally creates and plumbs a special VRRP VNIC over that interface and configures the virtual IP addresses over that VNIC. This makes VRRP support in an exclusive zone problematic, since creating a VNIC in a non-global zone is not supported.
4) No VLAN support

The same design makes VRRP support over VLANs problematic as well, since the current VRRP design blindly tries to create a VNIC over the specified interface, and creating a VNIC over a VLAN is not allowed either.

Proposal Overview
=================

- vrrpadm changes

1) Add "-router" to each vrrpadm subcommand

Following the precedent of "dladm", every vrrpadm subcommand will name the object of the operation: the "router". For example, "vrrpadm create" will be changed to "vrrpadm create-router", and "vrrpadm show" will be changed to "vrrpadm show-router".

2) Change the "startup/shutdown" subcommands to "enable-router/disable-router"

"startup" sounds like something that needs to be done on each reboot, which is not the case.

- MAC_CAPAB_VRRP, DL_CAPAB_VRRP, IFF_VRRP and SO_VRRP

For each VRRP router, a special VRRP VNIC is created with the special VRRP virtual MAC address. All the IP addresses residing on this VNIC are regarded as virtual IP addresses protected by the VRRP router. The vrrpd daemon brings up those addresses when the router becomes master and brings them down when the router becomes backup. In other words, vrrpd has full management of the up/down state of the virtual IP addresses, and no other applications or services are allowed to change the up/down state of those IP addresses.

A new MAC capability, MAC_CAPAB_VRRP, will be introduced, and the special VRRP VNICs will have this capability. The VNICs will then advertise a new DL_CAPAB_VRRP DLPI capability as part of the DL_CAPABILITY_REQ/ACK negotiation with IP. IP will learn that the corresponding ill is VRRP capable and mark each IP address configured over such an ill with the IFF_VRRP flag, to indicate that it is a VRRP virtual IP address.

To make the VRRP service the only entity authorized to bring the virtual IP addresses up and down, a new SO_VRRP socket option will be introduced.
A socket that has the SO_VRRP socket option set is a VRRP control socket, and only VRRP control sockets are allowed to change the IFF_UP flag of a VRRP virtual IP address (one with IFF_VRRP set). Other attempts to change the IFF_UP flag will fail. The priv_sys_ip_config privilege is required to set the SO_VRRP socket option. For now, vrrpd will be the only application that sets the SO_VRRP socket option, on the socket it uses to change the IFF_UP flag of the virtual IP addresses, so that it can manage their up/down state based on the state of the VRRP router.

- IFF_NOACCEPT

When a VRRP router becomes master and its accept_mode is false, vrrpd brings up all the virtual IP addresses with the new IFF_NOACCEPT flag set. IP will mark all the local IREs associated with these IFF_NOACCEPT IP addresses as "no_accept", and all received unicast packets destined to those addresses will be dropped, with the exception of Neighbor Solicitation and Neighbor Advertisement packets. This allows the Neighbor Unreachability Detection mechanism to work as expected in the false accept_mode.

The IFF_NOACCEPT flag can only be set on an IP address if its IFF_VRRP flag is set. Further, the SO_VRRP socket option must also be set to change the IFF_NOACCEPT flag on an IP address.

- Interaction with the existing data-link and IP administrative tools

To address issue 2, VRRP configuration will be integrated seamlessly with the existing IP administrative model: the existing data-link and IP administrative tools and services can be used to create/delete the special VRRP VNICs, to plumb/unplumb the physical interfaces (which own the primary IP addresses) and the VRRP VNICs (which own the virtual IP addresses), and to configure the primary and virtual IP addresses needed by a VRRP router.
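The SO_VRRP and IFF_NOACCEPT flag rules above can be summarized in a small model. The following C sketch is illustrative only: the SKETCH_IFF_* bit values, check_flag_change() and vrrpd_addr_flags() are hypothetical names, not the actual IP module or vrrpd code, and the real IFF_* constants differ. It shows which IFF_UP/IFF_NOACCEPT changes a socket may make on an address, and which flags vrrpd would request on a master/backup transition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values for illustration; the real IFF_* values differ. */
#define SKETCH_IFF_UP       0x1u
#define SKETCH_IFF_VRRP     0x2u    /* set by IP on addresses of a VRRP-capable ill */
#define SKETCH_IFF_NOACCEPT 0x4u

typedef enum { CHANGE_OK, CHANGE_EPERM, CHANGE_EINVAL } change_result_t;

/*
 * Model of the check for a request that changes an address's flags from
 * old_flags to new_flags; so_vrrp says whether the requesting socket has
 * the SO_VRRP option set.
 */
change_result_t
check_flag_change(uint32_t old_flags, uint32_t new_flags, bool so_vrrp)
{
	uint32_t changed = old_flags ^ new_flags;
	bool is_vrrp = (old_flags & SKETCH_IFF_VRRP) != 0;

	/* IFF_NOACCEPT is only valid on a VRRP virtual IP address. */
	if ((new_flags & SKETCH_IFF_NOACCEPT) != 0 && !is_vrrp)
		return (CHANGE_EINVAL);

	/* On a VRRP address, only a VRRP control socket may touch these flags. */
	if (is_vrrp && (changed & (SKETCH_IFF_UP | SKETCH_IFF_NOACCEPT)) != 0 &&
	    !so_vrrp)
		return (CHANGE_EPERM);

	return (CHANGE_OK);
}

/*
 * Flags vrrpd would request for a virtual IP address after a state change:
 * up (plus IFF_NOACCEPT when accept_mode is false) as master, down as backup.
 */
uint32_t
vrrpd_addr_flags(uint32_t cur, bool is_master, bool accept_mode)
{
	assert((cur & SKETCH_IFF_VRRP) != 0);	/* virtual addresses only */

	cur &= ~(SKETCH_IFF_UP | SKETCH_IFF_NOACCEPT);
	if (is_master) {
		cur |= SKETCH_IFF_UP;
		if (!accept_mode)
			cur |= SKETCH_IFF_NOACCEPT;
	}
	return (cur);
}
```

In this model, a daemon that never sets SO_VRRP, such as in.ndpd or dhcpagent, gets CHANGE_EPERM when it tries to bring up an address on a VRRP VNIC.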
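The no_accept receive rule can likewise be sketched as a predicate over an already-classified packet. This is a hypothetical model (rx_pkt_t and noaccept_should_drop() are illustrative names, not the IP input path); the ICMPv6 protocol number 58 and the Neighbor Solicitation/Advertisement types 135 and 136 are the standard values from RFC 4443 and RFC 4861.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Standard values: ICMPv6 next header (58), NS (135) and NA (136). */
#define SKETCH_PROTO_ICMPV6   58
#define SKETCH_ND_NEIGH_SOL   135
#define SKETCH_ND_NEIGH_ADV   136

/* Hypothetical, pre-classified view of a received IP packet. */
typedef struct {
	bool    dst_noaccept;  /* destination's local IRE is marked no_accept */
	bool    dst_unicast;   /* destination is a unicast address */
	uint8_t proto;         /* upper-layer protocol / next header */
	uint8_t icmp6_type;    /* meaningful only when proto is ICMPv6 */
} rx_pkt_t;

/* Returns true if the false-accept_mode rule says to drop the packet. */
bool
noaccept_should_drop(const rx_pkt_t *p)
{
	assert(p != NULL);

	/* The rule only covers unicast packets for no_accept addresses. */
	if (!p->dst_noaccept || !p->dst_unicast)
		return (false);

	/* Keep NS/NA so Neighbor Unreachability Detection keeps working. */
	if (p->proto == SKETCH_PROTO_ICMPV6 &&
	    (p->icmp6_type == SKETCH_ND_NEIGH_SOL ||
	    p->icmp6_type == SKETCH_ND_NEIGH_ADV))
		return (false);

	return (true);
}
```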
The vrrpadm command will still be needed to create/delete a VRRP router and to configure the primary arguments required by the VRRP protocol: the VRID, the address family, the interface over which the VRRP router is created, the priority, the advertisement interval, etc.

The "dladm create-vnic" subcommand will be extended to create the special VRRP VNICs on which the virtual IP addresses reside. A new MAC address keyword, "vrrp", will be introduced and used to create VRRP VNICs [2]. A new vnic_mac_addr_type_t value, VNIC_MAC_ADDR_TYPE_VRID, will be added, and the dladm_vnic_create() API will be extended accordingly:

    dladm_status_t
    dladm_vnic_create(dladm_handle_t handle, const char *vnic,
        datalink_id_t linkid, vnic_mac_addr_type_t type,
        uchar_t *mac_addr, uint_t mac_len, int *mac_slot,
        uint_t mac_prefix_len, uint16_t vid, vrid_t vrid, int af,
        datalink_id_t *vnic_id_out, dladm_arg_list_t *proplist,
        uint32_t flags);

To track the VRRP virtual IP addresses, vrrpd will determine the special VRRP VNIC used by a specific VRRP router based on the VRID, the IP address family (IPv4 or IPv6) and the physical interface (including VLANs and aggregations) the router is created on, and will regard all the IP addresses configured over that VNIC as the virtual IP addresses associated with this VRRP router.

The existing PF_ROUTE event opcodes only report changes to the set of "UP" IP addresses, but vrrpd needs to track all the virtual IP addresses configured over each VRRP router, regardless of whether they are brought up. Therefore, two new PF_ROUTE event opcodes will be introduced to report changes to the IP address configuration, including IP addresses that have not been brought up:

- RTM_CHGADDR
  The RTM_CHGADDR event will be generated when an IP address is added to or updated in the configuration.
- RTM_FREEADDR
  The RTM_FREEADDR event will be generated when an IP address is removed from the configuration.
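To illustrate why RTM_CHGADDR/RTM_FREEADDR are needed, the following C sketch models how vrrpd might maintain the set of virtual IPv4 addresses on one VRRP VNIC. The event enum, vip_set_t and vip_apply() are hypothetical stand-ins; the real opcodes come from the routing socket headers, and the real daemon would decode the routing socket messages itself rather than receive pre-decoded events.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the routing socket opcodes. */
typedef enum {
	EV_NEWADDR,	/* like RTM_NEWADDR: an address was brought up */
	EV_DELADDR,	/* like RTM_DELADDR: an address was brought down */
	EV_CHGADDR,	/* like RTM_CHGADDR: an address was added or updated */
	EV_FREEADDR	/* like RTM_FREEADDR: an address was removed */
} addr_event_t;

#define MAX_VIPS 16

/* Virtual IPv4 addresses (host byte order) configured on one VRRP VNIC. */
typedef struct {
	uint32_t addr[MAX_VIPS];
	size_t   n;
} vip_set_t;

static int
vip_find(const vip_set_t *s, uint32_t a)
{
	for (size_t i = 0; i < s->n; i++) {
		if (s->addr[i] == a)
			return ((int)i);
	}
	return (-1);
}

/*
 * Apply one address event for the VNIC.  Membership follows only
 * CHGADDR/FREEADDR: a backup router's virtual addresses are configured
 * but not UP, so NEWADDR/DELADDR alone would miss them.
 */
void
vip_apply(vip_set_t *s, addr_event_t ev, uint32_t a)
{
	assert(s != NULL && s->n <= MAX_VIPS);

	if (a == 0)	/* unspecified address: not reported, ignore anyway */
		return;

	int i = vip_find(s, a);

	switch (ev) {
	case EV_CHGADDR:
		/* add if new; this sketch silently ignores overflow */
		if (i < 0 && s->n < MAX_VIPS)
			s->addr[s->n++] = a;
		break;
	case EV_FREEADDR:
		if (i >= 0)
			s->addr[i] = s->addr[--s->n];	/* swap-remove */
		break;
	case EV_NEWADDR:
	case EV_DELADDR:
		break;	/* up/down changes do not alter the configured set */
	}
}
```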
The message format of these routing socket events will be the same as that of the RTM_NEWADDR/RTM_DELADDR messages. Note that neither event will report unspecified (all-zero) IP addresses.

Further, vrrpd will also track all the "UP" IP addresses configured over the physical interface and select one as the primary IP address, which will be used to send VRRP advertisements.

The example below creates an IPv4 VRRP router (vrrp1) with VRID 12, priority 100 and false accept_mode over the bge1 data-link. The virtual IP address is 11.1.1.1/24, and the primary IP address used in the VRRP advertisements is 11.1.1.100. Note that the IP addresses in this example could be configured by IP configuration tools/services other than "ifconfig".

    # dladm create-vnic -m vrrp -V 12 -A inet -l bge1 vrrp_vnic1
    # vrrpadm create-router -V 12 -l bge1 -A inet -p 100 -o no_accept vrrp1
    # ifconfig vrrp_vnic1 plumb 11.1.1.1/24
    # ifconfig bge1 plumb 11.1.1.100/24 up

In some cases, the VRRP configuration is simple enough that the administrator would prefer a single administrative tool to configure everything related to a VRRP router. Therefore, we give the administrator the option to do so using only vrrpadm: the optional "-a" option can be used to specify a list of virtual IP addresses when creating a VRRP router, and the "-P" option can be used to specify the primary IP address used to send VRRP advertisement packets. Further, if the "-f" option is specified, vrrpadm will create/plumb the VNIC if that has not already been done:

    # vrrpadm create-router -V 12 -l bge1 -A inet -p 100 -o no_accept \
        -a 11.1.1.1/24 -P 11.1.1.100/24 -f vrrp1

In this case, vrrpadm will create and plumb the VNIC, configure the virtual IP address over it, and bring up the primary IP address over bge1. From this point on, the system will behave exactly as if the VNIC and virtual IP addresses had been configured by other tools/services.
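The VNIC created with "-m vrrp -V 12 -A inet" in the example above must carry the VRRP virtual router MAC address. Assuming the "vrrp" keyword assigns the standard address defined by RFC 5798 (00-00-5E-00-01-{VRID} for IPv4, 00-00-5E-00-02-{VRID} for IPv6), the C sketch below derives it, and also shows one way vrrpd might match a router to its VNIC by VRID, address family, and underlying link plus VLAN ID. vrrp_vnic_t, find_router_vnic() and the SKETCH_AF_* values are hypothetical; resolving a VLAN link such as vlan1 to its lower link and VLAN ID is assumed to have been done by the caller.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical address-family tags; not the system AF_* values. */
typedef enum { SKETCH_AF_INET4, SKETCH_AF_INET6 } sketch_af_t;

/*
 * RFC 5798 section 7.3 virtual router MAC: 00-00-5E-00-01-{VRID} for
 * IPv4 and 00-00-5E-00-02-{VRID} for IPv6.  buf must hold 18 bytes.
 */
void
vrrp_virtual_mac(uint8_t vrid, sketch_af_t af, char buf[18])
{
	assert(vrid >= 1);	/* valid VRIDs are 1-255 */
	(void) snprintf(buf, 18, "00:00:5e:00:%02x:%02x",
	    af == SKETCH_AF_INET4 ? 0x01u : 0x02u, (unsigned int)vrid);
}

/* Hypothetical record for one VRRP VNIC known to vrrpd. */
typedef struct {
	const char  *name;	/* e.g. "vrrp_vnic1" */
	const char  *link;	/* lower data-link, e.g. "bge1" */
	uint16_t     vid;	/* VLAN ID, 0 if none */
	uint8_t      vrid;
	sketch_af_t  af;
} vrrp_vnic_t;

/* Find the VNIC for a router on (link, vid) with the given VRID and family. */
const vrrp_vnic_t *
find_router_vnic(const vrrp_vnic_t *v, size_t n, uint8_t vrid,
    sketch_af_t af, const char *link, uint16_t vid)
{
	for (size_t i = 0; i < n; i++) {
		if (v[i].vrid == vrid && v[i].af == af && v[i].vid == vid &&
		    strcmp(v[i].link, link) == 0)
			return (&v[i]);
	}
	return (NULL);
}
```

Matching on the lower link and VLAN ID, rather than on the link name alone, is what lets a router created on vlan1 find a VNIC created with "-l bge1 -v 2".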
Note that the administrator has to unconfigure all the virtual IP addresses and delete the VNIC using other tools to completely clean up the VRRP configuration. More details of the vrrpadm and dladm configuration changes are discussed in [1] and [2].

- VLAN support

The administrative model described above allows the administrator to configure a VRRP router over a VLAN. Note that the special VRRP VNIC has to be created with the "-v" option, which specifies the VLAN ID. For example:

    # dladm create-vnic -m vrrp -V 14 -A inet -l bge1 -v 2 vrrp_vnic2
    # dladm create-vlan -l bge1 -v 2 vlan1
    # vrrpadm create-router -V 14 -l vlan1 -A inet -p 100 -o no_accept vrrp2
    # ifconfig vrrp_vnic2 plumb 12.1.1.1/24
    # ifconfig vlan1 plumb 12.1.1.100/24 up

- Exclusive zone support

In the new design, VNICs will no longer be created internally by the vrrpd daemon. Instead, the administrator can create the special VRRP VNIC in the global zone and assign it to the non-global zone where the VRRP router is configured. The VRRP service (svc:/network/vrrp:default) will be started in the non-global zone, and it will start the vrrpd daemon.

- Other miscellaneous changes

* Least privileges of the vrrpd daemon

In the VRRP SMF manifest, vrrpd will be set to run as the "root" user, and its privileges property will be set to include only the "basic" privilege set and the following privileges:

- priv_sys_config
  Required to post VRRP sysevents. Note that this privilege is only needed in the global zone, since sysevents are not supported in non-global zones.
- priv_net_rawaccess
  Required to hold open the physical data-link (which owns the primary IP address) and the VNIC (which owns the virtual IP addresses), to prevent them from being deleted.
- priv_net_icmpaccess
  Required to open the raw socket.
- priv_sys_ip_config
  Required to bring the virtual IP addresses up/down and to set the SO_VRRP socket option.

* solaris.network.vrrp authorization

A new solaris.network.vrrp authorization will be introduced and will be required to configure the VRRP service. Note that it will only be needed for "write" operations, not for "read-only" operations (e.g., "vrrpadm show-router"). The solaris.network.vrrp authorization will be added to the "Network Management" profile.

* Interaction with the existing IP address autoconfiguration tools

in.ndpd (IPv6 autoconfiguration) and dhcpagent (DHCP client) are the two major IP address autoconfiguration tools in Solaris. Because the master and backup VRRP routers (VNICs) share the same MAC address, this would confuse in.ndpd and dhcpagent and eventually cause unexpected results. Therefore, IPv6 autoconfiguration and DHCP configuration will not be supported over VRRP VNICs. If an administrator configures either IPv6 autoconfiguration or DHCP over a VRRP VNIC, then, since neither in.ndpd nor dhcpagent sets the SO_VRRP socket option, the attempt to bring up the autoconfigured IP address will fail, and so will the autoconfiguration operation.

An ongoing project (ipadm) will make static IPv6 address configuration and IPv6 autoconfiguration mutually exclusive on a given IPv6 interface. Since VRRP configuration requires static IP address configuration, this will prevent in.ndpd from attempting autoconfiguration over a VRRP VNIC in the first place.

* Service dependencies

The vrrpd daemon is started by the svc:/network/vrrp:default service, which depends on the svc:/network/physical service and the svc:/system/filesystem/usr service. The former is required to configure the data-links and IP addresses needed by the specific VRRP router, and the latter is required since the vrrpd, vrrpadm and libvrrpadm.so.1 binaries reside under /usr.
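The solaris.network.vrrp rule described above (write operations need the authorization, read-only ones do not) could be expressed as below. This is a hypothetical sketch: it assumes every subcommand whose name starts with "show-" is read-only, and has_vrrp_auth stands in for a real authorization check (on Solaris, for example, via chkauthattr(3SECDB)).

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical rule: subcommands named "show-..." are read-only; all
 * others (create-router, enable-router, disable-router, ...) modify state.
 */
bool
vrrpadm_needs_auth(const char *subcmd)
{
	assert(subcmd != NULL);
	return (strncmp(subcmd, "show-", 5) != 0);
}

/*
 * has_vrrp_auth stands in for a real check that the caller holds the
 * solaris.network.vrrp authorization.
 */
bool
vrrpadm_permitted(const char *subcmd, bool has_vrrp_auth)
{
	return (!vrrpadm_needs_auth(subcmd) || has_vrrp_auth);
}
```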
* Removing the protected service support

In the old VRRP design, there is a notion of protected services: an administrator can specify a set of SMF services to be protected by a VRRP router. To support that, vrrpd tracks the state of those SMF services, and when it finds a service offline or disabled, it changes the state of the VRRP router so that the router is no longer the master. Another backup router then becomes the master, and the SMF service can be started there.

After discussion with the team that would potentially consume this feature (the ILB team), we found that this is an incomplete solution that does not reduce the management complexity. Ideally, the HA configuration would consist of configuring the following on the owner of the virtual IP addresses:

- the virtual IP address(es)
- the VRRP VRID
- the IP addresses of the backup(s)
- the ILB server groups, rules, health checks (hc), etc.
- some security configuration (ssh keys) that allows automatic synchronization with the backups

with all other configuration, and the synchronization between master and backup, being automatic. The protected service support proposed by the old VRRP design does not help achieve this ideal, so we decided to remove it.
Interface Table
===============

Interface                           Classification         Comments
==================================  =====================  ==============================
IFF_NOACCEPT                        Committed              no_accept VRRP mode
IFF_VRRP                            Committed              VRRP virtual IP address
SO_VRRP                             Committed              VRRP control socket option
RTM_CHGADDR                         Committed              New PF_ROUTE event
RTM_FREEADDR                        Committed              New PF_ROUTE event
solaris.network.vrrp authorization  Committed              Authorization required by VRRP configuration
/usr/sbin/vrrpadm                   Committed              VRRP administration tool
vrrpadm show-router output          Uncommitted
dladm create-vnic -m vrrp           Committed              Create VRRP VNICs
dladm_vnic_create()                 Consolidation Private
VNIC_MAC_ADDR_TYPE_VRID             Consolidation Private  New vnic_mac_addr_type_t
MAC_CAPAB_VRRP                      Project Private        MAC layer capability
DL_CAPAB_VRRP                       Project Private        DLPI capability
svc:/network/vrrp:default           Project Private        The service that starts vrrpd
/usr/sbin/vrrpd                     Project Private        VRRP daemon
/usr/lib/libvrrpadm.so              Project Private        VRRP library
/etc/vrrp.conf                      Project Private

References
==========

[1] VRRP design specification
[2] vrrpadm(1M), vrrpd(1M), dladm(1M) manpages