I'm sponsoring this fasttrack for Haik Aftandilian. This is an Open case seeking Patch binding (for backport to S10). Timeout on 11/05/2009.
A copy of the proposed contract is in the case directory.

Dan

Template Version: @(#)sac_nextcase 1.68 02/23/09 SMI
This information is Copyright 2009 Sun Microsystems
1. Introduction
    1.1. Project/Component Working Name:
         LDOM-SunCluster suspend callbacks
    1.2. Name of Document Author/Supplier:
         Author: Haik Aftandilian
    1.3. Date of This Document:
         29 October, 2009
4. Technical Description

Introduction
------------
Solaris Cluster (SC) runs in LDoms guest domains and provides infrastructure to make applications deployed in the guest domains highly available (HA). SC cluster nodes monitor each other via so-called heartbeats, as well as the associated hardware. SC manages storage devices in a way that is specific to the server it is running on, such as performing SCSI reservations, which are meaningful only from a specific physical HBA.

LDoms guest domains can be migrated from one server to another with the LDoms "Warm Migration" feature. During the migration, the domain being migrated is suspended. While a domain is suspended, which can last several minutes, the domain is totally inactive and does not respond to any requests.

Thus, if a domain running SC is migrated from one server to another, the other cluster nodes need to be made aware of this so that they can suspend monitoring of the cluster node under migration. Additionally, the cluster nodes need to act cooperatively to make sure that the SCSI reservations on storage devices are also migrated correctly. The proposed callbacks would allow SC to perform these tasks, thereby enabling a migration of the LDoms guest domain that is seamless from the end user's perspective.

References
----------
1. Suspend Domain Service
   http://sac.eng/Archives/CaseLog/arc/FWARC/2009/559/
2. Current list of Sun Cluster/ON contract cases in use in Solaris 10.
   /ws/osc-gate/usr/src/uts/sparc/cl/imported_symbols.private.Sol_10
3. Example of an existing Sun Cluster/ON contract case.
   http://sac.eng/Archives/CaseLog/arc/PSARC/2005/602/

Overview
--------
In the Solaris kernel, hook (callback) functions will be run before and after the domain is suspended. A single callback will be made to SC before the suspension, and a single callback will be made after the resumption.

Note that "suspend" in this contract refers only to suspend operations initiated by the LDoms infrastructure, using the suspend domain service, on sun4v guest domains. Today, such suspend operations are performed only to facilitate LDoms domain migration. They are entirely distinct from CPR suspend operations.

Commitment level for all the interfaces: Contracted Project Private

Interface Details
-----------------
When SC is loaded and wishes to receive suspend notifications, it will set the callback function pointers to point to the SC functions that handle the notifications. When setting these callbacks, SC should set cl_suspend_error_decode first, then cl_suspend_post_callback, and then cl_suspend_pre_callback.

cl_suspend_pre_callback and cl_suspend_post_callback will never be invoked concurrently, and Solaris will wait indefinitely for the callbacks to return.

Pre-suspend callback:

    int (*cl_suspend_pre_callback)(void);

Called before the domain is suspended. This serves to notify SC that this domain is in the process of being suspended. SC should return 0 if it successfully suspended monitoring of this domain. If a failure occurred that should prevent the guest domain from being suspended (and possibly migrated), or if SC cannot support a migration at this time, SC should return a non-zero error code. If cl_suspend_pre_callback returns an error code, the suspension will be aborted. The intent is for this error to be sent back to the domain manager and then used to build an error message informing the user why the migration could not be completed.
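The registration order and the pre-suspend abort path can be illustrated with a minimal userland C sketch. Only the three callback pointer names come from this contract; sc_register_suspend_callbacks(), suspend_domain_sketch(), the sc_* handlers, sc_busy and SC_EBUSY are hypothetical, the actual suspend and migration step is elided, and real kernel code would additionally need memory barriers so that the pointer stores become visible to other CPUs in the stated order. This is not the Solaris implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * The three callback pointers named in the contract. In Solaris these
 * would be kernel globals; here they are userland globals (assumption).
 */
int (*cl_suspend_pre_callback)(void) = NULL;
int (*cl_suspend_post_callback)(void) = NULL;
const char *(*cl_suspend_error_decode)(int) = NULL;

/* Hypothetical SC handlers and error code, for illustration only. */
#define SC_EBUSY 1
static int sc_busy = 0;   /* hypothetical: SC cannot migrate right now */

static int sc_pre(void) { return sc_busy ? SC_EBUSY : 0; }
static int sc_post(void) { return 0; }
static const char *sc_decode(int err)
{
    return err == SC_EBUSY ? "cluster cannot support a migration now" : NULL;
}

/*
 * Hypothetical SC registration, in the order the contract requires:
 * decode, then post, then pre. Real kernel code would also need memory
 * barriers so that other CPUs observe the stores in this order.
 */
void sc_register_suspend_callbacks(void)
{
    cl_suspend_error_decode = sc_decode;
    cl_suspend_post_callback = sc_post;
    cl_suspend_pre_callback = sc_pre;
}

/*
 * Hypothetical Solaris-side suspend path. Returns 0 if the suspend went
 * ahead, or the SC error code if the pre callback aborted it; any
 * decoded message is copied into msg.
 */
int suspend_domain_sketch(char *msg, size_t msglen)
{
    /* Read each pointer once; pre first, since it is registered last. */
    int (*pre)(void) = cl_suspend_pre_callback;
    int (*post)(void) = cl_suspend_post_callback;
    const char *(*decode)(int) = cl_suspend_error_decode;
    int err = 0;

    msg[0] = '\0';
    if (pre != NULL)
        err = pre();                 /* Solaris waits for this to return */

    if (err == 0) {
        /* ... suspend, (possibly) migrate and resume would happen here ... */
    } else if (decode != NULL) {
        const char *s = decode(err); /* may legitimately return NULL */
        if (s != NULL) {
            assert(strlen(s) < 256); /* contract: <= 256 bytes with NUL */
            snprintf(msg, msglen, "%s", s);
        }
    }

    /* Post runs on success, failure or abort, possibly without pre. */
    if (post != NULL)
        (void) post();               /* its error reporting is not shown */
    return err;
}
```

With no callbacks registered, the sketch simply proceeds and returns 0. After sc_register_suspend_callbacks() and with sc_busy set, it returns SC_EBUSY and msg holds the decoded string, so the suspension is aborted as described above. Reading cl_suspend_pre_callback before the other two pointers means that, given the required store order, a non-NULL pre callback implies the post and decode callbacks are also visible.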
Post-suspend callback:

    int (*cl_suspend_post_callback)(void);

Called after the guest domain has been resumed following a successful suspension. It is also called after a failed or canceled suspension attempt. That is, this function may be called when the guest domain was suspended and then resumed without being migrated (as a result of a failure or cancellation). It can also be called when the guest domain was never suspended (due to a failure before the suspension) and therefore never migrated. Furthermore, because the pre callback is set after the post callback, if the callbacks are set while a suspend operation is already in progress, this function may be called after a suspension even though cl_suspend_pre_callback was not called. Therefore, SC should not consider it an error if cl_suspend_post_callback is called without a preceding call to cl_suspend_pre_callback.

SC should return 0 if it successfully resumed monitoring of this domain. If a failure occurred that prevents the guest domain from resuming normal activity in the cluster, a non-zero error value should be returned. The error will be sent back to the domain manager, which will display an error message informing the user that an error occurred after the migration and that manual inspection and recovery may be required for the domain to resume normal operation. In this case the domain will already have been resumed, and Solaris and applications will be running.

Error code decode callback:

    const char *(*cl_suspend_error_decode)(int);

Called at any time to convert an error code returned from cl_suspend_pre_callback or cl_suspend_post_callback into a descriptive error string suitable for use in an error message presented to the user. Returns a NUL-terminated, statically allocated string whose length is less than or equal to 256 bytes, including the NUL terminator.
The caller will consider this string immutable and will not modify or deallocate it. This function may return NULL.

When cl_suspend_pre_callback or cl_suspend_post_callback returns an error, cl_suspend_error_decode will be used to obtain an error message string that corresponds to the error. That is, cl_suspend_error_decode will be called with an error code returned from cl_suspend_pre_callback or cl_suspend_post_callback as its argument.

6. Resources and Schedule
    6.4. Steering Committee requested information
        6.4.1. Consolidation C-team Name:
               ON
    6.5. ARC review type: FastTrack
    6.6. ARC Exposure: open
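The SC side of the post-suspend and error-decode contracts in section 4 can be illustrated with the following minimal C sketch. Everything in it is hypothetical: the handler names, the error codes and their values, the messages, and the sc_stop_monitoring()/sc_resume_monitoring() stand-ins for real SC work are illustrative assumptions, not SC's actual implementation. It shows two points from the contract: an unpaired post call is treated as "nothing to resume" rather than as an error, and the decode function returns static, immutable strings (or NULL) within the 256-byte bound.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical SC error codes; the contract leaves their values to SC. */
#define SC_OK            0
#define SC_ERR_NOT_READY 1   /* returned by the pre callback */
#define SC_ERR_RESUME    2   /* returned by the post callback */

/* Hypothetical SC state: has monitoring of this node been suspended? */
static int sc_monitoring_suspended = 0;

/* Hypothetical stand-ins for the real SC work. */
static int sc_stop_monitoring(void) { return SC_OK; }
static int sc_resume_monitoring(void) { return SC_OK; }

int sc_pre_suspend(void)
{
    if (sc_stop_monitoring() != SC_OK)
        return SC_ERR_NOT_READY;    /* aborts the suspension */
    sc_monitoring_suspended = 1;
    return SC_OK;
}

/*
 * May be called without a preceding sc_pre_suspend() (failure before
 * the suspend, or registration racing a suspend), so an unpaired call
 * is treated as "nothing to resume" rather than as an error.
 */
int sc_post_suspend(void)
{
    if (!sc_monitoring_suspended)
        return SC_OK;
    if (sc_resume_monitoring() != SC_OK)
        return SC_ERR_RESUME;       /* user is told manual recovery may be needed */
    sc_monitoring_suspended = 0;
    return SC_OK;
}

/* Returns a static, immutable string, or NULL for unknown codes. */
const char *sc_error_decode(int err)
{
    static const char *const msgs[] = {
        NULL,                                            /* SC_OK */
        "Solaris Cluster cannot support a migration at this time",
        "Solaris Cluster could not resume monitoring of this node",
    };
    const char *s;

    if (err < 0 || (size_t)err >= sizeof(msgs) / sizeof(msgs[0]))
        return NULL;                /* NULL is permitted by the contract */
    s = msgs[err];
    /* Contract: at most 256 bytes including the NUL terminator. */
    assert(s == NULL || strlen(s) < 256);
    return s;
}
```

A post call that arrives first returns SC_OK without touching any state; a normal pre/post pair sets and then clears sc_monitoring_suspended. Keeping the messages in a static table means the caller can hold on to the returned pointer indefinitely, which matches the requirement that the string be statically allocated and never freed.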