I am sponsoring the following fasttrack for Srivijitha Dugganapalli,
with the timer set to 01/28/2009. Micro/patch binding is requested.
1. Introduction
1.1. Project/Component Working Name:
scsi_hba_pkt_comp(9F)
1.2. Name of Document Author/Supplier:
Author: Srivijitha Dugganapalli
1.3. Date of This Document:
18 January, 2009
4.0 Technical Description
4.1 Background
The scsi_pkt(9S) 'pkt_comp' field [1] is the completion callback
routine established by a SCSA target driver (for example, the sd
disk driver) before calling scsi_transport(9F) to execute a
scsi_pkt(9S). Currently, the 'pkt_comp' completion callback is
invoked via
(*pkt->pkt_comp)(pkt);
by the SCSA HBA driver - transferring control directly from the HBA
driver back to the target driver.
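The direct-callback model described above can be illustrated with a
small user-space sketch. This is not kernel code: 'struct
mock_scsi_pkt' is a hypothetical stand-in for scsi_pkt(9S) that keeps
only three field names from the real structure, and the mock_*
functions and MOCK_CMD_CMPLT are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical user-space stand-in for scsi_pkt(9S). Only the field
 * names pkt_comp, pkt_reason and pkt_private mirror the real structure.
 */
struct mock_scsi_pkt {
    void (*pkt_comp)(struct mock_scsi_pkt *);
    unsigned char pkt_reason;
    void *pkt_private;
};

#define MOCK_CMD_CMPLT 0    /* illustrative "completed normally" value */

/* Target driver side: the completion callback stored in pkt_comp. */
void
mock_target_intr(struct mock_scsi_pkt *pkt)
{
    int *completed = pkt->pkt_private;

    (*completed)++;
}

/*
 * HBA driver side, current model: fill in completion information,
 * then transfer control directly to the target driver.
 */
void
mock_hba_complete_direct(struct mock_scsi_pkt *pkt)
{
    assert(pkt != NULL);
    pkt->pkt_reason = MOCK_CMD_CMPLT;
    if (pkt->pkt_comp != NULL)
        (*pkt->pkt_comp)(pkt);
}
```

In this model the SCSA framework is not on the completion path at all,
which is why it cannot observe or redistribute completions.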
4.2 Problem
We currently have two issues related to completion of a
scsi_pkt(9S):
1. With SCSI, changes in lun configuration are communicated
in-band via special status information [2]. Solaris needs to
observe completed commands, and detect the in-band lun level
hotplug events to trigger dynamic lun reconfiguration.
2. Having an interrupt thread completely process the entire
'pkt_comp' code path serially can cause scalability problems.
Implementing asynchronous 'pkt_comp' interrupt processing with
fanout to threads running on other cpus has been shown to help
performance significantly under heavy load.
4.3 Proposal
Both of these issues can be addressed effectively in common SCSA
framework code by introducing a new public scsi_hba_pkt_comp(9F)
function. To use scsi_hba_pkt_comp(9F), the changes to an HBA
driver are minimal: instead of
(*pkt->pkt_comp)(pkt);
the HBA driver does
scsi_hba_pkt_comp(pkt);
In addition to the problems mentioned above, the SCSA framework
support for the tran_setup_pkt(9E) interfaces introduced by
PSARC/2006/240 [3] can be simplified by requiring HBA drivers that
use tran_setup_pkt(9E) to also use the new scsi_hba_pkt_comp(9F)
function. This is '3' below. We currently have a small number of
HBA drivers in ON (adpu320, fcp, mpt, blk2scsa, and pmcs (under
development)) that must be modified to comply with this new
requirement. While tran_setup_pkt(9E) is public, it has not been
backported to S10. Since nevada has not been 'released', adding new
requirements to tran_setup_pkt(9E) is permissible.
The proposed scsi_hba_pkt_comp(9F) function will allow the SCSA
framework to observe completion and perform completion
synchronization and optimization.
1. Hotplug: The SCSA framework can observe lun hotplug events. Lun
level hotplug events are reported in-band via scsi check
conditions with specific ASC/ASCQ values that indicate a
'rescan' of the lun configuration is required. This information
is required for SCSA to perform dynamic lun reconfiguration.
2. Fanout: The SCSA framework can implement completion fanout,
transparent to the SCSA HBA and target drivers.
3. PSARC/2006/240 implementation: Perform DMA sync prior to
'pkt_comp' inside implementation of new scsi_hba_pkt_comp(9F).
This eliminates the 'pcw_orig_comp' interpose currently used in
scsi_transport().
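As a rough illustration of the three points above, the following
user-space sketch shows one way a framework-level completion routine
could observe, sync, and then call back. It is an assumption-laden
mock, not the proposed implementation: 'struct mock_scsi_pkt', its
'check_condition', 'asc', 'ascq' and 'dma_synced' fields, and all
mock_* names are hypothetical. Only the 3F/0E value is taken from
reference [2].

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for scsi_pkt(9S) with pre-decoded sense data. */
struct mock_scsi_pkt {
    void (*pkt_comp)(struct mock_scsi_pkt *);
    int check_condition;        /* hypothetical: status was CHECK CONDITION */
    unsigned char asc;          /* hypothetical: decoded ASC */
    unsigned char ascq;         /* hypothetical: decoded ASCQ */
    int dma_synced;             /* hypothetical: stands in for a DMA sync */
};

/* 3F/0E: REPORTED LUNS DATA HAS CHANGED */
#define MOCK_ASC_LUNS_CHANGED   0x3F
#define MOCK_ASCQ_LUNS_CHANGED  0x0E

int mock_rescan_requests;       /* counts observed lun hotplug events */
int mock_saw_synced;            /* set by the callback if sync came first */

/* Target driver callback: records whether the sync preceded it. */
void
mock_target_comp(struct mock_scsi_pkt *pkt)
{
    mock_saw_synced = pkt->dma_synced;
}

void
mock_scsi_hba_pkt_comp(struct mock_scsi_pkt *pkt)
{
    assert(pkt != NULL);

    /* 1. Hotplug: observe the in-band "rescan required" indication. */
    if (pkt->check_condition &&
        pkt->asc == MOCK_ASC_LUNS_CHANGED &&
        pkt->ascq == MOCK_ASCQ_LUNS_CHANGED)
        mock_rescan_requests++;

    /* 3. Perform the DMA sync before the target driver sees the data. */
    pkt->dma_synced = 1;

    /* 2. A fanout implementation could hand off here; this mock does not. */
    if (pkt->pkt_comp != NULL)
        (*pkt->pkt_comp)(pkt);
}
```

Because all three steps sit behind one entry point, neither the HBA
driver nor the target driver needs to change when one of them is added
or reworked.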
The initial putback will deliver the new scsi_hba_pkt_comp(9F)
interface, and use the new interface for '3'. The initial putback
enables subsequent delivery of hotplug observation and completion
fanout inside SCSA. The 'hotplug' use case is related to SCSAv3
lun-level hotplug support - which is actively being worked on. The
'fanout' use case is unplanned, but HBA drivers are already
starting to use non-DDI interfaces to solve this type of problem
[5] in invasive ways.
The performance overhead associated with scsi_hba_pkt_comp(9F) is
expected to be minimal, and the future delivery of completion
fanout is expected to result in improved performance.
A call to the scsi_hba_pkt_comp() function will always result in a
call to the 'pkt_comp' callback function defined in the
scsi_pkt(9S). This 'pkt_comp' callback may, however, occur after
return from scsi_hba_pkt_comp() and may occur from a different
thread executing on a different CPU.
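The deferred-callback semantics described above can be shown with a
single-threaded user-space sketch. It assumes a simple queue drained
by a separate worker step. The queue, the 'next' linkage, and all
mock_* names are hypothetical and say nothing about how a real fanout
implementation would be built.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for scsi_pkt(9S) with a queue link added. */
struct mock_scsi_pkt {
    void (*pkt_comp)(struct mock_scsi_pkt *);
    void *pkt_private;
    struct mock_scsi_pkt *next;     /* hypothetical queue linkage */
};

static struct mock_scsi_pkt *mock_head;
static struct mock_scsi_pkt *mock_tail;

/* Target driver callback: marks the request as done. */
void
mock_target_comp(struct mock_scsi_pkt *pkt)
{
    int *done = pkt->pkt_private;

    *done = 1;
}

/* Deferred variant: queue the packet and return without calling back. */
void
mock_scsi_hba_pkt_comp(struct mock_scsi_pkt *pkt)
{
    assert(pkt != NULL);
    pkt->next = NULL;
    if (mock_tail != NULL)
        mock_tail->next = pkt;
    else
        mock_head = pkt;
    mock_tail = pkt;
}

/*
 * Stands in for another thread: invokes every queued 'pkt_comp'
 * callback in arrival order and returns how many packets it handled.
 */
int
mock_completion_worker(void)
{
    int handled = 0;

    while (mock_head != NULL) {
        struct mock_scsi_pkt *pkt = mock_head;

        mock_head = pkt->next;
        if (mock_head == NULL)
            mock_tail = NULL;
        if (pkt->pkt_comp != NULL)
            (*pkt->pkt_comp)(pkt);
        handled++;
    }
    return (handled);
}
```

In this sketch the callback has not yet run when
mock_scsi_hba_pkt_comp() returns, so a caller cannot assume that
completion processing has finished at that point.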
With this case:
o HBA drivers that implement tran_setup_pkt(9E) must use
scsi_hba_pkt_comp(9F).
Failure to comply will result in missed dma_sync operations.
Man page changes covering this new requirement are provided
below.
o HBA drivers that use SCSI_HBA_ADDR_COMPLEX [4] must use
scsi_hba_pkt_comp(9F).
Failure to comply will result in improper lun-level hotplug
behavior.
Currently SCSI_HBA_ADDR_COMPLEX is private. When it is promoted,
man page changes describing the scsi_hba_pkt_comp(9F)
dependency will be provided.
o Other HBA drivers can switch to using scsi_hba_pkt_comp() at
any time.
Switching early should enable immediate benefit when completion
fanout is implemented.
During internal discussion, the desire for a vector fanout
interface, called something like scsi_hba_pkt_comp_vec(9F), was
raised. This interface would take a vector of scsi_pkt(9S)s to
complete. While this is a good idea, it is not this case. Such an
interface should be investigated by those delivering a real
'fanout' implementation. At that time, careful consideration
should also be given to ensuring that the fanout implementation
introduces no semantic hazards related to completion order.
4.4 Interfaces:
------------------------------------------------------------------
Interface Name          Comm.Level   Comments
------------------------------------------------------------------
scsi_hba_pkt_comp(9F)   Committed    use instead of
                                     (*pkt->pkt_comp)(pkt);
Prototype:
void scsi_hba_pkt_comp(struct scsi_pkt *pkt);
4.5 Man page changes:
New scsi_hba_pkt_comp(9F) manpage required, see Appendix A.
Modified scsi_pkt(9S) and tran_setup_pkt(9E), see Appendix B and C.
4.6 Release Binding
Micro/patch binding is requested.
4.7 References
[1] scsi_pkt(9S)
http://docs.sun.com/app/docs/doc/816-5181/scsi-pkt-9s?a=view
[2] http://www.t10.org/lists/asc-num.htm
From the above table, the following ASC/ASCQ value indicates lun hotplug:
3F/0E DTLPWROMAE REPORTED LUNS DATA HAS CHANGED
[3] PSARC/2006/240 scsa dma enhancement
http://sac.eng/PSARC/2006/240
http://www.opensolaris.org/os/community/arc/caselog/2006/240
[4] PSARC/2008/675 SCSI_HBA_ADDR_COMPLEX SCSA data structure linkage
http://sac.eng/PSARC/2008/675
http://www.opensolaris.org/os/community/arc/caselog/2008/675
[5] 6784459 mpt driver needs to deliver competitive IO throughput for SSD
http://onnv.sfbay/log/nv/2009/01/14.brian.xu.560cd237cd72/webrev/
http://monaco.sfbay.sun.com/detail.jsf?cr=6784459
Appendix A: scsi_hba_pkt_comp(9F) man page
Kernel Functions for Drivers scsi_hba_pkt_comp(9F)
NAME
scsi_hba_pkt_comp - scsi_pkt(9S) completion routine
SYNOPSIS
#include <sys/scsi/scsi.h>
void scsi_hba_pkt_comp(struct scsi_pkt *pkt);
INTERFACE LEVEL
Solaris DDI specific (Solaris DDI).
PARAMETERS
pkt Pointer to a scsi_pkt(9S) structure.
DESCRIPTION
After filling in scsi_pkt(9S) fields with packet completion
information, an HBA driver should call the scsi_hba_pkt_comp()
function. The scsi_hba_pkt_comp() function is the recommended
way for an HBA driver to signal completion of a scsi_pkt(9S).
Use is mandatory for HBA drivers that use tran_setup_pkt(9E).
Calling the scsi_hba_pkt_comp() function allows SCSA to observe,
and possibly react to the completion of a scsi_pkt(9S) request.
A call to the scsi_hba_pkt_comp() function will always result in
a call to the 'pkt_comp' callback function defined in the
scsi_pkt(9S). This 'pkt_comp' callback may, however, occur after
return from scsi_hba_pkt_comp(), and may occur from a different
thread executing on a different CPU.
CONTEXT
The scsi_hba_pkt_comp() function can be called from user,
interrupt, or kernel context.
ATTRIBUTES
See attributes(5) for descriptions of the following
attributes:
_____________________________________________________________
| ATTRIBUTE TYPE              | ATTRIBUTE VALUE             |
|_____________________________|_____________________________|
| Interface Stability         | Committed                   |
|_____________________________|_____________________________|
SEE ALSO
scsi_pkt(9S), tran_setup_pkt(9E)
NOTE:
An HBA driver calls scsi_hba_pkt_comp() instead of calling
the scsi_pkt(9S) 'pkt_comp' callback directly.
Appendix B: scsi_pkt(9S) manpage diffs
diff -U 2 scsi_pkt.txt scsi_pkt.new.txt
--- scsi_pkt.txt Thu Dec 11 14:13:13 2008
+++ scsi_pkt.new.txt Sat Jan 17 21:43:30 2009
@@ -358,5 +358,13 @@
SEE ALSO
tran_init_pkt(9E), scsi_arq_status(9S), scsi_init_pkt(9F),
- scsi_transport(9F), scsi_status(9S)
+ scsi_transport(9F), scsi_status(9S), scsi_hba_pkt_comp(9F)
+NOTE
+ HBA drivers should signal scsi_pkt(9S) completion by
+ calling scsi_hba_pkt_comp(9F). This is mandatory for
+ HBA drivers that implement tran_setup_pkt(9E); failure
+ to comply results in undefined behavior.
Appendix C: tran_setup_pkt(9E) manpage diffs
diff -U 2 tran_setup_pkt.txt tran_setup_pkt.new.txt
--- tran_setup_pkt.txt Sat Jan 17 20:53:46 2009
+++ tran_setup_pkt.new.txt Sat Jan 17 21:32:58 2009
@@ -141,4 +141,10 @@
tor.
+ HBA drivers that implement tran_setup_pkt() must signal
+ scsi_pkt(9S) completion by calling scsi_hba_pkt_comp(9F) - direct
+ use of the scsi_pkt(9S) 'pkt_comp' field is not permitted, and
+ results in undefined behavior.
RETURN VALUES
tran_setup_pkt() must return zero on success, and -1 on
@@ -151,5 +157,6 @@
scsi_hba_attach(9F), scsi_hba_pkt_alloc(9F),
scsi_hba_pkt_free(9F), scsi_init_pkt(9F), buf(9S),
- scsi_address(9S), scsi_hba_tran(9S), scsi_pkt(9S)
+ scsi_address(9S), scsi_hba_tran(9S), scsi_pkt(9S),
+ scsi_hba_pkt_comp(9F)
6. Resources and Schedule
6.4. Steering Committee requested information
6.4.1. Consolidation C-team Name:
ON
6.5. ARC review type: FastTrack
6.6. ARC Exposure: open