Template Version: @(#)sac_nextcase 1.68 02/23/09 SMI
This information is Copyright 2009 Sun Microsystems
1. Introduction
    1.1. Project/Component Working Name:
         bd - generic block device driver
    1.2. Name of Document Author/Supplier:
         Author:  Garrett D'Amore
    1.3. Date of This Document:
         29 November, 2009
4. Technical Description

Background
----------

There are a number of storage devices which present a simple
block-oriented architecture, but which are not truly SCSI devices.
Examples of such devices are various flash media (e.g. SDcard, CF, and
Memory Stick) and more recently storage adapters like the DDRdrive X1
(www.ddrdrive.com).  These devices are not natively SCSI, and do not
understand the SCSI command set on their own.

As part of PSARC 2007/654, we introduced a translation layer
(blk2scsa) which processes SCSI packets and allows these devices to be
presented on a logical SCSI bus so that they can be supported by
sd(7D).  While this approach has met with some success, experience
has shown that it adds significant complexity to the system, with
consequent impacts on performance, diagnosability, and
maintainability.  Creating a SCSI packet (done by sd(7D)) only to
parse it in software later in the HBA is inefficient.

We would therefore like to introduce a new block device driver (bd), to
be used instead of blk2scsa, in order to simplify the system and
increase performance, with fewer total lines of supporting code.

Because we might in the future like to offer support for some of
these storage adapters on Solaris 10, we are requesting Patch binding,
although we have no specific plans to backport at this time.

Architecture
------------

The "bd" driver will be used as a block-oriented device driver for devices
that need general block device support.

Adapter device drivers will depend on this driver (-N drv/bd) and,
using functions supplied by it (described below in "Block DDI"),
act as nexus drivers with bd leaves.

bd itself supports labeling by importing the cmlb common labeling code,
so these devices can support all of the same labeling conventions as
magnetic SCSI disk devices.   bd also supports the necessary dkio(7I)
ioctls.

Additionally, bd exports a new controller type in the dk_cinfo structure
(used with DKIOCINFO), DKC_BD (#defined to value 24 in our current prototype,
although this may change if the value is used by another project before we
integrate).  This new controller type is used to enable the use of a new
plugin for libsmedia, sm_bd.so.1, which provides basic functionality for
bd targets.

bd has support for breaking large transfers up into smaller ones using
partial DMA, or even for PIO style devices, so that adapter drivers need
not concern themselves with this particular complexity.
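
As a rough illustration of the splitting described above, the
following userland C sketch divides a request into chunks no larger
than a maximum transfer size.  It is not the bd implementation: the
chunk_t type and split_transfer() function are hypothetical names
introduced here, and the sketch ignores DMA windows entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define	BLK_SIZE	512u	/* fixed logical block size assumed by bd */

/* Hypothetical description of one chunk handed to the adapter. */
typedef struct chunk {
	uint64_t	c_blkno;	/* starting logical block */
	uint64_t	c_nblks;	/* number of blocks in this chunk */
} chunk_t;

/*
 * Split a request of nblks blocks starting at blkno into chunks no
 * larger than maxxfer bytes.  A maxxfer of 0 means "no limit".
 * Fills in at most maxchunks entries of out[] and returns the number
 * of chunks the request needs (which may exceed maxchunks).
 */
size_t
split_transfer(uint64_t blkno, uint64_t nblks, uint32_t maxxfer,
    chunk_t *out, size_t maxchunks)
{
	uint64_t per = (maxxfer == 0) ? nblks : maxxfer / BLK_SIZE;
	size_t n = 0;

	/* A non-zero limit must be able to hold at least one block. */
	assert(maxxfer == 0 || per > 0);

	while (nblks > 0) {
		uint64_t len = (nblks < per) ? nblks : per;

		if (n < maxchunks) {
			out[n].c_blkno = blkno;
			out[n].c_nblks = len;
		}
		n++;
		blkno += len;
		nblks -= len;
	}
	return (n);
}
```

With a 4096 byte limit, a 20 block request starting at block 100
becomes three chunks of 8, 8, and 4 blocks.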

bd manages DMA mapping (if required) on behalf of the adapter driver,
providing a scatter/gather list of DMA cookies to the driver as part
of each job (if the adapter driver supports DMA.)


Assumptions and Limitations
---------------------------

bd targets may be hotpluggable, and may be removable.  There is no
support in this integration for door lock, media load, or ejection
mechanisms.

bd targets are assumed to have linear addressability, and a fixed 512
byte block size.  (Adapter drivers that require a different native
block size may use read-modify-write if necessary.)  bd is not
optimized for rotating media.  (sd, ssd, and cmdk are better suited to
such media.)
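
The read-modify-write approach mentioned above might look roughly
like the following userland sketch, which assumes a hypothetical
device with a 4096 byte native block.  The media[] array and the
native_read(), native_write(), and write_logical_block() functions
are stand-ins invented for illustration; a real adapter driver would
issue hardware commands instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define	LBLK	512u		/* logical block size presented to bd */
#define	NBLK	4096u		/* hypothetical native block size */
#define	RATIO	(NBLK / LBLK)	/* logical blocks per native block */

/* Hypothetical in-memory stand-in for media with 4 native blocks. */
static uint8_t media[4 * NBLK];

void
native_read(uint64_t nblkno, uint8_t *buf)
{
	memcpy(buf, &media[nblkno * NBLK], NBLK);
}

void
native_write(uint64_t nblkno, const uint8_t *buf)
{
	memcpy(&media[nblkno * NBLK], buf, NBLK);
}

/* Write one 512 byte logical block using read-modify-write. */
void
write_logical_block(uint64_t lblkno, const uint8_t *data)
{
	uint8_t tmp[NBLK];
	uint64_t nblkno = lblkno / RATIO;
	size_t off = (size_t)(lblkno % RATIO) * LBLK;

	assert(nblkno < sizeof (media) / NBLK);

	native_read(nblkno, tmp);	/* read the enclosing block */
	memcpy(tmp + off, data, LBLK);	/* modify only the target range */
	native_write(nblkno, tmp);	/* write the whole block back */
}
```

Only the 512 bytes belonging to the logical block change; the rest of
the native block is written back unmodified.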

bd supports devices with an arbitrarily deep queue size, although the
queue size itself is a fixed value determined at device registration.
This allows for a very simple flow control model.
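
One way to picture such a flow control model is a pair of counters,
as in the sketch below.  This is only an assumption about how a fixed
queue depth could be enforced, not a description of bd internals; the
flow_t type and the flow_*() functions are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-target accounting; not the actual bd soft state. */
typedef struct flow {
	uint32_t	f_qsize;	/* fixed depth, set at registration */
	uint32_t	f_active;	/* jobs currently owned by the adapter */
	uint32_t	f_waiting;	/* jobs held back, in arrival order */
} flow_t;

void
flow_init(flow_t *f, uint32_t qsize)
{
	assert(qsize > 0);
	f->f_qsize = qsize;
	f->f_active = 0;
	f->f_waiting = 0;
}

/*
 * A new job arrives.  Returns true if it may be handed to the adapter
 * immediately, or false if it must wait behind earlier jobs.
 */
bool
flow_submit(flow_t *f)
{
	if (f->f_waiting == 0 && f->f_active < f->f_qsize) {
		f->f_active++;
		return (true);
	}
	f->f_waiting++;
	return (false);
}

/*
 * A job completes.  Returns true if the oldest waiting job was
 * promoted into the slot that just became free.
 */
bool
flow_done(flow_t *f)
{
	assert(f->f_active > 0);
	f->f_active--;
	if (f->f_waiting > 0) {
		f->f_waiting--;
		f->f_active++;
		return (true);
	}
	return (false);
}
```

Because the depth never changes after registration, no negotiation
with the adapter is needed at submission time.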

bd provides no reprioritization.  Jobs are submitted to the adapter
device driver in the order received.  However, they may be completed
by the adapter driver in any order that is convenient for the adapter.

bd lacks support for request cancellation.  Once a job is submitted,
it either completes or fails.

bd lacks support for configurable timeouts.  Once a job is submitted,
it stays in the queue until it is serviced by the adapter driver.  The
adapter driver may elect to use a watchdog mechanism to provide
timeouts at its own discretion.  It is responsible for choosing an
appropriate value for the timeout, if any is used.

bd lacks any support in this integration for write cache management.
If the adapter has a write cache, the adapter driver is wholly
responsible for managing it "reasonably" and safely.

bd assumes that if an adapter supports multiple bd targets, a simple
integer index is sufficient to address each one.

bd assumes the adapter driver will manage suspend/resume safely with
respect to job submission.  bd takes no special actions on suspend or
resume -- that's up to the adapter driver to manage.

bd assumes that adapter devices are able to manage their own power
without the need for help from the framework.  Since current bd media
have negligible startup costs (no spin-up time), this is easy enough.
(Nothing prevents a driver with a higher startup cost from making use
of the power(9e) framework to reduce thrashing on spin-up or
spin-down.)

The current prototype has an API for supporting hosting of crash
dumps, but does not yet implement the dump(9e) support required.
This work may not be completed before integration.


Consumers
---------

Initially, the "bd" prototype will deliver with a separate driver for
the DDRdrive X1 solid state storage device.  That driver will be
discussed in a separate PSARC case of its own which will depend on
this case.

As part of our prototype, we have also converted the SDcard memory
card support to use bd instead of blk2scsa, which could potentially
allow blk2scsa itself to be EOF'd.  This effort will be discussed in a
separate case as well, which will depend on this one.


Block DDI
---------

The following describes the DDI used by block device drivers.
The following header must be included by all bd adapter drivers.

        #include <sys/bd.h>

The following type is exposed to adapter drivers, and represents an
opaque handle for a bd_target device.

        typedef struct bd_handle *bd_handle_t;  /* opaque */

Adapter driver entry points are supplied via the following structure:

        typedef struct bd_ops {
                int     o_version;
                void    (*o_drive_info)(void *, bd_drive_t *);
                int     (*o_media_info)(void *, bd_media_t *);
                int     (*o_read)(void *, bd_xfer_t *);
                int     (*o_write)(void *, bd_xfer_t *);
                int     (*o_dump)(void *, bd_xfer_t *);
        } bd_ops_t;

This structure is supplied by the adapter during handle allocation
(see bd_alloc_handle() below.)

The o_version field must be set by the adapter to BD_OPS_VERSION_0,
and may be used to support versioning of the DDI in the future.

The o_drive_info() entry point describes the logical drive.  The first
argument (void *) is a pointer to the driver soft state supplied at
handle allocation time.  The second argument is a pointer to a
structure with the following definition:

        typedef struct bd_drive {
                uint32_t                d_qsize;
                uint32_t                d_maxxfer;
                uint64_t                d_wwn;
                boolean_t               d_removable;
                boolean_t               d_hotpluggable;
        } bd_drive_t;

        The d_qsize indicates the depth of the job request queue.  The
        d_maxxfer, if non-zero, represents the largest transfer that
        can be processed by the device.  The d_wwn, if non-zero,
        represents a SCSI-3 style WWN for use in creating a devid.
        The remaining elements describe the capabilities of the
        device.

The o_media_info() entry point describes the current media in the
drive.  (Drives with non-removable media will always return the same
values here.)  The media description is as follows:

        typedef struct bd_media {
                uint64_t        m_nblks;
                boolean_t       m_readonly;
        } bd_media_t;

        The m_nblks is the total number of addressable blocks for the
        media, and the m_readonly indicates a non-writable media if
        true.

The o_read() and o_write() entry points are used to handle a read or
write transfer.  Each returns either 0 on success, or an errno.  If 0
is returned, then the adapter is responsible for calling
bd_xfer_done() asynchronously when the transfer is finished (whether
successfully or not.)  The adapter driver MUST NOT call bd_xfer_done()
on a request if it returns an errno.  The adapter driver MUST NOT call
bd_xfer_done() synchronously from this function (else recursive lock
entry will result.)
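
The completion contract above can be sketched in userland as follows.
The bd_xfer_t and bd_xfer_done() definitions here are simplified
stand-ins so that the example builds outside the kernel, and the
mydev_t type with its mydev_read() and mydev_intr() functions is a
hypothetical adapter limited to a single outstanding job for brevity.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the bd_xfer_t declared in <sys/bd.h>. */
typedef struct bd_xfer {
	int64_t	x_blkno;
	size_t	x_nblks;
} bd_xfer_t;

static int done_calls;		/* completions reported so far */
static int last_result;		/* result of the latest completion */

/* Stand-in for the framework's bd_xfer_done(). */
void
bd_xfer_done(bd_xfer_t *xfer, int result)
{
	(void) xfer;
	done_calls++;
	last_result = result;
}

/* Hypothetical adapter soft state. */
typedef struct mydev {
	uint64_t	md_nblks;	/* media capacity in blocks */
	bd_xfer_t	*md_pending;	/* the one job in flight, if any */
} mydev_t;

/* o_read() entry point: accept the job or reject it with an errno. */
int
mydev_read(void *arg, bd_xfer_t *xfer)
{
	mydev_t *md = arg;

	if ((uint64_t)xfer->x_blkno + xfer->x_nblks > md->md_nblks)
		return (EINVAL);	/* rejected: never call bd_xfer_done() */
	if (md->md_pending != NULL)
		return (EBUSY);		/* rejected for the same reason */

	md->md_pending = xfer;		/* a real driver starts hardware here */
	return (0);			/* accepted: completion comes later */
}

/* Hypothetical interrupt handler: reports completion asynchronously. */
void
mydev_intr(mydev_t *md, int hw_error)
{
	bd_xfer_t *xfer = md->md_pending;

	if (xfer == NULL)
		return;
	md->md_pending = NULL;
	bd_xfer_done(xfer, hw_error ? EIO : 0);
}
```

Note that bd_xfer_done() is reached only from the interrupt path and
only for jobs that mydev_read() accepted with a return value of 0.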

The bd_xfer_t type has the following public members:

        typedef struct bd_xfer {
                daddr_t                 x_blkno;
                size_t                  x_nblks;
                ddi_dma_handle_t        x_dmah;
                ddi_dma_cookie_t        x_dmac;
                unsigned                x_ndmac;
                caddr_t                 x_kaddr;
        } bd_xfer_t;

        The x_blkno is the logical block address that the transfer
        starts at, and the x_nblks member is the total number of
        blocks to be transferred (it will always be a positive value.)

        If the device supports DMA, the x_dmah, x_dmac, and x_ndmac
        describe the DMA transfer.  If x_ndmac is larger than 1, then
        the adapter driver must use ddi_dma_nextcookie(9f) to obtain
        the DMA cookie for the next entry in the scatter/gather list.

        If the adapter device does not support DMA, the x_kaddr is the
        kernel virtual address for the transfer.
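
        The cookie walk described above might be coded along the
        following lines.  To keep the example buildable in userland,
        the DDI types and ddi_dma_nextcookie() are replaced by small
        mock versions (the real ones live in the kernel), and the
        sg_entry_t descriptor and build_sg_list() function are
        hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userland mocks, loosely modeled on the kernel DDI types. */
typedef struct ddi_dma_cookie {
	uint64_t	dmac_laddress;
	size_t		dmac_size;
} ddi_dma_cookie_t;

typedef struct mock_dma {
	const ddi_dma_cookie_t	*cookies;	/* full cookie list */
	unsigned		next;		/* next cookie to return */
} *ddi_dma_handle_t;

/* Mock of ddi_dma_nextcookie(9f): returns the next cookie in the list. */
void
ddi_dma_nextcookie(ddi_dma_handle_t h, ddi_dma_cookie_t *cp)
{
	*cp = h->cookies[h->next++];
}

/* Only the DMA related members of bd_xfer_t are reproduced here. */
typedef struct bd_xfer {
	ddi_dma_handle_t	x_dmah;
	ddi_dma_cookie_t	x_dmac;
	unsigned		x_ndmac;
} bd_xfer_t;

/* Hypothetical hardware scatter/gather descriptor. */
typedef struct sg_entry {
	uint64_t	sg_addr;
	uint32_t	sg_len;
} sg_entry_t;

/* Copy every cookie of the transfer into the adapter's S/G table. */
unsigned
build_sg_list(bd_xfer_t *xfer, sg_entry_t *sg, unsigned maxsg)
{
	ddi_dma_cookie_t c = xfer->x_dmac;	/* first cookie is supplied */
	unsigned i;

	assert(xfer->x_ndmac <= maxsg);

	for (i = 0; i < xfer->x_ndmac; i++) {
		if (i > 0)
			ddi_dma_nextcookie(xfer->x_dmah, &c);
		sg[i].sg_addr = c.dmac_laddress;
		sg[i].sg_len = (uint32_t)c.dmac_size;
	}
	return (i);
}
```

        The first cookie comes from x_dmac itself, so
        ddi_dma_nextcookie() is called x_ndmac - 1 times.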

The o_dump() entry point is used to write blocks to a disk
synchronously, in support of dump(9e).  It may not block or use
interrupts, and may not call bd_xfer_done().  Instead, it simply
returns 0 or an errno when the transfer is complete.

The following functions may be called by the adapter driver:

bd_handle_t bd_alloc_handle(dev_info_t *dip, unsigned addr,
    void *private, bd_ops_t *ops, ddi_dma_attr_t *attr);

        Allocates a handle for a target bd device.  The dip is for the
        adapter device (parent).  The addr is the address or index of
        the bd target on the adapter.  (Adapters that only support
        single targets should probably supply 0 here.)  The private is
        a pointer to driver state that is supplied to the entry points
        in the ops vector.  The attr describes the DMA capabilities of
        the adapter (for transfers).  If the adapter driver does not
        use DMA, then NULL may be supplied for attr.  This function
        may be called in user or kernel context only.

void bd_free_handle(bd_handle_t handle);

        Frees a previously allocated handle.  Note that it is an error
        to free a handle that is attached.  May be called in user or
        kernel context only.

int bd_attach_handle(bd_handle_t handle);

        Attaches a handle, creating a node in the device tree for the
        bd target device, and attaching its driver.  Returns
        DDI_SUCCESS on success or DDI_FAILURE on failure.  May be
        called in user or kernel context only.

int bd_detach_handle(bd_handle_t handle);

        Detaches a handle from the system, normally as part of a
        DDI_DETACH operation or as part of a hotplug operation.
        Returns DDI_SUCCESS on success or DDI_FAILURE on failure.  May
        be called in user or kernel context only.

void bd_state_change(bd_handle_t handle);

        Indicates a state change (media removal or insertion) occurred
        for the given handle.  Only useful for removable media.  May
        be called in kernel, user, or interrupt context.  Caller must
        not hold any locks.

void bd_xfer_done(bd_xfer_t *xfer, int result);

        Called by the adapter when the named transfer (xfer) is
        complete.  The result is 0 for a successful transfer, or an
        errno if the transfer failed.  May be called in kernel, user,
        or interrupt context.  Caller must not hold any locks.

void bd_mod_init(struct dev_ops *);
void bd_mod_fini(struct dev_ops *);

        Called by the adapter driver to configure its dev_ops
        structure during _init(9e), or to deconfigure it during
        _fini(9e).
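
The handle lifecycle implied by the functions above (allocate, attach,
detach, free) can be sketched as follows.  Everything here is a
userland mock: the bd_*_handle() bodies, the contents of struct
bd_handle, and the placeholder value of BD_OPS_VERSION_0 are
assumptions made so the example compiles, and mydev_attach() and
mydev_detach() are hypothetical adapter routines.  The sketch also
assumes that bd_alloc_handle() returns NULL on failure, which the
text above does not state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define	DDI_SUCCESS		0
#define	DDI_FAILURE		(-1)
#define	BD_OPS_VERSION_0	0	/* placeholder value for this mock */

/* Opaque or reduced stand-ins for kernel types. */
typedef struct dev_info dev_info_t;
typedef struct ddi_dma_attr ddi_dma_attr_t;
typedef struct bd_ops { int o_version; } bd_ops_t;
typedef struct bd_handle {
	void	*h_private;
	bool	h_attached;
} *bd_handle_t;

/* Mock framework functions with the signatures given above. */
bd_handle_t
bd_alloc_handle(dev_info_t *dip, unsigned addr, void *private,
    bd_ops_t *ops, ddi_dma_attr_t *attr)
{
	bd_handle_t h = calloc(1, sizeof (*h));

	(void) dip; (void) addr; (void) ops; (void) attr;
	if (h != NULL)
		h->h_private = private;
	return (h);
}

void
bd_free_handle(bd_handle_t h)
{
	assert(!h->h_attached);	/* freeing an attached handle is an error */
	free(h);
}

int
bd_attach_handle(bd_handle_t h)
{
	h->h_attached = true;
	return (DDI_SUCCESS);
}

int
bd_detach_handle(bd_handle_t h)
{
	h->h_attached = false;
	return (DDI_SUCCESS);
}

/* Hypothetical adapter driver with a single target at address 0. */
typedef struct mydev { bd_handle_t md_bdh; } mydev_t;
static bd_ops_t mydev_ops = { BD_OPS_VERSION_0 };

int
mydev_attach(mydev_t *md, dev_info_t *dip)
{
	/* NULL attr: this adapter is assumed not to use DMA. */
	md->md_bdh = bd_alloc_handle(dip, 0, md, &mydev_ops, NULL);
	if (md->md_bdh == NULL)
		return (DDI_FAILURE);
	if (bd_attach_handle(md->md_bdh) != DDI_SUCCESS) {
		bd_free_handle(md->md_bdh);
		md->md_bdh = NULL;
		return (DDI_FAILURE);
	}
	return (DDI_SUCCESS);
}

int
mydev_detach(mydev_t *md)
{
	if (bd_detach_handle(md->md_bdh) != DDI_SUCCESS)
		return (DDI_FAILURE);
	bd_free_handle(md->md_bdh);	/* safe: no longer attached */
	md->md_bdh = NULL;
	return (DDI_SUCCESS);
}
```

The ordering matters: the handle is detached before it is freed, and
a failed attach frees the handle it just allocated.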


Imported Interfaces
-------------------

Interface       Stability                       Comments

cmlb            Consolidation Private           Disk labeling support.
                                                Includes the misc/cmlb
                                                module and the cmlb API.

nexus NDI       Consolidation Private           Needed for nexus device
                                                support.

libsmedia       Consolidation Private           Generic storage media library,
                                                we import interfaces to supply
                                                a plugin.

Exported Interfaces
-------------------

bd(7D)          Committed                       bd device driver.

dkio(7I)        Committed                       Standard ioctls for disks.

DKC_BD          Committed                       New dkio controller type.

Block DDI       Consolidation Private           Used by adapter drivers.
                                                See Block DDI above.

sm_bd.so.1      Consolidation Private           libsmedia plugin.  Both 32
                                                and 64-bit versions.


6. Resources and Schedule
    6.4. Steering Committee requested information
        6.4.1. Consolidation C-team Name:
                ON
    6.5. ARC review type: FastTrack
    6.6. ARC Exposure: open
