Copy Reduction Interfaces [PSARC/2009/478 FastTrack timeout 09/16/2009]

2009-09-16 Thread Roch

Filesystems might have blocksize and alignment constraints
that condition their ability to loan out buffers (for writes).
If so, we could use an API to query the FS for
those values. For a copy-on-write, variable-blocksize
filesystem, the natural blocksize might also depend on the
vnode being targeted. Do we know if ZFS will ever be able to
loan out buffers for writes that are not aligned full records?
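
To make the constraint concrete, here is a minimal sketch of the kind of
check a caller might do once it knows the natural blocksize for a vnode.
The helper name zc_write_is_full_records() and its parameters are
hypothetical and not part of the proposal. It only tests whether a write
covers whole, aligned records.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of the proposed interfaces): returns true
 * when the write [off, off + len) starts on a record boundary and covers
 * a whole number of records of size blksz.
 */
static bool
zc_write_is_full_records(uint64_t off, uint64_t len, uint64_t blksz)
{
	assert(blksz != 0);	/* a zero blocksize makes no sense here */
	return (len != 0 && off % blksz == 0 && len % blksz == 0);
}
```

With a 128K record size, a 128K write at offset 0 passes this check, while
a 4K write at offset 0 or a 128K write at offset 4K does not.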

-r

Rich.Brown at Sun.COM writes:
 > I'm sponsoring this case on behalf of Mahesh Siddheshwar and Chunli Zhang.
 > This case proposes new interfaces to support copy reduction in the I/O path
 > especially for file sharing services.
 > 
 > Minor binding is requested.
 > 
 > This times out on Wednesday, 16 September, 2009.
 > 
 > 
 > Template Version: @(#)sac_nextcase 1.68 02/23/09 SMI
 > This information is Copyright 2009 Sun Microsystems
 > 1. Introduction
 > 1.1. Project/Component Working Name:
 >   Copy Reduction Interfaces
 > 1.2. Name of Document Author/Supplier:
 >   Author:  Mahesh Siddheshwar, Chunli Zhang
 > 1.3  Date of This Document:
 >  09 September, 2009
 > 4. Technical Description
 > 
 >  == Introduction/Background ==
 > 
 >  Zero-copy (copy avoidance) is essentially buffer sharing
 >  among modules that pass data to one another.
 >  This proposal avoids the data copy in the READ/WRITE path
 >  of filesystems by providing a mechanism to share data buffers
 >  between the modules. It is intended to be used by network file
 >  sharing services such as NFS and CIFS.
 > 
 >  Although the buffer sharing can be achieved through a few different
 >  solutions, any such solution must work with File Event Monitors
 >  (FEM monitors)[1] installed on the files. The solution must
 >  allow the underlying filesystem to maintain any existing file 
 >  range locking in the filesystem.
 >  
 >  The proposed solution provides extensions to the existing VOP
 >  interface to request and return buffers from a filesystem. The 
 >  buffers are then used with existing VOP_READ/VOP_WRITE calls with
 >  minimal changes.
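
As a rough illustration of the request/return idea only, the toy model
below lends out a pointer to a filesystem-owned block instead of copying
it, and tracks outstanding loans. Every name here (toy_fs_t, toy_loan_t,
toy_reqzcbuf, toy_retzcbuf) is hypothetical. It does not model
VOP_READ/VOP_WRITE, FEM monitors, or range locking, and it is not the
proposed interface.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy "filesystem" that owns a single cached block. */
typedef struct toy_fs {
	char block[16];	/* data owned by the filesystem */
	int loans;	/* number of buffers currently on loan */
} toy_fs_t;

/* Hypothetical handle describing a loaned buffer. */
typedef struct toy_loan {
	char *buf;
	size_t len;
} toy_loan_t;

/* Request: lend a pointer to the filesystem's own buffer; no data copy. */
static int
toy_reqzcbuf(toy_fs_t *fs, toy_loan_t *loan)
{
	loan->buf = fs->block;
	loan->len = sizeof (fs->block);
	fs->loans++;
	return (0);
}

/* Return: hand the buffer back; fails if it is not an outstanding loan. */
static int
toy_retzcbuf(toy_fs_t *fs, toy_loan_t *loan)
{
	if (loan->buf != fs->block || fs->loans == 0)
		return (-1);
	loan->buf = NULL;
	loan->len = 0;
	fs->loans--;
	return (0);
}
```

The consumer reads through loan.buf directly between the two calls, which
is where the copy is avoided. The loan counter stands in for whatever
bookkeeping a real filesystem would need before it can reuse the buffer.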
 > 
 > 
 >  == Proposed Changes ==
 > 
 >  VOP Extensions for Zero-Copy Support
 >  
 > 
 >  a. Extended struct uio, xuio_t
 > 
 >   This section proposes an extensible uio structure that can serve
 >   multiple purposes.  For example, an immediate extension, xu_zc, is to be
 >   used by the proposed VOP_REQZCBUF/VOP_RETZCBUF interfaces to pass loaned
 >   zero-copy buffers, as well as to be passed to the existing
 >   VOP_READ/VOP_WRITE calls for normal read/write operations.  Another
 >   extension, xu_aio, is intended to replace uioa_t for async I/O.
 > 
 >   This new structure, xuio_t, contains the following:
 > 
 >   - the existing uio structure (embedded) as the first member
 >   - additional fields to support extensibility
 >   - a union of all the defined extensions
 > 
 >   The following uio_extflag is added to indicate that a uio structure is
 >   in fact an xuio_t:
 > 
 >   #define UIO_XUIO    0x004   /* Structure is xuio_t */
 > 
 >   The following uio_extflag will be removed after uioa_t has been converted
 >   to xuio_t:
 > 
 >   #define UIO_ASYNC   0x002   /* Structure is uioa_t */
 > 
 >   The project team has commitment from the networking team to remove
 >   the current use of uioa_t and use the proposed extensions (CR 6880095).
 > 
 >   The definition of xuio_t is:
 > 
 >   typedef struct xuio {
 >       uio_t xu_uio;                 /* Embedded UIO structure */
 > 
 >       /* Extended uio fields */
 >       enum xuio_type xu_type;       /* What kind of uio structure? */
 > 
 >       union {
 > 
 >           /* Async I/O Support */
 >           struct {
 >               uint32_t xu_a_state;      /* state of async i/o */
 >               ssize_t xu_a_mbytes;      /* bytes that have been uioamove()ed */
 >               uioa_page_t *xu_a_lcur;   /* pointer into uioa_locked[] */
 >               void **xu_a_lppp;         /* pointer into lcur->uioa_ppp[] */
 >               void *xu_a_hwst[4];       /* opaque hardware state */
 >               uioa_page_t xu_a_locked[UIOA_IOV_MAX]; /* Per iov locked pages */
 >           } xu_aio;
 > 
 >           /* Zero Copy Support */
 >           struct {
 >               enum uio_rw xu_zc_rw;     /* the use of the buffer */
 >               void *xu_zc_priv;         /* fs specific */
 >           } xu_zc;
 > 
 >       } xu_ext;
 >   } xuio_t;
 > 
 >   where xu_type is currently defined as:
 > 
 >   typedef enum xuio_type {
 >       UIOTYPE_ASYNCIO,
 >       UIOTYPE_ZEROCOPY
 >   } xuio_type_t;
 > 
 >   New uio extensions can be added by defining a new xuio_type_t value and
 >   adding a new member to the xu_ext union.
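
A minimal sketch of how a consumer might recognise an extended uio,
assuming the flag lives in the existing uio_extflg field and using cut-down
stand-ins for the kernel types (only the fields needed here). The helper
uio_to_zc_xuio() is hypothetical and not part of the proposal. The cast
relies on xu_uio being the first member of xuio_t, as described above.

```c
#include <assert.h>
#include <stddef.h>

#define	UIO_XUIO	0x004	/* Structure is xuio_t (value from the proposal) */

/* Simplified stand-ins for the kernel definitions. */
typedef enum xuio_type { UIOTYPE_ASYNCIO, UIOTYPE_ZEROCOPY } xuio_type_t;
enum uio_rw { UIO_READ, UIO_WRITE };

typedef struct uio {
	int uio_extflg;		/* extended flags (assumed field name) */
} uio_t;

typedef struct xuio {
	uio_t xu_uio;		/* must remain the first member */
	enum xuio_type xu_type;
	union {
		struct {
			enum uio_rw xu_zc_rw;
			void *xu_zc_priv;
		} xu_zc;
	} xu_ext;
} xuio_t;

/*
 * Hypothetical helper: return the enclosing xuio_t when uio is flagged as
 * extended and carries the zero-copy extension, otherwise NULL.
 */
static xuio_t *
uio_to_zc_xuio(uio_t *uio)
{
	xuio_t *xuio;

	assert(uio != NULL);
	if ((uio->uio_extflg & UIO_XUIO) == 0)
		return (NULL);	/* plain uio_t, do not cast */

	/* Safe only because xu_uio is the first member of xuio_t. */
	xuio = (xuio_t *)uio;
	if (xuio->xu_type != UIOTYPE_ZEROCOPY)
		return (NULL);
	return (xuio);
}
```

Checking the flag before casting matters: a plain uio_t is smaller than an
xuio_t, so reading xu_type from it would run past the end of the structure.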
 > 
 >  b. Requesting zero-copy buffers
 > 
 >   #define VOP_REQZCBUF(vp, rwflag, uiozcp, cr, ct) \
 >           fop_reqzcbuf(vp, rwflag, uiozcp, cr, ct)
 > 
 > int fop_reqzcbu

Copy Reduction Interfaces [PSARC/2009/478 FastTrack timeout 09/16/2009]

2009-09-16 Thread Roch

My issues have been resolved. Thanks Mahesh.

-r

Rich.Brown at Sun.COM writes:

 >  b. Requesting zero-copy buffers
 > 
 >   #define VOP_REQZCBUF(vp, rwflag, uiozcp, cr, ct) \
 >           fop_reqzcbuf(vp, rwflag, uiozcp, cr, ct)
 > 
 >   int fop_reqzcbuf(vnode_t *, enum uio_rw, xuio_t *, cred_t *,
 >       caller_context_t *);
 > 
 >   This function requests buffers associated with file vp in preparation
 >   for a subsequent zero-copy read or write. The extended uio_t, xuio_t, is
 >   used to pass the parameters and results. Only the following fields of
 >   xuio_t are relevant to this call.
 >  
 > u