Copy Reduction Interfaces [PSARC/2009/478 FastTrack timeout 09/16/2009]

2009-09-16 Thread Roch

Filesystems might have blocksize and alignment constraints
that condition their ability to loan out buffers (for writes).
If so, we could use an API to query the FS for
those values. For a copy-on-write, variable-block-size
filesystem, the natural blocksize might also depend on the
vnode being targeted. Do we know whether ZFS will ever be able to
loan out buffers for writes that are not aligned full records?

-r

Rich.Brown at Sun.COM writes:
  I'm sponsoring this case on behalf of Mahesh Siddheshwar and Chunli Zhang.
  This case proposes new interfaces to support copy reduction in the I/O path
  especially for file sharing services.
  
  Minor binding is requested.
  
  This times out on Wednesday, 16 September, 2009.
  
  
  Template Version: @(#)sac_nextcase 1.68 02/23/09 SMI
  This information is Copyright 2009 Sun Microsystems
  1. Introduction
  1.1. Project/Component Working Name:
Copy Reduction Interfaces
  1.2. Name of Document Author/Supplier:
Author:  Mahesh Siddheshwar, Chunli Zhang
  1.3  Date of This Document:
   09 September, 2009
  4. Technical Description
  
   == Introduction/Background ==
  
   Zero-copy (copy avoidance) is essentially buffer sharing
   among the modules that pass data to one another.
   This proposal avoids the data copy in the READ/WRITE path
   of filesystems by providing a mechanism to share data buffers
   between those modules. It is intended to be used by network file
   sharing services such as NFS or CIFS.
  
   Although the buffer sharing can be achieved through a few different
   solutions, any such solution must work with File Event Monitors
   (FEM monitors)[1] installed on the files. The solution must
   allow the underlying filesystem to maintain any existing file 
   range locking in the filesystem.
   
   The proposed solution provides extensions to the existing VOP
   interface to request and return buffers from a filesystem. The 
   buffers are then used with existing VOP_READ/VOP_WRITE calls with
   minimal changes.
  
  
   == Proposed Changes ==
  
   VOP Extensions for Zero-Copy Support
   
  
   a. Extended struct uio, xuio_t
  
The following proposes a uio structure that can be extended for
multiple purposes.  For example, an immediate extension, xu_zc, is to be
used by the proposed VOP_REQZCBUF/VOP_RETZCBUF interfaces to pass loaned
zero-copy buffers, as well as to be passed to the existing
VOP_READ/VOP_WRITE calls for normal read/write operations.  Another
extension, xu_aio, is intended to replace uioa_t for async I/O.
  
This new structure, xuio_t, contains the following:
  
- the existing uio structure (embedded) as the first member
- additional fields to support extensibility
- a union of all the defined extensions
  
The following uio_extflag is added to indicate that a uio structure is
in fact an xuio_t:
  
#define UIO_XUIO    0x004   /* Structure is xuio_t */
  
The following uio_extflag will be removed after uioa_t has been converted
to xuio_t:
  
#define UIO_ASYNC   0x002   /* Structure is uioa_t */
  
The project team has commitment from the networking team to remove
the current use of uioa_t and use the proposed extensions (CR 6880095).
  
The definition of xuio_t is:
  
typedef struct xuio {
  uio_t xu_uio;            /* Embedded UIO structure */

  /* Extended uio fields */
  enum xuio_type xu_type;  /* What kind of uio structure? */

  union {

    /* Async I/O Support */
    struct {
      uint32_t xu_a_state;     /* state of async i/o */
      ssize_t xu_a_mbytes;     /* bytes that have been uioamove()ed */
      uioa_page_t *xu_a_lcur;  /* pointer into uioa_locked[] */
      void **xu_a_lppp;        /* pointer into lcur->uioa_ppp[] */
      void *xu_a_hwst[4];      /* opaque hardware state */
      uioa_page_t xu_a_locked[UIOA_IOV_MAX];  /* Per iov locked pages */
    } xu_aio;

    /* Zero Copy Support */
    struct {
      enum uio_rw xu_zc_rw;    /* the use of the buffer */
      void *xu_zc_priv;        /* fs specific */
    } xu_zc;

  } xu_ext;
} xuio_t;
  
where xu_type is currently defined as:
  
typedef enum xuio_type {
  UIOTYPE_ASYNCIO,
  UIOTYPE_ZEROCOPY
} xuio_type_t;
  
New uio extensions can be added by defining a new xuio_type_t value and
adding a corresponding member to the xu_ext union.
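
As an illustration of how a consumer might recognise an xuio_t behind a
plain uio_t pointer, here is a minimal userland sketch. It is not code from
the case: the types are simplified stand-ins, the flag field name
uio_extflg and the helper uio_to_zc_xuio() are assumptions made for this
example, and only the xu_zc member of the union is reproduced.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins; names mirror the proposal, layouts do not. */
#define UIO_XUIO 0x004          /* Structure is xuio_t */

enum uio_rw { UIO_READ, UIO_WRITE };

typedef struct uio {
    int64_t  uio_loffset;       /* file offset */
    long     uio_resid;         /* residual byte count */
    uint16_t uio_extflg;        /* extended flags (field name assumed) */
} uio_t;

typedef enum xuio_type {
    UIOTYPE_ASYNCIO,
    UIOTYPE_ZEROCOPY
} xuio_type_t;

typedef struct xuio {
    uio_t xu_uio;               /* must stay the first member */
    enum xuio_type xu_type;     /* which union member is valid */
    union {
        struct {
            enum uio_rw xu_zc_rw;   /* the use of the buffer */
            void *xu_zc_priv;       /* fs specific */
        } xu_zc;
    } xu_ext;
} xuio_t;

/*
 * Hypothetical helper: return the enclosing xuio_t if uio is flagged as
 * an xuio_t carrying the zero-copy extension, otherwise NULL.
 */
static xuio_t *
uio_to_zc_xuio(uio_t *uio)
{
    xuio_t *xuio;

    /* Check the flag first; a plain uio_t must never be cast. */
    if ((uio->uio_extflg & UIO_XUIO) == 0)
        return (NULL);

    /* The cast is valid only because xu_uio is the first member. */
    xuio = (xuio_t *)uio;
    if (xuio->xu_type != UIOTYPE_ZEROCOPY)
        return (NULL);

    return (xuio);
}
```

Checking the flag before the cast is what lets existing VOP_READ/VOP_WRITE
implementations keep accepting an ordinary uio_t; xu_type then selects which
member of xu_ext may be read.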
  

Copy Reduction Interfaces [PSARC/2009/478 FastTrack timeout 09/16/2009]

2009-09-16 Thread Roch

My issues have been resolved. Thanks Mahesh.

-r

Rich.Brown at Sun.COM writes:

   b. Requesting zero-copy buffers
  
  #define VOP_REQZCBUF(vp, rwflag, uiozcp, cr, ct) \
  fop_reqzcbuf(vp, rwflag, uiozcp, cr, ct)
  
  int fop_reqzcbuf(vnode_t *, enum uio_rw, xuio_t *, cred_t *,
   caller_context_t *);
   
  This function requests buffers associated with file vp in preparation
  for a subsequent zero-copy read or write. The extended uio_t, xuio_t, is
  used to pass the parameters and results. Only the following fields of
  xuio_t are relevant to this call:
   
  uiozcp->xu_uio.uio_resid: used by the caller to specify the total length
   of the buffer.
  
  uiozcp->xu_uio.uio_loffset: