This case was approved at yesterday's meeting.

--matt

Matthew Ahrens wrote:
> Template Version: @(#)sac_nextcase 1.68 02/23/09 SMI
> This information is Copyright 2009 Sun Microsystems
> 1. Introduction
>     1.1. Project/Component Working Name:
>        ZFS send dedup
>     1.2. Name of Document Author/Supplier:
>        Author:  Lori Alt
>     1.3  Date of This Document:
>       13 October, 2009
> 4. Technical Description
> This case requests micro/patch binding; new interfaces are Committed.
>
> OVERVIEW:
>
> "Dedup" is an overall term for technologies that eliminate duplicate
> copies of data in storage or memory.  This specific application of
> dedup is for ZFS send streams, i.e., the output of the 'zfs send' command.
> For some kinds of data, much of the content of a send stream consists
> of blocks for which identical copies have already been sent earlier
> in the stream.  This technology replaces later copies of a block with
> a reference to the earlier copy.  This can significantly reduce the
> size of a send stream, which reduces the time it takes to transfer
> such a stream over a communication channel.
>
> PROPOSED SOLUTION:
>
> A new '-D' option to 'zfs send' is proposed.  This option will cause
> dedup processing to be performed on the data being written to a send
> stream.  Dedup processing is optional because it isn't always appropriate
> (some kinds of data have very little duplication) and it has significant
> costs:  the checksumming required to detect duplicate blocks is
> CPU-intensive and the data that must be maintained while the stream
> is being processed can occupy a very large amount of memory.
>
> Duplicate blocks are detected by calculating a cryptographically strong
> checksum on each data block.  Blocks that have the same checksum are
> presumed to be identical.  The checksum type used at this time is SHA256.
> However, the stream format contains a field which identifies the checksum
> type, permitting other checksums to be used in the future.
>
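As a rough illustration of the detection step described above, the
following Python sketch hashes each block with SHA-256 and replaces later
copies with a reference to the first occurrence.  The record tuples
("WRITE", "WRITE_BYREF"), the function name dedup_stream, and the use of a
block index as the reference are assumptions made for illustration; they
are not the actual send stream format.

```python
import hashlib

def dedup_stream(blocks):
    """Yield illustrative records for an iterable of bytes blocks."""
    seen = {}  # SHA-256 digest -> index of the first block with that digest
    for index, data in enumerate(blocks):
        digest = hashlib.sha256(data).digest()
        first = seen.get(digest)
        if first is None:
            # First time this checksum appears: send the data itself.
            seen[digest] = index
            yield ("WRITE", index, digest, data)
        else:
            # Same checksum: presumed identical, so send only a reference.
            yield ("WRITE_BYREF", index, digest, first)

records = list(dedup_stream([b"aaaa", b"bbbb", b"aaaa"]))
```

The `seen` table gains one entry per unique block, which is one way to
picture the memory cost mentioned in the proposal.
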
> RELATION TO OTHER ZFS DEDUP WORK
>
> There are several other ongoing ZFS projects that are potentially
> related to this one:  on-disk dedup, in-core dedup, and ZFS
> encryption (PSARC/2007/261).  The relation between this project
> and the other projects is that over-the-wire (OTW) dedup does not depend
> on those projects, but will be able to take advantage of some
> aspects of the other dedup work when it is integrated.
>
> Dedup of send streams can be performed regardless of whether the
> other variants of dedup are operational.  The main way that OTW dedup
> can take advantage of the other varieties of dedup support is that
> if a dedup-capable checksum of the data has already been calculated,
> the 'zfs send' processing will not recalculate it.  It will use the
> already-computed checksum, thereby reducing the CPU usage of the
> stream dedup processing.
>
> The checksum of each block sent in dedup'ed streams will be included in
> the stream.  This gives the receive side of the code the option
> to work with the in-core and on-disk dedup support to avoid the
> re-computation of the checksum when the data is stored in memory
> or on-disk.  At this time, that option is not being used (because
> in-core and on-disk dedup are still in development), and it might
> not ever be used.  But the interface has been designed in such a
> way to allow that optimization in the future.
>
> SEND STREAM FORMAT COMPATIBILITY IMPACT
>
> Over-the-wire dedup support requires a change to the format of
> a send stream.  A new "write-by-reference" record is used to indicate
> a write operation that references data sent earlier in the stream.
>
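To make the idea of a "write-by-reference" record concrete, here is a
minimal Python sketch of how a receiver might resolve such records.  The
tuple layout (kind, block index, payload) and the name apply_records are
hypothetical; the real record layout is not described in this case.

```python
def apply_records(records):
    """Rebuild block contents from illustrative WRITE / WRITE_BYREF records."""
    received = {}  # block index -> data already received in this stream
    for kind, index, payload in records:
        if kind == "WRITE":
            # payload is the block data itself
            received[index] = payload
        elif kind == "WRITE_BYREF":
            # payload is the index of an earlier block in the same stream
            if payload not in received:
                raise ValueError("reference to block not yet received")
            received[index] = received[payload]
        else:
            raise ValueError("unknown record type: %r" % (kind,))
    return [received[i] for i in sorted(received)]

restored = apply_records([
    ("WRITE", 0, b"aaaa"),
    ("WRITE", 1, b"bbbb"),
    ("WRITE_BYREF", 2, 0),
])
```

A reference can only point backwards in this sketch, matching the
statement that the referenced data was sent earlier in the stream.
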
> This new record type will only appear in dedup'ed streams.  A feature
> flag indicating the use of dedup will be set in the stream's "begin"
> record.  Older versions of 'zfs receive' will reject the stream as
> unreadable because of the presence of that feature flag.  However, if
> dedup is not being done on the stream, older versions of the zfs software
> will be able to read the stream (assuming that the objects recorded
> in the stream are of a version that can be interpreted by the version
> of zfs on the receiving system, but that is an existing requirement,
> not one added by this project).
>
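The compatibility behavior described above can be sketched as a simple
bitmask check on the receive side.  This is only an illustration: the
constant FEATURE_DEDUP, its bit value, and the function
check_begin_record are assumptions, not the names or values used by zfs.

```python
FEATURE_DEDUP = 0x1  # hypothetical bit value, for illustration only

def check_begin_record(feature_flags, supported_flags):
    """Reject a stream whose begin record sets flags the receiver lacks."""
    unknown = feature_flags & ~supported_flags
    if unknown:
        raise ValueError("stream uses unsupported features: 0x%x" % unknown)
    return True

old_receiver = 0x0            # receiver that predates stream dedup
new_receiver = FEATURE_DEDUP  # receiver that understands dedup'ed streams
```

With these definitions, a stream without the dedup flag passes on both
receivers, while a dedup'ed stream is accepted only by the newer one.
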
> CHANGES TO THE ZFS(1M) MANPAGE
>
> 65c62
> <      zfs send [-vR] [-[iI] snapshot] snapshot
> ---
> >      zfs send [-DvR] [-[iI] snapshot] snapshot
>
> 1746c1677
> <      zfs send [-vR] [-[iI] snapshot] snapshot
> ---
> >      zfs send [-DvR] [-[iI] snapshot] snapshot
>
> 1753a1685,1689
> >      -D
> >          Perform dedup processing on the stream. Dedup'ed streams
> >          cannot be received on systems that do not support the stream
> >          dedup feature.
> >
>
> ATTRIBUTES
>     See attributes(5) for descriptions of the  following  attributes:
>
>     ____________________________________________________________
>    |       ATTRIBUTE TYPE        |       ATTRIBUTE VALUE       |
>    |_____________________________|_____________________________|
>    | Availability                |          SUNWzfsu           |
>    |_____________________________|_____________________________|
>    | Interface Stability         |           Committed         |
>    |_____________________________|_____________________________|
>
> 6. Resources and Schedule
>     6.4. Steering Committee requested information
>       6.4.1. Consolidation C-team Name:
>               ON
>     6.5. ARC review type: FastTrack
>     6.6. ARC Exposure: open
>   
