Template Version: @(#)sac_nextcase 1.68 02/23/09 SMI
This information is Copyright 2009 Sun Microsystems
1. Introduction
    1.1. Project/Component Working Name:
         ZFS send dedup
    1.2. Name of Document Author/Supplier:
         Author:  Lori Alt
    1.3. Date of This Document:
         13 October, 2009
4. Technical Description
This case requests micro/patch binding; new interfaces are Committed.

OVERVIEW:

"Dedup" is an overall term for technologies that eliminate duplicate
copies of data in storage or memory.  This specific application of
dedup is for ZFS send streams, i.e., the output of the 'zfs send' command.
For some kinds of data, much of the content of a send stream consists
of blocks for which identical copies have already been sent earlier
in the stream.  This technology replaces later copies of a block with
a reference to the earlier copy.  This can significantly reduce the
size of a send stream, which reduces the time it takes to transfer
such a stream over a communication channel.

PROPOSED SOLUTION:

A new '-D' option to 'zfs send' is proposed.  This option will cause
dedup processing to be performed on the data being written to a send
stream.  Dedup processing is optional because it isn't always appropriate
(some kinds of data have very little duplication) and it has significant
costs:  the checksumming required to detect duplicate blocks is
CPU-intensive and the data that must be maintained while the stream
is being processed can occupy a very large amount of memory.

Duplicate blocks are detected by calculating a cryptographically strong
checksum on each data block.  Blocks that have the same checksum are
presumed to be identical.  The checksum type used at this time is SHA256.
However, the stream format contains a field which identifies the checksum
type, permitting other checksums to be used in the future.
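
The detection step described above can be sketched as follows.  This is
only an illustration, not the zfs implementation: the record names
("write", "write_byref"), the tuple layout, and the use of a plain
offset to identify a block are all assumptions made for the example.

```python
import hashlib


def dedup_records(blocks):
    """Turn (offset, data) pairs into hypothetical stream records.

    The first block with a given SHA256 checksum is emitted as a full
    write; later blocks with the same checksum are emitted as a
    reference to the offset of that first copy.
    """
    # checksum -> offset of the first block seen with that checksum.
    # This table is the in-memory state that grows with the stream.
    seen = {}
    for offset, data in blocks:
        digest = hashlib.sha256(data).digest()
        if digest in seen:
            # Same checksum: presumed identical, so send a reference.
            yield ("write_byref", offset, seen[digest])
        else:
            seen[digest] = offset
            yield ("write", offset, data)
```

The 'seen' table holds one entry per unique block, which is why the
memory cost noted above grows with the amount of unique data in the
stream.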

RELATION TO OTHER ZFS DEDUP WORK

Several other ongoing ZFS projects are potentially related to this
one:  on-disk dedup, in-core dedup, and ZFS encryption
(PSARC/2007/261).  Over-the-wire (OTW) dedup does not depend on any
of those projects, but it will be able to take advantage of some
aspects of the other dedup work once that work is integrated.

Dedup of send streams can be performed regardless of whether the
other variants of dedup are operational.  The main way that OTW dedup
can take advantage of the other varieties of dedup support is that
if a dedup-capable checksum of the data has already been calculated,
the 'zfs send' processing will not recalculate it.  It will use the
already-computed checksum, thereby reducing the CPU usage of the
stream dedup processing.
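
A minimal sketch of that reuse, assuming a hypothetical helper that is
handed the block data plus an optional checksum already stored with the
block (the function name and its arguments are invented for
illustration):

```python
import hashlib


def block_checksum(data, stored_checksum=None):
    """Return (checksum, computed_now) for one block.

    If a dedup-capable checksum was already calculated elsewhere, it is
    returned as-is; otherwise SHA256 is computed over the block data.
    """
    if stored_checksum is not None:
        return stored_checksum, False
    return hashlib.sha256(data).digest(), True
```

The CPU saving comes entirely from the first branch: when a checksum is
supplied, the block data is never hashed again.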

The checksum of each block sent in a dedup'ed stream will be included in
the stream.  This gives the receive side of the code the option
to work with the in-core and on-disk dedup support to avoid
recomputing the checksum when the data is stored in memory
or on disk.  At this time, that option is not being used (because
in-core and on-disk dedup are still in development), and it might
never be used.  But the interface has been designed in such a
way as to allow that optimization in the future.

SEND STREAM FORMAT COMPATIBILITY IMPACT

Over-the-wire dedup support requires a change to the format of
a send stream.  A new "write-by-reference" record is used to indicate
a write operation that references data sent earlier in the stream.
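
On the receive side, a write-by-reference record is resolved against
data that has already arrived.  The sketch below illustrates the idea
only; the tuple layout and the use of a plain offset to name the
earlier data are assumptions, and this case does not describe how the
real record identifies the earlier block.

```python
def apply_records(records):
    """Rebuild {offset: data} from hypothetical stream records.

    ("write", offset, data) stores the data directly.
    ("write_byref", offset, ref_offset) copies the data that an earlier
    record already delivered at ref_offset.
    """
    received = {}
    for rec in records:
        kind = rec[0]
        if kind == "write":
            _, offset, data = rec
            received[offset] = data
        elif kind == "write_byref":
            _, offset, ref_offset = rec
            # The referenced data must have been sent earlier.
            received[offset] = received[ref_offset]
        else:
            raise ValueError(f"unknown record type: {kind!r}")
    return received
```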

This new record type will only appear in dedup'ed streams.  A feature
flag indicating the use of dedup will be set in the stream's "begin"
record.  Older versions of 'zfs receive' will reject the stream as
unreadable because of the presence of that feature flag.  However, if
dedup is not being done on the stream, older versions of the zfs software
will be able to read the stream.  (This assumes that the objects recorded
in the stream are of a version that can be interpreted by the version
of zfs on the receiving system, but that is an existing requirement,
not one added by this project.)
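
The compatibility rule can be illustrated with a small feature-flag
check.  The flag name and bit value below are hypothetical, chosen only
for the example; the actual encoding of the "begin" record is not given
in this case.

```python
FEATURE_DEDUP = 0x1  # hypothetical bit value, for illustration only


def can_receive(stream_flags, supported_flags):
    """Return True if the receiver understands every flag in the stream.

    A receiver rejects the stream when the "begin" record carries any
    feature flag that the receiver does not support.
    """
    unknown = stream_flags & ~supported_flags
    return unknown == 0
```

Under these assumptions, can_receive(FEATURE_DEDUP, 0) is False (an
older receiver rejects a dedup'ed stream), while can_receive(0, 0) is
True (a stream sent without dedup is still accepted).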

CHANGES TO THE ZFS(1M) MANPAGE

65c62
<      zfs send [-vR] [-[iI] snapshot] snapshot
---
>      zfs send [-DvR] [-[iI] snapshot] snapshot

1746c1677
<      zfs send [-vR] [-[iI] snapshot] snapshot
---
>      zfs send [-DvR] [-[iI] snapshot] snapshot
1753a1685,1689
>      -D
>          Perform dedup processing on the stream. Dedup'ed streams
>          cannot be received on systems that do not support the stream
>          dedup feature.
>

ATTRIBUTES
    See attributes(5) for descriptions of the  following  attributes:

    ____________________________________________________________
   |       ATTRIBUTE TYPE        |       ATTRIBUTE VALUE       |
   |_____________________________|_____________________________|
   | Availability                |          SUNWzfsu           |
   |_____________________________|_____________________________|
   | Interface Stability         |           Committed         |
   |_____________________________|_____________________________|

6. Resources and Schedule
    6.4. Steering Committee requested information
        6.4.1. Consolidation C-team Name:
                ON
    6.5. ARC review type: FastTrack
    6.6. ARC Exposure: open
