On 10/13/09 15:19, James Carlson wrote:
> Lori Alt wrote:
>   
>> On 10/13/09 13:36, Nicolas Williams wrote:
>>     
>>> Throwing away of cached blocks probably needs to be done synchronously
>>> by both ends, or else the receiver has to at least keep an index of
>>> block checksum to block pointer for all previously seen blocks in the
>>> stream.  Synchronizing the caches may require additional records in the
>>> stream.  But I agree with you: it should be possible to bound the memory
>>> usage of zfs send dedup.
>>>   
>>>       
>> Yes, the memory usage can be bounded.   It was our plan at this time
>> however to regard that as an implementation detail, not part of the
>> interface to be approved by this case.
>>     
>
> It becomes part of the interface if (a) the sender needs to notify the
> recipient of table flushes (as Nico reasonably suggested) or potentially
> (b) it becomes part of the usage considerations for users.  There's
> actually a good bit of prior art to draw on here from other stream
> compression schemes.
>
>   
I missed Nico's suggestion about notifying the recipient of cache 
flushes.  Actually, there is no need for a cache on the receive side.  
Or, more exactly, the dataset hierarchy constructed by the receive IS 
the cache.  The new write-by-reference record in the send stream 
essentially sends this information (a rough sketch follows the list below):

* identification of where the data can already be found on the target 
system (i.e., the object set, the object, and the offset and length 
within the object)

* the location where the data is to be written (object set, object, and 
offset)

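To make that concrete, here is a rough sketch (in C) of what such a 
record might carry.  The struct and field names below are purely 
illustrative, not the actual stream record layout; they just group the 
two sets of fields described above:

    #include <stdint.h>

    /*
     * Illustrative only: a write-by-reference record conceptually
     * carries a "where the data already is" tuple and a "where to
     * write it" tuple.
     */
    typedef struct wbr_record {
            /* data already present on the target system */
            uint64_t ref_objset;    /* object set holding the copy */
            uint64_t ref_object;    /* object within that object set */
            uint64_t ref_offset;    /* offset of the block in the object */
            uint64_t ref_length;    /* length of the block */

            /* destination of the write on the receiving side */
            uint64_t dst_objset;
            uint64_t dst_object;
            uint64_t dst_offset;
    } wbr_record_t;
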
During the receive, all datasets being received are "held" and not 
deletable until the receive completes, so the data is guaranteed to be 
present.  There is no need to maintain an index of block checksum to 
block pointer on the receive side.  There IS a need to maintain such an 
index on the send side, which is where memory management becomes an issue.
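
For illustration only, here is one way such a send-side index might be 
structured: a fixed-size table mapping block checksums to the location 
where each block was first seen in the stream.  None of the names or 
sizes below come from the actual implementation, and a collision simply 
overwrites the old entry, which costs a missed dedup opportunity but 
never correctness.

    #include <stdint.h>
    #include <string.h>

    #define SEND_DDT_SLOTS  (1 << 16)   /* arbitrary cap for this sketch */

    typedef struct dedup_entry {
            uint8_t  checksum[32];      /* e.g. SHA-256 of the block */
            uint64_t objset;            /* where this block was first seen */
            uint64_t object;
            uint64_t offset;
            int      valid;
    } dedup_entry_t;

    static dedup_entry_t send_table[SEND_DDT_SLOTS];

    /*
     * If this checksum was seen before, return the remembered location
     * (the caller emits a write-by-reference record); otherwise remember
     * this block and return NULL (the caller emits an ordinary write).
     */
    static dedup_entry_t *
    dedup_lookup_or_insert(const uint8_t cksum[32], uint64_t objset,
        uint64_t object, uint64_t offset)
    {
            uint64_t slot;

            memcpy(&slot, cksum, sizeof (slot));
            slot %= SEND_DDT_SLOTS;

            if (send_table[slot].valid &&
                memcmp(send_table[slot].checksum, cksum, 32) == 0)
                    return (&send_table[slot]);

            send_table[slot].valid = 1;
            memcpy(send_table[slot].checksum, cksum, 32);
            send_table[slot].objset = objset;
            send_table[slot].object = object;
            send_table[slot].offset = offset;
            return (NULL);
    }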

As for the send-side memory management, I agree that we could establish 
a public interface by which a caller can constrain the memory to be 
used.  However, we were thinking that if such an interface turns out to 
be necessary, we could define it and add it later, once we gain more 
experience with how over-the-wire dedup gets used in practice.
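
If such an interface were added, one plausible shape for it (again, 
purely hypothetical, not something this case proposes) is a byte budget 
supplied by the caller, which the send code would translate into a table 
size; this reuses the dedup_entry_t from the sketch above:

    #include <stddef.h>

    /*
     * Hypothetical: convert a caller-supplied memory budget into a
     * number of dedup-table slots, rounded down to a power of two so
     * the slot calculation stays a cheap mask or modulo.
     */
    static size_t
    dedup_table_slots(size_t max_bytes)
    {
            size_t slots = max_bytes / sizeof (dedup_entry_t);

            while (slots & (slots - 1))
                    slots &= (slots - 1);   /* keep only the highest bit */
            return (slots > 0 ? slots : 1);
    }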

I don't know whether the kinds of on-the-fly compression disabling that 
James mentions are relevant for dedup'ing.  For example, in one of my 
test cases, which is a hierarchy of datasets containing Solaris 
development workspaces, you can go for a long time without finding more 
than a handful of duplicate blocks.  But once you've finished with one 
development workspace and started on the next, you start getting lots of 
duplicates, because now you're seeing identical copies of the files you 
processed in the first dataset.  This is just one kind of data, but in 
general it's hard to predict at what point in the stream you're going to 
start getting dedup'ing bang for your memory-hogging buck.

Lori