Lori Alt wrote:
> On 10/13/09 13:36, Nicolas Williams wrote:
>> Throwing away of cached blocks probably needs to be done synchronously
>> by both ends, or else the receiver has to at least keep an index of
>> block checksum to block pointer for all previously seen blocks in the
>> stream.  Synchronizing the caches may require additional records in the
>> stream.  But I agree with you: it should be possible to bound the memory
>> usage of zfs send dedup.
>
> Yes, the memory usage can be bounded.  It was our plan at this time
> however to regard that as an implementation detail, not part of the
> interface to be approved by this case.
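[For readers following the thread: the bounded-memory scheme being discussed can be sketched roughly as follows. This is an illustrative sketch only, not the actual zfs send code; the record names (`FLUSH`, `WRITE_BYREF`) and the table-clearing policy are assumptions made for the example.]

```python
import hashlib

# Hypothetical sketch of a sender-side dedup table whose size is capped.
# When the table fills, it is cleared and a FLUSH record is emitted into
# the stream, so the receiver can discard its matching checksum->block
# index -- keeping memory bounded and synchronized on both ends.

FLUSH = ("FLUSH",)  # hypothetical stream record marking a table flush

class BoundedDedupTable:
    def __init__(self, max_entries):
        self.max_entries = max_entries
        self.table = {}  # checksum -> reference to a previously sent block

    def emit(self, block_ref, block_data, stream):
        csum = hashlib.sha256(block_data).digest()
        ref = self.table.get(csum)
        if ref is not None:
            # Duplicate block: send a reference instead of the data.
            stream.append(("WRITE_BYREF", ref))
            return
        if len(self.table) >= self.max_entries:
            self.table.clear()      # bound the sender's memory...
            stream.append(FLUSH)    # ...and tell the receiver to do likewise
        self.table[csum] = block_ref
        stream.append(("WRITE", block_ref, block_data))
```

After a flush, a previously seen block is retransmitted in full, since both ends have forgotten it; the FLUSH record is what makes the scheme part of the stream format rather than a private implementation detail.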
It becomes part of the interface if (a) the sender needs to notify the
recipient of table flushes (as Nico reasonably suggested) or potentially
(b) it becomes part of the usage considerations for users.  There's
actually a good bit of prior art to draw on here from other stream
compression schemes.

>> Also, in ZFS today block checksums are used for integrity protection,
>> not for block equality comparisons.  The fact that here blocks would not
>> be compared for actual equality does worry me somewhat
>
> The plan is to use a SHA256 checksum, or something comparably strong, so
> that the probability of collision becomes too small to worry about.
> Perhaps Darren Moffat can weigh in on why this kind of checksum is
> adequate, because I'm pretty much taking his word for it.

That's the sort of review I was hoping for.  ;-}

-- 
James Carlson         42.703N 71.076W         <carlsonj at workingcode.com>
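[For context on the "too small to worry about" claim: the collision risk for a 256-bit checksum can be estimated with the standard birthday bound, P(collision) ≈ n(n-1)/2^257 for n distinct blocks. A rough back-of-the-envelope calculation, not an argument from the case materials:]

```python
# Birthday-bound estimate for the probability of a SHA-256 collision
# among n distinct blocks in a send stream:
#   P(collision) ~= n * (n - 1) / 2^(256 + 1)
# Even 2^40 blocks (a petabyte-scale stream at 1K records) gives a
# probability on the order of 2^-177, far below hardware error rates.

def collision_probability(n_blocks, bits=256):
    """Approximate probability of at least one checksum collision."""
    return n_blocks * (n_blocks - 1) / 2.0 ** (bits + 1)

p = collision_probability(2 ** 40)
```

This is why a cryptographically strong checksum can stand in for a byte-for-byte equality check in practice, even though it is not a proof of equality.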