On Tue, 2009-04-28 at 16:04 +0200, Thomas Glanzmann wrote:
> Chris,
> what blocksizes can I choose with btrfs? 

Right now the blocksize can only be the same as the page size.  For this
external dedup program you have in mind, you could use any multiple of
the page size.
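
As a rough illustration of that constraint, a userland tool could look
up the system page size and reject any dedup chunk size that is not a
multiple of it.  This is only a sketch in Python; the helper name
valid_dedup_blocksize is made up for the example and is not part of
btrfs.

```python
import mmap

PAGE_SIZE = mmap.PAGESIZE  # page size of the running system, in bytes


def valid_dedup_blocksize(blocksize: int) -> bool:
    """Return True if blocksize is a positive multiple of the page size.

    Hypothetical helper: it only encodes the "any multiple of the page
    size" rule for an external dedup program.
    """
    return blocksize > 0 and blocksize % PAGE_SIZE == 0
```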

> Do you think that it is
> possible for an outsider like me to submit patches to btrfs which enable
> dedup in three fulltime days?

Three days is probably not quite enough ;)  I'd honestly prefer the
dedup happen entirely in the kernel in a setup similar to the
compression code.

But, that would use _lots_ of CPU, so an offline dedup is probably a
good feature even if we have transparent dedup.

You'd have to:

- Wire up a userland database that stores checksums and maps each one
  to (file, offset) tuples.

- Add an ioctl that replaces a given file extent if and only if the file
  contents match a given checksum over a range of bytes.  The ioctl
  should optionally do a byte compare of the source and destination
  pages to make 100% sure the data is really the same.

- Add another ioctl that reports which parts of a file have changed
  since a given transaction.  This will greatly reduce the time spent
  scanning for new blocks.
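
The userland side of this could be sketched roughly as follows.  This
is an illustration only, written in Python with an in-memory dict
standing in for the database.  The names scan_file, read_block and
dedup_candidates, the choice of SHA-256, and the page-sized block are
all assumptions made for the example.  The byte compare is done in
userland here purely to show the idea; the proposed ioctl does not
exist yet and is not modelled.

```python
import hashlib
import mmap
from collections import defaultdict

BLOCK_SIZE = mmap.PAGESIZE  # assumption: dedup in page-sized blocks


def scan_file(path, index, block_size=BLOCK_SIZE):
    """Record checksum -> [(path, offset), ...] for every full block."""
    with open(path, "rb") as f:
        offset = 0
        while True:
            block = f.read(block_size)
            if len(block) < block_size:
                break  # ignore a partial block at the end of the file
            digest = hashlib.sha256(block).digest()
            index[digest].append((path, offset))
            offset += block_size


def read_block(path, offset, block_size=BLOCK_SIZE):
    """Read one block of block_size bytes starting at offset."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(block_size)


def dedup_candidates(index, block_size=BLOCK_SIZE):
    """Yield (src, dst) location pairs whose blocks are byte-identical."""
    for locations in index.values():
        src = locations[0]
        src_data = read_block(*src, block_size)
        for dst in locations[1:]:
            # Byte compare, so a checksum collision can never cause two
            # different blocks to be treated as duplicates.
            if read_block(*dst, block_size) == src_data:
                yield src, dst
```

A caller would create index = defaultdict(list), run scan_file over
each file, and then hand every pair from dedup_candidates to the
extent-replacing ioctl.  With the "what changed since transaction N"
ioctl in place, scan_file would only need to visit the changed ranges
instead of rereading whole files.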

It isn't painfully hard, but you're looking at about 3 weeks total time.

-chris

