On 03/20/2010 05:24 PM, Boyd Waters wrote:
On Mar 20, 2010, at 9:05 AM, Ric Wheeler<rwhee...@redhat.com>  wrote:

My dataset reported a dedup factor of 1.28 for about 4TB, meaning that
almost a third of the dataset was duplicated.
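
For reference, a small sketch of the arithmetic behind a dedup factor. It assumes "dedup factor" means logical size divided by physically stored size; the thread does not say which tool reported the number, so that definition is an assumption, and the function name is made up for illustration. Under that reading, 1.28 works out to roughly 22% of the logical data being duplicate, or 28% when measured against what is actually stored.

```python
def dedup_savings(factor: float) -> tuple[float, float]:
    """Return (fraction of logical data eliminated, extra logical data relative to stored size).

    Assumes factor = logical_bytes / stored_bytes (an assumption, not stated in the thread).
    """
    if factor < 1.0:
        raise ValueError("a dedup factor below 1.0 is not meaningful here")
    saved_of_logical = 1.0 - 1.0 / factor   # share of the logical data that was a duplicate
    extra_over_stored = factor - 1.0        # logical data beyond what is physically stored
    return saved_of_logical, extra_over_stored


saved, extra = dedup_savings(1.28)
print(f"{saved:.1%} of the logical data was duplicate")
print(f"{extra:.1%} more logical data than is physically stored")
```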

It is always interesting to compare this to the rate you would get
with old-fashioned compression to see how effective this is. Seems
to be not that aggressive if I understand your results correctly.

Any idea of how compressible your data set was?

Well, of course if I used zip on the whole 4 TB that would deal with
my duplication issues, and give me a useless, static blob with no
checksumming. I haven't tried.

gzip/bzip2 of the block device was only meant to give a best-case estimate of what traditional compression can do. Many block devices (including some single-spindle disks) can do compression internally.
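
One way to get a rough feel for how compressible a dataset or device is, without running gzip over all of it, is to compress a sample of blocks and compare sizes. The sketch below is only an illustration: the function name, the 128 KiB block size, and the sampling stride are arbitrary assumptions, and zlib (the algorithm behind gzip) stands in for whatever compressor you actually care about.

```python
import zlib


def estimate_compress_ratio(path, block_size=128 * 1024, stride=64, level=6):
    """Compress every `stride`-th block of `path`; return raw bytes / compressed bytes.

    A ratio near 1.0 means the sampled data barely compresses.
    Blocks are compressed independently, so this is a rough estimate only.
    """
    raw = packed = 0
    with open(path, "rb") as f:
        offset = 0
        while True:
            f.seek(offset)
            block = f.read(block_size)
            if not block:          # reached (or seeked past) the end
                break
            raw += len(block)
            packed += len(zlib.compress(block, level))
            offset += block_size * stride   # skip ahead to the next sampled block
    return raw / packed if packed else 1.0
```

Calling estimate_compress_ratio() on a large file or a readable device node returns a single number that can be set next to the reported dedup factor. With stride=1 it compresses every block, which is slower but closer to what a per-block compressing filesystem would see.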


One thing that I did do, seven (!) years ago, was to detect duplicate
files (not blocks) and use hard links. I was able to squeeze out all
of the air in a series of backups, and was able to see all of them. I
used a Perl script for all this. It was nuts, but now I understand why
Apple implemented hard links to directories in HFS in order to get
their Time Machine product.  I didn't have copy-on-write, so btrfs
snapshots completely spank a manual system like this, but I did get
7-to-1 compression. These days you can use rsync with "--link-dest" to
make hard-linked duplicates of large directory trees. Tar, cpio, and
friends tend to break when transferring hundreds of gigabytes with
thousands of hard links. Or they ignore the hard links.
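
The Perl script is not shown in the thread, so the following is not a reconstruction of it. It is a minimal Python sketch of the same general idea (find files with identical content and replace the duplicates with hard links), with hypothetical function names. It assumes everything under the root lives on one filesystem, and it ignores ownership, permissions, and timestamps, which a hard link necessarily shares between all names.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file's contents, read in chunks to bound memory use."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def hardlink_duplicates(root):
    """Replace duplicate regular files under `root` with hard links.

    Returns the total size of the files that were relinked (an upper bound
    on the space reclaimed). Caveat: later in-place writes to one name will
    show up under every linked name, since there is no copy-on-write.
    """
    seen = {}       # (size, digest) -> first path found with that content
    relinked = 0
    for dirpath, _dirs, files in os.walk(root):
        for name in sorted(files):
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            size = os.stat(path).st_size
            first = seen.setdefault((size, file_digest(path)), path)
            if first == path or os.path.samefile(first, path):
                continue                    # first copy, or already linked
            tmp = path + ".dedup-tmp"       # hypothetical temp-name convention
            os.link(first, tmp)             # create the new link first...
            os.replace(tmp, path)           # ...then atomically swap it in
            relinked += size
    return relinked
```

Linking to a temporary name and then renaming over the duplicate means there is no moment at which the path is missing if the script is interrupted.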

Good times. I'm not sure how this is germane to btrfs, except to point
out pathological file-system usage that I've actually attempted in
real life. I actually use a lot of the ZFS feature set, and I look
forward to btrfs stability. I think btrfs can get there.

File level dedup is something we did in a group I worked with before and can certainly be quite effective. Even better, it is much easier to map into normal user expectations :-)

ric

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
