On Wed, Jan 5, 2011 at 11:46 AM, Josef Bacik <jo...@redhat.com> wrote:
> Dedup is only useful if you _know_ you are going to have duplicate
> information, so the two major use cases that come to mind are:
>
> 1) Mail server. You have small files, probably less than 4k (blocksize),
> that you are storing hundreds to thousands of. Dedup would be good for
> this case, and you'd need a small dedup blocksize for it to be useful.
>
> 2) Virtualized guests. If you have 5 different RHEL5 virt guests, chances
> are you are going to share data between them, but unlike the mail server
> example, you are likely to find much larger chunks that are the same, so
> you'd want a larger dedup blocksize, say 64k. If you used just 4k you'd
> end up with a ridiculous amount of fragmentation and performance would go
> down the toilet, so you need a larger dedup blocksize to get better
> performance.
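A rough way to measure that blocksize tradeoff on a real tree is a sketch
like the one below (plain Python, hashing fixed-size chunks with SHA-256;
the scan root and the two blocksizes are just placeholders, and it only
approximates what an in-filesystem dedup at that blocksize would find):

#!/usr/bin/env python3
# Rough estimate of how much data block-level dedup would fold together
# at a given blocksize.  Only a sketch: real dedup in ZFS/btrfs hashes
# aligned on-disk blocks and keeps its table in the filesystem, not in
# an in-memory Python set.
import hashlib
import os
import sys

def dedup_estimate(root, blocksize):
    """Hash every blocksize-byte chunk of every file under root and
    count how many chunks duplicate one already seen."""
    seen = set()
    total = dup = 0
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            try:
                with open(os.path.join(dirpath, name), 'rb') as f:
                    while True:
                        chunk = f.read(blocksize)
                        if not chunk:
                            break
                        total += 1
                        digest = hashlib.sha256(chunk).digest()
                        if digest in seen:
                            dup += 1
                        else:
                            seen.add(digest)
            except OSError:
                pass  # unreadable file; skip it
    return total, dup

if __name__ == '__main__':
    root = sys.argv[1] if len(sys.argv) > 1 else '.'
    for bs in (4096, 65536):  # the 4k vs 64k cases above
        total, dup = dedup_estimate(root, bs)
        if total:
            print('%3dk: %d of %d blocks duplicated, ratio %.2fx'
                  % (bs // 1024, dup, total, total / (total - dup)))

On a tree of near-identical guest images you'd expect the 64k pass to fold
almost as much data as the 4k pass while leaving far fewer, larger extents
behind, which is Josef's point about fragmentation.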
You missed out on the most obvious, and useful, use case for dedupe: a
central backups server.

Our current backup server does an rsync backup of 127 servers every night
into a single ZFS pool. 90+ of those servers are identical Debian installs
(school servers), 20-odd are identical FreeBSD installs
(firewalls/routers), and the rest are mail/web/db servers (Debian, Ubuntu,
RedHat, Windows).

Just as a test, we copied a week of backups to a Linux box running
ZFS-fuse with dedupe enabled, and saw a combined dedupe/compress ratio in
the low double digits (11 or 12x, something like that). Now we're just
waiting for ZFSv22+ to hit FreeBSD so we can enable dedupe on the backups
server.

For backups, and for central storage for VMs, online dedupe is a massive
win. Offline, maybe. Either way, dedupe is worthwhile.

--
Freddie Cash
fjwc...@gmail.com
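As a sanity check, that ratio is roughly what back-of-the-envelope
arithmetic predicts for that mix of machines, if the identical installs
really do collapse to about one copy each (the copy counts and the
compression factor here are assumptions, not measurements):

# 127 nightly trees stored; assume the ~90 identical Debian installs
# collapse to roughly one copy, the ~20 identical FreeBSD installs to
# another, and the remaining ~17 assorted servers stay unique.
stored = 127
unique = 1 + 1 + 17
print(stored / unique)        # ~6.7x from dedupe alone
print(stored / unique * 1.7)  # ~11x with ~1.7x compression on top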