On Wed, 2011-01-05 at 14:46 -0500, Josef Bacik wrote:
> Blah blah blah, I'm not having an argument about which is better because I
> simply do not care. I think dedup is silly to begin with, and online dedup
> even sillier. The only reason I did offline dedup was because I was just
> toying around with a simple userspace app to see exactly how much I would
> save if I did dedup on my normal system, and with 107 gigabytes in use, I'd
> save 300 megabytes. I'll say that again, with 107 gigabytes in use, I'd
> save 300 megabytes. So in the normal user case dedup would have been wholly
> useless to me.

I have been thinking a lot about de-duplication for a backup application
I am writing. I wrote a little script to figure out how much it would
save me. For my laptop home directory, about 100 GiB of data, it was a
couple of percent, depending a bit on the size of the chunks. With 4 KiB
chunks, I would save about two gigabytes. (That's assuming no MD5 hash
collisions.) I don't have VM images, but I do have a fair bit of saved
e-mail.

So, for backups, I concluded it was worth it to provide an option to do
this. I have no opinion on whether it is worthwhile to do in btrfs.

(For my script, see find-duplicate-chunks in
http://code.liw.fi/debian/pool/main/o/obnam/obnam_0.14.tar.gz or get the
current code using "bzr get http://code.liw.fi/obnam/bzr/trunk/".
http://braawi.org/obnam/ is the home page of the backup app.)
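
For anyone who wants to repeat this kind of measurement, here is a minimal
Python sketch of a fixed-size chunk estimate. It is not the
find-duplicate-chunks script itself; the function name
duplicate_chunk_bytes and its parameters are assumptions made up for
illustration. It reads every regular file in chunks, hashes each chunk
with MD5, and counts the bytes in chunks whose hash has already been seen,
so like the figures above it assumes no hash collisions.

```python
import hashlib
import os


def duplicate_chunk_bytes(root, chunk_size=4096):
    """Return (total_bytes, duplicate_bytes) for regular files under root.

    Hypothetical helper: a chunk counts as duplicate when its MD5 digest
    has already been seen in any file, including the current one.
    """
    seen = set()      # digests of every distinct chunk encountered so far
    total = 0         # bytes read in total
    duplicate = 0     # bytes that a chunk-level dedup could avoid storing

    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks, sockets, device nodes and the like.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                with open(path, "rb") as f:
                    while True:
                        chunk = f.read(chunk_size)
                        if not chunk:
                            break
                        total += len(chunk)
                        digest = hashlib.md5(chunk).digest()
                        if digest in seen:
                            duplicate += len(chunk)
                        else:
                            seen.add(digest)
            except OSError:
                # Unreadable files are ignored rather than aborting the scan.
                continue
    return total, duplicate
```

Calling duplicate_chunk_bytes(os.path.expanduser("~")) returns the two byte
counts for a home directory; dividing the second by the first gives the
fraction saved. Chunks are aligned to file offsets, so identical data at
different offsets in two files is not detected. The set of digests is held
in memory at 16 bytes per distinct chunk plus Python's overhead, which is
noticeable at 4 KiB chunks over 100 GiB.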