On Sunday, 1 April 2012 8:07:54 PM, Norbert Scheibner wrote:
On Sun, 01 Apr 2012 19:45:13 +0300, Konstantinos Skarlatos wrote:
That's my point. This poor man's dedupe would solve my problems here
very well. I don't need a zfs-variant of dedupe. I can implement such a
file-based dedupe with userland tools and would be happy.
do you have any scripts that can search a btrfs filesystem for dupes
and replace them with cp --reflink?
Nothing that really works and is well tested. After I learned about the missing
cp --reflink feature I stopped developing the script any further.
I use btrfs for my backups. Once a day I rsync --delete --inplace the complete
system to a subvolume, snapshot it, and delete some temp files in the snapshot.
In my setup I rsync --inplace many servers and workstations, 4-6 times
a day, into a 12TB btrfs volume, each one in its own subvolume. After
every backup a new read-only snapshot is created.
I have many cross-subvolume duplicate files (OS files, programs, many
huge media files that are copied locally from the servers to the
workstations etc), so a good "dedupe" script could save lots of space,
and allow me to keep snapshots for much longer.
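A per-machine routine like the ones described above might be sketched as follows. This is a sketch only, not either poster's actual setup: the mount point, hostnames, and the snapshot naming scheme are assumptions.

```shell
#!/bin/sh
# Sketch of an rsync-into-subvolume backup with read-only snapshots.
# BACKUPROOT and the directory layout are made up for illustration.

BACKUPROOT=/mnt/backup              # assumed mount point of the btrfs volume

backup_host() {
    host=$1
    vol=$BACKUPROOT/$host/current   # one subvolume per machine

    # Mirror the machine. --inplace rewrites changed blocks in the
    # existing files, so unchanged blocks stay shared with the
    # earlier snapshots instead of being rewritten wholesale.
    rsync -a --delete --inplace "root@$host:/" "$vol/"

    # Freeze the result as a read-only snapshot named by date/time.
    btrfs subvolume snapshot -r "$vol" \
        "$BACKUPROOT/$host/$(date +%Y-%m-%d_%H%M)"
}

# Example: cron would call something like this 4-6 times a day.
# for h in server1 workstation1; do backup_host "$h"; done
```

The key point is the combination: --inplace keeps block sharing between the live subvolume and its snapshots, and the read-only snapshot then pins each backup run cheaply.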
In addition to that, I wanted to shrink file duplicates.
What the script should do:
1. I md5sum every file
2. If the checksums are identical, I compare the files byte by byte
3. If 2 or more files are really identical:
- move one to a temp dir
- cp --reflink the second to the position and name of the first
- do a chown --reference, chmod --reference and touch --reference
to copy owner, file mode bits and times from the original to the
reflink copy, and then delete the original in the temp dir
Everything could be done with bash. One could also conceivably store the
md5sums in a database, which could be used for other purposes in the future.
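The steps above could be sketched roughly like this. It is a minimal demo, not the script under discussion: file names and the sample data are invented, and it uses cp --reflink=auto so the demo also runs on filesystems without reflink support. On btrfs one would pass --reflink=always instead, so a silent fallback to a plain copy cannot defeat the dedupe.

```shell
#!/bin/sh
# Minimal sketch of the md5sum + cmp + cp --reflink dedupe steps.
set -eu

dedupe_pair() {
    keep=$1; dup=$2
    tmp=$(mktemp -u "$dup.XXXXXX")       # temp name next to the duplicate

    mv -- "$dup" "$tmp"                  # step 3: move the duplicate aside
    # =auto falls back to a plain copy off-btrfs; use =always on btrfs.
    cp --reflink=auto -- "$keep" "$dup"
    chown --reference="$tmp" -- "$dup"   # restore owner
    chmod --reference="$tmp" -- "$dup"   # restore file mode bits
    touch --reference="$tmp" -- "$dup"   # restore timestamps
    rm -- "$tmp"                         # drop the original
}

# Demo data: two identical files with different timestamps.
dir=$(mktemp -d)
echo "some identical payload" > "$dir/a"
cp -- "$dir/a" "$dir/b"
touch -d '2012-04-01 12:00' "$dir/b"

# Steps 1+2: checksum every file, sort so equal sums are adjacent,
# then byte-compare before touching anything.
find "$dir" -type f -exec md5sum {} + | sort |
while read -r sum path; do
    if [ "$sum" = "${prevsum:-}" ] && cmp -s -- "$prev" "$path"; then
        dedupe_pair "$prev" "$path"
    else
        prevsum=$sum; prev=$path
    fi
done
```

The md5sum pass is only a pre-filter; the cmp guards against hash collisions before any file is replaced, matching step 2 above.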
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html