On 02/06/2015 06:20 AM, Qu Wenruo wrote:
From: Lutz Vieweg <l...@5t9.de>
Use case: you have two huge files on a btrfs filesystem and assume they contain the same bytes,
but you do not know for sure.

Is there a way to get a checksum of both files from btrfs with less effort than
reading the whole of both files and computing a hash sum?
In short, no.

In more detail:
In the current implementation, btrfs calculates a 4-byte (32-bit) crc32c checksum for each 4K sector and stores it in the
csum tree.
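As an illustration of the per-sector scheme, the following is a minimal sketch that computes one 32-bit checksum per 4 KiB sector of an in-memory buffer. It assumes btrfs's default data checksum, CRC-32C (the Castagnoli CRC, reflected polynomial 0x82F63B78, check value 0xE3069283 for "123456789"). It models only uncompressed data: it assumes a short final sector is checksummed as if zero-padded, and it does not model the csum tree's on-disk layout. As we understand it, compressed extents are checksummed as stored on disk, so their values would not match this. The names `crc32c` and `sector_checksums` are hypothetical helpers, not a btrfs API.

```python
# Table-driven CRC-32C (Castagnoli), reflected polynomial 0x82F63B78.
_POLY = 0x82F63B78


def _make_table():
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            crc = (crc >> 1) ^ _POLY if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _make_table()


def crc32c(data: bytes) -> int:
    """Standard CRC-32C: init 0xFFFFFFFF, final XOR 0xFFFFFFFF."""
    crc = 0xFFFFFFFF
    for b in data:
        crc = _TABLE[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


SECTOR = 4096  # 4 KiB sectors, one 4-byte checksum each


def sector_checksums(data: bytes, sector: int = SECTOR) -> list[int]:
    """One CRC-32C per sector; the last partial sector is zero-padded (assumption)."""
    sums = []
    for off in range(0, len(data), sector):
        block = data[off:off + sector].ljust(sector, b"\0")
        sums.append(crc32c(block))
    return sums
```

Pure Python is far too slow for a real 1 GiB file; the point is the shape of the result, one 4-byte value per sector.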

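So for large files, e.g. 1G (already quite small for modern storage), the checksums alone are
1M in size: 1 GiB / 4 KiB = 262,144 sectors, at 4 bytes each.
This means that even with crc32c, where the checksum of concatenated data can be computed from the
checksums of the parts plus the length of the second part (it is not simply crc32(a+b) = crc32(a) + crc32(b)),
you still need to process all 1M of per-sector checksums.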
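To make the combining step concrete: a CRC of `a + b` is derived from `crc(a)`, `crc(b)` and `len(b)`, not by adding the two values. zlib's C library provides `crc32_combine()` for this, but Python's `zlib` module does not expose it, so this sketch (function names hypothetical) rebuilds it from the affine property of CRCs. It uses zlib's standard CRC-32 rather than btrfs's crc32c because that is what Python ships; the same algebra applies to any CRC.

```python
import zlib


def crc32_combine(crc_a: int, crc_b: int, len_b: int) -> int:
    """CRC-32 of a+b from crc32(a), crc32(b) and len(b), without the bytes of b.

    Affine property: crc(a + z) ^ crc(b) ^ crc(z) == crc(a + b), z = len_b zero bytes.
    Feeding the zeros costs O(len_b); zlib's C crc32_combine() reaches the same
    result in O(log len_b) using GF(2) matrix squaring.
    """
    zeros = bytes(len_b)
    return zlib.crc32(zeros, crc_a) ^ crc_b ^ zlib.crc32(zeros)


def fold_block_crcs(blocks):
    """Whole-file CRC-32 from per-block (crc, length) pairs: one combine per block."""
    total = 0  # CRC-32 of the empty string
    for crc, length in blocks:
        total = crc32_combine(total, crc, length)
    return total


data = bytes(range(256)) * 50
blocks = [(zlib.crc32(data[i:i + 4096]), len(data[i:i + 4096]))
          for i in range(0, len(data), 4096)]
assert fold_block_crcs(blocks) == zlib.crc32(data)
```

Folding a 1 GiB file's 262,144 per-sector values this way still takes one combine per sector, so every stored checksum has to be processed, which is the cost described above.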

And yet, having to read only 1 MB of checksums instead of 1 GB of data sounds
like a good deal. Is there some userspace interface for reading
(only) those per-4K checksums of a file?
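The comparison Lutz has in mind can be sketched as follows, assuming the two per-sector checksum lists and the file sizes are already available (they are hypothetical inputs here, since the thread names no interface for reading them). A mismatching checksum proves the files differ; matching checksums only make them probably identical, because distinct 4 KiB blocks can share a 32-bit CRC. Sizes are compared as well because, if a short final sector is checksummed zero-padded (an assumption), trailing zero bytes would not show up in the checksums.

```python
from typing import Optional, Sequence


def first_differing_sector(csums_a: Sequence[int], csums_b: Sequence[int]) -> Optional[int]:
    """Index of the first sector whose checksums differ, or None if all match."""
    for i, (a, b) in enumerate(zip(csums_a, csums_b)):
        if a != b:
            return i
    if len(csums_a) != len(csums_b):
        return min(len(csums_a), len(csums_b))
    return None


def compare_by_checksums(size_a: int, csums_a: Sequence[int],
                         size_b: int, csums_b: Sequence[int]) -> str:
    """'different' is certain; 'probably identical' still admits CRC collisions."""
    if size_a != size_b:
        return "different"
    if first_differing_sector(csums_a, csums_b) is not None:
        return "different"
    return "probably identical"
```

If certainty is required, a "probably identical" result still has to be confirmed by reading the data.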

But there are still some cases where btrfs can help you determine more quickly
whether the files are the same.
Prerequisite:
The two files were copied using clone (the cp --reflink command) or were deduplicated.
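The message does not say how btrfs would expose this shortcut. One approach that should work under the stated prerequisite is the generic Linux FIEMAP ioctl (`FS_IOC_FIEMAP`), which reports where each extent of a file lives: if two files of equal size map every logical range to the same physical range, they share their data and are identical without reading it. This is a sketch with hypothetical helper names, not a btrfs-specific interface. It treats compressed (`ENCODED`), inline, delalloc and unknown extents as inconclusive, and a `False` result only means identity was not established this way. Independently written copies never share extents, so for them it always returns `False`.

```python
import fcntl
import os
import struct

# Linux FIEMAP ABI (linux/fiemap.h, linux/fs.h).
FS_IOC_FIEMAP = 0xC020660B          # _IOWR('f', 11, struct fiemap)
FIEMAP_FLAG_SYNC = 0x1              # flush dirty data before mapping
FIEMAP_EXTENT_LAST = 0x1
# UNKNOWN | DELALLOC | ENCODED | DATA_INLINE: physical address not comparable.
INCONCLUSIVE = 0x2 | 0x4 | 0x8 | 0x200

HEADER = struct.Struct("=QQIIII")    # struct fiemap (32 bytes)
EXTENT = struct.Struct("=QQQ2QI3I")  # struct fiemap_extent (56 bytes)


def parse_fiemap(buf):
    """Decode a filled-in fiemap buffer into (logical, physical, length, flags) tuples."""
    mapped = HEADER.unpack_from(buf, 0)[3]
    extents = []
    for i in range(mapped):
        f = EXTENT.unpack_from(buf, HEADER.size + i * EXTENT.size)
        extents.append((f[0], f[1], f[2], f[5]))
    return extents


def get_extents(path, batch=256):
    """All extents of a file, fetched with FS_IOC_FIEMAP in batches."""
    extents, start = [], 0
    with open(path, "rb") as f:
        while True:
            buf = bytearray(HEADER.size + batch * EXTENT.size)
            HEADER.pack_into(buf, 0, start, 2**64 - 1, FIEMAP_FLAG_SYNC, 0, batch, 0)
            fcntl.ioctl(f.fileno(), FS_IOC_FIEMAP, buf, True)
            chunk = parse_fiemap(buf)
            if not chunk:
                return extents
            extents.extend(chunk)
            logical, _, length, flags = chunk[-1]
            if flags & FIEMAP_EXTENT_LAST:
                return extents
            start = logical + length


def coalesce(extents):
    """Merge neighbours that are contiguous both logically and physically."""
    merged = []
    for logical, physical, length, flags in extents:
        if merged:
            ml, mp, mn, mf = merged[-1]
            if ml + mn == logical and mp + mn == physical:
                merged[-1] = (ml, mp, mn + length, mf | flags)
                continue
        merged.append((logical, physical, length, flags))
    return merged


def shares_all_extents(ext_a, ext_b):
    """True only if both lists map identical logical ranges to identical physical ranges."""
    a, b = coalesce(ext_a), coalesce(ext_b)
    if len(a) != len(b):
        return False
    for (la, pa, na, fa), (lb, pb, nb, fb) in zip(a, b):
        if (fa | fb) & INCONCLUSIVE or (la, pa, na) != (lb, pb, nb):
            return False
    return True


def provably_identical(path_a, path_b):
    """True if the files share all their data; False: not established this way."""
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    return shares_all_extents(get_extents(path_a), get_extents(path_b))
```

Coalescing first avoids false negatives when one file's references to a shared extent are split into several pieces and the other's are not.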

In my case I know for sure that no cloning/deduplication happened when
the files were written.

Regards,

Lutz Vieweg
