On Fri, Sep 07, 2018 at 09:27:28AM +0530, Lakshmipathi.G wrote:
> > One question:
> > Why not ioctl_fideduperange?
> > i.e. you kill most of benefits from that ioctl - atomicity.
> >
> I plan to add fideduperange as an option too. User can
> choose between fideduperange and ficlonerange call.
>
> If I'm not wrong, with fideduperange, kernel performs
> comparison check before dedupe. And it will increase
> time to dedupe files.

You already read the files to md5sum them, so skipping the kernel's
comparison gives you no speed gain. What you get instead are nasty
data-losing races, and a risk of collisions as well. md5sum is safe
against random occurrences (compare e.g. the chance of lightning hitting
you today), but it is exploitable by a hostile user. On the other hand,
a full bit-for-bit comparison is faster and 100% safe.

You can't skip verification: the checksums are only 32-bit. Any two
different extents have a 1 in 4G (2^32) chance of a false match, which
means you can expect on the order of one false positive with 64K
extents, rising quadratically as the number of files grows.


Meow!
-- 
⢀⣴⠾⠻⢶⣦⠀
⣾⠁⢠⠒⠀⣿⡁
⢿⡄⠘⠷⠚⠋⠀ Collisions shmolisions, let's see them find a collision or second
⠈⠳⣄⠀⠀⠀⠀ preimage for double rot13!
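
To put rough numbers on the 32-bit checksum estimate above, here is a
minimal sketch of the birthday-style arithmetic. It assumes checksums
behave like uniformly random 32-bit values, which is an idealisation,
and the function name expected_false_matches is made up for this
illustration; it is not part of any tool discussed in this thread.

```python
from math import comb


def expected_false_matches(num_extents: int, checksum_bits: int = 32) -> float:
    """Expected number of extent pairs that differ yet share a checksum.

    Assumption: checksums are uniformly random, so each pair of distinct
    extents collides with probability 1 / 2**checksum_bits.
    """
    pairs = comb(num_extents, 2)  # n * (n - 1) / 2 candidate pairs
    return pairs / 2 ** checksum_bits


if __name__ == "__main__":
    # Each 4x increase in extents gives roughly a 16x increase in
    # expected false matches: the growth is quadratic.
    for n in (1 << 14, 1 << 16, 1 << 18, 1 << 20):
        print(f"{n:>8} extents: {expected_false_matches(n):.3f} expected false matches")
```

Under that assumption, 64K extents give an expectation of about 0.5
false matches, so the same order of magnitude as the figure above, and
doubling the number of extents roughly quadruples it.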