On May 23, 2014, at 9:48 AM, Konstantinos Skarlatos <k.skarla...@gmail.com> wrote:
> On 21/5/2014 3:58 a.m., Chris Murphy wrote:
>> On May 20, 2014, at 4:56 PM, Konstantinos Skarlatos <k.skarla...@gmail.com> wrote:
>>
>>> On 21/5/2014 1:37 a.m., Mark Fasheh wrote:
>>>> On Tue, May 20, 2014 at 01:07:50AM +0300, Konstantinos Skarlatos wrote:
>>>>>> Duperemove will be shipping as supported software in a major SUSE
>>>>>> release, so it will be bug-fixed, etc., as you would expect. At the
>>>>>> moment I'm very busy trying to fix qgroup bugs, so I haven't had much
>>>>>> time to add features, handle external bug reports, etc. Also, I'm not
>>>>>> very good at advertising my software, which would be why it hasn't
>>>>>> really been mentioned on list lately :)
>>>>>>
>>>>>> I would say the state it's in is that I've gotten the feature set to
>>>>>> a point which feels reasonable, and I've fixed enough bugs that I'd
>>>>>> appreciate folks giving it a spin and providing reasonable feedback.
>>>>> Well, after having good results with duperemove on a few gigs of data,
>>>>> I tried it on a 500 GB subvolume. After it scanned all files, it has
>>>>> been stuck at 100% of one CPU core for about 5 hours and still hasn't
>>>>> done any deduping. My CPU is an Intel(R) Xeon(R) CPU E3-1230 V2 @
>>>>> 3.30GHz, so I guess that's not the problem. So I guess the speed of
>>>>> duperemove drops dramatically as data volume increases.
>>>> Yeah, I doubt it's your CPU. Duperemove is right now targeted at
>>>> smaller data sets (a few VMs, ISO images, etc.) than what you threw at
>>>> it, as you undoubtedly have figured out. It will need a bit of work
>>>> before it can handle entire file systems. My guess is that it was
>>>> spending an enormous amount of time finding duplicates (it has a very
>>>> thorough check that could probably be optimized).
>>> It finished after 9 or so hours, so I agree it was checking for
>>> duplicates. It does a few GB in just seconds, so runtime seems to grow
>>> far faster than linearly with data size.
>> I'm going to guess it ran out of memory. I wonder what happens if you
>> take an SSD and specify a humongous swap partition on it. Like 4x, or
>> more, the amount of installed memory.
> Just tried it again, with 32 GiB of swap added on an SSD. My test files
> are 633 GiB.
>
> duperemove -rv /storage/test  19537.67s user 183.86s system 89% cpu
> 6:06:56.96 total
>
> Duperemove was using about 1 GiB of RAM, had one core at 100%, and I
> think swap was not touched at all.

Guess it's currently not so much memory intensive as CPU intensive, and
it also isn't threaded.

Chris Murphy
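For anyone wanting to repeat the swap experiment above, here is a minimal
sketch of one way to set it up. The mount point /mnt/ssd and the 32 GiB
size are just assumptions matching this thread; note that at the time of
writing btrfs does not support swap files, so the file must live on a
non-btrfs filesystem (or use a dedicated swap partition instead):

  # Create and enable a 32 GiB swap file on an SSD-backed,
  # non-btrfs filesystem (path and size are illustrative):
  fallocate -l 32G /mnt/ssd/swapfile
  chmod 600 /mnt/ssd/swapfile
  mkswap /mnt/ssd/swapfile
  swapon /mnt/ssd/swapfile

  # Then time the same recursive, verbose run as in the thread:
  time duperemove -rv /storage/test

Running mkswap/swapon directly on an empty SSD partition avoids the
swap-file restrictions entirely. Also, depending on the duperemove
version, an extra flag such as -d may be needed to actually submit the
dedupe requests rather than only report duplicates.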