Thanks for the detailed answer. I understand that Robinhood doesn't add the indexes by default, which makes sense. In my case, I have a purge policy on last_access, and from what I saw the purge query had "ORDER BY ENTRIES.last_access", and I would see the MySQL server spend 5-30 minutes sorting for that query. So the cycle was: purge, query, wait for the sort, purge again. It wouldn't have been an issue if I didn't have millions of files to deal with, but at that scale the cycle has to run over and over.
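
For reference, the index I added was something along these lines (the index name is arbitrary, the LIMIT is just a placeholder, and the EXPLAIN is only a quick way to confirm the sort can use the index rather than a filesort -- adjust for your own schema):

    -- Speed up the ORDER BY on the purge candidate query
    ALTER TABLE ENTRIES ADD INDEX ix_last_access (last_access);

    -- Check that the ordered candidate query no longer needs a filesort
    EXPLAIN SELECT * FROM ENTRIES ORDER BY last_access LIMIT 10000;
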
I think (but I need to do more testing) that after adding the index, the sort time went down dramatically.

I guess I shouldn't be surprised that the stat operation is slow. We've been dealing with performance issues on this filesystem for a while now. Seeing ms/op statistics in the logs is a great feature; I just wish I had a better idea of what the normal ranges for some of these operations might be. The MySQL server I'm using could certainly perform better as well. I've just got limited resources to work with right now.

Best,
Peter

-----
Peter Doherty | Research Systems Administrator | Harvard Medical School
HMS Research Computing | https://rc.hms.harvard.edu/

On Apr 25, 2014, at 4:27, DEGREMONT Aurelien <[email protected]> wrote:

> On 24/04/2014 16:42, Doherty, Peter Charles wrote:
>> What's the main limiting factor for purge speed?
>>
>> I've got one problem user who has millions of small files. Robinhood has
>> been diligently working away at purging the files, but it's presently going
>> at about 6 deletes per second, which strikes me as pretty slow. Adding an
>> index to the last_access column in the ENTRIES table seemed to help boost
>> the DB query. (If this seems like good practice, it might be worth
>> mentioning in the documentation.)
>
> Please keep in mind that RBH deliberately does not set extra indexes in its
> DB schema. Indexes speed up read requests, but add extra time to write
> requests (insert, update, delete). Robinhood is designed to have faster
> scan/changelog processing and slower purge/policy processing. A slower purge
> is not a big deal, but if you do not scan or process changelogs fast enough,
> you could be in trouble.
>
> Admins are free to add new indexes if they think this is better for their
> use case, but be aware that those will slow your scanning/changelog rate.
> For a huge filesystem (> 100 million files), it could really be a problem.
>
> RBH creates its purge list in batches, first building a list of thousands of
> candidate files, then unlinking them in parallel. The candidate list
> creation should not be an issue (even if it can take a little while), so I
> do not think an index is a good thing here. Looking at your stats, your
> bottleneck right now is the GET_INFO_FS stage; at 20ms/op it is the slowest
> operation in your pipeline. RBH does a stat() on each file before unlinking
> it, as a safety check, to be sure the file has not changed.
>
> Aurélien
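
P.S. If the index does end up slowing scans noticeably, one option I'm considering (untested, and plain MySQL rather than anything Robinhood-specific) is to drop it before a full scan and recreate it afterwards, so the write-heavy scan doesn't pay the index maintenance cost:

    -- Drop the index before a full scan so inserts/updates aren't slowed down
    ALTER TABLE ENTRIES DROP INDEX ix_last_access;

    -- ... run the Robinhood scan / changelog processing ...

    -- Recreate it before the next purge run
    ALTER TABLE ENTRIES ADD INDEX ix_last_access (last_access);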
