On Thu, Apr 24, 2014 at 7:42 AM, Doherty, Peter Charles
<[email protected]> wrote:
> What's the main limiting factor for purge speed?
>
> I've got one problem user who has millions of small files.

Only one? Buy some lotto tickets, have some real good luck ;-p


> Robinhood has been diligently working away at purging the files, but it's
> presently going at about 6 deletes per second, which strikes me as pretty
> slow.
> Adding an index to the last_access column in the ENTRIES table seemed to help
> boost the DB query. (If this seems like good practice, it might be worth
> mentioning in the documentation.)

It depends. Adding another index puts an extra penalty on inserts.
Depending on how the index was created (as an additional column in an
existing index or as an entirely new key), the cost varies. Ideally it
would be an additional column in an existing index, so it's within the
same lookup.
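
To make the two options concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in, not the MySQL database that Robinhood uses, so the planner output will differ from what MySQL reports. The toy ENTRIES table and its `uid` column are assumptions made for illustration only. Either way, every index that exists has to be updated on each insert, which is where the insert penalty comes from.

```python
import sqlite3


def query_plan(conn, sql):
    """Return SQLite's query plan for `sql` as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " ".join(row[3] for row in rows)  # column 3 holds the plan text


conn = sqlite3.connect(":memory:")
# Toy stand-in for the ENTRIES table; `uid` is a hypothetical column.
conn.execute(
    "CREATE TABLE ENTRIES (id INTEGER PRIMARY KEY, uid INTEGER, last_access INTEGER)"
)
conn.executemany(
    "INSERT INTO ENTRIES (uid, last_access) VALUES (?, ?)",
    ((i % 50, i) for i in range(10000)),
)

by_age = "SELECT id FROM ENTRIES WHERE last_access < 100"
by_user_and_age = "SELECT id FROM ENTRIES WHERE uid = 7 AND last_access < 100"

# No index yet: the query has to scan the table.
plan_without_index = query_plan(conn, by_age)

# Option 1: an entirely new key on last_access alone.
conn.execute("CREATE INDEX idx_last_access ON ENTRIES (last_access)")
plan_single = query_plan(conn, by_age)

# Option 2: last_access as the trailing column of a composite index,
# so a query filtering on both columns is served by one index lookup.
conn.execute("DROP INDEX idx_last_access")
conn.execute("CREATE INDEX idx_uid_last_access ON ENTRIES (uid, last_access)")
plan_composite = query_plan(conn, by_user_and_age)

for label, plan in [
    ("no index", plan_without_index),
    ("single", plan_single),
    ("composite", plan_composite),
]:
    print(f"{label:10s} {plan}")
```

On the real database, the equivalent check would be to run EXPLAIN on the slow select before and after adding the index.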

I am not sure how RBH actually performs file deletions, but as a
general FYI, use rsync. A great write-up _used to_ exist; the Wayback
Machine was able to retrieve it:

https://web.archive.org/web/20130929001850/http://linuxnote.net/jianingy/en/linux/a-fast-way-to-remove-huge-number-of-files.html
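
The rsync approach is usually run as a sync from an empty directory with deletion enabled. The sketch below wraps that in Python; the function names and the example paths are hypothetical, and it assumes rsync is installed on the host. Nothing is deleted until `purge_directory` is actually called.

```python
import subprocess
import tempfile
from pathlib import Path


def rsync_purge_command(empty_dir, target_dir):
    """Build the rsync command that empties target_dir.

    The trailing slashes matter: they tell rsync to sync the *contents*
    of empty_dir into target_dir rather than copy the directory itself.
    """
    return [
        "rsync",
        "-a",        # archive mode (implies recursion)
        "--delete",  # remove files in the target that are not in the source
        f"{Path(empty_dir)}/",
        f"{Path(target_dir)}/",
    ]


def purge_directory(target_dir):
    """Delete everything under target_dir by syncing an empty directory.

    Caution: with -a, rsync also applies the empty directory's
    permissions and timestamps to target_dir itself.
    """
    with tempfile.TemporaryDirectory() as empty_dir:
        subprocess.run(rsync_purge_command(empty_dir, target_dir), check=True)
```

For example, `purge_directory("/scratch/problem_user")` (a made-up path) would leave that directory in place but empty.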

As a note for Thomas and the other RBH devs: This may be a faster way
to purge files: https://gist.github.com/jzwinck/5692534


> Does robinhood batch the unlink operations?  Is there anything else I'm
> missing that would explain why the purge is crawling along?

From the output, it appears your bottleneck is not the actual file
deletion operation but rather the database: the GET_INFO_DB stage
(select statements) along with DB_APPLY.

I suggest you run

  wget mysqltuner.pl
  perl mysqltuner.pl

And see if you can improve your database performance on the
GET_INFO_DB operation. You may also want to increase the number of
threads to perform this action (only 7 with none of them idle).

>
> 2014/04/24 09:51:18 [24573/1] STATS | ==== EntryProcessor Pipeline Stats ===
> 2014/04/24 09:51:18 [24573/1] STATS | Idle threads: 0
> 2014/04/24 09:51:18 [24573/1] STATS | Id constraints count: 10000 (hash min=0/max=7/avg=1.3)
> 2014/04/24 09:51:18 [24573/1] STATS | Stage              | Wait | Curr | Done |     Total | ms/op |
> 2014/04/24 09:51:18 [24573/1] STATS |  0: GET_FID        |    0 |    0 |    0 |         0 |  0.00 |
> 2014/04/24 09:51:18 [24573/1] STATS |  1: GET_INFO_DB    |  428 |    0 | 8924 |   6890996 |  2.58 |
> 2014/04/24 09:51:18 [24573/1] STATS |  2: GET_INFO_FS    |  305 |    7 |   15 |   6349673 | 20.29 |
> 2014/04/24 09:51:18 [24573/1] STATS |  3: REPORTING      |    0 |    0 |    0 |   6028488 |  0.00 |
> 2014/04/24 09:51:18 [24573/1] STATS |  4: PRE_APPLY      |    0 |    0 |    0 |   6322256 |  0.00 |
> 2014/04/24 09:51:18 [24573/1] STATS |  5: DB_APPLY       |  280 |    1 |   40 |   6321975 |  5.49 | 53.19% batched (avg batch size: 3.1)
> 2014/04/24 09:51:18 [24573/1] STATS |  6: CHGLOG_CLR     |    0 |    0 |    0 |   6881424 |  0.01 |
> 2014/04/24 09:51:18 [24573/1] STATS |  7: RM_OLD_ENTRIES |    0 |    0 |    0 |         0 |  0.00 |
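
For watching these numbers over time, a small parser can pull the per-stage columns out of the log. This is only a sketch: it assumes each stage row sits on a single line with the `|`-separated layout shown above (no mail quoting or wrapping), and the `StageStats` and `parse_stage_line` names are made up here, not part of Robinhood.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StageStats:
    stage_id: int
    name: str
    wait: int
    curr: int
    done: int
    total: int
    ms_per_op: float


def parse_stage_line(line: str) -> Optional[StageStats]:
    """Parse one pipeline stage row; return None for any other log line."""
    fields = [f.strip() for f in line.split("|")]
    # Stage rows have at least 7 fields and start with "<number>: <NAME>".
    if len(fields) < 7 or ":" not in fields[1]:
        return None
    stage_id, name = fields[1].split(":", 1)
    if not stage_id.isdigit():
        return None
    wait, curr, done, total = (int(f) for f in fields[2:6])
    return StageStats(int(stage_id), name.strip(), wait, curr, done, total,
                      float(fields[6]))


# Rows copied from the stats output quoted above.
SAMPLE = """
2014/04/24 09:51:18 [24573/1] STATS | Stage              | Wait | Curr | Done |     Total | ms/op |
2014/04/24 09:51:18 [24573/1] STATS |  1: GET_INFO_DB    |  428 |    0 | 8924 |   6890996 |  2.58 |
2014/04/24 09:51:18 [24573/1] STATS |  2: GET_INFO_FS    |  305 |    7 |   15 |   6349673 | 20.29 |
2014/04/24 09:51:18 [24573/1] STATS |  5: DB_APPLY       |  280 |    1 |   40 |   6321975 |  5.49 | 53.19% batched (avg batch size: 3.1)
2014/04/24 09:51:18 [24573/1] STATS |  6: CHGLOG_CLR     |    0 |    0 |    0 |   6881424 |  0.01 |
"""

stages = [s for s in map(parse_stage_line, SAMPLE.splitlines()) if s]

# List the stages with the longest wait queues first.
for s in sorted(stages, key=lambda s: s.wait, reverse=True):
    print(f"{s.name:15s} wait={s.wait:4d} curr={s.curr:2d} ms/op={s.ms_per_op:6.2f}")
```

Sorting by the Wait column is just one way to rank the stages; the ms/op column is worth comparing alongside it.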




--
Adam Brenner
Computer Science, Undergraduate Student
Donald Bren School of Information and Computer Sciences

System Administrator, HPC Cluster
Office of Information Technology
http://hpc.oit.uci.edu/

University of California, Irvine
www.ics.uci.edu/~aebrenne/
[email protected]

_______________________________________________
robinhood-support mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/robinhood-support