We continue to face challenges using robinhood to scan our large Lustre
filesystems here at LLNL. Currently, LLNL's Lustre filesystems range from 150M
files up to 750M files (and growing). A recent robinhood update (2.5.0)
included a schema change, which forced us to perform another initial scan of
these filesystems. On our most heavily utilized filesystems we are seeing
average scan rates of only about 50-300 entries/sec. At this rate, we are
looking at several weeks before a scan completes (>500M files).
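
For reference, here is the arithmetic behind "several weeks". This is a sketch that assumes a constant average rate, which real scans do not have. The 183.4 entries/sec figure is the average speed reported in the stats dump further down. At 50 entries/sec, 500M entries take roughly 116 days. At 300 entries/sec, they take roughly 19 days.

```python
"""Back-of-envelope duration of a full robinhood scan.

Assumes a constant average scan rate, which real scans do not have.
"""

SECONDS_PER_DAY = 86_400


def scan_days(entries: int, entries_per_sec: float) -> float:
    """Days needed to scan `entries` entries at a constant rate."""
    if entries_per_sec <= 0:
        raise ValueError("scan rate must be positive")
    return entries / entries_per_sec / SECONDS_PER_DAY


if __name__ == "__main__":
    # 500M entries at the low end, the reported average, and the high end.
    for rate in (50, 183.4, 300):
        days = scan_days(500_000_000, rate)
        print(f"{rate:>6} entries/sec -> {days:6.1f} days ({days / 7:4.1f} weeks)")
```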

The robinhood EntryProcessor pipeline stats often show no operations in wait
status and all worker threads idle. I suspect we are limited by Lustre
performance rather than by robinhood in this case, but I wanted to run things
by you in case there are tuning improvements we could benefit from. We have
also been looking into using multiple Lustre clients (via robinhood's "partial
scan" functionality): splitting up the namespace and farming out the scans
with Slurm. This would increase our client count and, presumably, our
aggregate metadata performance.
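
One way to split the namespace for such Slurm jobs is to balance the groups by entry count. The sketch below makes several assumptions. It assumes per-directory entry counts are already known, for example from a previous scan. The directory names and counts are hypothetical. The robinhood partial-scan command that each array task would run is deliberately left out. It uses greedy longest-processing-time (LPT) assignment. Each task then reads `SLURM_ARRAY_TASK_ID`, which Slurm sets for job-array tasks, to pick its group.

```python
"""Split a namespace into balanced groups for parallel partial scans.

Sketch only: assumes per-directory entry counts are already known (for
example from a previous scan). Directory names and counts below are
hypothetical. Each group is meant for one Slurm array task, which would run
a robinhood partial scan over the directories it prints.
"""
import heapq
import os


def balance(dir_counts: dict[str, int], n_groups: int) -> list[list[str]]:
    """Greedy longest-processing-time (LPT) partitioning.

    Directories are taken largest first and each goes to the group with the
    smallest running entry total; the largest group ends up within 4/3 of
    the best possible split.
    """
    if n_groups < 1:
        raise ValueError("need at least one group")
    heap = [(0, i) for i in range(n_groups)]  # (running total, group index)
    groups: list[list[str]] = [[] for _ in range(n_groups)]
    for name, count in sorted(dir_counts.items(), key=lambda kv: kv[1], reverse=True):
        total, idx = heapq.heappop(heap)  # currently lightest group
        groups[idx].append(name)
        heapq.heappush(heap, (total + count, idx))
    return groups


if __name__ == "__main__":
    # Hypothetical top-level directories with estimated entry counts.
    counts = {
        "/fs/projA": 210_000_000,
        "/fs/projB": 140_000_000,
        "/fs/projC": 95_000_000,
        "/fs/projD": 60_000_000,
        "/fs/projE": 40_000_000,
        "/fs/projF": 25_000_000,
    }
    groups = balance(counts, n_groups=3)
    # SLURM_ARRAY_TASK_ID is set by Slurm for job-array tasks; default to 0.
    task = int(os.environ.get("SLURM_ARRAY_TASK_ID", "0"))
    if not 0 <= task < len(groups):
        raise SystemExit(f"task {task} out of range for {len(groups)} groups")
    for path in groups[task]:
        print(path)
```

This only balances entry counts. Directories whose entries are more expensive to scan may still make some tasks run noticeably longer than others.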

Below are some high-level details of our robinhood setup and configuration:

Robinhood hardware:
  Dual Xeon E5-2670 (Sandy Bridge) with 384GB RAM
  Database stored on an 8GB/s Fibre Channel-attached NetApp array

Software:
  RHEL6-U4
  Lustre-2.4.0
  robinhood-2.5.0

EntryProcessor
{
    # number of worker threads for processing pipeline tasks
    nb_threads = 12 ;
    # Max number of operations in the Entry Processor pipeline.
    # If the number of pending operations exceeds this limit,
    # info collectors are suspended until this count decreases
    max_pending_operations = 10000 ;
    max_batch_size = 1000 ;
    # Optionally specify a maximum thread count for each stage of the pipeline:
    # <stagename>_threads_max = <n> (0: use default)
    # STAGE_GET_FID_threads_max = 8 ;
    # STAGE_GET_INFO_DB_threads_max = 8 ;
    STAGE_GET_INFO_FS_threads_max = 8 ;
    # STAGE_REPORTING_threads_max = 8 ;
    # STAGE_DB_APPLY_threads_max = 8 ;
    # if set to FALSE, classes will only be matched
    # at policy application time (not during a scan or reading changelog)
    match_classes = FALSE;
}

FS_Scan
{
    # simple scan interval (fixed)
    scan_interval = 2d ;
    # min/max for adaptive scan interval:
    # the fuller the filesystem, the more frequently it is scanned.
    #min_scan_interval = 24h ;
    #max_scan_interval = 7d ;
    # number of threads used for scanning the filesystem
    nb_threads_scan = 12 ;
    # when a scan fails, this is the delay before retrying
    scan_retry_delay = 1h ;
    # timeout for operations on the filesystem
    scan_op_timeout = 1h ;
    # exit if operation timeout is reached?
    exit_on_timeout = TRUE ;
    # external command called on scan termination
    # special arguments can be specified: {cfg} = config file path,
    # {fspath} = path to managed filesystem
    #completion_command = "/path/to/my/script.sh -f {cfg} -p {fspath}" ;
    # Internal scheduler granularity (for checking end of scan, hangs, ...)
    spooler_check_interval = 1min ;
    # Memory preallocation parameters
    nb_prealloc_tasks = 256 ;
}

Recent stats from the scan currently in progress:
2014/03/26 10:28:45 robinhood@locksley[90120/4]: STATS | ==================== Dumping stats at 2014/03/26 10:28:45 =====================
2014/03/26 10:28:45 robinhood@locksley[90120/4]: STATS | ======== General statistics =========
2014/03/26 10:28:45 robinhood@locksley[90120/4]: STATS | Daemon start time: 2014/03/20 19:40:29
2014/03/26 10:28:45 robinhood@locksley[90120/4]: STATS | Started modules: scan
2014/03/26 10:28:45 robinhood@locksley[90120/4]: STATS | ======== FS scan statistics =========
2014/03/26 10:28:45 robinhood@locksley[90120/4]: STATS | current scan interval = 2.0d
2014/03/26 10:28:45 robinhood@locksley[90120/4]: STATS | scan is running:
2014/03/26 10:28:45 robinhood@locksley[90120/4]: STATS | started at : 2014/03/20 19:40:29 (5.6d ago)
2014/03/26 10:28:45 robinhood@locksley[90120/4]: STATS | last action: 2014/03/26 10:28:45 (00s ago)
2014/03/26 10:28:45 robinhood@locksley[90120/4]: STATS | progress : 88563083 entries scanned (225 errors)
2014/03/26 10:28:45 robinhood@locksley[90120/4]: STATS | avg. speed : 43.62 ms/entry/thread -> 183.40 entries/sec
2014/03/26 10:28:45 robinhood@locksley[90120/4]: STATS | inst. speed: 30.44 ms/entry/thread -> 262.77 entries/sec
2014/03/26 10:28:45 robinhood@locksley[90120/4]: STATS | ==== EntryProcessor Pipeline Stats ===
2014/03/26 10:28:45 robinhood@locksley[90120/4]: STATS | Idle threads: 12
2014/03/26 10:28:45 robinhood@locksley[90120/4]: STATS | Id constraints count: 0 (hash min=0/max=0/avg=0.0)
2014/03/26 10:28:45 robinhood@locksley[90120/4]: STATS | Stage             | Wait | Curr | Done |    Total | ms/op |
2014/03/26 10:28:45 robinhood@locksley[90120/4]: STATS | 0: GET_FID        |    0 |    0 |    0 |   302725 | 27.93 |
2014/03/26 10:28:45 robinhood@locksley[90120/4]: STATS | 1: GET_INFO_DB    |    0 |    0 |    0 | 88742078 |  0.19 |
2014/03/26 10:28:45 robinhood@locksley[90120/4]: STATS | 2: GET_INFO_FS    |    0 |    0 |    0 | 88742078 |  3.66 |
2014/03/26 10:28:45 robinhood@locksley[90120/4]: STATS | 3: REPORTING      |    0 |    0 |    0 | 88742078 |  0.00 |
2014/03/26 10:28:45 robinhood@locksley[90120/4]: STATS | 4: PRE_APPLY      |    0 |    0 |    0 | 88742078 |  0.00 |
2014/03/26 10:28:45 robinhood@locksley[90120/4]: STATS | 5: DB_APPLY       |    0 |    0 |    0 | 88742078 |  0.60 | 16.69% batched (avg batch size: 7.0)
2014/03/26 10:28:45 robinhood@locksley[90120/4]: STATS | 6: CHGLOG_CLR     |    0 |    0 |    0 |        0 |  0.00 |
2014/03/26 10:28:45 robinhood@locksley[90120/4]: STATS | 7: RM_OLD_ENTRIES |    0 |    0 |    0 |        0 |  0.00 |
2014/03/26 10:28:45 robinhood@locksley[90120/4]: STATS | DB ops: get=290891/ins=7249559/upd=81492519/rm=0
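
Here is a rough reading of that stats dump, as a sketch under two assumptions. First, it assumes "ms/op" is an average per-operation time, so Total * ms/op approximates the thread-seconds each stage consumed. Second, it assumes "ms/entry/thread -> entries/sec" implies rate * ms / 1000 busy scanner threads. The values are copied from the log, and nothing here queries robinhood. REPORTING and PRE_APPLY are left out because they show 0.00 ms/op.

```python
"""Rough accounting of where scan time went, from a robinhood STATS dump.

Assumptions: "ms/op" is an average per-operation latency, so
total_ops * ms_per_op / 1000 approximates the thread-seconds a stage used;
and rate * ms_per_entry_thread / 1000 approximates busy scanner threads.
"""
from datetime import datetime

# (total ops, ms/op) per pipeline stage, copied from the STATS dump.
STAGES = {
    "GET_FID": (302_725, 27.93),
    "GET_INFO_DB": (88_742_078, 0.19),
    "GET_INFO_FS": (88_742_078, 3.66),
    "DB_APPLY": (88_742_078, 0.60),
}


def stage_thread_seconds(total_ops: int, ms_per_op: float) -> float:
    """Approximate thread-seconds spent in one pipeline stage."""
    return total_ops * ms_per_op / 1000.0


def effective_threads(entries_per_sec: float, ms_per_entry_thread: float) -> float:
    """Scanner threads implied by a rate and a per-thread latency."""
    return entries_per_sec * ms_per_entry_thread / 1000.0


def elapsed_seconds(start: str, now: str) -> float:
    """Wall-clock seconds between two robinhood-style timestamps."""
    fmt = "%Y/%m/%d %H:%M:%S"
    return (datetime.strptime(now, fmt) - datetime.strptime(start, fmt)).total_seconds()


if __name__ == "__main__":
    wall = elapsed_seconds("2014/03/20 19:40:29", "2014/03/26 10:28:45")
    print(f"wall clock: {wall:,.0f} s, overall rate {88_563_083 / wall:.1f} entries/sec")
    for name, (ops, ms) in STAGES.items():
        busy = stage_thread_seconds(ops, ms)
        # Average number of pipeline worker threads this stage kept busy.
        print(f"{name:12s} {busy:12,.0f} thread-s  ~{busy / wall:.2f} threads busy")
    print(f"scanner threads (avg):  {effective_threads(183.40, 43.62):.1f}")
    print(f"scanner threads (inst): {effective_threads(262.77, 30.44):.1f}")
```

By this estimate, the four listed stages together account for roughly 403,000 thread-seconds over about 485,000 s of wall-clock time. That is less than one of the 12 worker threads busy on average, which agrees with "Idle threads: 12". Meanwhile, the scanner side reports about 8 threads' worth of work at roughly 44 ms per entry. If the assumptions hold, this is consistent with the scan, rather than the pipeline, being the bottleneck.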

Purely from a robinhood tuning perspective, do you see any improvements we
can or should make (e.g. number of scan threads, pipeline/worker threads,
batching vs. threading, etc.)?
Let me know if there is any additional information you would like.
Thanks in advance for any suggestions you may have.
Regards,
Jim
=============================
Jim Silva
HPC Systems Engineer
Lawrence Livermore National Laboratory
[email protected]
_______________________________________________
robinhood-support mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/robinhood-support