Hi James,

Thank you for the accurate analysis and the concise report.

Indeed, it looks like Robinhood is almost idle and filesystem metadata 
operation latency is the limiting factor, so it isn't time to tune 
Robinhood... not yet :-)
I'm concerned to see a latency of 28 ms for an operation like 
"path2fid" (GET_FID stage), which is a simple entry lookup. Is your MDS 
overloaded?
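
To put the reported rates in perspective, here is a rough back-of-the-envelope
sketch in Python. It uses the average speed from the stats dump quoted below
(183.40 entries/sec) and the ">500M files" figure from the original message.
The linear scaling with the number of clients is only an assumption, and an
optimistic one if the MDS itself is the bottleneck.

```python
# Rough scan-time estimate from the figures quoted in this thread.
# Assumption: throughput scales linearly with the number of Lustre clients,
# which is optimistic if the MDS itself is the limiting factor.

SECONDS_PER_DAY = 86400


def scan_days(entries, entries_per_sec, clients=1):
    """Days needed to scan `entries` at `entries_per_sec` per client."""
    return entries / (entries_per_sec * clients) / SECONDS_PER_DAY


if __name__ == "__main__":
    total_entries = 500_000_000  # ">500M files" from the original message
    avg_rate = 183.40            # "avg. speed" from the stats dump
    for clients in (1, 4, 8):    # hypothetical client counts
        days = scan_days(total_entries, avg_rate, clients)
        print(f"{clients} client(s): {days:.1f} days")
```

With a single client this comes out at roughly a month, which is consistent
with the "several weeks" estimate in the original message.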

Splitting the scan can be a good idea, especially if the performance 
issue is due to the Lustre client.
In this case, I strongly advise you to run the partial scans with the 
--no-gc option. --no-gc skips a very expensive DB operation at the 
end of the scan, which is useless for an initial scan anyway.
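
As an illustration, a small Python helper could generate one partial-scan
command per top-level directory, ready to be handed to slurm. This is only a
sketch: the --no-gc option is the one mentioned above, but the other flags
(-f, --scan=<dir>, --once), the config path and the directory names are
assumptions, so please check them against "robinhood --help" on your version
before using anything like this.

```python
import shlex


def partial_scan_commands(subdirs, config, binary="robinhood"):
    """Build one partial-scan command line per subdirectory.

    The flags other than --no-gc are assumptions to verify against
    `robinhood --help`; nothing is executed here, the commands are
    only returned as strings.
    """
    return [
        f"{binary} -f {shlex.quote(config)} "
        f"--scan={shlex.quote(d)} --once --no-gc"
        for d in subdirs
    ]


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    dirs = ["/p/lscratch/projA", "/p/lscratch/projB"]
    for cmd in partial_scan_commands(dirs, "/etc/robinhood.d/lscratch.conf"):
        print(cmd)
```

Each printed line could then be submitted as a separate job so that every
partial scan runs on its own Lustre client.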

Regards,
Thomas


On 03/26/14 19:11, Jim Silva wrote:
> We continue to face challenges using robinhood to scan our large lustre 
> filesystems here at LLNL.  Currently, LLNL's lustre filesystems range from 
> 150M files up to 750M files (and growing).  With a recent robinhood update 
> (2.5.0) we have been forced to perform another initial scan of these 
> filesystems due to a schema change.  We are finding that on our most heavily 
> utilized filesystems we are only seeing scan rates of about 50-300 
> entries/sec on average.  At this rate, we are looking at several weeks before 
> the scan completes (>500M files).  The robinhood entry processor pipeline 
> stats often suggests there are no operations in wait status and all worker 
> threads are idle.  I suspect we are being limited by lustre performance and 
> not robinhood in this case but I wanted to run things by you, just in case 
> there are any tuning improvements we can benefit from.  We have also been 
> looking into the use of multiple lustre clients (using robinhood's "partial 
> scan" functionality), splitting up the namespace and farming out scans 
> using slurm.  This would increase our client count and, presumably, improve 
> our metadata performance.
>
> Below are some high-level details of our robinhood configuration:
>
> Robinhood hardware:
> Dual Xeon E5-2670 (Sandy Bridge) w/ 384GB RAM
> Database written to 8GB/s fibre-channel attached Netapp array.
>
> Software:
> RHEL6-U4
> Lustre-2.4.0
> robinhood-2.5.0
>
> EntryProcessor
> {
>    # nbr of worker threads for processing pipeline tasks
>    nb_threads = 12 ;
>
>    # Max number of operations in the Entry Processor pipeline.
>    # If the number of pending operations exceeds this limit,
>    # info collectors are suspended until this count decreases
>    max_pending_operations = 10000 ;
>
>    max_batch_size = 1000 ;
>
>    # Optionally specify a maximum thread count for each stage of the pipeline:
>    # <stagename>_threads_max = <n> (0: use default)
>    # STAGE_GET_FID_threads_max = 8 ;
>    # STAGE_GET_INFO_DB_threads_max = 8 ;
>    STAGE_GET_INFO_FS_threads_max = 8 ;
>    # STAGE_REPORTING_threads_max = 8 ;
>    # STAGE_DB_APPLY_threads_max = 8 ;
>
>    # if set to FALSE, classes will only be matched
>    # at policy application time (not during a scan or reading changelog)
>    match_classes = FALSE;
> }
>
>
> FS_Scan
> {
>    # simple scan interval (fixed)
>    scan_interval = 2d ;
>
>    # min/max for adaptive scan interval:
>    # the more the filesystem is full, the more frequently it is scanned.
>    #min_scan_interval = 24h ;
>    #max_scan_interval = 7d ;
>
>    # number of threads used for scanning the filesystem
>    nb_threads_scan = 12 ;
>
>    # when a scan fails, this is the delay before retrying
>    scan_retry_delay = 1h ;
>
>    # timeout for operations on the filesystem
>    scan_op_timeout = 1h ;
>    # exit if operation timeout is reached?
>    exit_on_timeout = TRUE ;
>    # external command called on scan termination
>    # special arguments can be specified: {cfg} = config file path,
>    # {fspath} = path to managed filesystem
>    #completion_command = "/path/to/my/script.sh -f {cfg} -p {fspath}" ;
>
>    # Internal scheduler granularity (for testing and of scan, hangs, ...)
>    spooler_check_interval = 1min ;
>
>    # Memory preallocation parameters
>    nb_prealloc_tasks = 256 ;
> }
>
>
>
> Recent stats of scan currently in progress:
>
> 2014/03/26 10:28:45 robinhood@locksley[90120/4]: STATS | ==================== Dumping stats at 2014/03/26 10:28:45 =====================
> 2014/03/26 10:28:45 robinhood@locksley[90120/4]: STATS | ======== General statistics =========
> 2014/03/26 10:28:45 robinhood@locksley[90120/4]: STATS | Daemon start time: 2014/03/20 19:40:29
> 2014/03/26 10:28:45 robinhood@locksley[90120/4]: STATS | Started modules: scan
> 2014/03/26 10:28:45 robinhood@locksley[90120/4]: STATS | ======== FS scan statistics =========
> 2014/03/26 10:28:45 robinhood@locksley[90120/4]: STATS | current scan interval = 2.0d
> 2014/03/26 10:28:45 robinhood@locksley[90120/4]: STATS | scan is running:
> 2014/03/26 10:28:45 robinhood@locksley[90120/4]: STATS |      started at : 2014/03/20 19:40:29 (5.6d ago)
> 2014/03/26 10:28:45 robinhood@locksley[90120/4]: STATS |      last action: 2014/03/26 10:28:45 (00s ago)
> 2014/03/26 10:28:45 robinhood@locksley[90120/4]: STATS |      progress   : 88563083 entries scanned (225 errors)
> 2014/03/26 10:28:45 robinhood@locksley[90120/4]: STATS |      avg. speed : 43.62 ms/entry/thread -> 183.40 entries/sec
> 2014/03/26 10:28:45 robinhood@locksley[90120/4]: STATS |      inst. speed: 30.44 ms/entry/thread -> 262.77 entries/sec
> 2014/03/26 10:28:45 robinhood@locksley[90120/4]: STATS | ==== EntryProcessor Pipeline Stats ===
> 2014/03/26 10:28:45 robinhood@locksley[90120/4]: STATS | Idle threads: 12
> 2014/03/26 10:28:45 robinhood@locksley[90120/4]: STATS | Id constraints count: 0 (hash min=0/max=0/avg=0.0)
> 2014/03/26 10:28:45 robinhood@locksley[90120/4]: STATS | Stage              | Wait | Curr | Done |     Total | ms/op |
> 2014/03/26 10:28:45 robinhood@locksley[90120/4]: STATS |  0: GET_FID        |    0 |    0 |    0 |    302725 | 27.93 |
> 2014/03/26 10:28:45 robinhood@locksley[90120/4]: STATS |  1: GET_INFO_DB    |    0 |    0 |    0 |  88742078 |  0.19 |
> 2014/03/26 10:28:45 robinhood@locksley[90120/4]: STATS |  2: GET_INFO_FS    |    0 |    0 |    0 |  88742078 |  3.66 |
> 2014/03/26 10:28:45 robinhood@locksley[90120/4]: STATS |  3: REPORTING      |    0 |    0 |    0 |  88742078 |  0.00 |
> 2014/03/26 10:28:45 robinhood@locksley[90120/4]: STATS |  4: PRE_APPLY      |    0 |    0 |    0 |  88742078 |  0.00 |
> 2014/03/26 10:28:45 robinhood@locksley[90120/4]: STATS |  5: DB_APPLY       |    0 |    0 |    0 |  88742078 |  0.60 | 16.69% batched (avg batch size: 7.0)
> 2014/03/26 10:28:45 robinhood@locksley[90120/4]: STATS |  6: CHGLOG_CLR     |    0 |    0 |    0 |         0 |  0.00 |
> 2014/03/26 10:28:45 robinhood@locksley[90120/4]: STATS |  7: RM_OLD_ENTRIES |    0 |    0 |    0 |         0 |  0.00 |
> 2014/03/26 10:28:45 robinhood@locksley[90120/4]: STATS | DB ops: get=290891/ins=7249559/upd=81492519/rm=0
>
> Just from a robinhood tuning perspective, do you see any improvements we 
> can/should make in robinhood (i.e. number of scan threads, pipeline/worker 
> threads, batching vs. threading, etc.)?
> Let me know if there is any additional information you would like.
>
> Thanks, in advance, for any suggestions you may have.
>
> Regards,
> Jim
>
> =============================
> Jim Silva
> HPC Systems Engineer
> Lawrence Livermore National Laboratory
> [email protected]
>
> _______________________________________________
> robinhood-support mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/robinhood-support
