2016/09/09 09:12:46 [21484/2] STATS | 1: GET_INFO_DB | 9990 | 0 |
0 | 1819630 | 0.20 |
2016/09/09 09:12:46 [21484/2] STATS | 2: GET_INFO_FS | 0 | 10 |
0 | 1819613 | 3.75 |
It shows robinhood threads are busy running FS operations; that's why
there are incoming records waiting to pass the GET_INFO_DB step, but it
doesn't mean there is a problem with DB performance.
FS latency is a little high (3.75 ms), but not catastrophic since it runs
10 operations in parallel.
If not already done, you can tune Lustre osc and mdc's max_rpcs_in_flight:
/proc/fs/lustre/*/*/max_rpcs_in_flight
Also consider tuning peer_credits of ko2iblnd module, but this requires
to tune it on servers and lnet routers too.
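If it helps, here is a sketch of how those tunables can be inspected and
adjusted. The parameter names are standard Lustre, but the example values
(16, 32) are illustrative only; appropriate values depend on your Lustre
version and fabric, so check your setup before applying:

```shell
# Check current values on the client (osc = data targets, mdc = metadata).
lctl get_param osc.*.max_rpcs_in_flight mdc.*.max_rpcs_in_flight

# Raise the number of concurrent RPCs per target (example value: 16).
lctl set_param osc.*.max_rpcs_in_flight=16
lctl set_param mdc.*.max_rpcs_in_flight=16

# peer_credits is a ko2iblnd module option; it must be set consistently on
# clients, servers and LNet routers, and takes effect after module reload.
echo 'options ko2iblnd peer_credits=32' >> /etc/modprobe.d/ko2iblnd.conf
```

Note that lctl set_param changes are not persistent across reboots, so
persistent values should also go into your configuration.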
avg-cpu: %user %nice %system %iowait %steal %idle
0.02 0.00 80.22 0.00 0.00 19.76
This indicates most of the CPU load is consumed by the system.
It seems there is a lot of activity in the Lustre client.
A possible explanation is that it is managing an overly large MD cache.
This can be reduced by tuning lru_size (usually between 100 and 400):
/proc/fs/lustre/ldlm/namespaces/*/lru_size
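For example (a sketch; the value 400 is just within the usual range
mentioned above, and the right value is workload-dependent):

```shell
# Inspect the current LDLM lock LRU sizes
# (a value of 0 usually means dynamic sizing).
lctl get_param ldlm.namespaces.*.lru_size

# Cap the metadata lock cache, e.g. at 400 locks per namespace.
lctl set_param ldlm.namespaces.*.lru_size=400
```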
However, I think the situation will improve once robinhood has dequeued
all the changelogs accumulated during your initial scan.
It is currently processing records emitted about 1.5 days ago, but it
processes them 4 times faster than their accumulation rate, so it should
eventually catch up.
2016/09/09 09:12:46 [21484/2] STATS | last read record time =
2016/09/07 20:28:30.099225
2016/09/09 09:12:46 [21484/2] STATS | processing speed ratio = 4.00
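As a back-of-the-envelope check (a sketch, assuming a roughly constant
record accumulation rate, which real workloads rarely have): the backlog
shrinks at (ratio - 1) times the accumulation rate, so a lag of about
36 hours at ratio 4.00 should clear in roughly 12 hours of wall time.

```python
def catchup_hours(lag_hours: float, speed_ratio: float) -> float:
    """Estimate wall-clock time until the changelog reader catches up.

    The backlog, measured as a time lag, shrinks at (speed_ratio - 1)
    times the accumulation rate, so it clears in lag / (ratio - 1).
    """
    if speed_ratio <= 1:
        raise ValueError("reader never catches up at ratio <= 1")
    return lag_hours / (speed_ratio - 1)

# Values from the stats above: ~36 h lag, processing speed ratio = 4.00.
print(catchup_hours(36, 4.00))  # -> 12.0
```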
Regards,
Thomas
On 09/09/16 09:24, Marcin Stolarek wrote:
OK, I've tried a few times. Did some mysql tuning and what I have now is:
2016/09/09 09:12:46 [21484/2] STATS | ==================== Dumping
stats at 2016/09/09 09:12:46 =====================
2016/09/09 09:12:46 [21484/2] STATS | ======== General statistics
=========
2016/09/09 09:12:46 [21484/2] STATS | Daemon start time: 2016/09/09
08:42:45
2016/09/09 09:12:46 [21484/2] STATS | Started modules: log_reader
2016/09/09 09:12:46 [21484/2] STATS | ChangeLog reader #0:
2016/09/09 09:12:46 [21484/2] STATS | fs_name = plkra
2016/09/09 09:12:46 [21484/2] STATS | mdt_name = MDT0000
2016/09/09 09:12:46 [21484/2] STATS | reader_id = cl1
2016/09/09 09:12:46 [21484/2] STATS | records read = 1830902
2016/09/09 09:12:46 [21484/2] STATS | interesting records = 1830620
2016/09/09 09:12:46 [21484/2] STATS | suppressed records = 282
2016/09/09 09:12:46 [21484/2] STATS | records pending = 1000
2016/09/09 09:12:46 [21484/2] STATS | last received =
2016/09/09 09:04:13
2016/09/09 09:12:46 [21484/2] STATS | last read record time =
2016/09/07 20:28:30.099225
2016/09/09 09:12:46 [21484/2] STATS | last read record id =
18391116
2016/09/09 09:12:46 [21484/2] STATS | last pushed record id =
18390117
2016/09/09 09:12:46 [21484/2] STATS | last committed record id =
18380083
2016/09/09 09:12:46 [21484/2] STATS | last cleared record id =
18379864
2016/09/09 09:12:46 [21484/2] STATS | read speed = 677.18
record/sec
2016/09/09 09:12:46 [21484/2] STATS | processing speed ratio = 4.00
2016/09/09 09:12:46 [21484/2] STATS | status = busy
2016/09/09 09:12:46 [21484/2] STATS | ChangeLog stats:
2016/09/09 09:12:46 [21484/2] STATS | MARK: 0, CREAT: 262, MKDIR:
9, HLINK: 0, SLINK: 0, MKNOD: 0, UNLNK: 7, RMDIR: 0, RENME: 0
2016/09/09 09:12:46 [21484/2] STATS | RNMTO: 0, OPEN: 0, CLOSE: 0,
LYOUT: 0, TRUNC: 48, SATTR: 1830576, XATTR: 0, HSM: 0
2016/09/09 09:12:46 [21484/2] STATS | MTIME: 0, CTIME: 0, ATIME: 0
2016/09/09 09:12:46 [21484/2] STATS | ==== EntryProcessor Pipeline
Stats ===
2016/09/09 09:12:46 [21484/2] STATS | Idle threads: 0
2016/09/09 09:12:46 [21484/2] STATS | Id constraints count: 10000
(hash min=0/max=6/avg=0.6)
2016/09/09 09:12:46 [21484/2] STATS | Name constraints count: 18 (hash
min=0/max=1/avg=0.0)
2016/09/09 09:12:46 [21484/2] STATS | Stage | Wait | Curr | Done |
Total | ms/op |
2016/09/09 09:12:46 [21484/2] STATS | 0: GET_FID | 0 | 0 |
0 | 0 | 0.00 |
2016/09/09 09:12:46 [21484/2] STATS | 1: GET_INFO_DB | 9990 | 0 |
0 | 1819630 | 0.20 |
2016/09/09 09:12:46 [21484/2] STATS | 2: GET_INFO_FS | 0 | 10 |
0 | 1819613 | 3.75 |
2016/09/09 09:12:46 [21484/2] STATS | 3: PRE_APPLY | 0 | 0 |
0 | 1819606 | 0.00 |
2016/09/09 09:12:46 [21484/2] STATS | 4: DB_APPLY | 0 | 0 |
0 | 1819606 | 0.17 | 99.76% batched (avg batch size: 17.0)
2016/09/09 09:12:46 [21484/2] STATS | 5: CHGLOG_CLR | 0 | 0 |
0 | 1819620 | 0.01 |
2016/09/09 09:12:46 [21484/2] STATS | 6: RM_OLD_ENTRIES | 0 | 0
| 0 | 0 | 0.00 |
2016/09/09 09:12:46 [21484/2] STATS | DB ops:
get=1819360/ins=246/upd=1819360/rm=0
2016/09/09 09:12:46 [21484/2] STATS | --- Pipeline stage details ---
2016/09/09 09:12:46 [21484/2] STATS | GET_INFO_DB : first: changelog
record #18380115, fid=[0x2000291ab:0xdb7b:0x0], status=waiting
2016/09/09 09:12:46 [21484/2] STATS | GET_INFO_DB : last: changelog
record #18390116, fid=[0x200002eae:0x1b148:0x0], status=waiting
2016/09/09 09:12:46 [21484/2] STATS | GET_INFO_FS : first: changelog
record #18380085, fid=[0x200025ab0:0x1a350:0x0], status=processing
2016/09/09 09:12:46 [21484/2] STATS | GET_INFO_FS : last: changelog
record #18380112, fid=[0x2000291ab:0xdb7a:0x0], status=processing
What can someone experienced with robinhood read from this? I see
that I have a lot of waiting GET_INFO_DB operations - does it mean my
database is responding slowly? Should I check for slow queries?
My server responsiveness is limited :)
What I have from I/O stat is:
avg-cpu: %user %nice %system %iowait %steal %idle
0.02 0.00 80.22 0.00 0.00 19.76
Device: tps MB_read/s MB_wrtn/s MB_read MB_wrtn
sda 1.88 0.00 0.02 0 0
cheers,
Marcin
2016-09-07 13:31 GMT+02:00 LEIBOVICI Thomas <[email protected]>:
On 09/07/16 13:03, Marcin Stolarek wrote:
Hi,
I installed robinhood 3.0-rc1 with lustre 2.5. The initial scan of
the filesystem took around 2 days. During this time the changelog
was enabled.
Today I started:
#robinhood --readlog --once
and after a few hours the robinhood process started to use a lot of
CPU (around 700% - everything available). Is this normal?
Why not... if it's for doing useful things, such as processing
bursts of changelogs :)
But I'd say it may also be the sign of something wrong...
You can check whether the changelog processing is going well and how
it has evolved:
grep STATS /var/log/robinhood.log
Thomas
cheers,
Marcin
------------------------------------------------------------------------------
_______________________________________________
robinhood-support mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/robinhood-support