Hi Mike,

On 06/18/14 21:26, Mike Hanby wrote:
The last file system scan ended with the database crashing due to excessive memory consumption. When I run a manual scan (sudo /usr/sbin/robinhood -L stdout --scan --once), memory consumption rises steadily until all memory and swap are used up (see attached graph). I tried bumping the system RAM from 4G to 16G, but the issue persists; it just takes longer (approx. 3 hours) to consume all memory. The Robinhood VM runs its own MySQL server.

I can't tell from the "ps auxf" or top output what is using the memory. Every process shows low usage; only free shows the memory being consumed, and eventually the database dies (see attached graph image).

The following output is with 10G of 16G consumed:
USER       PID %CPU %MEM     VSZ    RSS TTY  STAT START   TIME COMMAND
root      1561  0.0  0.0  108204   1196 ?    S    Jun13   0:00 /bin/sh /usr/bin/mysqld_safe --datadir=/var/lib/mysql --socket=/var/lib/mysql/mysql.sock --pid-file=/var/run/mysqld/mysqld.pid --basedir=/usr --user=mysql
mysql     1695  3.5  0.8 1606356 145168 ?    Sl   Jun13 261:09  \_ /usr/libexec/mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --log-error=/var/lib/mysql/robinhood.uabgrid.uab.edu.err --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock --port=3306
root      3914 20.3  0.0  748796  11164 ?    Ssl  Jun16 587:18 /usr/sbin/robinhood -d -f /etc/robinhood.d/tmpfs/scratch.conf -p /var/run/rbh.scratch

$ free -m
             total       used       free     shared    buffers     cached
Mem:         15949      15623        326          0        164       5447
-/+ buffers/cache:      10011       5938
Swap:         3967         17       3950

Indeed, Robinhood and MySQL don't use a lot of memory.
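You can check this claim against the figures quoted above. The sketch below assumes the three ps lines are the only relevant processes. It adds up their RSS (ps reports it in KB) and compares the total with free's "used" column minus buffers and cache. The result is 10012 MB, not the 10011 MB that free printed, presumably because free rounds each column to whole megabytes.

```python
# Figures copied from the ps and free -m output quoted above (assumed complete).
rss_kb = {"mysqld_safe": 1196, "mysqld": 145168, "robinhood": 11164}
free_mb = {"used": 15623, "buffers": 164, "cached": 5447}


def process_rss_mb(rss):
    """Total resident set size of the listed processes, in MB."""
    return sum(rss.values()) / 1024


def used_without_cache_mb(f):
    """'used' minus buffers and page cache, like free's '-/+ buffers/cache' row."""
    return f["used"] - f["buffers"] - f["cached"]


rss = process_rss_mb(rss_kb)
non_cache = used_without_cache_mb(free_mb)
print(f"process RSS:         {rss:8.1f} MB")
print(f"used minus cache:    {non_cache:8d} MB")
print(f"not explained by ps: {non_cache - rss:8.1f} MB")
```

The gap is roughly 9.9 GB. That memory belongs to neither the processes nor the page cache, which points toward kernel allocations.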

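The 'free' output indicates that most of the used memory is not held by processes: buffers and cache account for about 5.6 GB, and the remaining ~10 GB is far more than the resident size of robinhood and mysqld. So I suspect a memory management issue in the Lustre client (e.g. inode caching?). You can try "cat /proc/slabinfo": it shows allocated kernel structures, so it may point to the guilty object.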
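To follow up on the slabinfo suggestion, a small script can rank the caches by approximate size. This is a minimal sketch that assumes the 2.x text layout of /proc/slabinfo (name, active_objs, num_objs, objsize, ...). `parse_slabinfo`, `top_caches` and `report` are hypothetical helpers, and the two-line sample is made up purely to show the parsing. The cache names the Lustre client creates vary by version, so the sketch does not assume any.

```python
def parse_slabinfo(text):
    """Map each slab cache name to an approximate size in bytes.

    Assumes the /proc/slabinfo 2.x layout, where the columns after the
    name are <active_objs> <num_objs> <objsize> ...
    """
    usage = {}
    for line in text.splitlines():
        # Skip blank lines, the version banner and the column-header comment.
        if not line.strip() or line.startswith(("slabinfo", "#")):
            continue
        fields = line.split()
        name, num_objs, objsize = fields[0], int(fields[2]), int(fields[3])
        # num_objs * objsize approximates the footprint; slab overhead is ignored.
        usage[name] = num_objs * objsize
    return usage


def top_caches(text, n=10):
    """Return the n largest caches as (name, bytes), biggest first."""
    usage = parse_slabinfo(text)
    return sorted(usage.items(), key=lambda kv: kv[1], reverse=True)[:n]


def report(path="/proc/slabinfo", n=10):
    """Print the largest caches on this host (reading the file usually needs root)."""
    with open(path) as f:
        for name, size in top_caches(f.read(), n):
            print(f"{name:30s} {size / 2**20:10.1f} MB")


# Made-up sample, only to demonstrate the parser.
SAMPLE = """slabinfo - version: 2.1
# name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
dentry       1000 1200 192 21 1 : tunables 0 0 0 : slabdata 57 57 0
inode_cache   500  600 600 13 2 : tunables 0 0 0 : slabdata 46 46 0
"""

print(top_caches(SAMPLE))  # → [('inode_cache', 360000), ('dentry', 230400)]
```

Run report() as root once early in a scan and again later. Comparing the two outputs should show which cache keeps growing.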

Maybe I just need to wipe the database and start over.
I don't think your issue has anything to do with MySQL.

And here's the latest from the logs:
2014/06/18 09:25:51 [3914/1] STATS | ==================== Dumping stats at 2014/06/18 09:25:51 =====================
2014/06/18 09:25:51 [3914/1] STATS | ======== General statistics =========
2014/06/18 09:25:51 [3914/1] STATS | Daemon start time: 2014/06/16 14:10:04
2014/06/18 09:25:51 [3914/1] STATS | Started modules: log_reader
2014/06/18 09:25:51 [3914/1] STATS | ChangeLog reader #0:
2014/06/18 09:25:51 [3914/1] STATS |    fs_name    = lustre
2014/06/18 09:25:51 [3914/1] STATS |    mdt_name   = MDT0000
2014/06/18 09:25:51 [3914/1] STATS |    reader_id  = cl1
2014/06/18 09:25:51 [3914/1] STATS |    records read        = 9468782
2014/06/18 09:25:51 [3914/1] STATS |    interesting records = 5737340
2014/06/18 09:25:51 [3914/1] STATS |    suppressed records  = 3731442
2014/06/18 09:25:51 [3914/1] STATS |    records pending     = 0
2014/06/18 09:25:51 [3914/1] STATS |    last received = 2014/06/18 09:25:19
2014/06/18 09:25:51 [3914/1] STATS |    last read record time = 2014/06/18 09:25:18.258686
2014/06/18 09:25:51 [3914/1] STATS |    last read record id = 35953796
2014/06/18 09:25:51 [3914/1] STATS |    last pushed record id = 35953796
2014/06/18 09:25:51 [3914/1] STATS |    last committed record id = 35953796
2014/06/18 09:25:51 [3914/1] STATS |    last cleared record id = 35953796
2014/06/18 09:25:51 [3914/1] STATS |    read speed = 6.00 record/sec (0.05 incl. idle time)
2014/06/18 09:25:51 [3914/1] STATS |    processing speed ratio   = 0.98
2014/06/18 09:25:51 [3914/1] STATS |    ChangeLog stats:
2014/06/18 09:25:51 [3914/1] STATS |       MARK: 0, CREAT: 1882493, MKDIR: 99656, HLINK: 26, SLINK: 49497, MKNOD: 334, UNLNK: 1045468
2014/06/18 09:25:51 [3914/1] STATS |       RMDIR: 1737, RENME: 634549, RNMTO: 0, OPEN: 0, CLOSE: 2236913, LYOUT: 0, TRUNC: 0
2014/06/18 09:25:51 [3914/1] STATS |       SATTR: 3426566, XATTR: 1, HSM: 0, MTIME: 9250, CTIME: 82292, ATIME: 0
2014/06/18 09:25:51 [3914/1] STATS | ==== EntryProcessor Pipeline Stats ===
2014/06/18 09:25:51 [3914/1] STATS | Idle threads: 8
2014/06/18 09:25:51 [3914/1] STATS | Id constraints count: 0 (hash min=0/max=0/avg=0.0)
2014/06/18 09:25:51 [3914/1] STATS | Stage              | Wait | Curr | Done |   Total | ms/op |
2014/06/18 09:25:51 [3914/1] STATS |  0: GET_FID        |    0 |    0 |    0 |       0 |  0.00 |
2014/06/18 09:25:51 [3914/1] STATS |  1: GET_INFO_DB    |    0 |    0 |    0 | 6374430 |  3.60 |
2014/06/18 09:25:51 [3914/1] STATS |  2: GET_INFO_FS    |    0 |    0 |    0 | 4690135 |  8.62 |
2014/06/18 09:25:51 [3914/1] STATS |  3: REPORTING      |    0 |    0 |    0 | 3610533 |  0.01 |
2014/06/18 09:25:51 [3914/1] STATS |  4: PRE_APPLY      |    0 |    0 |    0 | 4269877 |  0.00 |
2014/06/18 09:25:51 [3914/1] STATS |  5: DB_APPLY       |    0 |    0 |    0 | 4269877 |  8.25 | 51.81% batched (avg batch size: 9.3)
2014/06/18 09:25:51 [3914/1] STATS |  6: CHGLOG_CLR     |    0 |    0 |    0 | 6374430 |  0.07 |
2014/06/18 09:25:51 [3914/1] STATS |  7: RM_OLD_ENTRIES |    0 |    0 |    0 |       0 |  0.00 |
2014/06/18 09:25:51 [3914/1] STATS | DB ops: get=3252948/ins=1017541/upd=2592992/rm=659344


This looks sane.
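The ChangeLog reader counters above can be cross-checked mechanically. The checks below are an assumed reading of the counters, not a documented Robinhood health test:

- every record read is either interesting or suppressed;
- nothing is pending;
- the last read, pushed, committed and cleared record ids all match.

`changelog_looks_sane` is a hypothetical helper.

```python
# Counters copied from the STATS dump quoted above.
stats = {
    "records_read": 9468782,
    "interesting": 5737340,
    "suppressed": 3731442,
    "pending": 0,
    "last_read_id": 35953796,
    "last_pushed_id": 35953796,
    "last_committed_id": 35953796,
    "last_cleared_id": 35953796,
}


def changelog_looks_sane(s):
    """Assumed checks: all records classified, no backlog, no lag between steps."""
    classified = s["interesting"] + s["suppressed"] == s["records_read"]
    no_backlog = s["pending"] == 0
    ids = {s[k] for k in ("last_read_id", "last_pushed_id",
                          "last_committed_id", "last_cleared_id")}
    return classified and no_backlog and len(ids) == 1


print(changelog_looks_sane(stats))  # → True
```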

Regards,
Thomas
_______________________________________________
robinhood-support mailing list
https://lists.sourceforge.net/lists/listinfo/robinhood-support
