Thomas,

Touching a file every 5 minutes contained the slab memory usage and permitted 
the scan to complete.


Thank you once again,

Dan Milroy

From: Daniel Milroy
Sent: Wednesday, December 11, 2013 12:00 PM
To: 'LEIBOVICI Thomas'
Cc: [email protected]; Peter A Ruprecht
Subject: RE: [robinhood-support] Slab memory usage during initial robinhood scan

Thomas,

LU-2613 looks like a good candidate for the cause of the problem we're seeing.  
I've restarted the robinhood scan and included a cron job that touches a file 
every 5 minutes.  I'll report back as soon as I have more information as to the 
effect the cron job has on slab memory usage.
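
One way to track that effect is to sample the slab fields of /proc/meminfo at 
intervals while the scan runs.  The following is only a minimal sketch in 
Python: the function names are illustrative and are not part of robinhood or 
Lustre, and it assumes the usual "Name:   value kB" layout of /proc/meminfo.

```python
def parse_slab_kib(meminfo_text):
    """Extract the slab-related counters (in kB) from /proc/meminfo text."""
    wanted = {"Slab", "SReclaimable", "SUnreclaim"}
    values = {}
    for line in meminfo_text.splitlines():
        name, sep, rest = line.partition(":")
        # Keep only well-formed lines for the fields of interest.
        if sep and name in wanted:
            values[name] = int(rest.split()[0])
    return values


def read_slab_kib(path="/proc/meminfo"):
    """Read the current slab counters from the running kernel."""
    with open(path) as f:
        return parse_slab_kib(f.read())
```

Logging the result of read_slab_kib() every few minutes should make the 
transition from steady-state usage to rapid growth visible, if it recurs.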


Thank you,

Dan Milroy

From: LEIBOVICI Thomas [mailto:[email protected]]
Sent: Wednesday, December 11, 2013 3:11 AM
To: Daniel Milroy
Cc: [email protected]; Peter A Ruprecht
Subject: Re: [robinhood-support] Slab memory usage during initial robinhood scan

Hello Daniel,

This looks like incorrect behavior in the Lustre client that seems to be 
related to this issue: https://jira.hpdd.intel.com/browse/LU-2613.
According to this JIRA, the issue is fixed in 2.6.0 and 2.5.1 and is 
currently being backported to 2.4.

There is a long discussion in this ticket. Taking a brief look at it, Andreas 
suggests the following workaround:
"periodically touch any file on the client to force a transaction so that the 
last_committed value is updated, and the saved RPCs will be flushed."
I would be interested in your feedback on whether this workaround fixes your 
issue. Thanks.
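
For illustration, the periodic touch described in the quoted workaround could 
be sketched as follows.  This is a minimal Python sketch under stated 
assumptions, not the method used in the ticket: the function name, the 
300-second default interval, and the example path in the comment are all 
hypothetical, and a cron entry that runs touch would serve the same purpose.

```python
import time
from pathlib import Path


def keepalive(path, interval_s=300.0, iterations=None, sleep=time.sleep):
    """Touch `path` repeatedly, waiting `interval_s` seconds between touches.

    Runs forever when `iterations` is None; otherwise touches the file
    `iterations` times. Returns the number of touches performed.
    """
    path = Path(path)
    count = 0
    while iterations is None or count < iterations:
        # Creates the file if missing and updates its modification time.
        path.touch(exist_ok=True)
        count += 1
        if iterations is None or count < iterations:
            sleep(interval_s)
    return count


# Example call (hypothetical mount point on the Lustre client):
# keepalive("/mnt/lustre/.keepalive")
```

The file must live on the Lustre mount of the client running the scan, since 
the point is to generate a transaction from that client.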

Regards,
Thomas

On 12/09/13 19:38, Daniel Milroy wrote:
Hello,

I've been attempting to populate the robinhood database with a complete initial 
scan, but have run into problems with slab memory.  The scan runs for nearly 
48 hours, completing approximately 2/3 (1000) of our users' directories.  Then, 
within 10 minutes, memory usage quadruples and the host begins swapping.  
Shortly thereafter the OOM killer stops mysql, robinhood, httpd, and others.

Robinhood 2.4.3-1 is installed on a host running RHEL 6.4, kernel 
2.6.32-358.23.2.el6.x86_64 and Lustre client 2.1.6, with 48GB of RAM, 12 
logical CPUs, and QDR InfiniBand connectivity to our storage cluster.  Our 
Lustre system consists of 2 MDSes and 8 OSSes with QDR InfiniBand 
connectivity, running server 2.1.6.  The file system is 850TB and currently 
contains 433 million files.

When the near-discrete transition from steady-state memory usage to rapid 
increase occurs, slab consumes more than 48GB.  Before this, the largest 
process memory consumer is mysql (~13%).  I've taken the following recommended 
steps to restrict slab:

                Set /proc/sys/vm/vfs_cache_pressure to 10,000
                overcommit_ratio to 2
                dirty_background_ratio to 5
                dirty_ratio to 20
                drop_caches to 2
                Cronned an echo to drop_caches every 10 minutes.  This
                crashed the host after several hours.

In my.cnf:

                Set innodb_buffer_pool_size to 6GB
                innodb_thread_concurrency to 24
                innodb_max_dirty_pages_pct to 15
                max_connections to 512

Note that I experimented with smaller values in my.cnf to see if they would 
reduce memory usage, to no effect.  Prior to changing the values in /proc/sys, 
used memory would be slightly over 30GB before the period of rapid increase in 
consumption.  The new values held usage to 10GB until the discrete transition.

What can I do to permit the scan to complete?


Thank you in advance for any advice,

Dan Milroy




_______________________________________________
robinhood-support mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/robinhood-support
