Hi Carmelo,

What I found is that once the local VFS cache is full, robinhood GET_INFO_FS speed drops. You can monitor buffer, cached & slab memory usage (via /proc/meminfo) after starting robinhood; you will see these values reach a threshold after some time. Once that state is reached, the lru_max_age and vfs_cache_pressure parameters become important.

You can also "flush" lustre's cached ldlm locks by echoing "clear" to the lru_size parameter.
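For the monitoring and reclaim side, something along these lines should work (a rough sketch; the vfs_cache_pressure value of 200 is only an example, tune it for your client's RAM):

    # watch page-cache and slab growth while robinhood scans
    watch -n 60 "grep -E '^(Buffers|Cached|Slab):' /proc/meminfo"

    # make the kernel reclaim dentries/inodes more aggressively
    # (default is 100; higher means more aggressive reclaim)
    sysctl -w vm.vfs_cache_pressure=200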
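And for flushing the ldlm locks, something like this (a sketch; the proc path vs. the lctl set_param form depends on your Lustre version):

    # drop cached ldlm locks in every namespace
    for ns in /proc/fs/lustre/ldlm/namespaces/*; do
        echo clear > "$ns/lru_size"
    done
    # equivalent on newer clients:
    # lctl set_param ldlm.namespaces.*.lru_size=clear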
regards,
chris hunter
yale hpc group

On 05/07/2015 11:34 AM, Carmelo Ponti (CSCS) wrote:
> Chris
>
> Thank you very much for your advice.
>
> First I tried changing the "open files" limit (file locks are already
> unlimited) but I didn't see a real improvement. I also tested different
> values of vfs_cache_pressure, but again I didn't see much difference.
>
> When I started testing max_rpcs_in_flight (only the mdc parameter, as
> you advised) I was impressed: from 1700 to 4900 records/sec.
>
> 2015/05/07 13:56:22 robinhood@daintrbh01[4199/1] STATS | read speed = 1707.75 record/sec
> 2015/05/07 13:57:22 robinhood@daintrbh01[4199/1] STATS | read speed = 3823.17 record/sec
> 2015/05/07 13:58:22 robinhood@daintrbh01[4199/1] STATS | read speed = 4903.52 record/sec
> 2015/05/07 13:59:22 robinhood@daintrbh01[4199/1] STATS | read speed = 3986.62 record/sec
> 2015/05/07 14:00:22 robinhood@daintrbh01[4199/1] STATS | read speed = 5474.05 record/sec
> 2015/05/07 14:01:22 robinhood@daintrbh01[4199/1] STATS | read speed = 4990.08 record/sec
>
> Unfortunately this is not constant, and after a while the speed went
> down again. At the moment I'm seeing values between roughly 1454 and
> 3059 records/sec, but on average it is faster than before:
>
> 2015/05/07 17:14:01 robinhood@daintrbh01[21325/1] STATS | read speed = 2426.47 record/sec
> 2015/05/07 17:15:01 robinhood@daintrbh01[21325/1] STATS | read speed = 1454.42 record/sec
> 2015/05/07 17:16:01 robinhood@daintrbh01[21325/1] STATS | read speed = 2312.18 record/sec
> 2015/05/07 17:17:02 robinhood@daintrbh01[21325/1] STATS | read speed = 2692.42 record/sec
> 2015/05/07 17:18:02 robinhood@daintrbh01[21325/1] STATS | read speed = 1810.07 record/sec
> 2015/05/07 17:19:02 robinhood@daintrbh01[21325/1] STATS | read speed = 3059.43 record/sec
>
> Here too I tested different values, but I think 32 is the best for my
> environment.
>
> I also set /proc/sys/lnet/debug to 0, following the advice I found on
> this site:
> https://blogs.oracle.com/atulvid/entry/improving_performance_of_small_files
>
> lru_size is 400 and lru_max_age is 36000000, and for the moment I'm
> keeping these two values.
>
> Thank you again
> Carmelo
>
> On Wed, 2015-05-06 at 14:45 -0400, Chris Hunter wrote:
>> If re-mounting lustre temporarily improves GET_INFO_FS, you may be
>> hitting a client cache or parameter limit. A few suggestions to try on
>> the lustre client:
>> - check ulimit settings before starting robinhood, especially the
>>   number of open files & file locks
>> - set sysctl vfs_cache_pressure for more aggressive reclaim of
>>   dentries & inodes
>> - on your lustre client, check the mdc parameters max_rpcs_in_flight,
>>   lru_size & lru_max_age. FYI, it's common to tune the lustre osc
>>   parameters, but rbh is more sensitive to the mdc parameters
>>
>> regards,
>> chris hunter
>> yale hpc group
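Pulling the thread's settings together, the client-side tuning Carmelo describes would look roughly like this (a sketch; mdc device names, proc paths, and lru_max_age units differ between Lustre versions, and the 65536 open-file limit is only an example value):

    # allow more concurrent metadata RPCs (the thread settled on 32)
    lctl set_param mdc.*.max_rpcs_in_flight=32

    # ldlm lock LRU: up to 400 locks, aged out after 36000000
    lctl set_param ldlm.namespaces.*.lru_size=400
    lctl set_param ldlm.namespaces.*.lru_max_age=36000000

    # disable lnet debug logging
    echo 0 > /proc/sys/lnet/debug

    # raise the open-file limit before starting robinhood
    ulimit -n 65536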
