Wojciech Turek wrote:
> Hi Cliff,
>
> On 7 Nov 2007, at 17:58, Cliff White wrote:
>
>> Wojciech Turek wrote:
>>> Hi,
>>> Our lustre environment is:
>>> 2.6.9-55.0.9.EL_lustre.1.6.3smp
>>> I would like to change recovery timeout from default value 250s to
>>> something longer
>>> I tried example from manual:
>>> set_timeout <secs> Sets the timeout (obd_timeout) for a server
>>> to wait before failing recovery.
>>> We performed that experiment on our test lustre installation with one
>>> OST.
>>> storage02 is our OSS
>>> [EMAIL PROTECTED] ~]# lctl dl
>>> 0 UP mgc [EMAIL PROTECTED] 31259d9b-e655-cdc4-c760-45d3df426d86 5
>>> 1 UP ost OSS OSS_uuid 3
>>> 2 UP obdfilter home-md-OST0001 home-md-OST0001_UUID 7
>>> [EMAIL PROTECTED] ~]# lctl --device 2 set_timeout 600
>>> set_timeout has been deprecated. Use conf_param instead.
>>> e.g. conf_param lustre-MDT0000 obd_timeout=50
>>> usage: conf_param obd_timeout=<secs>
>>> run <command> after connecting to device <devno>
>>> --device <devno> <command [args ...]>
>>> [EMAIL PROTECTED] ~]# lctl --device 1 conf_param obd_timeout=600
>>> No device found for name MGS: Invalid argument
>>> error: conf_param: No such device
>>> It looks like I need to run this command from MGS node so I moved
>>> then to MGS server called storage03
>>> [EMAIL PROTECTED] ~]# lctl dl
>>> 0 UP mgs MGS MGS 9
>>> 1 UP mgc [EMAIL PROTECTED] f51a910b-a08e-4be6-5ada-b602a5ca9ab3 5
>>> 2 UP mdt MDS MDS_uuid 3
>>> 3 UP lov home-md-mdtlov home-md-mdtlov_UUID 4
>>> 4 UP mds home-md-MDT0000 home-md-MDT0000_UUID 5
>>> 5 UP osc home-md-OST0001-osc home-md-mdtlov_UUID 5
>>> [EMAIL PROTECTED] ~]# lctl device 5
>>> [EMAIL PROTECTED] ~]# lctl conf_param obd_timeout=600
>>> error: conf_param: Function not implemented
>>> [EMAIL PROTECTED] ~]# lctl --device 5 conf_param obd_timeout=600
>>> error: conf_param: Function not implemented
>>> [EMAIL PROTECTED] ~]# lctl help conf_param
>>> conf_param: set a permanent config param. This command must be run on
>>> the MGS node
>>> usage: conf_param <target.keyword=val> ...
>>> [EMAIL PROTECTED] ~]# lctl conf_param home-md-MDT0000.obd_timeout=600
>>> error: conf_param: Invalid argument
>>> [EMAIL PROTECTED] ~]#
>>> I searched whole /proc/*/lustre for file that can store this timeout
>>> value but nothing were found.
>>> Could someone advise how to change value for recovery timeout?
>>> Cheers,
>>> Wojciech Turek
>>
>> It looks like your file system is named 'home' - you can confirm with
>> tunefs.lustre --print <MDS device> | grep "Lustre FS"
>>
>> The correct command (Run on the MGS) would be
>> # lctl conf_param home.sys.timeout=<val>
>>
>> Example:
>> [EMAIL PROTECTED] ~]# tunefs.lustre --print /dev/sdb |grep "Lustre FS"
>> Lustre FS: lustre
>> [EMAIL PROTECTED] ~]# cat /proc/sys/lustre/timeout
>> 130
>> [EMAIL PROTECTED] ~]# lctl conf_param lustre.sys.timeout=150
>> [EMAIL PROTECTED] ~]# cat /proc/sys/lustre/timeout
>> 150
> Thanks for your email. I am afraid your tips aren't very helpful in this
> case. As stated in the subject I am asking about recovery timeout.
> You can find it for example in
> /proc/fs/lustre/obdfilter/<OST>/recovery_status whilst one of your OST's
> is in recovery state. By default this timeout is 250s.
> Whereas you are talking about system obd timeout (according to CFS
> documentation chapter 4.1.2 ) which is not a subject of my concern.
>
> Any way I tried your example just to see if it works and again I am
> afraid it doesn't work for me, see below:
> I have combined mgs and mds configuration.
>
> [EMAIL PROTECTED] ~]# df
> Filesystem   1K-blocks        Used   Available  Use%  Mounted on
> /dev/sda1     10317828     3452824     6340888   36%  /
> /dev/sda6      7605856       49788     7169708    1%  /local
> /dev/sda3      4127108       41000     3876460    2%  /tmp
> /dev/sda2      4127108      753668     3163792   20%  /var
> /dev/dm-2   1845747840   447502120  1398245720   25%  /mnt/sdb
> /dev/dm-1   6140723200  4632947344  1507775856   76%  /mnt/sdc
> /dev/dm-3    286696376     1461588   268850900    1%  /mnt/home-md/mdt
> [EMAIL PROTECTED] ~]# tunefs.lustre --print /dev/dm-3 |grep "Lustre FS"
> Lustre FS: home-md
> Lustre FS: home-md
> [EMAIL PROTECTED] ~]# cat /proc/sys/lustre/timeout
> 100
> [EMAIL PROTECTED] ~]# lctl conf_param home-md.sys.timeout=150
> error: conf_param: Invalid argument
> [EMAIL PROTECTED] ~]#
>
Hmm, not sure why that isn't working for you; I tested the example I gave.
Sorry about the mis-read.

The obd recovery timeout is defined in relation to obd_timeout and, afaik,
is not changeable at runtime:

    lustre/include/lustre_lib.h
    #define OBD_RECOVERY_TIMEOUT (obd_timeout * 5 / 2)

...which gives the default of 250 seconds for the default obd_timeout
(100 seconds).

cliffw

> Cheers,
>
> Wojciech Turek
>
>>
>> cliffw
>>
>>> ------------------------------------------------------------------------
>>> _______________________________________________
>>> Lustre-discuss mailing list
>>> [email protected]
>>> https://mail.clusterfs.com/mailman/listinfo/lustre-discuss
>>
>
> Mr Wojciech Turek
> Assistant System Manager
> University of Cambridge
> High Performance Computing service
> email: [EMAIL PROTECTED]
> tel. +441223763517
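
The relation quoted in the reply, OBD_RECOVERY_TIMEOUT = obd_timeout * 5 / 2, means the recovery window can only be lengthened indirectly by raising obd_timeout. The sketch below just works through that arithmetic in shell; the two function names are hypothetical helpers made up for illustration, not Lustre commands, and it assumes the macro is exactly as quoted from lustre/include/lustre_lib.h (integer arithmetic, as in C).

```shell
#!/bin/sh
# Sketch only: mirrors the macro quoted in the thread,
#   #define OBD_RECOVERY_TIMEOUT (obd_timeout * 5 / 2)
# Both functions are hypothetical helpers, not part of Lustre.

# Recovery window in seconds implied by a given obd_timeout ($1),
# using the same integer arithmetic as the C macro.
recovery_timeout() {
    echo $(( $1 * 5 / 2 ))
}

# Smallest obd_timeout whose derived recovery window is at least $1 seconds.
# Adding 4 before dividing by 5 rounds the division up.
obd_timeout_for_recovery() {
    echo $(( ($1 * 2 + 4) / 5 ))
}

recovery_timeout 100            # default obd_timeout    -> prints 250
obd_timeout_for_recovery 600    # target from the thread -> prints 240
```

Under these assumptions, a 600 second recovery window would correspond to an obd_timeout of 240, which would then be applied with the `lctl conf_param <fsname>.sys.timeout=<val>` form shown earlier in the thread. Note that this command returned "Invalid argument" on the home-md filesystem above, and that raising obd_timeout changes the general system timeout as well, not only the recovery window.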
