Re: [lustre-discuss] trouble mounting after a tunefs

2015-06-14 Thread Cowe, Malcolm J
I believe that this message is benign, and is presented when first starting the 
MDS. It has something to do with the OSTs not being online, IIRC. I get a 
similar warning on any system I run, for example:

May 31 20:53:56 ie2-mds1.lfs.intl kernel: LustreError: 11-0: 
demo-MDT-lwp-MDT: Communicating with 0@lo, operation mds_connect failed 
with -11.

This is from one of our lab systems. If the MDT shows up as mounted, there may 
not be a case to answer, although you will still need to verify that your 
connectivity works as expected :).
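When comparing these messages across nodes, it can help to pull out the device, peer NID, operation and return code. Below is a minimal Python sketch, assuming the message always follows the "LustreError: 11-0: ..." layout shown in this thread; the function name parse_connect_error and the regular expression are illustrative, not part of any Lustre tool.

```python
import re
from typing import Optional

# Pattern modelled only on the messages quoted in this thread (an assumption,
# not an official description of the Lustre log format).
CONNECT_ERROR = re.compile(
    r"LustreError: 11-0: (?P<device>\S+): Communicating with (?P<peer>\S+), "
    r"operation (?P<operation>\w+) failed with (?P<rc>-?\d+)"
)


def parse_connect_error(line: str) -> Optional[dict]:
    """Return the fields of a 'Communicating with ... failed' message, or None."""
    # Mail clients wrap long log lines, so collapse all whitespace first.
    normalized = " ".join(line.split())
    match = CONNECT_ERROR.search(normalized)
    if match is None:
        return None
    fields = match.groupdict()
    fields["rc"] = int(fields["rc"])
    return fields


if __name__ == "__main__":
    sample = (
        "LustreError: 11-0: brc-MDT-mdc-881036879c00: Communicating with \n"
        "10.4.250.10@o2ib, operation mds_connect failed with -11."
    )
    print(parse_connect_error(sample))
```

Grouping the parsed results by peer makes it easier to see whether every client is failing against the same NID.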

Check that the storage target is mounted, that the service is started (kernel 
threads are running), and that /proc/fs/lustre/health_check reports 
"healthy". Running lctl dl on the MDS should list the services that are up, 
including the MDT, and lfs check servers on the client should report all 
targets as active.
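The health_check part of that list is easy to script. Below is a minimal Python sketch, assuming the file holds the single word "healthy" when all is well and that anything else (including a missing file) should be treated as a problem; the helper names are hypothetical.

```python
from pathlib import Path

# Path taken from the message above; adjust if your system exposes it elsewhere.
HEALTH_CHECK = Path("/proc/fs/lustre/health_check")


def is_healthy(text: str) -> bool:
    """True only if the file content is exactly 'healthy' (ignoring whitespace)."""
    return text.strip() == "healthy"


def read_health(path: Path = HEALTH_CHECK) -> str:
    """Return the file content, or an empty string if it cannot be read."""
    try:
        return path.read_text()
    except OSError:
        # Lustre modules not loaded, or not running on a Lustre node.
        return ""


if __name__ == "__main__":
    content = read_health()
    if is_healthy(content):
        print("healthy")
    else:
        print(f"not healthy or unreadable: {content!r}")
```

The exact-match comparison is deliberate: a substring test would also accept any status line that merely contains the word "healthy".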


Malcolm Cowe
Intel High Performance Data Division


___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] trouble mounting after a tunefs

2015-06-12 Thread John White
Good Morning Folks,
We recently had to add TCP NIDs to an existing o2ib FS.  We added the 
NID to the modprobe.d stuff and tossed the definition of the NID in the 
failnode and mgsnode params on all OSTs and the MGS + MDT.  When either an o2ib 
or tcp client tries to mount, the mount command hangs and dmesg repeats:
LustreError: 11-0: brc-MDT-mdc-881036879c00: Communicating with 
10.4.250.10@o2ib, operation mds_connect failed with -11.

I fear we may have overdone the parameters. Could anyone take a look here and 
let me know if we need to fix things up (remove params, etc.)?

MGS:
Read previous values:
Target: MGS
Index:  unassigned
Lustre FS:  
Mount type: ldiskfs
Flags:  0x4
  (MGS )
Persistent mount opts: user_xattr,errors=remount-ro
Parameters:

MDT:
 Read previous values:
Target: brc-MDT
Index:  0
Lustre FS:  brc
Mount type: ldiskfs
Flags:  0x1001
  (MDT no_primnode )
Persistent mount opts: user_xattr,errors=remount-ro
Parameters: mgsnode=10.4.250.11@o2ib,10.0.250.11@tcp:10.4.250.10@o2ib,10.0.250.10@tcp failover.node=10.4.250.10@o2ib,10.0.250.10@tcp:10.4.250.11@o2ib,10.0.250.11@tcp mdt.quota_type=ug

OST(sample):
Read previous values:
Target: brc-OST0002
Index:  2
Lustre FS:  brc
Mount type: ldiskfs
Flags:  0x1002
  (OST no_primnode )
Persistent mount opts: errors=remount-ro
Parameters: mgsnode=10.4.250.10@o2ib,10.0.250.10@tcp:10.4.250.11@o2ib,10.0.250.11@tcp failover.node=10.4.250.12@o2ib,10.0.250.12@tcp:10.4.250.13@o2ib,10.0.250.13@tcp ost.quota_type=ug


Re: [lustre-discuss] trouble mounting after a tunefs

2015-06-12 Thread Martin Hecht
Hi John,

on the Parameters line, the different nodes should not be separated by
":". Each node should be specified by a separate mgsnode=... or
failover.node=... statement. I'm not sure whether separating the two
interfaces of each node by "," is correct here, or whether this should be
split again into two separate statements.
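To make the suggested layout concrete, here is a minimal Python sketch that rewrites a Parameters string so each node gets its own statement. It only illustrates the suggestion above and is not a check of what Lustre actually accepts; the function name split_node_params is hypothetical, and the comma-separated interfaces are deliberately left together since that point is still open.

```python
from typing import List

# Keys whose values list nodes, as they appear in the tunefs output above.
NODE_KEYS = {"mgsnode", "failover.node"}


def split_node_params(parameters: str) -> List[str]:
    """Emit one key=value statement per node for mgsnode/failover.node."""
    statements = []
    for token in parameters.split():
        key, sep, value = token.partition("=")
        if sep and key in NODE_KEYS and ":" in value:
            # One statement per ":"-separated node; the ","-joined NIDs of a
            # single node are kept as they are.
            statements.extend(f"{key}={node}" for node in value.split(":") if node)
        else:
            statements.append(token)
    return statements


if __name__ == "__main__":
    mdt_params = (
        "mgsnode=10.4.250.11@o2ib,10.0.250.11@tcp:10.4.250.10@o2ib,10.0.250.10@tcp "
        "failover.node=10.4.250.10@o2ib,10.0.250.10@tcp:"
        "10.4.250.11@o2ib,10.0.250.11@tcp "
        "mdt.quota_type=ug"
    )
    for statement in split_node_params(mdt_params):
        print(statement)
```

For the MDT above this prints two mgsnode= statements, two failover.node= statements, and the unchanged mdt.quota_type=ug entry.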

best regards,
Martin
