Hi Lustre community,

I think I messed up my Lustre configuration while setting up my new filesystem. The 
server version is 2.15.1.
While adding OSTs to the new filesystem, I ran into an issue when adding 
OST0005. After formatting the zpool, I could not mount the OST; it failed with 
the message below.

[root@oss6 ~]# mount -t lustre -v lustre/ost1 /mnt/ost1
arg[0] = /sbin/mount.lustre
arg[1] = -v
arg[2] = -o
arg[3] = rw
arg[4] = lustre/ost1
arg[5] = /mnt/ost1
source = lustre/ost1 (lustre/ost1), target = /mnt/ost1
options = rw
checking for existing Lustre data: found
Writing lustre/ost1 properties
  lustre:mgsnode=192.168.6.1@tcp
  lustre:version=1
  lustre:flags=2
  lustre:index=5
  lustre:fsname=lustre
  lustre:svname=lustre-OST0005
mounting device lustre/ost1 at /mnt/ost1, flags=0x1000000 
options=osd=osd-zfs,mgsnode=192.168.6.1@tcp,update,param=mgsnode=192.168.6.1@tcp,svname=lustre-OST0005,device=lustre/ost1
mount.lustre: mount -t lustre lustre/ost1 at /mnt/ost1 failed: Input/output 
error retries left: 0
mount.lustre: mount lustre/ost1 at /mnt/ost1 failed: Input/output error
Is the MGS running?

I could not add this particular server, but I added OST0006 without any problem.
At that point I reconnected the clients and started using the Lustre filesystem again.

After reinstalling the OS on the server and replacing the network card and cable, 
I was still unable to add the OST.
It turned out to be a misconfiguration on the switch: jumbo frames were not 
enabled on the port used for that server.
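For anyone hitting the same symptom: a mismatch like this can be confirmed from the OSS by sending an unfragmented jumbo-sized ping toward the MGS. The interface name `eth0` below is just an example, not my actual configuration:

```shell
# Confirm the NIC itself is configured for jumbo frames (MTU 9000).
ip link show dev eth0 | grep mtu

# Send a 9000-byte frame (8972 bytes ICMP payload + 28 bytes of headers)
# with the "don't fragment" bit set. If the switch port drops jumbo
# frames, this ping fails while a normal-sized ping still works.
ping -M do -s 8972 -c 3 192.168.6.1
```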
After fixing that, I still could not add this server as OST0005, because I got 
a message that OST0005 already existed.
Reformatting with the --replace option and index 5 did not work either. I was able 
to add the server with a new index, so it is now registered with index 7 as OST0007.
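For reference, the reformat attempt looked roughly like this (the trailing vdev specification below is a placeholder, not my actual pool layout):

```shell
# Reformat the zpool-backed OST, asking to reclaim the existing index 5.
# --replace tells the MGS that this target re-registers an index that was
# already assigned, instead of requesting a new one.
mkfs.lustre --ost --backfstype=zfs --replace --index=5 \
    --fsname=lustre --mgsnode=192.168.6.1@tcp \
    lustre/ost1 <vdev specification>
```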

At this point I thought everything was fine: all the clients could see all 
OSTs, and all servers were being used.

When I needed to reboot a client, it was no longer able to mount the Lustre 
filesystem. I got the following error.

# mount -v -t lustre 192.168.6.1@tcp:/lustre /lustre
arg[0] = /sbin/mount.lustre
arg[1] = -v
arg[2] = -o
arg[3] = rw
arg[4] = 192.168.6.1@tcp:/lustre
arg[5] = /lustre
source = 192.168.6.1@tcp:/lustre (192.168.6.1@tcp:/lustre), target = /lustre
options(2/4096) = rw
mounting device 192.168.6.1@tcp:/lustre at /lustre, flags=0x1000000 
options=device=192.168.6.1@tcp:/lustre
mount.lustre: mount -t lustre 192.168.6.1@tcp:/lustre at /lustre failed: 
Invalid argument retries left: 0
mount.lustre: mount 192.168.6.1@tcp:/lustre at /lustre failed: Invalid argument
This may have multiple causes.
Is 'lustre' the correct filesystem name?
Are the mount options correct?
Check the syslog for more info.

The logs show:
Aug 27 11:36:09 trinityx kernel: LustreError: 
110360:0:(obd_config.c:1557:class_process_config()) no device for: 
lustre-OST0005-osc-ffff921d73617800
Aug 27 11:36:09 trinityx kernel: LustreError: 
110360:0:(obd_config.c:2029:class_config_llog_handler()) MGC192.168.6.1@tcp: 
cfg command failed: rc = -22
Aug 27 11:36:09 trinityx kernel: Lustre:    cmd=cf00f 0:lustre-OST0005-osc  
1:osc.active=0
Aug 27 11:36:09 trinityx kernel: LustreError: MGC192.168.6.1@tcp: Configuration 
from log lustre-client failed from MGS -22. Check client and MGS are on 
compatible version.
Aug 27 11:36:09 trinityx kernel: Lustre: Unmounted lustre-client
Aug 27 11:36:09 trinityx kernel: LustreError: 
110343:0:(super25.c:188:lustre_fill_super()) llite: Unable to mount <unknown>: 
rc = -22

I tried permanently disabling OST0005 via the MGS, but that also produced errors in the logs.

# lctl conf_param lustre-OST0005.osc.active=0

[Aug26 17:07] Lustre: Permanently deactivating lustre-OST0005
[  +0.001950] Lustre: Modifying parameter lustre-OST0005-osc.osc.active in log 
lustre-client
[  +0.001193] Lustre: Skipped 1 previous similar message
[  +7.429669] LustreError: 4158100:0:(obd_config.c:1499:class_process_config()) 
no device for: lustre-OST0005-osc-MDT0000
[  +0.001253] LustreError: 
4158100:0:(obd_config.c:2001:class_config_llog_handler()) MGC192.168.6.1@tcp: 
cfg command failed: rc = -22
[  +0.000829] Lustre:    cmd=cf00f 0:lustre-OST0005-osc-MDT0000  1:osc.active=0
[  +0.001275] LustreError: 60246:0:(mgc_request.c:612:do_requeue()) failed 
processing log: -22

I am now in a situation where all currently connected clients can use the 
filesystem, but as soon as one of them reboots, it cannot reconnect.

Is there a way to fix this, preferably without taking everything offline?

Thanks in advance,
Martin Balvers

Additional info:
[root@mds ~]# lctl dl
  0 UP osd-zfs MGS-osd MGS-osd_UUID 4
  1 UP mgs MGS MGS 64
  2 UP mgc MGC192.168.6.1@tcp ab5db231-f023-4acb-8896-0cfb93c5ed25 4
  3 UP osd-zfs lustre-MDT0000-osd lustre-MDT0000-osd_UUID 13
  4 UP mds MDS MDS_uuid 2
  5 UP lod lustre-MDT0000-mdtlov lustre-MDT0000-mdtlov_UUID 3
  6 UP mdt lustre-MDT0000 lustre-MDT0000_UUID 94
  7 UP mdd lustre-MDD0000 lustre-MDD0000_UUID 3
  8 UP qmt lustre-QMT0000 lustre-QMT0000_UUID 3
  9 UP osp lustre-OST0000-osc-MDT0000 lustre-MDT0000-mdtlov_UUID 4
 10 UP osp lustre-OST0002-osc-MDT0000 lustre-MDT0000-mdtlov_UUID 4
 11 UP osp lustre-OST0001-osc-MDT0000 lustre-MDT0000-mdtlov_UUID 4
 12 UP osp lustre-OST0003-osc-MDT0000 lustre-MDT0000-mdtlov_UUID 4
 13 UP osp lustre-OST0004-osc-MDT0000 lustre-MDT0000-mdtlov_UUID 4
 14 UP lwp lustre-MDT0000-lwp-MDT0000 lustre-MDT0000-lwp-MDT0000_UUID 4
 15 UP osp lustre-OST0006-osc-MDT0000 lustre-MDT0000-mdtlov_UUID 4
 16 UP osp lustre-OST0007-osc-MDT0000 lustre-MDT0000-mdtlov_UUID 4
[root@mds ~]# lctl --device MGS llog_print lustre-client
- { index: 2, event: attach, device: lustre-clilov, type: lov, UUID: 
lustre-clilov_UUID }
- { index: 3, event: setup, device: lustre-clilov, UUID:  }
- { index: 6, event: attach, device: lustre-clilmv, type: lmv, UUID: 
lustre-clilmv_UUID }
- { index: 7, event: setup, device: lustre-clilmv, UUID:  }
- { index: 10, event: add_uuid, nid: 192.168.6.1@tcp(0x20000c0a80601), node: 
192.168.6.1@tcp }
- { index: 11, event: attach, device: lustre-MDT0000-mdc, type: mdc, UUID: 
lustre-clilmv_UUID }
- { index: 12, event: setup, device: lustre-MDT0000-mdc, UUID: 
lustre-MDT0000_UUID, node: 192.168.6.1@tcp }
- { index: 13, event: add_mdc, device: lustre-clilmv, mdt: lustre-MDT0000_UUID, 
index: 0, gen: 1, UUID: lustre-MDT0000-mdc_UUID }
- { index: 16, event: new_profile, name: lustre-client, lov: lustre-clilov, 
lmv: lustre-clilmv }
- { index: 19, event: add_uuid, nid: 192.168.6.2@tcp(0x20000c0a80602), node: 
192.168.6.2@tcp }
- { index: 20, event: attach, device: lustre-OST0000-osc, type: osc, UUID: 
lustre-clilov_UUID }
- { index: 21, event: setup, device: lustre-OST0000-osc, UUID: 
lustre-OST0000_UUID, node: 192.168.6.2@tcp }
- { index: 22, event: add_osc, device: lustre-clilov, ost: lustre-OST0000_UUID, 
index: 0, gen: 1 }
- { index: 25, event: add_uuid, nid: 192.168.6.4@tcp(0x20000c0a80604), node: 
192.168.6.4@tcp }
- { index: 26, event: attach, device: lustre-OST0002-osc, type: osc, UUID: 
lustre-clilov_UUID }
- { index: 27, event: setup, device: lustre-OST0002-osc, UUID: 
lustre-OST0002_UUID, node: 192.168.6.4@tcp }
- { index: 28, event: add_osc, device: lustre-clilov, ost: lustre-OST0002_UUID, 
index: 2, gen: 1 }
- { index: 31, event: add_uuid, nid: 192.168.6.3@tcp(0x20000c0a80603), node: 
192.168.6.3@tcp }
- { index: 32, event: attach, device: lustre-OST0001-osc, type: osc, UUID: 
lustre-clilov_UUID }
- { index: 33, event: setup, device: lustre-OST0001-osc, UUID: 
lustre-OST0001_UUID, node: 192.168.6.3@tcp }
- { index: 34, event: add_osc, device: lustre-clilov, ost: lustre-OST0001_UUID, 
index: 1, gen: 1 }
- { index: 37, event: add_uuid, nid: 192.168.6.5@tcp(0x20000c0a80605), node: 
192.168.6.5@tcp }
- { index: 38, event: attach, device: lustre-OST0003-osc, type: osc, UUID: 
lustre-clilov_UUID }
- { index: 39, event: setup, device: lustre-OST0003-osc, UUID: 
lustre-OST0003_UUID, node: 192.168.6.5@tcp }
- { index: 40, event: add_osc, device: lustre-clilov, ost: lustre-OST0003_UUID, 
index: 3, gen: 1 }
- { index: 46, event: add_uuid, nid: 192.168.6.6@tcp(0x20000c0a80606), node: 
192.168.6.6@tcp }
- { index: 47, event: attach, device: lustre-OST0004-osc, type: osc, UUID: 
lustre-clilov_UUID }
- { index: 48, event: setup, device: lustre-OST0004-osc, UUID: 
lustre-OST0004_UUID, node: 192.168.6.6@tcp }
- { index: 49, event: add_osc, device: lustre-clilov, ost: lustre-OST0004_UUID, 
index: 4, gen: 1 }
- { index: 53, event: conf_param, device: lustre-OST0003-osc, parameter: 
osc.active=1 }
- { index: 56, event: add_uuid, nid: 192.168.6.8@tcp(0x20000c0a80608), node: 
192.168.6.8@tcp }
- { index: 57, event: attach, device: lustre-OST0006-osc, type: osc, UUID: 
lustre-clilov_UUID }
- { index: 58, event: setup, device: lustre-OST0006-osc, UUID: 
lustre-OST0006_UUID, node: 192.168.6.8@tcp }
- { index: 59, event: add_osc, device: lustre-clilov, ost: lustre-OST0006_UUID, 
index: 6, gen: 1 }
- { index: 68, event: add_uuid, nid: 192.168.6.7@tcp(0x20000c0a80607), node: 
192.168.6.7@tcp }
- { index: 69, event: attach, device: lustre-OST0007-osc, type: osc, UUID: 
lustre-clilov_UUID }
- { index: 70, event: setup, device: lustre-OST0007-osc, UUID: 
lustre-OST0007_UUID, node: 192.168.6.7@tcp }
- { index: 71, event: add_osc, device: lustre-clilov, ost: lustre-OST0007_UUID, 
index: 7, gen: 1 }
- { index: 74, event: conf_param, device: lustre-OST0005-osc, parameter: 
osc.active=0 }

_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org