Hi Wojciech,

Here is more info:
[EMAIL PROTECTED] ~]# multipath -l mpath0
mpath0 (360001ff00fd4922302000800001d1c17)
[size=3726 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
 \_ 2:0:0:2 sdaa 65:160 [active]
\_ round-robin 0 [enabled]
 \_ 1:0:0:2 sdc 8:32 [active]

[EMAIL PROTECTED] ~]# tunefs.lustre --print /dev/sdaa
checking for existing Lustre data: found CONFIGS/mountdata
Reading CONFIGS/mountdata

Read previous values:
Target:     lustre-OST0030
Index:      48
Lustre FS:  lustre
Mount type: ldiskfs
Flags:      0x2
            (OST )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED] sys.timeout=80 [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED] sys.timeout=80

Permanent disk data:
Target:     lustre-OST0030
Index:      48
Lustre FS:  lustre
Mount type: ldiskfs
Flags:      0x2
            (OST )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED] sys.timeout=80 [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED] sys.timeout=80

exiting before disk write.

[EMAIL PROTECTED] ~]# tunefs.lustre --print /dev/sdc
checking for existing Lustre data: found CONFIGS/mountdata
Reading CONFIGS/mountdata

Read previous values:
Target:     lustre-OST0030
Index:      48
Lustre FS:  lustre
Mount type: ldiskfs
Flags:      0x2
            (OST )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED] sys.timeout=80 [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED] sys.timeout=80

Permanent disk data:
Target:     lustre-OST0030
Index:      48
Lustre FS:  lustre
Mount type: ldiskfs
Flags:      0x2
            (OST )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED] sys.timeout=80 [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED] sys.timeout=80

exiting before disk write.

Any ideas?

Regards,
Franck

On 13 Dec 07, at 12:53, Wojciech Turek wrote:

> Hi,
>
> I would just like to add that you can do a very simple test to see
> whether mpath is working correctly. On your server oss1, run
> tunefs.lustre --print /dev/<all_mpath_devices> and write down the
> target name for each mpath device. Reboot the server, do the same
> again, and compare whether the mpath -> target map is the same as it
> was before the reboot.
>
> Cheers
>
> Wojciech
>
> On 13 Dec 2007, at 10:55, Ludovic Francois wrote:
>
>> On 12 Dec, 17:51, Oleg Drokin <[EMAIL PROTECTED]> wrote:
>>> Hello!
>>>
>>> On Dec 12, 2007, at 11:39 AM, Franck Martinaux wrote:
>>>
>>>> After a power outage, I am having difficulty mounting an OST.
>>>> I am running Lustre 1.6.3 and I get a panic on the OSS when I
>>>> try to mount the OST.
>>>
>>> It would greatly help us if you showed us the panic message and,
>>> if possible, a stack trace.
>>
>> Hi,
>>
>> Please find below all the information we got this morning.
>>
>> Environment
>> ===========
>>
>> ,----
>> | [EMAIL PROTECTED] ~]# uname -a
>> | Linux oss01.data.cluster 2.6.9-55.0.9.EL_lustre.1.6.3smp #1 SMP Sun Oct 7 20:08:31 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux
>> | [EMAIL PROTECTED] ~]#
>> `----
>>
>> Mount of this specific OST
>> ==========================
>>
>> ,----
>> | [EMAIL PROTECTED] ~]# mount -t lustre /dev/mpath/mpath0 /mnt/lustre/ost1
>> | Read from remote host oss01: Connection reset by peer
>> | Connection to oss01 closed.
>> | [EMAIL PROTECTED] ~]$
>> `----
>>
>> /var/log/messages during the operation
>> ======================================
>>
>> --8<---------------cut here---------------start------------->8---
>> Dec 13 08:36:04 oss01 sshd(pam_unix)[13469]: session opened for user root by root(uid=0)
>> Dec 13 08:36:20 oss01 kernel: kjournald starting. Commit interval 5 seconds
>> Dec 13 08:36:20 oss01 kernel: LDISKFS FS on dm-1, internal journal
>> Dec 13 08:36:20 oss01 kernel: LDISKFS-fs: recovery complete.
>> Dec 13 08:36:20 oss01 kernel: LDISKFS-fs: mounted filesystem with ordered data mode.
>> Dec 13 08:36:20 oss01 kernel: kjournald starting. Commit interval 5 seconds
>> Dec 13 08:36:20 oss01 kernel: LDISKFS FS on dm-1, internal journal
>> Dec 13 08:36:20 oss01 kernel: LDISKFS-fs: mounted filesystem with ordered data mode.
>> Dec 13 08:36:20 oss01 kernel: LDISKFS-fs: file extents enabled
>> Dec 13 08:36:20 oss01 kernel: LDISKFS-fs: mballoc enabled
>> Dec 13 08:36:20 oss01 kernel: Lustre: OST lustre-OST0002 now serving dev (lustre-OST0002/0258906d-8eca-ba98-4e3d-19adfa472914) with recovery enabled
>> Dec 13 08:36:20 oss01 kernel: Lustre: Server lustre-OST0002 on device /dev/mpath/mpath1 has started
>> Dec 13 08:36:21 oss01 kernel: LustreError: 137-5: UUID 'lustre-OST0030_UUID' is not available for connect (no target)
>> Dec 13 08:36:21 oss01 kernel: LustreError: Skipped 4 previous similar messages
>> Dec 13 08:36:21 oss01 kernel: LustreError: 13664:0:(ldlm_lib.c:1437:target_send_reply_msg()) @@@ processing error (-19) [EMAIL PROTECTED] x146203/t0 o8-><?>@<?>:-1 lens 240/0 ref 0 fl Interpret:/0/0 rc -19/0
>> Dec 13 08:36:21 oss01 kernel: LustreError: 13664:0:(ldlm_lib.c:1437:target_send_reply_msg()) Skipped 4 previous similar messages
>> Dec 13 08:36:41 oss01 kernel: LustreError: 137-5: UUID 'lustre-OST0030_UUID' is not available for connect (no target)
>> Dec 13 08:36:41 oss01 kernel: LustreError: 13665:0:(ldlm_lib.c:1437:target_send_reply_msg()) @@@ processing error (-19) [EMAIL PROTECTED]00 x146233/t0 o8-><?>@<?>:-1 lens 240/0 ref 0 fl Interpret:/0/0 rc -19/0
>> Dec 13 08:37:01 oss01 kernel: LustreError: 137-5: UUID 'lustre-OST0030_UUID' is not available for connect (no target)
>> Dec 13 08:37:01 oss01 kernel: LustreError: 13666:0:(ldlm_lib.c:1437:target_send_reply_msg()) @@@ processing error (-19) [EMAIL PROTECTED]00 x146264/t0 o8-><?>@<?>:-1 lens 240/0 ref 0 fl Interpret:/0/0 rc -19/0
>> Dec 13 08:37:21 oss01 kernel: LustreError: 137-5: UUID 'lustre-OST0030_UUID' is not available for connect (no target)
>> Dec 13 08:37:21 oss01 kernel: LustreError: 13667:0:(ldlm_lib.c:1437:target_send_reply_msg()) @@@ processing error (-19) [EMAIL PROTECTED]00 x146300/t0 o8-><?>@<?>:-1 lens 240/0 ref 0 fl Interpret:/0/0 rc -19/0
>> Dec 13 08:37:41 oss01 kernel: LustreError: 137-5: UUID 'lustre-OST0030_UUID' is not available for connect (no target)
>> Dec 13 08:37:41 oss01 kernel: LustreError: Skipped 5 previous similar messages
>> Dec 13 08:37:41 oss01 kernel: LustreError: 13668:0:(ldlm_lib.c:1437:target_send_reply_msg()) @@@ processing error (-19) [EMAIL PROTECTED]00 x146373/t0 o8-><?>@<?>:-1 lens 240/0 ref 0 fl Interpret:/0/0 rc -19/0
>> Dec 13 08:37:41 oss01 kernel: LustreError: 13668:0:(ldlm_lib.c:1437:target_send_reply_msg()) Skipped 5 previous similar messages
>> Dec 13 08:37:47 oss01 kernel: Lustre: Failing over lustre-OST0002
>> Dec 13 08:37:47 oss01 kernel: Lustre: *** setting obd lustre-OST0002 device 'unknown-block(253,1)' read-only ***
>> Dec 13 08:37:47 oss01 kernel: Turning device dm-1 (0xfd00001) read-only
>> Dec 13 08:37:47 oss01 kernel: Lustre: lustre-OST0002: shutting down for failover; client state will be preserved.
>> Dec 13 08:37:47 oss01 kernel: Lustre: OST lustre-OST0002 has stopped.
>> Dec 13 08:37:47 oss01 kernel: LDISKFS-fs: mballoc: 1 blocks 1 reqs (0 success)
>> Dec 13 08:37:47 oss01 kernel: LDISKFS-fs: mballoc: 1 extents scanned, 1 goal hits, 0 2^N hits, 0 breaks, 0 lost
>> Dec 13 08:37:47 oss01 kernel: LDISKFS-fs: mballoc: 1 generated and it took 12560
>> Dec 13 08:37:47 oss01 kernel: LDISKFS-fs: mballoc: 256 preallocated, 0 discarded
>> Dec 13 08:37:47 oss01 kernel: Removing read-only on dm-1 (0xfd00001)
>> Dec 13 08:37:47 oss01 kernel: Lustre: server umount lustre-OST0002 complete
>> Dec 13 08:37:57 oss01 sshd(pam_unix)[13946]: session opened for user root by root(uid=0)
>> Dec 13 08:38:18 oss01 kernel: kjournald starting. Commit interval 5 seconds
>> Dec 13 08:38:18 oss01 kernel: LDISKFS FS on dm-0, internal journal
>> Dec 13 08:38:18 oss01 kernel: LDISKFS-fs: mounted filesystem with ordered data mode.
>> Dec 13 08:38:18 oss01 kernel: kjournald starting. Commit interval 5 seconds
>> Dec 13 08:38:18 oss01 kernel: LDISKFS FS on dm-0, internal journal
>> Dec 13 08:38:18 oss01 kernel: LDISKFS-fs: mounted filesystem with ordered data mode.
>> Dec 13 08:38:18 oss01 kernel: LDISKFS-fs: file extents enabled
>> Dec 13 08:38:18 oss01 kernel: LDISKFS-fs: mballoc enabled
>> Dec 13 08:43:52 oss01 syslogd 1.4.1: restart.
>> Dec 13 08:43:52 oss01 syslog: syslogd startup succeeded
>> Dec 13 08:43:52 oss01 kernel: klogd 1.4.1, log source = /proc/kmsg started.
>> --8<---------------cut here---------------end--------------->8---
>>
>> We had to do a power cycle to connect again
>> ===========================================
>>
>> ,----
>> | # ipmitool -I lan -H 192.168.99.101 -U $login -P $password power cycle
>> `----
>>
>> The OST fsck seems correct
>> ==========================
>>
>> ,----
>> | [EMAIL PROTECTED] log]# fsck.ext2 /dev/mpath/mpath0
>> | e2fsck 1.40.2.cfs1 (12-Jul-2007)
>> | lustre-OST0030: recovering journal
>> | lustre-OST0030: clean, 227/244195328 files, 15614685/976760320 blocks
>> | [EMAIL PROTECTED] log]#
>> `----
>>
>> tunefs.lustre reads the mpath0 information correctly
>> ====================================================
>>
>> ,----
>> | [EMAIL PROTECTED] log]# tunefs.lustre /dev/mpath/mpath0
>> | checking for existing Lustre data: found CONFIGS/mountdata
>> | Reading CONFIGS/mountdata
>> |
>> | Read previous values:
>> | Target:     lustre-OST0030
>> | Index:      48
>> | Lustre FS:  lustre
>> | Mount type: ldiskfs
>> | Flags:      0x142
>> |             (OST update writeconf )
>> | Persistent mount opts: errors=remount-ro,extents,mballoc
>> | Parameters: [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED] sys.timeout=80 [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED] sys.timeout=80
>> |
>> |
>> | Permanent disk data:
>> | Target:     lustre-OST0030
>> | Index:      48
>> | Lustre FS:  lustre
>> | Mount type: ldiskfs
>> | Flags:      0x142
>> |             (OST update writeconf )
>> | Persistent mount opts: errors=remount-ro,extents,mballoc
>> | Parameters: [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED] sys.timeout=80 [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED] sys.timeout=80
>> |
>> | Writing CONFIGS/mountdata
>> | [EMAIL PROTECTED] log]#
>> `----
>>
>> The DDN lun is ready and working correctly
>> ==========================================
>>
>> ,----[ OSS view ]
>> | [EMAIL PROTECTED] log]# multipath -l | grep mpath0
>> | mpath0 (360001ff00fd4922302000800001d1c17)
>> | [EMAIL PROTECTED] log]#
>> `----
>>
,----[ S2A9550 view ]
>> | [EMAIL PROTECTED] ~]$ s2a -h 10.141.0.92 -e "lun list" | grep -i 0fd492230200
>> |    2    1    Ready    3815470    0FD492230200
>> | [EMAIL PROTECTED] ~]$
>> `----
>>
>> Stack trace (we got it from oss02 via the serial line during a mount attempt)
>> =============================================================================
>>
>> --8<---------------cut here---------------start------------->8---
>> LDISKFS-fs: file extents enabled
>> LDISKFS-fs: mballoc enabled
>> LustreError: 134-6: Trying to start OBD lustre-OST0030_UUID using the wrong disk lustre-OST0000_UUID. Were the /dev/ assignments rearranged?
>> LustreError: 10203:0:(filter.c:1022:filter_prep()) cannot read last_rcvd: rc = -22
>> LustreError: 10203:0:(obd_config.c:325:class_setup()) setup lustre-OST0030 failed (-22)
>> ----------- [cut here ] --------- [please bite here ] ---------
>> Kernel BUG at spinlock:119
>> invalid operand: 0000 [1] SMP
>> LustreError: 10203:0:(obd_config.c:1062:class_config_llog_handler()) Err -22 on cfg command:
>> Lustre: cmd=cf003 0:lustre-OST0030 1:dev 2:type
>> CPU 3
>> Modules linked in: obdfilter(U) fsfilt_ldiskfs(U) mgc(U) ldiskfs(U) lustre(U) lov(U)
>> LustreError: 15b-f: MGC10.143.[EMAIL PROTECTED]: The configuration from log 'lustre-OST0030' failed (-22). Make sure this client and the MGS are running compatible versions of Lustre.
>> LustreError: 15c-8: MGC10.143.[EMAIL PROTECTED]: The configuration from log 'lustre-OST0030' failed (-22). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
>> LustreError: 10203:0:(obd_mount.c:1082:server_start_targets()) failed to start server lustre-OST0030: -22
>> LustreError: 10203:0:(obd_mount.c:1573:server_fill_super()) Unable to start targets: -22
>> LustreError: 10203:0:(obd_config.c:392:class_cleanup()) Device 2 not setup
>> lquota(U) mdc(U) ksocklnd(U) ptlrpc(U) obdclass(U) lnet(U)
>> lvfs(U) libcfs(U) md5(U) ipv6(U) parport_pc(U) lp(U) parport(U) autofs4(U) i2c_dev(U) i2c_core(U) sunrpc(U) ds(U) yenta_socket(U) pcmcia_core(U) dm_mirror(U) dm_round_robin(U) dm_multipath(U) joydev(U) button(U) battery(U) ac(U) uhci_hcd(U) ehci_hcd(U) hw_random(U) myri10ge(U) bnx2(U) ext3(U) jbd(U) dm_mod(U) qla2400(U) ata_piix(U) megaraid_sas(U) qla2xxx(U) scsi_transport_fc(U) sd_mod(U) multipath(U)
>> Pid: 10286, comm: ptlrpcd Tainted: GF 2.6.9-55.0.9.EL_lustre.1.6.3smp
>> RIP: 0010:[<ffffffff80321465>] <ffffffff80321465>{__lock_text_start+32}
>> RSP: 0018:0000010218cd9bc8 EFLAGS: 00010216
>> RAX: 0000000000000016 RBX: 000001021654e4bc RCX: 0000000000020000
>> RDX: 000000000000baa7 RSI: 0000000000000246 RDI: ffffffff80396fc0
>> RBP: 000001021654e4a0 R08: 00000000fffffffe R09: 000001021654e4bc
>> R10: 0000000000000000 R11: 0000000000000000 R12: 00000102196e6058
>> R13: 00000102196e6000 R14: 0000010218cd9eb8 R15: 0000010218cd9e58
>> FS: 0000002a9557ab00(0000) GS:ffffffff804a6880(0000) knlGS:0000000000000000
>> CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>> CR2: 0000002a95557000 CR3: 0000000228514000 CR4: 00000000000006e0
>> Process ptlrpcd (pid: 10286, threadinfo 0000010218cd8000, task 00000102170b4030)
>> Stack: 000001021654e4bc ffffffffa03a2121 000001021a99304e ffffffffa03b32a0
>>        000001021654e0b0 ffffffffa04d6510 0000008000000000 0000000000000000
>>        0000000000000000 00000102203920c0
>> Call Trace: <ffffffffa03a2121>{:lquota:filter_quota_clearinfo+49}
>>        <ffffffffa04d6510>{:obdfilter:filter_destroy_export+560}
>>        <ffffffff80131923>{recalc_task_prio+337}
>>        <ffffffffa02586fd>{:obdclass:class_export_destroy+381}
>>        <ffffffffa025c336>{:obdclass:obd_zombie_impexp_cull+150}
>>        <ffffffffa0318345>{:ptlrpc:ptlrpcd_check+229}
>>        <ffffffffa031883a>{:ptlrpc:ptlrpcd+874}
>>        <ffffffff80133566>{default_wake_function+0}
>>        <ffffffffa02eb450>{:ptlrpc:ptlrpc_expired_set+0}
>>        <ffffffffa02eb450>{:ptlrpc:ptlrpc_expired_set+0}
>>        <ffffffff80133566>{default_wake_function+0}
>>        <ffffffff80110de3>{child_rip+8}
>>        <ffffffffa03184d0>{:ptlrpc:ptlrpcd+0}
>>        <ffffffff80110ddb>{child_rip+0}
>>
>> Code: 0f 0b 04 c2 33 80 ff ff ff ff 77 00 f0 ff 0b 0f 88 8b 03 00
>> RIP <ffffffff80321465>{__lock_text_start+32} RSP <0000010218cd9bc8>
>> <0>Kernel panic - not syncing: Oops
>> <4>LDISKFS-fs: mballoc: 1 blocks 1 reqs (0 success)
>> --8<---------------cut here---------------end--------------->8---
>>
>> If you need more information or debug output, feel free to ask. The problem occurs only with this OST.
>>
>> Thanks, Ludo
>>
>> --
>> Ludovic Francois    +33 (0)6 14 77 26 93
>> System Engineer     DataDirect Networks
>>
>> _______________________________________________
>> Lustre-discuss mailing list
>> Lustre-discuss@clusterfs.com
>> https://mail.clusterfs.com/mailman/listinfo/lustre-discuss
>
> Mr Wojciech Turek
> Assistant System Manager
> University of Cambridge
> High Performance Computing Service
> email: [EMAIL PROTECTED]
> tel. +44 1223 763517
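Wojciech's before/after comparison lends itself to a small script. Below is a sketch of just the bookkeeping, assuming `tunefs.lustre --print` output in the shape quoted earlier in the thread; collecting the per-device output (by shelling out to `tunefs.lustre` on the OSS) is left out, and all the helper names here are mine, not Lustre's.

```python
# Sketch of the mpath -> Lustre target consistency check suggested by
# Wojciech: record the target behind each multipath device before a
# reboot, then compare afterwards.  Only the parsing/diffing is shown;
# gathering the "tunefs.lustre --print" text per device is assumed to
# happen elsewhere.

import re

def parse_target(tunefs_output):
    """Extract the first 'Target:' value from tunefs.lustre --print output."""
    m = re.search(r"^\s*Target:\s*(\S+)", tunefs_output, re.MULTILINE)
    if m is None:
        raise ValueError("no Target: line in tunefs.lustre output")
    return m.group(1)

def target_map(outputs):
    """Map device path -> target name, given per-device tunefs output."""
    return {dev: parse_target(text) for dev, text in outputs.items()}

def rearranged(before, after):
    """Devices whose target changed across the reboot -- exactly the
    '/dev/ assignments rearranged' condition the oops message warns about."""
    return {dev: (before[dev], after.get(dev))
            for dev in before if before[dev] != after.get(dev)}
```

An empty result from `rearranged()` would mean the mpath -> target map survived the reboot; any entry would point at the device whose LUN mapping moved.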
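On the `Flags` field that `tunefs.lustre` prints: it is a bit mask, and the names shown alongside the values in this thread (`0x2` -> `(OST )`, `0x142` -> `(OST update writeconf )`) suggest the decoding below. The exact constants are taken from `lustre_disk.h` in the 1.6 tree as I recall them, so treat them as an assumption to verify against your source tree rather than a definitive table.

```python
# Decoder for the "Flags" value printed by tunefs.lustre.  The bit
# values follow lustre_disk.h from the 1.6 tree (an assumption -- check
# your tree); they are consistent with the two outputs in this thread:
# 0x2 -> "OST", 0x142 -> "OST update writeconf".

LDD_FLAG_NAMES = {
    0x0001: "MDT",
    0x0002: "OST",
    0x0004: "MGS",
    0x0010: "needs_index",
    0x0020: "first_time",
    0x0040: "update",
    0x0080: "rewrite_ldd",
    0x0100: "writeconf",
}

def decode_ldd_flags(flags):
    """Names of the bits set in a tunefs.lustre Flags value, low bit first."""
    return [name for bit, name in sorted(LDD_FLAG_NAMES.items()) if flags & bit]

print(decode_ldd_flags(0x142))  # the mpath0 value seen after the writeconf
```

This makes the difference between the two outputs concrete: the raw disks showed a plain `0x2` (OST), while the `tunefs.lustre /dev/mpath/mpath0` run set the `update` and `writeconf` bits on top of it.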