Thanks for the quick response. The logs on the problem server indicate that the ldiskfs RPM was not installed for the first mount attempt. Lustre rejected the attempt here:

Jun 26 17:43:58 puppy7 kernel: LustreError: 3358:0:(obd_mount.c:1290:server_kernel_mount()) premount /dev/sdc:0x0 ldiskfs failed: -19, ldiskfs2 failed: -19. Is the ldiskfs module available?
Jun 26 17:43:58 puppy7 kernel: LustreError: 3358:0:(obd_mount.c:1616:server_fill_super()) Unable to mount device /dev/sdc: -19
Jun 26 17:43:58 puppy7 kernel: LustreError: 3358:0:(obd_mount.c:2045:lustre_fill_super()) Unable to mount (-19)
Jun 26 17:44:10 puppy7 ntpd[3082]: synchronized to 172.16.2.254, stratum 3
Jun 26 17:44:19 puppy7 kernel: LustreError: 3368:0:(obd_mount.c:1290:server_kernel_mount()) premount /dev/sdc:0x0 ldiskfs failed: -19, ldiskfs2 failed: -19. Is the ldiskfs module available?
Jun 26 17:44:19 puppy7 kernel: LustreError: 3368:0:(obd_mount.c:1616:server_fill_super()) Unable to mount device /dev/sdc: -19
Jun 26 17:44:19 puppy7 kernel: LustreError: 3368:0:(obd_mount.c:2045:lustre_fill_super()) Unable to mount (-19)
Jun 26 17:53:39 puppy7 kernel: LustreError: 3430:0:(obd_mount.c:1290:server_kernel_mount()) premount /dev/sdc:0x0 ldiskfs failed: -19, ldiskfs2 failed: -19. Is the ldiskfs module available?
Jun 26 17:53:39 puppy7 kernel: LustreError: 3430:0:(obd_mount.c:1616:server_fill_super()) Unable to mount device /dev/sdc: -19
Jun 26 17:53:39 puppy7 kernel: LustreError: 3430:0:(obd_mount.c:2045:lustre_fill_super()) Unable to mount (-19)

I then installed the ldiskfs RPM on all the Lustre nodes (and fixed my kickstart config), modprobe'd lustre and attempted again:

Jun 26 17:54:30 puppy7 kernel: init dynlocks cache
Jun 26 17:54:30 puppy7 kernel: ldiskfs created from ext4-2.6-rhel5
Jun 26 17:54:30 puppy7 kernel: LDISKFS-fs: barriers enabled
Jun 26 17:54:33 puppy7 kernel: kjournald2 starting: pid 3457, dev sdc:8, commit interval 5 seconds
Jun 26 17:54:33 puppy7 kernel: LDISKFS-fs warning: checktime reached, running e2fsck is recommended
Jun 26 17:54:33 puppy7 kernel: LDISKFS FS on sdc, internal journal on sdc:8
Jun 26 17:54:33 puppy7 kernel: LDISKFS-fs: delayed allocation enabled
Jun 26 17:54:33 puppy7 kernel: LDISKFS-fs: file extents enabled
Jun 26 17:54:33 puppy7 kernel: LDISKFS-fs: mballoc enabled
Jun 26 17:54:33 puppy7 kernel: LDISKFS-fs: recovery complete.
Jun 26 17:54:33 puppy7 kernel: LDISKFS-fs: mounted filesystem sdc with ordered data mode
Jun 26 17:54:33 puppy7 kernel: LDISKFS-fs: mballoc: 0 blocks 0 reqs (0 success)
Jun 26 17:54:33 puppy7 kernel: LDISKFS-fs: mballoc: 0 extents scanned, 0 goal hits, 0 2^N hits, 0 breaks, 0 lost
Jun 26 17:54:33 puppy7 kernel: LDISKFS-fs: mballoc: 0 generated and it took 0
Jun 26 17:54:33 puppy7 kernel: LDISKFS-fs: mballoc: 0 preallocated, 0 discarded
Jun 26 17:54:33 puppy7 kernel: LDISKFS-fs: barriers enabled
Jun 26 17:54:33 puppy7 kernel: kjournald2 starting: pid 3460, dev sdc:8, commit interval 5 seconds
Jun 26 17:54:33 puppy7 kernel: LDISKFS-fs warning: checktime reached, running e2fsck is recommended
Jun 26 17:54:33 puppy7 kernel: LDISKFS FS on sdc, internal journal on sdc:8
Jun 26 17:54:33 puppy7 kernel: LDISKFS-fs: delayed allocation enabled
Jun 26 17:54:33 puppy7 kernel: LDISKFS-fs: file extents enabled
Jun 26 17:54:33 puppy7 kernel: LDISKFS-fs: mballoc enabled
Jun 26 17:54:33 puppy7 kernel: LDISKFS-fs: mounted filesystem sdc with ordered data mode
Jun 26 17:54:38 puppy7 kernel: Lustre: 2725:0:(client.c:1463:ptlrpc_expire_one_request()) @@@ Request x1339651978690561 sent from mgc172.17....@o2ib to NID 172.17....@o2ib 5s ago has timed out (5s prior to deadline).
Jun 26 17:54:38 puppy7 kernel: r...@ffff810067706400 x1339651978690561/t0 o250->m...@mgc172.17.2.5@o2ib_0:26/25 lens 368/584 e 0 to 1 dl 1277592878 ref 1 fl Rpc:N/0/0 rc 0/0
Jun 26 17:54:38 puppy7 kernel: LustreError: 3445:0:(client.c:858:ptlrpc_import_delay_req()) @@@ IMP_INVALID r...@ffff81013d553c00 x1339651978690563/t0 o101->m...@mgc172.17.2.5@o2ib_0:26/25 lens 296/544 e 0 to 1 dl 0 ref 1 fl Rpc:/0/0 rc 0/0
Jun 26 17:54:38 puppy7 kernel: Lustre: Filtering OBD driver; http://www.lustre.org/
Jun 26 17:54:38 puppy7 kernel: Lustre: lustre1-OST0001: Now serving lustre1-OST0001 on /dev/sdc with recovery enabled
Jun 26 17:55:03 puppy7 kernel: Lustre: 2725:0:(client.c:1463:ptlrpc_expire_one_request()) @@@ Request x1339651978690564 sent from mgc172.17....@o2ib to NID 172.17....@o2ib 5s ago has timed out (5s prior to deadline).
----------- a few timeout messages later ....
Jun 26 17:55:43 puppy7 kernel: LustreError: 3649:0:(client.c:858:ptlrpc_import_delay_req()) @@@ IMP_INVALID r...@ffff810060569800 x1339651978690572/t0 o101->m...@mgc172.17.2.5@o2ib_0:26/25 lens 296/544 e 0 to 1 dl 0 ref 1 fl Rpc:/0/0 rc 0/0
Jun 26 17:55:52 puppy7 kernel: Lustre: mgc172.17....@o2ib: Reactivating import
Jun 26 17:55:52 puppy7 kernel: Lustre: lustre1-OST0001: received MDS connection from 172.17....@o2ib
Jun 26 17:59:11 puppy7 ntpd[3082]: kernel time sync enabled 0001
Jun 26 18:03:51 puppy7 kernel: Lustre: 2724:0:(client.c:1463:ptlrpc_expire_one_request()) @@@ Request x1339651978690598 sent from mgc172.17....@o2ib to NID 172.17....@o2ib 17s ago has timed out (17s prior to deadline).
Jun 26 18:03:51 puppy7 kernel: r...@ffff810060bcd800 x1339651978690598/t0 o400->m...@mgc172.17.2.5@o2ib_0:26/25 lens 192/384 e 0 to 1 dl 1277593431 ref 1 fl Rpc:N/0/0 rc 0/0
Jun 26 18:03:51 puppy7 kernel: Lustre: 2724:0:(client.c:1463:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
Jun 26 18:03:51 puppy7 kernel: LustreError: 166-1: mgc172.17....@o2ib: Connection to service MGS via nid 172.17....@o2ib was lost; in progress operations using this service will fail.
--------------------------------------------

According to the above, it looked like everything worked. But after waiting a while, I still couldn't mount Lustre on a client. I found a similar problem on the list; in that case, the fix was to mount the device as type ldiskfs and remove CONFIGS/<targetname>. I hope that didn't permanently corrupt Lustre?

Thanks,

Roger S.

Wojciech Turek wrote:
> Hi,
>
> Could you please post system logs that were generated during first mount
> after the upgrade?
> Did you run writeconf on MDT and all OSTs?
>
> On 12 July 2010 16:51, Roger Sersted <r...@aps.anl.gov
> <mailto:r...@aps.anl.gov>> wrote:
>
>     This is a small development system with a combined MDS/MGS on a
>     single node with a SCSI interface to a disk array. There are two
>     OSSes, each with a single OST of 1.4TB comprised of a SATA array.
>     In all cases, the entire device (/dev/sdc) is used with no
>     partitioning.
>
>     I upgraded my Lustre MDS and OSS servers from 1.6.6 to 1.8.3. I did
>     this via a complete OS install and then performing a writeconf on
>     each of the nodes.
>
>     Unfortunately, each of the OSSes thinks its Lustre "Target" is
>     "lustre1-OST0000". I've mounted the partitions via ldiskfs and the
>     underlying data is still there. I know which OSS is supposed to be
>     "lustre1-OST0001", but I can't find any docs that explain how to
>     set that.
>
>     Thanks,
>
>     Roger S.
>
>     _______________________________________________
>     Lustre-discuss mailing list
>     Lustre-discuss@lists.lustre.org <mailto:Lustre-discuss@lists.lustre.org>
>     http://lists.lustre.org/mailman/listinfo/lustre-discuss
>
> --
> Wojciech Turek
>
> Assistant System Manager
>
> High Performance Computing Service
> University of Cambridge
> Email: wj...@cam.ac.uk <mailto:wj...@cam.ac.uk>
> Tel: (+)44 1223 763517
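
A side note on the -19 in the failed mount attempts quoted at the top of this message: the kernel reports negated errno values, and 19 is ENODEV ("No such device") on Linux, which is consistent with the ldiskfs module not being available. Below is a minimal sketch for decoding such codes from a log line. The regular expression and the function name decode_errors are assumptions made for illustration, based only on the excerpt in this thread; they are not part of Lustre or a documented log format.

```python
import errno
import os
import re

# Assumed pattern: a negative integer that follows ':' or '(' in a log
# message, e.g. "failed: -19" or "Unable to mount (-19)".
CODE_RE = re.compile(r"[:(]\s*-(\d+)\b")


def decode_errors(line):
    """Return (code, errno name, description) for each negative code found."""
    results = []
    for match in CODE_RE.finditer(line):
        value = int(match.group(1))
        # errno.errorcode maps positive errno numbers to symbolic names.
        name = errno.errorcode.get(value, "UNKNOWN")
        results.append((-value, name, os.strerror(value)))
    return results


sample = (
    "LustreError: 3358:0:(obd_mount.c:2045:lustre_fill_super()) "
    "Unable to mount (-19)"
)
for code, name, text in decode_errors(sample):
    print(code, name, text)
```

The same function can be run over each line of /var/log/messages to spot which errno a failed mount returned before digging into the Lustre-specific messages.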