We are experiencing the same problem with 1.6.4.2. We thought it was the statahead problem, but after turning off the statahead code we hit the same problem again. I had hoped going to 1.6.5 would resolve the issue. If you open a bug, would you mind sending the bug number to the list? I would like to get on the CC list.
> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Heiko Schroeter
> Sent: Thursday, July 10, 2008 2:25 AM
> To: [EMAIL PROTECTED]
> Subject: [Lustre-discuss] lustre client 1.6.5.1 hangs
>
> Hello,
>
> we have a _test_ setup for a Lustre 1.6.5.1 installation with 2 RAID
> systems (64-bit systems) counting for 4 OSTs with 6 TB each, and one
> combined MDS and MDT server (32-bit system, for testing only).
>
> OST lustre mkfs:
> "mkfs.lustre --param="failover.mode=failout" --fsname scia --ost
> --mkfsoptions='-i 2097152 -E stride=16 -b 4096' [EMAIL PROTECTED] /dev/sdb"
> (Our files on the system are quite large, 100 MB+.)
>
> Kernel: vanilla kernel 2.6.22.19, Lustre compiled from the sources on
> Gentoo 2008.0
>
> The client mount point is /misc/testfs via automount.
> Access can also be done through a link from /mnt/testfs -> /misc/testfs
>
> The following procedure hangs a client:
> 1) copy files to the Lustre file system
> 2) do a 'du -sh /mnt/testfs/willi' while copying
> 3) unmount an OST (here OST0003) while copying
>
> The 'du' job hangs, and the Lustre file system can no longer be accessed
> on this client, even from other logins. The only way to restore normal
> operation is, as far as I can tell, a hard reset of the machine; a reboot
> hangs because the file system is still active.
> Other clients and their mount points are not affected as long as they do
> not access the file system with 'du', 'ls' or the like.
> I know that this is drastic, but it may happen in production with our users.
>
> Deactivating/reactivating or remounting the OST has no effect on the 'du'
> job. The 'du' job (#29665, see process list below) and the corresponding
> Lustre thread (#29694) cannot be killed manually.
>
> This behaviour is reproducible. OST0003 is not reactivated on the client
> side even though the MDS reactivates it; it seems this information does
> not propagate to the client. See the last lines of dmesg below.
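For anyone trying to reproduce or diagnose the same hang: a sketch of how we inspect the client-side import state before resorting to a reset. This is Lustre 1.6 syntax from memory; the procfs path and wildcard are assumptions and may differ on your build.

```shell
# List configured devices on the hung client; the stuck OSC for
# OST0003 should appear here along with its device number
# (first column of the output).
lctl dl

# Show the import state history of that OSC as the client sees it
# (procfs path as in the 1.6 tree; hypothetical wildcard, adjust
# to your actual device name):
cat /proc/fs/lustre/osc/*OST0003*/state
```

In our case this is where the client still shows the import as disconnected/evicted even after the MDS has reactivated the OST.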
>
> What is the proper way (besides avoiding the use of 'du') to reactivate
> the client file system?
>
> Thanks and Regards
> Heiko
>
>
> The process list on the CLIENT:
> <snip>
> root   29175  5026  0 08:36 ?      00:00:00 sshd: laura [priv]
> laura  29177 29175  0 08:36 ?      00:01 sshd: [EMAIL PROTECTED]/0
> laura  29178 29177  0 08:36 pts/0  00:00:00 -bash
> laura  29665 29178  0 09:15 pts/0  00:00:03 du -sh /mnt/testfs/foo/fam/
> schell 29694     2  0 09:15 ?      00:00:00 [ll_sa_29665]
> root   29695  4846  0 09:15 ?      00:00:00 /usr/sbin/automount --timeout
>   60 --pid-file /var/run/autofs.misc.pid /misc yp auto.misc
> <snap>
>
> and CLIENT dmesg:
> Lustre: 5361:0:(import.c:395:import_select_connection())
>   scia-OST0003-osc-ffff8100ea24a000: tried all connections, increasing
>   latency to 6s
> Lustre: 5361:0:(import.c:395:import_select_connection()) Skipped 10
>   previous similar messages
> LustreError: 11-0: an error occurred while communicating with
>   [EMAIL PROTECTED] The ost_connect operation failed with -19
> LustreError: Skipped 20 previous similar messages
> Lustre: 5361:0:(import.c:395:import_select_connection())
>   scia-OST0003-osc-ffff8100ea24a000: tried all connections, increasing
>   latency to 51s
> Lustre: 5361:0:(import.c:395:import_select_connection()) Skipped 20
>   previous similar messages
> LustreError: 11-0: an error occurred while communicating with
>   [EMAIL PROTECTED] The ost_connect operation failed with -19
> LustreError: Skipped 24 previous similar messages
> Lustre: 5361:0:(import.c:395:import_select_connection())
>   scia-OST0003-osc-ffff8100ea24a000: tried all connections, increasing
>   latency to 51s
> Lustre: 5361:0:(import.c:395:import_select_connection()) Skipped 24
>   previous similar messages
> LustreError: 167-0: This client was evicted by scia-OST0003; in progress
>   operations using this service will fail.
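Not a verified fix, but before the hard reset we would try manually reactivating the stuck import with lctl on the client. This is a sketch assuming the 1.6 lctl subcommands; "N" below is a hypothetical placeholder for whatever device number 'lctl dl' reports for the OST0003 OSC.

```shell
# Find the device number of the OST0003 OSC on this client:
lctl dl | grep OST0003

# Then, with N taken from the first column of that output:
lctl --device N activate   # mark the import active again
lctl --device N recover    # force a reconnect attempt
```

In your trace the eviction ("LustreError: 167-0") suggests the client's import never leaves the evicted state on its own, which is why nudging it by device number might be worth trying; if it works, it would confirm the config-log update simply is not reaching the client.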
>
> The MDS dmesg:
> <snip>
> Lustre: 6108:0:(import.c:395:import_select_connection()) scia-OST0003-osc:
>   tried all connections, increasing latency to 51s
> Lustre: 6108:0:(import.c:395:import_select_connection()) Skipped 10
>   previous similar messages
> LustreError: 11-0: an error occurred while communicating with
>   [EMAIL PROTECTED] The ost_connect operation failed with -19
> LustreError: Skipped 10 previous similar messages
> Lustre: 6108:0:(import.c:395:import_select_connection()) scia-OST0003-osc:
>   tried all connections, increasing latency to 51s
> Lustre: 6108:0:(import.c:395:import_select_connection()) Skipped 20
>   previous similar messages
> Lustre: Permanently deactivating scia-OST0003
> Lustre: Setting parameter scia-OST0003-osc.osc.active in log scia-client
> Lustre: Skipped 3 previous similar messages
> Lustre: setting import scia-OST0003_UUID INACTIVE by administrator request
> Lustre: scia-OST0003-osc.osc: set parameter active=0
> Lustre: Skipped 2 previous similar messages
> Lustre: scia-MDT0000: haven't heard from client
>   9111f740-b7a7-e2ff-b672-288a66decfab (at [EMAIL PROTECTED]) in 1269
>   seconds. I think it's dead, and I am evicting it.
> Lustre: Permanently reactivating scia-OST0003
> Lustre: Modifying parameter scia-OST0003-osc.osc.active in log scia-client
> Lustre: Skipped 1 previous similar message
> Lustre: 15406:0:(import.c:395:import_select_connection()) scia-OST0003-osc:
>   tried all connections, increasing latency to 51s
> Lustre: 15406:0:(import.c:395:import_select_connection()) Skipped 2
>   previous similar messages
> LustreError: 167-0: This client was evicted by scia-OST0003; in progress
>   operations using this service will fail.
> Lustre: scia-OST0003-osc: Connection restored to service scia-OST0003
>   using nid [EMAIL PROTECTED]
> Lustre: scia-OST0003-osc.osc: set parameter active=1
> Lustre: MDS scia-MDT0000: scia-OST0003_UUID now active, resetting orphans
> <snap>
>
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss@lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss

_______________________________________________
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss