Forget the file... sorry.

-----Original Message-----
From: lustre-discuss-boun...@lists.lustre.org [mailto:lustre-discuss-boun...@lists.lustre.org] On Behalf Of Rafael David Tinoco
Sent: Wednesday, September 09, 2009 7:30 PM
To: 'Brian J. Murrell'; lustre-discuss@lists.lustre.org
Subject: Re: [Lustre-discuss] OSTs hanging while running IOR
I'm attaching the messages file (only the error part) so we don't have these mail formatting problems.

------

Can you provide a bit more of the log before the above so we can see what the stack trace is in reference to?  Also, try to eliminate the white-space between lines.

Are you getting any other errors or messages from Lustre prior to that?  Perhaps you are getting some messages saying that various operations are "slow"?

>> Even being slow, the OST should still respond, right? It "hangs".

Have you tuned these OSSes with respect to the number of OST threads needed to drive (and not over-drive) your disks?  The lustre iokit is useful for that tuning.

>> OK, tuning for performance is fine, but hanging with 20 nodes (IOR MPI)...
>> strange, right?

b.

-----

I'm using 3 RAID5 arrays with 8 disks each and 256 OST threads on each OSS.

r...@a02n00:~# cat /etc/mdadm.conf
ARRAY /dev/md10 level=raid5 num-devices=8 devices=/dev/dm-0,/dev/dm-1,/dev/dm-2,/dev/dm-3,/dev/dm-4,/dev/dm-5,/dev/dm-6,/dev/dm-7
ARRAY /dev/md11 level=raid5 num-devices=8 devices=/dev/dm-8,/dev/dm-9,/dev/dm-10,/dev/dm-11,/dev/dm-12,/dev/dm-13,/dev/dm-14,/dev/dm-15
ARRAY /dev/md12 level=raid5 num-devices=8 devices=/dev/dm-16,/dev/dm-17,/dev/dm-18,/dev/dm-19,/dev/dm-20,/dev/dm-21,/dev/dm-22,/dev/dm-23

All my OSTs were created with an internal journal (for test purposes).
mkfs.lustre --r --ost --fsname=work --mkfsoptions="-b 4096 -E stride=32,stripe-width=224 -m 0" --mgsnid=a03...@o2ib --mgsnid=b03...@o2ib /dev/md[10|11|12]

I'm using a separate MDT and MGS:

# MGS
mkfs.lustre --fsname=work --r --mgs --mkfsoptions="-b 4096 -E stride=4,stripe-width=4 -m 0" --mountfsoptions=acl --failnode=b03...@o2ib /dev/sdb1

# MDT
mkfs.lustre --fsname=work --r --mgsnid=a03...@o2ib --mgsnid=b03...@o2ib --mdt --mkfsoptions="-b 4096 -E stride=4,stripe-width=40 -m 0" --mountfsoptions=acl --failnode=b03...@o2ib /dev/sdc1

I'm using these packages on the server:
----------
r...@a03n00:~# rpm -aq | grep -i lustre
lustre-modules-1.8.1-2.6.18_128.1.14.el5_lustre.1.8.1
lustre-client-modules-1.8.1-2.6.18_128.1.14.el5_lustre.1.8.1
lustre-ldiskfs-3.0.9-2.6.18_128.1.14.el5_lustre.1.8.1
kernel-lustre-headers-2.6.18-128.1.14.el5_lustre.1.8.1
kernel-lustre-2.6.18-128.1.14.el5_lustre.1.8.1
lustre-client-1.8.1-2.6.18_128.1.14.el5_lustre.1.8.1
kernel-lustre-devel-2.6.18-128.1.14.el5_lustre.1.8.1
lustre-1.8.1-2.6.18_128.1.14.el5_lustre.1.8.1
kernel-ib-1.4.1-2.6.18_128.1.14.el5_lustre.1.8.1
----------

On the client, I compiled kernel 2.6.18-128.el5 without InfiniBand support, then compiled OFED 1.4.1, and after that compiled the patchless client. For the patchless client, I compiled with: --ofa-kernel=/usr/src/ofa_kernel
----------

* THE ERROR

Running, for example:

r...@b00n00:~# mpirun -hostfile ./lustre.hosts -np 20 /hpc/IOR -w -r -C -i 2 -b 1G -t 512k -F -o /work/stripe12/teste

starts "hanging" the OSTs, and the filesystem "hangs". Any attempt to rm or read a file (or to run df -kh) hangs forever (not even kill -9 solves it). Because of that, I cannot umount my OSTs on the OSSes; I have to reboot the server, and my RAIDs start resyncing.

Tinoco
_______________________________________________
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss
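P.S. In case it matters: the 256 OST threads above are not autotuned, I pin them with a module option. This is just a sketch of the relevant line, assuming the Lustre 1.8 ost module loaded via /etc/modprobe.conf (I didn't include that file in the original mail):

```
# /etc/modprobe.conf fragment on each OSS (assumed layout):
# cap the OSS I/O service threads so the arrays are not over-driven
options ost oss_num_threads=256
```

The setting only takes effect when the ost module is (re)loaded, so it needs a remount of the OSTs to apply.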
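P.P.S. A quick sanity check on the -E numbers in the OST mkfs line (my own arithmetic, assuming a 128 KB md chunk size, which I didn't state above): stride is the RAID chunk size in filesystem blocks, and stripe-width is stride times the number of data disks (7 of the 8 in an 8-disk RAID5):

```shell
#!/bin/sh
# Verify the ldiskfs alignment options against the RAID geometry.
BLOCK_SIZE=4096                 # -b 4096
CHUNK_SIZE=$((128 * 1024))      # assumed md chunk size of 128 KB
NUM_DISKS=8                     # disks per md array
DATA_DISKS=$((NUM_DISKS - 1))   # RAID5 loses one disk's worth to parity

STRIDE=$((CHUNK_SIZE / BLOCK_SIZE))     # should match -E stride=32
STRIPE_WIDTH=$((STRIDE * DATA_DISKS))   # should match -E stripe-width=224

echo "stride=$STRIDE stripe-width=$STRIPE_WIDTH"
# prints: stride=32 stripe-width=224
```

So the mkfs options are at least internally consistent with the array layout, if the chunk size really is 128 KB.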