Re: [Lustre-discuss] RPC limitation
Andreas and Oleg,

Sorry to bother you again. Is there any way to force an RPC size of 256 KB and pack several 4 KB random operations into a single 256 KB RPC so the final throughput can increase?

Thanks,

Jeffrey A. Bennett
HPC Systems Engineer
San Diego Supercomputer Center
http://users.sdsc.edu/~jab

-----Original Message-----
From: andreas.dil...@sun.com [mailto:andreas.dil...@sun.com] On Behalf Of Andreas Dilger
Sent: Friday, March 05, 2010 2:05 AM
To: Jeffrey Bennett
Cc: oleg.dro...@sun.com; lustre-discuss@lists.lustre.org
Subject: Re: [Lustre-discuss] One or two OSS, no difference?

On 2010-03-04, at 14:18, Jeffrey Bennett wrote:

I just noticed the sequential performance is OK, but the random IO (which is what I am measuring) is not. Is there any way to increase random IO performance on Lustre? We have LUNs that can provide around 250,000 random read 4 KB IOPS but we are only seeing 3,000 to 10,000 on Lustre.

There is work currently underway to improve the SMP scaling performance for the RPC handling layer in Lustre. Currently that limits the delivered RPC rate to 10-15k/sec or so.
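For reference, the client-side RPC size and concurrency are tunable per OSC, and these are the knobs that set the ceiling being asked about; a rough sketch for a 1.8-era client (values are illustrative -- 64 pages x 4 KB = 256 KB -- and whether unrelated 4 KB random operations actually get packed into one RPC still depends on caching and the I/O pattern, not just on these settings):

# List the current per-OSC settings on a client:
lctl get_param osc.*.max_pages_per_rpc osc.*.max_rpcs_in_flight

# Cap RPCs at 64 pages (256 KB) and allow more RPCs in flight:
lctl set_param osc.*.max_pages_per_rpc=64
lctl set_param osc.*.max_rpcs_in_flight=32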
Re: [Lustre-discuss] One or two OSS, no difference?
Andreas, if we are using 4 KB blocks I understand we only transfer 1 page per RPC call, so are we limited to 10-15K RPCs per second, or, what amounts to the same thing, 10-15,000 IOPS?

jab

-----Original Message-----
From: andreas.dil...@sun.com [mailto:andreas.dil...@sun.com] On Behalf Of Andreas Dilger
Sent: Friday, March 05, 2010 2:05 AM
To: Jeffrey Bennett
Cc: oleg.dro...@sun.com; lustre-discuss@lists.lustre.org
Subject: Re: [Lustre-discuss] One or two OSS, no difference?

On 2010-03-04, at 14:18, Jeffrey Bennett wrote:

I just noticed the sequential performance is OK, but the random IO (which is what I am measuring) is not. Is there any way to increase random IO performance on Lustre? We have LUNs that can provide around 250,000 random read 4 KB IOPS but we are only seeing 3,000 to 10,000 on Lustre.

There is work currently underway to improve the SMP scaling performance for the RPC handling layer in Lustre. Currently that limits the delivered RPC rate to 10-15k/sec or so.

-----Original Message-----
From: oleg.dro...@sun.com [mailto:oleg.dro...@sun.com]
Sent: Thursday, March 04, 2010 12:49 PM
To: Jeffrey Bennett
Cc: lustre-discuss@lists.lustre.org
Subject: Re: [Lustre-discuss] One or two OSS, no difference?

Hello!

This is pretty strange. Are there any differences in network topology that can explain this? If you remove the first client, does the second one show performance at the level of the first, but as soon as you start the load on the first again, the second client's performance drops?

Bye,
Oleg

On Mar 4, 2010, at 1:45 PM, Jeffrey Bennett wrote:

Hi Oleg, thanks for your reply. I was actually testing with only one client. When adding a second client using a different file, one client gets all the performance and the other one gets very low performance. Any recommendation?

Thanks in advance

jab

-----Original Message-----
From: oleg.dro...@sun.com [mailto:oleg.dro...@sun.com]
Sent: Wednesday, March 03, 2010 5:20 PM
To: Jeffrey Bennett
Cc: lustre-discuss@lists.lustre.org
Subject: Re: [Lustre-discuss] One or two OSS, no difference?

Hello!

On Mar 3, 2010, at 6:35 PM, Jeffrey Bennett wrote:

We are building a very small Lustre cluster with 32 clients (patchless) and two OSS servers. Each OSS server has 1 OST with 1 TB of Solid State Drives. All is connected using dual-port DDR IB. For testing purposes, I am enabling/disabling one of the OSS/OSTs by using the lfs setstripe command. I am running XDD and vdbench benchmarks. Does anybody have an idea why there is no difference in MB/sec or random IOPS when using one OSS or two OSS? A quick test with dd also shows the same MB/sec when using one or two OSTs.

I wonder if you just don't saturate even one OST (both backend SSD and IB interconnect) with this number of clients? Does the total throughput decrease as you decrease the number of active clients and increase as you increase it even further? Increasing the maximum number of in-flight RPCs might help in that case. Also, are all of your clients writing to the same file, or does each client do IO to a separate file (I hope)?

Bye,
Oleg

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
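Back-of-the-envelope, taking the figures above at face value (one 4 KB page per RPC and a 10-15k RPC/s handling limit), the implied ceiling works out to:

15,000 RPC/s x 4 KB/RPC = 60,000 KB/s, roughly 40-60 MB/s of random 4 KB I/O

so the RPC rate, rather than the SSD back end (250,000 x 4 KB is on the order of 1 GB/s), would be the binding limit in this configuration. This is illustrative arithmetic, not a measured number.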
Re: [Lustre-discuss] One or two OSS, no difference?
Hi Oleg,

I just noticed the sequential performance is OK, but the random IO (which is what I am measuring) is not. Is there any way to increase random IO performance on Lustre? We have LUNs that can provide around 250,000 random read 4 KB IOPS but we are only seeing 3,000 to 10,000 on Lustre.

jab

-----Original Message-----
From: oleg.dro...@sun.com [mailto:oleg.dro...@sun.com]
Sent: Thursday, March 04, 2010 12:49 PM
To: Jeffrey Bennett
Cc: lustre-discuss@lists.lustre.org
Subject: Re: [Lustre-discuss] One or two OSS, no difference?

Hello!

This is pretty strange. Are there any differences in network topology that can explain this? If you remove the first client, does the second one show performance at the level of the first, but as soon as you start the load on the first again, the second client's performance drops?

Bye,
Oleg

On Mar 4, 2010, at 1:45 PM, Jeffrey Bennett wrote:

Hi Oleg, thanks for your reply. I was actually testing with only one client. When adding a second client using a different file, one client gets all the performance and the other one gets very low performance. Any recommendation?

Thanks in advance

jab

-----Original Message-----
From: oleg.dro...@sun.com [mailto:oleg.dro...@sun.com]
Sent: Wednesday, March 03, 2010 5:20 PM
To: Jeffrey Bennett
Cc: lustre-discuss@lists.lustre.org
Subject: Re: [Lustre-discuss] One or two OSS, no difference?

Hello!

On Mar 3, 2010, at 6:35 PM, Jeffrey Bennett wrote:

We are building a very small Lustre cluster with 32 clients (patchless) and two OSS servers. Each OSS server has 1 OST with 1 TB of Solid State Drives. All is connected using dual-port DDR IB. For testing purposes, I am enabling/disabling one of the OSS/OSTs by using the lfs setstripe command. I am running XDD and vdbench benchmarks. Does anybody have an idea why there is no difference in MB/sec or random IOPS when using one OSS or two OSS? A quick test with dd also shows the same MB/sec when using one or two OSTs.

I wonder if you just don't saturate even one OST (both backend SSD and IB interconnect) with this number of clients? Does the total throughput decrease as you decrease the number of active clients and increase as you increase it even further? Increasing the maximum number of in-flight RPCs might help in that case. Also, are all of your clients writing to the same file, or does each client do IO to a separate file (I hope)?

Bye,
Oleg
Re: [Lustre-discuss] Lustre Monitoring Tools
Last time I checked, LMT was designed for Lustre 1.4. LLNL stopped development of LMT some time ago. Not sure if LMT will work with Lustre 1.8. If somebody has tried, please let everyone know.

jab

-----Original Message-----
From: lustre-discuss-boun...@lists.lustre.org [mailto:lustre-discuss-boun...@lists.lustre.org] On Behalf Of Cliff White
Sent: Wednesday, January 06, 2010 11:12 AM
To: Jagga Soorma
Cc: lustre-discuss@lists.lustre.org
Subject: Re: [Lustre-discuss] Lustre Monitoring Tools

Jagga Soorma wrote:

Hi Guys,

I would like to monitor the performance and usage of my Lustre filesystem and was wondering what are the commonly used monitoring tools for this? Cacti? Nagios? Any input would be greatly appreciated.

Regards,
-Simran

LLNL's LMT tool is very good. It's available on Sourceforge, afaik.

cliffw
Re: [Lustre-discuss] MGT of 128 MB - already out of space
Hi Andreas,

This turned out to be a bug in a script that was setting the timeout value with lctl every minute or so, thus filling the logs. Hopefully a tunefs.lustre --writeconf on the MGT will remove the logs, am I correct?

jab

-----Original Message-----
From: andreas.dil...@sun.com [mailto:andreas.dil...@sun.com] On Behalf Of Andreas Dilger
Sent: Friday, December 18, 2009 10:26 PM
To: Jeffrey Bennett
Cc: lustre-discuss@lists.lustre.org
Subject: Re: [Lustre-discuss] MGT of 128 MB - already out of space

On 2009-12-18, at 18:13, Jeffrey Bennett wrote:

Scenario is the following:
- Lustre 1.8.1.1
- 3 Lustre filesystems, fully redundant (two networks, OSSs on active/active, MDSs on active/passive)
- 1 MGS, 1 MDT, 2 OSTs
- For the MGT, 128MB were allocated, following Lustre's manual recommendations
- The MGT is already out of space, and an ls of the MGT is showing files are 8MB, like:
-rw-r--r-- 1 root root 8.0M Dec 2 15:11 devfs-client
-rw-r--r-- 1 root root 8.0M Dec 2 15:11 devfs-MDT
-rw-r--r-- 1 root root 8.0M Dec 2 16:42 devfs-OST

How many OSTs do you have? Is this consuming all of the space?

Other lustre filesystems I have worked on show much smaller files. A dumpe2fs on this MGT does not show anything strange like huge block sizes, etc.

Are these files sparse by some chance? What does ls -ls show? It may be that your journal is consuming a lot of space? Try running:

debugfs -c -R "stat <8>" /dev/{MGTdev}

You really don't need more than the absolute minimum of space for the MGT, which is 4MB. You can remove the journal via tune2fs -O ^has_journal on an unmounted filesystem, then tune2fs -j -J size=4 to recreate it at the minimum size (maybe -J size=5 if it complains).

Question is, why are these files so big and how can we shrink them? Is it possible to run --writeconf to fix this?

If all of the space is really consumed by the config files, are you using a lot of lctl conf_param commands, ost pools, or something else that would put a lot of records into the config logs?

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
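For completeness, regenerating the configuration logs is done with tunefs.lustre --writeconf against every target while the filesystem is stopped; a rough sketch for a setup like this one (device names are illustrative, and the 1.8 manual describes the exact procedure and ordering):

# With all clients unmounted and all Lustre targets stopped:
tunefs.lustre --writeconf /dev/mgt_dev     # on the MGS
tunefs.lustre --writeconf /dev/mdt_dev     # on the MDS
tunefs.lustre --writeconf /dev/ost_dev     # on each OSS, for each OST
# Then remount in order: MGT first, then MDT, then the OSTs, then clients.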
[Lustre-discuss] MGT of 128 MB - already out of space
Hi,

Scenario is the following:
- Lustre 1.8.1.1
- 3 Lustre filesystems, fully redundant (two networks, OSSs on active/active, MDSs on active/passive)
- 1 MGS, 1 MDT, 2 OSTs
- For the MGT, 128MB were allocated, following Lustre's manual recommendations
- The MGT is already out of space, and an ls of the MGT is showing files are 8MB, like:

-rw-r--r-- 1 root root 8.0M Dec 2 15:11 devfs-client
-rw-r--r-- 1 root root 8.0M Dec 2 15:11 devfs-MDT
-rw-r--r-- 1 root root 8.0M Dec 2 16:42 devfs-OST

Other lustre filesystems I have worked on show much smaller files. A dumpe2fs on this MGT does not show anything strange like huge block sizes, etc.

Question is, why are these files so big and how can we shrink them? Is it possible to run --writeconf to fix this?

Thanks,

Jeffrey A. Bennett
HPC Systems Engineer
San Diego Supercomputer Center
http://users.sdsc.edu/~jab
Re: [Lustre-discuss] question about failnode with mixed networks
Hi John,

Yes, you can use multiple MGS nodes, but you have to tell the OSTs, in this way (example):

mkfs.lustre --fsname testfs --ost --mgsnode=m...@tcp0 --mgsnode=m...@tcp0 /dev/sda

Whenever you mount the filesystem, mount it this way:

mount -t lustre m...@tcp0:m...@tcp0:/testfs /mnt/testfs

Jeffrey A. Bennett
HPC Systems Engineer
San Diego Supercomputer Center
http://users.sdsc.edu/~jab

-----Original Message-----
From: lustre-discuss-boun...@lists.lustre.org [mailto:lustre-discuss-boun...@lists.lustre.org] On Behalf Of John White
Sent: Tuesday, November 24, 2009 1:20 PM
To: Brian J. Murrell
Cc: lustre-discuss@lists.lustre.org
Subject: Re: [Lustre-discuss] question about failnode with mixed networks

Excellent, thanks for the replies. One more question: Is there a --failnode corollary for MGTs...? Does lustre support MGT/S failover?

On Nov 16, 2009, at 6:51 AM, Brian J. Murrell wrote:

On Fri, 2009-11-13 at 14:34 -0800, John White wrote:

In a failover situation, it would appear that tcp connected clients do not get the hint to switch over to the secondary MDS

Clients don't (yet) get hints to switch servers. Clients continue to use a server until they don't get a response, at which time they cycle through their list of NIDs for the unresponsive service.

When I initially set up the file system, I specified --failnode for the @o2ib interfaces,

Only the @o2ib interfaces?

should I have also specified NIDs for the @tcp0 during the fs construction?

Yes. You specify the NIDs for all servers that should be considered for that service.

If so, is it possible to add this as an afterthought?

You want tunefs.lustre.

b.

John White
High Performance Computing Services (HPCS)
(510) 486-7307
One Cyclotron Rd, MS: 50B-3209C
Lawrence Berkeley National Lab
Berkeley, CA 94720
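As a sketch of the "afterthought" route Brian mentions, the extra NIDs can be added with tunefs.lustre on the unmounted target. The NIDs and device below are made up for the example, and a --writeconf pass may also be needed before clients see the updated configuration:

# On the OSS, with the OST unmounted, record the failover partner's NIDs (tcp and o2ib):
tunefs.lustre --failnode=192.168.1.12@tcp0,10.0.0.12@o2ib /dev/sdb

# Verify what is now written on the target:
tunefs.lustre --print /dev/sdb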
Re: [Lustre-discuss] Lustre-1.9.260 mkfs.lustre errors
Do you have the same version of e2fsprogs on both?

jab

-----Original Message-----
From: lustre-discuss-boun...@lists.lustre.org [mailto:lustre-discuss-boun...@lists.lustre.org] On Behalf Of Josephine Palencia
Sent: Wednesday, September 16, 2009 6:43 AM
To: lustre-discuss@lists.lustre.org
Subject: [Lustre-discuss] Lustre-1.9.260 mkfs.lustre errors

HEAD (Lustre-1.9.260) built on both archs (i386, x86_64). mkfs.lustre, mount works on the i386. But I get this error for the x86_64 on 2 different machines:

[r...@mds00w x86_64]# mkfs.lustre --verbose --reformat --fsname=jwan --mdt --mgsnode=mgs.jwan.teragrid@tcp0 /dev/sda8

Permanent disk data:
Target:     jwan-MDT
Index:      unassigned
Lustre FS:  jwan
Mount type: ldiskfs
Flags:      0x71 (MDT needs_index first_time update )
Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
Parameters: mgsnode=128.182.112@tcp

device size = 11538MB
2 6 18
formatting backing filesystem ldiskfs on /dev/sda8
        target name  jwan-MDT
        4k blocks    0
        options      -J size=400 -i 4096 -I 512 -O dir_index,extents,uninit_groups -F
mkfs_cmd = mke2fs -j -b 4096 -L jwan-MDT -J size=400 -i 4096 -I 512 -O dir_index,extents,uninit_groups -F /dev/sda8
cmd: mke2fs -j -b 4096 -L jwan-MDT -J size=400 -i 4096 -I 512 -O dir_index,extents,uninit_groups -F /dev/sda8
mke2fs 1.40.7.sun3 (28-Feb-2008)
Invalid filesystem option set: dir_index,extents,uninit_groups

mkfs.lustre FATAL: Unable to build fs /dev/sda8 (256)
mkfs.lustre FATAL: mkfs failed 256

[r...@mds00w x86_64]# clear

Machine 1 to serve as mdt:
--------------------------
[r...@mds00w x86_64]# mkfs.lustre --verbose --reformat --fsname=jwan --mdt --mgsnode=mgs.jwan.teragrid@tcp0 /dev/sda8

Permanent disk data:
Target:     jwan-MDT
Index:      unassigned
Lustre FS:  jwan
Mount type: ldiskfs
Flags:      0x71 (MDT needs_index first_time update )
Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
Parameters: mgsnode=128.182.112@tcp

device size = 11538MB
2 6 18
formatting backing filesystem ldiskfs on /dev/sda8
        target name  jwan-MDT
        4k blocks    0
        options      -J size=400 -i 4096 -I 512 -O dir_index,extents,uninit_groups -F
mkfs_cmd = mke2fs -j -b 4096 -L jwan-MDT -J size=400 -i 4096 -I 512 -O dir_index,extents,uninit_groups -F /dev/sda8
cmd: mke2fs -j -b 4096 -L jwan-MDT -J size=400 -i 4096 -I 512 -O dir_index,extents,uninit_groups -F /dev/sda8
mke2fs 1.40.7.sun3 (28-Feb-2008)
Invalid filesystem option set: dir_index,extents,uninit_groups

mkfs.lustre FATAL: Unable to build fs /dev/sda8 (256)
mkfs.lustre FATAL: mkfs failed 256

Machine 2 to serve as ost:
--------------------------
[r...@oss01w ~]# mkfs.lustre --reformat --fsname=jwan --ost --mgsnode=mgs.jwan.teragrid@tcp0 /dev/sda8

Permanent disk data:
Target:     jwan-OST
Index:      unassigned
Lustre FS:  jwan
Mount type: ldiskfs
Flags:      0x72 (OST needs_index first_time update )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=128.182.112@tcp

mkfs.lustre FATAL: loop device requires a --device-size= param
mkfs.lustre FATAL: Loop device setup for /dev/s

---

For now, I combined the mgs/mdt on the i386 machines and that created the fs and mounted without problems. I'd appreciate feedback on the 2 other machines with mkfs.lustre errors.

Thanks,
josephine
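A quick, non-destructive way to compare the two nodes and to check whether the installed mke2fs actually understands the feature set mkfs.lustre is requesting (the device path is the one from the error output above; -n only simulates and writes nothing):

# On each node, check which e2fsprogs is installed:
rpm -q e2fsprogs
mke2fs -V

# Dry-run with the same features mkfs.lustre asked for:
mke2fs -n -O dir_index,extents,uninit_groups /dev/sda8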
Re: [Lustre-discuss] Hastening lustrefs recovery
> For 4 OSTs, each with 7TB, and ~40 connected clients, recovery time is 48 min. Is that reasonable or is that too long?

Wow. That seems long. That is recovery of what? A single OST or single OSS, or something other? Recovery times I have been seeing on similar systems are around 2-3 minutes. That's what it takes clients to replay their transactions or time out. This is in the event of failover to a new MDS.

jab
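For what it's worth, the recovery window can be watched per target while it runs, which helps tell a single slow OST apart from a whole-filesystem problem; a sketch for a 1.8-era system (target names are illustrative):

# On an OSS, see how many clients still have to reconnect and replay:
cat /proc/fs/lustre/obdfilter/*/recovery_status

# On the MDS, the equivalent counter:
lctl get_param mds.*.recovery_status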
Re: [Lustre-discuss] gridftp and Lustre
The gridftp servers need to be on Lustre clients. There is no way you can send your data directly from the gridftp client to the OSS, if this is what you're asking.

jab

-----Original Message-----
From: lustre-discuss-boun...@lists.lustre.org [mailto:lustre-discuss-boun...@lists.lustre.org] On Behalf Of Yujun Wu
Sent: Friday, May 22, 2009 8:24 AM
To: lustre-discuss@lists.lustre.org
Subject: [Lustre-discuss] gridftp and Lustre

Hello,

Could somebody give me some advice on how to improve the gridftp performance with Lustre? Currently, we are putting files onto Lustre through a gridftp server that has the Lustre file system mounted. I noticed the network traffic goes in this way (using putting data as an example):

remote client --- gridftp server --- Lustre OSS

And the gridftp server is busy with receiving and sending packets all the time. Is there a way for the control info to go to the gridftp server, but the data to go to the Lustre OSS directly?

Thanks in advance for answering my question.

Regards,
Yujun
Re: [Lustre-discuss] gridftp and Lustre
I guess you could implement some sort of gridftp plugin (a DSI module is what they call it) to write data directly to the OSS, and then you could install gridftp servers on the OSS. This is basically what the HPSS DSI module for gridftp does: it writes data directly to the HPSS system using the HPSS API. However, I am not sure it's technically possible with Lustre. I'll take a look at the dCache thing, thanks!

jab

-----Original Message-----
From: Yujun Wu [mailto:yu...@phys.ufl.edu]
Sent: Friday, May 22, 2009 10:18 AM
To: Jeffrey Bennett
Cc: lustre-discuss@lists.lustre.org
Subject: RE: [Lustre-discuss] gridftp and Lustre

Hi Jeffrey,

Thanks for your e-mail. Then the extra traffic is moving from the gridftp servers to the OSSs, which I really don't like, as you may imagine. I know a product called dCache. There the gridftp servers don't handle the data traffic directly, but redirect the data to the data servers (dCache pools).

Regards,
Yujun

On Fri, 22 May 2009, Jeffrey Bennett wrote:

The gridftp servers need to be on Lustre clients. There is no way you can send your data directly from the gridftp client to the OSS, if this is what you're asking.

jab

-----Original Message-----
From: lustre-discuss-boun...@lists.lustre.org [mailto:lustre-discuss-boun...@lists.lustre.org] On Behalf Of Yujun Wu
Sent: Friday, May 22, 2009 8:24 AM
To: lustre-discuss@lists.lustre.org
Subject: [Lustre-discuss] gridftp and Lustre

Hello,

Could somebody give me some advice on how to improve the gridftp performance with Lustre? Currently, we are putting files onto Lustre through a gridftp server that has the Lustre file system mounted. I noticed the network traffic goes in this way (using putting data as an example):

remote client --- gridftp server --- Lustre OSS

And the gridftp server is busy with receiving and sending packets all the time. Is there a way for the control info to go to the gridftp server, but the data to go to the Lustre OSS directly?

Thanks in advance for answering my question.

Regards,
Yujun
Re: [Lustre-discuss] lustre failover pairs
Also note that you will need third-party software to do this failover, unlike GPFS.

jab

The difference versus GPFS (where LUNs are active on both servers all the time, even though one is the primary server) is that the secondary server does NOT serve the OSTs being served by the primary, unless the primary is down and the OST has been failed over.
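To illustrate the third-party piece, a Heartbeat v1 style haresources entry for an OST mount might look like the line below; the node name, device and mount point are made up for the example, and Heartbeat v2/CRM setups express the same resource in XML instead:

# /etc/ha.d/haresources on both OSS nodes -- oss1 is the preferred owner of this OST:
oss1 Filesystem::/dev/sdb::/mnt/lustre/ost0::lustre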
Re: [Lustre-discuss] client mount entry for fstab with Active/Passive MGS/MDS
Andrew,

This is how I use it:

# mount -t lustre mds-...@tcp0:mds-...@tcp0:/npfs /mnt/npfs

So /etc/fstab would be something like:

mds-...@tcp0:mds-...@tcp0:/npfs /mnt/npfs lustre ...whatever...

Note that mds-0-0 and mds-0-1 are also MGS, despite their name.

jab

From: lustre-discuss-boun...@lists.lustre.org [mailto:lustre-discuss-boun...@lists.lustre.org] On Behalf Of Lundgren, Andrew
Sent: Thursday, April 02, 2009 1:36 PM
To: Lustre discuss
Subject: [Lustre-discuss] client mount entry for fstab with Active/Passive MGS/MDS

I have a single MDT/MGT that is hosted on a pair of MGS/MDS machines acting as a failover pair. Both MDS/MGS machines have visibility to the shared MDT/MGT. One of the two machines has the FS mounted, the other does not.

On my client side I have the following in my fstab:

10.248.58@tcp0:10.248.58@tcp0:/content /content lustre defaults,_netdev 0 0

I have also tried separating the machines with a "," but it hasn't made any difference at run time. There is a difference in /var/log/messages, but either way, it doesn't work.

Should I be using the "," or the ":"? Is my syntax correct?

Thanks!

--
Andrew
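For a concrete example of the colon form: as I understand it, the colon separates the NIDs of the two failover MGS/MDS nodes, while the comma is for listing several NIDs of the same node on different networks. The addresses below are made up to stand in for the two servers:

# /etc/fstab on the client:
192.168.1.21@tcp0:192.168.1.22@tcp0:/content  /content  lustre  defaults,_netdev  0 0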
Re: [Lustre-discuss] Failover recovery issues / questions
Hi,

I am not familiar with using heartbeat with the OSS; I have only used it on the MDS for failover, since you can't have an active/active configuration on the MDS. However, you can have active/active on the OSS, so I can't understand why you would want to use heartbeat to unmount the OSTs on one system if you can have them mounted on both.

Now, when you say you kill heartbeat, what do you mean by that? You can't test heartbeat functionality by killing it; you have to use the provided tools for failing over to the other node. The tool usage and parameters depend on what version of heartbeat you are using. Do you have a serial connection between these machines or a crossover cable for heartbeat, or do you use the regular network?

jab

-----Original Message-----
From: lustre-discuss-boun...@lists.lustre.org [mailto:lustre-discuss-boun...@lists.lustre.org] On Behalf Of Adam Gandelman
Sent: Monday, March 30, 2009 4:38 PM
To: lustre-discuss@lists.lustre.org
Subject: [Lustre-discuss] Failover recovery issues / questions

Hi-

I'm new to Lustre and am running into some issues with fail over and recovery that I can't seem to find answers to in the Lustre manual (v1.14). If anyone can fill me in as to what is going on (or not going on), or point me toward some documentation that goes into more detail it would be greatly appreciated.

It's a simple cluster at the moment: MDT/MGS data is collocated on node LUS-MDT. LUS-OSS0 and LUS-OSS1 are set up in an active/active failover setup. LUS-OSS0 is primary for /dev/drbd1 and backup for /dev/drbd2, LUS-OSS1 is primary for /dev/drbd2 and backup for /dev/drbd1.

I have heartbeat configured to monitor and handle fail over, however, I run into the same problems when manually testing fail over. When heartbeat is killed on either OSS and resources failed over to the backup, or when the filesystem is manually unmounted and remounted on the backup node, the migrated OST either 1, goes into a state of endless recovery or 2, doesn't seem to go into recovery at all. It becomes inactive on the cluster entirely. If I bring the OST's primary back up and fail back the resources, the OST goes into recovery, completes and comes back up online as it should.

For example, if I take down OSS0, the OST fails over to its backup, however, it never makes it past this and never recovers:

[r...@lus-oss0 ~]# cat /proc/fs/lustre/obdfilter/lustre-OST/recovery_status
status: RECOVERING
recovery_start: 0
time_remaining: 0
connected_clients: 0/4
completed_clients: 0/4
replayed_requests: 0/??
queued_requests: 0
next_transno: 2002

In some instances, /proc/fs/lustre/obdfilter/lustre-OST/ is empty. Like I said, when the primary node comes back online and resources are migrated back, the OST goes into recovery fine, completes and comes back up online.

Here is the log output on the secondary node after fail over:

Lustre: 13290:0:(filter.c:867:filter_init_server_data()) RECOVERY: service lustre-OST, 4 recoverable clients, last_rcvd 2001
Lustre: lustre-OST: underlying device drbd2 should be tuned for larger I/O requests: max_sectors = 64 could be up to max_hw_sectors=255
Lustre: OST lustre-OST now serving dev (lustre-OST/1ff44d23-d13a-b0c6-48e1-36c104ea6752), but will be in recovery for at least 5:00, or until 4 clients reconnect. During this time new clients will not be allowed to connect. Recovery progress can be monitored by watching /proc/fs/lustre/obdfilter/lustre-OST/recovery_status.
Lustre: Server lustre-OST on device /dev/drbd2 has started
Lustre: Request x8184 sent from lustre-OST-osc-c6cedc00 to NID 192.168.10...@tcp 100s ago has timed out (limit 100s).
Lustre: lustre-OST-osc-c6cedc00: Connection to service lustre-OST via nid 192.168.10...@tcp was lost; in progress operations using this service will wait for recovery to complete.
Lustre: 3983:0:(import.c:410:import_select_connection()) lustre-OST-osc-c6cedc00: tried all connections, increasing latency to 6s
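The "provided tools" are the Heartbeat administration scripts rather than kill. Assuming a Heartbeat 2.x style install, something along these lines triggers a clean failover and failback; the script paths vary by distribution, so treat this as a sketch:

# On the node that should give up its resources:
/usr/lib64/heartbeat/hb_standby

# Or, on the node that should take them over:
/usr/lib64/heartbeat/hb_takeover

# Then watch recovery on the surviving OSS while the clients reconnect:
watch cat /proc/fs/lustre/obdfilter/*/recovery_status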
[Lustre-discuss] Lustre with 10GbE or Infiniband?
Hi,

Has anybody done any performance comparison between Lustre with 10GbE and Lustre with Infiniband 4X SDR? I wonder if they perform similarly.

Thanks,

Jeffrey A. Bennett
HPC Data Engineer
San Diego Supercomputer Center
http://users.sdsc.edu/~jab
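One way to compare the two fabrics independently of the disks is LNET selftest, which measures raw LNET bandwidth between a client NID and a server NID over whichever network is configured; a rough sketch (the NIDs are placeholders, and the exact lst syntax should be checked against the manual for the installed release):

modprobe lnet_selftest
export LST_SESSION=$$
lst new_session net_compare
lst add_group clients 192.168.1.2@tcp       # or the client's @o2ib NID
lst add_group servers 192.168.1.10@tcp      # or the OSS's @o2ib NID
lst add_batch bulk_rw
lst add_test --batch bulk_rw --from clients --to servers brw read size=1M
lst run bulk_rw
lst stat clients servers                    # watch bandwidth, Ctrl-C to stop
lst end_session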
[Lustre-discuss] Autoconf problem when compiling HEAD
Hi,

We get the following error when compiling the HEAD version of Lustre:

[EMAIL PROTECTED] lustre]# sh ./autogen.sh
Checking for a complete tree...
checking for automake-1.9 >= 1.7.8... found 1.9.6
checking for autoconf >= 2.57... found 2.59
Running aclocal-1.9 -I /root/lustre-cvs-HEAD/lustre/build/autoconf -I /root/lustre-cvs-HEAD/lustre/libcfs/autoconf -I /root/lustre-cvs-HEAD/lustre/lnet/autoconf -I /root/lustre-cvs-HEAD/lustre/lustre/autoconf -I /root/lustre-cvs-HEAD/lustre/snmp/autoconf...
/root/lustre-cvs-HEAD/lustre/lustre/autoconf/lustre-core.m4:730: error: m4_defn: undefined macro: _m4_divert_diversion
/root/lustre-cvs-HEAD/lustre/lustre/autoconf/kerberos5.m4:115: AC_KERBEROS_V5 is expanded from...
/root/lustre-cvs-HEAD/lustre/lustre/autoconf/lustre-core.m4:730: the top level
autom4te: /usr/bin/m4 failed with exit status: 1
aclocal-1.9: autom4te failed with exit status: 1

We run CentOS 5. All kerberos libraries and development RPMs are installed and have been updated to the latest. Autoconf version is 2.59. Automake version is 1.9.6. Not sure why we get this error message. Any help will be greatly appreciated.

Jeff
Re: [Lustre-discuss] Autoconf problem when compiling HEAD
Actually, I found the solution. Automake 1.7 needs to be installed, since automake 1.9 does not work with the Lustre CVS tree; it seems like a bug in either Lustre or automake.

Jeff

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Jeffrey Bennett
Sent: Monday, October 20, 2008 2:41 PM
To: lustre-discuss@lists.lustre.org
Subject: [Lustre-discuss] Autoconf problem when compiling HEAD

Hi,

We get the following error when compiling the HEAD version of Lustre:

[EMAIL PROTECTED] lustre]# sh ./autogen.sh
Checking for a complete tree...
checking for automake-1.9 >= 1.7.8... found 1.9.6
checking for autoconf >= 2.57... found 2.59
Running aclocal-1.9 -I /root/lustre-cvs-HEAD/lustre/build/autoconf -I /root/lustre-cvs-HEAD/lustre/libcfs/autoconf -I /root/lustre-cvs-HEAD/lustre/lnet/autoconf -I /root/lustre-cvs-HEAD/lustre/lustre/autoconf -I /root/lustre-cvs-HEAD/lustre/snmp/autoconf...
/root/lustre-cvs-HEAD/lustre/lustre/autoconf/lustre-core.m4:730: error: m4_defn: undefined macro: _m4_divert_diversion
/root/lustre-cvs-HEAD/lustre/lustre/autoconf/kerberos5.m4:115: AC_KERBEROS_V5 is expanded from...
/root/lustre-cvs-HEAD/lustre/lustre/autoconf/lustre-core.m4:730: the top level
autom4te: /usr/bin/m4 failed with exit status: 1
aclocal-1.9: autom4te failed with exit status: 1

We run CentOS 5. All kerberos libraries and development RPMs are installed and have been updated to the latest. Autoconf version is 2.59. Automake version is 1.9.6. Not sure why we get this error message. Any help will be greatly appreciated.

Jeff
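If anyone else hits this, the workaround is to put an automake 1.7.x on the PATH alongside the newer one and rerun autogen.sh; a rough sketch building it from source (the version number and prefix are just examples):

# Build automake 1.7.x alongside the system automake (does not replace 1.9):
wget http://ftp.gnu.org/gnu/automake/automake-1.7.9.tar.gz
tar xzf automake-1.7.9.tar.gz
cd automake-1.7.9
./configure --prefix=/usr/local && make && make install    # installs automake-1.7 / aclocal-1.7

# Back in the Lustre CVS tree:
cd /root/lustre-cvs-HEAD/lustre
sh ./autogen.sh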