Re: [Lustre-discuss] RPC limitation

2010-03-09 Thread Jeffrey Bennett
Andreas and Oleg,

Sorry to bother you again.

Is there any way to force an RPC size of 256 KB and pack several 4 KB random
operations into a single 256 KB RPC so that the final throughput increases?
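
For reference, the RPC size and concurrency are per-OSC client tunables; a
minimal sketch assuming 1.8-era parameter names (these only cap the RPC size,
they do not by themselves merge unrelated 4 KB random reads into one RPC):

  lctl get_param osc.*.max_pages_per_rpc      # 64 pages x 4 KB = 256 KB RPCs (default 256 = 1 MB)
  lctl set_param osc.*.max_pages_per_rpc=64
  lctl set_param osc.*.max_rpcs_in_flight=32  # allow more concurrent RPCs per OST (default 8)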

Thanks

Jeffrey A. Bennett
HPC Systems Engineer
San Diego Supercomputer Center
http://users.sdsc.edu/~jab

-Original Message-
From: andreas.dil...@sun.com [mailto:andreas.dil...@sun.com] On Behalf Of 
Andreas Dilger
Sent: Friday, March 05, 2010 2:05 AM
To: Jeffrey Bennett
Cc: oleg.dro...@sun.com; lustre-discuss@lists.lustre.org
Subject: Re: [Lustre-discuss] One or two OSS, no difference?

On 2010-03-04, at 14:18, Jeffrey Bennett wrote:
 I just noticed the sequential performance is OK, but the random IO
 (which is what I am measuring) is not. Is there any way to increase
 random IO performance on Lustre? We have LUNs that can provide
 around 250,000 random read 4 KB IOPS but we are only seeing 3,000 to
 10,000 on Lustre.

There is work currently underway to improve the SMP scaling  
performance for the RPC handling layer in Lustre.  Currently that  
limits the delivered RPC rate to 10-15k/sec or so.

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] One or two OSS, no difference?

2010-03-05 Thread Jeffrey Bennett
Andreas, if we are using 4 KB blocks I understand we only transfer one page per
RPC call, so are we limited to 10-15K RPCs per second, or, equivalently,
10,000-15,000 IOPS?
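
Back-of-the-envelope, assuming one 4 KB page per RPC and the 10-15k RPC/s
ceiling mentioned in the quoted reply below:

  15,000 RPC/s x 4 KB  ~=  60 MB/s  (i.e. ~15,000 IOPS)
  15,000 RPC/s x 1 MB  ~=  15 GB/s  (if each RPC carried a full 1 MB)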

jab


-Original Message-
From: andreas.dil...@sun.com [mailto:andreas.dil...@sun.com] On Behalf Of 
Andreas Dilger
Sent: Friday, March 05, 2010 2:05 AM
To: Jeffrey Bennett
Cc: oleg.dro...@sun.com; lustre-discuss@lists.lustre.org
Subject: Re: [Lustre-discuss] One or two OSS, no difference?

On 2010-03-04, at 14:18, Jeffrey Bennett wrote:
 I just noticed the sequential performance is OK, but the random IO
 (which is what I am measuring) is not. Is there any way to increase
 random IO performance on Lustre? We have LUNs that can provide
 around 250,000 random read 4 KB IOPS but we are only seeing 3,000 to
 10,000 on Lustre.

There is work currently underway to improve the SMP scaling  
performance for the RPC handling layer in Lustre.  Currently that  
limits the delivered RPC rate to 10-15k/sec or so.

 -Original Message-
 From: oleg.dro...@sun.com [mailto:oleg.dro...@sun.com]
 Sent: Thursday, March 04, 2010 12:49 PM
 To: Jeffrey Bennett
 Cc: lustre-discuss@lists.lustre.org
 Subject: Re: [Lustre-discuss] One or two OSS, no difference?

 Hello!

   This is pretty strange. Are there any differences in network
 topology that can explain this?
   If you remove the first client, does the second one show
 performance at the level of the first, but as soon as you start the
 load on the first again, the second client's performance drops?

 Bye,
Oleg
 On Mar 4, 2010, at 1:45 PM, Jeffrey Bennett wrote:

 Hi Oleg, thanks for your reply

 I was actually testing with only one client. When adding a second  
 client using a different file, one client gets all the performance  
 and the other one gets very low performance, any recommendation?

 Thanks in advance

 jab


 -Original Message-
 From: oleg.dro...@sun.com [mailto:oleg.dro...@sun.com]
 Sent: Wednesday, March 03, 2010 5:20 PM
 To: Jeffrey Bennett
 Cc: lustre-discuss@lists.lustre.org
 Subject: Re: [Lustre-discuss] One or two OSS, no difference?

 Hello!

 On Mar 3, 2010, at 6:35 PM, Jeffrey Bennett wrote:
 We are building a very small Lustre cluster with 32 clients  
 (patchless) and two OSS servers. Each OSS server has 1 OST with 1  
 TB of Solid State Drives. All is connected using dual-port DDR IB.

 For testing purposes, I am enabling/disabling one of the OSS/OST  
 by using the lfs setstripe command. I am running XDD and vdbench  
 benchmarks.

 Does anybody have an idea why there is no difference in MB/sec or  
 random IOPS when using one OSS or two OSS? A quick test with dd  
 also shows the same MB/sec when using one or two OSTs.

 I wonder if you just don't saturate even one OST (both the backend SSD
 and the IB interconnect) with this number of clients? Does the total
 throughput decrease as you decrease the number of active clients, and
 increase as you increase it even further?
 Increasing the maximum number of in-flight RPCs might help in that case.
 Also, are all of your clients writing to the same file, or does each
 client do IO to a separate file (I hope)?

 Bye,
   Oleg

 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss


Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] One or two OSS, no difference?

2010-03-04 Thread Jeffrey Bennett
Hi Oleg,

I just noticed the sequential performance is OK, but the random IO (which is
what I am measuring) is not. Is there any way to increase random IO performance
on Lustre? We have LUNs that can provide around 250,000 random read 4 KB IOPS
but we are only seeing 3,000 to 10,000 on Lustre.

jab


-Original Message-
From: oleg.dro...@sun.com [mailto:oleg.dro...@sun.com] 
Sent: Thursday, March 04, 2010 12:49 PM
To: Jeffrey Bennett
Cc: lustre-discuss@lists.lustre.org
Subject: Re: [Lustre-discuss] One or two OSS, no difference?

Hello!

   This is pretty strange. Are there any differences in network topology that
can explain this?
   If you remove the first client, does the second one show performance
   at the level of the first, but as soon as you start the load on the first
   again, the second client's performance drops?

Bye,
Oleg
On Mar 4, 2010, at 1:45 PM, Jeffrey Bennett wrote:

 Hi Oleg, thanks for your reply
 
 I was actually testing with only one client. When adding a second client 
 using a different file, one client gets all the performance and the other one 
 gets very low performance, any recommendation?
 
 Thanks in advance
 
 jab
 
 
 -Original Message-
 From: oleg.dro...@sun.com [mailto:oleg.dro...@sun.com] 
 Sent: Wednesday, March 03, 2010 5:20 PM
 To: Jeffrey Bennett
 Cc: lustre-discuss@lists.lustre.org
 Subject: Re: [Lustre-discuss] One or two OSS, no difference?
 
 Hello!
 
 On Mar 3, 2010, at 6:35 PM, Jeffrey Bennett wrote:
 We are building a very small Lustre cluster with 32 clients (patchless) and 
 two OSS servers. Each OSS server has 1 OST with 1 TB of Solid State Drives. 
 All is connected using dual-port DDR IB.
 
 For testing purposes, I am enabling/disabling one of the OSS/OST by using 
 the lfs setstripe command. I am running XDD and vdbench benchmarks.
 
 Does anybody have an idea why there is no difference in MB/sec or random 
 IOPS when using one OSS or two OSS? A quick test with dd also shows the 
 same MB/sec when using one or two OSTs.
 
 I wonder if you just don't saturate even one OST (both the backend SSD and IB
 interconnect) with this number of clients? Does the total throughput
 decrease as you decrease the number of active clients, and increase as you
 increase it even further?
 Increasing the maximum number of in-flight RPCs might help in that case.
 Also, are all of your clients writing to the same file, or does each client
 do IO to a separate file (I hope)?
 
 Bye,
Oleg
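
A minimal sketch of the two suggestions above, assuming 1.8-era syntax and a
hypothetical mount point /mnt/lustre (stripe the test files over both OSTs,
raise the per-OST RPC concurrency, and give each client its own file):

  lfs setstripe -c 2 -s 1M /mnt/lustre/testdir    # new files striped over 2 OSTs, 1 MB stripes
  lctl set_param osc.*.max_rpcs_in_flight=32      # default is 8
  dd if=/dev/zero of=/mnt/lustre/testdir/f$(hostname) bs=1M count=4096 oflag=direct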

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Lustre Monitoring Tools

2010-01-06 Thread Jeffrey Bennett
Last time I checked, LMT was designed for Lustre 1.4. LLNL stopped development 
of LMT some time ago. Not sure if LMT will work with Lustre 1.8. If somebody 
has tried, please let everyone know.

jab


-Original Message-
From: lustre-discuss-boun...@lists.lustre.org 
[mailto:lustre-discuss-boun...@lists.lustre.org] On Behalf Of Cliff White
Sent: Wednesday, January 06, 2010 11:12 AM
To: Jagga Soorma
Cc: lustre-discuss@lists.lustre.org
Subject: Re: [Lustre-discuss] Lustre Monitoring Tools

Jagga Soorma wrote:
 Hi Guys,
 
 I would like to monitor the performance and usage of my Lustre 
 filesystem and was wondering what are the commonly used monitoring tools 
 for this?  Cacti? Nagios?  Any input would be greatly appreciated.
 
 Regards,
 -Simran
 

LLNL's LMT tool is very good. It's available on Sourceforge, afaik.
cliffw

 
 
 
 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] MGT of 128 MB - already out of space

2009-12-22 Thread Jeffrey Bennett
Hi Andreas,

This turned out to be a bug in a script that was setting the timeout value with
lctl every minute or so, thus filling the logs.

Hopefully a tunefs.lustre --writeconf on the MGT will remove the logs, am I correct?
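
For reference, a hedged sketch of the usual writeconf sequence (hypothetical
device names; all clients and targets must be stopped first):

  tunefs.lustre --writeconf /dev/mdtdev      # on the MDS
  tunefs.lustre --writeconf /dev/ostdev      # on each OSS, for every OST
  # then remount in order: MGS, MDT, OSTs, clients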

jab


-Original Message-
From: andreas.dil...@sun.com [mailto:andreas.dil...@sun.com] On Behalf Of 
Andreas Dilger
Sent: Friday, December 18, 2009 10:26 PM
To: Jeffrey Bennett
Cc: lustre-discuss@lists.lustre.org
Subject: Re: [Lustre-discuss] MGT of 128 MB - already out of space

On 2009-12-18, at 18:13, Jeffrey Bennett wrote:
 Scenario is the following:

 - Lustre 1.8.1.1
 - 3 Lustre filesystems, fully redundant (two networks, OSSs on  
 active/active, MDSs on active/passive)
 - 1 MGS, 1 MDT, 2 OSTs
 - For the MGT, 128MB were allocated, following Lustre's manual  
 recommendations
 - The MGT is already out of space, and an ls of the MGT is showing
 files are 8 MB, like:

 -rw-r--r-- 1 root root 8.0M Dec  2 15:11 devfs-client
 -rw-r--r-- 1 root root 8.0M Dec  2 15:11 devfs-MDT
 -rw-r--r-- 1 root root 8.0M Dec  2 16:42 devfs-OST

How many OSTs do you have?  Is this consuming all of the space?

 Other lustre filesystems I have worked on show much smaller files. A  
 dumpe2fs on this MGT does not show anything strange like huge  
 block sizes, etc.

Are these files sparse by some chance?  What does ls -ls show?

It may be that your journal is consuming a lot of space. Try running:

debugfs -c -R "stat <8>" /dev/{MGTdev}

You really don't need more than the absolute minimum of space for the
MGT, which is 4MB.  You can remove the journal via "tune2fs -O ^has_journal"
on an unmounted filesystem, then "tune2fs -j -J size=4" to recreate it at
the minimum size (maybe -J size=5 if it complains).
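
Put together as a runnable sequence (hypothetical device and mount point; the
MGT must stay unmounted while the journal is rebuilt):

  umount /mnt/mgs
  debugfs -c -R "stat <8>" /dev/mgtdev     # inode 8 is the journal; shows its size
  tune2fs -O ^has_journal /dev/mgtdev      # drop the existing journal
  tune2fs -j -J size=4 /dev/mgtdev         # recreate it at 4 MB (size=5 if it complains)
  mount -t lustre /dev/mgtdev /mnt/mgs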

 Question is, why are these files so big and how can we shrink them?
 Is it possible to run --writeconf to fix this?

If all of the space is really consumed by the config files, are you  
using a lot of lctl conf_param commands, ost pools, or something  
else that would put a lot of records into the config logs?

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] MGT of 128 MB - already out of space

2009-12-18 Thread Jeffrey Bennett
Hi,

Scenario is the following: 

- Lustre 1.8.1.1
- 3 Lustre filesystems, fully redundant (two networks, OSSs on active/active, 
MDSs on active/passive)
- 1 MGS, 1 MDT, 2 OSTs
- For the MGT, 128MB were allocated, following Lustre's manual recommendations
- The MGT is already out of space, and an ls of the MGT is showing files are
8 MB, like:

-rw-r--r-- 1 root root 8.0M Dec  2 15:11 devfs-client
-rw-r--r-- 1 root root 8.0M Dec  2 15:11 devfs-MDT
-rw-r--r-- 1 root root 8.0M Dec  2 16:42 devfs-OST

Other lustre filesystems I have worked on show much smaller files. A dumpe2fs 
on this MGT does not show anything strange like huge block sizes, etc.

Question is, why are these files so big and how can we shrink them? Is it 
possible to run --writeconf to fix this? 

Thanks,

Jeffrey A. Bennett
HPC Systems Engineer
San Diego Supercomputer Center
http://users.sdsc.edu/~jab

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] question about failnode with mixed networks

2009-11-24 Thread Jeffrey Bennett
Hi John,

Yes, you can use multiple MGS nodes, but you have to tell the OSTs about both,
like this (example):

mkfs.lustre --fsname testfs --ost --mgsnode=m...@tcp0 --mgsnode=m...@tcp0 
/dev/sda

Whenever you mount the filesystem, mount it this way:

mount -t lustre m...@tcp0:m...@tcp0:/testfs /mnt/testfs
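
If the extra NIDs were not given at format time (see the follow-up question
quoted below), they can be added afterwards with tunefs.lustre; a hedged sketch
with hypothetical NIDs, run with the target unmounted:

  tunefs.lustre --failnode=192.168.0.2@tcp0 /dev/sda   # add a failover NID for this target
  tunefs.lustre --dryrun /dev/sda                      # print the stored parameters to verify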

Jeffrey A. Bennett
HPC Systems Engineer
San Diego Supercomputer Center
http://users.sdsc.edu/~jab

-Original Message-
From: lustre-discuss-boun...@lists.lustre.org 
[mailto:lustre-discuss-boun...@lists.lustre.org] On Behalf Of John White
Sent: Tuesday, November 24, 2009 1:20 PM
To: Brian J. Murrell
Cc: lustre-discuss@lists.lustre.org
Subject: Re: [Lustre-discuss] question about failnode with mixed networks

Excellent, thanks for the replies.  One more question:
Is there a --failnode corollary for MGTs...?  Does lustre support MGT/S 
failover?

On Nov 16, 2009, at 6:51 AM, Brian J. Murrell wrote:

 On Fri, 2009-11-13 at 14:34 -0800, John White wrote: 
 
 In a failover situation, it would appear that tcp connected clients do not 
 get the hint to switch over to the secondary MDS
 
 Clients don't (yet) get hints to switch servers.  Clients continue to
 use a server until they don't get a response, at which time they cycle
 through their list of NIDs for the unresponsive service.
 
 When I initially set up the file system, I specified --failnode for the 
 @o2ib interfaces,
 
 Only the @o2ib interfaces?
 
 should I have also specified NIDs for the @tcp0 during the fs construction?
 
 Yes.  You specify the NIDs for all servers that should be considered for
 that service.
 
 If so, is it possible to add this as an afterthought?
 
 You want tunefs.lustre.
 
 b.
 
 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss


John White
High Performance Computing Services (HPCS)
(510) 486-7307
One Cyclotron Rd, MS: 50B-3209C
Lawrence Berkeley National Lab
Berkeley, CA 94720








___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Lustre-1.9.260 mkfs.lustre errors

2009-09-16 Thread Jeffrey Bennett
Do you have the same version of e2fsprogs on both?

jab  
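
A quick way to compare the e2fsprogs builds on the two architectures (the
extents/uninit_groups options that mke2fs rejects in the quoted output below
may need a newer Lustre-patched e2fsprogs; both commands are standard
e2fsprogs):

  rpm -q e2fsprogs     # package version on each node
  dumpe2fs -V          # prints the e2fsprogs release, e.g. 1.40.7.sun3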

 -Original Message-
 From: lustre-discuss-boun...@lists.lustre.org 
 [mailto:lustre-discuss-boun...@lists.lustre.org] On Behalf Of 
 Josephine Palencia
 Sent: Wednesday, September 16, 2009 6:43 AM
 To: lustre-discuss@lists.lustre.org
 Subject: [Lustre-discuss] Lustre-1.9.260 mkfs.lustre errors
 
 
 
 HEAD (Lustre-1.9.260) was built on both archs (i386, x86_64).
 mkfs.lustre and mount work on the i386.
 
 But I get this error for the x86_64 on 2 different machines:
 [r...@mds00w x86_64]# mkfs.lustre --verbose  --reformat  
 --fsname=jwan --mdt --mgsnode=mgs.jwan.teragrid@tcp0 /dev/sda8
 
 Permanent disk data:
 Target: jwan-MDT
 Index:  unassigned
 Lustre FS:  jwan
 Mount type: ldiskfs
 Flags:  0x71
(MDT needs_index first_time update ) 
 Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
 Parameters: mgsnode=128.182.112@tcp
 
 device size = 11538MB
 2 6 18
 formatting backing filesystem ldiskfs on /dev/sda8
   target name  jwan-MDT
   4k blocks 0
   options-J size=400 -i 4096 -I 512 -O 
 dir_index,extents,uninit_groups -F
 mkfs_cmd = mke2fs -j -b 4096 -L jwan-MDT  -J size=400 -i 
 4096 -I 512 -O dir_index,extents,uninit_groups -F /dev/sda8
 cmd: mke2fs -j -b 4096 -L jwan-MDT  -J size=400 -i 4096 
 -I 512 -O dir_index,extents,uninit_groups -F /dev/sda8 mke2fs 
 1.40.7.sun3 (28-Feb-2008) Invalid filesystem option set: 
 dir_index,extents,uninit_groups  -?
 
 mkfs.lustre FATAL: Unable to build fs /dev/sda8 (256)
 
 mkfs.lustre FATAL: mkfs failed 256
 [r...@mds00w x86_64]# clear
 
 Machine 1 to serve as mdt:
 --
 [r...@mds00w x86_64]# mkfs.lustre --verbose  --reformat  
 --fsname=jwan --mdt --mgsnode=mgs.jwan.teragrid@tcp0 /dev/sda8
 
 Permanent disk data:
 Target: jwan-MDT
 Index:  unassigned
 Lustre FS:  jwan
 Mount type: ldiskfs
 Flags:  0x71
(MDT needs_index first_time update ) 
 Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
 Parameters: mgsnode=128.182.112@tcp
 
 device size = 11538MB
 2 6 18
 formatting backing filesystem ldiskfs on /dev/sda8
   target name  jwan-MDT
   4k blocks 0
   options-J size=400 -i 4096 -I 512 -O 
 dir_index,extents,uninit_groups -F
 mkfs_cmd = mke2fs -j -b 4096 -L jwan-MDT  -J size=400 -i 
 4096 -I 512 -O dir_index,extents,uninit_groups -F /dev/sda8
 cmd: mke2fs -j -b 4096 -L jwan-MDT  -J size=400 -i 4096 
 -I 512 -O dir_index,extents,uninit_groups -F /dev/sda8 mke2fs 
 1.40.7.sun3 (28-Feb-2008)
 Invalid filesystem option set: 
 dir_index,extents,uninit_groups   -
 
 mkfs.lustre FATAL: Unable to build fs /dev/sda8 (256)
 
 mkfs.lustre FATAL: mkfs failed 256
 
 Machine 2 to serve as ost:
 -
 [r...@oss01w ~]# mkfs.lustre --reformat --fsname=jwan --ost 
 --mgsnode=mgs.jwan.teragrid@tcp0 /dev/sda8
 
 Permanent disk data:
 Target: jwan-OST
 Index:  unassigned
 Lustre FS:  jwan
 Mount type: ldiskfs
 Flags:  0x72
(OST needs_index first_time update ) 
 Persistent mount opts: errors=remount-ro,extents,mballoc
 Parameters: mgsnode=128.182.112@tcp
 
 
 mkfs.lustre FATAL: loop device requires a --device-size= param
 
 mkfs.lustre FATAL: Loop device setup for /dev/s
 ---
 
 For now, I combined the mgs/mdt on the i386 machines and that 
 created the fs and mounted without problems.
 
 I'd appreciate feedback on the 2 other machines with 
 mkfs.lustre errors.
 
 Thanks,
 josephine
 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss
 
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Hastening lustrefs recovery

2009-07-16 Thread Jeffrey Bennett
  For 4 OSTs each with 7 TB and ~40 connected clients, recovery time is
  48 min. Is that reasonable or is that too long?
 
 Wow.  That seems long.  That is recovery of what?  A single 
 OST or single OSS, or something other?
 
Recovery times I have been seeing on similar systems are around 2-3 minutes.
That's how long it takes clients to replay their transactions or time out. This
is in the event of failover to a new MDS.
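
For reference, recovery progress can be watched on the servers while it runs; a
minimal sketch using 1.8-era proc paths:

  cat /proc/fs/lustre/mds/*/recovery_status          # on the MDS
  cat /proc/fs/lustre/obdfilter/*/recovery_status    # on each OSS
  lctl get_param obdfilter.*.recovery_status         # equivalent, via lctl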

jab
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] gridftp and Lustre

2009-05-22 Thread Jeffrey Bennett
The gridftp servers need to be Lustre clients. There is no way to send your data
directly from the gridftp client to the OSS, if that is what you're asking.
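
In practice that just means mounting the file system on every gridftp server so
its transfers go through a Lustre client; a minimal sketch with a hypothetical
MGS NID and fsname:

  mount -t lustre mgs@tcp0:/lustrefs /mnt/lustre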

jab  

 -Original Message-
 From: lustre-discuss-boun...@lists.lustre.org 
 [mailto:lustre-discuss-boun...@lists.lustre.org] On Behalf Of Yujun Wu
 Sent: Friday, May 22, 2009 8:24 AM
 To: lustre-discuss@lists.lustre.org
 Subject: [Lustre-discuss] gridftp and Lustre
 
 Hello,
 
 Could somebody give me some advice on how to improve the 
 gridftp performance with Lustre?
 
 Currently, we are putting files onto Lustre through Lustre 
 file system mounted gridftp server. I noticed the network 
 traffic goes in this way (use putting data as an example):
 
  remote client <---> gridftp server <---> Lustre OSS
 
  And the gridftp server is busy receiving and sending
  packets all the time. Is there a way for the control info to
  go to the gridftp server, but the data to go to the Lustre OSS directly?
 
 Thanks in advance for answering my question.
 
 
 Regards,
 Yujun
 
 
 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss
 
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] gridftp and Lustre

2009-05-22 Thread Jeffrey Bennett
I guess you could implement some sort of gridftp plugin (a DSI module is what
they call it) to write data directly to the OSS, and then you could install
gridftp servers on the OSS. This is basically what the HPSS DSI module for
GridFTP does: it writes data directly to the HPSS system using the HPSS API.
However, I am not sure it's technically possible with Lustre.

I'll take a look at the dCache thing, thanks!

jab  

 -Original Message-
 From: Yujun Wu [mailto:yu...@phys.ufl.edu] 
 Sent: Friday, May 22, 2009 10:18 AM
 To: Jeffrey Bennett
 Cc: lustre-discuss@lists.lustre.org
 Subject: RE: [Lustre-discuss] gridftp and Lustre
 
 Hi Jeffrey,
 
  Thanks for your e-mail. Then extra traffic is moving from the
  gridftp servers to the OSSs, which I really don't like, as you
  may imagine.
  
  I know a product called dCache. The gridftp servers don't
  handle the data traffic directly, but redirect the data
  to the data servers (dCache pools).
 
 
 Regards,
 Yujun
 On Fri, 22 May 2009, Jeffrey Bennett wrote:
 
   The gridftp servers need to be Lustre clients. There is no way
  to send your data directly from the gridftp client to
  the OSS, if that is what you're asking.
  
  jab
  
   -Original Message-
   From: lustre-discuss-boun...@lists.lustre.org
   [mailto:lustre-discuss-boun...@lists.lustre.org] On 
 Behalf Of Yujun 
   Wu
   Sent: Friday, May 22, 2009 8:24 AM
   To: lustre-discuss@lists.lustre.org
   Subject: [Lustre-discuss] gridftp and Lustre
   
   Hello,
   
   Could somebody give me some advice on how to improve the gridftp 
   performance with Lustre?
   
   Currently, we are putting files onto Lustre through Lustre file 
   system mounted gridftp server. I noticed the network 
 traffic goes in 
   this way (use putting data as an example):
   
    remote client <---> gridftp server <---> Lustre OSS
   
    And the gridftp server is busy receiving and sending packets
    all the time. Is there a way for the control info to go to the gridftp
    server, but the data to go to the Lustre OSS directly?
   
   Thanks in advance for answering my question.
   
   
   Regards,
   Yujun
   
   
   ___
   Lustre-discuss mailing list
   Lustre-discuss@lists.lustre.org
   http://lists.lustre.org/mailman/listinfo/lustre-discuss
   
 
 
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] lustre failover pairs

2009-05-13 Thread Jeffrey Bennett

Also note that you will need third-party software to do this failover, unlike 
GPFS.
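
For reference, with Heartbeat's classic haresources-style configuration each OST
is simply a Filesystem resource owned by its primary OSS; a hedged sketch with
hypothetical node and device names:

  # /etc/ha.d/haresources (identical on both OSS nodes)
  oss1 Filesystem::/dev/mpath/ost0::/mnt/lustre/ost0::lustre
  oss2 Filesystem::/dev/mpath/ost1::/mnt/lustre/ost1::lustre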

jab  

 The difference versus GPFS (where LUNs are active on both 
 servers all the time, even though one is the primary server) 
 is that the secondary server does NOT serve the OSTs being 
 served by the primary, unless the primary is down and the OST 
 has been failed over.
 
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] client mount entry for fstab with Active/Passive MGS/MDS

2009-04-02 Thread Jeffrey Bennett
Andrew,
 
This is how I use it:
 
# mount -t lustre mds-...@tcp0:mds-...@tcp0:/npfs /mnt/npfs
 
So /etc/fstab would be something like:

mds-...@tcp0:mds-...@tcp0:/npfs /mnt/npfs lustre ...whatever...


Note that mds-0-0 and mds-0-1 are also MGS, despite their name.
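
A complete example line, assuming those hostnames resolve to the MGS/MDS NIDs
and using the usual client options:

mds-0-0@tcp0:mds-0-1@tcp0:/npfs  /mnt/npfs  lustre  defaults,_netdev  0 0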


jab 

 




From: lustre-discuss-boun...@lists.lustre.org 
[mailto:lustre-discuss-boun...@lists.lustre.org] On Behalf Of Lundgren, Andrew
Sent: Thursday, April 02, 2009 1:36 PM
To: Lustre discuss
Subject: [Lustre-discuss] client mount entry for fstab with 
Active/Passive MGS/MDS



I have a single MDT/MGT that is hosted on a pair of MGS/MDS machines 
acting as a failover pair.  Both MDS/MGS machines have visibility to the shared 
MDT/MGT.  One of the two machines has the FS mounted, the other does not.

 

On my client side I have the following in my fstab:

 

10.248.58@tcp0:10.248.58@tcp0:/content /content   
lustre  defaults,_netdev 0 0

 

I have also tried separating the machines with a "," but it hasn't made
any difference at run time.  There is a difference in /var/log/messages, but
either way, it doesn't work.

 

Should I be using the "," or the ":"?  Is my syntax correct?

 

Thanks!

 

--

Andrew

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Failover recovery issues / questions

2009-03-30 Thread Jeffrey Bennett
Hi, I am not familiar with using heartbeat with the OSS; I have only used it on
the MDS for failover, since you can't have an active/active configuration on
the MDS. However, you can have active/active on the OSS, so I can't understand
why you would want to use heartbeat to unmount the OSTs on one system if you can
have them mounted on both?

Now, when you say you kill heartbeat, what do you mean by that? You can't test
heartbeat functionality by killing it; you have to use the provided tools for
failing over to the other node. The tool usage and parameters depend on what
version of heartbeat you are using.
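
With Heartbeat's R1-style clusters, for example, these helper scripts move the
resources cleanly instead of killing the daemon (install paths vary by version
and distribution):

  /usr/share/heartbeat/hb_standby    # hand the local resources to the peer
  /usr/share/heartbeat/hb_takeover   # pull them back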

Do you have a serial connection between these machines or a crossover cable for 
heartbeat or do you use the regular network?

jab  

 -Original Message-
 From: lustre-discuss-boun...@lists.lustre.org 
 [mailto:lustre-discuss-boun...@lists.lustre.org] On Behalf Of 
 Adam Gandelman
 Sent: Monday, March 30, 2009 4:38 PM
 To: lustre-discuss@lists.lustre.org
  Subject: [Lustre-discuss] Failover & recovery issues / questions
 
 Hi-
 
 I'm new to Lustre and am running into some issues with fail 
 over and recovery that I can't seem to find answers to in the 
 Lustre manual (v1.14).  If anyone can fill me in as to what 
 is going on (or not going on), or point me toward some 
 documentation that goes into more detail it would be greatly 
 appreciated. 
 
 It's a simple cluster at the moment:
 
 MDT/MGS data is collocated on node LUS-MDT
 
  LUS-OSS0 and LUS-OSS1 are set up in an active/active failover configuration.
 LUS-OSS0 is primary for /dev/drbd1 and backup for /dev/drbd2, 
 LUS-OSS1 is primary for /dev/drbd2 and backup for /dev/drbd1. 
  I have heartbeat configured to monitor and handle fail over, 
 however, I run into the same problems when manually testing fail over.
 
  When heartbeat is killed on either OSS and resources are failed
  over to the backup, or when the filesystem is manually
  unmounted and remounted on the backup node, the migrated OST
  either (1) goes into a state of endless recovery or (2) doesn't
  seem to go into recovery at all.  It becomes inactive on the
  cluster entirely.  If I bring the OST's primary back up and
  fail back the resources, the OST goes into recovery,
  completes and comes back up online as it should.
 
  For example, if I take down OSS0, the OST fails over to its
  backup; however, it never makes it past this and never recovers:
 
 [r...@lus-oss0 ~]# cat
 /proc/fs/lustre/obdfilter/lustre-OST/recovery_status
 status: RECOVERING
 recovery_start: 0
 time_remaining: 0
 connected_clients: 0/4
 completed_clients: 0/4
 replayed_requests: 0/??
 queued_requests: 0
 next_transno: 2002
 
 In some instances, /proc/fs/lustre/obdfilter/lustre-OST/ 
 is empty.  
 Like I said, when the primary node comes back online and 
 resources are migrated back, the OST goes into recovery fine, 
 completes and comes back up online.
 
 Here are log output on the secondary node after fail over.
 
 Lustre: 13290:0:(filter.c:867:filter_init_server_data()) RECOVERY: 
 service lustre-OST, 4 recoverable clients, last_rcvd 2001
 Lustre: lustre-OST: underlying device drbd2 should be 
 tuned for larger I/O requests: max_sectors = 64 could be up 
 to max_hw_sectors=255
 Lustre: OST lustre-OST now serving dev 
 (lustre-OST/1ff44d23-d13a-b0c6-48e1-36c104ea6752), but 
 will be in recovery for at least 5:00, or until 4 clients 
 reconnect. During this time new clients will not be allowed 
 to connect. Recovery progress can be monitored by watching 
 /proc/fs/lustre/obdfilter/lustre-OST/recovery_status.
 Lustre: Server lustre-OST on device /dev/drbd2 has started
 Lustre: Request x8184 sent from lustre-OST-osc-c6cedc00 
 to NID 192.168.10...@tcp 100s ago has timed out (limit 100s).
 Lustre: lustre-OST-osc-c6cedc00: Connection to service 
 lustre-OST via nid 192.168.10...@tcp was lost; in 
 progress operations using this service will wait for recovery 
 to complete.
 Lustre: 3983:0:(import.c:410:import_select_connection())
 lustre-OST-osc-c6cedc00: tried all connections, 
 increasing latency to 6s
 
 
 
 
 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss
 
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] Lustre with 10GbE or Infiniband?

2009-02-11 Thread Jeffrey Bennett
Hi,

Has anybody done any performance comparison between Lustre with 10GbE and 
Lustre with Infiniband 4X SDR? I wonder if they perform similarly.
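
One way to compare the raw network paths before involving disks is LNET
self-test; a hedged sketch assuming the lnet_selftest module is loaded on all
nodes and using hypothetical NIDs (swap the @o2ib NIDs for @tcp ones to measure
the 10GbE side):

  export LST_SESSION=$$
  lst new_session bw_compare
  lst add_group servers 192.168.1.10@o2ib
  lst add_group clients 192.168.1.20@o2ib
  lst add_batch bulk
  lst add_test --batch bulk --from clients --to servers brw write size=1M
  lst run bulk
  lst stat servers & sleep 30; kill $!
  lst end_session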

Thanks,

Jeffrey A. Bennett
HPC Data Engineer
San Diego Supercomputer Center
http://users.sdsc.edu/~jab
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] Autoconf problem when compiling HEAD

2008-10-20 Thread Jeffrey Bennett
Hi,

We get the following error when compiling the HEAD version of Lustre:

[EMAIL PROTECTED] lustre]# sh ./autogen.sh 
Checking for a complete tree...
checking for automake-1.9 1.7.8... found 1.9.6
checking for autoconf 2.57... found 2.59
Running aclocal-1.9  -I /root/lustre-cvs-HEAD/lustre/build/autoconf -I
/root/lustre-cvs-HEAD/lustre/libcfs/autoconf -I
/root/lustre-cvs-HEAD/lustre/lnet/autoconf -I
/root/lustre-cvs-HEAD/lustre/lustre/autoconf -I
/root/lustre-cvs-HEAD/lustre/snmp/autoconf...
/root/lustre-cvs-HEAD/lustre/lustre/autoconf/lustre-core.m4:730: error:
m4_defn: undefined macro: _m4_divert_diversion
/root/lustre-cvs-HEAD/lustre/lustre/autoconf/kerberos5.m4:115:
AC_KERBEROS_V5 is expanded from...
/root/lustre-cvs-HEAD/lustre/lustre/autoconf/lustre-core.m4:730: the top
level
autom4te: /usr/bin/m4 failed with exit status: 1
aclocal-1.9: autom4te failed with exit status: 1

We run CentOS 5. All kerberos libraries and development RPMs are
installed and have been updated to the latest. Autoconf version is 2.59.
Automake version is 1.9.6. Not sure why we get this error message. Any
help will be greatly appreciated.

Jeff


___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Autoconf problem when compiling HEAD

2008-10-20 Thread Jeffrey Bennett
Actually, I found the solution.

Automake 1.7 needs to be installed, since automake 1.9 does not work
with Lustre CVS; this seems like a bug in either Lustre or automake.
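
One way to get the older automake alongside the distro's 1.9 (automake installs
versioned automake-1.7/aclocal-1.7 wrappers that autogen.sh can pick up; the
mirror URL and prefix below are just an example):

  wget http://ftp.gnu.org/gnu/automake/automake-1.7.9.tar.gz
  tar xzf automake-1.7.9.tar.gz && cd automake-1.7.9
  ./configure --prefix=/usr/local && make && make install
  cd /root/lustre-cvs-HEAD/lustre && sh ./autogen.sh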

Jeff
 

 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of 
 Jeffrey Bennett
 Sent: Monday, October 20, 2008 2:41 PM
 To: lustre-discuss@lists.lustre.org
 Subject: [Lustre-discuss] Autoconf problem when compiling HEAD
 
 Hi,
 
 We get the following error when compiling the HEAD version of Lustre:
 
 [EMAIL PROTECTED] lustre]# sh ./autogen.sh Checking for a 
 complete tree...
 checking for automake-1.9 1.7.8... found 1.9.6 checking for 
 autoconf 2.57... found 2.59 Running aclocal-1.9  -I 
 /root/lustre-cvs-HEAD/lustre/build/autoconf -I 
 /root/lustre-cvs-HEAD/lustre/libcfs/autoconf -I 
 /root/lustre-cvs-HEAD/lustre/lnet/autoconf -I 
 /root/lustre-cvs-HEAD/lustre/lustre/autoconf -I 
 /root/lustre-cvs-HEAD/lustre/snmp/autoconf...
 /root/lustre-cvs-HEAD/lustre/lustre/autoconf/lustre-core.m4:73
0: error:
 m4_defn: undefined macro: _m4_divert_diversion
 /root/lustre-cvs-HEAD/lustre/lustre/autoconf/kerberos5.m4:115:
 AC_KERBEROS_V5 is expanded from...
 /root/lustre-cvs-HEAD/lustre/lustre/autoconf/lustre-core.m4:73
0: the top level
 autom4te: /usr/bin/m4 failed with exit status: 1
 aclocal-1.9: autom4te failed with exit status: 1
 
 We run CentOS 5. All kerberos libraries and development RPMs 
 are installed and have been updated to the latest. Autoconf 
 version is 2.59.
 Automake version is 1.9.6. Not sure why we get this error 
 message. Any help will be greatly appreciated.
 
 Jeff
 
 
 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss
 
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss