Re: [Lustre-discuss] Lustre and Sync IO

2014-06-12 Thread Keith Mannthey
What do you mean by sync IO?

Thanks,
 Keith 

On Thu, 2014-06-12 at 15:46 +0200, Andrew Holway wrote:
 Hi,
 
 
 Can someone give me the story on Lustre and sync IO?
 
 
 Thanks,
 
 
 Andrew
 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss



Re: [Lustre-discuss] module dependencies

2014-05-15 Thread Keith Mannthey
Michael, 
  LNET configuration is done at module load time. I don't see any way
to avoid bringing down parts of the FS to adjust the system at this
level.

There is some Dynamic LNET Configuration work on its way that might allow
such a switch, but it is not in today's code.

Thanks,
 Keith 


On Thu, 2014-05-15 at 15:13 +, Hebenstreit, Michael wrote:
 Please do not ask why, but I need to be able to replace the IB stack
 (as in – all InfiniBand modules) at runtime with a Lustre FS mounted.
 Is there a possibility to tell lnet to completely switch to tcp,
 unload ko2iblnd.ko and next unload the IB stack, load the new IB
 stack, load a matching ko2iblnd.ko and then switch lnet back to
 preferring o2ib?
 
  
 
 Thanks
 
 Michael
 
  
 
 
 Michael Hebenstreit Senior Cluster Architect
 Intel Corporation, MS: RR1-105/H14  Software and Services Group/DCE
 
 4100 Sara Road  Tel.:   +1 505-794-3144 
 
 Rio Rancho, NM 87124
 
 UNITED STATES   E-mail:
 michael.hebenstr...@intel.com
 
  
 
 



Re: [Lustre-discuss] module dependencies

2014-05-15 Thread Keith Mannthey
Extra Note: If you have dual-attached storage and are set up for
Imperative Failover, you can fail over targets and fix up your servers
one at a time without impacting access.

Thanks,
 Keith 






Re: [Lustre-discuss] Lustre Build - Ubuntu 14.04 LTS

2014-05-01 Thread Keith Mannthey
I recommend you use Lustre 2.5+ for modern Linux kernels.  A large
number of build changes would need to be ported to run Lustre 2.3
against your kernel version.

Thanks,
 Keith  




On Thu, 2014-05-01 at 15:03 -0700, Steven Lokie wrote:
 Trying to run a specific version of lustre client for our setup at
 work - I am running into a weird error message on ./configure 
 
 
 personally never seen this error before - checking for external module
 build support... configure: error: unknown; check config.log for
 details
 
 
 GIT BUILD: 
 
 
 git clone git://git.whamcloud.com/fs/lustre-release.git
 cd lustre-release/
 git checkout --track -b b2_3 origin/b2_3
 sh ./autogen.sh
 ./configure --disable-server
 
 
 Log: 
 
 
 root@linux-desktop:/home/imemadmin/lustre-release# ./configure --without-server
 
 checking build system type... x86_64-unknown-linux-gnu
 checking host system type... x86_64-unknown-linux-gnu
 checking target system type... x86_64-unknown-linux-gnu
 checking for a BSD-compatible install... /usr/bin/install -c
 checking whether build environment is sane... yes
 checking for gawk... no
 checking for mawk... mawk
 checking whether make sets $(MAKE)... yes
 checking how to create a ustar tar archive... gnutar
 checking for gcc... gcc
 checking whether the C compiler works... yes
 checking for C compiler default output file name... a.out
 checking for suffix of executables... 
 checking whether we are cross compiling... no
 checking for suffix of object files... o
 checking whether we are using the GNU C compiler... yes
 checking whether gcc accepts -g... yes
 checking for gcc option to accept ISO C89... none needed
 checking for style of include used by make... GNU
 checking dependency style of gcc... gcc3
 checking how to run the C preprocessor... gcc -E
 checking for grep that handles long lines and -e... /bin/grep
 checking for egrep... /bin/grep -E
 checking for ANSI C header files... yes
 checking for sys/types.h... yes
 checking for sys/stat.h... yes
 checking for stdlib.h... yes
 checking for string.h... yes
 checking for memory.h... yes
 checking for strings.h... yes
 checking for inttypes.h... yes
 checking for stdint.h... yes
 checking for unistd.h... yes
 checking whether to configure just enough for make dist... no
 checking if this distro uses dpkg... yes
 checking for buildid... none... congratulations, you must be on a tag
 checking whether to build BGL features... no
 checking for ranlib... ranlib
 checking for buggy compiler... no known problems
 checking size of unsigned long long... 8
 --- size SIZEOF 
 --- size SIZEOF 8
 checking whether to enable uoss... no
 checking whether to enable posix osd... no
 checking whether to build docs... no
 checking whether to build utilities... yes
 checking whether to install init scripts... no
 checking whether to build Lustre tests... yes
 checking whether to build Lustre server support... yes
 checking whether to build Lustre client support... yes
 checking whether to enable split support... no
 checking whether to enable CDEBUG, CWARN... yes
 checking whether to enable ENTRY/EXIT... yes
 checking whether to enable LASSERT, LASSERTF... yes
 checking sys/quota.h usability... yes
 checking sys/quota.h presence... yes
 checking for sys/quota.h... yes
 checking whether to build kernel modules... yes (linux-gnu)
 /usr/src/linux-headers-3.13.0-24-generic
 /usr/src/linux-headers-3.13.0-24-generic
 checking for Linux sources... /lib/modules/3.13.0-24-generic/build
 checking for /lib/modules/3.13.0-24-generic/build... yes
 checking for Linux objects dir... /lib/modules/3.13.0-24-generic/build
 checking for /boot/kernel.h... no
 checking for /var/adm/running-kernel.h... no
 checking for /lib/modules/3.13.0-24-generic/build/.config... yes
 checking for /lib/modules/3.13.0-24-generic/build/include/generated/autoconf.h... yes
 checking for /lib/modules/3.13.0-24-generic/build/include/linux/version.h... yes
 checking for /lib/modules/3.13.0-24-generic/build/include/linux/kconfig.h... yes
 checking if you are running user mode linux for x86_64... no (asm-um missing)
 checking for /lib/modules/3.13.0-24-generic/build/include/linux/namei.h... yes
 checking if you are using Linux 2.6... yes
 checking for external module build support... configure: error: unknown; check config.log for details



Re: [Lustre-discuss] Is it safe to run MDS, MGS OSS on the same machine ?

2014-03-05 Thread Keith Mannthey


On Wed, 2014-03-05 at 10:24 +0100, Rafal Maszkowski wrote:
 On Tue, Mar 04, 2014 at 10:55:05PM +, Dilger, Andreas wrote:
  On 2014/03/04, 2:38 AM, 邓尧 <tors...@gmail.com> wrote:
  We're running low on physical machines, and want to deploy MGS, MDS and OSS 
  on the same machine, is it officially supported ?
  I know that MGS and MDS can be put on the same machine, but not sure about 
  OSS and MDS.
  This will work, but if the node fails then there is no recovery for 
  operations in progress and the clients can get an IO error for operations 
  in progress.
 
 We mostly use this mode of operation and our experience is that after a
 machine crash* the nodes and heavy computing programs on them survive
 several hours of break.
 
 R.
 *The machines which crash are our aging Thumpers. We replace memory
 chips but we still do not know how to interpret the ILOM messages like:
 ID =  60c : 11/28/2013 : 16:39:08 : Memory : BIOS : Uncorrectable ECC Node 7 DIMM 1
 ID =  60b : 11/28/2013 : 16:39:08 : Memory : BIOS : Uncorrectable ECC Node 7 DIMM 0

These messages mean ECC on the memory is failing: a read (or possibly a
write) returned incorrect data at the HW level and could not be
corrected. Some firmware will reboot your system on such an event to
protect it. This is not healthy for the system.
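
Since the crashes are rare, tallying these events per node and DIMM over
time may help narrow down which module is failing. Below is a minimal
Python sketch, assuming the ILOM lines keep exactly the format quoted
above. The function name count_ecc_events and the regular expression are
illustrative assumptions, not part of any ILOM tooling.

```python
import re
from collections import Counter

# Pattern for lines such as:
#   ID =  60c : 11/28/2013 : 16:39:08 : Memory : BIOS : Uncorrectable ECC Node 7 DIMM 1
# The field layout is assumed from the two sample lines in this thread.
ECC_LINE = re.compile(
    r"ID\s*=\s*(?P<id>[0-9a-f]+)\s*:\s*(?P<date>\d{2}/\d{2}/\d{4})\s*:\s*"
    r"(?P<time>\d{2}:\d{2}:\d{2})\s*:\s*Memory\s*:\s*BIOS\s*:\s*"
    r"Uncorrectable ECC Node (?P<node>\d+)\s+DIMM (?P<dimm>\d+)"
)


def count_ecc_events(log_text):
    """Count uncorrectable ECC events per (node, dimm) pair."""
    # Mail clients and terminals may wrap long lines, so collapse all
    # whitespace into single spaces before matching.
    flat = " ".join(log_text.split())
    return Counter(
        (int(m["node"]), int(m["dimm"])) for m in ECC_LINE.finditer(flat)
    )


sample = """
ID =  60c : 11/28/2013 : 16:39:08 : Memory : BIOS : Uncorrectable ECC Node 7
DIMM 1
ID =  60b : 11/28/2013 : 16:39:08 : Memory : BIOS : Uncorrectable ECC Node 7
DIMM 0
"""

for (node, dimm), hits in sorted(count_ecc_events(sample).items()):
    print(f"node {node} dimm {dimm}: {hits} uncorrectable ECC event(s)")
```

A DIMM that keeps showing up across crashes would be the first candidate
for replacement.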


 Thumpers have only two nodes with four memory chips in each. The crashes
 are rare though so we cannot test various hypotheses easily.

Thanks,
 Keith 



Re: [Lustre-discuss] Which NID to use?

2014-03-03 Thread Keith Mannthey
Patrick,

The current manual has the same language; it is the way the system has
been designed.


The order of LNET entries is important when configuring servers. If a
server node can be reached using more than one network, the first
network specified in lustre.conf will be used.


(Link to Lustre 2.x manual)
https://wiki.hpdd.intel.com/display/PUB/Documentation
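
To make the ordering rule concrete, here is a minimal Python sketch that
parses an "options lnet networks=..." line and reports which network is
listed first. The example line reuses the tcp0(eth0) and tcp1(eth1) names
from this thread; the function name parse_lnet_networks is a hypothetical
helper, not a Lustre utility, and the parsing covers only the simple
net(iface,...) form.

```python
import re


def parse_lnet_networks(option_line):
    """Return [(network, [interfaces])] in the order given on the line."""
    # Take everything after "networks=" and drop optional surrounding quotes.
    value = option_line.split("networks=", 1)[1].strip().strip('"')
    entries = []
    # Each entry looks like "tcp0(eth0)" or "tcp0(eth0,eth1)" or just "tcp0".
    for match in re.finditer(r"([A-Za-z0-9]+)(?:\(([^)]*)\))?", value):
        net, ifaces = match.group(1), match.group(2)
        entries.append((net, ifaces.split(",") if ifaces else []))
    return entries


# Hypothetical lustre.conf line with tcp1 deliberately listed first.
line = 'options lnet networks="tcp1(eth1),tcp0(eth0)"'
entries = parse_lnet_networks(line)
print("order:", [net for net, _ in entries])
print("first listed network:", entries[0][0])
```

Per the manual text quoted above, the first network listed is the one
used when a server can be reached on more than one network, so swapping
the two entries in the line would change the result.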


Have you considered using Ethernet bonding?

What are you trying to accomplish with the dual links between all the
systems?

Thanks,
 Keith Mannthey
 Intel HPDD 


On Sun, 2014-03-02 at 08:26 +0800, Chan Ching Yu, Patrick wrote:
 Hi White,
 
 tcp0(eth0) and tcp1(eth1) are connected to different segments
 (connected to two virtual bridges in KVM).
 
  
 
 Hi all,
 
 In the old Lustre manual (version 1.8), I found that the order of LNET
 lines in /etc/modprobe/lustre.conf does matter:
 
 (Quoted in
 https://wiki.lustre.org/manual/LustreManual18_HTML/MoreComplicatedConfigurations.html)
 
 The order of LNET lines in modprobe.conf is important when
 configuring multi-homed servers. If a server node can be reached using
 more than one network, the first network specified in modprobe.conf
 will be used.
 
 That makes me more confused. Someone told me the order doesn't matter
 and that the file just lists all the available LNET devices to use.
 
 Does the order matter ONLY in old versions of Lustre?
 
  
 
 Regards,
 
 Patrick
 
  
 
  
 
  
 
 On Fri, 28 Feb 2014 21:20:58 +, White, Cliff wrote:
 
  On 2/28/14, 1:17 AM, Chan Ching Yu Patrick <cyc...@clustertech.com> wrote:
  
   Hi Mohr,
   
   The reason why I made this setup is that I'm not sure how Lustre selects
   the interface in a multi-rail environment.
   
   Especially when all nodes have Infiniband and Ethernet, how can I ensure
   Infiniband is used between client and OSS?
  
  The LNET 'networks' option is used to specify by interface. For example,
  where your Infiniband interface is 'ib0' you would
  add this to your modprobe.conf or equivalent:
  ---
  options lnet networks="o2ib0(ib0)"
  ---
  
  That will define IB (the interface denoted by ib0 to be specific).  Client
  mounts using @o2ib0 NIDs will only use IB, regardless of other interfaces
  present.
  See the Lustre manual for details on the LNET 'networks' option.
  
  In your case, I would suspect that the two TCP/IP interfaces are
  equivalent in TCP/IP routing terms, perhaps on the same segment.
  When that happens TCP/IP routing is taking over. Basically, you can
  control which interface you send from, but if the receiver sees two equal
  TCP/IP paths back, you can't control which path it chooses to take. Has
  nothing to do with LNET or Lustre.
  
  In the case where the network hardware is dissimilar, you don't have this
  problem. Connections starting on IB stay on IB.
  If you only have one IB network, using the IB NID will ensure all clients
  use only IB. 
  
  cliffw
  
   
   
   Regards,
   Patrick
   
   
   
   On 02/27/2014 12:28 PM, Mohr Jr, Richard Frank (Rick Mohr) wrote:
On Feb 26, 2014, at 7:14 PM, Chan Ching Yu, Patrick <cyc...@clustertech.com> wrote:

 [root@mds1 ~]# lctl list_nids
 192.168.122.240@tcp
 192.168.100.100@tcp1
 
 [root@oss1 ~]# lctl list_nids
 192.168.122.194@tcp
 192.168.100.101@tcp1
 
 [root@client ~]# lctl list_nids
 192.168.122.70@tcp
 192.168.100.102@tcp1
 

 On Lustre client, I intentionally mount it with tcp1
 
 [root@client ~]# mount | grep lustre
 192.168.100.100@tcp1:/data on /lustre type lustre (rw)
 
 
 Now I dd a file on Lustre filesystem, you can see that tcp0 is used
 when writing on OST.
 Why?
I am not an expert on the inner workings of lustre, but as far as I
understand it, when oss1 connects to the mgs, it will report the nids it
has available.  When the client connects to mgs to get info about the
oss1 server, it will receive a list of all the oss1 nids.  The client
then steps through that list and compares the oss1 nids with its local
nids to find a match (i.e. - nids that are on the same lnet network).
If it matches tcp0 first, then that is the connection it uses.  The lnet
network used to connect to the mgs is irrelevant at that point.
However, I do not know if there are any guarantees about the ordering of
the nids that the mgs will report (ie - will tcp0 always be the first
nid?).

If there is an error in my description, hopefully a lustre developer
will point out the flaw.
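
The matching described above can be sketched in Python as follows. This
only illustrates the behavior as described in this message, under the
assumption that the client walks the server's NID list in order and takes
the first NID on a network it shares; it is not Lustre's actual code, and
the function names lnet_network and pick_server_nid are hypothetical.

```python
def lnet_network(nid):
    """Return the LNET network of a NID, e.g. 'tcp1' for '192.168.100.101@tcp1'."""
    _, _, net = nid.partition("@")
    # A network type without a number (e.g. "tcp") is shorthand for
    # network 0 of that type ("tcp0").
    if net and not net[-1].isdigit():
        net += "0"
    return net


def pick_server_nid(server_nids, local_nids):
    """Return the first server NID on a network the client also has, else None."""
    local_nets = {lnet_network(nid) for nid in local_nids}
    for nid in server_nids:
        if lnet_network(nid) in local_nets:
            return nid
    return None


# NIDs taken from the lctl list_nids output quoted in this thread.
oss1_nids = ["192.168.122.194@tcp", "192.168.100.101@tcp1"]
client_nids = ["192.168.122.70@tcp", "192.168.100.102@tcp1"]

# With tcp (tcp0) listed first, the tcp0 NID is selected, regardless of
# which network the client used to reach the MGS at mount time.
print(pick_server_nid(oss1_nids, client_nids))
```

Under this model, the outcome depends only on the order in which the
server NIDs are reported, which is why the mount over tcp1 does not
force OST traffic onto tcp1.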

It is not clear what you are trying to accomplish with this multi rail
setup.  Are you trying to force mds traffic over one client link and oss
traffic over the other?  Or are you trying to utilize both links
simultaneously for all traffic?

   
   

Re: [Lustre-discuss] where to download lustre for 32 bit servers

2013-10-14 Thread Keith Mannthey
On Mon, 2013-10-14 at 21:48 +, Weilin Chang wrote:
 Dilger:
 
 Thank you for replying to my email.
 
 The latest releases only have kernel patches for 64 bit Linux. Where
 can I download a Lustre release which has kernel patches for a 32 bit
 Linux kernel?

32-bit and different architectures are handled by config changes, not by
code changes.  I don't know much about ARM; have you ported much code to
the architecture? I would assume it is not trivial work to get Lustre on
32-bit ARM, but newer Lustre will likely be easier than older Lustre.

You may want to look at the Linux kernel staging tree and see if you can
get the Lustre client working in your environment.  You will likely
learn a lot about the process and the challenges if you can accomplish
this.

Thanks,
  Keith 


 


 
 -Weilin
 
 -Original Message-
 From: Dilger, Andreas [mailto:andreas.dil...@intel.com] 
 Sent: Saturday, October 12, 2013 12:23 AM
 To: Weilin Chang
 Cc: lustre-discuss@lists.lustre.org; Weilin Chang
 Subject: Re: [Lustre-discuss] where to download lustre for 32 bit servers
 
 On 2013-10-11, at 17:59, Weilin Chang <weilin.ch...@huawei.com> wrote:
 
 I would like to configure a Lustre server on a 32-bit ARM system. Where can I 
 download prebuilt binary packages and their corresponding sources?
 I tried the rpm files under 
 http://downloads.lustre.org/public/lustre/v1.8/lustre_1.8.5/rhel5-i686/ on 
 linux 2.6.18-194.el5, but there are some unknown symbols, like 
 ldiskfs_free_block, ldiskfs_journal_start_sb, ... in fsfilt_ldiskfs.ko.
 
 You are missing the lustre-ldiskfs package.
 
 That said, the lustre.org site only has very ancient versions of Lustre
 (for reasons too complex to discuss here). You should go to
 downloads.hpdd.intel.com for new versions of Lustre, either 2.1.6 or 2.4.1.
 
 Does  anyone know where to get the complete package and which linux kernel 
 version will match to the package?
 
 There are no pre-built Arm binaries, and I don't know if anyone has ever 
 tried that. The newer versions of Lustre are more likely to build against a 
 newer kernel as is needed for Arm, and any build fixes would only go into the 
 new releases, so that is probably where you want to start.
 
 If you do decide to work on getting builds for Arm please see:
 https://wiki.hpdd.intel.com/display/PUB/Submitting+Changes
 
 For how to submit patches to be accepted into the tree.
 
 Cheers, Andreas




Re: [Lustre-discuss] OSS panicking.....

2013-08-07 Thread Keith Mannthey
On Tue, 2013-08-06 at 14:22 +0100, Phill Harvey-Smith wrote:
 Hi all,
 
 Our OSS has started panicking in the last couple of days. It seems to be 
 related to nfs4, but I am not sure, so I am asking the group for pointers.
 
 Firstly, a couple of screen grabs are at:
 
 http://penguin.stats.warwick.ac.uk/~stsxab/Lustre/

It looks like an nfsd4 error in the backtrace. You should look into the
NFS side of your setup. It likely has nothing to do with Lustre (outside
of the kernel you are running). If this is a new install, the kernel you
are using may not like the NFS userspace you have, but that is just a
wild guess.

Thanks,
  Keith Mannthey 
  
 
 The OSS server is currently running Ubuntu 10.04 LTS with an alien 
 (redhat I believe) kernel installed.
 
 The running kernel is :
 
 2.6.32-131.6.1.el6_lustre.g65156ed.x86_64
 
 I believe that it is running lustre 1.6.x. The MDS is also setup in a 
 similar manner.
 
 The clients are a mixture of Ubuntu 10.04 LTS with Lustre 1.6.x and the 
 3 most recent nodes are Ubuntu 12.04 LTS with Lustre 2.5.x which I built 
 recently.
 
 The OSS has 2 RAID arrays: one on the onboard SAS controller, which has 
 two of the Lustre volumes (/home and /scratch) along with the NFS 
 exported file system on a separate XFS partition. The second RAID array 
 is on an external PCIe RAID controller and an external disk array, and 
 holds the other Lustre filesystem on two virtual disks.
 
 The OSS also has a couple of NFS4 shares :
 
 /export 
 192.168.0.0/24(rw,async,fsid=0,crossmnt,no_root_squash,no_subtree_check) 
 192.168.1.0/24(rw,sync,fsid=0,no_root_squash,crossmnt,no_subtree_check)
 
 /export/software/packages-x86_64-linux-gnu 
 192.168.0.0/24(rw,async,no_subtree_check,no_root_squash)
 
 Which are on a separate disk.
 
 If I disable the NFS shares then the OSS server seems to stay up and 
 client machines can access the lustre file systems. But once I enable 
 the NFS shares the OSS will panic within a few minutes, this is why I 
 suspect some interaction with NFS.
 
 The odd thing is the machine only started doing this yesterday, I have 
 replaced / re-seated the RAM, CPUs and cards (Ethernet and SAS), but this 
 doesn't seem to have changed anything.
 
 I am aware that this setup is not a supported architecture (I inherited 
 custody of the cluster from a previous admin) and am planning on 
 re-installing both the OSS and MDS with (probably) CentOS, as that is 
 supported for the server. Is there anything I need to be aware of in 
 planning this upgrade ?
 
 Does anyone have any clue as to what I might try, is there an easy way I 
 can check the integrity of the Lustre volumes ?
 
 Cheers.
 
 Phill.

