Re: [Lustre-discuss] Lustre and Sync IO
What do you mean by sync IO?

Thanks,
Keith

On Thu, 2014-06-12 at 15:46 +0200, Andrew Holway wrote:
> Hi,
>
> Can someone give me the story on Lustre and sync IO?
>
> Thanks,
>
> Andrew
Re: [Lustre-discuss] module dependencies
Michael,

LNET configuration is done at module load time. I don't see any way you can avoid bringing down parts of the FS to adjust the system at this level. There is some Dynamic LNet Config work on its way that might allow such a switch, but not in today's code. (A sketch of the static configuration involved follows below the quoted message.)

Thanks,
Keith

On Thu, 2014-05-15 at 15:13, Hebenstreit, Michael wrote:
> Please do not ask why, but I need to be able to replace the IB stack (that is, all InfiniBand modules) at runtime with a Lustre FS mounted. Is there a possibility to tell LNet to completely switch to tcp, unload ko2iblnd.ko, then unload the IB stack, load the new IB stack, load a matching ko2iblnd.ko, and switch LNet back to preferring o2ib?
>
> Thanks
> Michael
>
> Michael Hebenstreit
> Senior Cluster Architect
> Intel Corporation, MS: RR1-105/H14
> Software and Services Group/DCE
> 4100 Sara Road
> Rio Rancho, NM 87124, UNITED STATES
> Tel.: +1 505-794-3144
> E-mail: michael.hebenstr...@intel.com
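For reference, this is the kind of static configuration at issue: a minimal sketch of the LNet module option for an IB-plus-Ethernet node (the interface names are illustrative). The file is read only when the lnet module loads, which is why the preferred network cannot be changed on a live system:

# /etc/modprobe.d/lustre.conf (read once, at module load time)
# Changing this line has no effect until ko2iblnd/lnet are unloaded
# and reloaded, which requires unmounting Lustre on this node:
options lnet networks="o2ib0(ib0),tcp0(eth0)"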
Re: [Lustre-discuss] module dependencies
Extra note: if you have dual-attached storage and are set up for imperative failover, you can fail over targets and fix up your servers one at a time without impacting access. (A sketch of the failover sequence follows below the quoted thread.)

Thanks,
Keith

On Thu, 2014-05-15 at 08:24 -0700, Keith Mannthey wrote:
> Michael,
>
> LNET configuration is done at module load time. I don't see any way you can avoid bringing down parts of the FS to adjust the system at this level. There is some Dynamic LNet Config work on its way that might allow such a switch, but not in today's code.
>
> Thanks,
> Keith
>
> On Thu, 2014-05-15 at 15:13, Hebenstreit, Michael wrote:
>> Please do not ask why, but I need to be able to replace the IB stack (that is, all InfiniBand modules) at runtime with a Lustre FS mounted. Is there a possibility to tell LNet to completely switch to tcp, unload ko2iblnd.ko, then unload the IB stack, load the new IB stack, load a matching ko2iblnd.ko, and switch LNet back to preferring o2ib?
>>
>> Thanks
>> Michael
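To illustrate the failover path: a minimal sketch, assuming two OSS nodes (oss1 and oss2, names hypothetical) that are both attached to the shared OST device and were declared as service nodes at format time:

# Format the target once, with both servers as service nodes:
mkfs.lustre --fsname=fs0 --ost --index=0 --mgsnode=mgs@o2ib \
    --servicenode=oss1@o2ib --servicenode=oss2@o2ib /dev/mapper/ost0

# To service oss1, move the target to oss2 first:
umount /mnt/ost0                              # on oss1
mount -t lustre /dev/mapper/ost0 /mnt/ost0    # on oss2
# Clients reconnect to oss2 and access continues; now fix up oss1.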
Re: [Lustre-discuss] Lustre Build - Ubuntu 14.04 LTS
I recommend you use Lustre 2.5+ for modern Linux kernels. A large number of build changes would need to be ported to run the Lustre 2.3 code against your kernel version. (A sketch of a b2_5 checkout follows below the quoted log.)

Thanks,
Keith

On Thu, 2014-05-01 at 15:03 -0700, Steven Lokie wrote:
> Trying to run a specific version of the Lustre client for our setup at work, I am running into a weird error message on ./configure that I have personally never seen before:
>
> checking for external module build support... configure: error: unknown; check config.log for details
>
> GIT BUILD:
> git clone git://git.whamcloud.com/fs/lustre-release.git
> cd lustre-release/
> git checkout --track -b b2_3 origin/b2_3
> sh ./autogen.sh
> ./configure --disable-server
>
> Log:
> root@linux-desktop:/home/imemadmin/lustre-release# ./configure --without-server
> checking build system type... x86_64-unknown-linux-gnu
> checking host system type... x86_64-unknown-linux-gnu
> checking target system type... x86_64-unknown-linux-gnu
> checking for a BSD-compatible install... /usr/bin/install -c
> checking whether build environment is sane... yes
> checking for gawk... no
> checking for mawk... mawk
> checking whether make sets $(MAKE)... yes
> checking how to create a ustar tar archive... gnutar
> checking for gcc... gcc
> checking whether the C compiler works... yes
> checking for C compiler default output file name... a.out
> checking for suffix of executables...
> checking whether we are cross compiling... no
> checking for suffix of object files... o
> checking whether we are using the GNU C compiler... yes
> checking whether gcc accepts -g... yes
> checking for gcc option to accept ISO C89... none needed
> checking for style of include used by make... GNU
> checking dependency style of gcc... gcc3
> checking how to run the C preprocessor... gcc -E
> checking for grep that handles long lines and -e... /bin/grep
> checking for egrep... /bin/grep -E
> checking for ANSI C header files... yes
> checking for sys/types.h... yes
> checking for sys/stat.h... yes
> checking for stdlib.h... yes
> checking for string.h... yes
> checking for memory.h... yes
> checking for strings.h... yes
> checking for inttypes.h... yes
> checking for stdint.h... yes
> checking for unistd.h... yes
> checking whether to configure just enough for make dist... no
> checking if this distro uses dpkg... yes
> checking for buildid... none... congratulations, you must be on a tag
> checking whether to build BGL features... no
> checking for ranlib... ranlib
> checking for buggy compiler... no known problems
> checking size of unsigned long long... 8
> --- size SIZEOF --- size SIZEOF 8
> checking whether to enable uoss... no
> checking whether to enable posix osd... no
> checking whether to build docs... no
> checking whether to build utilities... yes
> checking whether to install init scripts... no
> checking whether to build Lustre tests... yes
> checking whether to build Lustre server support... yes
> checking whether to build Lustre client support... yes
> checking whether to enable split support... no
> checking whether to enable CDEBUG, CWARN... yes
> checking whether to enable ENTRY/EXIT... yes
> checking whether to enable LASSERT, LASSERTF... yes
> checking sys/quota.h usability... yes
> checking sys/quota.h presence... yes
> checking for sys/quota.h... yes
> checking whether to build kernel modules... yes (linux-gnu)
> /usr/src/linux-headers-3.13.0-24-generic
> /usr/src/linux-headers-3.13.0-24-generic
> checking for Linux sources... /lib/modules/3.13.0-24-generic/build
> checking for /lib/modules/3.13.0-24-generic/build... yes
> checking for Linux objects dir... /lib/modules/3.13.0-24-generic/build
> checking for /boot/kernel.h... no
> checking for /var/adm/running-kernel.h... no
> checking for /lib/modules/3.13.0-24-generic/build/.config... yes
> checking for /lib/modules/3.13.0-24-generic/build/include/generated/autoconf.h... yes
> checking for /lib/modules/3.13.0-24-generic/build/include/linux/version.h... yes
> checking for /lib/modules/3.13.0-24-generic/build/include/linux/kconfig.h... yes
> checking if you are running user mode linux for x86_64... no (asm-um missing)
> checking for /lib/modules/3.13.0-24-generic/build/include/linux/namei.h... yes
> checking if you are using Linux 2.6... yes
> checking for external module build support... configure: error: unknown; check config.log for details
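Following up on the recommendation above, a minimal sketch of the same build steps pointed at the 2.5 release branch instead. The branch name b2_5 is assumed to follow the repository's b2_3 naming; confirm it exists with 'git branch -r' after cloning:

git clone git://git.whamcloud.com/fs/lustre-release.git
cd lustre-release/
git checkout --track -b b2_5 origin/b2_5   # 2.5 release branch instead of b2_3
sh ./autogen.sh
./configure --disable-server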
Re: [Lustre-discuss] Is it safe to run MDS, MGS, OSS on the same machine?
On Wed, 2014-03-05 at 10:24 +0100, Rafal Maszkowski wrote:
> On Tue, Mar 04, 2014 at 10:55:05 PM, Dilger, Andreas wrote:
>> On 2014/03/04, 2:38 AM, 邓尧 <tors...@gmail.com> wrote:
>>> We're running low on physical machines and want to deploy the MGS, MDS, and OSS on the same machine. Is this officially supported? I know that the MGS and MDS can be put on the same machine, but I am not sure about the OSS and MDS.
>>
>> This will work, but if the node fails then there is no recovery, and the clients can get an IO error for operations in progress.
>
> We mostly use this mode of operation, and our experience is that after a machine crash* the nodes and the heavy computing programs on them survive a break of several hours.
>
> R.
>
> * The machines which crash are our aging Thumpers. We replace memory chips, but we still do not know how to interpret ILOM messages like:
>
> ID = 60c : 11/28/2013 : 16:39:08 : Memory : BIOS : Uncorrectable ECC Node 7 DIMM 1
> ID = 60b : 11/28/2013 : 16:39:08 : Memory : BIOS : Uncorrectable ECC Node 7 DIMM 0

These messages mean the ECC check on memory is failing and the hardware returned a read (or possibly a write) that was incorrect at the HW level. Some firmware will reboot your system on such an event to protect it. This is not healthy for the system.

> Thumpers have only two nodes with four memory chips in each. The crashes are rare, though, so we cannot test various hypotheses easily.

Thanks,
Keith
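Since the question comes up often, a minimal sketch of a single node carrying the MGS, MDT, and an OST together (the fsname, device paths, and MGS NID are all illustrative):

# Combined MGS+MDT on one device:
mkfs.lustre --fsname=fs0 --mgs --mdt --index=0 /dev/sdb
# OST on a second device, pointing at the local MGS:
mkfs.lustre --fsname=fs0 --ost --index=0 --mgsnode=192.168.1.10@tcp /dev/sdc
# Mount the targets to bring the services up:
mkdir -p /mnt/mdt /mnt/ost0
mount -t lustre /dev/sdb /mnt/mdt
mount -t lustre /dev/sdc /mnt/ost0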
Re: [Lustre-discuss] Which NID to use?
Patrick,

The current manual also has the same language; it is the way the system has been designed:

"The order of LNET entries is important when configuring servers. If a server node can be reached using more than one network, the first network specified in lustre.conf will be used."

(Link to the Lustre 2.x manual: https://wiki.hpdd.intel.com/display/PUB/Documentation)

Have you considered using Ethernet bonding? What are you trying to accomplish with the dual links between all the systems? (A configuration sketch follows at the end of this thread.)

Thanks,
Keith Mannthey
Intel HPDD

On Sun, 2014-03-02 at 08:26 +0800, Chan Ching Yu, Patrick wrote:
> Hi White,
>
> tcp0 (eth0) and tcp1 (eth1) are connected to different segments (connected to two virtual bridges in KVM).
>
> Hi all,
>
> In the old Lustre manual (version 1.8), I found that the order of LNET entries in /etc/modprobe/lustre.conf does matter (quoted from https://wiki.lustre.org/manual/LustreManual18_HTML/MoreComplicatedConfigurations.html):
>
> "The order of LNET lines in modprobe.conf is important when configuring multi-homed servers. If a server node can be reached using more than one network, the first network specified in modprobe.conf will be used."
>
> That makes me more confused. Someone told me the order doesn't matter and that the file just lists all the available LNET devices to use. Does the order matter ONLY in old versions of Lustre?
>
> Regards,
> Patrick
>
> On Fri, 28 Feb 2014 21:20:58, White, Cliff wrote:
>> On 2/28/14, 1:17 AM, Chan Ching Yu Patrick <cyc...@clustertech.com> wrote:
>>> Hi Mohr,
>>>
>>> The reason why I made this setup is that I'm not sure how Lustre selects the interface in a multi-rail environment. Especially when all nodes have both InfiniBand and Ethernet, how can I ensure InfiniBand is used between client and OSS?
>>
>> The LNET 'networks' option is used to specify by interface. For example, where your InfiniBand interface is 'ib0', you would add this to your modprobe.conf or equivalent:
>>
>> options lnet networks="o2ib0(ib0)"
>>
>> That will define IB (the interface denoted by ib0, to be specific). Client mounts using @o2ib0 NIDs will only use IB, regardless of other interfaces present. See the Lustre manual for details on the LNET 'networks' option.
>>
>> In your case, I would suspect that the two TCP/IP interfaces are equivalent in TCP/IP routing terms, perhaps on the same segment. When that happens, TCP/IP routing takes over. Basically, you can control which interface you send from, but if the receiver sees two equal TCP/IP paths back, you can't control which path it chooses to take. That has nothing to do with LNET or Lustre. In the case where the network hardware is dissimilar, you don't have this problem: connections starting on IB stay on IB. If you only have one IB network, using the IB NID will ensure all clients use only IB.
>>
>> cliffw
>>
>>> Regards,
>>> Patrick
>>>
>>> On 02/27/2014 12:28 PM, Mohr Jr, Richard Frank (Rick Mohr) wrote:
>>>> On Feb 26, 2014, at 7:14 PM, Chan Ching Yu, Patrick <cyc...@clustertech.com> wrote:
>>>>>
>>>>> [root@mds1 ~]# lctl list_nids
>>>>> 192.168.122.240@tcp
>>>>> 192.168.100.100@tcp1
>>>>>
>>>>> [root@oss1 ~]# lctl list_nids
>>>>> 192.168.122.194@tcp
>>>>> 192.168.100.101@tcp1
>>>>>
>>>>> [root@client ~]# lctl list_nids
>>>>> 192.168.122.70@tcp
>>>>> 192.168.100.102@tcp1
>>>>>
>>>>> On the Lustre client, I intentionally mounted it with tcp1:
>>>>>
>>>>> [root@client ~]# mount | grep lustre
>>>>> 192.168.100.100@tcp1:/data on /lustre type lustre (rw)
>>>>>
>>>>> Now I dd a file on the Lustre filesystem, and you can see that tcp0 is used when writing to the OST. Why?
>>>>
>>>> I am not an expert on the inner workings of Lustre, but as far as I understand it, when oss1 connects to the MGS, it will report the NIDs it has available. When the client connects to the MGS to get info about the oss1 server, it will receive a list of all the oss1 NIDs. The client then steps through that list and compares the oss1 NIDs with its local NIDs to find a match (i.e., NIDs that are on the same LNet network). If it matches tcp0 first, then that is the connection it uses. The LNet network used to connect to the MGS is irrelevant at that point. However, I do not know if there are any guarantees about the ordering of the NIDs that the MGS will report (i.e., will tcp0 always be the first NID?). If there is an error in my description, hopefully a Lustre developer will point out the flaw.
>>>>
>>>> It is not clear what you are trying to accomplish with this multi-rail setup. Are you trying to force MDS traffic over one client link and OSS traffic over the other? Or are you trying to utilize both links simultaneously for all traffic?
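Putting the manual language and Cliff's advice together: to guarantee which path a client uses, either expose only one LNet network on it, or list the preferred network first. A minimal sketch using Patrick's interface names:

# /etc/modprobe.d/lustre.conf on the client
# Option A: expose only tcp1, so only tcp1 NIDs can ever match:
options lnet networks="tcp1(eth1)"

# Option B: keep both networks but list the preferred one first;
# the first network listed wins when a server is reachable on both:
# options lnet networks="tcp1(eth1),tcp0(eth0)"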
Re: [Lustre-discuss] where to download lustre for 32 bit servers
On Mon, 2013-10-14 at 21:48, Weilin Chang wrote:
> Dilger:
>
> Thank you for replying to my email. The latest releases only have kernel patches for 64-bit Linux. Where can I download a Lustre release which has kernel patches for a 32-bit Linux kernel?

32-bit and the different arches are handled by config changes, not by code changes. I don't know much about ARM; have you ported much code to the arch? I would assume it is not trivial work to get Lustre onto 32-bit ARM, but newer Lustre will likely be easier than older Lustre. You may want to look at the Linux kernel staging tree and see if you can get the Lustre client working in your environment. You will likely learn a lot about the process and the challenges if you can accomplish this.

Thanks,
Keith

> -Weilin
>
> -----Original Message-----
> From: Dilger, Andreas [mailto:andreas.dil...@intel.com]
> Sent: Saturday, October 12, 2013 12:23 AM
> To: Weilin Chang
> Cc: lustre-discuss@lists.lustre.org; Weilin Chang
> Subject: Re: [Lustre-discuss] where to download lustre for 32 bit servers
>
> On 2013-10-11, at 17:59, Weilin Chang <weilin.ch...@huawei.com> wrote:
>> I would like to configure a Lustre server on a 32-bit ARM system. Where can I download prebuilt binary packages and their corresponding sources? I tried the rpm files under http://downloads.lustre.org/public/lustre/v1.8/lustre_1.8.5/rhel5-i686/ on Linux 2.6.18-194.el5, but there are some unknown symbols, like ldiskfs_free_block, ldiskfs_journal_start_sb, ... in fsfilt_ldiskfs.ko.
>
> You are missing the lustre-ldiskfs package. That said, the lustre.org site only has very ancient versions of Lustre (for reasons too complex to discuss here). You should go to downloads.hpdd.intel.com for new versions of Lustre, either 2.1.6 or 2.4.1.
>
>> Does anyone know where to get the complete package and which Linux kernel version will match the package?
>
> There are no pre-built ARM binaries, and I don't know if anyone has ever tried that. The newer versions of Lustre are more likely to build against a newer kernel, as is needed for ARM, and any build fixes would only go into the new releases, so that is probably where you want to start.
>
> If you do decide to work on getting builds for ARM, please see https://wiki.hpdd.intel.com/display/PUB/Submitting+Changes for how to submit patches to be accepted into the tree.
>
> Cheers, Andreas
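If you follow the staging-tree suggestion, cross-building the in-kernel Lustre client for ARM might look roughly like this. A sketch only: it assumes a kernel of the era that shipped the client under drivers/staging/lustre, and the cross-compiler prefix is illustrative:

# In the kernel source tree:
make ARCH=arm menuconfig
#   Device Drivers -> Staging drivers -> Lustre file system client support
make ARCH=arm CROSS_COMPILE=arm-linux-gnueabi- modules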
Re: [Lustre-discuss] OSS panicking...
On Tue, 2013-08-06 at 14:22 +0100, Phill Harvey-Smith wrote:
> Hi all,
>
> Our OSS has started panicking in the last couple of days. It seems to be related to nfs4, but I am not sure, so I am asking the group for pointers.
>
> Firstly, a couple of screen grabs are at: http://penguin.stats.warwick.ac.uk/~stsxab/Lustre/

It looks like an nfsd4 error in the backtrace. You should look into the NFS side of your setup; it likely has nothing to do with Lustre (outside of the kernel you are running). If this is a new install, it may be that the NFS userspace does not match the kernel you are using, but that is just a wild guess. (A quick isolation test is sketched after the quoted message.)

Thanks,
Keith Mannthey

> The OSS server is currently running Ubuntu 10.04 LTS with an alien (Red Hat, I believe) kernel installed. The running kernel is 2.6.32-131.6.1.el6_lustre.g65156ed.x86_64, and I believe it is running Lustre 1.6.x. The MDS is set up in a similar manner. The clients are a mixture of Ubuntu 10.04 LTS with Lustre 1.6.x; the 3 most recent nodes are Ubuntu 12.04 LTS with Lustre 2.5.x, which I built recently.
>
> The OSS has 2 RAID arrays. One is on the onboard SAS controller and has two of the Lustre volumes (/home and /scratch), along with the NFS-exported filesystem on a separate XFS partition. The second RAID array is on an external PCIe RAID controller with an external disk array, and holds the other Lustre filesystem on two virtual disks.
>
> The OSS also has a couple of NFS4 shares, which are on a separate disk:
>
> /export 192.168.0.0/24(rw,async,fsid=0,crossmnt,no_root_squash,no_subtree_check) 192.168.1.0/24(rw,sync,fsid=0,no_root_squash,crossmnt,no_subtree_check)
> /export/software/packages-x86_64-linux-gnu 192.168.0.0/24(rw,async,no_subtree_check,no_root_squash)
>
> If I disable the NFS shares, the OSS server seems to stay up and client machines can access the Lustre filesystems, but once I enable the NFS shares the OSS will panic within a few minutes; this is why I suspect some interaction with NFS. The odd thing is that the machine only started doing this yesterday. I have replaced / re-seated the RAM, CPUs, and cards (Ethernet, SAS), but this doesn't seem to have changed anything.
>
> I am aware that this setup is not a supported architecture (I inherited custody of the cluster from a previous admin) and am planning on re-installing both the OSS and MDS with (probably) CentOS, as that is supported for the server. Is there anything I need to be aware of in planning this upgrade?
>
> Does anyone have any clue as to what I might try? Is there an easy way I can check the integrity of the Lustre volumes?
>
> Cheers.
>
> Phill.
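To test the NFS theory without a reinstall, and to answer the integrity question, something like the following (a sketch; the service name matches Ubuntu 10.04 conventions and the device path is illustrative):

# Stop serving NFS entirely, then watch whether the panics stop:
exportfs -ua                      # unexport all filesystems
service nfs-kernel-server stop

# Read-only integrity check of an ldiskfs-backed Lustre target
# (the target must be unmounted; -n answers "no" to all fixes):
e2fsck -fn /dev/sdX1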