Re: [lustre-discuss] Lustre 2.7 deployment issues
n errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
Dec 3 18:21:53 athena-head kernel: Lustre: Unmounted ltest-client
Dec 3 18:21:53 athena-head kernel: LustreError: 7346:0:(obd_mount.c:1339:lustre_fill_super()) Unable to mount (-5)

On the server

Dec 3 18:21:41 lustre-mds kernel: LNet: 1493:0:(o2iblnd_cb.c:2278:kiblnd_passive_connect()) Can't accept conn from 172.19.120.2@o2ib (version 12): max_frags 256 too large (32 wanted)

On 12/04/2015 06:49 AM, jerome.be...@inserm.fr wrote:

Hi,

I honestly don't know if the compiled versions available here are meant to be used by everyone, but they are publicly browsable on Intel Jenkins: https://build.hpdd.intel.com

Since the source is publicly available from the Whamcloud git, imo there should not be any problem.

If you are in production, stick to 2.5.

Regards

On 04-12-2015 12:18, Jon Tegner wrote:

Hi,

Where do you find the 2.7.x releases? I thought fixes were only released for the Intel maintenance version?

Regards,

/jon

On 12/04/2015 11:43 AM, jerome.be...@inserm.fr wrote:

Hello Ray,

One consideration first: you are trying the 2.7 version, which is not the production one (that is 2.5). From this perspective, whether you run 2.7.0 or 2.7.x won't make any big difference; it is the development release.

Then, if I understand correctly, the problem comes from the InfiniBand driver module, which is buggy in the 2.6.32-504.8.1 kernel, meaning that you have to update the kernel to fix it. Doing this may mean that the 2.7.0 version on the site, compiled against an older kernel version, will then refuse to load (because kernel modules, i.e. the Lustre ones here, rely on features that may change between kernel versions, making them incompatible).

In any case you can try to rebuild the 2.7.0 version from source for your new kernel. The procedure is quite easy:

https://wiki.hpdd.intel.com/display/PUB/Rebuilding+the+Lustre-client+rpms+for+a+new+kernel

It will regenerate the 2.7.0 client on your newer kernel with the working InfiniBand modules, but stability is not guaranteed, as the 2.7 branch is under development anyway. Or, if you can't rebuild, use a precompiled one from the build site (some nasty bugs in the base 2.x.0 version are fixed in the latest builds).

The only thing is to stick to the very same version on the MDS and OSS, and at least the same or a newer version for the clients.

Regards

On 03-12-2015 16:13, Ray Muno wrote:

I am trying to set up a test deployment of Lustre 2.7. I pulled RPMs from http://lustre.org/download/ and installed them on a set of servers running Scientific Linux 6.6, which seems to be a proper OS for deployment.

Everything installs and I can format the filesystems on the MDS (1) and OSS (2) servers. When I try to mount the OST file systems, I get communication errors. I can "lctl ping" the servers from each other, but cannot establish communication between the MDS and OSS.

The installation is on servers connected over InfiniBand (Qlogic DDR 4X).

In trying to diagnose the issues related to the error messages, I found mention in some list discussions that o2ib is broken in the 2.6.32-504.8.1 kernel.

After much frustration, I pulled a nightly build from build.hpdd.intel.com (kernel 2.6.32-573.8.1.el6_lustre.g8438f2a.x86_64) and tried the same setup. Everything worked as I expected.

Am I missing something? Is the default release pointed to at https://downloads.hpdd.intel.com/ for 2.7 broken in some way? Is it just the hardware I am trying to deploy against?

I can provide specifics about the errors I see; I am just posting this to make sure I am pulling the Lustre RPMs from the proper source.
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

--
Ray Muno
Computer Systems Administrator
e-mail: m...@aem.umn.edu
Phone: (612) 625-9531
FAX: (612) 626-1558
University of Minnesota
Aerospace Engineering and Mechanics         Mechanical Engineering
110 Union St. S.E.                          111 Church Street SE
Minneapolis, MN 55455                       Minneapolis, MN 55455
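The rule of thumb given in the reply above (the very same Lustre version on the MDS and OSS, and the same or a newer version on the clients) can be written down as a small check. This is only a sketch of that stated rule, not an official compatibility matrix; the function names and the host names in the example are hypothetical, and versions are assumed to be plain dotted numbers such as 2.7.0.

```python
def parse_version(text):
    """Turn a dotted version such as '2.7.0' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def check_versions(mds, oss_versions, client_versions):
    """Return a list of problems; an empty list means the rule holds.

    mds             -- version string of the MDS, e.g. '2.7.0'
    oss_versions    -- dict mapping OSS host name -> version string
    client_versions -- dict mapping client host name -> version string
    """
    problems = []
    server = parse_version(mds)
    # Servers: every OSS must run exactly the MDS version.
    for name, version in oss_versions.items():
        if parse_version(version) != server:
            problems.append(f"OSS {name} runs {version}, MDS runs {mds}")
    # Clients: same version as the servers, or newer.
    for name, version in client_versions.items():
        if parse_version(version) < server:
            problems.append(
                f"client {name} runs {version}, older than servers ({mds})"
            )
    return problems


if __name__ == "__main__":
    # Hypothetical hosts and versions, for illustration only.
    for problem in check_versions(
        "2.7.0",
        {"oss1": "2.7.0", "oss2": "2.5.3"},
        {"client1": "2.7.0", "client2": "2.5.3"},
    ):
        print(problem)
```

Comparing tuples of integers rather than raw strings avoids ordering mistakes such as "2.10.0" sorting before "2.7.0".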
Re: [lustre-discuss] Lustre 2.7 deployment issues
The client was rebuilt locally from the source RPMs. I thought I had built it from the client source from the nightly build, but I can see now it was the 2.7.0 source:

lustre-client-2.7.0-2.6.32_504.8.1.el6.x86_64.src.rpm

The client kernel is the OS-provided kernel.

At this point I have ripped out all of the 2.7.0-based install and rebuilt everything with the current 2.5.3 pre-built RPMs for the server. The test client is RHEL 6.7, so I built the client locally against the current kernel.

I can now mount the filesystem at least.

On 12/04/2015 09:24 AM, jerome.be...@inserm.fr wrote:

Ok,

I am not using IB here, but it looks obvious that the max_frags value differs between the MGS and the client.

Do you use the same Lustre version on the MGS/OSS AND the client, built on the same kernel version? (i.e. lustre*-KERNEL_VERSION-LUSTRE_VERSION)

Did you try it with the latest nightly build?

If so, I will let the developers answer, or maybe you can open a bug.

Regards

On 04-12-2015 15:48, Ray Muno wrote:

As I mentioned, I am doing a test install to see what I want to run for deployment. We have run a couple of Lustre installs, one 1.8.x based and a current production one that is 2.3. The Lustre 2.3 server set has been up for 750 days and has been very solid. This test replaces the old 1.8 setup, and I need to come up with a consistent set of servers and clients that I can run on our clusters. The cluster (Rocks based) will get upgraded, most likely, once we have a working set. I have a set of compute nodes that will be set up to run either CentOS 6.6 or 6.7.

I started with 2.7 since that is what I got pointed to when I went to the lustre.org download page. The "Most Recent Release" points me at the 2.7.0 tree.

If I follow the path to download source on that page,

git clone git://git.hpdd.intel.com/fs/lustre-release.git

it is not even apparent from the downloaded tree which version I would be building. The Changelog file mentions 2.8 and 2.7.

Everything on the Lustre Download page seems to indicate I should be downloading 2.7.

Since I started with a clean install of RHEL 6.6 on my server set, I had the expectation that the pre-compiled server binaries would give me a working set to test. That is when the frustration started. I tried searching for clues by looking at the errors that I saw, but I did not find much that duplicated what I was seeing. I just saw some odd mentions about IB having issues in 2.6.32-504.8.1. This did not directly correlate with my issues, but I figured I would try a later kernel. That is why I pulled the nightly build off of build.hpdd.intel.com and found I could at least establish a set of servers that would talk to each other.

That is where I am at now. I am trying to wrap my head around where my issues lie. Is the problem specific to my Qlogic InfiniPath_QLE7240 cards? Is it the underlying OS-provided IB drivers?

I guess I am just really surprised that the distribution pointed to on the download page fails out of the box on a set of servers with a clean install of the specified OS. I just figured I must be doing something wrong (which may still be the case).

At this point, it looks like I should be backing out 2.7 and building this with the current 2.5 release. Before I do that, however, I would like to gain some understanding of what I am seeing right now.

I have the server set built with 2.7.0 and the 2.6.32-573.8.1.el6_lustre.g8438f2a.x86_64 kernel on RHEL 6.6 (SL 6.6).

I rebuilt the 2.7.0 Lustre client on a RHEL (CentOS) 6.6 client, and I could not mount the file system. It will mount my production Lustre file system from another server set (2.3.0) without a problem.

I also tried with a RHEL 6.7 install, with the 2.7 Lustre client rebuilt for the kernel (2.6.32-573.8.1.el6.x86_64). The client will not mount the 2.7 Lustre file system and I cannot even "lctl ping" the server from the client.

On the client

[root@athena-head ~]# lctl ping 172.19.120.29@o2ib
failed to ping 172.19.120.29@o2ib: Input/output error

In dmesg

LNetError: 1444:0:(o2iblnd_cb.c:2649:kiblnd_rejected()) 172.19.120.29@o2ib rejected: incompatible # of RDMA fragments 32, 256

On the Lustre MDS server

Dec 3 18:14:08 lustre-mds kernel: LNet: 1493:0:(o2iblnd_cb.c:2278:kiblnd_passive_connect()) Can't accept conn from 172.19.120.2@o2ib (version 12): max_frags 256 too large (32 wanted)

Trying to mount on the client

[root@athena-head ~]# uname -a
Linux athena-head.aem.umn.edu 2.6.32-573.8.1.el6.x86_64 #1 SMP Tue Nov 10 18:01:38 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
[root@athena-head ~]# mount -t lustre 172.19.120.29@o2ib:/ltest /ltest
mount.lustre: mount 172.19.120.29@o2ib:/ltest at /ltest failed: Input/output error
Is the MGS running?
Dec 3 18:21:16 athena-head kernel: LNetError: 1444:0:(o2iblnd_cb.c:2649:kiblnd_rejected()) 172.19.120.29@o2ib rejected: incompatible # of RDMA fragments 32, 256
Dec 3 18:21:16 athena-head kernel: Lustre: 6091:0:(client.c:1939:ptlrpc_expire_o
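The two o2iblnd kernel messages quoted in this message both carry the pair of mismatched fragment counts (32 and 256). When scanning a larger syslog for the same symptom, a short Python sketch along these lines may help. The regular expressions are modelled only on the two message formats quoted here and are an assumption; other Lustre versions may word these messages differently, and the function name is hypothetical.

```python
import re

# Client side: "... kiblnd_rejected()) <peer> rejected: incompatible # of
# RDMA fragments <n>, <m>"
REJECTED = re.compile(
    r"kiblnd_rejected\(\)\) (?P<peer>\S+) rejected: "
    r"incompatible # of RDMA fragments (?P<first>\d+), (?P<second>\d+)"
)

# Server side: "... kiblnd_passive_connect()) Can't accept conn from <peer>
# (version N): max_frags <offered> too large (<wanted> wanted)"
PASSIVE = re.compile(
    r"kiblnd_passive_connect\(\)\) Can't accept conn from (?P<peer>\S+) "
    r"\(version \d+\): max_frags (?P<offered>\d+) too large "
    r"\((?P<wanted>\d+) wanted\)"
)


def find_frag_mismatches(lines):
    """Return (kind, peer NID, number, number) for each matching log line."""
    results = []
    for line in lines:
        match = REJECTED.search(line)
        if match:
            results.append(
                ("rejected", match.group("peer"),
                 int(match.group("first")), int(match.group("second")))
            )
            continue
        match = PASSIVE.search(line)
        if match:
            results.append(
                ("refused", match.group("peer"),
                 int(match.group("offered")), int(match.group("wanted")))
            )
    return results


if __name__ == "__main__":
    import sys

    # Usage: python frag_check.py < /var/log/messages
    for entry in find_frag_mismatches(sys.stdin):
        print(entry)
```

For the "rejected" entries the two numbers are reported in the order they appear in the message, without interpreting which side each one belongs to.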
[lustre-discuss] Lustre 2.7 deployment issues
I am trying to set up a test deployment of Lustre 2.7. I pulled RPMs from http://lustre.org/download/ and installed them on a set of servers running Scientific Linux 6.6, which seems to be a proper OS for deployment.

Everything installs and I can format the filesystems on the MDS (1) and OSS (2) servers. When I try to mount the OST file systems, I get communication errors. I can "lctl ping" the servers from each other, but cannot establish communication between the MDS and OSS.

The installation is on servers connected over InfiniBand (Qlogic DDR 4X).

In trying to diagnose the issues related to the error messages, I found mention in some list discussions that o2ib is broken in the 2.6.32-504.8.1 kernel.

After much frustration, I pulled a nightly build from build.hpdd.intel.com (kernel 2.6.32-573.8.1.el6_lustre.g8438f2a.x86_64) and tried the same setup. Everything worked as I expected.

Am I missing something? Is the default release pointed to at https://downloads.hpdd.intel.com/ for 2.7 broken in some way? Is it just the hardware I am trying to deploy against?

I can provide specifics about the errors I see; I am just posting this to make sure I am pulling the Lustre RPMs from the proper source.

--
Ray Muno
University of Minnesota
Aerospace Engineering and Mechanics         Mechanical Engineering
110 Union St. S.E.                          111 Church Street SE
Minneapolis, MN 55455                       Minneapolis, MN 55455
[Lustre-discuss] New to Lustre, test install.
Now that I have located what I want to use for a Lustre deployment test, I am running into a few issues. (If there is a searchable archive for this mailing list, I would have started there. I only found it archived by date.)

I have a fresh install of CentOS 5.6. I installed Lustre from the pre-built RPMs available on Whamcloud's server.

I followed the "Walk-thru- Deploying a Lustre pre-built kernel" page, which seems to be a bit out of date. There are some errors on this page relating to the installation of Ldiskfs. That section seems to be an edited clone of the Lustre Modules section.

http://wiki.whamcloud.com/display/PUB/Walk-thru-+Deploying+a+Lustre+pre-built+kernel

From there I went to testing.

http://wiki.whamcloud.com/display/PUB/Testing+a+Lustre+filesystem

When I run the test suite, as indicated, I do not get very far.

# /usr/lib64/lustre/tests/llmount.sh
Stopping clients: nike-lustre-oss-0-0.local /mnt/lustre (opts:)
Stopping clients: nike-lustre-oss-0-0.local /mnt/lustre2 (opts:)
Loading modules from /usr/lib64/lustre/tests/..
lnet.debug=0x33f1504
lnet.subsystem_debug=0xffb7e3ff
lnet options: 'networks=tcp0 accept=all'
Formatting mgs, mds, osts
Checking servers environments
Checking clients nike-lustre-oss-0-0.local environments
Setup mgs, mdt, osts
Starting mds: -o loop /tmp/lustre-mdt /mnt/mds
lnet.debug=0x33f1504
lnet.subsystem_debug=0xffb7e3ff
lnet.debug_mb=24
error: set_param: writing to file /proc/sys/lnet/debug_mb: Invalid argument

-Ray Muno
University of Minnesota

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] Where to download Lustre from since 01 Aug?
On 08/05/2011 05:20 PM, Anthony David wrote:

On 08/05/2011 11:31 PM, Ray Muno wrote:

I see a previous post regarding this. Since Oracle decommissioned the Sun Download Center, all links for downloading Lustre appear to go in circles, always bringing you back to a page that has no path to anything Lustre related. Does anyone have any insights as to where Oracle buried it?

I too went around in circles following the Lustre download link. Anyone with a My Oracle Support account can download zipped bundles of RPMs from the Patches and Updates section.

I had gone that route. At first attempt, I could not even locate it. Since you mentioned it was available on the My Oracle Support site, I tried again. I did manage to locate the RPMs for various distributions. Once I selected them, I was told I do not have sufficient privileges to download.

I guess the "Anyone with a My Oracle Support account" part does not apply.

--
Ray Muno
University of Minnesota
[Lustre-discuss] Where to download Lustre from since 01 Aug?
I see a previous post regarding this. Since Oracle decommissioned the Sun Download Center, all links for downloading Lustre appear to go in circles, always bringing you back to a page that has no path to anything Lustre related. Does anyone have any insights as to where Oracle buried it?