Hello Ray,

One consideration first: you are trying the 2.7 version, which is not the production release (that is 2.5). From this perspective, whether you run 2.7.0 or 2.7.x won't make much difference; it is the development release.

Then, if I understand correctly, the problem comes from the InfiniBand driver module, which is buggy in the 2.6.32-504.8.1 kernel, meaning that you have to update the kernel to fix it. Doing so may mean that the 2.7.0 version on the site, compiled against an older kernel version, will then refuse to load (because kernel modules, i.e. the Lustre ones here, rely on kernel features that may change between kernel versions, making them incompatible).

In any case, you can try to rebuild the 2.7.0 version from source against your new kernel. The procedure is quite easy:

https://wiki.hpdd.intel.com/display/PUB/Rebuilding+the+Lustre-client+rpms+for+a+new+kernel

It will regenerate the 2.7.0 client against your newer kernel with the working InfiniBand modules, but stability is not guaranteed, as the 2.7 branch is under development anyway.

Or, if you can't rebuild, use a precompiled one from the build site (some nasty bugs in the base 2.x.0 versions are fixed in the latest builds).

The only requirement is to stick to the very same version on the MDS and OSS, and to use the same or a newer version on the clients.
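
To make that version rule concrete, here is a minimal Python sketch of it. The function names, the host names and the idea of keeping versions as plain dotted strings are all assumptions for illustration; nothing here is part of Lustre itself, and it assumes every version is written in the same purely numeric dotted form (e.g. 2.7.0).

```python
def parse_version(text):
    """Turn a dotted version such as '2.7.0' into a tuple of ints.

    Assumes purely numeric components written with the same number of
    fields everywhere (hypothetical simplification).
    """
    return tuple(int(part) for part in text.split("."))


def check_versions(mds, oss_versions, client_versions):
    """Return a list of problems; an empty list means the rule holds.

    Rule sketched here: every OSS must run exactly the MDS version,
    and every client must run the same or a newer version.
    """
    problems = []
    server = parse_version(mds)
    for name, version in oss_versions.items():
        if parse_version(version) != server:
            problems.append(f"OSS {name} runs {version}, MDS runs {mds}")
    for name, version in client_versions.items():
        if parse_version(version) < server:
            problems.append(
                f"client {name} runs {version}, older than the servers ({mds})"
            )
    return problems


if __name__ == "__main__":
    # Hypothetical hosts and versions, for illustration only.
    issues = check_versions(
        "2.7.0",
        {"oss1": "2.7.0", "oss2": "2.7.0"},
        {"node01": "2.7.0", "node02": "2.5.3"},
    )
    for issue in issues:
        print(issue)
```

Comparing tuples of integers rather than raw strings keeps something like 2.10.0 correctly ordered after 2.7.0.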

Regards

On 03-12-2015 16:13, Ray Muno wrote:
I am trying to set up a test deployment of Lustre 2.7.

I pulled RPMS from http://lustre.org/download/ and installed them on a
set of servers running Scientific Linux 6.6, which seems to be a proper
OS for deployment.  Everything installs and I can format the
filesystems on the MDS (1) and OSS (2) servers. When I try and mount
the OST file systems, I get communication errors. I can "lctl ping"
the servers from each other, but cannot establish communication
between the MDS and OSS.

The installation is on servers connected over Infiniband (Qlogic DDR 4X).

In trying to diagnose the issues related to the error messages, I
found mention in some list discussions that o2ib is broken in the
2.6.32-504.8.1 kernel.

After much frustration, I pulled a nightly build from
build.hpdd.intel.com (kernel
2.6.32-573.8.1.el6_lustre.g8438f2a.x86_64) and tried the same set up.
Everything worked as I expected.

Am I missing something? Is the default release pointed to at
https://downloads.hpdd.intel.com/ for 2.7 broken in some way? Is it
just the hardware I am trying to deploy against?

I can provide specifics about the errors I see, I am just posting this
to make sure I am pulling the Lustre RPMs from the proper source.
