[Lustre-discuss] problem with installing lustre and ofed

2012-12-28 Thread Jason Brooks
Hello,

I am having trouble installing the server modules for Lustre 2.1.4 while
using Mellanox's OFED distribution so we can use InfiniBand. Would you folks
look at my procedure and results below and let me know what you think? Thanks
very much!

The Mellanox OFED installation builds and installs some kernel modules too, so
I used this method to ensure OFED compiled against the correct kernel. This is
on CentOS 6.3.

 1.  download all Lustre RPMs from Whamcloud
 2.  install kernel, kernel-firmware, kernel-headers, and kernel-devel
     *   in this case, the RPM files with
         "2.6.32-279.14.1.el6_lustre.x86_64" in their name
 3.  reboot into this Lustre kernel
 4.  install the remaining RPMs
 5.  download OFED from Mellanox:
     "MLNX_OFED_LINUX-1.5.3-3.1.0-rhel6.3-x86_64.iso"
     *   build the Mellanox OFED bits using the Lustre kernel and
         kernel-devel info
     *   install Mellanox OFED
 6.  reboot
 7.  upon reboot, if I do NOT have o2ib3 in my lnet networks parameters,
     I can modprobe lnet and lustre.
 8.  if I DO have o2ib3 present in the lnet parameters, running
     modprobe lustre gets me:
gets me:

ib/modules/2.6.32-279.14.1.el6_lustre.x86_64/updates/kernel/fs/lustre/fld.ko): Input/output error
WARNING: Error inserting fid (/lib/modules/2.6.32-279.14.1.el6_lustre.x86_64/updates/kernel/fs/lustre/fid.ko): Input/output error
WARNING: Error inserting mdc (/lib/modules/2.6.32-279.14.1.el6_lustre.x86_64/updates/kernel/fs/lustre/mdc.ko): Input/output error
WARNING: Error inserting osc (/lib/modules/2.6.32-279.14.1.el6_lustre.x86_64/updates/kernel/fs/lustre/osc.ko): Input/output error
WARNING: Error inserting lov (/lib/modules/2.6.32-279.14.1.el6_lustre.x86_64/updates/kernel/fs/lustre/lov.ko): Input/output error
FATAL: Error inserting lustre (/lib/modules/2.6.32-279.14.1.el6_lustre.x86_64/updates/kernel/fs/lustre/lustre.ko): Input/output error


dmesg shows:
ko2iblnd: disagrees about version of symbol ib_fmr_pool_unmap
ko2iblnd: Unknown symbol ib_fmr_pool_unmap
ko2iblnd: disagrees about version of symbol ib_create_cq
ko2iblnd: Unknown symbol ib_create_cq
…
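A quick way to pull the mismatched symbols out of the kernel log is sketched below. This is a minimal sketch, assuming the message format shown above; the function name ib_symbol_mismatches is hypothetical. Each symbol it prints is one that ko2iblnd was compiled against with a different version (CRC) than the IB stack currently loaded, which appears to be what then makes the Lustre modules fail to insert.

```shell
#!/bin/sh
# Hypothetical helper: list IB symbols that ko2iblnd rejected because of a
# symbol version (CRC) mismatch. Reads kernel log text on stdin, e.g.
#   dmesg | ib_symbol_mismatches
ib_symbol_mismatches() {
    # Matches lines like (optionally with a "[ time]" prefix):
    #   ko2iblnd: disagrees about version of symbol ib_create_cq
    sed -n 's/^.*ko2iblnd: disagrees about version of symbol \([A-Za-z0-9_]*\).*$/\1/p' |
        LC_ALL=C sort -u
}
```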





___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] problem with installing lustre and OFED

2013-01-02 Thread Ms. Megan Larko
Greetings Jason,

As you have most likely discovered, Mellanox (MLNX) OFED needs to be built
into the Lustre Linux kernel to use InfiniBand.

I worked on such an issue recently. The Whamcloud Linux kernel
2.1.2-2.6.32_220.17.1.el6_lustre would not work optimally with our
Mellanox InfiniBand (IB) drivers. We got MLNX version 1.8.5 to match
our Mellanox hardware and had to do the dance already described to you
on this list:
1.   downloading all of the appropriate (Whamcloud) Lustre Linux
kernel, headers and devel RPMs
2.   booting into the Lustre kernel
3.   in our /usr/src/lustre-2.1.2 directory, building Lustre against the
Mellanox "Module.symvers" information (skipping this is why you see the
"Input/output" errors on fid.ko, mdc.ko, osc.ko, lov.ko and, because of
those, on lustre.ko). The MLNX version 1.8.5 that we needed was in the
/usr/src/ofa_kernel directory (with the Module.symvers, etc.). We used
the defaults other than the o2ib option, so our command in the
/usr/src/lustre-2.1.2 directory looked like
"./configure --with-o2ib=/usr/src/ofa_kernel"
4.   next we issued "make"
5.   next we chose to run "make rpms" so that we would have RPMs for
our system for cluster rebuilding
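Before running configure, it can help to confirm that the OFED tree passed to --with-o2ib really carries a Module.symvers exporting the IB symbols ko2iblnd needs. This is a minimal sketch; the function name check_ofed_symvers is hypothetical, and the symbol names are taken from the dmesg output earlier in the thread. It assumes the 2.6.32-era Module.symvers layout of CRC, symbol, module, export type per line.

```shell
#!/bin/sh
# Hypothetical pre-flight check: does DIR/Module.symvers export every
# symbol named on the command line? Returns non-zero if anything is missing.
check_ofed_symvers() {
    ofed_dir=$1
    shift
    symvers="$ofed_dir/Module.symvers"
    if [ ! -r "$symvers" ]; then
        echo "missing $symvers" >&2
        return 1
    fi
    status=0
    for sym in "$@"; do
        # Column 2 of Module.symvers is the exported symbol name.
        if ! awk -v s="$sym" '$2 == s { found = 1 } END { exit !found }' "$symvers"; then
            echo "not exported: $sym" >&2
            status=1
        fi
    done
    return $status
}

# Example, using the paths from the message above:
#   check_ofed_symvers /usr/src/ofa_kernel ib_create_cq ib_fmr_pool_unmap &&
#       cd /usr/src/lustre-2.1.2 &&
#       ./configure --with-o2ib=/usr/src/ofa_kernel && make && make rpms
```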

We had to do this for *both* our Lustre servers and Lustre clients
(using the lustre-client Whamcloud kernel, headers, ...). That got the
servers and the clients communicating properly over the MLNX IB fabric.

In /etc/modprobe.d we used a lustre.conf file to explicitly direct
the system to use the o2ib network when starting Lustre at boot.

Without the above actions, ko2iblnd would not load.
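A lustre.conf like the one mentioned above typically holds a single "options lnet networks=..." line. The sketch below writes one; the helper name write_lnet_conf is hypothetical, and the default network (o2ib0) and interface (ib0) are assumptions to adjust for your fabric (Jason's setup uses o2ib3).

```shell
#!/bin/sh
# Hypothetical helper: write an LNet "networks" option into a modprobe.d
# file. Arguments: file (default /etc/modprobe.d/lustre.conf),
# LNet network name (default o2ib0), IB interface (default ib0).
write_lnet_conf() {
    conf=${1:-/etc/modprobe.d/lustre.conf}
    net=${2:-o2ib0}
    iface=${3:-ib0}
    # Produces e.g.: options lnet networks=o2ib0(ib0)
    printf 'options lnet networks=%s(%s)\n' "$net" "$iface" > "$conf"
}
```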

Just confirming that you need to build Mellanox on servers and clients
to use MLNX IB with Lustre cluster file system.

megan


Re: [Lustre-discuss] problem with installing lustre and ofed

2012-12-28 Thread Jeff Johnson
Jason,

The prebuilt server-side Lustre packages from Whamcloud are built 
against RHEL/CentOS kernel sources with kernel-ib active in them. This 
means that any of the Lustre prebuilt server packages are already tied 
to RHEL's kernel-ib.

To accomplish your stated goal you'll have to start with a non-Whamcloud
stock kernel (plus headers, devel, etc.). Then compile/install the OFED
version of your choice. Once you have that, you can build Lustre from
source, where it will compile against OFED and the installed kernel.

--Jeff

---
Jeff Johnson
Co-Founder
Aeon Computing

jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x101   f: 858-412-3845

4170 Morena Boulevard, Suite D - San Diego, CA 92117

/* Follow us on Twitter - @AeonComputing */






Re: [Lustre-discuss] problem with installing lustre and ofed

2012-12-28 Thread Jason Brooks
Hello,

It's good to know that kernel-ib comes with the stock Lustre install.

What about the rest of the OFED tools? I mean things like ibdiagnet,
ibstatus, etc. (I will look at the contents of the other RPMs and see
what I can learn.)



Re: [Lustre-discuss] problem with installing lustre and ofed

2012-12-28 Thread Ken Hornstein
>That's good to know kernel-ib comes with the lustre stock install.
>
>What about the rest of the OFED tools?  I mean things like ibdiagnet,
>ibstatus, etc?  (I will look at the contents of the other rpms and see
>what I can learn)

I think Jeff missed a few steps.  If you want the _server-side_ packages,
what you need to do is:

- Install a Lustre-patched kernel, including devel packages (you can use
  the ones from Whamcloud if they're suitable).
- Build your OFED against that kernel & install it.
- Compile Lustre against the Lustre-patched kernel and the OFED.  This
  is the tricky part; you need to make sure to tell Lustre to link against
  the right OFED package.
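One way to check that the last step really picked up the right OFED is to compare the symbol CRCs recorded in the built ko2iblnd.ko against the Module.symvers of the OFED stack that will actually be loaded; any symbol printed would be expected to trigger a "disagrees about version of symbol" message at load time. This is a minimal sketch; the function name crc_mismatches is hypothetical and the file paths in the usage comment are assumptions.

```shell
#!/bin/sh
# Hypothetical helper: print symbols whose CRC in a module's modversions
# dump differs from the CRC exported by an OFED Module.symvers.
#   $1: OFED Module.symvers      (lines: CRC symbol module export-type)
#   $2: output of "modprobe --dump-modversions <module.ko>"
#                                (lines: 0xCRC<TAB>symbol)
# Symbols not exported by the OFED tree (e.g. printk) are ignored.
crc_mismatches() {
    awk 'FILENAME == ARGV[1] { ofed[$2] = tolower($1); next }
         ($2 in ofed) && ofed[$2] != tolower($1) { print $2 }' "$1" "$2"
}

# Usage (paths are assumptions):
#   modprobe --dump-modversions "$(modinfo -n ko2iblnd)" > /tmp/ko2iblnd.crcs
#   crc_mismatches /usr/src/ofa_kernel/Module.symvers /tmp/ko2iblnd.crcs
```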

There are Lustre build scripts that actually automate all of this; last
time I checked, they were only available in the git tree, NOT in the
source tarball. Those build scripts are a bit of a pain to use, and I
find that I always have to tweak them a bit. But once you figure them
out, they make things easier.

Now as for the userspace utilities ... well, you need to make sure they're
not too far off from the kernel.  How far is "too far"?  Good question.
I don't think they're guaranteed to work when they don't match, but in my
limited experience minor version differences are ok.

--Ken


Re: [Lustre-discuss] problem with installing lustre and ofed

2012-12-31 Thread Brian J. Murrell
On Fri, 2012-12-28 at 15:54 -0800, Jason Brooks wrote:
> Hello,

Hi,

> I am having trouble installing the server modules for  lustre 2.1.4
> and use mellanox's OFED distribution

Is there a particular need for the Mellanox OFED distribution? The
Red Hat EL 6 kernel comes stock with the InfiniBand drivers and stack
already baked in, and we leverage that and build our Lustre modules RPM
against it.

So unless there is something particular that you need that is only in
the Mellanox OFED distribution and is not already in EL6's kernels, you
should be able to just use the binary kernel and lustre-modules RPMs
that we supply and have working InfiniBand support.
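To see which IB stack a node is actually running, one can look at where the loaded ib_core module file lives. This is a minimal sketch; the function name module_origin is hypothetical, and it assumes that an out-of-tree OFED stack is installed under updates/ or extra/ while the distribution's own drivers sit under kernel/drivers/infiniband/. Check that assumption against your actual install.

```shell
#!/bin/sh
# Hypothetical helper: classify a module path such as the one printed by
#   modinfo -n ib_core
# Assumption: out-of-tree stacks live under updates/ or extra/.
module_origin() {
    case $1 in
        */updates/*|*/extra/*) echo out-of-tree ;;
        */kernel/drivers/infiniband/*) echo in-tree ;;
        *) echo unknown ;;
    esac
}

# Usage: module_origin "$(modinfo -n ib_core)"
```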

Cheers,
b.





Re: [Lustre-discuss] problem with installing lustre and ofed

2012-12-31 Thread Michael Shuey
RedHat's OFED tends to lag Mellanox's.  They're pretty current on
bugfixes, but support for the latest hardware is usually 3-6 months
behind - it took about 4 months to bring in drivers for our most
recent FDR system.  Also, support for Mellanox's advanced features
(e.g., MXM, FCA) is often missing.

--
Mike Shuey

