Re: [lustre-discuss] Lustre rpm install creating a file that breaks lustre

2019-10-04 Thread Andreas Dilger
The "alias ko2iblnd-opa" line in ko2iblnd.conf doesn't do anything unless OPA 
(or older QLogic) interfaces are detected on your system via the 
/usr/sbin/ko2iblnd-probe script.

This indirection allows different default module parameters for OPA 
and MLX devices, which can't easily be determined inside the kernel and would 
be hard to change after the fact.  In theory, if there were substantially 
better ko2iblnd module parameters for new MLX or RoCE devices, the same 
mechanism could be used to apply similar interface-specific tunings in userspace.
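
To illustrate the indirection, here is a minimal sketch of the kind of decision 
such a probe script makes. It is not the actual contents of 
/usr/sbin/ko2iblnd-probe: the function name choose_lnd_module, the directory 
argument, and the hfi1*/qib* device-name patterns are assumptions made for 
illustration only.

```shell
#!/bin/sh
# Hypothetical sketch, not the shipped ko2iblnd-probe script.
# Print the module name (or alias) to load, based on the device names found
# in a directory such as /sys/class/infiniband.
choose_lnd_module() {
    ib_dir=$1
    for dev in "$ib_dir"/*; do
        [ -e "$dev" ] || continue          # unmatched glob or empty directory
        case "$(basename "$dev")" in
            hfi1*|qib*)                    # assumed OPA / QLogic device prefixes
                echo ko2iblnd-opa          # alias that carries the OPA options
                return 0
                ;;
        esac
    done
    echo ko2iblnd                          # plain module, default parameters
}

# An install hook of this kind would typically end by handing off to modprobe:
#   exec /sbin/modprobe --ignore-install "$(choose_lnd_module /sys/class/infiniband)"
```

Under these assumptions, a host with only Mellanox devices (for example 
mlx5_0) gets the plain ko2iblnd name, so the options attached to the 
ko2iblnd-opa alias are never applied.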

You could run that script with the last line commented out (or with "exec" 
replaced by "echo" on the last line) to see what it is doing.

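One way to try the exec-to-echo substitution without touching the installed 
script is to work on a copy. This is only a sketch: the helper name 
make_dry_run and the /tmp path are made up, and it assumes the last line of 
the script starts with "exec".

```shell
# Write a copy of a script in which a leading "exec" on the last line is
# replaced by "echo", so the final command is printed instead of executed.
make_dry_run() {
    src=$1
    dst=$2
    sed '$s/^\([[:space:]]*\)exec /\1echo /' "$src" > "$dst"
}

# Example (paths are illustrative):
#   make_dry_run /usr/sbin/ko2iblnd-probe /tmp/ko2iblnd-probe.dry
#   bash /tmp/ko2iblnd-probe.dry
```

Running the copy should then print the command the script would have 
executed, rather than loading the module.
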
Cheers, Andreas

On Oct 2, 2019, at 12:55, Kurt Strosahl <stros...@jlab.org> wrote:

Good Afternoon,

While getting Lustre 2.10.8 running on a RHEL 7.7 system I found that the 
RPM install was putting a file in /etc/modprobe.d that was preventing lnet from 
starting properly.

The file is ko2iblnd.conf, which contains the following...

alias ko2iblnd-opa ko2iblnd
options ko2iblnd-opa peer_credits=128 peer_credits_hiw=64 credits=1024 
concurrent_sends=256 ntx=2048 map_on_demand=32 fmr_pool_size=2048 
fmr_flush_trigger=512 fmr_cache=1 conns_per_peer=4

install ko2iblnd /usr/sbin/ko2iblnd-probe

Our system is running InfiniBand, not Omni-Path, so I'm not sure why this file 
is being put in place.  Removing the file allows lnet to start properly.

--
Andreas Dilger
Principal Lustre Architect
Whamcloud

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Lustre rpm install creating a file that breaks lustre

2019-10-02 Thread Chris Horn
Anything in dmesg? We need to know _why_ the network failed to start.

Chris Horn



Re: [lustre-discuss] Lustre rpm install creating a file that breaks lustre

2019-10-02 Thread Kurt Strosahl
The lnet modules load, but when I start the lnet service it says that the 
network is down.  I backed everything out, removed the file, and then started 
the lnet service again and it worked properly.




Re: [lustre-discuss] Lustre rpm install creating a file that breaks lustre

2019-10-02 Thread Chris Horn
Might be best to open a ticket for this. What was the nature of the failure?

Chris Horn



[lustre-discuss] Lustre rpm install creating a file that breaks lustre

2019-10-02 Thread Kurt Strosahl
Good Afternoon,

While getting Lustre 2.10.8 running on a RHEL 7.7 system I found that the 
RPM install was putting a file in /etc/modprobe.d that was preventing lnet from 
starting properly.

The file is ko2iblnd.conf, which contains the following...

alias ko2iblnd-opa ko2iblnd
options ko2iblnd-opa peer_credits=128 peer_credits_hiw=64 credits=1024 
concurrent_sends=256 ntx=2048 map_on_demand=32 fmr_pool_size=2048 
fmr_flush_trigger=512 fmr_cache=1 conns_per_peer=4

install ko2iblnd /usr/sbin/ko2iblnd-probe

Our system is running InfiniBand, not Omni-Path, so I'm not sure why this file 
is being put in place.  Removing the file allows lnet to start properly.

w/r,

Kurt J. Strosahl
System Administrator: Lustre, HPC
Scientific Computing Group, Thomas Jefferson National Accelerator Facility