Re: [lustre-discuss] Lustre rpm install creating a file that breaks lustre
The "alias ko2iblnd-opa" line in ko2iblnd.conf doesn't do anything unless OPA (or older QLogic) interfaces are detected on your system via the /usr/sbin/ko2iblnd-probe script. This indirection allows better default module parameters for OPA versus MLX devices, which can't easily be determined inside the kernel and would be hard to change after the fact.

In theory, if there were substantially better ko2iblnd module parameters for new MLX or RoCE devices, the same mechanism could be used to do similar interface-specific tuning in userspace.

You could run that script with the last line commented out (or with "exec" replaced by "echo" on the last line) to see what it is doing.

Cheers, Andreas

On Oct 2, 2019, at 12:55, Kurt Strosahl <stros...@jlab.org> wrote:

Good Afternoon,

While getting Lustre 2.10.8 running on a RHEL 7.7 system I found that the RPM install was putting a file in /etc/modprobe.d that was preventing lnet from starting properly. The file is ko2iblnd.conf, which contains the following:

    alias ko2iblnd-opa ko2iblnd
    options ko2iblnd-opa peer_credits=128 peer_credits_hiw=64 credits=1024 concurrent_sends=256 ntx=2048 map_on_demand=32 fmr_pool_size=2048 fmr_flush_trigger=512 fmr_cache=1 conns_per_peer=4
    install ko2iblnd /usr/sbin/ko2iblnd-probe

Our system is running InfiniBand, not Omni-Path, so I'm not sure why this file is being put in place. Removing the file allows lnet to start properly.

--
Andreas Dilger
Principal Lustre Architect
Whamcloud

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
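To make the indirection Andreas describes easier to see, here is a minimal Python sketch that reads the three lines of ko2iblnd.conf quoted in this thread and reports what is attached to each module name. The function names parse_modprobe_conf and lookup are hypothetical, and the sketch only models the text of the file; it does not reproduce modprobe's real resolution logic or the contents of ko2iblnd-probe.

```python
from collections import defaultdict

# Contents of ko2iblnd.conf exactly as quoted in this thread.
CONF = """\
alias ko2iblnd-opa ko2iblnd
options ko2iblnd-opa peer_credits=128 peer_credits_hiw=64 credits=1024 concurrent_sends=256 ntx=2048 map_on_demand=32 fmr_pool_size=2048 fmr_flush_trigger=512 fmr_cache=1 conns_per_peer=4
install ko2iblnd /usr/sbin/ko2iblnd-probe
"""


def parse_modprobe_conf(text):
    """Split modprobe.d-style text into alias, options and install tables."""
    aliases, options, installs = {}, defaultdict(dict), {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        directive, *rest = line.split()
        if directive == "alias" and len(rest) == 2:
            aliases[rest[0]] = rest[1]
        elif directive == "options" and rest:
            for item in rest[1:]:
                key, _, value = item.partition("=")
                options[rest[0]][key] = value
        elif directive == "install" and len(rest) >= 2:
            installs[rest[0]] = " ".join(rest[1:])
    return aliases, dict(options), installs


def lookup(name, aliases, options, installs):
    """Report what the file says about one requested module name."""
    return {
        "module": aliases.get(name, name),   # real module behind the name
        "options": options.get(name, {}),    # options keyed by this name only
        "install": installs.get(name),       # command run instead of a plain load
    }


aliases, options, installs = parse_modprobe_conf(CONF)
for name in ("ko2iblnd", "ko2iblnd-opa"):
    print(name, lookup(name, aliases, options, installs))
```

In the file as written, the tuning options hang off the name ko2iblnd-opa rather than ko2iblnd, while a request for ko2iblnd itself is routed to the probe script by the install line. That matches the explanation above: the OPA values only matter if something asks for the module under the alias name.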
Re: [lustre-discuss] Lustre rpm install creating a file that breaks lustre
Anything in dmesg? We need to know _why_ the network failed to start.

Chris Horn

From: Kurt Strosahl
Date: Wednesday, October 2, 2019 at 1:55 PM
To: Chris Horn, "lustre-discuss@lists.lustre.org"
Subject: Re: [lustre-discuss] Lustre rpm install creating a file that breaks lustre

The lnet modules load, but when I start the lnet service it says that the network is down. I backed everything out, removed the file, and then started the lnet service again and it worked properly.
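One way to pull the relevant kernel messages for a report like the one Chris asks for is sketched below in Python. The keyword list is an assumption (LNet and Lustre kernel messages are normally tagged "LNet:", "LNetError:", "Lustre:" or "LustreError:", and "o2ib" also matches ko2iblnd), and the helper names relevant_lines and read_dmesg are hypothetical.

```python
import subprocess

# Assumed keywords; matching is case-insensitive. "o2ib" also covers "ko2iblnd".
KEYWORDS = ("lnet", "lustre", "o2ib")


def relevant_lines(log_text, keywords=KEYWORDS):
    """Return the log lines that mention any of the keywords."""
    return [
        line
        for line in log_text.splitlines()
        if any(key in line.lower() for key in keywords)
    ]


def read_dmesg():
    """Return the kernel ring buffer as text (may require root privileges)."""
    result = subprocess.run(
        ["dmesg"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

Calling relevant_lines(read_dmesg()) right after a failed start of the lnet service would give a short list of lines to paste into a ticket or a reply to the list.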
Re: [lustre-discuss] Lustre rpm install creating a file that breaks lustre
The lnet modules load, but when I start the lnet service it says that the network is down. I backed everything out, removed the file, and then started the lnet service again and it worked properly.

From: Chris Horn
Sent: Wednesday, October 2, 2019 2:48 PM
To: Kurt Strosahl; lustre-discuss@lists.lustre.org
Subject: [EXTERNAL] Re: [lustre-discuss] Lustre rpm install creating a file that breaks lustre

Might be best to open a ticket for this. What was the nature of the failure?

Chris Horn
Re: [lustre-discuss] Lustre rpm install creating a file that breaks lustre
Might be best to open a ticket for this. What was the nature of the failure?

Chris Horn

From: lustre-discuss on behalf of Kurt Strosahl
Date: Wednesday, October 2, 2019 at 1:30 PM
To: "lustre-discuss@lists.lustre.org"
Subject: [lustre-discuss] Lustre rpm install creating a file that breaks lustre

Good Afternoon,

While getting Lustre 2.10.8 running on a RHEL 7.7 system I found that the RPM install was putting a file in /etc/modprobe.d that was preventing lnet from starting properly. The file is ko2iblnd.conf, which contains the following:

    alias ko2iblnd-opa ko2iblnd
    options ko2iblnd-opa peer_credits=128 peer_credits_hiw=64 credits=1024 concurrent_sends=256 ntx=2048 map_on_demand=32 fmr_pool_size=2048 fmr_flush_trigger=512 fmr_cache=1 conns_per_peer=4
    install ko2iblnd /usr/sbin/ko2iblnd-probe

Our system is running InfiniBand, not Omni-Path, so I'm not sure why this file is being put in place. Removing the file allows lnet to start properly.

w/r,
Kurt J. Strosahl
System Administrator: Lustre, HPC
Scientific Computing Group, Thomas Jefferson National Accelerator Facility
[lustre-discuss] Lustre rpm install creating a file that breaks lustre
Good Afternoon,

While getting Lustre 2.10.8 running on a RHEL 7.7 system I found that the RPM install was putting a file in /etc/modprobe.d that was preventing lnet from starting properly. The file is ko2iblnd.conf, which contains the following:

    alias ko2iblnd-opa ko2iblnd
    options ko2iblnd-opa peer_credits=128 peer_credits_hiw=64 credits=1024 concurrent_sends=256 ntx=2048 map_on_demand=32 fmr_pool_size=2048 fmr_flush_trigger=512 fmr_cache=1 conns_per_peer=4
    install ko2iblnd /usr/sbin/ko2iblnd-probe

Our system is running InfiniBand, not Omni-Path, so I'm not sure why this file is being put in place. Removing the file allows lnet to start properly.

w/r,
Kurt J. Strosahl
System Administrator: Lustre, HPC
Scientific Computing Group, Thomas Jefferson National Accelerator Facility