Megan,

You wrote:
PS. [I am willing to add/contribute to the http://wiki.lustre.org/Infiniband_Configuration_Howto but I think my account for wiki editing has expired (at least the one I thought I had did not work).
Thank you for your offer! Did you try http://wiki.lustre.org/Special:PasswordReset? If that didn’t work then I think that you could email lustre....@lists.opensfs.org.

-Cory

On 6/24/20, 3:33 PM, "lustre-discuss on behalf of Ms. Megan Larko" <lustre-discuss-boun...@lists.lustre.org on behalf of dobsonu...@gmail.com> wrote:

On 22 Jun 2020 "guru.novice" wrote:

Hi, all

We set up a cluster using a mix of the mlx4 and mlx5 drivers, and everything goes well. Later I found something on the wiki, http://wiki.lustre.org/Infiniband_Configuration_Howto and http://lists.onebuilding.org/pipermail/lustre-devel-lustre.org/2016-May/003842.html, which was last edited in 2016. So do I need to change the LNet configuration as described on this page? Or has the problem been resolved in newer versions (like 2.12.x)? Also, where can I find more details? Any suggestions would be appreciated. Thanks!

Hello guru.novice,

Lustre 2.12.x has some nice LNet configuration abilities. The old /etc/modprobe.d/ config files have been superseded by /etc/lnet.conf. An install of Lustre 2.12.x provides a sample of this file (with the lines commented out). Our experience has shown that not all lines are necessary; edit to suit.

Lustre 2.12.x has Multi-Rail (MR) on by default, so Lustre will attempt to automatically find active and viable LNet paths to use. This should have no issue with your mixed mlx4/mlx5 environment; we have some mixed IB and Ethernet setups that work. To explicitly use MR one may set "Multi-Rail: true" in the "peer" NID section of the /etc/lnet.conf file. But that was not necessary for us.
We used a simple /etc/lnet.conf for MR systems:

File stub: /etc/lnet.conf

    net:
        - net type: o2ib0
          local NI(s):
            - interfaces:
                  0: ib0
        - net type: o2ib777
          local NI(s):
            - interfaces:
                  0: ib0:1

This allowed LNet to use any NID on o2ib0 and o2ib777. Whatever is placed in the /etc/lnet.conf file is loaded into the kernel modules via the Lustre starting mechanism (CentOS uses /usr/lib/systemd/system).

Because we are choosing _not_ to use MR on a different box, we explicitly defined the available routes in /etc/lnet.conf using the lines:

    route:
        - net: tcp
          gateway: 10.10.10.101@o2ib11111
        - net: tcp
          gateway: 10.10.10.102@o2ib1111

And so on up to 10.10.10.116@o2ib1111

In CentOS7, the /usr/lib/systemd/system/lnet.service file is reproduced below. (details: lustre-2.12.4-1 with Mellanox OFED version 4.7-1.0.0.1 and kernel 3.10.957.27.2.el7)

File lnet.service:

    [Unit]
    Description=lnet management

    Requires=network-online.target
    After=network-online.target openibd.service rdma.service opa.service

    ConditionPathExists=!/proc/sys/lnet/

    [Service]
    Type=oneshot
    RemainAfterExit=true
    ExecStart=/sbin/modprobe lnet
    ExecStart=/usr/sbin/lnetctl lnet configure
    # Do NOT use the next line if you want MR function
    ExecStart=/usr/sbin/lnetctl set discovery 0
    # The file with router, credit and similar info
    ExecStart=/usr/sbin/lnetctl import /etc/lnet.conf
    # Omit --non_mr on the next line if you want to use MR
    ExecStart=/usr/sbin/lnetctl peer add --nid 10.10.10.[101-116]@o2ib11111 --non_mr
    ExecStop=/usr/sbin/lustre_rmmod ptlrpc
    ExecStop=/usr/sbin/lnetctl lnet unconfigure
    ExecStop=/usr/sbin/lustre_rmmod libcfs ldiskfs

    [Install]
    WantedBy=multi-user.target

I hope this info can help you in the right direction.

Cheers,
megan

PS. [I am willing to add/contribute to the http://wiki.lustre.org/Infiniband_Configuration_Howto but I think my account for wiki editing has expired (at least the one I thought I had did not work).
Our site had issues with Multi-Rail "not socially distancing appropriately" from other LNet networks so in our particular case we disabled MR. (An entirely different experience.) ]
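
For anyone who would rather not type the sixteen route entries ("and so on up to 10.10.10.116") by hand, here is a minimal sketch of one way to generate the route: stanza for /etc/lnet.conf. It is only an illustration, not part of Lustre: the function name route_stanza and its parameters are made up for this example, and the gateway network name is passed in as an argument because the message above shows it spelled two ways (o2ib11111 and o2ib1111), so use whichever is correct at your site.

```python
def route_stanza(net, gateway_net, prefix, first, last):
    """Return the text of an lnet.conf 'route:' section.

    One entry is emitted per gateway host, for the last address
    octet running from `first` to `last` inclusive.
    (Hypothetical helper for illustration; not a Lustre tool.)
    """
    lines = ["route:"]
    for host in range(first, last + 1):
        lines.append(f"    - net: {net}")
        lines.append(f"      gateway: {prefix}.{host}@{gateway_net}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Assumed values taken from the example above; adjust for your site.
    print(route_stanza("tcp", "o2ib11111", "10.10.10", 101, 116), end="")
```

The output can be pasted into /etc/lnet.conf below the net: section. Review it against your own gateway list before importing it with lnetctl.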
_______________________________________________
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org