Re: [lustre-discuss] lnetctl fails to recreate exact config when importing exported lnet.conf

2020-09-06 Thread Angelos Ching
September 5, 2020 1:04 AM, "Mohr Jr, Richard Frank"  wrote:
> So your server has both tcp and o2ib NIDs, and you
> want the server to route requests from tcp clients to other resources on the 
> o2ib network. But when
> you mount Lustre, you want the client to use the server’s o2ib NID instead of 
> mounting with the
> server’s tcp NID.

Correct.

Actually, the pair of Lnet routers are themselves also serving as MDS & OSS, 
with 4 more MDS/OSS that are only on o2ib serving yet another file system. With 
the extraneous peer added by route add, the Lnet router prints the following 
kernel message:
> LNetError: 34250:0:(lib-move.c:4259:lnet_parse()) 10.4.7.145@tcp, src 
> 10.4.7.145@tcp: Bad dest nid 10.1.4.24@o2ib (it's my nid but on a different 
> network)

This is worked around by manually adding the routers as peers with both NIDs 
prior to route add; whether o2ib or tcp is used as the primary NID does not seem 
to matter. I also just discovered that performing an lnetctl discover with the 
router's TCP NID, either before or after route add, also yields a usable Lnet. 
After discovering the latter workaround, I've implemented it using a systemd 
drop-in for the lnet.service unit.
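
A drop-in along these lines could run the discovery each time lnet.service 
starts. This is only a sketch, not the exact setup described above: the helper 
name write_discover_dropin, the drop-in file name 10-discover-routers.conf and 
the /usr/sbin/lnetctl path are assumptions; the NIDs in the usage comment are 
the routers' TCP NIDs mentioned in this thread.

```shell
# Sketch: generate a systemd drop-in that runs "lnetctl discover" for each
# router NID after lnet.service has started.
# Assumptions: lnetctl lives at /usr/sbin/lnetctl; the file name is arbitrary.

# Usage: write_discover_dropin DROPIN_DIR NID...
write_discover_dropin() {
    dir=$1
    shift
    mkdir -p "$dir" || return 1
    {
        printf '[Service]\n'
        for nid in "$@"; do
            # One ExecStartPost line per router TCP NID.
            printf 'ExecStartPost=/usr/sbin/lnetctl discover %s\n' "$nid"
        done
    } > "$dir/10-discover-routers.conf"
}

# Example (as root), followed by "systemctl daemon-reload" and a restart of lnet:
# write_discover_dropin /etc/systemd/system/lnet.service.d 10.4.7.24@tcp 10.4.7.25@tcp
```

Generating the file from a function keeps the NID list in one place; the same 
[Service] section could of course be written by hand.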

Best regards.
Angelos Ching

E: angelosch...@clustertech.com
P: +852-2655-6138
A: 210-213, Lake Side 1, Science Park, Hong Kong
W: http://clustertech.com
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] lnetctl fails to recreate exact config when importing exported lnet.conf

2020-09-04 Thread Mohr Jr, Richard Frank


> On Sep 4, 2020, at 11:26 AM, Angelos Ching  
> wrote:
> 
> If I don't add the "Lnet router + Server" peers manually as multi-rail 
> enabled peers before route add, a non-multi-rail
> peer with only the TCP NID is added by the route add command for each "Lnet 
> router + Server" (as seen in lines 76-83 of https://pastebin.com/h3wHyCM7), and 
> the existence of those 2 peers interferes with normal Lnet communication, 
> with the server side kernel message printing "Bad dest nid n.n.n.n@o2ib (it's my 
> nid but on a different network)"

Sorry.  I think I misread your original email.  So your server has both tcp and 
o2ib NIDs, and you want the server to route requests from tcp clients to other 
resources on the o2ib network.  But when you mount Lustre, you want the client 
to use the server’s o2ib NID instead of mounting with the server’s tcp NID.  Is 
that correct?

Rick



Re: [lustre-discuss] lnetctl fails to recreate exact config when importing exported lnet.conf

2020-09-04 Thread Mohr Jr, Richard Frank


> On Sep 4, 2020, at 12:11 AM, Angelos Ching  
> wrote:
> 
> All steps below carried out on Lustre client:
> 
> 1. Restart lnet service with empty /etc/lnet.conf
> 2. lnetctl net add: TCP network using Ethernet
> 3. lnetctl peer add: 2 peers with "Lnet router + server"@o2ib,tcp NIDs

The commands you ran were:

[root@access2 ~]# lnetctl peer add --nid 10.1.4.24@o2ib,10.4.7.24@tcp
[root@access2 ~]# lnetctl peer add --nid 10.1.4.25@o2ib,10.4.7.25@tcp

Commands like this can be used when a node has a multirail setup, such as when a 
node has multiple interfaces on the same network.  But for your routers, it 
looks like the tcp network is available to the client and the o2ib network is 
available to the server.  Since those interfaces are not on the same network, 
you don’t need to add both of them as a peer.

> 4. lnetctl route add: 2 gateways to o2ib network using "Lnet router +
> server"@TCP NID

[root@access2 ~]# lnetctl route add --net o2ib --gateway 10.4.7.24@tcp
[root@access2 ~]# lnetctl route add --net o2ib --gateway 10.4.7.25@tcp

These should be the only commands you need to run to configure your routing.

-Rick




Re: [lustre-discuss] lnetctl fails to recreate exact config when importing exported lnet.conf

2020-09-04 Thread Angelos Ching
Hi Rick,

If I don't add the "Lnet router + Server" peers manually as multi-rail enabled 
peers before route add, a non-multi-rail peer with only the TCP NID is added by 
the route add command for each "Lnet router + Server" (as seen in lines 76-83 of 
https://pastebin.com/h3wHyCM7). The existence of those 2 extra peers interferes 
with normal Lnet communication, and the server side kernel message reads "Bad 
dest nid n.n.n.n@o2ib (it's my nid but on a different network)".

This is also what happens when lnet.conf is imported by lnetctl: if lnetctl 
imports the peers before the routes, no extraneous peer entries are created and 
everything works as expected (as output by line 16). If lnetctl imports the 
routes before the peers, the scenario described in the previous paragraph occurs 
and results in a non-usable Lnet for the client. The order in which lnetctl 
imports each section depends on its order of appearance inside the YAML file.
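
Since the import order follows the file order, one way to apply the reordering 
is to move the peer section ahead of the route section in the exported file. 
The following is a minimal sketch under stated assumptions: the function name 
reorder_lnet_conf is hypothetical, it assumes the top-level keys (such as 
"route:" and "peer:") start in column 0 as in the exported file, and it is a 
plain text filter rather than a YAML parser.

```shell
# Sketch: print an lnet.conf with the top-level "peer:" section moved in front
# of the top-level "route:" section. All other sections keep their order.
# Assumption: top-level keys start in column 0; nested lines are indented.

# Usage: reorder_lnet_conf FILE > NEWFILE
reorder_lnet_conf() {
    awk '
        # A non-indented "key:" line starts a new top-level section.
        /^[A-Za-z_][A-Za-z0-9_-]*:/ {
            n++
            name[n] = $0
            sub(/:.*/, "", name[n])
            body[n] = ""
        }
        {
            # Lines before the first section (if any) are kept as a preamble.
            if (n == 0) { preamble = preamble $0 "\n"; next }
            body[n] = body[n] $0 "\n"
        }
        END {
            printf "%s", preamble
            for (i = 1; i <= n; i++) {
                if (name[i] == "route" && !r) r = i
                if (name[i] == "peer" && !p) p = i
            }
            # Only move the peer section if it currently follows the routes.
            move = (r && p && p > r)
            for (i = 1; i <= n; i++) {
                if (move && i == r) printf "%s", body[p]
                if (move && i == p) continue
                printf "%s", body[i]
            }
        }
    ' "$1"
}

# Example: write the result to a new file and review it before replacing
# /etc/lnet.conf.
# reorder_lnet_conf /etc/lnet.conf > /etc/lnet.conf.reordered
```

A file that already lists the peers first passes through unchanged, so the 
filter can be run after every export.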

Best regards,
Angelos Ching

E: angelosch...@clustertech.com
P: +852-2655-6138
A: 210-213, Lake Side 1, Science Park, Hong Kong
W: http://clustertech.com



Re: [lustre-discuss] lnetctl fails to recreate exact config when importing exported lnet.conf

2020-09-04 Thread Peter Jones
i...@whamcloud.com


On 2020-09-04, 3:36 AM, "lustre-discuss on behalf of Angelos Ching" 
 wrote:

Hi Aurélien,

May I have some pointers on to whom my account request for the Jira should 
be sent?

Thanks,
Angelos
(Sent from mobile, please pardon me for typos and cursoriness.)



Re: [lustre-discuss] lnetctl fails to recreate exact config when importing exported lnet.conf

2020-09-04 Thread Degremont, Aurelien
Hi Angelos,

Bug reports can be filed at https://jira.whamcloud.com/


Aurélien

On 04/09/2020 06:11, "lustre-discuss on behalf of Angelos Ching" wrote:

Dear all,

I think I've encountered a bug in lnetctl but am not sure where to submit a
bug report:

Summary:
It's expected that the Lnet config on a node can be recreated on
lnet.service start up by saving the config using: lnetctl export
--backup > /etc/lnet.conf
But the ordering of sections within the YAML file causes extraneous peer
entries to be created when used in combination with routing, thus breaking
Lnet routing / node communication, with server side dmesg showing "Bad dest
nid n.n.n.n@o2ib (it's my nid but on a different network)"

Environment:
Client: CentOS 7.8, Lustre 2.12.5-ib, MLNX OFED 4.9-0.1.7.1
Lnet router + server: CentOS 7.7, Lustre 2.12.4-ib, MLNX OFED 4.7-3.2.9.0

Steps to reproduce:
(Listing 1) Server side Lnet config (peer list omitted for conciseness):
https://pastebin.com/DH6HAt5a
(Listing 2) Full command listing and output on client side is reproduced
here: https://pastebin.com/h3wHyCM7

All steps below carried out on Lustre client:

1. Restart lnet service with empty /etc/lnet.conf
2. lnetctl net add: TCP network using Ethernet
3. lnetctl peer add: 2 peers with "Lnet router + server"@o2ib,tcp NIDs
4. lnetctl route add: 2 gateways to o2ib network using "Lnet router +
server"@TCP NID
5. lnetctl export: with --backup to /etc/lnet.conf; check the saved file
and confirm Lnet is configured with 2 peers and 2 gateways (Listing 2:
37-47)
6. Mount o2ib exported Lustre volume and confirm volume functioning
correctly; unmount volume
7. Restart lnet.service and check the lnet configuration; there are now 2
extra peer entries that reference only the TCP NID of the "Lnet router +
server", along with the 2 manually configured peers that reference both o2ib
and tcp NIDs (Listing 2: 75-93)
8. Client fails to mount o2ib exported volume; server side kernel
message shows "Bad dest nid n.n.n.n@o2ib (it's my nid but on a different
network)"

9. If we reorder the peer list to go before the route list in
/etc/lnet.conf (Listing 2: 16), then lnet is properly configured
with 2 peers on service restart and everything works as expected.
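
Steps 2 to 4 can be sketched as a small script. The peer and route commands are 
the ones quoted elsewhere in this thread; the run wrapper, the DRY_RUN switch, 
the function name configure_client and the default interface name eth0 are 
assumptions added for illustration.

```shell
# Sketch of steps 2-4: add the tcp net, then the peers, then the routes.
# Set DRY_RUN=1 to print the commands instead of executing them.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Usage: configure_client [ETHERNET_INTERFACE]
configure_client() {
    ifname=${1:-eth0}   # assumption: name of the client's Ethernet interface
    run lnetctl net add --net tcp --if "$ifname" &&
    # Peers first, so that route add does not create TCP-only peer entries.
    run lnetctl peer add --nid 10.1.4.24@o2ib,10.4.7.24@tcp &&
    run lnetctl peer add --nid 10.1.4.25@o2ib,10.4.7.25@tcp &&
    run lnetctl route add --net o2ib --gateway 10.4.7.24@tcp &&
    run lnetctl route add --net o2ib --gateway 10.4.7.25@tcp
}

# Step 5 then saves the running config:
#   lnetctl export --backup > /etc/lnet.conf
```

Running it once with DRY_RUN=1 shows the exact command order before anything 
is changed on the client.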

Best regards,

--
Angelos Ching
ClusterTech Limited

Tel : +852-2655-6138
Fax : +852-2994-2101
Address : Unit 211-213, Lakeside 1, 8 Science Park West Ave., Shatin, Hong 
Kong

Got praises or room for improvements? http://bit.ly/TellAngelos





Re: [lustre-discuss] lnetctl fails to recreate exact config when importing exported lnet.conf

2020-09-04 Thread Angelos Ching
Hi Aurélien,

May I have some pointers on to whom my account request for the Jira should be 
sent?

Thanks,
Angelos
(Sent from mobile, please pardon me for typos and cursoriness.)
