If deleting and re-adding it restores the status to up, then this sounds like a bug to me.
Can you enable debug tracing, reproduce the issue, and add this information to a ticket? To enable/gather debug:

# lctl set_param debug=+net
<reproduce issue>
# lctl dk > /tmp/dk.log

You can create a ticket at https://jira.whamcloud.com/
Please provide the dk.log with the ticket.

Thanks,
Chris Horn

From: lustre-discuss <lustre-discuss-boun...@lists.lustre.org> on behalf of 腐朽银 via lustre-discuss <lustre-discuss@lists.lustre.org>
Date: Friday, February 17, 2023 at 2:53 AM
To: lustre-discuss@lists.lustre.org <lustre-discuss@lists.lustre.org>
Subject: [lustre-discuss] LNet nid down after something changed the NICs

Hi,

I encountered a problem when using the Lustre client on Kubernetes with kubenet. I would be very happy if you could help me. My LNet configuration is:

net:
    - net type: lo
      local NI(s):
        - nid: 0@lo
          status: up
    - net type: tcp
      local NI(s):
        - nid: 10.224.0.5@tcp
          status: up
          interfaces:
              0: eth0

This works. But after I deploy or delete a pod on the node, the NID goes down:

        - nid: 10.224.0.5@tcp
          status: down
          interfaces:
              0: eth0

Kubernetes uses veth pairs, so it adds or deletes network interfaces when pods are deployed or deleted, but it does not touch the eth0 NIC. I can fix this by deleting the tcp net with `lnetctl net del` and re-adding it with `lnetctl net add`, but I have to do this every time a pod is scheduled to this node.

My node OS is Ubuntu 18.04 with kernel 5.4.0-1101-azure. The Lustre client was built by myself from 2.15.1.

Is this expected LNet behavior, or did I get something wrong? I rebuilt and tested it several times and got the same problem.

Regards,
Chuanjun
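For reference, the workaround described above amounts to roughly the following (a sketch only; the net type tcp and interface eth0 are taken from the configuration shown in the message and may differ on other nodes):

# lnetctl net del --net tcp
# lnetctl net add --net tcp --if eth0
# lnetctl net show

After the re-add, `lnetctl net show` should report the 10.224.0.5@tcp NID as up again, until the next pod scheduling event on the node triggers the problem once more.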