unfortunately i don't think so.  we're pretty good about assigning
addresses, but still human.  i don't see any evidence of a dup'd
address, but i'll keep looking.

thanks

On Thu, Oct 30, 2025 at 8:10 PM Mohr, Rick <[email protected]> wrote:
>
> Michael,
>
> It might be a long shot, but is there any chance another machine has the same 
> IP address as the one having problems?
>
> --Rick
>
>
>
> On 10/30/25, 3:09 PM, "lustre-discuss on behalf of Michael DiDomenico via 
> lustre-discuss" wrote:
> our network is running 2.15.6 everywhere on rhel9.5. we recently built a new
> machine using 2.15.7 on rhel9.6 and i'm seeing a strange problem. the client
> is ethernet connected to ten lnet routers which bridge ethernet to
> infiniband. i can mount the client just fine and read/write data, but
> several hours later the client marks all the routers offline. the only
> recovery is to lazy unmount, lustre_rmmod, and then restart the lustre mount.
>
> nothing unusual comes out in the journal/dmesg logs. to lustre it "looks"
> like someone pulled the network cable, but there's no evidence that this has
> happened physically or even at the switch/software layers.
>
> we upgraded two other machines to see if the problem replicates, but so far
> it hasn't. the only significant difference between the three machines is
> that the one with the problem has heavy container (podman) usage, while the
> others have zero. i'm not sure if this is a cause or just a red herring. any
> suggestions?
>
>
_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org