thanks. after two days of digging, i think i finally figured out that we have a layer 2 forwarding problem. i'm not the network guy, so i'm not digging into it any deeper, but it appears that there are either malfunctioning LACP trunks or, more likely, a misconfigured vPC connection inside the menagerie of switches the network team built.
the network is too complicated to describe here, but the base issue is that there are two switches 'supposedly' operating jointly that don't seem to be sharing their CAM/ARP tables correctly. for whatever reason, packets get duped to the switch that does not have the destination machine, and since there's no ARP/CAM entry, that switch just blasts the packet out all of its ports. it's not clear why the packets are being sent to layer 2 devices where the destination doesn't exist, but it's clear there's something broken in the spanning tree database. it's also not clear why it only affects one of the vlans and not all of them. but again, not the network guy... and for once it is the network... :)

On Wed, Jun 16, 2021 at 12:48 AM Greg Lindahl <[email protected]> wrote:
>
> On Mon, Jun 14, 2021 at 12:38:50PM -0400, Michael Di Domenico wrote:
> > i got roped into troubleshooting an odd network issue. we have a mix
> > of cisco (mostly nexus) gear spread over our facility. on one
> > particular vlan it's operating as if it's a hub instead of switch.
>
> I have run into this situation when I have servers that have incoming
> UDP traffic and never talk or do TCP. The switches have no idea where the
> server is, so they broadcast all of the incoming packets.
>
> An ARP reply or connecting with TCP tells the switch which port to use.
>
> -- greg
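for anyone following along, the flooding behavior discussed in this thread can be sketched with a toy learning switch. this is only an illustration of generic MAC learning and unknown-unicast flooding, not a model of the actual Nexus/vPC setup: the class name, port numbers, and MAC strings are made up, and it ignores VLANs, table aging, and any table sync between paired switches.

```python
class LearningSwitch:
    """Toy layer 2 switch: learns source MACs, floods unknown destinations."""

    def __init__(self, ports):
        self.ports = list(ports)
        self.mac_table = {}  # MAC address -> port it was last seen on

    def receive(self, in_port, src_mac, dst_mac):
        """Return the list of ports the frame is sent out of."""
        # Learn (or refresh) which port the sender lives on.
        self.mac_table[src_mac] = in_port

        out_port = self.mac_table.get(dst_mac)
        if out_port is None:
            # Unknown destination: flood out every port except the ingress
            # port. This is the "acts like a hub" symptom.
            return [p for p in self.ports if p != in_port]
        if out_port == in_port:
            # Destination is on the same port the frame came in on; drop it.
            return []
        return [out_port]


# Hypothetical ports and MAC addresses, for illustration only.
sw = LearningSwitch(ports=[1, 2, 3, 4])
flooded = sw.receive(1, "aa:aa", "bb:bb")  # bb:bb never seen, so flood
learned = sw.receive(2, "bb:bb", "aa:aa")  # bb:bb talks, switch learns port 2
unicast = sw.receive(1, "aa:aa", "bb:bb")  # now goes out port 2 only
print(flooded, learned, unicast)
```

a host that only receives and never sends stays out of `mac_table`, so every frame addressed to it takes the flood branch. the same thing happens on a switch whose table never gets the entry its partner learned.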
