On Tue, Mar 04, 2025 at 06:46:20PM +0000, Eugen Block wrote:
> > It's almost always the network ;-)
>
> I know, I have memorized your famous tweet about Ceph being the best network
> monitor 😄
It seems to be ;-)
When I spun up my small cluster, I used a no-name 10G switch. Ceph
complained bitterly about massive latencies across the board,
mostly around one node (hosting OSD, MDS, monitors). Running iperf3
between nodes showed:
- A and B: 10G line speed
- A and C: barely 10 MBit/s ... but ethtool says 10G ... WAT?
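For anyone who wants to script that kind of pairwise check, here is a rough sketch. It assumes an iperf3 server (`iperf3 -s`) is already running on each target node and that the test is plain TCP; the hostnames and the 1 Gbit/s threshold are made up for illustration, not what I actually ran.

```python
import json
import subprocess


def received_bps(report: dict) -> float:
    """Pull the receiver-side throughput out of an iperf3 JSON report (TCP test)."""
    return report["end"]["sum_received"]["bits_per_second"]


def run_iperf3(server: str, seconds: int = 5) -> dict:
    """Run an iperf3 client against `server` and return the parsed JSON report."""
    out = subprocess.run(
        ["iperf3", "-c", server, "-t", str(seconds), "-J"],
        capture_output=True, text=True, check=True,
    ).stdout
    return json.loads(out)


def slow_links(results: dict, min_bps: float = 1e9) -> list:
    """Return the targets whose measured throughput is below min_bps."""
    return sorted(name for name, bps in results.items() if bps < min_bps)


def main() -> None:
    # Hypothetical hostnames; run this from node A.
    targets = ["node-b", "node-c"]
    results = {t: received_bps(run_iperf3(t)) for t in targets}
    for target, bps in results.items():
        print(f"{target}: {bps / 1e6:.0f} Mbit/s")
    print("suspect links:", slow_links(results))
```

Calling `main()` on one node tests the links from that node only; to cover every pair, run it from each node in turn.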
Checked the switch management UI: turns out, one of the ports seemed
to be bad, with the error counters going brrrrrrr at high speed.
Replacing the switch with a Mikrotik one (keeping the same NICs and
DAC cables) solved the problem.
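I found the error counters in the switch UI, but a similar check can be done on the host side. A minimal sketch, assuming Linux nodes that expose NIC statistics under /sys/class/net/<iface>/statistics; the interface name and sampling interval are placeholders, and a bad switch port will not necessarily show up in host counters:

```python
import time
from pathlib import Path

# Standard Linux per-interface counters exposed via sysfs.
COUNTERS = ("rx_errors", "rx_crc_errors", "rx_dropped", "tx_errors")


def read_counters(iface: str, base: str = "/sys/class/net") -> dict:
    """Read the current error counters for one interface."""
    stats = Path(base) / iface / "statistics"
    return {name: int((stats / name).read_text()) for name in COUNTERS}


def counter_deltas(before: dict, after: dict) -> dict:
    """Return only the counters that changed between two samples."""
    return {
        name: after[name] - before[name]
        for name in before
        if after[name] != before[name]
    }


def sample(iface: str, seconds: float = 10.0) -> dict:
    """Sample the counters twice and report what grew in between."""
    before = read_counters(iface)
    time.sleep(seconds)
    return counter_deltas(before, read_counters(iface))
```

For example, `sample("eth0")` returns an empty dict on a healthy link and something like a growing `rx_crc_errors` count on a bad one.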
Kind regards,
Alex.
--
"Opportunity is missed by most people because it is dressed in overalls and
looks like work." -- Thomas A. Edison