>> If your public network is saturated, that actually is a problem: the last 
>> thing you want is to add recovery traffic or to slow down heartbeats.  For 
>> most people, it isn’t saturated.
> 
> See Frank Schilder's post about a meltdown which he believes could have
> been caused by beacon/heartbeat messages being drowned out by other
> recovery/IO traffic, not at the network level, but at the processing
> level on the OSDs.
>
> If indeed there are cases where the OSDs are too busy to send (or
> process) heartbeat/beacon messaging, it wouldn't help to have a separate
> network?

Agreed.  Many times I’ve had to argue that CPUs that aren’t nearly saturated 
*aren’t* necessarily overkill, especially with fast media where latency hurts.  
It would be interesting to consider an architecture where a core/HT is 
dedicated to the control plane.

That said, I’ve seen a situation where excessive CPU appeared to affect latency 
by allowing the CPUs to drop into deeper C-states; this especially affected 
network traffic (2x dual 10GE).  Curiously, some systems in the same cluster 
experienced this but some didn’t.  
There was a mix of Sandy Bridge and Ivy Bridge IIRC, as well as different 
Broadcom chips.  Despite an apparent alignment with older vs newer Broadcom 
chips, I never fully characterized the situation: replacing one of the Broadcom 
NICs in an affected system with the model in use on unaffected systems didn’t 
resolve the issue.  It’s possible that replacing the other would have made a 
difference.
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
