Hello Ceph Users,

Yesterday I had a defective GBIC in one node of my 10-node Ceph cluster.

The GBIC was still partly working but had 50% packet loss: some packets went
through, some did not.

What happened was that the whole cluster did not service requests in time;
there were lots of timeouts and so on until the problem was isolated.
Monitors and OSDs were asked for data but did not answer or answered late.

I am wondering: here we have a highly redundant network setup and a highly
redundant piece of software, but a small network fault brings down the whole
cluster.

Is there anything that can be configured or changed in Ceph so that
availability will be better in case of flapping networks?

I understand it is not a Ceph problem but a network problem, but maybe
something can be learned from such incidents?

Thanks
  Christoph
-- 
Christoph Adomeit
GATWORKS GmbH
Reststrauch 191
41199 Moenchengladbach
Sitz: Moenchengladbach
Amtsgericht Moenchengladbach, HRB 6303
Geschaeftsfuehrer:
Christoph Adomeit, Hans Wilhelm Terstappen

christoph.adom...@gatworks.de     Internetloesungen vom Feinsten
Fon. +49 2166 9149-32                      Fax. +49 2166 9149-10
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com