On Mon, Oct 1, 2012 at 8:42 PM, Tommi Virtanen <t...@inktank.com> wrote:
> On Sun, Sep 30, 2012 at 2:55 PM, Andrey Korolyov <and...@xdel.ru> wrote:
>> Short post-mortem: an EX3200 running 12.1R2.9 may begin to drop packets
>> (this seems more likely with 0.51 traffic patterns, which is very strange
>> for L2 switching) when a bunch of 802.3ad pairs, sixteen in my case,
>> are exposed to extremely high load: here, a database benchmark over 700+
>> rbd-backed VMs and a cluster rebalance at the same time. This explains
>> the post-reboot lockups in the igb driver and all the other lockups
>> above. I would greatly appreciate any suggestions, off-list or in this
>> thread, of switch models that do not exhibit such behavior under these
>> conditions.
>
> I don't see how a switch dropping packets would give an ethernet card
> driver any excuse to crash, but I'm simultaneously happy to hear that
> it doesn't seem like Ceph is at fault, and sorry for your troubles.
>
> I don't have an up-to-date 1GbE card recommendation to share, but I
> would recommend making sure you're using a recent Linux kernel.

I stated the cause incorrectly: of course drops cannot cause a lockup
by themselves, but the switch may somehow create a long-lasting
"corrupt" state on the trunk ports, which leads to such lockups in the
Ethernet card. I'll certainly experiment with driver versions and
card/port settings; thanks for the suggestion :)

I'm still investigating the issue, since it is quite hard to reproduce
at the right moment, and I hope to capture this state using software
methods such as tcpdump. However, if the card driver locks up on
something, that may prevent the problematic byte sequence from ever
reaching the packet sniffer.
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html