On Wed, Feb 8, 2017 at 10:25 AM, <george.vasilaka...@stfc.ac.uk> wrote:
> Hi Greg,
>
>> Yes, "bad crc" indicates that the checksums on an incoming message did
>> not match what was provided - ie, the message got corrupted. You
>> shouldn't try and fix that by playing around with the peering settings
>> as it's not a peering bug.
>> Unless there's a bug in the messaging layer causing this (very
>> unlikely), you have bad hardware or a bad network configuration
>> (people occasionally talk about MTU settings?). Fix that and things
>> will work; don't and the only software tweaks you could apply are more
>> likely to result in lost data than a happy cluster.
>> -Greg
>
> I thought of the network initially but I didn't observe packet loss between
> the two hosts and neither host is having trouble talking to the rest of its
> peers. It's these two OSDs that can't talk to each other so I figured it's
> not likely to be a network issue. Network monitoring does show virtually
> non-existent inbound traffic over those links compared to the other ports on
> the switch but no other peerings fail.
>
> Is there something you can suggest to do to drill down deeper?

Sadly no. It being a single route is indeed weird and hopefully
somebody with more networking background can suggest a cause. :)

> Also, am I correct in assuming that I can pull one of these OSDs from the
> cluster as a last resort to cause a remapping to a different to potentially
> give this a quick/temp fix and get the cluster serving I/O properly again?

I'd expect so!