Right now we're just scraping the output of ifconfig:

ifconfig p2p1 | grep -e 'RX\|TX' | grep packets | awk '{print $3}'

It's clunky, but it works. I'm sure there's a cleaner way, but this was expedient.

QH

On Tue, Mar 31, 2015 at 5:05 PM, Francois Lafont <flafdiv...@free.fr> wrote:

> Hi,
>
> Quentin Hartman wrote:
>
> > Since I have been in ceph-land today, it reminded me that I needed to close
> > the loop on this. I was finally able to isolate this problem down to a
> > faulty NIC on the ceph cluster network. It "worked", but it was
> > accumulating a huge number of Rx errors. My best guess is some receive
> > buffer cache failed? Anyway, having a NIC go weird like that is totally
> > consistent with all the weird problems I was seeing, the corrupted PGs, and
> > the inability for the cluster to settle down.
> >
> > As a result we've added NIC error rates to our monitoring suite on the
> > cluster so we'll hopefully see this coming if it ever happens again.
>
> Good for you. ;)
>
> Could you post here the command that you use to get NIC error rates?
>
> --
> François Lafont
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
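For anyone who wants something less dependent on ifconfig's output format, here is a rough sketch of one alternative. It is not what we actually run. It assumes a Linux host that exposes per-interface counters under /sys/class/net/<iface>/statistics. The function name nic_errors and the NET_SYSFS_ROOT override are made up for illustration; the override only exists so the function can be pointed at a test directory.

```shell
#!/bin/sh
# Sketch: print RX/TX error counters for one interface from sysfs.
# Assumption: Linux with /sys/class/net/<iface>/statistics present.
# NET_SYSFS_ROOT is a hypothetical override used only for testing.
nic_errors() {
    iface="$1"
    stats_dir="${NET_SYSFS_ROOT:-/sys/class/net}/$iface/statistics"

    # Bail out if the interface (or its counters) does not exist.
    if [ ! -d "$stats_dir" ]; then
        echo "no statistics directory for $iface" >&2
        return 1
    fi

    # Each counter file holds a single cumulative integer.
    rx=$(cat "$stats_dir/rx_errors")
    tx=$(cat "$stats_dir/tx_errors")
    echo "rx_errors=$rx tx_errors=$tx"
}
```

Calling nic_errors p2p1 prints both counters on one line. These are cumulative totals, so a monitoring check would need to compare successive samples to get a rate.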