Here are the first 33k lines or so: https://dl.dropboxusercontent.com/u/104949139/ceph-osd-log.txt
This is a different (but more or less identical) machine from the past set of logs. This system doesn't have quite as many drives in it, so I couldn't spot a same-host error burst, but it's logging tons of the same errors while trying to talk to 10.2.0.34.

On Wed Nov 12 2014 at 10:47:30 AM Gregory Farnum <g...@gregs42.com> wrote:

> On Tue, Nov 11, 2014 at 6:28 PM, Scott Laird <sc...@sigkill.org> wrote:
> > I'm having a problem with my cluster. It's running 0.87 right now, but I
> > saw the same behavior with 0.80.5 and 0.80.7.
> >
> > The problem is that my logs are filling up with "replacing existing (lossy)
> > channel" log lines (see below), to the point where I'm filling drives to
> > 100% almost daily just with logs.
> >
> > It doesn't appear to be network related, because it happens even when
> > talking to other OSDs on the same host.
>
> Well, that means it's probably not physical network related, but there
> can still be plenty wrong with the networking stack... ;)
>
> > The logs pretty much all point to
> > port 0 on the remote end. Is this an indicator that it's failing to resolve
> > port numbers somehow, or is this normal at this point in connection setup?
>
> That's definitely unusual, but I'd need to see a little more to be
> sure if it's bad. My guess is that these pipes are connections from
> the other OSD's Objecter, which is treated as a regular client and
> doesn't bind to a socket for incoming connections.
>
> The repetitive channel replacements are concerning, though; they can
> be harmless in some circumstances but this looks more like the
> connection is simply failing to establish and so it's retrying over
> and over again. Can you restart the OSDs with "debug ms = 10" in their
> config file and post the logs somewhere? (There is not really any
> documentation available on what they mean, but the deeper detail ones
> might also be more understandable to you.)
> -Greg
>
> > The systems that are causing this problem are somewhat unusual; they're
> > running OSDs in Docker containers, but they *should* be configured to run as
> > root and have full access to the host's network stack. They manage to work,
> > mostly, but things are still really flaky.
> >
> > Also, is there documentation on what the various fields mean, short of
> > digging through the source? And how does Ceph resolve OSD numbers into
> > host/port addresses?
> >
> >
> > 2014-11-12 01:50:40.802604 7f7828db8700 0 -- 10.2.0.36:6819/1 >> 10.2.0.36:0/1 pipe(0x1ce31c80 sd=135 :6819 s=0 pgs=0 cs=0 l=1 c=0x1e070580).accept replacing existing (lossy) channel (new one lossy=1)
> >
> > 2014-11-12 01:50:40.802708 7f7816538700 0 -- 10.2.0.36:6830/1 >> 10.2.0.36:0/1 pipe(0x1ff61080 sd=120 :6830 s=0 pgs=0 cs=0 l=1 c=0x1f3db2e0).accept replacing existing (lossy) channel (new one lossy=1)
> >
> > 2014-11-12 01:50:40.803346 7f781ba8d700 0 -- 10.2.0.36:6819/1 >> 10.2.0.36:0/1 pipe(0x1ce31180 sd=125 :6819 s=0 pgs=0 cs=0 l=1 c=0x1e070420).accept replacing existing (lossy) channel (new one lossy=1)
> >
> > 2014-11-12 01:50:40.803944 7f781996c700 0 -- 10.2.0.36:6830/1 >> 10.2.0.36:0/1 pipe(0x1ff618c0 sd=107 :6830 s=0 pgs=0 cs=0 l=1 c=0x1f3d8420).accept replacing existing (lossy) channel (new one lossy=1)
> >
> > 2014-11-12 01:50:40.804185 7f7816538700 0 -- 10.2.0.36:6819/1 >> 10.2.0.36:0/1 pipe(0x1ffd1e40 sd=20 :6819 s=0 pgs=0 cs=0 l=1 c=0x1e070840).accept replacing existing (lossy) channel (new one lossy=1)
> >
> > 2014-11-12 01:50:40.805235 7f7813407700 0 -- 10.2.0.36:6819/1 >> 10.2.0.36:0/1 pipe(0x1ffd1340 sd=60 :6819 s=0 pgs=0 cs=0 l=1 c=0x1b2d6260).accept replacing existing (lossy) channel (new one lossy=1)
> >
> > 2014-11-12 01:50:40.806364 7f781bc8f700 0 -- 10.2.0.36:6819/1 >> 10.2.0.36:0/1 pipe(0x1ffd0b00 sd=162 :6819 s=0 pgs=0 cs=0 l=1 c=0x675c580).accept replacing existing (lossy) channel (new one lossy=1)
> >
> > 2014-11-12 01:50:40.806425 7f781aa7d700 0 -- 10.2.0.36:6830/1 >> 10.2.0.36:0/1 pipe(0x1db29600 sd=143 :6830 s=0 pgs=0 cs=0 l=1 c=0x1f3d9600).accept replacing existing (lossy) channel (new one lossy=1)
> >
> >
> > _______________________________________________
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
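
For anyone wanting to check their own logs for bursts like this, here is a minimal Python sketch that tallies the "replacing existing (lossy) channel" lines per local/remote address pair. It assumes each entry sits on a single line, as in the raw log file. The regular expression is inferred only from the sample lines quoted in this thread, not from any Ceph documentation, and the names `count_replacements` and `SAMPLE` are made up for illustration.

```python
import re
from collections import Counter

# Pattern inferred from the sample lines in this thread (an assumption,
# not an official description of the log format). It captures only the
# two addresses on either side of ">>" and ignores the other fields.
LINE_RE = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+\s+"   # timestamp
    r"\S+\s+-?\d+\s+--\s+"                            # two fields, then "--"
    r"(?P<local>\S+)\s+>>\s+(?P<remote>\S+)\s+"       # local >> remote
    r"pipe\(.*\)\.accept replacing existing \(lossy\) channel"
)


def count_replacements(lines):
    """Count matching log lines per (local, remote) address pair."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            counts[(match.group("local"), match.group("remote"))] += 1
    return counts


# Three entries copied from the log excerpt quoted in this thread.
SAMPLE = [
    "2014-11-12 01:50:40.802604 7f7828db8700 0 -- 10.2.0.36:6819/1 >> "
    "10.2.0.36:0/1 pipe(0x1ce31c80 sd=135 :6819 s=0 pgs=0 cs=0 l=1 "
    "c=0x1e070580).accept replacing existing (lossy) channel (new one lossy=1)",
    "2014-11-12 01:50:40.802708 7f7816538700 0 -- 10.2.0.36:6830/1 >> "
    "10.2.0.36:0/1 pipe(0x1ff61080 sd=120 :6830 s=0 pgs=0 cs=0 l=1 "
    "c=0x1f3db2e0).accept replacing existing (lossy) channel (new one lossy=1)",
    "2014-11-12 01:50:40.803346 7f781ba8d700 0 -- 10.2.0.36:6819/1 >> "
    "10.2.0.36:0/1 pipe(0x1ce31180 sd=125 :6819 s=0 pgs=0 cs=0 l=1 "
    "c=0x1e070420).accept replacing existing (lossy) channel (new one lossy=1)",
]

if __name__ == "__main__":
    # Print the busiest address pairs first.
    for (local, remote), n in count_replacements(SAMPLE).most_common():
        print(f"{local} >> {remote}: {n}")
```

To run it against a real log, pass an open file instead of `SAMPLE`, for example `count_replacements(open(path))` with `path` pointing at the OSD log. Lines that do not match the pattern are skipped silently.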