Here are the first 33k lines or so:
https://dl.dropboxusercontent.com/u/104949139/ceph-osd-log.txt

This is from a different (but more or less identical) machine than the last set
of logs.  This system doesn't have quite as many drives in it, so I
couldn't spot a same-host error burst, but it's logging tons of the same
errors while trying to talk to 10.2.0.34.
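
A rough sketch for tallying these messages per remote address, in case it helps narrow down which peers are involved. It assumes each entry sits on a single line in the raw log (the excerpt quoted below is wrapped by the mail client) and follows the "-- LOCAL >> PEER pipe(...).accept replacing existing (lossy) channel" layout shown there. The names `LINE_RE` and `tally_replacements` are made up for illustration and are not part of Ceph.

```python
import re
from collections import Counter

# Assumed layout of one unwrapped log entry (taken from the quoted excerpt):
#   <date> <time> <thread> 0 -- LOCAL >> PEER pipe(...).accept replacing existing (lossy) channel ...
LINE_RE = re.compile(
    r"-- (?P<local>\S+) >> (?P<peer>\S+) pipe\(.*?\)\.accept "
    r"replacing existing \(lossy\) channel"
)


def tally_replacements(lines):
    """Count 'replacing existing (lossy) channel' entries per peer IP."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            # PEER looks like "10.2.0.36:0/1"; keep only the address part.
            peer_ip = match.group("peer").split(":", 1)[0]
            counts[peer_ip] += 1
    return counts
```

Passing an open log file to `tally_replacements` and calling `.most_common(10)` on the result would list the ten peers that show up most often.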

On Wed Nov 12 2014 at 10:47:30 AM Gregory Farnum <g...@gregs42.com> wrote:

> On Tue, Nov 11, 2014 at 6:28 PM, Scott Laird <sc...@sigkill.org> wrote:
> > I'm having a problem with my cluster.  It's running 0.87 right now, but I
> > saw the same behavior with 0.80.5 and 0.80.7.
> >
> > The problem is that my logs are filling up with "replacing existing
> > (lossy) channel" log lines (see below), to the point where I'm filling
> > drives to 100% almost daily just with logs.
> >
> > It doesn't appear to be network related, because it happens even when
> > talking to other OSDs on the same host.
>
> Well, that means it's probably not physical network related, but there
> can still be plenty wrong with the networking stack... ;)
>
> > The logs pretty much all point to
> > port 0 on the remote end.  Is this an indicator that it's failing to
> > resolve port numbers somehow, or is this normal at this point in
> > connection setup?
>
> That's definitely unusual, but I'd need to see a little more to be
> sure if it's bad. My guess is that these pipes are connections from
> the other OSD's Objecter, which is treated as a regular client and
> doesn't bind to a socket for incoming connections.
>
> The repetitive channel replacements are concerning, though — they can
> be harmless in some circumstances but this looks more like the
> connection is simply failing to establish and so it's retrying over
> and over again. Can you restart the OSDs with "debug ms = 10" in their
> config file and post the logs somewhere? (There is not really any
> documentation available on what they mean, but the deeper detail ones
> might also be more understandable to you.)
> -Greg
>
> >
> > The systems that are causing this problem are somewhat unusual; they're
> > running OSDs in Docker containers, but they *should* be configured to
> > run as root and have full access to the host's network stack.  They
> > manage to work, mostly, but things are still really flaky.
> >
> > Also, is there documentation on what the various fields mean, short of
> > digging through the source?  And how does Ceph resolve OSD numbers into
> > host/port addresses?
> >
> >
> > 2014-11-12 01:50:40.802604 7f7828db8700  0 -- 10.2.0.36:6819/1 >>
> > 10.2.0.36:0/1 pipe(0x1ce31c80 sd=135 :6819 s=0 pgs=0 cs=0 l=1
> > c=0x1e070580).accept replacing existing (lossy) channel (new one lossy=1)
> >
> > 2014-11-12 01:50:40.802708 7f7816538700  0 -- 10.2.0.36:6830/1 >>
> > 10.2.0.36:0/1 pipe(0x1ff61080 sd=120 :6830 s=0 pgs=0 cs=0 l=1
> > c=0x1f3db2e0).accept replacing existing (lossy) channel (new one lossy=1)
> >
> > 2014-11-12 01:50:40.803346 7f781ba8d700  0 -- 10.2.0.36:6819/1 >>
> > 10.2.0.36:0/1 pipe(0x1ce31180 sd=125 :6819 s=0 pgs=0 cs=0 l=1
> > c=0x1e070420).accept replacing existing (lossy) channel (new one lossy=1)
> >
> > 2014-11-12 01:50:40.803944 7f781996c700  0 -- 10.2.0.36:6830/1 >>
> > 10.2.0.36:0/1 pipe(0x1ff618c0 sd=107 :6830 s=0 pgs=0 cs=0 l=1
> > c=0x1f3d8420).accept replacing existing (lossy) channel (new one lossy=1)
> >
> > 2014-11-12 01:50:40.804185 7f7816538700  0 -- 10.2.0.36:6819/1 >>
> > 10.2.0.36:0/1 pipe(0x1ffd1e40 sd=20 :6819 s=0 pgs=0 cs=0 l=1
> > c=0x1e070840).accept replacing existing (lossy) channel (new one lossy=1)
> >
> > 2014-11-12 01:50:40.805235 7f7813407700  0 -- 10.2.0.36:6819/1 >>
> > 10.2.0.36:0/1 pipe(0x1ffd1340 sd=60 :6819 s=0 pgs=0 cs=0 l=1
> > c=0x1b2d6260).accept replacing existing (lossy) channel (new one lossy=1)
> >
> > 2014-11-12 01:50:40.806364 7f781bc8f700  0 -- 10.2.0.36:6819/1 >>
> > 10.2.0.36:0/1 pipe(0x1ffd0b00 sd=162 :6819 s=0 pgs=0 cs=0 l=1
> > c=0x675c580).accept replacing existing (lossy) channel (new one lossy=1)
> >
> > 2014-11-12 01:50:40.806425 7f781aa7d700  0 -- 10.2.0.36:6830/1 >>
> > 10.2.0.36:0/1 pipe(0x1db29600 sd=143 :6830 s=0 pgs=0 cs=0 l=1
> > c=0x1f3d9600).accept replacing existing (lossy) channel (new one lossy=1)
> >
> >
> >
> > _______________________________________________
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
>