I guess the point I was trying to make is that, ideally, Ceph would isolate
its logging system in a way that a problem with writing the logs wouldn't
affect the operation of the core Ceph services.
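
For illustration only, here is a minimal sketch of that kind of isolation, written in Python (Ceph itself is C++, and this is not how Ceph's logging actually works). Log calls go into a bounded in-memory queue that a background thread drains into the slow sink; when the queue is full, records are dropped and counted instead of blocking the caller. `DroppingQueueHandler`, `make_logger` and the `mon-sketch` logger name are hypothetical; `QueueHandler`, `QueueListener` and `SysLogHandler` are standard-library classes.

```python
import logging
import logging.handlers
import queue


class DroppingQueueHandler(logging.handlers.QueueHandler):
    """Hypothetical handler: enqueue without blocking, drop when full.

    The thread that calls the logger never waits on the slow sink (e.g. a
    syslog server); a separate QueueListener thread drains the queue.
    """

    def __init__(self, q):
        super().__init__(q)
        self.dropped = 0  # number of records discarded because the queue was full

    def enqueue(self, record):
        try:
            self.queue.put_nowait(record)
        except queue.Full:
            # Sink is backed up: drop the record rather than stall the caller.
            self.dropped += 1


def make_logger(sink_handler, maxsize=1000):
    """Wire a bounded queue between a logger and a (possibly slow) sink."""
    q = queue.Queue(maxsize=maxsize)
    front = DroppingQueueHandler(q)
    listener = logging.handlers.QueueListener(q, sink_handler)
    logger = logging.getLogger("mon-sketch")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep records away from the root logger
    logger.addHandler(front)
    return logger, front, listener
```

With a real sink you would pass something like `logging.handlers.SysLogHandler(address="/dev/log")`, call `listener.start()` at startup and `listener.stop()` on shutdown. The trade-off is deliberate: when syslog falls behind, you lose log lines (visible in `dropped`) rather than stalling the daemon.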

In my case, all other services running on the machine (ssh, ntp, cron,
etc.) are operating normally, even though the logs might not be getting
pushed out to the central syslog servers.

On Wed, Jul 27, 2016 at 4:49 AM, Brad Hubbard <bhubb...@redhat.com> wrote:

> On Tue, Jul 26, 2016 at 03:48:33PM +0100, Sergio A. de Carvalho Jr. wrote:
> > As per my previous messages on the list, I was having a strange problem in
> > my test cluster (Hammer 0.94.6, CentOS 6.5) where my monitors were
> > literally grinding to a halt, preventing them from ever reaching quorum and
> > causing all sorts of problems. As it turned out, to my surprise, everything
> > went back to normal as soon as I turned off syslog -- special thanks to
> > Sean!
> >
> > The slowdown with syslog on was so severe that logs were being written with
> > timestamps several minutes (and eventually up to hours) behind the system
> > clock. The logs from my 4 monitors can be seen in the links below:
> >
> > https://gist.github.com/anonymous/85213467f701c5a69c7fdb4e54bc7406
> > https://gist.github.com/anonymous/f30a8903e701423825fd4d5aaa651e6a
> > https://gist.github.com/anonymous/42a1856cc819de5b110d9f887e9859d2
> > https://gist.github.com/anonymous/652bc41197e83a9d76cf5b2e6a211aa2
> >
> > I'm still trying to understand what is going on with my syslog servers,
> > but I was wondering... is this a known/documented issue?
>
> If it is, it would be known/documented by the syslog community, right?
>
> >
> > Luckily this was a test cluster, but I'm worried I could hit this on a
> > production cluster at any time, and I'm wondering how I could detect it
> > before my support engineers lose their minds.
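
One way to catch this early, given the symptom described above (log timestamps falling minutes behind the system clock), is to compare each newly written log line's timestamp with the wall clock. This is a minimal sketch that assumes the daemon's own log file is still being written and that each line starts with a local-time `YYYY-MM-DD HH:MM:SS.ffffff` timestamp. `log_lag`, `is_lagging` and the 60-second threshold are hypothetical choices, not Ceph tooling.

```python
from datetime import datetime, timedelta

# Assumed prefix of a daemon log line, e.g.
# "2016-07-26 15:48:33.123456 7f1c2a3b4700  0 mon.a@0(leader) ..."
TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"
TS_LEN = len("2016-07-26 15:48:33.123456")


def log_lag(line, now):
    """Return how far the line's timestamp trails `now` (a naive local datetime)."""
    stamp = datetime.strptime(line[:TS_LEN], TS_FORMAT)
    return now - stamp


def is_lagging(line, now, threshold=timedelta(seconds=60)):
    """True if the line was stamped more than `threshold` before `now`."""
    return log_lag(line, now) > threshold
```

Apply it to lines as they are appended (e.g. while following the file, with `now = datetime.now()`), not to the last line of an idle log, which is naturally old. An alert that fires when `is_lagging` keeps returning True would be the early warning.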
>
> This does not appear to be a Ceph-specific issue and would likely affect
> any daemon that logs to syslog, right?
>
> One thing you could try is running strace against the MON to see which
> system calls are taking a long time and extrapolate from there. The
> procedure would be the same if things were being held up by a slow disk
> (for whatever reason) or filesystem, etc. This is just a standard
> performance problem and not a Ceph-specific issue.
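
As a rough illustration of that approach: `strace -f -T -p <mon pid> -o trace.txt` attaches to all threads of the monitor and appends the time spent in each system call as `<seconds>`. The parser below is hypothetical (its regex and function names are assumptions, and it only handles the common line shapes: completed calls and `<... resumed>` lines, with an optional `[pid N]` or `N` thread prefix). It pulls out calls above a threshold and totals time per syscall. If syslog is the bottleneck, one might expect the calls that write to the syslog socket to stand out, but that is a guess to verify, not a given.

```python
import re
from collections import defaultdict

# One completed syscall line from `strace -f -T`, with an optional thread
# prefix ("[pid 1234] " on a terminal, "1234 " when written with -o FILE).
LINE_RE = re.compile(
    r"^(?:(?:\[pid\s+(?P<pid1>\d+)\]|(?P<pid2>\d+))\s+)?"
    r"(?:<\.\.\.\s+(?P<resumed>\w+)\s+resumed>|(?P<name>\w+)\()"
    r".*<(?P<secs>\d+\.\d+)>\s*$"
)


def slow_syscalls(lines, threshold=0.5):
    """Yield (pid, syscall, seconds) for calls that took >= threshold seconds."""
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # "<unfinished ...>", signal and exit notices, etc.
        pid = m.group("pid1") or m.group("pid2")
        name = m.group("resumed") or m.group("name")
        secs = float(m.group("secs"))
        if secs >= threshold:
            yield (int(pid) if pid else None, name, secs)


def total_time_by_syscall(lines):
    """Sum the time spent in each syscall name across all matched lines."""
    totals = defaultdict(float)
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            totals[m.group("resumed") or m.group("name")] += float(m.group("secs"))
    return dict(totals)
```

For example, `slow_syscalls(open("trace.txt"))` lists the slow calls in order, and sorting `total_time_by_syscall(...)` by value shows where the threads spend their time.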
>
> >
> > Thanks,
> >
> > Sergio
>
> > _______________________________________________
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
> --
> Cheers,
> Brad
>