Bug#801690: [Pkg-samba-maint] Bug#801690: 'smbstatus -b' leads to broken ctdb cluster

Adi Kriegisch Wed, 04 Nov 2015 01:18:56 -0800

Hi!

Thanks for getting back to me! :)


> > I recently upgraded a samba cluster from Wheezy (with Kernel, ctdb, samba
> > and glusterfs from backports) to Jessie. The cluster itself is way older
> > and basically always worked. Since the upgrade to Jessie 'smbstatus -b'
> > (almost always) just hangs the whole cluster; I need to interrupt the call
> > with ctrl+c (or run with 'timeout 2') to avoid a complete cluster lockup
> > leading to the other cluster nodes being banned and the node I run smbstatus
> > on to have ctdbd run at 100% load but not being able to recover.
> 
> How do you recover then? KILL-ing ctdbd?
Killing the loaded node is the easiest; manual unbanning of the other nodes
is still required. Combinations of enabling and disabling nodes may fix the
situation too.
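
For reference, the 'timeout 2' guard from the original report can also be
sketched in Python. This is only an illustration: run_with_timeout is a
hypothetical helper name, and subprocess.run kills the child with SIGKILL
when the limit is hit, whereas timeout(1) sends SIGTERM by default, so the
two are not strictly equivalent.

```python
import subprocess


def run_with_timeout(cmd, seconds=2.0):
    """Run cmd and return its stdout, or None if it did not finish in time.

    run_with_timeout is a hypothetical helper, not part of samba or ctdb.
    """
    try:
        done = subprocess.run(cmd, capture_output=True, text=True, timeout=seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed the child at this point.
        return None
    return done.stdout
```

Assuming smbstatus is in PATH, the call would look like
run_with_timeout(["smbstatus", "-b"]); a None result means the call hung
and was aborted.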

> > Calling 'smbstatus --locks' and 'smbstatus --shares' works just fine.
> 
> Have you tried which of --processes, --notify hangs? Does it hang
> with "-b --fast"?
Ah, I missed that: '--brief --fast' works just fine. So obviously the
validation does not work...

> > 'strace'ing ctdbd leads to a massive amount of these messages:
> >   | write(58,"\240\4\0\0BDTC\1\0\0\0\215U\336\25\5\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
> >   |                          1184) = -1 EAGAIN (Resource temporarily unavailable)
> 
> fd 58 is probably the ctdb socket. Can you confirm?
Right.
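
One way to check what a descriptor such as fd 58 refers to is to read the
symlink under /proc on Linux. A minimal sketch, where fd_target is a
hypothetical helper name and the pid has to be looked up separately:

```python
import os


def fd_target(pid, fd):
    """Return what /proc/<pid>/fd/<fd> links to (Linux only).

    Sockets show up as 'socket:[<inode>]', regular files as their path.
    """
    return os.readlink(f"/proc/{pid}/fd/{fd}")
```

For ctdbd this would be fd_target(<pid of ctdbd>, 58); reading the fd
table of another user's process normally requires root.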

> To get more useful info, can you install gdb, ctdb-dbg and samba-dbg
> and send the stacktrace of ctdbd at the write?
OK, I will report back with the stack traces in a few days (I'm afraid I can
only do this during the weekend).
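
A rough sketch of how such a trace could be collected non-interactively,
assuming gdb and the debug packages are installed. The helper names are
hypothetical. Attaching like this only gives a snapshot of where ctdbd is
at that moment, which is not guaranteed to be inside the failing write.

```python
import subprocess


def gdb_backtrace_cmd(pid):
    """Build a gdb command line that dumps the backtraces of all threads of pid."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]


def capture_backtrace(pid):
    """Run gdb against pid and return its output (needs gdb, usually as root)."""
    done = subprocess.run(gdb_backtrace_cmd(pid), capture_output=True, text=True)
    return done.stdout
```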

All the best,
        Adi
