Thanks.

This does indeed seem to fix the test case I sent. However, I'm now getting
a segmentation fault during "make test".

It happens in the second ns_proxy 5.10 test (there are two tests numbered
5.10): "test ns_proxy-5.10 {check killing active proxy}".

ns_close is being called through a NULL slavePtr during ns_proxy cleanup
(proxyPtr->slavePtr is 0x0 in ReleaseProxy, see the gdb session below).
Is this something platform specific?

[15/May/2017:13:56:37][1717.7ffff543a700][-command-] Notice: 5.10
[15/May/2017:13:56:38][1717.7fffc99dc700][-nsproxy:reap-] Warning: nsproxy: zombie: 1782
[15/May/2017:13:56:38][1717.7fffc99dc700][-nsproxy:reap-] Warning: [testpool]: pid 1785 won't die, send signal 9
[15/May/2017:13:56:38][1717.7ffff543a700][-command-] Notice: releasing busy proxy testpool-8

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff543a700 (LWP 1722)]
0x00007fffeebe8ba8 in ReleaseProxy (interp=interp@entry=0x7ffff000cf80, proxyPtr=0x7ffff0456c60) at nsproxylib.c:3261
3261            ns_close(proxyPtr->slavePtr->rfd);
(gdb) print proxyPtr
$1 = (Proxy *) 0x7ffff0456c60
(gdb) print proxyPtr->slavePtr
$2 = (Slave *) 0x0
(gdb) print *proxyPtr->slavePtr
Cannot access memory at address 0x0
(gdb) list
3256                result = Eval(interp, proxyPtr, Tcl_DStringValue(&ds), -1);
3257            }
3258            Tcl_DStringFree(&ds);
3259        } else if (proxyPtr->state == Busy) {
3260            Ns_Log(Notice, "releasing busy proxy %s", proxyPtr->id);
3261            ns_close(proxyPtr->slavePtr->rfd);
3262            proxyPtr->slavePtr->rfd = NS_INVALID_FD;
3263        }
3264        if (proxyPtr->cmdToken != NULL) {
3265            /*
(gdb) bt
#0  0x00007fffeebe8ba8 in ReleaseProxy (interp=interp@entry=0x7ffff000cf80, proxyPtr=0x7ffff0456c60) at nsproxylib.c:3261
#1  0x00007fffeebe8d5c in ReleaseHandles (interp=0x7ffff000cf80, idataPtr=<optimized out>) at nsproxylib.c:3381
#2  0x00007fffeebe6876 in ProxyObjCmd (data=0x7ffff0024ab0, interp=0x7ffff000cf80, objc=2, objv=0x7ffff001bcd8) at nsproxylib.c:1668
#3  0x00007ffff71bae59 in ?? () from /usr/lib/x86_64-linux-gnu/libtcl8.5.so
#4  0x00007ffff720195e in ?? () from /usr/lib/x86_64-linux-gnu/libtcl8.5.so
#5  0x00007ffff7200897 in ?? () from /usr/lib/x86_64-linux-gnu/libtcl8.5.so
#6  0x00007ffff71bc6e6 in TclEvalObjEx () from /usr/lib/x86_64-linux-gnu/libtcl8.5.so
#7  0x00007ffff724442f in ?? () from /usr/lib/x86_64-linux-gnu/libtcl8.5.so
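
A minimal sketch of the kind of guard that would avoid this dereference,
assuming slavePtr can legitimately be NULL by the time ReleaseProxy() runs
(for example if the reaper has already disposed of the slave; I have not
verified that this is the cause). The types below are simplified stand-ins,
not the real nsproxylib.c definitions; CloseBusySlave and INVALID_FD are
hypothetical names, and plain close() stands in for ns_close():

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

#define INVALID_FD (-1)              /* stand-in for NS_INVALID_FD */

/* Simplified stand-ins for the nsproxy types (assumptions, not the real ones). */
typedef enum { Idle, Busy, Done } ProxyState;

typedef struct Slave {
    int rfd;                         /* read end of the pipe from the slave */
} Slave;

typedef struct Proxy {
    ProxyState state;
    Slave     *slavePtr;             /* may already be NULL after reaping */
} Proxy;

/*
 * Close our read end for a busy proxy, but only if the slave still exists
 * and the descriptor has not been closed already.
 * Returns 1 if a descriptor was closed, 0 if there was nothing to do.
 */
static int
CloseBusySlave(Proxy *proxyPtr)
{
    assert(proxyPtr != NULL);

    if (proxyPtr->state != Busy) {
        return 0;
    }
    if (proxyPtr->slavePtr == NULL) {
        return 0;                    /* slave is gone: nothing to close */
    }
    if (proxyPtr->slavePtr->rfd == INVALID_FD) {
        return 0;                    /* already closed */
    }
    (void) close(proxyPtr->slavePtr->rfd);
    proxyPtr->slavePtr->rfd = INVALID_FD;
    return 1;
}
```

With a check like this, the "releasing busy proxy" branch would simply skip
the close when the slave has already been cleaned up. Whether skipping is the
right behaviour, or whether slavePtr should never be NULL at this point, is
for someone who knows the reaper logic to decide.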




On 13 May 2017 at 19:53, Gustaf Neumann <neum...@wu.ac.at> wrote:

> Hi David,
>
> I've committed a version to bitbucket that should address the problem.
> Here is what seems to happen:
> a) in your example, you are sending exec commands with huge output and an
> eval-timeout of 1ms
> b) NaviServer stops its side of the eval more or less immediately (after
> one ms)
> c) the slave still tries to send the data, but runs into a blocking
> write operation
> d) the slave does not react in this state to "normal" interactions,
> causing the hang.
>
> After my change, NaviServer manually closes its end in ReleaseProxy()
> when the slave is busy.
> -g
> PS: it is clear that we do not see this problem in our production
> environment, since we are not using an eval timeout.
>
>
> ------------------------------------------------------------
> ------------------
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> _______________________________________________
> naviserver-devel mailing list
> naviserver-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/naviserver-devel
>
