Has anyone got any ideas on debugging this issue?
Turns out it's causing us many more operational issues that we first
thought...
Regards,
--
David
On 18 April 2017 at 15:44, David Osborne <da...@qcode.co.uk> wrote:
> Hi,
>
> We're encountering an occasional problem which means calls to ns_proxy
> hang forever.
> It seems to be when an ns_proxy eval command timeouts during evaluation &
> it produces a large amount of output.
> Under those circumstances ns_proxy becomes inoperable.
>
> Following is a contrived test case which I think demonstrates what we are
> experiencing.
> (The ns_proxy timeout here is set unrealistically small to force the
> timeout - in production the timeout is 1 second)
>
>
> Normal test case using command:
>
> ns_proxy eval $handle [list exec dd if=/dev/urandom bs=1K count=23 | base64] 1
>
>
> % set handle [ns_proxy get exec]
> [07/Apr/2017:13:06:57][28009.7fa7b760b700][-command-] Debug(nsproxy): set
> expire in 300000 ms for pool exec slave 28197
> [07/Apr/2017:13:06:57][28009.7fa7b760b700][-command-] Debug(nsproxy):
> nsproxy: slave 28197 started
> [07/Apr/2017:13:06:57][28197.7f3f5dcc3700][-main-] Notice: OpenSSL 1.0.1t
> 3 May 2016 initialized
> [07/Apr/2017:13:06:57][28009.7fa78bfff700][-nsproxy:reap-] Notice:
> starting
> [07/Apr/2017:13:06:57][28009.7fa78bfff700][-nsproxy:reap-]
> Debug(nsproxy): reaper run
> [07/Apr/2017:13:06:57][28009.7fa78bfff700][-nsproxy:reap-]
> Debug(nsproxy): reaper checks pool exec
> [07/Apr/2017:13:06:57][28009.7fa78bfff700][-nsproxy:reap-]
> Debug(nsproxy): reaper sets timeout based on idle diff 1491567117.835940 of
> pool exec
> [07/Apr/2017:13:06:57][28009.7fa78bfff700][-nsproxy:reap-]
> Debug(nsproxy): reaper waits for cond with timeout 1491567117.835940
> exec-7
>
> % set result [ns_proxy eval $handle [list exec dd if=/dev/urandom bs=1K
> count=23 | base64] 1]
> [07/Apr/2017:13:07:09][28009.7fa7b760b700][-command-] Debug(nsproxy):
> proxy send pool exec slave 28197: exec dd if=/dev/urandom bs=1K count=23 |
> base64
> could not wait for proxy "exec-7": timeout waiting for evaluation
>
> % ns_proxy active exec
> {handle exec-7 slave 28197 start 1491566829:518164 script {exec dd
> if=/dev/urandom bs=1K count=23 | base64}}
>
> % ns_proxy cleanup
> [07/Apr/2017:13:07:26][28009.7fa7b760b700][-command-] Debug(nsproxy):
> nsproxy [exec]: close slave 28197 (expire 1000 ms)
> [07/Apr/2017:13:07:26][28009.7fa7b760b700][-command-] Debug(nsproxy): set
> expire in 1000 ms for pool exec slave 28197
> [07/Apr/2017:13:07:26][28009.7fa7b760b700][-command-] Debug(nsproxy):
> nsproxy [exec]: slave 28197 closed
> [07/Apr/2017:13:07:26][28009.7fa78bfff700][-nsproxy:reap-]
> Debug(nsproxy): reaper run
> [07/Apr/2017:13:07:26][28009.7fa78bfff700][-nsproxy:reap-]
> Debug(nsproxy): reaper checks pool exec
> [07/Apr/2017:13:07:26][28009.7fa78bfff700][-nsproxy:reap-]
> Debug(nsproxy): reaper sets timeout based on idle diff 1491567146.613112 of
> pool exec
> [07/Apr/2017:13:07:26][28009.7fa78bfff700][-nsproxy:reap-]
> Debug(nsproxy): reaper waits for cond with timeout 1491567146.613112
>
>
> Hanging test case using command (note: same command except with slightly
> larger output):
>
> ns_proxy eval $handle [list exec dd if=/dev/urandom bs=1K count=24 | base64] 1
>
>
> % set handle [ns_proxy get exec]
> [07/Apr/2017:13:08:13][28009.7fa7b760b700][-command-] Debug(nsproxy): set
> expire in 300000 ms for pool exec slave 28968
> [07/Apr/2017:13:08:13][28009.7fa7b760b700][-command-] Debug(nsproxy):
> nsproxy: slave 28968 started
> [07/Apr/2017:13:08:13][28968.7f0059a9e700][-main-] Notice: OpenSSL 1.0.1t
> 3 May 2016 initialized
> [07/Apr/2017:13:08:13][28009.7fa78bfff700][-nsproxy:reap-]
> Debug(nsproxy): reaper run
> [07/Apr/2017:13:08:13][28009.7fa78bfff700][-nsproxy:reap-]
> Debug(nsproxy): reaper checks pool exec
> [07/Apr/2017:13:08:13][28009.7fa78bfff700][-nsproxy:reap-]
> Debug(nsproxy): reaper sets timeout based on idle diff 1491567193.223279 of
> pool exec
> [07/Apr/2017:13:08:13][28009.7fa78bfff700][-nsproxy:reap-]
> Debug(nsproxy): reaper waits for cond with timeout 1491567193.223279
> exec-7
>
> % set result [ns_proxy eval $handle [list exec dd if=/dev/urandom bs=1K
> count=24 | base64] 1]
> [07/Apr/2017:13:08:21][28009.7fa7b760b700][-command-] Debug(nsproxy):
> proxy send pool exec slave 28968: exec dd if=/dev/urandom bs=1K count=24 |
> base64
> could not wait for proxy "exec-7": timeout waiting for evaluation
>
> % ns_proxy active exec
> {handle exec-7 slave 28968 start 1491566901:234806 script {exec dd
> if=/dev/urandom bs=1K count=24 | base64}}
>
> % ns_proxy cleanup
> [07/Apr/2017:13:08:40][28009.7fa7b760b700][-command-] Debug(nsproxy):
> nsproxy [exec]: close slave 28968 (expire 1000 ms)
> [07/Apr/2017:13:08:40][28009.7fa7b760b700][-command-] Debug(nsproxy): set
> expire in 1000 ms for pool exec slave 28968
> [07/Apr/2017:13:08:40][28009.7fa7b760b700][-command-] Debug(nsproxy):
> nsproxy [exec]: slave 28968 closed
> [07/Apr/2017:13:08:40][28009.7fa78bfff700][-nsproxy:reap-]
> Debug(nsproxy): reaper run
> [07/Apr/2017:13:08:40][28009.7fa78bfff700][-nsproxy:reap-]
> Debug(nsproxy): reaper checks pool exec
> [07/Apr/2017:13:08:40][28009.7fa78bfff700][-nsproxy:reap-]
> Debug(nsproxy): reaper sets timeout based on idle diff 1491567220.342132 of
> pool exec
>
> ns_proxy cleanup never returns.
>
>
> The location of the hang is during waitpid() called via Ns_WaitProcess
> <https://bitbucket.org/naviserver/naviserver/src/5b852119ea5f3e3e796eb9385c22162a070a6b24/nsproxy/nsproxylib.c?at=default&fileviewer=file-view-default#nsproxylib.c-2935>
>
> The system seems to think that the process hasn't exited.
> Any ideas what is going on here?
>
>
> Regards,
> --
> David
>
>
------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
naviserver-devel mailing list
naviserver-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/naviserver-devel