Re: [naviserver-devel] nsthreadtest fails on both Linux and Windows

2014-10-15 Thread Stephen
On Wed, Oct 15, 2014 at 8:24 PM, Andrew Piskorski <a...@piskorski.com> wrote:

 On Wed, Oct 15, 2014 at 03:45:59AM -0400, Andrew Piskorski wrote:

  In my own application on Windows, my connection thread tells a worker
  thread to do something using nsv, mutexes, and a condition variable.
 
  This is old code that works fine on AOLserver 4.0.x (on a different
  older Windows XP box), but now with Naviserver on Windows 7, the
  worker thread seems to never wake up, and never do anything at all.

 There is definitely a real problem.  For debugging purposes in my
 application, I replaced that one use of ns_cond (there are others)
 with a simpler sleep-and-poll arrangement.  That works - the submit
 thread wakes up after sleeping, sees the data waiting for it in its
 nsv queue, and does its job.
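 For comparison, a sleep-and-poll hand-off like the one described above
 might be sketched in C as follows. This assumes plain POSIX threads
 rather than Naviserver's nsv and ns_mutex commands, and the names
 (poll_lock, queue_len, run_poll_demo) are hypothetical stand-ins for the
 application's nsv queue, not code from this thread:

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <time.h>

/* Hypothetical stand-in for the nsv queue: a job counter behind a mutex. */
static pthread_mutex_t poll_lock = PTHREAD_MUTEX_INITIALIZER;
static int queue_len = 0;
static int jobs_done = 0;

/* Worker: wake every 10 ms, check the queue under the lock, and exit
   after processing one job. No condition variable is involved. */
static void *poll_worker(void *arg)
{
    struct timespec nap = {0, 10L * 1000 * 1000}; /* 10 ms */
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&poll_lock);
        if (queue_len > 0) {
            queue_len--;   /* take one job off the queue */
            jobs_done++;
            pthread_mutex_unlock(&poll_lock);
            return NULL;
        }
        pthread_mutex_unlock(&poll_lock);
        nanosleep(&nap, NULL); /* sleep, then poll again */
    }
}

/* Submits one job and waits for the polling worker to pick it up.
   Returns the number of jobs processed (1 on success). */
int run_poll_demo(void)
{
    pthread_t tid;
    if (pthread_create(&tid, NULL, poll_worker, NULL) != 0)
        return 0;
    pthread_mutex_lock(&poll_lock);
    queue_len++;           /* "submit" the job */
    pthread_mutex_unlock(&poll_lock);
    pthread_join(tid, NULL);
    return jobs_done;      /* safe: join orders the worker's writes */
}
```

 Polling never depends on a wakeup signal being delivered, at the cost of
 up to one sleep interval of latency per job, which is why it works as a
 debugging workaround.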

 So I don't know exactly what, but something is seriously broken with
 ns_cond and/or ns_mutex in Naviserver, at least on Windows and possibly
 on Linux as well.

 Btw, for those who haven't used ns_cond before, it is a low-level
 wrapper around standard POSIX condition variable semantics.  By design
 it MUST be used in conjunction with ns_mutex.
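 The standard POSIX pattern that ns_cond wraps can be sketched in C with
 plain pthreads. This illustrates POSIX semantics under stated
 assumptions; it is not Naviserver's own Ns_Cond code, and the names
 (cond_lock, work_cond, work_ready, run_cond_demo) are hypothetical. The
 waiter tests its predicate in a loop while holding the mutex, and the
 signaler changes the predicate under the same mutex before signalling:

```c
#include <pthread.h>

/* Hypothetical shared state: a "work ready" flag protected by one mutex. */
static pthread_mutex_t cond_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  work_cond = PTHREAD_COND_INITIALIZER;
static int work_ready = 0;
static int work_taken = 0;

/* Waiter: holds the mutex while testing the predicate and waiting. */
static void *cond_worker(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&cond_lock);
    /* Loop, not "if": pthread_cond_wait may wake spuriously, and the
       predicate may already be true before we ever wait. */
    while (!work_ready) {
        /* Atomically releases cond_lock and sleeps; re-acquires it
           before returning. */
        pthread_cond_wait(&work_cond, &cond_lock);
    }
    work_ready = 0;
    work_taken = 1;
    pthread_mutex_unlock(&cond_lock);
    return NULL;
}

/* Signaler: change the predicate under the same mutex, then signal. */
static void submit_work(void)
{
    pthread_mutex_lock(&cond_lock);
    work_ready = 1;
    pthread_cond_signal(&work_cond);
    pthread_mutex_unlock(&cond_lock);
}

/* Runs one hand-off; returns 1 if the worker woke up and took the work. */
int run_cond_demo(void)
{
    pthread_t tid;
    if (pthread_create(&tid, NULL, cond_worker, NULL) != 0)
        return 0;
    submit_work();
    pthread_join(tid, NULL);
    return work_taken;  /* safe: join orders the worker's writes */
}
```

 Because the flag is set under the mutex, a signal sent before the worker
 starts waiting is not lost: the worker sees work_ready == 1 and never
 blocks. Code that signals without such a predicate, or tests it without
 holding the mutex, can sleep forever, which is one common way to get a
 "never wakes up" symptom.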

 Thus I'm extra interested in the way nsthreadtest crashes, which I
 don't understand.  But failing any feedback on that, next I'll probably
 try to write some simpler stand-alone Tcl code to reproduce the "my
 thread waiting on ns_cond never wakes up" behavior I'm seeing with
 Naviserver.

You have a known good revision: aolserver-4.0.10, a known bad: tip,
and an automated test to distinguish between the two states:
nsthreadtest. Try 'hg bisect ...' to figure out where the error was
introduced.

--
Comprehensive Server Monitoring with Site24x7.
Monitor 10 servers for $9/Month.
Get alerted through email, SMS, voice calls or mobile push notifications.
Take corrective actions from your mobile device.
http://p.sf.net/sfu/Zoho
___
naviserver-devel mailing list
naviserver-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/naviserver-devel


Re: [naviserver-devel] nsthreadtest fails on both Linux and Windows

2014-10-15 Thread Andrew Piskorski
On Wed, Oct 15, 2014 at 10:24:19PM +0100, Stephen wrote:

 You have a known good revision: aolserver-4.0.10, a known bad: tip,
 and an automated test to distinguish between the two states:
 nsthreadtest. Try 'hg bisect ...' to figure out where the error was
 introduced.

Good idea.  The automatic bisect didn't give me the answer, but it did
get me started down a useful path.  The culprit with nsthreadtest
seems to be Tcl 8.5!

I typically generate and run the configure script like this:

   AUTOCONF=autoconf2.50 ./autogen.sh --prefix=$inst_dir --enable-symbols \
      --enable-rpath --with-tcl=/usr/lib/tcl8.5 --with-zlib=/usr/lib

Turns out, it doesn't matter what version of the C code I use, and it
doesn't matter if I run autoconf from the tip or use the ancient
configure script from 2005 (which was deleted on 2005-10-09 in rev
1345:f6af29cbe4fb).

All that seems to matter is that nsthreadtest ALWAYS fails when I
configure with this:
  --with-tcl=/usr/lib/tcl8.5
If I instead use Tcl 8.4, like this, it works fine:
  --with-tcl=/usr/lib/tcl8.4

I can imagine several possible causes for this:

- A bug in Tcl 8.5.
- A bug in how Naviserver uses Tcl 8.5.
- Some side effect of the configure process.
- A broken Tcl 8.5 on my machine(s).

Note that I see the very same sort of nsthreadtest failures on Windows 7.

I never run configure or autoconf on Windows.  Unfortunately there is
some crosstalk with Linux, because the Windows build uses a subset of
the files generated by the Linux configure run (shared via NFS).  (That
is horribly ugly, but it hasn't seemed to break anything so far, and I
haven't gone back and tried to fix it yet.)

On Linux I am using the stock Tcl 8.4 and 8.5 included with Ubuntu
10.04.4 LTS.  On Windows 7 I am using ActiveTcl 8.4 and 8.5 (32-bit),
both installed into the same C:\P\tcl-32\ directory tree.

So, what could possibly be wrong here that shows itself by
nsthreadtest passing with Tcl 8.4, but crashing with 8.5?

Next, I rebuilt my 32-bit Naviserver on Windows with ActiveTcl 8.4,
and its nsthreadtest now passes too!  So Linux and Windows are
consistent, 8.4 nsthreadtest works, 8.5 fails.


Unfortunately, Linux and Windows are consistent in another way - on
both platforms, my new Tcl 8.4 Naviserver build fails in at least two
ways:

One, running Naviserver with the sample conf/nsd-config.tcl file dumps
core.  On Linux that happens while running in NsConfigEval(); the
backtrace is below.

Two, running nsd -c throws a Tcl backtrace:

  bad option "ensemble": must be children, code, current, delete, eval,
  exists, export, forget, import, inscope, origin, parent, qualifiers,
  tail, or which
      while executing
  "namespace ensemble exists $cmd"
      invoked from within
  "nstrace::statescript"

I will try to debug these new 8.4 failures more tomorrow.


# Linux, Tcl 8.4, nsthreadtest passes but Naviserver fails like this:
warning: Can't read pathname for load map: Input/output error.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/local/nsd-t84-atp-1/bin/nsd -f -t /usr/local/nsd-t84-atp-1/conf/nsd-config'.
Program terminated with signal 6, Aborted.
#0  0x7fb77715d425 in __GI_raise (sig=<optimized out>)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
64  ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  0x7fb77715d425 in __GI_raise (sig=<optimized out>)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x7fb777160b8b in __GI_abort () at abort.c:91
#2  0x7fb77752a4d9 in Panic (fmt=<optimized out>) at log.c:616
#3  0x7fb77687f29d in Tcl_PanicVA () from /usr/lib/libtcl8.4.so.0
#4  0x7fb77687f409 in Tcl_Panic () from /usr/lib/libtcl8.4.so.0
#5  <signal handler called>
#6  0x7fb776862b89 in Tcl_CreateHashEntry () from /usr/lib/libtcl8.4.so.0
#7  0x7fb777542382 in Ns_TclDestroyInterp (interp=0x1497730)
    at tclinit.c:521
#8  0x7fb7775193ac in NsConfigEval (
    config=0x14af160 #set, ' ' <repeats 13 times>, home, ' ' <repeats 16 times>, /usr/local/ns\nset, ' ' <repeats 13 times>, home, ' ' <repeats 16 times>, [file dirname [file dirname [info nameofexec\
utable]]]\n\nns_section \ns/server/default/modules\\nns_param   ...,
    argc=4, argv=<optimized out>, optind=4) at config.c:717
#9  0x7fb77752c8ac in Ns_Main (argc=4, argv=0x7fff38339a18,
    initProc=0x400750 <ServerInit>) at nsmain.c:449
#10 0x7fb77714876d in __libc_start_main (main=0x400660 <main>, argc=4,
    ubp_av=0x7fff38339a18, init=<optimized out>, fini=<optimized out>,
    rtld_fini=<optimized out>, stack_end=0x7fff38339a08) at libc-start.c:226
#11 0x00400695 in _start ()
(gdb)

-- 
Andrew Piskorski a...@piskorski.com
