Re: [naviserver-devel] scheduler thread getting stuck
On Sun, Jun 14, 2020 at 02:44:35PM +0200, Gustaf Neumann wrote:
> i fixed two more bugs for win64 (see [1]). The most complex
> case was handling thread results (threads returning string values).

Wow, great news Gustaf, thank you very much.

> @Andrew: are there still show-stoppers for you,
> which have to be fixed urgently?

No, everything for my application is working well now! I've been running your 2020-06-07 time_t Ns_Time fix for a week, and that completely fixed the problem with the scheduler thread getting stuck. I just recently upgraded to your 06-14 fixes as well.

The other Windows regression test failures don't seem to affect me, but I'll still put some time into some of them if/when I come up with any better ideas for how to debug them. And once Ibrahim Tannir gets his nsproxy fixes ready, I can certainly try them.

--
Andrew Piskorski
___
naviserver-devel mailing list
naviserver-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/naviserver-devel
Re: [naviserver-devel] scheduler thread getting stuck
Dear all,

i fixed two more bugs for win64 (see [1]). The most complex case was handling thread results (threads returning string values). While pthread_exit() receives a pointer value (64 bit), the native windows counterpart _endthreadex() receives just a 32 bit value (on both win32 and win64). Since the received result is used for setting the result-obj, the value truncation caused many bad things to happen. This could never have worked with win64 before.

Now all the tests of ns_thread.test should work correctly.

@Andrew: are there still show-stoppers for you, which have to be fixed urgently?

-gn

[1] https://bitbucket.org/naviserver/naviserver/commits/9c48894ae8e433aa4dfbe5473e9553f796ec24bd

On 08.06.20 17:46, Gustaf Neumann wrote:
> > No change to the other failing tests, nor to the ones that we're
> > currently skipping with the notWin32 constraint. E.g., test
> > ns_thread-2.6 still triggers this:
> >
> > Assertion failed: tid != NULL, file tclthread.c, line 238
>
> i am not surprised, since i have not changed anything around this.
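The truncation described in this message can be illustrated with a small portable sketch. It does not call pthread_exit() or _endthreadex(); it only models passing a pointer-sized thread result through a 32-bit exit code versus a pointer-wide one. All function names below are hypothetical and are not taken from the NaviServer sources.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical model of an exit path that carries only 32 bits,
 * comparable in width to the unsigned argument of _endthreadex(). */
uintptr_t through_32bit_exit_code(uintptr_t result)
{
    uint32_t exit_code = (uint32_t)result;  /* upper bits are dropped here */
    return (uintptr_t)exit_code;
}

/* Hypothetical model of an exit path that carries a full pointer,
 * comparable to the void * argument of pthread_exit(). */
uintptr_t through_pointer_exit(uintptr_t result)
{
    void *exit_value = (void *)result;
    return (uintptr_t)exit_value;
}

/* Prints what each path delivers. Returns 1 if the 32-bit path
 * preserved the value and 0 if the value was truncated. */
int report_thread_result(uintptr_t result)
{
    uintptr_t narrow = through_32bit_exit_code(result);
    uintptr_t wide   = through_pointer_exit(result);

    /* The pointer-wide round trip holds on mainstream platforms. */
    assert(wide == result);
    printf("in=%#llx pointer-path=%#llx 32-bit-path=%#llx\n",
           (unsigned long long)result,
           (unsigned long long)wide,
           (unsigned long long)narrow);
    return narrow == result;
}
```

On a 64-bit build, a result that is really a heap pointer will usually have nonzero upper bits, so report_thread_result() would return 0 for it, which is the kind of loss that makes a later dereference of the "result" unsafe.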
Re: [naviserver-devel] scheduler thread getting stuck
On 08.06.20 19:39, Andrew Piskorski wrote:
> On Windows there are still a few compiler warnings that look a little
> suspicious (below), but I don't see any good way to fix these.

it is not hard to silence these cases (at least one of these appeared multiple times on stackoverflow), but these are not related to the errors you have reported.

i hope, the next weekend, i can get a better PC for continuing on this.

-g
Re: [naviserver-devel] scheduler thread getting stuck
On Mon, Jun 08, 2020 at 05:46:54PM +0200, Gustaf Neumann wrote:
> >Assertion failed: tid != NULL, file tclthread.c, line 238
>
> You might check whether "ns_thread handle"
> in a classical setup (e.g. in a ds/shell) throws the same exception.

Good idea. I started up NaviServer with the same test.nscfg config file, but using the installed binaries instead of the "nmake -f Makefile.win32 _test" approach. Then I typed "ns_thread handle" at the control port prompt. That threw the exact same exception as before.

Under WinDbg it also looks the same: inside Ns_ThreadSelf(), wPtr appears to be defined, but threadPtr and wPtr->self are null.

--
Andrew Piskorski
Re: [naviserver-devel] scheduler thread getting stuck
On Windows there are still a few compiler warnings that look a little suspicious (below), but I don't see any good way to fix these.

cl /W3 /nologo /c /EHsc /MDd /Od /Zi /RTC1 /I "..\include" /I "C:\P\OpenSSL-Win64\include" /I "C:\P\Tcl-64-8.6\include" /D "_WINDOWS" /D "TCL_THREADS=1" /D "FD_SETSIZE=128" /D "_MBCS" /D _CRT_SECURE_NO_WARNINGS /D _CRT_SECURE_NO_DEPRECATE /D "_DEBUG" /c /Foexec.o exec.c
exec.c(154): warning C4312: 'type cast': conversion from 'pid_t' to 'HANDLE' of greater size
exec.c(371): warning C4311: 'type cast': pointer truncation from 'HANDLE' to 'pid_t'

cl /W3 /nologo /c /EHsc /MDd /Od /Zi /RTC1 /I "..\include" /I "C:\P\OpenSSL-Win64\include" /I "C:\P\Tcl-64-8.6\include" /D "_WINDOWS" /D "TCL_THREADS=1" /D "FD_SETSIZE=128" /D "_MBCS" /D _CRT_SECURE_NO_WARNINGS /D _CRT_SECURE_NO_DEPRECATE /D "_DEBUG" /c /Fotls.o tls.c
tls.c(228): warning C4244: 'function': conversion from 'SOCKET' to 'int', possible loss of data
tls.c(376): warning C4244: 'function': conversion from 'SOCKET' to 'int', possible loss of data

cl /W3 /nologo /c /EHsc /MDd /Od /Zi /RTC1 /I "..\include" /I "C:\P\OpenSSL-Win64\include" /I "C:\P\Tcl-64-8.6\include" /D "_WINDOWS" /D "TCL_THREADS=1" /D "FD_SETSIZE=128" /D "_MBCS" /D _CRT_SECURE_NO_WARNINGS /D _CRT_SECURE_NO_DEPRECATE /D "_DEBUG" /c /Fotclcrypto.o tclcrypto.c
tclcrypto.c(592): warning C4090: 'initializing': different 'const' qualifiers
tclcrypto.c(656): warning C4090: 'initializing': different 'const' qualifiers
tclcrypto.c(711): warning C4090: 'initializing': different 'const' qualifiers
tclcrypto.c(822): warning C4090: 'initializing': different 'const' qualifiers
tclcrypto.c(955): warning C4090: 'initializing': different 'const' qualifiers
tclcrypto.c(1011): warning C4090: 'initializing': different 'const' qualifiers
tclcrypto.c(1068): warning C4090: 'initializing': different 'const' qualifiers

--
Andrew Piskorski
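For the two exec.c warnings (C4312 and C4311), one generic way such casts are often made explicit is to go through intptr_t and to range-check the narrowing direction. The sketch below is only an illustration under assumptions: HandleModel and PidModel are hypothetical stand-ins for HANDLE and pid_t, and nothing here is claimed to be what NaviServer does or should do.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

typedef void *HandleModel;  /* hypothetical stand-in for a pointer-sized HANDLE */
typedef int   PidModel;     /* hypothetical stand-in for a 32-bit pid_t */

/* Widening direction: convert to a pointer-sized integer first,
 * so the size change is spelled out in the code. */
HandleModel pid_to_handle(PidModel pid)
{
    return (HandleModel)(intptr_t)pid;
}

/* Narrowing direction: returns 0 and stores the pid when the handle
 * value fits into PidModel, or -1 when it would be truncated. */
int handle_to_pid(HandleModel handle, PidModel *pidPtr)
{
    intptr_t value = (intptr_t)handle;

    if (value < INT_MIN || value > INT_MAX) {
        return -1;
    }
    *pidPtr = (PidModel)value;
    return 0;
}

/* Returns 1 when a pid survives the round trip through a handle. */
int roundtrip_ok(PidModel pid)
{
    PidModel back = 0;

    return handle_to_pid(pid_to_handle(pid), &back) == 0 && back == pid;
}
```

Whether a real handle value always fits into a 32-bit pid_t depends on the platform and on how the value was produced, so the range check is the part that matters; the casts alone only silence the compiler.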
Re: [naviserver-devel] scheduler thread getting stuck
On 08.06.20 16:32, Andrew Piskorski wrote:
> On Mon, Jun 08, 2020 at 12:04:59PM +0200, Gustaf Neumann wrote:
> > So, i have modified the code to use "time_t" for the "sec" member,
> > ... and many of the warnings disappeared.
>
> That's a big improvement, thank you, Gustaf! The 22 regression tests
> below used to fail, but now pass!

good news!

> No change to the other failing tests, nor to the ones that we're
> currently skipping with the notWin32 constraint. E.g., test
> ns_thread-2.6 still triggers this:
>
> Assertion failed: tid != NULL, file tclthread.c, line 238

i am not surprised, since i have not changed anything around this. The problem might have to do with the different setup used for the tests. You might check whether "ns_thread handle" in a classical setup (e.g. in a ds/shell) throws the same exception.

-gn
Re: [naviserver-devel] scheduler thread getting stuck
On Mon, Jun 08, 2020 at 12:04:59PM +0200, Gustaf Neumann wrote:
> So, i have modified the code to use "time_t" for the "sec" member,
> ... and many of the warnings disappeared.

That's a big improvement, thank you, Gustaf! The 22 regression tests below used to fail, but now pass!

No change to the other failing tests, nor to the ones that we're currently skipping with the notWin32 constraint. E.g., test ns_thread-2.6 still triggers this:

Assertion failed: tid != NULL, file tclthread.c, line 238

## Gustaf's 2020-06-07 changes fixed these test failures:

ns_schedule-2.1 schedule proc: interval FAILED
ns_time-1.2ms ns_time incr timeunit float+ms int FAILED
ns_time-1.3ms ns_time incr timeunit int+ms int FAILED
ns_time-1.3μs ns_time incr timeunit int+ms int FAILED
ns_time-1.4-100ms ns_time incr timeunit 100ms int FAILED
ns_time-1.4-10ms ns_time incr timeunit 10ms int FAILED
ns_time-1.4-1ms ns_time incr timeunit 1ms int FAILED
ns_time-1.4-0.1ms ns_time incr timeunit 0.1ms int FAILED
ns_time-1.4-0.01ms ns_time incr timeunit 0.01ms int FAILED
ns_time-1.4-0.001ms ns_time incr timeunit 0.001ms int FAILED
ns_time-format-1.2 ns_time format positive microsecond FAILED
ns_time-format-2.1 ns_time format negative second FAILED
ns_time-format-2.2 ns_time format negative second with fraction FAILED
ns_time-format-2.4 ns_time format negative microsecond FAILED
ns_time-format-2.4-0.001ms ns_time format negative microsecond FAILED
ns_time-diff-1 ns_time diff simple FAILED
ns_time-diff-2 ns_time diff requires adjust FAILED
ns_time-diff-3 ns_time diff subtract nothing FAILED
ns_time-diff-4 ns_time diff add 1ms FAILED
ns_time-diff-5 ns_time diff turn positive to negative FAILED
ns_time-diff-6 ns_time diff make negative more negative FAILED
ns_time-diff-9 ns_time diff turn negative to positive FAILED

--
Andrew Piskorski
Re: [naviserver-devel] scheduler thread getting stuck
On 04.06.20 17:26, Gustaf Neumann wrote:
> This sounds indeed related to the original problem. The test
> registers a repeating proc (interval 1s), but within the
> time-range of 2.5s, it is executed only once.
> ...
> maybe i get on the weekend some access to a win environment.

i could use a windows machine over the weekend, but unfortunately, this was very limited (windows 7, very small hd). However, i was able to set everything up to be able to compile NaviServer with msvc, but i was not able to run the regression tests (path too long, etc.).

When compiling with x64, there were many warnings concerning the "sec" member in Ns_Time, which is defined as long. Due to the memory model in windows 64 bit (LLP64) a long is 32 bit there, ... but a time_t value (e.g. the result of time()) is 64 bit. This value is often supplied to the "sec" member.

So, i have modified the code to use "time_t" for the "sec" member, ... and many of the warnings disappeared. Most other 64bit OS use LP64 (long is 64 bit), where assigning time_t to long was not an issue.

This change will not solve all of the issues you are experiencing, but it might improve the situation for a few.

Background: The problem with LLP64 and using "long" for sec is not new, many of the operations on Ns_Time were most likely never working correctly under win64. But they started to show up as a problem lately, since the newer code relies more on these functions working correctly (among other things, in the scheduler).

Hope that these changes helped a little.

all the best
-gn
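The width mismatch described in this message can be modelled portably with fixed-width types. In the sketch below, OldSec stands in for a 32-bit "long" as found under LLP64 and NewSec for a 64-bit time_t; both typedefs and both function names are hypothetical. It only illustrates what a narrowing store can do to a value and does not claim to reproduce the specific scheduler failure.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t OldSec;  /* hypothetical model of a 32-bit long (LLP64) */
typedef int64_t NewSec;  /* hypothetical model of a 64-bit time_t */

/* Wraps a 64-bit seconds value into 32 bits the way a narrowing store
 * typically behaves on two's-complement machines, written with
 * well-defined arithmetic only. */
OldSec store_in_old_sec(NewSec seconds)
{
    uint32_t low = (uint32_t)((uint64_t)seconds & 0xFFFFFFFFu);

    if (low <= (uint32_t)INT32_MAX) {
        return (OldSec)low;
    }
    return (OldSec)((int64_t)low - 4294967296LL);
}

/* Returns 1 when the value fits the 32-bit member unchanged, else 0. */
int fits_old_sec(NewSec seconds)
{
    return (NewSec)store_in_old_sec(seconds) == seconds;
}
```

A timestamp from 2020 such as 1591603200 still fits, while 2147483648 (one past the 32-bit signed maximum) comes back negative, so code that assumes non-negative seconds can misbehave once a wider value passes through the narrow member.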
Re: [naviserver-devel] scheduler thread getting stuck
On Thu, Jun 04, 2020 at 04:55:10PM -0400, Andrew Piskorski wrote:
> On Thu, Jun 04, 2020 at 05:26:04PM +0200, Gustaf Neumann wrote:
> > Probably "Ns_ThreadSelf();" does not work under windows (get the
> > id of the current thread). Ns_ThreadSelf() is defined in the OS specific
> > part (winthread.c). The exception is probably coming from
> > test thread-2.3, it looks to me as if the thread (here the
> > thread running the tests) is not properly initiated under windows.
>
> Assertion failed: tid != NULL, file tclthread.c, line 238

For debugging, I turned test ns_thread-2.6 back on, and added an assertion inside Ns_ThreadSelf(), so that it was basically doing this:

    void Ns_ThreadSelf(Ns_Thread *threadPtr) {
        WinThread *wPtr = TlsGetValue(tlskey);
        *threadPtr = (Ns_Thread) wPtr->self;
        assert(NULL != *threadPtr);
    }

That got it to break into Microsoft's WinDbg debugger there, rather than later in NsTclThreadObjCmd().

The global "tlskey" seems to be initialized, and so does the "wPtr" WinThread pointer. But the "wPtr->self" looks like it's NULL; there is no Ns_Thread structure stored there!

I see the wPtr WinThread allocation code in DllMain(). That seems to be working fine. The wPtr->self Ns_Thread stuff gets set up in NsCreateThread() and ThreadMain(), but I don't really understand what that code is doing. Is that the place where something is going wrong?

Btw, the tlskey TLS index value looks like it's a 64-bit DWORD (unsigned integer), not 32-bit. So I think Nsthreads_LibInit() should be checking for TLS_OUT_OF_INDEXES, not the 0x (decimal 4294967295, the maximum size of a 32-bit DWORD) it's been checking for since ancient times. That's probably a small bug, but it's not the cause of the problems here.

WinDbg output:
--
0:001> .frame 7
07 `0062f150 07fe`ed490aae nsthread!Ns_ThreadSelf+0x89 [Z:\src\web\ns-fork-pub\naviserver\nsthread\winthread.c @ 848]
0:001> dt wPtr
Local var @ 0x62f170 Type WinThread*
0x`004706d0
   +0x000 nextPtr : (null)
   +0x008 wakeupPtr: (null)
   +0x010 self : (null)
   +0x018 event: 0x`015c Void
   +0x020 condwait : 0n0
   +0x028 slots: [100] (null)
0:001> dt threadPtr
Local var @ 0x62f190 Type Ns_Thread_**
0x`0062f208
 -> (null)
0:001> ? tlskey
Evaluate expression: 8791677299988 = 07fe`f8cd6d14
0:001> .formats tlskey
Evaluate expression:
  Hex: 07fe`f8cd6d14
  Decimal: 8791677299988
  Octal: 000177737063266424
  Binary: 0111 1110 1000 11001101 01101101 00010100
  Chars: ..m.
  Time: Thu Jan 11 00:12:47.729 1601 (UTC - 4:00)
  Float: low -3.33323e+034 high 2.86706e-042
  Double: 4.34367e-311
0:001> dt -v tlskey
Got address 07fef8cd6d14 for symbol nsthread!tlskey
7
0:001> kb
: Call Site
: ucrtbased!issue_debug_notification+0x45 [minkernel\crts\ucrt\src\appcrt\internal\report_runtime_error.cpp @ 28]
: ucrtbased!__acrt_report_runtime_error+0x13 [minkernel\crts\ucrt\src\appcrt\internal\report_runtime_error.cpp @ 154]
: ucrtbased!abort+0x1d [minkernel\crts\ucrt\src\appcrt\startup\abort.cpp @ 61]
: ucrtbased!common_assert_to_stderr_direct+0xe5 [minkernel\crts\ucrt\src\appcrt\startup\assert.cpp @ 161]
: ucrtbased!common_assert_to_stderr+0x27 [minkernel\crts\ucrt\src\appcrt\startup\assert.cpp @ 179]
: ucrtbased!common_assert+0x68 [minkernel\crts\ucrt\src\appcrt\startup\assert.cpp @ 420]
: ucrtbased!_wassert+0x2f [minkernel\crts\ucrt\src\appcrt\startup\assert.cpp @ 444]
: nsthread!Ns_ThreadSelf+0x89 [Z:\src\web\ns-fork-pub\naviserver\nsthread\winthread.c @ 848]
: libnsd!NsTclThreadObjCmd+0x42e [Z:\src\web\ns-fork-pub\naviserver\nsd\tclthread.c @ 238]
: tcl86t!TclNRRunCallbacks+0x63
: tcl86t!Tcl_EvalEx+0x9dd
: tcl86t!Tcl_FSEvalFileEx+0x223
: tcl86t!Tcl_MainEx+0x4be
: libnsd!CmdThread+0x6e [Z:\src\web\ns-fork-pub\naviserver\nsd\nsmain.c @ 1333]
: nsthread!NsThreadMain+0x77 [Z:\src\web\ns-fork-pub\naviserver\nsthread\thread.c @ 236]
: nsthread!ThreadMain+0x6c [Z:\src\web\ns-fork-pub\naviserver\nsthread\winthread.c @ 880]
: ucrtbased!thread_start+0x9c [minkernel\crts\ucrt\src\appcrt\startup\thread.cpp @ 97]
: kernel32!BaseThreadInitThunk+0xd
: ntdll!RtlUserThreadStart+0x1d
--

--
Andrew Piskorski
Re: [naviserver-devel] scheduler thread getting stuck
On Thu, Jun 04, 2020 at 04:55:10PM -0400, Andrew Piskorski wrote:
> Yes, with your new change, when running ns_thread.test on Windows I
> now always get this:
>
> Assertion failed: tid != NULL, file tclthread.c, line 238

A bunch of different tests seem to trigger that assertion failure. However, it does seem to be the only thing in the tests that causes crashes, which is good. In my latest code here, I used the new "notWin32" tcltest constraint to turn off all the tests that tend to trigger that assertion:

https://bitbucket.org/apiskors/naviserver/commits/

That lets me run the rest of the regression tests to completion, with the summary results below. Is there someplace I should upload or attach the full test output? It's about 5k lines and 3 megabytes.

Tests ended at Fri Jun 05 13:32:03 EDT 2020
all.tcl: Total 1569 Passed 1376 Skipped 39 Failed 154
Sourced 70 Test Files.
Files with failing tests: encoding.test http.test http_byteranges.test http_chunked.test http_keep.test ns_adp_compress.test ns_base64.test ns_driver.test ns_hostbyaddr.test ns_httptime.test ns_info.test ns_log.test ns_proxy.test ns_schedule.test ns_time.test ns_urlencode.test tclconnio.test tclresp.test
Number of tests skipped for each constraint:
    2 binaryMismatch
    5 curl
    2 knownBug
    1 notDarwin
    28 notWin32
    1 stress

--
Andrew Piskorski
Re: [naviserver-devel] scheduler thread getting stuck
On Fri, May 15, 2020 at 10:37:15AM +0200, Gustaf Neumann wrote:
> On 15.05.20 09:21, Andrew Piskorski wrote:
> > Previously on Windows I was running NaviServer code from
> > c. 2019-07 and an ancient Microsoft compiler from 2010; the problem
> > did NOT happen then.
>
> Can you try with the released version 4.99.17 (2018-11-04)
> with your new Windows environment?

It can be tricky to find an old version of the NaviServer code that builds correctly on Windows. I did successfully build these two older points in the code:

commit c31d3a0c4ef60b79c542cacbdc66c9cb53428faa
Author: Gustaf Neumann
Date: 2019-06-18 20:43:41 +0200 Tue

    fix prototype of Ns_SockListenCallback in in ns.h (many thanks to Maurizio Martignano)

commit 83e8c50a38a6986f3c0468b69e8ef3abd68f926e
Author: Gustaf Neumann
Date: 2020-01-17 21:03:31 +0100 Fri

    improve spelling

For each of those, first I did a "git checkout VERSION" to the commit version number above. Then I copied the latest makefiles and tests on top of the old code like so:

    cp -p $NEW_DIR/Makefile.win32 .
    cp -p $NEW_DIR/include/Makefile.* include/
    cp -pr $NEW_DIR/win32-util .
    cp -pr $NEW_DIR/tests .

With that, those two older codebases both compiled on Windows. However, when I then ran the latest regression tests, NaviServer crashed with:

    Run-Time Check Failure #2 - Stack around the variable 'spoolLimit' was corrupted.
    (Press Retry to debug the application)

That terminated the tests early, of course. It looks like about 93 tests passed and 42 failed before the testing NaviServer crashed. Many of the failed tests did look like the same ones failing on the latest head code.

The variable spoolLimit only appears in "nsd/tclhttp.c", so presumably one of the later commits fixed a bug there. But at that point I gave up trying to test the older code.

--
Andrew Piskorski
Re: [naviserver-devel] scheduler thread getting stuck
On Thu, Jun 04, 2020 at 05:26:04PM +0200, Gustaf Neumann wrote:
> >Assertion failed: (addr != ((void *)0)), file tclobj.c, line 325
>
> Probably "Ns_ThreadSelf();" does not work under windows (get the
> id of the current thread). Ns_ThreadSelf() is defined in the OS specific
> part (winthread.c). The exception is probably coming from
> test thread-2.3, it looks to me as if the thread (here the
> thread running the tests) is not properly initiated under windows.
>
> i have added one more assert, to make it easier to pinpoint the
> problem.

Yes, with your new change, when running ns_thread.test on Windows I now always get this:

Assertion failed: tid != NULL, file tclthread.c, line 238

--
Andrew Piskorski
Re: [naviserver-devel] scheduler thread getting stuck
On 03.06.20 21:13, Andrew Piskorski wrote:
> ns_thread.test
> [03/Jun/2020:14:25:28][4844.13bc][-tcl-nsthread:7-] Notice: update interpreter to epoch 1, trace none, time 0.219973 secs
> Assertion failed: (addr != ((void *)0)), file tclobj.c, line 325
> [03/Jun/2020:14:25:32][4844.1dd8][-tcl-nsthread:8-] Notice: update interpreter to epoch 1, trace none, time 3.902536 secs
>
> The stack trace looked like this:
>
> ucrtbased.dll!07feedad41cf() Unknown
> libnsd.dll!Ns_TclSetAddrObj(Tcl_Obj * objPtr, const char * type, void * addr) Line 325 C
> libnsd.dll!NsTclThreadObjCmd(void * clientData, Tcl_Interp * interp, int objc, Tcl_Obj * const * objv) Line 239 C
> [External Code]
> libnsd.dll!CmdThread(void * arg) Line 1333 C
> nsthread.dll!NsThreadMain(void * arg) Line 236 C
> nsthread.dll!ThreadMain(void * arg) Line 874 C
>
> So that was inside Ns_TclSetAddrObj(), probably in the
> "NS_NONNULL_ASSERT(addr != NULL);" line. It was called from
> NsTclThreadObjCmd(), in "case THandleIdx", line 238 in tclthread.c.
> That presumably came from a Tcl "ns_thread handle" call, and there's
> only one of those in the test suite, "test ns_thread-2.6" on line 70
> of "ns_thread.test". But I don't understand why that would throw a
> null pointer exception!

Probably "Ns_ThreadSelf();" does not work under windows (get the id of the current thread). Ns_ThreadSelf() is defined in the OS specific part (winthread.c). The exception is probably coming from test thread-2.3, it looks to me as if the thread (here the thread running the tests) is not properly initiated under windows.

i have added one more assert, to make it easier to pinpoint the problem.

> ns_listencallback-1.0 register FAILED
> Contents of test case:

This is again one of these low-level socket commands.

> The ns_schedule-2.1 failure certainly sounds related to my original
> problem of the scheduler thread getting stuck, but there's enough else
> going on here that I don't have any idea where the real source of the
> problem might be.

This sounds indeed related to the original problem. The test registers a repeating proc (interval 1s), but within the time-range of 2.5s, it is executed only once.

On 03.06.20 23:41, Andrew Piskorski wrote:
> Weirdly, that stacktrace seems like it must be missing some
> intermediate function calls, because nsproxy's Ns_ModuleInit()
> definitely never calls Ns_IncrTime() DIRECTLY. So I'm not sure what's
> going on there either.

this is typical, when the code is compiled with an optimizer. Try to deactivate the optimizer, this will improve the feedback.

maybe i get on the weekend some access to a win environment.

-gn
Re: [naviserver-devel] scheduler thread getting stuck
On 03-Jun-20 23:41, Andrew Piskorski wrote:
> Is nsproxy supposed to work correctly on Windows?

I had to make extensive changes in the nsproxy code to make it work on Windows. The code is Unix-centric and makes some false assumptions w.r.t. Windows handles vs. Unix file descriptors and therefore cannot run on Windows, at least not with native MSDN libraries.

I didn't push my changes back into the repository yet, since the changes possibly need to be readjusted and retested for Unix.

I will ask Zoran to peer-review, adjust and push my changes, however this will not be before next week.

Cheers,
Ibrahim
Re: [naviserver-devel] scheduler thread getting stuck
Is nsproxy supposed to work correctly on Windows? Its test framework wants to use test-nsproxy.sh to set LD_LIBRARY_PATH, which of course can't work on Windows. But as I work around that, when I run just the ns_proxy.test tests, I get this error:

Assertion failed: sec >= 0, file time.c, line 344

Which gives this stacktrace when run under the WinDbg debugger:

: nsthread!Ns_IncrTime+0x6c [naviserver\nsthread\time.c @ 344]
: nsproxy!Ns_ModuleInit+0x7a76
: nsthread!NsThreadMain+0x77 [naviserver\nsthread\thread.c @ 236]
: nsthread!ThreadMain+0x6c [naviserver\nsthread\winthread.c @ 874]
: ucrtbased!thread_start+0x9c [minkernel\crts\ucrt\src\appcrt\startup\thread.cpp @ 97]
: kernel32!BaseThreadInitThunk+0xd
: ntdll!RtlUserThreadStart+0x21

Weirdly, that stacktrace seems like it must be missing some intermediate function calls, because nsproxy's Ns_ModuleInit() definitely never calls Ns_IncrTime() DIRECTLY. So I'm not sure what's going on there either.

--
Andrew Piskorski
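The "sec >= 0" assertion reported in this message guards a precondition of a time-increment routine. As a rough, simplified model of that kind of routine (not the NaviServer source), the sketch below adds seconds and microseconds with carry and reports a violated precondition through a return code instead of aborting. TimeModel and incr_time_checked are hypothetical names, and the model assumes the stored usec is already in the range 0..999999.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a seconds/microseconds time value. */
typedef struct {
    int64_t sec;
    int64_t usec;  /* assumed to stay within 0..999999 */
} TimeModel;

/* Adds sec/usec to *timePtr and normalizes the microsecond field.
 * Returns 0 on success, or -1 when an increment is negative, which is
 * the condition the reported assertion would have caught. */
int incr_time_checked(TimeModel *timePtr, int64_t sec, int64_t usec)
{
    if (sec < 0 || usec < 0) {
        return -1;
    }
    timePtr->sec  += sec + usec / 1000000;
    timePtr->usec += usec % 1000000;
    if (timePtr->usec >= 1000000) {
        /* carry one second out of the microsecond field */
        timePtr->sec  += 1;
        timePtr->usec -= 1000000;
    }
    return 0;
}
```

A caller that hands in a seconds value which has turned negative, for whatever reason, gets -1 here, whereas an assert-based version stops the process as seen in the stack trace above.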
Re: [naviserver-devel] scheduler thread getting stuck
On Mon, Jun 01, 2020 at 11:08:48AM -0400, Andrew Piskorski wrote:
> ## Last test that runs on Windows, it locks up forever:
>
> http_persistent.test
> ns_sockioctl failed: no such file or directory
> while executing
> "ns_socknread $s"
> (procedure "client_readable" line 2)

For now I simply moved the entire http_persistent.test file out of the way, so the server skips those tests. With that, the test server got further, but eventually crashed with what looks like a null pointer dereference, here:

[03/Jun/2020:14:25:28][4844.2b10][-conn:test:default:1:229-] Notice: inside the filter 3.4
ns_serverpath.test
ns_set.test
ns_sha1.test
ns_sls.test
ns_striphtml.test
ns_thread.test
[03/Jun/2020:14:25:28][4844.13bc][-tcl-nsthread:7-] Notice: update interpreter to epoch 1, trace none, time 0.219973 secs
Assertion failed: (addr != ((void *)0)), file tclobj.c, line 325
[03/Jun/2020:14:25:32][4844.1dd8][-tcl-nsthread:8-] Notice: update interpreter to epoch 1, trace none, time 3.902536 secs

The stack trace looked like this:

ucrtbased.dll!07feedad41cf() Unknown
libnsd.dll!Ns_TclSetAddrObj(Tcl_Obj * objPtr, const char * type, void * addr) Line 325 C
libnsd.dll!NsTclThreadObjCmd(void * clientData, Tcl_Interp * interp, int objc, Tcl_Obj * const * objv) Line 239 C
[External Code]
libnsd.dll!CmdThread(void * arg) Line 1333 C
nsthread.dll!NsThreadMain(void * arg) Line 236 C
nsthread.dll!ThreadMain(void * arg) Line 874 C

So that was inside Ns_TclSetAddrObj(), probably in the "NS_NONNULL_ASSERT(addr != NULL);" line. It was called from NsTclThreadObjCmd(), in "case THandleIdx", line 238 in tclthread.c. That presumably came from a Tcl "ns_thread handle" call, and there's only one of those in the test suite, "test ns_thread-2.6" on line 70 of "ns_thread.test". But I don't understand why that would throw a null pointer exception!

Prior to that crash, various other interesting test failures cropped up, including both "ns_listencallback-1.0" and "ns_schedule-2.1" below. The ns_schedule-2.1 failure certainly sounds related to my original problem of the scheduler thread getting stuck, but there's enough else going on here that I don't have any idea where the real source of the problem might be.

ns_listencallback-1.0 register FAILED
Contents of test case:

    set localhost [expr {[ns_info ipv6] ? "::1" : "127.0.0.1"}]
    ns_log notice "open sockent on $localhost 7227"
    set fds [ns_sockopen $localhost 7227]
    lassign $fds rfd wfd
    set size 0
    if {[gets $rfd line] == -1} {
        ns_log error "got no data"
    } else {
        incr size [string length $line]
        puts $wfd "How are you?"
        flush $wfd
        gets $rfd line
        incr size [string length $line]
    }
    return [list size $size]

Result was:
size 0
Result should have been (exact matching):
size 46
ns_listencallback-1.0 FAILED

ns_schedule-2.1 schedule proc: interval FAILED
Contents of test case:

    set id [ns_schedule_proc 1s {nsv_lappend . . ns_schedule-2.1}]
    ns_sleep 2.5s
    ns_unschedule_proc $id
    nsv_get . .

Result was:
ns_schedule-2.1
Result should have been (glob matching):
ns_schedule-2.1 ns_schedule-2.1*
ns_schedule-2.1 FAILED

--
Andrew Piskorski
Re: [naviserver-devel] scheduler thread getting stuck
On Mon, Jun 01, 2020 at 11:08:48AM -0400, Andrew Piskorski wrote:
> encoding-1.1 Send body with ns_return and charset utf-8 FAILED
> errorInfo: select failed: no such file or directory
> invoked from within
> "nstest::http-0.9 -encoding utf-8 -getbody 1 -getheaders {Content-Type
> Content-Length} GET "/encoding""

There are 7 different versions of the encoding.* page present. If I start up the test server and then ask it for the FULL URL of any one of those files, like "encoding.utf2iso_adp", it works fine! But if I just ask for "encoding" without the extension it fails.

So hitting this URL works fine:

http://localhost:8000/encoding.utf_adp

But this fails with 404 Not Found:

http://localhost:8000/encoding

I see that test.nscfg has what look like appropriate "ns/mimetypes" and "ns/encodings" sections, and of course that same config file works fine on Linux. So what could be going wrong on my Windows box to break the mapping of "/encoding" to "/encoding.utf_adp"?

--
Andrew Piskorski
Re: [naviserver-devel] scheduler thread getting stuck
On Fri, May 15, 2020 at 10:37:15AM +0200, Gustaf Neumann wrote:
> does the regression test run ok?

Good question. Unfortunately, I'd never run the regression tests on Windows before. I now have them set up to run, however, I get LOTS of failures, and I don't know if these are real problems with NaviServer, or something wrong with my testing setup. Either way, I would like to track it down so I can rely on running these same regression tests on Windows as on Linux.

Are the tests in "naviserver/tests/all.tcl" supposed to work correctly on Windows too? Is anyone else successfully running these tests there?

On Windows, I always get immediate failures due to return codes of 1. The first such failure is "encoding-1.1", output shown below. More concerning is that once it gets to "http_persistent.test", the whole NaviServer process locks up and never gets any farther. So any tests that come AFTER that one are not being run at all. I've left the test NaviServer running overnight just to be sure, and after that point there's never any more output until I hit Ctrl-c to shut it down.

So far I've tested the nearly latest NaviServer head code on Windows 7 (no Windows 10 yet), with both the old 2010 and newer 2019 Microsoft compilers. The behavior of the regression tests appears identical in both cases. I have not yet tested older versions of NaviServer. On Linux these tests all run fine, of course.

On Windows, I can invoke "tests/all.tcl" either before or after installing NaviServer. Test behavior appears to be the same in both cases. Before installing, I start the tests like this:

    nmake -f Makefile.win32 _test

For that to work, you need these small patches to Makefile.win32:

https://bitbucket.org/apiskors/naviserver/commits/7d7e245f8451419de3ac9b1d6202e5f26c883fdd

## First test to fail on Windows:

encoding-1.1 Send body with ns_return and charset utf-8 FAILED
Contents of test case:

    nstest::http-0.9 -encoding utf-8 -getbody 1 -getheaders {Content-Type Content-Length} GET "/encoding"

Test generated error; Return code was: 1
Return code should have been one of: 0 2
errorInfo: select failed: no such file or directory
    invoked from within
"nstest::http-0.9 -encoding utf-8 -getbody 1 -getheaders {Content-Type Content-Length} GET "/encoding""
    ("uplevel" body line 2)
    invoked from within
"uplevel 1 $script"
errorCode: NONE
encoding-1.1 FAILED

## Last test that runs on Windows, it locks up forever:

http_persistent.test
ns_sockioctl failed: no such file or directory
    while executing
"ns_socknread $s"
    (procedure "client_readable" line 2)
    invoked from within
"client_readable 1000 $s"
    (procedure "tcltest::client_receive" line 2)
    invoked from within
"tcltest::client_receive sock05DA6CE0"

--
Andrew Piskorski
Re: [naviserver-devel] scheduler thread getting stuck
On 15.05.20 09:21, Andrew Piskorski wrote:
> Recently I started seeing some weird behavior from NaviServer that
> I've never seen before. From time to time, it looks like the
> scheduler thread is getting stuck and not running anything, often for
> hours at a time. Then, on rare occasions, it will inexplicably come
> unstuck and go back to normal.

With what exact version does this happen? does the regression test run ok?

> So far I've ONLY seen this strange behavior on Windows 7, where I
> recently upgraded to newer NaviServer code and a newer Microsoft 2019
> Visual Studio Community Edition compiler. I suspect the problem
> doesn't happen on Linux at all, but I haven't checked for that
> thoroughly.

This is the first report of this kind. My suspicion is as well that it has to do with Windows and the compiler mix used.

> Previously on Windows I was running NaviServer code from c. 2019-07
> and an ancient Microsoft compiler from 2010; the problem did NOT
> happen then.

Can you try with the released version 4.99.17 (2018-11-04) with your new Windows environment?

-gn