Thanks very much,
So for Tcl versions older than 8.6.11, the failure shows up because a
new test exposes an old problem. Is that correct?
That would explain why I don't see any test failures after building
v4.99.22 with Tcl 8.6.9: 4.99.23 introduced the tests, and 4.99.22
still has the underlying problem.
To test Tcl 8.6.11, the easiest way for me is to jump to bullseye
(Debian v11), which provides 8.6.11+dfsg-1.
Strangely, I get the same ns_strcoll segfault with that.
But if I remove misc.test temporarily, all other tests pass.
# uname -a
Linux ip-172-0-1-190 5.10.0-12-cloud-amd64 #1 SMP Debian 5.10.103-1
(2022-03-07) x86_64 GNU/Linux
# cat /etc/debian_version
11.3
# ls -l /lib/x86_64-linux-gnu/libc.so.6
lrwxrwxrwx 1 root root 12 Mar 17 21:37 /lib/x86_64-linux-gnu/libc.so.6
-> libc-2.31.so
# apt-cache policy tcl8.6
tcl8.6:
Installed: 8.6.11+dfsg-1
# git clone https://bitbucket.org/naviserver/naviserver.git
Cloning into 'naviserver'...
# cd naviserver
# git checkout tags/naviserver-4.99.23
Note: switching to 'tags/naviserver-4.99.23'.
# ./autogen.sh --with-tcl=/usr/lib/tcl8.6 --enable-rpath
--enable-threads --enable-symbols
# make
Compiler warning for reference:
gcc -Wall -fPIC -g -O2
-fdebug-prefix-map=/build/tcl8.6-qxVr7a/tcl8.6-8.6.11+dfsg=.
-fstack-protector-strong -Wformat -Werror=format-security
-fno-unit-at-a-time -pipe -Wdate-time -D_FORTIFY_SOURCE=2 -DNDEBUG
-DSYSTEM_MALLOC -DTCL_NO_DEPRECATED -std=c99 -I../include
-I"/usr/include/tcl8.6" -DHAVE_CONFIG_H -c -o tclenv.o tclenv.c
In file included from /usr/include/string.h:495,
from ../include/nsthread.h:378,
from ../include/ns.h:46,
from nsd.h:38,
from tclenv.c:37:
In function ‘strncat’,
inlined from ‘PutEnv’ at tclenv.c:349:13:
/usr/include/x86_64-linux-gnu/bits/string_fortified.h:136:10: warning:
‘__builtin_strncat’ specified bound depends on the length of the
source argument [-Wstringop-overflow=]
136 | return __builtin___strncat_chk (__dest, __src, __len, __bos
(__dest));
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
tclenv.c: In function ‘PutEnv’:
tclenv.c:314:23: note: length computed here
314 | valueLength = strlen(value) + 1;
| ^~~~~~~~~~~~~
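
For context on the warning above: GCC emits -Wstringop-overflow= when the
bound passed to strncat is derived from the length of the source string,
because such a bound cannot protect the destination buffer. The code at
tclenv.c:349 is not shown here, so the following is only a sketch of one
common way to avoid the pattern, not the NaviServer code. JoinNameValue
is a hypothetical helper name.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not NaviServer's PutEnv): builds "name=value" in a
 * freshly allocated buffer. The buffer size is computed from both lengths
 * up front, and memcpy is used with explicit sizes, so no strncat bound
 * derived from the source length is needed.
 */
static char *
JoinNameValue(const char *name, const char *value)
{
    size_t nameLength, valueLength;
    char  *buf;

    assert(name != NULL && value != NULL);
    nameLength  = strlen(name);
    valueLength = strlen(value);

    /* name + '=' + value + terminating NUL */
    buf = malloc(nameLength + 1u + valueLength + 1u);
    if (buf == NULL) {
        return NULL;
    }
    memcpy(buf, name, nameLength);
    buf[nameLength] = '=';
    /* Copy the value together with its terminating NUL. */
    memcpy(buf + nameLength + 1u, value, valueLength + 1u);
    return buf;
}
```

The caller owns the returned buffer and releases it with free().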
# make memcheck TESTFLAGS="-verbose start -file misc.test"
---- ns_random-1.1 start
---- ns_fmttime-1.0 start
---- ns_fmttime-1.1 start
---- ns_trim-0.0 start
---- ns_trim-0.1 start
---- ns_trim-0.2 start
---- ns_trim-1.1 start
---- ns_trim-1.2 start
---- ns_trim-1.3 start
---- ns_trim-1.4 start
---- ns_trim-1.5 start
---- ns_trim-2.1 start
---- ns_trim-2.2 start
---- ns_quotehtml start
---- ns_strcoll-1.0.0 start
==37899== Thread 2:
==37899== Invalid read of size 8
==37899== at 0x49E1361: strcoll_l (strcoll_l.c:260)
==37899== by 0x48DA9FF: NsTclStrcollObjCmd (tclmisc.c:2802)
==37899== by 0x4BBC4A1: TclNRRunCallbacks (in
/usr/lib/x86_64-linux-gnu/libtcl8.6.so)
==37899== by 0x4BBD71F: ??? (in
/usr/lib/x86_64-linux-gnu/libtcl8.6.so)
==37899== by 0x4C794D8: Tcl_FSEvalFileEx (in
/usr/lib/x86_64-linux-gnu/libtcl8.6.so)
==37899== by 0x4C818AD: Tcl_MainEx (in
/usr/lib/x86_64-linux-gnu/libtcl8.6.so)
==37899== by 0x4B745AF: NsThreadMain (thread.c:232)
==37899== by 0x4B75A48: ThreadMain (pthread.c:870)
==37899== by 0x521BEA6: start_thread (pthread_create.c:477)
==37899== by 0x4A4DDEE: clone (clone.S:95)
==37899== Address 0x18 is not stack'd, malloc'd or (recently) free'd
==37899==
==37899==
==37899== Process terminating with default action of signal 11 (SIGSEGV)
==37899== Access not within mapped region at address 0x18
==37899== at 0x49E1361: strcoll_l (strcoll_l.c:260)
==37899== by 0x48DA9FF: NsTclStrcollObjCmd (tclmisc.c:2802)
==37899== by 0x4BBC4A1: TclNRRunCallbacks (in
/usr/lib/x86_64-linux-gnu/libtcl8.6.so)
==37899== by 0x4BBD71F: ??? (in
/usr/lib/x86_64-linux-gnu/libtcl8.6.so)
==37899== by 0x4C794D8: Tcl_FSEvalFileEx (in
/usr/lib/x86_64-linux-gnu/libtcl8.6.so)
==37899== by 0x4C818AD: Tcl_MainEx (in
/usr/lib/x86_64-linux-gnu/libtcl8.6.so)
==37899== by 0x4B745AF: NsThreadMain (thread.c:232)
==37899== by 0x4B75A48: ThreadMain (pthread.c:870)
==37899== by 0x521BEA6: start_thread (pthread_create.c:477)
==37899== by 0x4A4DDEE: clone (clone.S:95)
==37899== If you believe this happened as a result of a stack
==37899== overflow in your program's main thread (unlikely but
==37899== possible), you can try to increase the size of the
==37899== main thread stack using the --main-stacksize= flag.
==37899== The main thread stack size used in this run was 8388608.
==37899==
==37899== HEAP SUMMARY:
==37899== in use at exit: 12,499,085 bytes in 8,840 blocks
==37899== total heap usage: 12,059 allocs, 3,219 frees, 29,314,210
bytes allocated
==37899==
==37899== LEAK SUMMARY:
==37899== definitely lost: 131 bytes in 1 blocks
==37899== indirectly lost: 0 bytes in 0 blocks
==37899== possibly lost: 10,466,879 bytes in 2,978 blocks
==37899== still reachable: 2,032,075 bytes in 5,861 blocks
==37899== suppressed: 0 bytes in 0 blocks
==37899== Rerun with --leak-check=full to see details of leaked memory
==37899==
==37899== For lists of detected and suppressed errors, rerun with: -s
==37899== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
Segmentation fault
make: *** [Makefile:273: memcheck] Error 139
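
One possible reading of the trace, offered as a hypothesis only: on glibc,
LC_COLLATE has the value 3, so an 8-byte read at address 0x18 would be
consistent with strcoll_l() being handed a NULL locale_t, for example if
newlocale() failed because the requested locale is not installed. The code
at tclmisc.c:2802 is not shown here, so this is an assumption and not a
diagnosis. A minimal sketch of such a guard on Linux/glibc, with the
hypothetical helper name SafeStrcoll:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <locale.h>
#include <string.h>

/*
 * Hypothetical helper (not NaviServer code): compares two strings using
 * the collation rules of localeName, and falls back to plain byte order
 * when that locale cannot be created.
 */
static int
SafeStrcoll(const char *a, const char *b, const char *localeName)
{
    locale_t loc;
    int      result;

    assert(a != NULL && b != NULL && localeName != NULL);

    /* newlocale() returns (locale_t)0 when the locale is unavailable. */
    loc = newlocale(LC_COLLATE_MASK, localeName, (locale_t)0);
    if (loc == (locale_t)0) {
        /* Never pass a NULL handle to strcoll_l(); use strcmp instead. */
        return strcmp(a, b);
    }
    result = strcoll_l(a, b, loc);
    freelocale(loc);
    return result;
}
```

Running "locale -a" on the test machine would show whether the locale
requested by the test is actually installed.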
On Wed, 6 Apr 2022 at 16:58, Gustaf Neumann <neum...@wu.ac.at> wrote:
On 06.04.22 16:46, David Osborne wrote:
On Wed, 6 Apr 2022 at 14:53, Gustaf Neumann <neum...@wu.ac.at> wrote:
Hi David,
I will set up a VM for testing in your configuration, but
first I have to
understand what pt1/pt2 means.
Sorry that is just an abbreviation for "part1" and "part2" of a
2 part email.
ok, i thought there is a version called "Debian Buster pt1"....
but could not find insights via googling :)
"tcl8.6" debian supplied package version 8.6.9+dfsg-2
This seems to be part of the problem. Tcl 8.6.9 was released in
November 2018 and probably has some UTF-8 issues that were fixed in
later releases.
I have just now installed NaviServer on a fresh Debian Buster
machine using my usual install script [1] (using Tcl 8.6.11) and
everything looks OK. It is not unlikely that the problem with
ns_strcoll is related, since one has to translate the "internal"
UTF-8 to the external variant before calling "strcoll_l()". When
this step is broken, there might be some invalid memory
around.
For you, it would be best to use a newer version of Tcl. There
are newer Debian packages of Tcl around...
https://packages.debian.org/search?keywords=tcl
Is this an option for you?
I am not sure how NaviServer could address the problem. Deactivating
the ns_strcoll command when NaviServer is compiled with Tcl 8.6.9 or
older is probably not a good option, since the UTF-to-external
conversion is now all over the place and the problem would pop up
elsewhere. We could consider deactivating the UTF-to-external
conversion altogether for older Tcl versions (this requires several
changes, including in the PostgreSQL driver), but then many tests
would fail and would have to be deactivated as well.
What do you think?
-gn
_______________________________________________
naviserver-devel mailing list
naviserver-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/naviserver-devel