Dear David,

The problem with manual compilation on minimal Debian installations is that the locale "en_US.UTF-8" is not installed there by default. The configuration file for the test sets LC_COLLATE to this value so that the collation test works. ....

The problem with Debian is that when LC_COLLATE is set to any of the predefined values, the crash goes away, but the test fails because the comparison yields a different result. So we either have to skip the ns_strcoll tests if no proper locale is defined, or refuse to start NaviServer if the locale is not installed. Since other Unixes have the locale "en_US.UTF-8" installed by default, it is probably best to require its installation during startup of nsd. This can avoid surprises later.
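
A startup check along these lines could look roughly as follows. This is only a minimal sketch, assuming the POSIX.1-2008 newlocale() interface as provided by glibc; the helper name locale_is_installed() is hypothetical and not part of NaviServer:

```c
#define _POSIX_C_SOURCE 200809L
#include <locale.h>
#include <stdbool.h>

/*
 * Hypothetical helper (not a NaviServer function): returns true when the
 * named locale can be loaded for collation, false otherwise.
 */
static bool locale_is_installed(const char *name)
{
    /* newlocale() returns (locale_t)0 when the locale is not available. */
    locale_t loc = newlocale(LC_COLLATE_MASK, name, (locale_t)0);

    if (loc == (locale_t)0) {
        return false;
    }
    freelocale(loc);
    return true;
}
```

At startup, nsd could call such a helper with "en_US.UTF-8" and refuse to start with a clear message when it returns false.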

See below for how to install the missing locale under Debian; with this setup, the problem with strcoll_l() will disappear.
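
Independently of installing the locale, the call itself can be guarded. The read at address 0x18 in the valgrind trace is consistent with strcoll_l() receiving a null locale_t, but that is an assumption on my side and not confirmed here. A minimal sketch under that assumption, with a hypothetical helper name and a plain strcmp() fallback chosen only for illustration:

```c
#define _POSIX_C_SOURCE 200809L
#include <locale.h>
#include <string.h>

/*
 * Hypothetical helper (not a NaviServer function): compare two strings
 * with the collation rules of the named locale. When the locale cannot
 * be loaded, fall back to strcmp(), so that strcoll_l() is never called
 * with a null locale_t.
 */
static int collate_or_strcmp(const char *a, const char *b, const char *name)
{
    locale_t loc = newlocale(LC_COLLATE_MASK, name, (locale_t)0);
    int result;

    if (loc == (locale_t)0) {
        /* Locale not installed: byte-wise comparison instead of a crash. */
        return strcmp(a, b);
    }
    result = strcoll_l(a, b, loc);
    freelocale(loc);
    return result;
}
```

Note that the fallback changes the ordering for non-ASCII input, which is why the test would still have to be skipped when the locale is missing.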

all the best

-gn

$ locale -a
C
C.UTF-8
POSIX

$ sed -i 's/^# *\(en_US.UTF-8\)/\1/' /etc/locale.gen
$ locale-gen

$ locale -a
C
C.UTF-8
POSIX
en_US.utf8


On 07.04.22 13:17, David Osborne wrote:
Thanks very much,

For versions of Tcl less than 8.6.11 we're failing because it's a new test exposing an old problem, is that correct? This would explain why I don't see any test failures after building v4.99.22 with Tcl 8.6.9 - it's only because 4.99.23 has introduced the tests and not that 4.99.22 doesn't have the problem.

To test Tcl 8.6.11, the easiest way for me is to jump to bullseye (Debian v11), which provides 8.6.11+dfsg-1.

Strangely I seem to get the same ns_strcoll seg fault with that.
But if I remove misc.test temporarily, all other tests pass happily.

# uname -a
Linux ip-172-0-1-190 5.10.0-12-cloud-amd64 #1 SMP Debian 5.10.103-1 (2022-03-07) x86_64 GNU/Linux

# cat /etc/debian_version
11.3

# ls -l /lib/x86_64-linux-gnu/libc.so.6
lrwxrwxrwx 1 root root 12 Mar 17 21:37 /lib/x86_64-linux-gnu/libc.so.6 -> libc-2.31.so

# apt-cache policy tcl8.6
tcl8.6:
  Installed: 8.6.11+dfsg-1

# git clone https://bitbucket.org/naviserver/naviserver.git
Cloning into 'naviserver'...

# cd naviserver
# git checkout tags/naviserver-4.99.23
Note: switching to 'tags/naviserver-4.99.23'.

# ./autogen.sh --with-tcl=/usr/lib/tcl8.6 --enable-rpath --enable-threads --enable-symbols
# make

Compiler warning for reference:

gcc   -Wall -fPIC -g -O2 -fdebug-prefix-map=/build/tcl8.6-qxVr7a/tcl8.6-8.6.11+dfsg=. -fstack-protector-strong -Wformat -Werror=format-security -fno-unit-at-a-time -pipe -Wdate-time -D_FORTIFY_SOURCE=2 -DNDEBUG -DSYSTEM_MALLOC -DTCL_NO_DEPRECATED -std=c99 -I../include -I"/usr/include/tcl8.6"   -DHAVE_CONFIG_H   -c -o tclenv.o tclenv.c
In file included from /usr/include/string.h:495,
                 from ../include/nsthread.h:378,
                 from ../include/ns.h:46,
                 from nsd.h:38,
                 from tclenv.c:37:
In function ‘strncat’,
    inlined from ‘PutEnv’ at tclenv.c:349:13:
/usr/include/x86_64-linux-gnu/bits/string_fortified.h:136:10: warning: ‘__builtin_strncat’ specified bound depends on the length of the source argument [-Wstringop-overflow=]
  136 |   return __builtin___strncat_chk (__dest, __src, __len, __bos (__dest));
      |  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
tclenv.c: In function ‘PutEnv’:
tclenv.c:314:23: note: length computed here
  314 |         valueLength = strlen(value) + 1;
      |                       ^~~~~~~~~~~~~

# make memcheck TESTFLAGS="-verbose start -file misc.test"
---- ns_random-1.1 start
---- ns_fmttime-1.0 start
---- ns_fmttime-1.1 start
---- ns_trim-0.0 start
---- ns_trim-0.1 start
---- ns_trim-0.2 start
---- ns_trim-1.1 start
---- ns_trim-1.2 start
---- ns_trim-1.3 start
---- ns_trim-1.4 start
---- ns_trim-1.5 start
---- ns_trim-2.1 start
---- ns_trim-2.2 start
---- ns_quotehtml start
---- ns_strcoll-1.0.0 start
==37899== Thread 2:
==37899== Invalid read of size 8
==37899==    at 0x49E1361: strcoll_l (strcoll_l.c:260)
==37899==    by 0x48DA9FF: NsTclStrcollObjCmd (tclmisc.c:2802)
==37899==    by 0x4BBC4A1: TclNRRunCallbacks (in /usr/lib/x86_64-linux-gnu/libtcl8.6.so)
==37899==    by 0x4BBD71F: ??? (in /usr/lib/x86_64-linux-gnu/libtcl8.6.so)
==37899==    by 0x4C794D8: Tcl_FSEvalFileEx (in /usr/lib/x86_64-linux-gnu/libtcl8.6.so)
==37899==    by 0x4C818AD: Tcl_MainEx (in /usr/lib/x86_64-linux-gnu/libtcl8.6.so)
==37899==    by 0x4B745AF: NsThreadMain (thread.c:232)
==37899==    by 0x4B75A48: ThreadMain (pthread.c:870)
==37899==    by 0x521BEA6: start_thread (pthread_create.c:477)
==37899==    by 0x4A4DDEE: clone (clone.S:95)
==37899==  Address 0x18 is not stack'd, malloc'd or (recently) free'd
==37899==
==37899==
==37899== Process terminating with default action of signal 11 (SIGSEGV)
==37899==  Access not within mapped region at address 0x18
==37899==    at 0x49E1361: strcoll_l (strcoll_l.c:260)
==37899==    by 0x48DA9FF: NsTclStrcollObjCmd (tclmisc.c:2802)
==37899==    by 0x4BBC4A1: TclNRRunCallbacks (in /usr/lib/x86_64-linux-gnu/libtcl8.6.so)
==37899==    by 0x4BBD71F: ??? (in /usr/lib/x86_64-linux-gnu/libtcl8.6.so)
==37899==    by 0x4C794D8: Tcl_FSEvalFileEx (in /usr/lib/x86_64-linux-gnu/libtcl8.6.so)
==37899==    by 0x4C818AD: Tcl_MainEx (in /usr/lib/x86_64-linux-gnu/libtcl8.6.so)
==37899==    by 0x4B745AF: NsThreadMain (thread.c:232)
==37899==    by 0x4B75A48: ThreadMain (pthread.c:870)
==37899==    by 0x521BEA6: start_thread (pthread_create.c:477)
==37899==    by 0x4A4DDEE: clone (clone.S:95)
==37899==  If you believe this happened as a result of a stack
==37899==  overflow in your program's main thread (unlikely but
==37899==  possible), you can try to increase the size of the
==37899==  main thread stack using the --main-stacksize= flag.
==37899==  The main thread stack size used in this run was 8388608.
==37899==
==37899== HEAP SUMMARY:
==37899==     in use at exit: 12,499,085 bytes in 8,840 blocks
==37899==   total heap usage: 12,059 allocs, 3,219 frees, 29,314,210 bytes allocated
==37899==
==37899== LEAK SUMMARY:
==37899==    definitely lost: 131 bytes in 1 blocks
==37899==    indirectly lost: 0 bytes in 0 blocks
==37899==      possibly lost: 10,466,879 bytes in 2,978 blocks
==37899==    still reachable: 2,032,075 bytes in 5,861 blocks
==37899==         suppressed: 0 bytes in 0 blocks
==37899== Rerun with --leak-check=full to see details of leaked memory
==37899==
==37899== For lists of detected and suppressed errors, rerun with: -s
==37899== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
Segmentation fault
make: *** [Makefile:273: memcheck] Error 139



On Wed, 6 Apr 2022 at 16:58, Gustaf Neumann <neum...@wu.ac.at> wrote:


    On 06.04.22 16:46, David Osborne wrote:

    On Wed, 6 Apr 2022 at 14:53, Gustaf Neumann <neum...@wu.ac.at> wrote:

        Hi David,

        i will set up a VM for testing in your configuration, but
        first i have to
        understand what pt1/pt2 means.

    *Sorry, that is just an abbreviation for "part1" and "part2" of a
    2 part email.*

    ok, i thought there is a version called "Debian Buster pt1"....
    but could not find insights via googling :)

    *"tcl8.6" debian supplied package version 8.6.9+dfsg-2*

    This seems to be part of the problem. Tcl 8.6.9 was released in
    Nov 2018 and
    probably has some issues with UTF-8 which were fixed in later releases.

    i have just now installed NaviServer on a fresh Debian Buster
    machine using my usual install script [1] (using Tcl 8.6.11) and
    everything looks ok. It is not unlikely that the problem with
    ns_strcoll is related, since one has to translate the "internal"
    UTF-8 to the external variant before calling "strcoll_l()", so,
    when this step is broken, then there might be some invalid memory
    around.

    For you, it would be best to use a newer version of Tcl. There
    are newer Debian packages of Tcl around...

    https://packages.debian.org/search?keywords=tcl

    Is this an option for you?

    Not sure how NaviServer could address the problem. Deactivating
    the ns_strcoll command in NaviServer when it is compiled with Tcl
    8.6.9 or older is probably not a good option, since the
    UTF-to-external conversion is now all over the place and the
    problem will pop up at other places. We can consider deactivating
    the UTF-to-external conversion altogether for older Tcl versions
    (requires several changes, including the PostgreSQL driver) ... but
    then many tests will fail and would have to be deactivated as
    well.

    What do you think?

    -gn



_______________________________________________
naviserver-devel mailing list
naviserver-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/naviserver-devel

--
Univ.Prof. Dr. Gustaf Neumann
Head of the Institute of Information Systems and New Media
of Vienna University of Economics and Business
Program Director of MSc "Information Systems"