Re: bug#51144: GNU grep 3.7 fails to build on FreeBSD

2021-10-17 Thread Bruno Haible
Alexey Dokuchaev wrote:
> I wonder why it's not in our template if it's from 2003.

Just guessing: Maybe because some kernel-related FreeBSD packages want
'amd64'? In other words, don't you need to distinguish original FreeBSD
packages from other packages?

Bruno






Re: bug#51144: GNU grep 3.7 fails to build on FreeBSD

2021-10-17 Thread Alexey Dokuchaev via Gnulib discussion list
On Sun, Oct 17, 2021 at 11:20:12PM +0200, Bruno Haible wrote:
> Alexey Dokuchaev wrote:
> ...
> > All we do is
> > use our pre-built templates for config.{guess,site,sub} and pass the
> > --build=amd64-portbld-freebsd$(version) argument to configure scripts
> > if they are generated by GNU autotools.
> 
> This is a recipe for major hassle. The output of config.{guess,sub}
> is a *canonicalized* triple. See this comment in config.sub:
> 
>   # The goal of this file is to map all the various variations of a given
>   # machine specification into a single specification in the form:
>   #   CPU_TYPE-MANUFACTURER-OPERATING_SYSTEM
>   # or in some cases, the newer four-part form:
>   #   CPU_TYPE-MANUFACTURER-KERNEL-OPERATING_SYSTEM
> 
> and later:
> 
> # Here we normalize CPU types irrespective of the vendor
> amd64-*)
> cpu=x86_64
> ;;

Hmm, there's no such normalization code in our /usr/ports/Templates/config.sub
with timestamp='2018-05-24'.

> You can architect the FreeBSD ports collection and its build system
> in the way you like. But you cannot expect dozens of GNU packages to
> support a different name for a CPU than the canonical name that GNU
> picked 18 years ago:
> 
> 2003-05-09  Andreas Jaeger  
> 
>   * config.sub (maybe_os): Add alias amd64 for x86_64.

I wonder why it's not in our template if it's from 2003.

> Paul Eggert asked:
> > > would you also consider adding "amd64" as a synonym to "x86_64" in
> > > that switch/case check?
> >
> > Yes I suppose we could do that. Bruno, what do you think? You wrote most
> > of those "x86_64"s.
> 
> A firm "no!" on my part.

Fair enough; I guess we can live with local patches to configure for our
diffutils and grep ports (for now).

./danfe



Re: gnulib does not always detect need for iconv() hack on musl

2021-10-17 Thread Bruno Haible
> The current code in config.guess is a heuristic (that has been working
> on Alpine Linux up to 3.13)

It works also in Alpine Linux 3.14.2. Which distro are you using?

Bruno






Re: bug#51144: GNU grep 3.7 fails to build on FreeBSD

2021-10-17 Thread Bruno Haible
Alexey Dokuchaev wrote:

> > >Ports framework does several things which affect GNU configure
> > >scripts, particularly, it replaces build-aux/config.guess file
> > >with our own, where host/build tuples are derived from.
> > >
> > >x86_64 is spelled as amd64 in FreeBSD
...
> All we do is
> use our pre-built templates for config.{guess,site,sub} and pass the
> --build=amd64-portbld-freebsd$(version) argument to configure scripts
> if they are generated by GNU autotools.

This is a recipe for major hassle. The output of config.{guess,sub}
is a *canonicalized* triple. See this comment in config.sub:

  # The goal of this file is to map all the various variations of a given
  # machine specification into a single specification in the form:
  #   CPU_TYPE-MANUFACTURER-OPERATING_SYSTEM
  # or in some cases, the newer four-part form:
  #   CPU_TYPE-MANUFACTURER-KERNEL-OPERATING_SYSTEM

and later:

# Here we normalize CPU types irrespective of the vendor
amd64-*)
cpu=x86_64
;;

The rationale for this canonicalization is that on the C preprocessor
level, many synonyms exist (see [1] for the list), and this has
caused many portability issues over time. So, at the autoconf level,
the GNU project has decided to canonicalize the elements of $host.
Linux prefers x86_64, FreeBSD prefers amd64, Windows prefers x64,
and so on. The canonicalization
1) allows the GNU packages to recognize just one (x86_64) instead of
   multiple ones,
2) allows GNU packages that include arch-dependent files (e.g.
   GNU gmp, GNU lightning, GNU libffcall) to name these files consistently.
   Not asm-x86_64.c in one package and asm-amd64.c in another package.
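
As a rough illustration of the canonicalization described above, here is
a minimal sketch in shell. The function name canon_triple is hypothetical
and is not part of config.sub; it only rewrites the CPU field in the way
the quoted "amd64-*)" case does, and leaves every other triple untouched.
The real config.sub handles many more aliases and also validates the
other fields.

```shell
#!/bin/sh
# canon_triple: hypothetical helper, NOT part of GNU config.sub.
# It mimics only the single quoted rule: a CPU field of "amd64"
# is rewritten to the canonical "x86_64".
canon_triple() {
  case "$1" in
    amd64-*)
      # Drop the "amd64-" prefix and put the canonical CPU name in front.
      printf 'x86_64-%s\n' "${1#amd64-}"
      ;;
    *)
      # Anything else is passed through unchanged in this sketch.
      printf '%s\n' "$1"
      ;;
  esac
}

# Example calls; the FreeBSD version number is an arbitrary placeholder.
canon_triple amd64-portbld-freebsd13.0
canon_triple x86_64-pc-linux-gnu
```

With such a mapping in place, a package only ever has to match x86_64
in its own case statements.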

You can architect the FreeBSD ports collection and its build system
in the way you like. But you cannot expect dozens of GNU packages to
support a different name for a CPU than the canonical name that GNU
picked 18 years ago:

2003-05-09  Andreas Jaeger  

* config.sub (maybe_os): Add alias amd64 for x86_64.

You can replace the config.guess script before building a package
in the FreeBSD ports collection. But when it's a GNU package you had
better make sure that this replacement script produces the same results
as the GNU config.guess does.
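
One way to check this, sketched here under assumptions: run both scripts
on the same machine and compare what they print. The helper name
same_triple is hypothetical, and the paths in the usage comment are
placeholders for wherever the two scripts actually live.

```shell
#!/bin/sh
# same_triple GNU_SCRIPT REPLACEMENT_SCRIPT
# Hypothetical helper: runs both scripts and compares their output.
# Returns 0 if the triples match, 1 if they differ, 2 if a script failed.
same_triple() {
  gnu_triple=$(sh "$1") || return 2
  repl_triple=$(sh "$2") || return 2
  if [ "$gnu_triple" = "$repl_triple" ]; then
    return 0
  fi
  printf 'mismatch: GNU says %s, replacement says %s\n' \
    "$gnu_triple" "$repl_triple" >&2
  return 1
}

# Usage (paths are assumptions, adjust to your tree):
#   same_triple build-aux/config.guess /usr/ports/Templates/config.guess
```

A mismatch in the CPU field (for example amd64 versus x86_64) is exactly
the kind of difference this thread is about.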

The same holds for the non-GNU non-FreeBSD packages that use Autoconf:
If you force non-GNU-canonical names on them, they may curse the GNU
build system, but in fact the culprit (= origin of the issue) would
still be the choices made in the FreeBSD ports build system.

Paul Eggert asked:
> > would you also consider
> > adding "amd64" as a synonym to "x86_64" in that switch/case check?
>
> Yes I suppose we could do that. Bruno, what do you think? You wrote most 
> of those "x86_64"s.

A firm "no!" on my part.

Btw, a similar problem exists for the aarch64 / arm64 CPU type.
'aarch64' is the name chosen by the GCC people, whereas 'arm64' is the
name chosen by Linux [2] and by Debian [3]. You need to obey the
canonicalization in effect at the level at which you are working;
at the $host_cpu level, that is 'aarch64', not 'arm64'.

Bruno

[1] https://sourceforge.net/p/predef/wiki/Architectures/
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch?h=v5.0
[3] http://ftp.debian.org/debian/dists/Debian10.11/main/






Re: gnulib does not always detect need for iconv() hack on musl

2021-10-17 Thread Bruno Haible
Sergei Trofimovich wrote:
> Aha, 'config.guess' clearly detects wrong libc here:
> 
>   checking build system type... x86_64-pc-linux-gnu
>   checking host system type... x86_64-pc-linux-gnu

Yes, for a musl system, that's wrong.

The problem may come from your environment. Which of the environment
variables CC_FOR_BUILD, HOST_CC, CC, CONFIG_SITE did you have defined,
and to which values?
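
For reference, a small sketch that prints the four variables asked about
above, distinguishing "unset" from "set but empty". The function name
show_configure_env is made up for this example.

```shell
#!/bin/sh
# show_configure_env: hypothetical helper that reports the variables
# named in the question above, one per line.
show_configure_env() {
  for v in CC_FOR_BUILD HOST_CC CC CONFIG_SITE; do
    # ${VAR+set} expands to "set" only if VAR is defined (even if empty).
    if eval "[ -n \"\${$v+set}\" ]"; then
      eval "printf '%s=%s\n' \"$v\" \"\$$v\""
    else
      printf '%s is unset\n' "$v"
    fi
  done
}

show_configure_env
```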

> I did not realize 'config.guess' has the code to detect libc but it
> clearly does. I'll dig from there and complain elsewhere.

The mailing list is https://lists.gnu.org/mailman/listinfo/config-patches .

The current code in config.guess is a heuristic (that has been working
on Alpine Linux up to 3.13), because the musl libc people refuse to have
their libc identify itself. [1]

Bruno

[1] https://wiki.musl-libc.org/faq.html#Q:-Why-is-there-no-%3Ccode%3E__MUSL__%3C/code%3E-macro?






Re: gnulib does not always detect need for iconv() hack on musl

2021-10-17 Thread Sergei Trofimovich
On Sun, Oct 17, 2021 at 07:18:51PM +0200, Bruno Haible wrote:
> Hello Sergei,
> 
> Sergei Trofimovich wrote:
> > The following fails bison-3.8.2 tests:
> > $ ./configure && make && make check
> > The following succeeds:
> > $ ./configure --host=x86_64-unknown-linux-musl && make && make check
> > 
> > The failure happens due to unexpected '*' output in report logs instead
> > of '%empty' on 'ASCII' locales.
> > 
> > These unexpected '*' pop back again because gnulib relies on '--host='
> > parameter for './configure' to detect musl target (for lack of better
> > signal?):
> > 
> >   https://git.savannah.gnu.org/cgit/gnulib.git/tree/m4/musl.m4#n16
> > 
> > case "$host_os" in
> >   *-musl*) AC_DEFINE([MUSL_LIBC], [1], [Define to 1 on musl libc.]) ;;
> > 
> >   https://git.savannah.gnu.org/cgit/gnulib.git/tree/lib/unicodeio.c#n151
> > 
> > /* FreeBSD iconv(), NetBSD iconv(), and Solaris 11 iconv() insert
> >    a '?' if they cannot convert.  */
> > # if !defined _LIBICONV_VERSION
> >   || (res > 0 && outptr - outbuf == 1 && *outbuf == '?')
> > # endif
> >   /* musl libc iconv() inserts a '*' if it cannot convert.  */
> > # if !defined _LIBICONV_VERSION && MUSL_LIBC
> >   || (res > 0 && outptr - outbuf == 1 && *outbuf == '*')
> > # endif
> >  )
> > return failure (code, NULL, callback_arg);
> > 
> > What do you think of enabling the workaround regardless of MUSL_LIBC
> > define?
> 
> The MUSL_LIBC symbol is supposed to be set on musl platforms; this is
> what musl.m4 is for. The difference between your two invocations is that
> in the first case, it used a $host triple inferred by config.guess,
> while in the second case, it used the $host that you specified on the
> command line.
> 
> When I try your two commands (just the configure step), the first one
> prints
>   checking for host system type... x86_64-pc-linux-musl
> while the second one prints
>   checking for host system type... x86_64-unknown-linux-musl
> 
> The next steps of the investigation are: In the first case,
>   - What did the "checking for host system type..." line look like?
>   - Which of the environment variables CC_FOR_BUILD, HOST_CC, CC,
> CONFIG_SITE did you have defined, and to which values?

Aha, 'config.guess' clearly detects wrong libc here:

  checking build system type... x86_64-pc-linux-gnu
  checking host system type... x86_64-pc-linux-gnu

I did not realize 'config.guess' has the code to detect libc but it
clearly does. I'll dig from there and complain elsewhere.

Thank you!

-- 

  Sergei



Re: double _close()?

2021-10-17 Thread Bruno Haible
Hi Gisle,

> > Thus, skipping the fclose_nothrow call introduces a memory leak.
> 
> Right. But I'd rather have leaks than a lot of exceptions.
> 
> So a 'diff -r dir1 dir2' is using mostly read() and
> close(). Changing to:
>   MSVC_INVALID_PARAMETER_HANDLING == HAIRY_LIBRARY_HANDLING
> 
> and counting the number of 'rpl_fclose()' calls and 'fclose_nothrow()'
> catches, I find only 2 (no matter how many files I'm diffing).

I agree that 2 file stream buffers is not a noteworthy memory leak.
Therefore it's OK for you to use the modifications that you propose.

But it's not something we can do in Gnulib, since the fact that there
are only 2 (not N) such lost buffers is something that comes from the
GNU diff source code; other programs will behave differently.

> I'm just trying to speed-up GNU-diff; using 'Process Explorer',
> I find that the read-rate is 300 kByte/s on average (with some
> peaks at ~5 MByte/s) and CPU < 1%. IMHO this is pathetically slow.

Since GNU diff is a single-threaded program, the low CPU percentage
indicates that it is mostly stuck in the I/O calls. Why these produce
only 300 kByte/s may have various causes:
  - slow hardware (e.g. if you read from a USB 2.0 stick or some
    old hard disk),
  - Windows process management,
  - the libc that you are using (mingw or MSVC DLLs).

> I've been trying several things to speed it up:
>   1) SetPriorityClass (GetCurrentProcess(), REALTIME_PRIORITY_CLASS);
>   2) Hinting the Windows-cache manager with a 'O_SEQUENTIAL'
>  in open (file, O_RDONLY | O_SEQUENTIAL, 0);

Can't comment, as I am not a Windows expert.

> Maybe a memory-mapped I/O in GNU-diff could improve the speed?

The problem here is:
  - On one hand, we try to use POSIX APIs in GNU programs, i.e. it would be
    an mmap() function,
  - But we don't have a native Windows emulation of mmap() in Gnulib so far,
    because of different semantics between mmap() and VirtualAlloc
    (regarding "reserved" address ranges).
A "limited-use" mmap() is possible, but we could not call it 'mmap' since
it would not be a 100% mmap() emulation.

Bruno






Re: gnulib does not always detect need for iconv() hack on musl

2021-10-17 Thread Bruno Haible
Hello Sergei,

Sergei Trofimovich wrote:
> The following fails bison-3.8.2 tests:
> $ ./configure && make && make check
> The following succeeds:
> $ ./configure --host=x86_64-unknown-linux-musl && make && make check
> 
> The failure happens due to unexpected '*' output in report logs instead
> of '%empty' on 'ASCII' locales.
> 
> These unexpected '*' pop back again because gnulib relies on '--host='
> parameter for './configure' to detect musl target (for lack of better
> signal?):
> 
>   https://git.savannah.gnu.org/cgit/gnulib.git/tree/m4/musl.m4#n16
> 
> case "$host_os" in
>   *-musl*) AC_DEFINE([MUSL_LIBC], [1], [Define to 1 on musl libc.]) ;;
> 
>   https://git.savannah.gnu.org/cgit/gnulib.git/tree/lib/unicodeio.c#n151
> 
> /* FreeBSD iconv(), NetBSD iconv(), and Solaris 11 iconv() insert
>    a '?' if they cannot convert.  */
> # if !defined _LIBICONV_VERSION
>   || (res > 0 && outptr - outbuf == 1 && *outbuf == '?')
> # endif
>   /* musl libc iconv() inserts a '*' if it cannot convert.  */
> # if !defined _LIBICONV_VERSION && MUSL_LIBC
>   || (res > 0 && outptr - outbuf == 1 && *outbuf == '*')
> # endif
>  )
> return failure (code, NULL, callback_arg);
> 
> What do you think of enabling the workaround regardless of MUSL_LIBC
> define?

The MUSL_LIBC symbol is supposed to be set on musl platforms; this is
what musl.m4 is for. The difference between your two invocations is that
in the first case, it used a $host triple inferred by config.guess,
while in the second case, it used the $host that you specified on the
command line.

When I try your two commands (just the configure step), the first one
prints
  checking for host system type... x86_64-pc-linux-musl
while the second one prints
  checking for host system type... x86_64-unknown-linux-musl

The next steps of the investigation are: In the first case,
  - What did the "checking for host system type..." line look like?
  - Which of the environment variables CC_FOR_BUILD, HOST_CC, CC,
CONFIG_SITE did you have defined, and to which values?
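
To make the dependency on the triple concrete, here is a minimal sketch
of the same pattern match that musl.m4 applies to $host_os, written as a
standalone shell function. The name host_is_musl is hypothetical, and
the way the OS part is split off the triple (everything after the second
dash) is a simplification of what configure does.

```shell
#!/bin/sh
# host_is_musl TRIPLE
# Hypothetical helper: succeeds if the OS part of TRIPLE matches the
# "*-musl*" pattern quoted from musl.m4 above.
host_is_musl() {
  host_os=${1#*-}          # strip the CPU field
  host_os=${host_os#*-}    # strip the vendor field
  case "$host_os" in
    *-musl*) return 0 ;;
    *)       return 1 ;;
  esac
}

for triple in x86_64-pc-linux-musl x86_64-unknown-linux-musl x86_64-pc-linux-gnu
do
  if host_is_musl "$triple"; then
    printf '%s: MUSL_LIBC would be defined\n' "$triple"
  else
    printf '%s: MUSL_LIBC would not be defined\n' "$triple"
  fi
done
```

So if config.guess reports a "-gnu" triple on a musl system, the
workaround in unicodeio.c is silently compiled out.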

> Or perhaps gnulib should perform runtime testing to detect the need for
> a hack? Here is how musl mangles symbols:
> 
>   https://git.musl-libc.org/cgit/musl/tree/src/locale/iconv.c#n545
> 
> case US_ASCII:
> if (c > 0x7f) subst: x++, c='*';
> 
> Below implements unconditional workaround.

Thanks for the suggestion. But we try to limit the performance implications
of hacks/workarounds needed for one platform (here: musl) on other platforms
(especially glibc platforms).

Bruno






gnulib does not always detect need for iconv() hack on musl

2021-10-17 Thread Sergei Trofimovich
Hi gnulib! The problem:

The following fails bison-3.8.2 tests:
$ ./configure && make && make check
The following succeeds:
$ ./configure --host=x86_64-unknown-linux-musl && make && make check

The failure happens due to unexpected '*' output in report logs instead
of '%empty' on 'ASCII' locales.

These unexpected '*' pop back again because gnulib relies on '--host='
parameter for './configure' to detect musl target (for lack of better
signal?):

  https://git.savannah.gnu.org/cgit/gnulib.git/tree/m4/musl.m4#n16

case "$host_os" in
  *-musl*) AC_DEFINE([MUSL_LIBC], [1], [Define to 1 on musl libc.]) ;;

  https://git.savannah.gnu.org/cgit/gnulib.git/tree/lib/unicodeio.c#n151

/* FreeBSD iconv(), NetBSD iconv(), and Solaris 11 iconv() insert
   a '?' if they cannot convert.  */
# if !defined _LIBICONV_VERSION
  || (res > 0 && outptr - outbuf == 1 && *outbuf == '?')
# endif
  /* musl libc iconv() inserts a '*' if it cannot convert.  */
# if !defined _LIBICONV_VERSION && MUSL_LIBC
  || (res > 0 && outptr - outbuf == 1 && *outbuf == '*')
# endif
 )
return failure (code, NULL, callback_arg);

What do you think of enabling the workaround regardless of MUSL_LIBC
define?

Or perhaps gnulib should perform runtime testing to detect the need for
a hack? Here is how musl mangles symbols:

  https://git.musl-libc.org/cgit/musl/tree/src/locale/iconv.c#n545

case US_ASCII:
if (c > 0x7f) subst: x++, c='*';

Below implements unconditional workaround.

Thank you!

--- a/lib/unicodeio.c
+++ b/lib/unicodeio.c
@@ -148,7 +148,7 @@ unicode_to_mb (unsigned int code,
   || (res > 0 && outptr - outbuf == 1 && *outbuf == '?')
 # endif
   /* musl libc iconv() inserts a '*' if it cannot convert.  */
-# if !defined _LIBICONV_VERSION && MUSL_LIBC
+# if !defined _LIBICONV_VERSION
   || (res > 0 && outptr - outbuf == 1 && *outbuf == '*')
 # endif
  )



heap-buffer overflow when searching for regex @\*

2021-10-17 Thread Benno Schulenberg

Hi,

When the 'info' program or GNU nano is compiled with -fsanitize=address,
searching in either program for the regex "@\*" (without the quotes)
causes an abort in gnulib's re_search_internal() at
lib/regexec.c:764.

To reproduce, configure texinfo-6.8 with CFLAGS="-g -O0 -march=native
-fsanitize=address", compile, and then run 'info/ginfo texinfo 2>TRAIL'
and search for "@\*".  In other words, type: /@\*.  Then type Shift+}
five times.  Result: info aborts.  See the attached output.

To reproduce with nano, first run 'makeinfo --plain doc/texinfo.texi
>thetext' in the texinfo-6.8 directory, then configure nano-5.9 with
the same CFLAGS, compile, and then run 'src/nano +1 thetext 2>TRAIL'
and type: Ctrl+W Alt+R @\*.  Then type Alt+W six times.  Result:
nano aborts.  See the attached output.

The problem still occurs with a current checkout of gnulib.

Benno
=
==15833==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x602429f6 at pc 0x55571a3caf51 bp 0x7ffdbabfd5f0 sp 0x7ffdbabfd5e0
READ of size 1 at 0x602429f6 thread T0
    #0 0x55571a3caf50 in re_search_internal /home/ben/Programoj/texinfo-6.8/gnulib/lib/regexec.c:764
    #1 0x55571a3c88d8 in rpl_regexec /home/ben/Programoj/texinfo-6.8/gnulib/lib/regexec.c:219
    #2 0x55571a37a8f3 in extend_matches /home/ben/Programoj/texinfo-6.8/info/search.c:142
    #3 0x55571a37b1cf in regexp_search /home/ben/Programoj/texinfo-6.8/info/search.c:214
    #4 0x55571a38dfcd in info_search_in_node_internal /home/ben/Programoj/texinfo-6.8/info/session.c:3956
    #5 0x55571a38ed01 in info_search_internal /home/ben/Programoj/texinfo-6.8/info/session.c:4087
    #6 0x55571a392477 in info_search_next /home/ben/Programoj/texinfo-6.8/info/session.c:4688
    #7 0x55571a37e9b3 in info_read_and_dispatch /home/ben/Programoj/texinfo-6.8/info/session.c:252
    #8 0x55571a37e797 in info_session /home/ben/Programoj/texinfo-6.8/info/session.c:220
    #9 0x55571a365a26 in main /home/ben/Programoj/texinfo-6.8/info/info.c:1079
    #10 0x7fca41f5bbf6 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21bf6)
    #11 0x55571a3457e9 in _start (/usr/local/bin/info+0x237e9)

0x602429f6 is located 0 bytes to the right of 6-byte region [0x602429f0,0x602429f6)
allocated by thread T0 here:
    #0 0x7fca42633f30 in realloc (/usr/lib/x86_64-linux-gnu/libasan.so.4+0xdef30)
    #1 0x55571a3a8c0e in re_string_realloc_buffers /home/ben/Programoj/texinfo-6.8/gnulib/lib/regex_internal.c:168
    #2 0x55571a3a82e9 in re_string_allocate /home/ben/Programoj/texinfo-6.8/gnulib/lib/regex_internal.c:61
    #3 0x55571a3ca27b in re_search_internal /home/ben/Programoj/texinfo-6.8/gnulib/lib/regexec.c:636
    #4 0x55571a3c88d8 in rpl_regexec /home/ben/Programoj/texinfo-6.8/gnulib/lib/regexec.c:219
    #5 0x55571a37a8f3 in extend_matches /home/ben/Programoj/texinfo-6.8/info/search.c:142
    #6 0x55571a37b1cf in regexp_search /home/ben/Programoj/texinfo-6.8/info/search.c:214
    #7 0x55571a38dfcd in info_search_in_node_internal /home/ben/Programoj/texinfo-6.8/info/session.c:3956
    #8 0x55571a38ed01 in info_search_internal /home/ben/Programoj/texinfo-6.8/info/session.c:4087
    #9 0x55571a392477 in info_search_next /home/ben/Programoj/texinfo-6.8/info/session.c:4688
    #10 0x55571a37e9b3 in info_read_and_dispatch /home/ben/Programoj/texinfo-6.8/info/session.c:252
    #11 0x55571a37e797 in info_session /home/ben/Programoj/texinfo-6.8/info/session.c:220
    #12 0x55571a365a26 in main /home/ben/Programoj/texinfo-6.8/info/info.c:1079
    #13 0x7fca41f5bbf6 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21bf6)

SUMMARY: AddressSanitizer: heap-buffer-overflow /home/ben/Programoj/texinfo-6.8/gnulib/lib/regexec.c:764 in re_search_internal
Shadow bytes around the buggy address:
  0x0c0484e0: fa fa fd fa fa fa fd fd fa fa fd fa fa fa fd fa
  0x0c0484f0: fa fa fd fd fa fa fd fa fa fa fd fd fa fa fd fa
  0x0c048500: fa fa fd fa fa fa fd fd fa fa fd fa fa fa 04 fa
  0x0c048510: fa fa fd fa fa fa fd fd fa fa fd fa fa fa fd fa
  0x0c048520: fa fa 00 fa fa fa 00 fa fa fa 00 fa fa fa fd fa
=>0x0c048530: fa fa 00 fa fa fa 00 fa fa fa 00 00 fa fa[06]fa
  0x0c048540: fa fa fd fa fa fa fd fd fa fa 00 fa fa fa 00 fa
  0x0c048550: fa fa 00 00 fa fa fd fa fa fa fd fd fa fa 00 fa
  0x0c048560: fa fa 00 fa fa fa 00 00 fa fa fa fa fa fa fa fa
  0x0c048570: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c048580: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:   00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:   fa
  Freed heap region:   fd
  Stack left redzone:  f1
  Stack mid redzone:   f2
  Stack right redzone: f3
  Stack after return:  f5
  Stack use after scope:   f8
  Global redzone:  f9
  Global init o