Re: *alloc-gnu on glibc

2021-05-15 Thread Bruno Haible
Pádraig Brady wrote:
> Ah this issue is restricted to glibc's MALLOC_CHECK_ implementation.

Ah, that explains why I got different results than you did.

> I've reported the glibc bug at:
> https://sourceware.org/bugzilla/show_bug.cgi?id=27870

Thanks!

> The attached avoids this part of the test if it's being used.

Applied, with a comment tweak.

Bruno




Re: *alloc-gnu on glibc

2021-05-15 Thread Pádraig Brady

On 15/05/2021 16:49, Bruno Haible wrote:

Hi Pádraig,


On glibc-2.31-5.fc32.x86_64 I'm seeing this test failure with coreutils,
since the tests are now checking for a specific errno.
I just checked a later Fedora 34 system which has the same issue.


For me, a testdir created through

./gnulib-tool --create-testdir --dir=../testdir --single-configure \
   malloc-gnu realloc-gnu calloc-gnu reallocarray

passes all tests on Fedora 32 (glibc 2.31) and Fedora 33 (glibc 2.32).


Oh. Ah this issue is restricted to glibc's MALLOC_CHECK_ implementation.
The attached avoids this part of the test if it's being used.
I've reported the glibc bug at:
https://sourceware.org/bugzilla/show_bug.cgi?id=27870

With this applied, your command above passes all tests.

cheers,
Pádraig
>From defe5bebcd1e2bb75f15eb54ae1135afaae3c477 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?P=C3=A1draig=20Brady?= 
Date: Sat, 15 May 2021 17:50:33 +0100
Subject: [PATCH] realloc-gnu: avoid glibc MALLOC_CHECK_ issue

* tests/test-realloc-gnu.c (main): if MALLOC_CHECK_ env var
is set then don't check ENOMEM is returned from realloc().
See https://sourceware.org/bugzilla/show_bug.cgi?id=27870
Note it doesn't suffice to unsetenv() this var within the program,
as the hooks have already been set up at that stage.
---
 ChangeLog| 9 +
 tests/test-realloc-gnu.c | 4 +++-
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/ChangeLog b/ChangeLog
index b802233cf..30663cb38 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,12 @@
+2021-05-15  Pádraig Brady  
+
+	realloc-gnu: avoid glibc MALLOC_CHECK_ issue
+	* tests/test-realloc-gnu.c (main): if MALLOC_CHECK_ env var
+	is set then don't check ENOMEM is returned from realloc().
+	See https://sourceware.org/bugzilla/show_bug.cgi?id=27870
+	Note it doesn't suffice to unsetenv() this var within the program,
+	as the hooks have already been set up at that stage.
+
 2021-05-14  Paul Eggert  
 
 	c-stack: work around Solaris 11 bugs
diff --git a/tests/test-realloc-gnu.c b/tests/test-realloc-gnu.c
index 3a787ed91..5534f2ea1 100644
--- a/tests/test-realloc-gnu.c
+++ b/tests/test-realloc-gnu.c
@@ -38,7 +38,9 @@ main (int argc, char **argv)
   size_t one = argc != 12345;
   p = realloc (p, PTRDIFF_MAX + one);
   ASSERT (p == NULL);
-  ASSERT (errno == ENOMEM);
+  /* Avoid https://sourceware.org/bugzilla/show_bug.cgi?id=27870  */
+  if (!getenv ("MALLOC_CHECK_"))
+ASSERT (errno == ENOMEM);
 }
 
   free (p);
-- 
2.26.2



Re: *alloc-gnu on glibc

2021-05-15 Thread Bruno Haible
Hi Pádraig,

> On glibc-2.31-5.fc32.x86_64 I'm seeing this test failure with coreutils,
> since the tests are now checking for a specific errno.
> I just checked a later Fedora 34 system which has the same issue.

For me, a testdir created through

   ./gnulib-tool --create-testdir --dir=../testdir --single-configure \
  malloc-gnu realloc-gnu calloc-gnu reallocarray

passes all tests on Fedora 32 (glibc 2.31) and Fedora 33 (glibc 2.32).

> Specifically test-realloc-gnu is enabled in coreutils
> and it's failing as realloc is returning NULL as expected, but errno is 0.
> Specifically on glibc realloc(NULL, PTRDIFF_MAX+1) does return 0,errno=ENOMEM,
> but realloc(non_null, PTRDIFF_MAX+1) returns 0,errno=0.

POSIX 
says two things:

  1) By enumerating the error values, it implies that when realloc() fails,
 errno gets set to an error number. See
 

  2) "If there is not enough available memory, realloc() shall return a null
  pointer and set errno to [ENOMEM]."

If realloc() is failing but does not set errno, or sets errno to 0, that's a
POSIX violation. To be reported in the glibc bugzilla.

If realloc() is failing and sets errno to some value != ENOMEM, that is
technically valid (since you can debate which is the "cause" of the failure
and which errno value provides the best information). Personally I think
that
  - for an argument > PTRDIFF_MAX, EOVERFLOW is a reasonable error code,
  - EAGAIN is not a good error code, since "Out of memory" is a better error
message than "Resource temporarily unavailable".

> In any case I'd be in favor of relaxing the test,

I'm in favour of reporting the POSIX violation in the first place.

> some notes on solaris below...
> ...
> but I did check the code in isolation on a 64 bit solaris 10 system.
> ...
> The following shows that it's important to ensure the compiler
> is explicitly running in 64 bit mode, as the default is 32 bit,

My knowledge about how to invoke the compiler is here:
https://gitlab.com/ghwiki/gnow-how/-/wikis/Platforms/Configuration
If you think some of the table entries are wrong, please say so.

> Both the above returned immediately for me.

Solaris 11 apparently behaves differently than Solaris 10.

Bruno




Re: *alloc-gnu on glibc

2021-05-15 Thread Paul Eggert

On 5/15/21 9:04 AM, Pádraig Brady wrote:


I'm wondering should we be more relaxed here as,
there is only one standardised ENOMEM error from realloc,
so code doesn't have to behave differently based on what the errno is. 


We've already relaxed the errno test for reallocarray, where NetBSD 
fails with EOVERFLOW on ptrdiff_t overflow. (My own feeling is that 
NetBSD is right and glibc is wrong here; perhaps I should file a glibc 
bug report.)


The allocator modules are in some sense backwards. malloc-gnu requires 
malloc-posix because GNU guarantees are a superset of POSIX guarantees. 
And yet GNU programs often don't care about the POSIX guarantee of errno 
being set, whereas some do care about the GNU guarantee that malloc 
returns NULL only on failure.



Now I accept that error messages may be more accurate,
but one should opt into that I think. 


That opt-in was what malloc-posix was supposed to be.

I see three options:

1. Make the realloc-gnu test more generous.

2. Fix realloc-posix to check for this glibc bug and replace realloc if 
the test fails.


3. Complicate the set of Gnulib allocator modules so that apps can 
more-accurately express their needs.


Although (3) is in some sense the "best" it's probably too complicated. 
What we have is already too complicated. I'd rather have just malloc-gnu 
and not worry about non-GNU behavior (that's the Gnulib way, right?).


(2) is surely the right way to go in the long run. The glibc behavior is 
simply a bug.


In the short run I can see the temptation of (1).




Re: New way to integrate gnulib into projects?

2021-05-15 Thread Bruce Korb

On 5/15/21 1:01 AM, Simon Josefsson via Gnulib discussion list wrote:

However, running ./bootstrap remains painfully slow.

Amen!!!

Then a second run of ./bootstrap in the directory would avoid the
gnulib-tool step, and only run autoreconf and friends.  A new cfg.mk
variable to tell ./bootstrap to avoid checking out the gnulib repository
and run gnulib-tool by default would be needed.

Thoughts?

I think most of the functionality I want could be achieved today by
clever use of 'AUTORECONF=true AUTOPOINT=true LIBTOOLIZE=true
./bootstrap' and clever cfg.mk instrumenting with some hook together
with './bootstrap --no-git'.  So I could probably test my approach with
only minimal changes to ./bootstrap now.


I'd add it as "./bootstrap --brief" and no-op the commands internally.

I still think the best solution is to put much of the gnulib stuff into 
an installable gnulib-dev package that would be required by the 
bootstrap script. I proposed that about 2 decades ago, but we're still 
spending gobs of compute and wall clock resources recomputing the same 
stuff. :(





Re: *alloc-gnu on glibc

2021-05-15 Thread Pádraig Brady

On glibc-2.31-5.fc32.x86_64 I'm seeing this test failure with coreutils,
since the tests are now checking for a specific errno.
I just checked a later Fedora 34 system which has the same issue.

Specifically test-realloc-gnu is enabled in coreutils
and it's failing as realloc is returning NULL as expected, but errno is 0.
Specifically on glibc realloc(NULL, PTRDIFF_MAX+1) does return 0,errno=ENOMEM,
but realloc(non_null, PTRDIFF_MAX+1) returns 0,errno=0.

I'm wondering should we be more relaxed here as,
there is only one standardised ENOMEM error from realloc,
so code doesn't have to behave differently based on what the errno is.
Especially if it's using xrealloc etc.
Now I accept that error messages may be more accurate,
but one should opt into that I think.

Should we adjust things so depending on realloc-gnu
would not have the more stringent requirement of ENOMEM
being returned from realloc(), and for projects that wanted that,
they could depend on the realloc-posix module instead?

In any case I'd be in favor of relaxing the test,
rather than replacing realloc everywhere.

some notes on solaris below...

On 14/05/2021 18:04, Bruno Haible wrote:

On Solaris 11.3 (the machine gcc211.fsffrance.org, in 64-bit mode), there
are test failures:
   FAIL: test-calloc-gnu
   FAIL: test-malloc-gnu
   FAIL: test-realloc-gnu
   FAIL: test-reallocarray
Each of these tests takes about 30 seconds before failing, and is not
immediately reactive to Ctrl-C.


These EAGAIN failure modes sound like the value passed to realloc() etc.
is too small to ensure ENOMEM was return immediately.
I've not looked into how the compiler for the m4 test is invoked,
but I did check the code in isolation on a 64 bit solaris 10 system.
The following shows that it's important to ensure the compiler
is explicitly running in 64 bit mode, as the default is 32 bit,
resulting in 32 bit longs and hence too small of values passed in:

> gcc malloc-m4.c
> ./a.out
PTRDIFF_MAX = 2147483647
p=20f90 errno=0
p=0 errno=12
p=0 errno=12

> gcc -m64 malloc-m4.c
> ./a.out
PTRDIFF_MAX = 9223372036854775807
p=0 errno=12
p=0 errno=12
p=0 errno=12

Both the above returned immediately for me.
Though I also notice the existing note in malloc.m4 on this
check being too dangerous to run in general due to this issue.

cheers,
Pádraig



New way to integrate gnulib into projects?

2021-05-15 Thread Simon Josefsson via Gnulib discussion list
Hi.  There are a couple of different ways to use gnulib in projects:

https://www.gnu.org/software/gnulib/manual/html_node/VCS-Issues.html

At some point in some project I've used all variants, but today most of
my projects are converted into not commiting any gnulib files at all and
use gnulib's bootstrap.  This works fairly well, even in projects that
need multiple gnulib instances.

However, running ./bootstrap remains painfully slow.  For my projects,
it is taking the majority of build time.  I'm now looking at refreshing
'libidn' which is an old project that is using approach 1 in the link
above.  That is, no ./bootstrap but only 'gnulib-tool --add-update' and
put all generated files in git.  Building it is quick since no bootstrap
phase is needed, just autoreconf (which is the slowest part of that
build).  The problem is that updating of gnulib-tool-generated files
into VCS is a manual process.

FWIW, for libidn, on my laptop, it takes 30 seconds to run gnulib-tool,
20 seconds to run autoreconf etc, 20 seconds to run ./configure, 35
seconds to run make, 20 seconds to run make check (without valgrind).
For a project with excessive gnulib usage, like oath-toolkit, running
gnulib-tool takes 2 minutes and 42 seconds since there are multiple
gnulib-instances in it.

I'm thinking there may be room for another way of using gnulib in
projects: add one new option to ./bootstrap to cause it to retrieve all
gnulib files but NOT run autoreconf and friends.  I would then use
./bootstrap to retrieve all gnulib-related source files, and commit the
result in git.  Downloading a tarball-copy of that git repository would
allow people to do autoreconf && ./configure && make from a source-only
distribution, which is sometimes requested.

Then a second run of ./bootstrap in the directory would avoid the
gnulib-tool step, and only run autoreconf and friends.  A new cfg.mk
variable to tell ./bootstrap to avoid checking out the gnulib repository
and run gnulib-tool by default would be needed.

Thoughts?

I think most of the functionality I want could be achieved today by
clever use of 'AUTORECONF=true AUTOPOINT=true LIBTOOLIZE=true
./bootstrap' and clever cfg.mk instrumenting with some hook together
with './bootstrap --no-git'.  So I could probably test my approach with
only minimal changes to ./bootstrap now.

/Simon


signature.asc
Description: PGP signature


Re: stackovf.test fails on Solaris 11/sparc64

2021-05-15 Thread Paul Eggert

On 5/14/21 5:07 PM, Bruno Haible wrote:

I won't spend any more time on fixing this Solaris-only code — when we
have code that works on all platforms in GNU libsigsegv.


At first I fixed this by having c-stack simply defer to libsigsegv if 
installed, but I found that this didn't work on gcc211 in the GCC 
compile farm, because its version of Solaris 11 has libsigsegv 2.12 and 
this older libsigsegv has a similar bug that also causes that test to 
fail. So I installed the attached patch instead; please give it a try.


With this patch, I expect Gnulib test-c-stack2.sh to fail on Solaris 11 
sparc with older libsigsegv installed, because it'll report "cannot tell 
stack overflow from crash, in spite of libsigsegv". I expect that's good 
enough, as the failure report is accurate for that platform and people 
can avoid the failure by updating to current libsigsegv.


I hear what you're saying about Solaris SPARC not being worth a lot of 
our time nowadays. Still, Oracle says it'll support it through 2034(!) 
though you will need a SPARC T4 or SPARC64 X or better, and I suppose we 
should continue to keep Solaris SPARC working if it's easy.
>From b4bb5c8ae79eebc3d5ec829b932c04df35881ef2 Mon Sep 17 00:00:00 2001
From: Paul Eggert 
Date: Fri, 14 May 2021 22:48:20 -0700
Subject: [PATCH] c-stack: work around Solaris 11 bugs

Problem reported by Bruno Haible in:
https://lists.gnu.org/r/bug-gnulib/2021-05/msg00062.html
* lib/c-stack.c: Always include sigsegv.h if HAVE_LIBSIGSEGV.
(USE_LIBSIGSEGV): Do not use libsigsegv if the kernel
has the si_addr bug and libsigsegv is too old to work
around it.
(segv_handler) [!USE_LIBSIGSEGV]: Do not trust si_addr
if BOGUS_SI_ADDR_UPON_STACK_OVERFLOW.
* m4/c-stack.m4 (AC_SYS_XSI_STACK_OVERFLOW_HEURISTIC):
Define BOGUS_SI_ADDR_UPON_STACK_OVERFLOW on Solaris 2.11 SPARC.
And do not define HAVE_XSI_STACK_OVERFLOW_HEURISTIC.
---
 ChangeLog | 15 +++
 lib/c-stack.c | 25 +
 m4/c-stack.m4 | 14 --
 3 files changed, 44 insertions(+), 10 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index e3f1d8220..b802233cf 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,18 @@
+2021-05-14  Paul Eggert  
+
+	c-stack: work around Solaris 11 bugs
+	Problem reported by Bruno Haible in:
+	https://lists.gnu.org/r/bug-gnulib/2021-05/msg00062.html
+	* lib/c-stack.c: Always include sigsegv.h if HAVE_LIBSIGSEGV.
+	(USE_LIBSIGSEGV): Do not use libsigsegv if the kernel
+	has the si_addr bug and libsigsegv is too old to work
+	around it.
+	(segv_handler) [!USE_LIBSIGSEGV]: Do not trust si_addr
+	if BOGUS_SI_ADDR_UPON_STACK_OVERFLOW.
+	* m4/c-stack.m4 (AC_SYS_XSI_STACK_OVERFLOW_HEURISTIC):
+	Define BOGUS_SI_ADDR_UPON_STACK_OVERFLOW on Solaris 2.11 SPARC.
+	And do not define HAVE_XSI_STACK_OVERFLOW_HEURISTIC.
+
 2021-05-14  Bruno Haible  
 
 	fcntl tests: Avoid failure in MacPorts.
diff --git a/lib/c-stack.c b/lib/c-stack.c
index c0da7f404..3e8cd2565 100644
--- a/lib/c-stack.c
+++ b/lib/c-stack.c
@@ -66,14 +66,19 @@ typedef struct sigaltstack stack_t;
 #include "gettext.h"
 #define _(msgid) gettext (msgid)
 
-/* Use libsigsegv only if needed; kernels like Solaris can detect
-   stack overflow without the overhead of an external library.  */
-#define USE_LIBSIGSEGV (!HAVE_XSI_STACK_OVERFLOW_HEURISTIC && HAVE_LIBSIGSEGV)
-
-#if USE_LIBSIGSEGV
+#if HAVE_LIBSIGSEGV
 # include 
 #endif
 
+/* Use libsigsegv only if needed; kernels like Solaris can detect
+   stack overflow without the overhead of an external library.
+   However, BOGUS_SI_ADDR_ON_STACK_OVERFLOW indicates a buggy
+   Solaris kernel, and a too-small LIBSIGSEGV_VERSION indicates a
+   libsigsegv so old that it does not work around the bug.  */
+#define USE_LIBSIGSEGV (!HAVE_XSI_STACK_OVERFLOW_HEURISTIC && HAVE_LIBSIGSEGV \
+&& ! (BOGUS_SI_ADDR_UPON_STACK_OVERFLOW \
+  && LIBSIGSEGV_VERSION < 0x020D))
+
 #include "exitfail.h"
 #include "ignore-value.h"
 #include "intprops.h"
@@ -277,6 +282,9 @@ segv_handler (int signo, siginfo_t *info, void *context _GL_UNUSED)
  of the current stack.  */
   if (!cannot_be_stack_overflow)
 {
+#  if BOGUS_SI_ADDR_UPON_STACK_OVERFLOW
+  signo = 0;
+#  else
   /* If the faulting address is within the stack, or within one
  page of the stack, assume that it is a stack overflow.  */
   uintptr_t faulting_address = (uintptr_t) info->si_addr;
@@ -286,12 +294,12 @@ segv_handler (int signo, siginfo_t *info, void *context _GL_UNUSED)
  pages might be in the stack.  */
   void *stack_base = (void *) (uintptr_t) page_size;
   uintptr_t stack_size = 0; stack_size -= page_size;
-#  if HAVE_XSI_STACK_OVERFLOW_HEURISTIC
+#   if HAVE_XSI_STACK_OVERFLOW_HEURISTIC
   /* Tighten the stack bounds via the XSI heuristic.  */
   ucontext_t const *user_context = context;
   stack_base = user_context->uc_stack.ss_sp;
   stack_size = user_context->uc_stack.ss_size;
-#  endif
+#   endif