Re: sys_siglist[] is causing us trouble again

2020-07-16 Thread Andrew Dunstan


On 7/15/20 7:36 PM, Tom Lane wrote:
>> I guess rawhide is the RH thing that tracks the bleeding edge?
> Yup.  Possibly we should recommend that buildfarm owners running on
> non-stable platforms disable autoconf result caching --- I believe
> that's "use_accache => undef" in the configuration file.
>
> Alternatively, maybe it'd be bright for the buildfarm script to
> discard that cache after any failure (or at least configure or
> build failures).



Yeah, these lines will be added to run_build.pl in the upcoming client
code release. Search for 'obsolete' and you'll find where to put them
if you want to be ahead of the curve.


my $last_stage = get_last_stage() || "";
$obsolete ||=
    $last_stage =~ /^(Make|Configure|Contrib|.*-build)$/;


cheers


andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: sys_siglist[] is causing us trouble again

2020-07-15 Thread Filipe Rosset
On Wed, Jul 15, 2020 at 7:48 PM Tom Lane wrote:

> As of a couple days ago, buildfarm member caiman (Fedora rawhide)
> is failing like this in all the pre-v12 branches:
>
> ccache gcc -Wall -Wmissing-prototypes -Wpointer-arith
> -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute
> -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard
> -Wno-format-truncation -Wno-stringop-truncation -g -O2 -DFRONTEND
> -I../../src/include -D_GNU_SOURCE -I/usr/include/libxml2   -c -o
> wait_error.o wait_error.c
> wait_error.c: In function ‘wait_result_to_str’:
> wait_error.c:71:6: error: ‘sys_siglist’ undeclared
> (first use in this function)
>    71 |  sys_siglist[WTERMSIG(exitstatus)] : "(unknown)");
>       |  ^~~
> wait_error.c:71:6: note: each undeclared identifier is reported only once
> for each function it appears in
> make[2]: *** [<builtin>: wait_error.o] Error 1
>
> We haven't changed anything, ergo something changed at the OS level.
>
> Oddly, we'd not get to this code unless configure set
> HAVE_DECL_SYS_SIGLIST, so it's defined *somewhere*.  I suspect the root
> issue here is some rearrangement of system header files combined with
> wait_error.c (and maybe other places?) not including exactly the same
> headers that configure tested.
>
> Anyway, rather than installing rawhide and trying to debug this,
> I'd like to make a modest proposal: let's back-patch the v12
> patches that made us stop relying on sys_siglist[], viz a73d08319
> and cc92cca43.  Per the discussions that led to those patches,
> it's been decades since any platform didn't have POSIX-compliant
> strsignal(), so we'd be much better off relying on that.
>
> regards, tom lane
>

I believe it's related to these recent glibc changes in rawhide:
https://src.fedoraproject.org/rpms/glibc/c/0aab7eb58528999277c626fc16682da179de03d0?branch=master

  - signal: Move sys_errlist to a compat symbol
  - signal: Move sys_siglist to a compat symbol
SHA512 (glibc-2.31.9000-683-gffb17e7ba3.tar.xz) =
103ff3c04de5dc149df93e5399de1630f6fff1b8d7f127881d6e530492b8b953a8064205ceecb311a77c0a10de3a5ab2056121fd1fa833a30327c6b1f08beacc


Re: sys_siglist[] is causing us trouble again

2020-07-15 Thread Tom Lane
Thomas Munro writes:
> On Thu, Jul 16, 2020 at 10:48 AM Tom Lane wrote:
>> We haven't changed anything, ergo something changed at the OS level.

> It looks like glibc very recently decided[1] to hide the declaration,
> but we're using a cached configure test result.

Right.  So, modulo the mis-cached result, what would happen if we do
nothing is that the back branches would lose the ability to translate
signal numbers to strings on bleeding-edge glibc.  I don't think we
want that, so we need to back-patch.  Attached is a lightly tested
patch for v11.  (This includes 7570df0f3 as well, so that
pgstrsignal.c will be the same in all branches.)
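
For readers who don't have the v12 sources at hand, here is a minimal
sketch of the kind of wrapper this relies on. It is not the actual
pgstrsignal.c; the name sketch_strsignal and the fallback strings are
assumptions made for illustration, and HAVE_STRSIGNAL is simply assumed
to be defined here instead of coming from configure.

```c
#define _POSIX_C_SOURCE 200809L	/* ask the headers to declare strsignal() */
#include <signal.h>
#include <string.h>

/* Assumption for this sketch: configure would normally define this. */
#ifndef HAVE_STRSIGNAL
#define HAVE_STRSIGNAL 1
#endif

/*
 * Hypothetical wrapper: return a human-readable name for signum, never
 * NULL, without touching sys_siglist[].
 */
const char *
sketch_strsignal(int signum)
{
	const char *result;

#ifdef HAVE_STRSIGNAL
	/* Some platforms may return NULL for signal numbers they don't know. */
	result = strsignal(signum);
	if (result == NULL)
		result = "unrecognized signal";
#else
	(void) signum;
	result = "(signal names not available on this platform)";
#endif

	return result;
}
```

Callers can then print "signal %d: %s" unconditionally, which is what
lets the HAVE_DECL_SYS_SIGLIST branches in the diff below go away.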

regards, tom lane

diff --git a/configure b/configure
index 4e1b4be7fb..7382a34d60 100755
--- a/configure
+++ b/configure
@@ -15004,7 +15004,7 @@ fi
 LIBS_including_readline="$LIBS"
 LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'`
 
-for ac_func in cbrt clock_gettime dlopen fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll posix_fallocate pstat pthread_is_threaded_np readlink setproctitle setsid shm_open symlink sync_file_range uselocale utime utimes wcstombs_l
+for ac_func in cbrt clock_gettime dlopen fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll posix_fallocate pstat pthread_is_threaded_np readlink setproctitle setsid shm_open strsignal symlink sync_file_range uselocale utime utimes wcstombs_l
 do :
   as_ac_var=`$as_echo "ac_cv_func_$ac_func" | $as_tr_sh`
 ac_fn_c_check_func "$LINENO" "$ac_func" "$as_ac_var"
@@ -15893,24 +15893,6 @@ esac
 
 fi
 
-ac_fn_c_check_decl "$LINENO" "sys_siglist" "ac_cv_have_decl_sys_siglist" "#include <signal.h>
-/* NetBSD declares sys_siglist in unistd.h.  */
-#ifdef HAVE_UNISTD_H
-# include <unistd.h>
-#endif
-
-"
-if test "x$ac_cv_have_decl_sys_siglist" = xyes; then :
-  ac_have_decl=1
-else
-  ac_have_decl=0
-fi
-
-cat >>confdefs.h <<_ACEOF
-#define HAVE_DECL_SYS_SIGLIST $ac_have_decl
-_ACEOF
-
-
 ac_fn_c_check_func "$LINENO" "syslog" "ac_cv_func_syslog"
 if test "x$ac_cv_func_syslog" = xyes; then :
   ac_fn_c_check_header_mongrel "$LINENO" "syslog.h" "ac_cv_header_syslog_h" "$ac_includes_default"
diff --git a/configure.in b/configure.in
index fab1658bca..defcb8ff99 100644
--- a/configure.in
+++ b/configure.in
@@ -1622,6 +1622,7 @@ AC_CHECK_FUNCS(m4_normalize([
 	setproctitle
 	setsid
 	shm_open
+	strsignal
 	symlink
 	sync_file_range
 	uselocale
@@ -1821,14 +1822,6 @@ if test "$PORTNAME" = "cygwin"; then
   AC_LIBOBJ(dirmod)
 fi
 
-AC_CHECK_DECLS([sys_siglist], [], [],
-[#include <signal.h>
-/* NetBSD declares sys_siglist in unistd.h.  */
-#ifdef HAVE_UNISTD_H
-# include <unistd.h>
-#endif
-])
-
 AC_CHECK_FUNC(syslog,
   [AC_CHECK_HEADER(syslog.h,
[AC_DEFINE(HAVE_SYSLOG, 1, [Define to 1 if you have the syslog interface.])])])
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 252220c770..08577f5e5f 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -596,17 +596,10 @@ pgarch_archiveXlog(char *xlog)
 	 errhint("See C include file \"ntstatus.h\" for a description of the hexadecimal value."),
 	 errdetail("The failed archive command was: %s",
 			   xlogarchcmd)));
-#elif defined(HAVE_DECL_SYS_SIGLIST) && HAVE_DECL_SYS_SIGLIST
-			ereport(lev,
-	(errmsg("archive command was terminated by signal %d: %s",
-			WTERMSIG(rc),
-			WTERMSIG(rc) < NSIG ? sys_siglist[WTERMSIG(rc)] : "(unknown)"),
-	 errdetail("The failed archive command was: %s",
-			   xlogarchcmd)));
 #else
 			ereport(lev,
-	(errmsg("archive command was terminated by signal %d",
-			WTERMSIG(rc)),
+	(errmsg("archive command was terminated by signal %d: %s",
+			WTERMSIG(rc), pg_strsignal(WTERMSIG(rc))),
 	 errdetail("The failed archive command was: %s",
 			   xlogarchcmd)));
 #endif
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 3bfc299be1..75a9e07041 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -3563,6 +3563,7 @@ LogChildExit(int lev, const char *procname, int pid, int exitstatus)
 		procname, pid, WEXITSTATUS(exitstatus)),
  activity ? errdetail("Failed process was running: %s", activity) : 0));
 	else if (WIFSIGNALED(exitstatus))
+	{
 #if defined(WIN32)
 		ereport(lev,
 
@@ -3573,7 +3574,7 @@ LogChildExit(int lev, const char *procname, int pid, int exitstatus)
 		procname, pid, WTERMSIG(exitstatus)),
  errhint("See C include file \"ntstatus.h\" for a description of the hexadecimal value."),
  activity ? errdetail("Failed process was running: %s", activity) : 0));
-#elif defined(HAVE_DECL_SYS_SIGLIST) && HAVE_DECL_SYS_SIGLIST
+#else
 		ereport(lev,
 
 		/*--
@@ -3581,19 +3582,10 @@ LogChildExit(int lev, const char *procname, int pid, int exitstatus)
 		  "server process" */
 (errmsg("%s (PID %d) was terminated by signal %d: %s",
 		procname, p

Re: sys_siglist[] is causing us trouble again

2020-07-15 Thread Tom Lane
Thomas Munro writes:
> On Thu, Jul 16, 2020 at 10:48 AM Tom Lane wrote:
>> Oddly, we'd not get to this code unless configure set
>> HAVE_DECL_SYS_SIGLIST, so it's defined *somewhere*.

> It looks like glibc very recently decided[1] to hide the declaration,
> but we're using a cached configure test result.

Ah, of course.  I was thinking that Peter had just changed configure
in the last day or so, but that did not affect the back branches.
So it's unsurprising for buildfarm animals to be using cached configure
results.

> I guess rawhide is the RH thing that tracks the bleeding edge?

Yup.  Possibly we should recommend that buildfarm owners running on
non-stable platforms disable autoconf result caching --- I believe
that's "use_accache => undef" in the configuration file.

Alternatively, maybe it'd be bright for the buildfarm script to
discard that cache after any failure (or at least configure or
build failures).

regards, tom lane




Re: sys_siglist[] is causing us trouble again

2020-07-15 Thread Thomas Munro
On Thu, Jul 16, 2020 at 10:48 AM Tom Lane wrote:
> We haven't changed anything, ergo something changed at the OS level.
>
> Oddly, we'd not get to this code unless configure set
> HAVE_DECL_SYS_SIGLIST, so it's defined *somewhere*.  I suspect the root
> issue here is some rearrangement of system header files combined with
> wait_error.c (and maybe other places?) not including exactly the same
> headers that configure tested.

It looks like glibc very recently decided[1] to hide the declaration,
but we're using a cached configure test result.  I guess rawhide is
the RH thing that tracks the bleeding edge?
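
To make the failure mode concrete, here is a rough sketch of how a
configure-time macro selects the code path. The function name
format_signal is hypothetical, and the #define below only stands in for
what configure would write into pg_config.h. If a stale cache leaves the
macro at 1 after the headers stop declaring sys_siglist, the first
branch is still selected and the compile fails just like the buildfarm
log.

```c
#include <stdio.h>

/*
 * Assumption: stands in for the configure result.  Set it to 1 on a
 * system whose headers no longer declare sys_siglist to reproduce the
 * "undeclared" error.
 */
#ifndef HAVE_DECL_SYS_SIGLIST
#define HAVE_DECL_SYS_SIGLIST 0
#endif

#if HAVE_DECL_SYS_SIGLIST
#include <signal.h>
#endif

/* Hypothetical helper mirroring the shape of the pre-v12 code paths. */
void
format_signal(int signum, char *buf, size_t buflen)
{
#if defined(HAVE_DECL_SYS_SIGLIST) && HAVE_DECL_SYS_SIGLIST
	/* Compiles only if the headers really declare sys_siglist. */
	snprintf(buf, buflen, "terminated by signal %d: %s", signum,
			 signum < NSIG ? sys_siglist[signum] : "(unknown)");
#else
	/* Fallback: the signal number alone, with no name. */
	snprintf(buf, buflen, "terminated by signal %d", signum);
#endif
}
```

The compiler never re-checks the macro against the headers, so only
rerunning configure without the cache corrects a wrong value.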

> Anyway, rather than installing rawhide and trying to debug this,
> I'd like to make a modest proposal: let's back-patch the v12
> patches that made us stop relying on sys_siglist[], viz a73d08319
> and cc92cca43.  Per the discussions that led to those patches,
> it's been decades since any platform didn't have POSIX-compliant
> strsignal(), so we'd be much better off relying on that.

Seems sensible.  Despite the claims of the glibc manual[2], it's not
really a GNU extension, and the BSDs have had it for decades.

[1] https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=b1ccfc061feee9ce616444ded8e1cd5acf9fa97f
[2] https://www.gnu.org/software/libc/manual/html_node/Signal-Messages.html




sys_siglist[] is causing us trouble again

2020-07-15 Thread Tom Lane
As of a couple days ago, buildfarm member caiman (Fedora rawhide)
is failing like this in all the pre-v12 branches:

ccache gcc -Wall -Wmissing-prototypes -Wpointer-arith 
-Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute 
-Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard 
-Wno-format-truncation -Wno-stringop-truncation -g -O2 -DFRONTEND 
-I../../src/include -D_GNU_SOURCE -I/usr/include/libxml2   -c -o wait_error.o 
wait_error.c
wait_error.c: In function ‘wait_result_to_str’:
wait_error.c:71:6: error: ‘sys_siglist’ undeclared (first 
use in this function)
   71 |  sys_siglist[WTERMSIG(exitstatus)] : "(unknown)");
      |  ^~~
wait_error.c:71:6: note: each undeclared identifier is reported only once for 
each function it appears in
make[2]: *** [<builtin>: wait_error.o] Error 1

We haven't changed anything, ergo something changed at the OS level.

Oddly, we'd not get to this code unless configure set
HAVE_DECL_SYS_SIGLIST, so it's defined *somewhere*.  I suspect the root
issue here is some rearrangement of system header files combined with
wait_error.c (and maybe other places?) not including exactly the same
headers that configure tested.

Anyway, rather than installing rawhide and trying to debug this,
I'd like to make a modest proposal: let's back-patch the v12
patches that made us stop relying on sys_siglist[], viz a73d08319
and cc92cca43.  Per the discussions that led to those patches,
it's been decades since any platform didn't have POSIX-compliant
strsignal(), so we'd be much better off relying on that.
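
As an illustration of what relying on strsignal() looks like, here is a
minimal sketch, not the real wait_result_to_str(). The helper name
describe_wait_status and the message wording are assumptions made for
the example.

```c
#define _POSIX_C_SOURCE 200809L	/* ask the headers to declare strsignal() */
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/wait.h>

/*
 * Hypothetical helper: describe a wait() status in buf, naming the
 * signal with strsignal() rather than indexing sys_siglist[].
 */
void
describe_wait_status(int status, char *buf, size_t buflen)
{
	if (WIFEXITED(status))
		snprintf(buf, buflen, "child process exited with exit code %d",
				 WEXITSTATUS(status));
	else if (WIFSIGNALED(status))
	{
		int			signum = WTERMSIG(status);
		const char *name = strsignal(signum);

		/* Guard against a NULL return for unknown signal numbers. */
		snprintf(buf, buflen,
				 "child process was terminated by signal %d: %s",
				 signum, name ? name : "(unknown)");
	}
	else
		snprintf(buf, buflen,
				 "child process exited with unrecognized status %d",
				 status);
}
```

Passing the status filled in by waitpid() yields a message with both
the signal number and its name, with no configure test for
sys_siglist needed.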

regards, tom lane