Re: sys_siglist[] is causing us trouble again
On 7/15/20 7:36 PM, Tom Lane wrote:
>> I guess rawhide is the RH thing that tracks the bleeding edge?
> Yup.  Possibly we should recommend that buildfarm owners running on
> non-stable platforms disable autoconf result caching --- I believe
> that's "use_accache => undef" in the configuration file.
>
> Alternatively, maybe it'd be bright for the buildfarm script to
> discard that cache after any failure (or at least configure or
> build failures).

Yeah, these lines will be added to the upcoming client code release in
run_build.pl.  Search for 'obsolete' and you'll find where to put it if
you want to be ahead of the curve.

    my $last_stage = get_last_stage() || "";
    $obsolete ||= $last_stage =~ /^(Make|Configure|Contrib|.*-build)$/;

cheers

andrew

--
Andrew Dunstan                 https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: sys_siglist[] is causing us trouble again
On Wed, Jul 15, 2020 at 7:48 PM Tom Lane wrote:
> As of a couple days ago, buildfarm member caiman (Fedora rawhide)
> is failing like this in all the pre-v12 branches:
>
> ccache gcc -Wall -Wmissing-prototypes -Wpointer-arith
> -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute
> -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard
> -Wno-format-truncation -Wno-stringop-truncation -g -O2 -DFRONTEND
> -I../../src/include -D_GNU_SOURCE -I/usr/include/libxml2 -c -o
> wait_error.o wait_error.c
> wait_error.c: In function ‘wait_result_to_str’:
> wait_error.c:71:6: error: ‘sys_siglist’ undeclared
> (first use in this function)
>    71 |       sys_siglist[WTERMSIG(exitstatus)] : "(unknown)");
>       |       ^~~
> wait_error.c:71:6: note: each undeclared identifier is reported only once
> for each function it appears in
> make[2]: *** [: wait_error.o] Error 1
>
> We haven't changed anything, ergo something changed at the OS level.
>
> Oddly, we'd not get to this code unless configure set
> HAVE_DECL_SYS_SIGLIST, so it's defined *somewhere*.  I suspect the root
> issue here is some rearrangement of system header files combined with
> wait_error.c (and maybe other places?) not including exactly the same
> headers that configure tested.
>
> Anyway, rather than installing rawhide and trying to debug this,
> I'd like to make a modest proposal: let's back-patch the v12
> patches that made us stop relying on sys_siglist[], viz a73d08319
> and cc92cca43.  Per the discussions that led to those patches,
> it's been decades since any platform didn't have POSIX-compliant
> strsignal(), so we'd be much better off relying on that.
>
> regards, tom lane
>

I believe it's related to these recent glibc changes in rawhide:

https://src.fedoraproject.org/rpms/glibc/c/0aab7eb58528999277c626fc16682da179de03d0?branch=master

- signal: Move sys_errlist to a compat symbol
- signal: Move sys_siglist to a compat symbol

SHA512 (glibc-2.31.9000-683-gffb17e7ba3.tar.xz) =
103ff3c04de5dc149df93e5399de1630f6fff1b8d7f127881d6e530492b8b953a8064205ceecb311a77c0a10de3a5ab2056121fd1fa833a30327c6b1f08beacc
Re: sys_siglist[] is causing us trouble again
Thomas Munro writes:
> On Thu, Jul 16, 2020 at 10:48 AM Tom Lane wrote:
>> We haven't changed anything, ergo something changed at the OS level.

> It looks like glibc very recently decided[1] to hide the declaration,
> but we're using a cached configure test result.

Right.  So, modulo the mis-cached result, what would happen if we do
nothing is that the back branches would lose the ability to translate
signal numbers to strings on bleeding-edge glibc.  I don't think we
want that, so we need to back-patch.

Attached is a lightly tested patch for v11.  (This includes 7570df0f3
as well, so that pgstrsignal.c will be the same in all branches.)

			regards, tom lane

diff --git a/configure b/configure
index 4e1b4be7fb..7382a34d60 100755
--- a/configure
+++ b/configure
@@ -15004,7 +15004,7 @@ fi
 LIBS_including_readline="$LIBS"
 LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'`
 
-for ac_func in cbrt clock_gettime dlopen fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll posix_fallocate pstat pthread_is_threaded_np readlink setproctitle setsid shm_open symlink sync_file_range uselocale utime utimes wcstombs_l
+for ac_func in cbrt clock_gettime dlopen fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll posix_fallocate pstat pthread_is_threaded_np readlink setproctitle setsid shm_open strsignal symlink sync_file_range uselocale utime utimes wcstombs_l
 do :
   as_ac_var=`$as_echo "ac_cv_func_$ac_func" | $as_tr_sh`
 ac_fn_c_check_func "$LINENO" "$ac_func" "$as_ac_var"
@@ -15893,24 +15893,6 @@ esac
 
 fi
 
-ac_fn_c_check_decl "$LINENO" "sys_siglist" "ac_cv_have_decl_sys_siglist" "#include <signal.h>
-/* NetBSD declares sys_siglist in unistd.h.  */
-#ifdef HAVE_UNISTD_H
-# include <unistd.h>
-#endif
-
-"
-if test "x$ac_cv_have_decl_sys_siglist" = xyes; then :
-  ac_have_decl=1
-else
-  ac_have_decl=0
-fi
-
-cat >>confdefs.h <<_ACEOF
-#define HAVE_DECL_SYS_SIGLIST $ac_have_decl
-_ACEOF
-
-
 ac_fn_c_check_func "$LINENO" "syslog" "ac_cv_func_syslog"
 if test "x$ac_cv_func_syslog" = xyes; then :
   ac_fn_c_check_header_mongrel "$LINENO" "syslog.h" "ac_cv_header_syslog_h" "$ac_includes_default"
diff --git a/configure.in b/configure.in
index fab1658bca..defcb8ff99 100644
--- a/configure.in
+++ b/configure.in
@@ -1622,6 +1622,7 @@ AC_CHECK_FUNCS(m4_normalize([
 	setproctitle
 	setsid
 	shm_open
+	strsignal
 	symlink
 	sync_file_range
 	uselocale
@@ -1821,14 +1822,6 @@ if test "$PORTNAME" = "cygwin"; then
   AC_LIBOBJ(dirmod)
 fi
 
-AC_CHECK_DECLS([sys_siglist], [], [],
-[#include <signal.h>
-/* NetBSD declares sys_siglist in unistd.h. */
-#ifdef HAVE_UNISTD_H
-# include <unistd.h>
-#endif
-])
-
 AC_CHECK_FUNC(syslog,
               [AC_CHECK_HEADER(syslog.h,
                                [AC_DEFINE(HAVE_SYSLOG, 1, [Define to 1 if you have the syslog interface.])])])
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 252220c770..08577f5e5f 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -596,17 +596,10 @@ pgarch_archiveXlog(char *xlog)
 				 errhint("See C include file \"ntstatus.h\" for a description of the hexadecimal value."),
 				 errdetail("The failed archive command was: %s",
 						   xlogarchcmd)));
-#elif defined(HAVE_DECL_SYS_SIGLIST) && HAVE_DECL_SYS_SIGLIST
-			ereport(lev,
-					(errmsg("archive command was terminated by signal %d: %s",
-							WTERMSIG(rc),
-							WTERMSIG(rc) < NSIG ? sys_siglist[WTERMSIG(rc)] : "(unknown)"),
-					 errdetail("The failed archive command was: %s",
-							   xlogarchcmd)));
 #else
 			ereport(lev,
-					(errmsg("archive command was terminated by signal %d",
-							WTERMSIG(rc)),
+					(errmsg("archive command was terminated by signal %d: %s",
+							WTERMSIG(rc), pg_strsignal(WTERMSIG(rc))),
 					 errdetail("The failed archive command was: %s",
 							   xlogarchcmd)));
 #endif
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 3bfc299be1..75a9e07041 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -3563,6 +3563,7 @@ LogChildExit(int lev, const char *procname, int pid, int exitstatus)
 						procname, pid, WEXITSTATUS(exitstatus)),
 				 activity ? errdetail("Failed process was running: %s", activity) : 0));
 	else if (WIFSIGNALED(exitstatus))
+	{
 #if defined(WIN32)
 		ereport(lev,
 
@@ -3573,7 +3574,7 @@ LogChildExit(int lev, const char *procname, int pid, int exitstatus)
 						procname, pid, WTERMSIG(exitstatus)),
 				 errhint("See C include file \"ntstatus.h\" for a description of the hexadecimal value."),
 				 activity ? errdetail("Failed process was running: %s", activity) : 0));
-#elif defined(HAVE_DECL_SYS_SIGLIST) && HAVE_DECL_SYS_SIGLIST
+#else
 		ereport(lev,
 
 		/*--
@@ -3581,19 +3582,10 @@ LogChildExit(int lev, const char *procname, int pid, int exitstatus)
 		  "server process" */
 				(errmsg("%s (PID %d) was terminated by signal %d: %s",
 						procname, p
Re: sys_siglist[] is causing us trouble again
Thomas Munro writes:
> On Thu, Jul 16, 2020 at 10:48 AM Tom Lane wrote:
>> Oddly, we'd not get to this code unless configure set
>> HAVE_DECL_SYS_SIGLIST, so it's defined *somewhere*.

> It looks like glibc very recently decided[1] to hide the declaration,
> but we're using a cached configure test result.

Ah, of course.  I was thinking that Peter had just changed configure
in the last day or so, but that did not affect the back branches.
So it's unsurprising for buildfarm animals to be using cached configure
results.

> I guess rawhide is the RH thing that tracks the bleeding edge?

Yup.  Possibly we should recommend that buildfarm owners running on
non-stable platforms disable autoconf result caching --- I believe
that's "use_accache => undef" in the configuration file.

Alternatively, maybe it'd be bright for the buildfarm script to
discard that cache after any failure (or at least configure or
build failures).

			regards, tom lane
Re: sys_siglist[] is causing us trouble again
On Thu, Jul 16, 2020 at 10:48 AM Tom Lane wrote:
> We haven't changed anything, ergo something changed at the OS level.
>
> Oddly, we'd not get to this code unless configure set
> HAVE_DECL_SYS_SIGLIST, so it's defined *somewhere*.  I suspect the root
> issue here is some rearrangement of system header files combined with
> wait_error.c (and maybe other places?) not including exactly the same
> headers that configure tested.

It looks like glibc very recently decided[1] to hide the declaration,
but we're using a cached configure test result.  I guess rawhide is
the RH thing that tracks the bleeding edge?

> Anyway, rather than installing rawhide and trying to debug this,
> I'd like to make a modest proposal: let's back-patch the v12
> patches that made us stop relying on sys_siglist[], viz a73d08319
> and cc92cca43.  Per the discussions that led to those patches,
> it's been decades since any platform didn't have POSIX-compliant
> strsignal(), so we'd be much better off relying on that.

Seems sensible.  Despite the claims of the glibc manual[2], it's not
really a GNU extension, and the BSDs have had it for decades.

[1] https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=b1ccfc061feee9ce616444ded8e1cd5acf9fa97f
[2] https://www.gnu.org/software/libc/manual/html_node/Signal-Messages.html
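To make the strsignal() point concrete, here is a minimal sketch of
looking up a signal description through strsignal() rather than indexing
sys_siglist[].  The helper name describe_signal() and the fallback
wording are hypothetical; this is not the pg_strsignal() code from the
v12 patches, only an illustration that assumes a POSIX.1-2008 libc.

```c
#define _POSIX_C_SOURCE 200809L	/* ask for the POSIX.1-2008 declarations */
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>

/*
 * describe_signal (hypothetical helper): copy a human-readable description
 * of signum into buf and return buf.  No reference to sys_siglist[] is
 * needed, so it does not matter whether libc still declares that array.
 */
static const char *
describe_signal(int signum, char *buf, size_t buflen)
{
	const char *msg;

	assert(buf != NULL && buflen > 0);

	msg = strsignal(signum);

	/* Some implementations may return NULL for unknown signal numbers. */
	if (msg == NULL)
		snprintf(buf, buflen, "unrecognized signal %d", signum);
	else
		snprintf(buf, buflen, "%s", msg);

	return buf;
}
```

The result is copied into a caller-supplied buffer because the string
returned by strsignal() may be overwritten by a later call.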
sys_siglist[] is causing us trouble again
As of a couple days ago, buildfarm member caiman (Fedora rawhide)
is failing like this in all the pre-v12 branches:

ccache gcc -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute
-Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard
-Wno-format-truncation -Wno-stringop-truncation -g -O2 -DFRONTEND
-I../../src/include -D_GNU_SOURCE -I/usr/include/libxml2 -c -o
wait_error.o wait_error.c
wait_error.c: In function ‘wait_result_to_str’:
wait_error.c:71:6: error: ‘sys_siglist’ undeclared (first use in this function)
   71 |       sys_siglist[WTERMSIG(exitstatus)] : "(unknown)");
      |       ^~~
wait_error.c:71:6: note: each undeclared identifier is reported only once for each function it appears in
make[2]: *** [: wait_error.o] Error 1

We haven't changed anything, ergo something changed at the OS level.

Oddly, we'd not get to this code unless configure set
HAVE_DECL_SYS_SIGLIST, so it's defined *somewhere*.  I suspect the root
issue here is some rearrangement of system header files combined with
wait_error.c (and maybe other places?) not including exactly the same
headers that configure tested.

Anyway, rather than installing rawhide and trying to debug this,
I'd like to make a modest proposal: let's back-patch the v12
patches that made us stop relying on sys_siglist[], viz a73d08319
and cc92cca43.  Per the discussions that led to those patches,
it's been decades since any platform didn't have POSIX-compliant
strsignal(), so we'd be much better off relying on that.

			regards, tom lane
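For readers unfamiliar with the failing code path: the error is in a
function that turns a wait() status into a message.  A rough sketch of
that kind of function, written against strsignal() so it does not need
sys_siglist[], might look like the following.  The names
describe_wait_status() and run_and_describe() and the message texts are
illustrative assumptions, not the contents of wait_error.c.

```c
#define _POSIX_C_SOURCE 200809L	/* ask for the POSIX.1-2008 declarations */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>

/*
 * describe_wait_status (hypothetical): format a wait()-style status
 * into buf and return buf.
 */
static const char *
describe_wait_status(int status, char *buf, size_t buflen)
{
	assert(buf != NULL && buflen > 0);

	if (WIFEXITED(status))
		snprintf(buf, buflen, "child process exited with exit code %d",
				 WEXITSTATUS(status));
	else if (WIFSIGNALED(status))
	{
		int			sig = WTERMSIG(status);
		const char *name = strsignal(sig);

		/* Fall back to "(unknown)" if libc has no description. */
		snprintf(buf, buflen,
				 "child process was terminated by signal %d: %s",
				 sig, name ? name : "(unknown)");
	}
	else
		snprintf(buf, buflen,
				 "child process exited with unrecognized status %d", status);

	return buf;
}

/*
 * run_and_describe (hypothetical): run a shell command with system()
 * and describe how it ended.
 */
static const char *
run_and_describe(const char *command, char *buf, size_t buflen)
{
	int			status = system(command);

	if (status == -1)
	{
		snprintf(buf, buflen, "could not run command");
		return buf;
	}
	return describe_wait_status(status, buf, buflen);
}
```

For example, run_and_describe("exit 3", buf, sizeof buf) reports an exit
code of 3, while a command that kills its own shell with a signal takes
the WIFSIGNALED branch.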