On Sat, Jan 07, 2023 at 12:39:11AM +1300, Thomas Munro wrote:
> On Wed, Nov 9, 2022 at 2:04 PM Justin Pryzby <pry...@telsasoft.com> wrote:
> > +data_sync_retry = on
> 
> Sharing with the list some clues that Justin and I figured out about
> what that part is doing.  Without it, you get failures like:
> 
>   PANIC:  could not open file "pg_logical/snapshots/0-14FE6B0.snap":
> No such file or directory
> 
> That's been seen before:
> 
>   https://www.postgresql.org/message-id/flat/17827.1549866683%40sss.pgh.pa.us
> 
> That thread concluded that the operating system must have a non-atomic
> rename(), ie a kernel bug.  I don't know why Cygwin would display that
> behaviour and our native Windows build not; maybe timing, or maybe our
> own open() and rename() wrappers for Windows do something important
> differently than Cygwin's open() and rename().
> 
> On reflection, that seems a bit too flimsy to have in-tree without
> more investigation, which I won't have time for myself, so I'm going
> to withdraw this entry.

Not so fast :)

Here's my latest copy of the patch.  Most recently, rather than setting
data_sync_retry=no, I changed to call fsync_fname_ext() rather than
fsync_fname(), which uses PANIC (except when data_sync_retry is
disabled).  That seems to work, showing that the problem is limited to
SnapBuildSerialize(), and not a problem with all fsync()...

https://cirrus-ci.com/task/5990885733695488

Thomas raised a good question, which was how the tests were passing when
SnapBuildSerialize() was raising an error, which is what it would've
been doing when I used data_sync_retry=no.

So .. why is wal_sync_method being used to control fsync for things
other than WAL?

See 6dc7760ac (c. 2005) which added wal_fsync_writethrough, at which
point (since 9b178555f, c. 2004) wal_sync_method was already being used
for SLOG.

Now, it's also being used for logical decoding (since b89e1510 and
858ec1185, c. 2014) in rewriteheap.c/snapbuild.c.  And pidfiles (since
ee0e525bf, 2010).  And the control file (8b938d36f7, 2019).  Note that
data_sync_retry wasn't added until 9ccdd7f66 (c.  2018)

It looks like logical decoding may be the "most wrong" place that
wal_sync_method is being used, so maybe my change is reasonable to
consider, and not just a workaround.

I'm going to re-open the CF entry to let this run for a while to see how
it works out.

-- 
Justin
>From b07add11b8bf39f5bfbae4f9072470f31da97360 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.mu...@gmail.com>
Date: Mon, 25 Jul 2022 23:05:10 +1200
Subject: [PATCH] WIP CI support for Cygwin.

ci-os-only: cygwin

See also: d8e78714-dc77-4a64-783f-e863ba4d9...@2ndquadrant.com

https://cirrus-ci.com/task/5145086722834432

XXX This should use a canned Docker image with all the right packages
installed?  But if the larger image is slower to start, then maybe not...
---
 .cirrus.yml                                 | 83 +++++++++++++++++++++
 configure                                   |  2 +-
 configure.ac                                |  2 +-
 src/backend/replication/logical/snapbuild.c |  4 +-
 src/test/perl/PostgreSQL/Test/Cluster.pm    |  4 +-
 src/test/perl/PostgreSQL/Test/Utils.pm      | 12 ++-
 src/test/recovery/t/020_archive_status.pl   |  2 +-
 src/tools/ci/cores_backtrace.sh             | 19 ++++-
 8 files changed, 118 insertions(+), 10 deletions(-)

diff --git a/.cirrus.yml b/.cirrus.yml
index d13726ed893..4507f734e94 100644
--- a/.cirrus.yml
+++ b/.cirrus.yml
@@ -737,6 +737,89 @@ task:
       type: text/plain
 
 
+task:
+  name: Windows - Cygwin
+
+  # due to resource constraints we don't run this task by default for now
+  trigger_type: manual
+  # worth using only_if despite being manual, otherwise this task will show up
+  # when e.g. ci-os-only: linux is used.
+  only_if: $CIRRUS_CHANGE_MESSAGE !=~ '.*\nci-os-only:.*' || $CIRRUS_CHANGE_MESSAGE =~ '.*\nci-os-only:[^\n]*cygwin.*'
+  # otherwise it'll be sorted before other tasks
+  depends_on: SanityCheck
+
+  #XXX only_if: $CIRRUS_CHANGE_MESSAGE =~ '.*\nci-os-only:[^\n]*cygwin.*'
+  #timeout_in: 120m
+
+  env:
+    CPUS: 4
+    BUILD_JOBS: $CPUS
+    TEST_JOBS: $CPUS
+    CCACHE_DIR: /tmp/ccache
+    CCACHE_LOGFILE: ccache.log
+    # --disable-dynamicbase
+    # --with-gssapi
+    CONFIGURE_CACHE: /tmp/ccache/configure.cache
+    PG_TEST_USE_UNIX_SOCKETS: 1
+    EXTRA_REGRESS_OPTS: --max-connections=1
+    PG_TEST_EXTRA: ldap ssl # disable kerberos
+    CC: ccache gcc
+    CFLAGS: -Og -ggdb
+    BASH: C:\tools\cygwin\bin\bash.exe --login
+
+  #windows_container:
+    #image: cirrusci/windowsservercore:2019-2022.06.23
+    #os_version: 2019
+  compute_engine_instance:
+    image_project: $IMAGE_PROJECT
+    image: family/pg-ci-windows-ci-vs-2019
+    platform: windows
+    cpu: $CPUS
+    memory: 4G
+
+  setup_additional_packages_script: |
+    choco install -y --no-progress cygwin
+    C:\tools\cygwin\cygwinsetup.exe -q -P cygrunsrv,make,gcc-core,ccache,binutils,libtool,pkg-config,flex,bison,zlib-devel,libxml2-devel,libxslt-devel,libssl-devel,openldap-devel,libreadline-devel,perl,meson,ninja,perl-IPC-Run
+    REM libkrb5-devel,krb5-server
+    %BASH% -c "cygserver-config -y"
+    %BASH% -c "echo 'kern.ipc.semmni 1024' >> /etc/cygserver.conf"
+    %BASH% -c "echo 'kern.ipc.semmns 1024' >> /etc/cygserver.conf"
+    %BASH% -c "net start cygserver"
+
+  sysinfo_script: |
+    chcp
+    systeminfo
+    powershell -Command get-psdrive -psprovider filesystem
+    set
+    %BASH% -c "id; uname -a; ulimit -a -H; ulimit -a -S; export"
+
+  ccache_cache:
+    folder: C:\tools\cygwin\tmp\ccache
+    fingerprint_key: ccache/cygwin
+    reupload_on_changes: true
+
+  configure_script: |
+    %BASH% -c "cd '%cd%' && meson setup --buildtype=debug -Dcassert=true -Dssl=openssl -Duuid=e2fs -DPG_TEST_EXTRA='$PG_TEST_EXTRA' build"
+
+  build_script: |
+    %BASH% -c "cd '%cd%' && ninja -C build -j${BUILD_JOBS}"
+    %BASH% -c "ccache --show-stats"
+
+  always:
+    upload_caches: ccache
+
+    #%BASH% -c "cd '%cd%' && echo 'data_sync_retry = on' >> src/tools/ci/pg_ci_base.conf"
+    #%BASH% -c "cd '%cd%' && echo 'wal_sync_method = fdatasync' >> src/tools/ci/pg_ci_base.conf"
+    # --repeat 9
+  test_world_script: |
+    %BASH% -c "cd '%cd%' && meson test $MTEST_ARGS --num-processes ${TEST_JOBS}"
+
+  on_failure:
+    <<: *on_failure_meson
+    cores_script: |
+      %BASH% -c "cd '%cd%' && src/tools/ci/cores_backtrace.sh cygwin ."
+
+
 task:
   name: CompilerWarnings
   # task that did not run, count as a success, so we need to recheck Linux'
diff --git a/configure b/configure
index 5d07fd0bb91..72d56f00534 100755
--- a/configure
+++ b/configure
@@ -16477,7 +16477,7 @@ fi
 
 # mingw has adopted a GNU-centric interpretation of optind/optreset,
 # so always use our version on Windows.
-if test "$PORTNAME" = "win32"; then
+if test "$PORTNAME" = "win32" -o "$PORTNAME" = "cygwin"; then
   case " $LIBOBJS " in
   *" getopt.$ac_objext "* ) ;;
   *) LIBOBJS="$LIBOBJS getopt.$ac_objext"
diff --git a/configure.ac b/configure.ac
index e9b74ced6ca..e0a9c332060 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1899,7 +1899,7 @@ fi
 
 # mingw has adopted a GNU-centric interpretation of optind/optreset,
 # so always use our version on Windows.
-if test "$PORTNAME" = "win32"; then
+if test "$PORTNAME" = "win32" -o "$PORTNAME" = "cygwin"; then
   AC_LIBOBJ(getopt)
   AC_LIBOBJ(getopt_long)
 fi
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index 829c5681120..f0929600fcc 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -1812,7 +1812,9 @@ SnapBuildSerialize(SnapBuild *builder, XLogRecPtr lsn)
 	}
 
 	/* make sure we persist */
-	fsync_fname(path, false);
+	if (fsync_fname_ext(path, false, false, ERROR))
+		elog(ERROR, "failed to fsync");
+
 	fsync_fname("pg_logical/snapshots", true);
 
 	/*
diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
index 04921ca3a3d..31ac1d020a4 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -1097,7 +1097,7 @@ sub enable_restoring
 	# the path contains spaces.
 	$path =~ s{\\}{\\\\}g if ($PostgreSQL::Test::Utils::windows_os);
 	my $copy_command =
-	  $PostgreSQL::Test::Utils::windows_os
+	  $PostgreSQL::Test::Utils::windows_os && !$PostgreSQL::Test::Utils::is_cygwin
 	  ? qq{copy "$path\\\\%f" "%p"}
 	  : qq{cp "$path/%f" "%p"};
 
@@ -1167,7 +1167,7 @@ sub enable_archiving
 	# the path contains spaces.
 	$path =~ s{\\}{\\\\}g if ($PostgreSQL::Test::Utils::windows_os);
 	my $copy_command =
-	  $PostgreSQL::Test::Utils::windows_os
+	  $PostgreSQL::Test::Utils::windows_os && !$PostgreSQL::Test::Utils::is_cygwin
 	  ? qq{copy "%p" "$path\\\\%f"}
 	  : qq{cp "%p" "$path/%f"};
 
diff --git a/src/test/perl/PostgreSQL/Test/Utils.pm b/src/test/perl/PostgreSQL/Test/Utils.pm
index 878e12b15ed..0c3f4dc35a0 100644
--- a/src/test/perl/PostgreSQL/Test/Utils.pm
+++ b/src/test/perl/PostgreSQL/Test/Utils.pm
@@ -88,10 +88,11 @@ our @EXPORT = qw(
 
   $windows_os
   $is_msys2
+  $is_cygwin
   $use_unix_sockets
 );
 
-our ($windows_os, $is_msys2, $use_unix_sockets, $timeout_default,
+our ($windows_os, $is_msys2, $is_cygwin, $use_unix_sockets, $timeout_default,
 	$tmp_check, $log_path, $test_logfile);
 
 BEGIN
@@ -140,13 +141,18 @@ BEGIN
 	$ENV{PGAPPNAME} = basename($0);
 
 	# Must be set early
-	$windows_os = $Config{osname} eq 'MSWin32' || $Config{osname} eq 'msys';
+	$windows_os = $Config{osname} eq 'MSWin32' || $Config{osname} eq 'msys'
+		|| $Config{osname} eq 'cygwin';
+
 	# Check if this environment is MSYS2.
 	$is_msys2 =
 	     $windows_os
 	  && -x '/usr/bin/uname'
 	  && `uname -or` =~ /^[2-9].*Msys/;
 
+	# Check if this environment is Cygwin
+	$is_cygwin = $Config{osname} eq 'cygwin';
+
 	if ($windows_os)
 	{
 		require Win32API::File;
@@ -707,7 +713,7 @@ sub dir_symlink
 {
 	my $oldname = shift;
 	my $newname = shift;
-	if ($windows_os)
+	if ($windows_os && !$is_cygwin)
 	{
 		$oldname =~ s,/,\\,g;
 		$newname =~ s,/,\\,g;
diff --git a/src/test/recovery/t/020_archive_status.pl b/src/test/recovery/t/020_archive_status.pl
index 13ada994dbb..0462d1d90c2 100644
--- a/src/test/recovery/t/020_archive_status.pl
+++ b/src/test/recovery/t/020_archive_status.pl
@@ -26,7 +26,7 @@ my $primary_data = $primary->data_dir;
 # a portable solution, use an archive command based on a command known to
 # work but will fail: copy with an incorrect original path.
 my $incorrect_command =
-  $PostgreSQL::Test::Utils::windows_os
+  $PostgreSQL::Test::Utils::windows_os && !$PostgreSQL::Test::Utils::is_cygwin
   ? qq{copy "%p_does_not_exist" "%f_does_not_exist"}
   : qq{cp "%p_does_not_exist" "%f_does_not_exist"};
 $primary->safe_psql(
diff --git a/src/tools/ci/cores_backtrace.sh b/src/tools/ci/cores_backtrace.sh
index 28d3cecfc67..c49f9b07752 100755
--- a/src/tools/ci/cores_backtrace.sh
+++ b/src/tools/ci/cores_backtrace.sh
@@ -1,5 +1,7 @@
 #! /bin/sh
 
+#set -e
+
 if [ $# -ne 2 ]; then
     echo "cores_backtrace.sh <os> <directory>"
     exit 1
@@ -8,9 +10,22 @@ fi
 os=$1
 directory=$2
 
+findargs=''
 case $os in
     freebsd|linux|macos)
-    ;;
+        ;;
+
+    cygwin)
+        for stack in $(find "$directory" -type f -name "*.stackdump") ; do
+            binary=`basename "$stack" .stackdump`
+            echo;echo;
+            echo "dumping ${stack} for ${binary}"
+            awk '/^0/{print $2}' $stack |addr2line -f -i -e ./build/tmp_install/usr/local/pgsql/bin/postgres.exe
+            #awk '/^0/{print $2}' $stack |addr2line -f -i -e "./build/src/backend/$binary.exe"
+        done
+        exit 0
+        ;;
+
     *)
         echo "unsupported operating system ${os}"
         exit 1
@@ -48,3 +63,5 @@ for corefile in $(find "$directory" -type f) ; do
         gdb --batch --quiet -ex "thread apply all bt full" -ex "quit" "$binary" "$corefile" 2>/dev/null
     fi
 done
+
+exit 0
-- 
2.25.1

Reply via email to