Hi, I am running into a strange issue with Postgres 10 when using pg_dump with the directory format and jobs parameter set it intermittently hangs. Seems to occur less frequently the lower I set the jobs parameter but does happen eventually even when set to 2 (could not reproduce when jobs=1). I've tested with Postgres 11 and no matter how high I set the jobs parameter it never hangs (tested all the way up to jobs=8). Also with Postgres 10 and other formats I don't run into the issue. Here are my server specs and software versions (OS and Postgres)
Windows Server 2016 DataCenter 10.0.14393 2 cores 4 logical processors 26GB memory Postgres 11.4 (port 5433) Postgres 10.9 (port 5432) ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- And here are the psql statements and powershell script to recreate the issue. CREATE DATABASE pg_dump_hang; \c pg_dump_hang CREATE TABLE series1 AS SELECT i FROM generate_series(1,1000000) i; CREATE TABLE series2 AS SELECT i FROM generate_series(1,1000000) i; CREATE TABLE series3 AS SELECT i FROM generate_series(1,1000000) i; CREATE TABLE series4 AS SELECT i FROM generate_series(1,1000000) i; CREATE TABLE series5 AS SELECT i FROM generate_series(1,1000000) i; CREATE TABLE series6 AS SELECT i FROM generate_series(1,1000000) i; CREATE TABLE series7 AS SELECT i FROM generate_series(1,1000000) i; CREATE TABLE series8 AS SELECT i FROM generate_series(1,1000000) i; $env:PGPASSWORD='#######' $env:PGUSER='postgres' $env:PGPORT=5433 set-location "C:\program files\postgresql\11\bin" $a = 1 DO { remove-item -Recurse D:\dumphangtest .\pg_dump.exe --format=directory --jobs=4 --file=D:\dumphangtest pg_dump_hang $a++ "Completed loop $a at -" + (date) } While ($a -le 25) $env:PGPASSWORD='#######' $env:PGUSER='postgres' $env:PGPORT=5432 set-location "c:\program files\postgresql\10\bin" $a = 1 DO { remove-item -Recurse D:\dumphangtest .\pg_dump.exe --format=directory --jobs=4 --file=D:\dumphangtest pg_dump_hang $a++ "Completed loop $a at -" + (date) } While ($a -le 25) ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Here is the output from pg_stat_activity and the filesystem listing when the hang occurs. --jobs=4 postgres=# select pid,state,state_change,query_start,wait_event,wait_event_type,substring(query,1,35) from pg_stat_activity where datname like 'pg_dump%'; pid | state | state_change | query_start | wait_event | wait_event_type | substring ------+---------------------+-------------------------------+-------------------------------+-------------+-----------------+------------------------------------- 5324 | idle in transaction | 2019-06-27 15:29:13.03416+01 | 2019-06-27 15:29:13.03154+01 | ClientRead | Client | SELECT at.attname, (SELECT pg_catal 4144 | idle in transaction | 2019-06-27 15:29:16.222315+01 | 2019-06-27 15:29:15.070106+01 | ClientRead | Client | COPY public.series6 (i) TO stdout; 5576 | active | 2019-06-27 15:29:13.642069+01 | 2019-06-27 15:29:13.642068+01 | ClientWrite | Client | COPY public.series2 (i) TO stdout; 3860 | idle in transaction | 2019-06-27 15:29:17.109549+01 | 2019-06-27 15:29:16.198842+01 | ClientRead | Client | COPY public.series8 (i) TO stdout; 3120 | idle in transaction | 2019-06-27 15:29:16.350121+01 | 2019-06-27 15:29:15.089817+01 | ClientRead | Client | COPY public.series7 (i) TO stdout; (5 rows) D:\>dir dumphangtest Volume in drive D has no label. Volume Serial Number is C635-CDEF Directory of D:\dumphangtest 06/27/2019 10:29 AM <DIR> . 06/27/2019 10:29 AM <DIR> .. 06/27/2019 10:29 AM 2,114,899 2863.dat.gz 06/27/2019 10:29 AM 376,832 2864.dat.gz 06/27/2019 10:29 AM 2,114,899 2865.dat.gz 06/27/2019 10:29 AM 2,114,899 2866.dat.gz 06/27/2019 10:29 AM 2,114,899 2867.dat.gz 06/27/2019 10:29 AM 2,114,899 2868.dat.gz 06/27/2019 10:29 AM 2,114,899 2869.dat.gz 06/27/2019 10:29 AM 2,114,899 2870.dat.gz 06/27/2019 10:29 AM 8,452 toc.dat 9 File(s) 15,189,577 bytes 2 Dir(s) 102,021,439,488 bytes free --jobs=2 postgres=# select pid,state,state_change,query_start,wait_event,wait_event_type,substring(query,1,35) from pg_stat_activity where datname like 'pg_dump%'; pid | state | state_change | query_start | wait_event | wait_event_type | substring ------+---------------------+-------------------------------+-------------------------------+-------------+-----------------+------------------------------------- 3420 | idle in transaction | 2019-06-27 14:37:55.067559+01 | 2019-06-27 14:37:54.448198+01 | ClientRead | Client | COPY public.series8 (i) TO stdout; 3180 | idle in transaction | 2019-06-27 14:37:49.537396+01 | 2019-06-27 14:37:49.53608+01 | ClientRead | Client | SELECT at.attname, (SELECT pg_catal 5364 | active | 2019-06-27 14:37:50.151002+01 | 2019-06-27 14:37:50.151001+01 | ClientWrite | Client | COPY public.series2 (i) TO stdout; (3 rows) D:\>dir dumphangtest Volume in drive D has no label. Volume Serial Number is C635-CDEF Directory of D:\dumphangtest 06/27/2019 09:37 AM <DIR> . 06/27/2019 09:37 AM <DIR> .. 06/27/2019 09:37 AM 2,114,899 2863.dat.gz 06/27/2019 09:37 AM 81,920 2864.dat.gz 06/27/2019 09:37 AM 2,114,899 2865.dat.gz 06/27/2019 09:37 AM 2,114,899 2866.dat.gz 06/27/2019 09:37 AM 2,114,899 2867.dat.gz 06/27/2019 09:37 AM 2,114,899 2868.dat.gz 06/27/2019 09:37 AM 2,114,899 2869.dat.gz 06/27/2019 09:37 AM 2,114,899 2870.dat.gz 06/27/2019 09:37 AM 8,452 toc.dat 9 File(s) 14,894,665 bytes 2 Dir(s) 102,056,497,152 bytes free ---------------------------------------------------------------------------------------------------------- And here is the stack trace from the thread of the hung process. ntoskrnl.exe!KiSwapContext+0x76 ntoskrnl.exe!KiSwapThread+0x17d ntoskrnl.exe!KiCommitThreadWait+0x14f ntoskrnl.exe!KeWaitForSingleObject+0x377 ntoskrnl.exe!KiSchedulerApc+0x231 ntoskrnl.exe!KiDeliverApc+0x22a ntoskrnl.exe!KiSwapThread+0x364 ntoskrnl.exe!KiCommitThreadWait+0x14f ntoskrnl.exe!KeWaitForMultipleObjects+0x1fe ntoskrnl.exe!ObWaitForMultipleObjects+0x2c7 ntoskrnl.exe!NtWaitForMultipleObjects+0xf9 ntoskrnl.exe!KiSystemServiceCopyEnd+0x13 ntdll.dll!NtWaitForMultipleObjects+0x14 KERNELBASE.dll!WaitForMultipleObjectsEx+0xef KERNELBASE.dll!WaitForMultipleObjects+0xe postgres.exe!WaitLatchOrSocket+0x456 postgres.exe!WaitEventSetWait+0x8e postgres.exe!secure_write+0xeb postgres.exe!pq_putbytes+0x4e5 postgres.exe!pq_putbytes+0x41b postgres.exe!pq_putbytes+0x256 postgres.exe!CreateCopyDestReceiver+0x3784 postgres.exe!CreateCopyDestReceiver+0x17b2 postgres.exe!CreateCopyDestReceiver+0x144c postgres.exe!CreateCopyDestReceiver+0x1098 postgres.exe!DoCopy+0x50a postgres.exe!standard_ProcessUtility+0x37b pg_stat_statements.dll!pg_finfo_pg_stat_statements+0xde1 postgres.exe!ProcessUtility+0x34 postgres.exe!PortalRunFetch+0x891 postgres.exe!PortalRunFetch+0xbae postgres.exe!PortalRun+0x1a0 postgres.exe!get_stats_option_name+0x1335 postgres.exe!PostgresMain+0x635 postgres.exe!ShmemBackendArrayAllocation+0x2b3a postgres.exe!SubPostmasterMain+0x275 postgres.exe!main+0x480 postgres.exe!pgwin32_popen+0x144f KERNEL32.DLL!BaseThreadInitThunk+0x14 ntdll.dll!RtlUserThreadStart+0x21 Any ideas on changes in Postgres 10 that would cause this? We were previously running 9.6.7 and didn't encounter this issue on that version either. Thanks Ben. Ben Snaidero *Geotab* Senior Database Specialist Direct +1 (289) 230-7749 Toll-free +1 (877) 436-8221 Visit www.geotab.com Twitter <https://twitter.com/geotab> | Facebook <https://www.facebook.com/Geotab> | YouTube <https://www.youtube.com/user/MyGeotab> | LinkedIn <https://www.linkedin.com/company/geotab/>