Re: pg_dump (COPY) hanging intermittently
On Thu, Jun 27, 2019 at 3:19 PM Tom Lane wrote:
> Ben Snaidero writes:
> > Do these stack traces shed any light at all?
>
> None worth mentioning :-(.  Can you rebuild with debug symbols?
>
> regards, tom lane

So I've rebuilt with debug symbols and for some reason I am now unable to reproduce the issue. I've also gone back to the original binaries and can no longer reproduce the issue there either. I am starting to think this might be network related (storage is network attached), as I haven't even rebooted the server I have been using to test.

Thanks for all your help. At least I learned how to build from source (with debug symbols), so I gained something from working through this issue.
Re: pg_dump (COPY) hanging intermittently
On Thu, Jun 27, 2019 at 2:34 PM Tom Lane wrote:
> Ben Snaidero writes:
> > I am running into a strange issue with Postgres 10 when using pg_dump with
> > the directory format and the jobs parameter set: it intermittently hangs. It
> > seems to occur less frequently the lower I set the jobs parameter, but it
> > does happen eventually even when set to 2 (I could not reproduce it with
> > jobs=1). I've tested with Postgres 11, and no matter how high I set the jobs
> > parameter it never hangs (tested all the way up to jobs=8). With Postgres 10
> > and other formats I don't run into the issue either. Here are my server
> > specs and software versions (OS and Postgres).
>
> FWIW, it looks like the hang is on the client side (i.e., pg_dump itself).
> The one active server process is just trying to send COPY data and is
> waiting for the client to take it.  So I'm wondering about deadlocks
> inside parallel pg_dump.
>
> This doesn't present any obvious answer, because there is not much
> difference between the parallel pg_dump logic in 10 and 11.  But
> maybe you could get stack trace(s) out of the stuck pg_dump job on 10?
> (Keep in mind that unlike the backend, the pg_dump process will have
> multiple threads in this situation.)
>
> Another thing you could do to triangulate is to verify that the v11 pg_dump
> works okay against the v10 server.  Unfortunately the other combination
> can't be tested, but this'd be enough to exonerate the v10 server side.
>
> regards, tom lane

Thanks for the suggestion Tom. I've verified that this looks to be a v10 server-side issue: using the Postgres 11 pg_dump.exe against the v10 server, I still encounter the same hang. Here are the stack traces from each of the threads of pg_dump.exe.
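Tom's diagnosis above boils down to flow control: a backend stuck in secure_write is blocked because the client never drains the COPY stream, so the kernel send buffer fills and further writes cannot complete. This is a minimal, generic Python sketch of that blocking behavior (an illustration only, not PostgreSQL or pg_dump code):

```python
# Illustration: a writer stalls once the peer stops reading -- the same
# state a backend is in when the client never consumes its COPY data.
import socket

writer, reader = socket.socketpair()
writer.setblocking(False)  # so a full buffer raises instead of hanging

sent = 0
try:
    while True:
        sent += writer.send(b"x" * 65536)  # reader never calls recv()
except BlockingIOError:
    # Kernel buffers are full; a *blocking* send() here would hang
    # indefinitely until the reader side drained the socket.
    pass

print(f"buffered {sent} bytes before the writer would block")
writer.close()
reader.close()
```

In a parallel dump, if a worker stops reading while the server (or another thread) is blocked writing, the processes can wait on each other forever, which is the deadlock scenario Tom is hypothesizing.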
ntoskrnl.exe!KiSwapContext+0x76
ntoskrnl.exe!KiSwapThread+0x17d
ntoskrnl.exe!KiCommitThreadWait+0x14f
ntoskrnl.exe!KeWaitForSingleObject+0x377
ntoskrnl.exe!KiSchedulerApc+0x231
ntoskrnl.exe!KiDeliverApc+0x22a
ntoskrnl.exe!KiSwapThread+0x364
ntoskrnl.exe!KiCommitThreadWait+0x14f
ntoskrnl.exe!KeWaitForSingleObject+0x377
ntoskrnl.exe!NtWaitForSingleObject+0xf8
ntoskrnl.exe!KiSystemServiceCopyEnd+0x13
ntdll.dll!NtWaitForSingleObject+0x14
mswsock.dll!SockWaitForSingleObject+0x133
mswsock.dll!WSPSelect+0x7e4
WS2_32.dll!select+0x1d3
pg_dump.exe+0x64ae
pg_dump.exe+0x65bd
pg_dump.exe+0x6228
pg_dump.exe+0x509f
pg_dump.exe+0x980b
pg_dump.exe+0x117cc
pg_dump.exe+0x6f3f
pg_dump.exe+0x30c22
pg_dump.exe+0x3cb77
KERNEL32.DLL!BaseThreadInitThunk+0x14
ntdll.dll!RtlUserThreadStart+0x21
--
ntoskrnl.exe!KiSwapContext+0x76
ntoskrnl.exe!KiSwapThread+0x17d
ntoskrnl.exe!KiCommitThreadWait+0x14f
ntoskrnl.exe!KeWaitForSingleObject+0x377
ntoskrnl.exe!KiSchedulerApc+0x231
ntoskrnl.exe!KiDeliverApc+0x22a
ntoskrnl.exe!KiSwapThread+0x364
ntoskrnl.exe!KiCommitThreadWait+0x14f
ntoskrnl.exe!KeWaitForSingleObject+0x377
ntoskrnl.exe!NtWaitForSingleObject+0xf8
ntoskrnl.exe!KiSystemServiceCopyEnd+0x13
ntdll.dll!NtWaitForSingleObject+0x14
mswsock.dll!SockWaitForSingleObject+0x133
mswsock.dll!WSPRecv+0x5d2
WS2_32.dll!recv+0x16c
pg_dump.exe+0x6746
pg_dump.exe+0x61ab
pg_dump.exe+0x5e57
pg_dump.exe+0x6a22
MSVCR120.dll!_callthreadstartex+0x17
MSVCR120.dll!_threadstartex+0x88
KERNEL32.DLL!BaseThreadInitThunk+0x14
ntdll.dll!RtlUserThreadStart+0x21
--
ntoskrnl.exe!KiSwapContext+0x76
ntoskrnl.exe!KiSwapThread+0x17d
ntoskrnl.exe!KiCommitThreadWait+0x14f
ntoskrnl.exe!KeWaitForSingleObject+0x377
ntoskrnl.exe!KiSchedulerApc+0x231
ntoskrnl.exe!KiDeliverApc+0x22a
ntoskrnl.exe!KiSwapThread+0x364
ntoskrnl.exe!KiCommitThreadWait+0x14f
ntoskrnl.exe!KeWaitForSingleObject+0x377
ntoskrnl.exe!NtWaitForSingleObject+0xf8
ntoskrnl.exe!KiSystemServiceCopyEnd+0x13
ntdll.dll!NtWaitForSingleObject+0x14
mswsock.dll!SockWaitForSingleObject+0x133
mswsock.dll!WSPRecv+0x5d2
WS2_32.dll!recv+0x16c
pg_dump.exe+0x6746
pg_dump.exe+0x61ab
pg_dump.exe+0x5e57
pg_dump.exe+0x6a22
MSVCR120.dll!_callthreadstartex+0x17
MSVCR120.dll!_threadstartex+0x88
KERNEL32.DLL!BaseThreadInitThunk+0x14
ntdll.dll!RtlUserThreadStart+0x21
--
ntoskrnl.exe!KiSwapContext+0x76
ntoskrnl.exe!KiSwapThread+0x17d
ntoskrnl.exe!KiCommitThreadWait+0x14f
ntoskrnl.exe!KeWaitForSingleObject+0x377
ntoskrnl.exe!KiSchedulerApc+0x231
ntoskrnl.exe!KiDeliverApc+0x22a
ntoskrnl.exe!KiSwapThread+0x364
ntoskrnl.exe!KiCommitThreadWait+0x14f
ntoskrnl.exe!KeWaitForSingleObject+0x377
ntoskrnl.exe!NtWaitForSingleObject+0xf8
ntoskrnl.exe!KiSystemServiceCopyEnd+0x13
ntdll.dll!NtWaitForSingleObject+0x14
mswsock.dll!SockWaitForSingleObject+0x133
mswsock.dll!WSPSelect+0x7e4
WS2_32.dll!select+0x1d3
LIBPQ.dll!PQenv2encoding+0x1301
LIBPQ.dll!PQenv2encoding+0x11a3
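For what it's worth, the pg_dump threads above are all parked inside WS2_32.dll!select or recv, i.e. waiting for socket readiness. A generic Python sketch of that wait pattern (not pg_dump's actual event loop, just an illustration of what a thread blocked in select() is doing):

```python
# Illustration of the wait pattern in the traces above: a thread parked
# in select() wakes only when its socket becomes readable.
import select
import socket

a, b = socket.socketpair()

# Nothing has been written yet, so a short select() finds no sockets ready.
readable, _, _ = select.select([b], [], [], 0.1)
assert readable == []  # the thread would go back to waiting here

a.send(b"COPY data chunk")  # the peer finally sends data
readable, _, _ = select.select([b], [], [], 0.1)
assert readable == [b]  # select() now reports b as readable

print(b.recv(64))
a.close()
b.close()
```

If the peer never sends (because it is itself blocked writing elsewhere), the select() call simply never reports readiness, which is what a hang in these frames looks like.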
pg_dump (COPY) hanging intermittently
06-27 14:37:55.067559+01 | 2019-06-27 14:37:54.448198+01 | ClientRead  | Client | COPY public.series8 (i) TO stdout;
 3180 | idle in transaction | 2019-06-27 14:37:49.537396+01 | 2019-06-27 14:37:49.53608+01  | ClientRead  | Client | SELECT at.attname, (SELECT pg_catal
 5364 | active              | 2019-06-27 14:37:50.151002+01 | 2019-06-27 14:37:50.151001+01 | ClientWrite | Client | COPY public.series2 (i) TO stdout;
(3 rows)

D:\>dir dumphangtest
 Volume in drive D has no label.
 Volume Serial Number is C635-CDEF

 Directory of D:\dumphangtest

06/27/2019  09:37 AM    <DIR>          .
06/27/2019  09:37 AM    <DIR>          ..
06/27/2019  09:37 AM         2,114,899 2863.dat.gz
06/27/2019  09:37 AM            81,920 2864.dat.gz
06/27/2019  09:37 AM         2,114,899 2865.dat.gz
06/27/2019  09:37 AM         2,114,899 2866.dat.gz
06/27/2019  09:37 AM         2,114,899 2867.dat.gz
06/27/2019  09:37 AM         2,114,899 2868.dat.gz
06/27/2019  09:37 AM         2,114,899 2869.dat.gz
06/27/2019  09:37 AM         2,114,899 2870.dat.gz
06/27/2019  09:37 AM             8,452 toc.dat
               9 File(s)     14,894,665 bytes
               2 Dir(s)  102,056,497,152 bytes free

--

And here is the stack trace from the thread of the hung process.
ntoskrnl.exe!KiSwapContext+0x76
ntoskrnl.exe!KiSwapThread+0x17d
ntoskrnl.exe!KiCommitThreadWait+0x14f
ntoskrnl.exe!KeWaitForSingleObject+0x377
ntoskrnl.exe!KiSchedulerApc+0x231
ntoskrnl.exe!KiDeliverApc+0x22a
ntoskrnl.exe!KiSwapThread+0x364
ntoskrnl.exe!KiCommitThreadWait+0x14f
ntoskrnl.exe!KeWaitForMultipleObjects+0x1fe
ntoskrnl.exe!ObWaitForMultipleObjects+0x2c7
ntoskrnl.exe!NtWaitForMultipleObjects+0xf9
ntoskrnl.exe!KiSystemServiceCopyEnd+0x13
ntdll.dll!NtWaitForMultipleObjects+0x14
KERNELBASE.dll!WaitForMultipleObjectsEx+0xef
KERNELBASE.dll!WaitForMultipleObjects+0xe
postgres.exe!WaitLatchOrSocket+0x456
postgres.exe!WaitEventSetWait+0x8e
postgres.exe!secure_write+0xeb
postgres.exe!pq_putbytes+0x4e5
postgres.exe!pq_putbytes+0x41b
postgres.exe!pq_putbytes+0x256
postgres.exe!CreateCopyDestReceiver+0x3784
postgres.exe!CreateCopyDestReceiver+0x17b2
postgres.exe!CreateCopyDestReceiver+0x144c
postgres.exe!CreateCopyDestReceiver+0x1098
postgres.exe!DoCopy+0x50a
postgres.exe!standard_ProcessUtility+0x37b
pg_stat_statements.dll!pg_finfo_pg_stat_statements+0xde1
postgres.exe!ProcessUtility+0x34
postgres.exe!PortalRunFetch+0x891
postgres.exe!PortalRunFetch+0xbae
postgres.exe!PortalRun+0x1a0
postgres.exe!get_stats_option_name+0x1335
postgres.exe!PostgresMain+0x635
postgres.exe!ShmemBackendArrayAllocation+0x2b3a
postgres.exe!SubPostmasterMain+0x275
postgres.exe!main+0x480
postgres.exe!pgwin32_popen+0x144f
KERNEL32.DLL!BaseThreadInitThunk+0x14
ntdll.dll!RtlUserThreadStart+0x21

Any ideas on changes in Postgres 10 that would cause this? We were previously running 9.6.7 and never encountered this issue on that version either.

Thanks,
Ben Snaidero
Re: strange slow query performance
On Thu, Jan 17, 2019 at 4:13 PM David G. Johnston <david.g.johns...@gmail.com> wrote:
> On Thu, Jan 17, 2019 at 9:19 AM Ben Snaidero wrote:
> > Any ideas as to why this is happening?
>
> Not really. I would expect roughly double the execution time, not an
> exponential increase. I'm not experienced enough to diagnose it from
> what has been provided, but I will suggest you report the version you
> are running these queries against, and confirm that the results are
> consistent across multiple runs, to discount the possibility that
> locking or other transient work is involved.
>
> Curious whether increasing work_mem helps, but it's mostly a
> trial-and-error thing for me (I wouldn't expect it to)...
>
> David J.

The query was tested multiple times with similar results. I also tried increasing work_mem, with the same results. Memory use did not increase or even come close to the 10MB setting. The query does seem to occupy one full CPU for the duration of the run, though. The query is running on Windows with Postgres 9.6.7.
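One plausible explanation for the CPU-bound behavior (my assumption, not something established in the thread): each ANY($n) in the index condition holds an array of roughly 19,000 values, and testing a scalar against an unsorted array is a linear scan, so each additional ANY(array) predicate multiplies the comparisons performed. A toy Python sketch of the difference between repeated linear scans and a hashed lookup:

```python
# Toy model (an assumption about the cost shape, not PostgreSQL
# internals): membership tests against a large unsorted array are
# linear scans, so stacking predicates multiplies the work per probe.
import time

ids = list(range(19_042))            # roughly the 19042-row InitPlan result
probes = list(range(0, 19_042, 10))  # values to check, one per "row"

# Two linear-scan predicates per probe, like the two ANY(array) conditions.
t0 = time.perf_counter()
hits_list = sum(1 for p in probes if p in ids and p in ids)
linear = time.perf_counter() - t0

# Hashed membership: build the set once, then O(1) per probe.
id_set = set(ids)
t0 = time.perf_counter()
hits_set = sum(1 for p in probes if p in id_set and p in id_set)
hashed = time.perf_counter() - t0

assert hits_list == hits_set
print(f"linear scans: {linear:.3f}s  hashed lookups: {hashed:.6f}s")
```

If this model holds, rewriting the query so the subquery results are joined or tested via EXISTS, rather than materialized into arrays that are re-scanned per predicate, might avoid the blow-up; that is a guess worth benchmarking, not a confirmed fix.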
strange slow query performance
The following query runs as expected.

explain analyze
SELECT MainTable.gid AS MainTable_gid,
       MainTable.datetime AS MainTable_datetime,
       MainTable.objectid AS MainTable_objectid,
       MainTable.type AS MainTable_type
FROM MainTable
WHERE objectid = ANY(ARRAY(
          SELECT NE.objectid AS entityId
          FROM ( nodeobject NE
                 INNER JOIN ( Node N1NE
                              INNER JOIN Node N2NE
                              ON ((N1NE.iLeft >= N2NE.iLeft) AND (N1NE.iLeft <= N2NE.iRight)))
                 ON NE.inodeid = N1NE.iId )
          WHERE N2NE.iId = ANY(ARRAY[14438,10814])))
  AND objectid = 3161;

                                       QUERY PLAN
------------------------------------------------------------------------------------------
 Index Scan using ix_MainTable_objectid_datetime on MainTable  (cost=3254.91..3264.39 rows=1 width=32) (actual time=33.094..33.094 rows=0 loops=1)
   Index Cond: ((objectid = ANY ($3)) AND (objectid = 3161))
   InitPlan 1 (returns $3)
     ->  Nested Loop  (cost=1.10..3254.75 rows=16296 width=4) (actual time=0.403..26.147 rows=19042 loops=1)
           ->  Nested Loop  (cost=0.81..1784.15 rows=618 width=4) (actual time=0.249..10.485 rows=2082 loops=1)
                 ->  Index Scan using pk_node on node n2ne  (cost=0.41..16.80 rows=2 width=8) (actual time=0.103..0.233 rows=2 loops=1)
                       Index Cond: (iid = ANY ('{14438,10814}'::integer[]))
                 ->  Index Only Scan using ix_node on node n1ne  (cost=0.41..880.59 rows=309 width=8) (actual time=0.074..5.022 rows=1041 loops=2)
                       Index Cond: ((ileft >= n2ne.ileft) AND (ileft <= n2ne.iright))
                       Heap Fetches: 30
           ->  Index Only Scan using ix_nodeobject_inodeid_objectid on nodeobject ne  (cost=0.29..1.81 rows=57 width=8) (actual time=0.002..0.007 rows=9 loops=2082)
                 Index Cond: (inodeid = n1ne.iid)
                 Heap Fetches: 13973
 Planning time: 5.693 ms
 Execution time: 33.383 ms
(15 rows)

But when we add a second condition to the WHERE clause, query performance drops dramatically. The values inside the ANY(ARRAY(...)) subqueries make no difference: in this example the two subqueries are identical, but performance is just as poor with different values. Any ideas as to why this is happening?
explain analyze
SELECT MainTable.gid AS MainTable_gid,
       MainTable.datetime AS MainTable_datetime,
       MainTable.objectid AS MainTable_objectid,
       MainTable.type AS MainTable_type
FROM MainTable
WHERE objectid = ANY(ARRAY(
          SELECT NE.objectid AS entityId
          FROM ( nodeobject NE
                 INNER JOIN ( Node N1NE
                              INNER JOIN Node N2NE
                              ON ((N1NE.iLeft >= N2NE.iLeft) AND (N1NE.iLeft <= N2NE.iRight)))
                 ON NE.inodeid = N1NE.iId )
          WHERE N2NE.iId = ANY(ARRAY[14438,10814])))
  AND objectid = ANY(ARRAY(
          SELECT NE.objectid AS entityId
          FROM ( nodeobject NE
                 INNER JOIN ( Node N1NE
                              INNER JOIN Node N2NE
                              ON ((N1NE.iLeft >= N2NE.iLeft) AND (N1NE.iLeft <= N2NE.iRight)))
                 ON NE.inodeid = N1NE.iId )
          WHERE N2NE.iId = ANY(ARRAY[14438,10814])))
  AND objectid = 3161;

                                       QUERY PLAN
------------------------------------------------------------------------------------------
 Index Scan using ix_MainTable_objectid_datetime on MainTable  (cost=6509.66..6534.02 rows=1 width=32) (actual time=16442.004..16442.004 rows=0 loops=1)
   Index Cond: ((objectid = ANY ($3)) AND (objectid = ANY ($7)) AND (objectid = 3161))
   InitPlan 1 (returns $3)
     ->  Nested Loop  (cost=1.10..3254.75 rows=16296 width=4) (actual time=0.438..28.484 rows=19042 loops=1)
           ->  Nested Loop  (cost=0.81..1784.15 rows=618 width=4) (actual time=0.172..7.141 rows=2082 loops=1)
                 ->  Index Scan using pk_node on node n2ne  (cost=0.41..16.80 rows=2 width=8) (actual time=0.046..0.111 rows=2 loops=1)
                       Index Cond: (iid = ANY ('{14438,10814}'::integer[]))
                 ->  Index Only Scan using ix_node on node n1ne  (cost=0.41..880.59 rows=309 width=8) (actual time=0.073..3.438 rows=1041 loops=2)
                       Index Cond: ((ileft >= n2ne.ileft) AND (ileft <= n2ne.iright))
                       Heap Fetches: 30
           ->  Index Only Scan using ix_nodeobject_inodeid_objectid on nodeobject ne  (cost=0.29..1.81 rows=57 width=8) (actual time=0.003..0.010 rows=9 loops=2082)
                 Index Cond: (inodeid = n1ne.iid)
                 Heap Fetches: 13973
   InitPlan 2 (returns $7)
     ->  Nested Loop  (cost=1.10..3254.75 rows=16296 width=4) (actual time=0.056..11.786 rows=19042 loops=1)
           ->  Nested Loop  (cost=0.81..1784.15 rows=618 width=4) (actual time=0.034..1.343 rows=2082 loops=1)
                 ->  Index Scan using pk_node on node n2ne_1  (cost=0.41..16.80 rows=2 width=8) (actual time=0.007..0.016 rows=2 loops=1)
                       Index Cond: (iid = ANY ('{14438,10814}'::integer[]))
                 ->  Index Only Scan using ix_node on node n1ne_1  (cost=0.41..880.59 rows=309 width=8) (actual time=0.012..0.581