Re: pg_dump (COPY) hanging intermittently

2019-07-04 Thread Ben Snaidero
On Thu, Jun 27, 2019 at 3:19 PM Tom Lane  wrote:

> Ben Snaidero  writes:
> > Do these stack traces shed any light at all?
>
> None worth mentioning :-(.  Can you rebuild with debug symbols?
>
> regards, tom lane
>


So I've rebuilt with debug symbols, and for some reason I am now unable to
reproduce the issue.  I've also gone back to the original binaries and can
no longer reproduce the issue there either.  I am starting to think this
might be network related (storage is network attached), as I haven't even
rebooted the server I have been using to test.

Thanks for all your help.  At least I learned how to build from source
(with debug symbols), so I gained something from working through this issue.


Re: pg_dump (COPY) hanging intermittently

2019-06-27 Thread Ben Snaidero
On Thu, Jun 27, 2019 at 2:34 PM Tom Lane  wrote:

> Ben Snaidero  writes:
> > I am running into a strange issue with Postgres 10: when using pg_dump
> > with the directory format and the jobs parameter set, it intermittently
> > hangs.  It seems to occur less frequently the lower I set the jobs
> > parameter, but it does happen eventually even when set to 2 (I could not
> > reproduce it when jobs=1).  I've tested with Postgres 11 and, no matter
> > how high I set the jobs parameter, it never hangs (tested all the way up
> > to jobs=8).  With Postgres 10 and the other formats I also don't run into
> > the issue.  Here are my server specs and software versions (OS and
> > Postgres)
>
> FWIW, it looks like the hang is on the client side (i.e., pg_dump itself).
> The one active server process is just trying to send COPY data and is
> waiting for the client to take it.  So I'm wondering about deadlocks
> inside parallel pg_dump.
>
> This doesn't present any obvious answer, because there is not much
> difference between the parallel pg_dump logic in 10 and 11.  But
> maybe you could get stack trace(s) out of the stuck pg_dump job on 10?
> (Keep in mind that unlike the backend, the pg_dump process will have
> multiple threads in this situation.)
>
> Another thing you could do to triangulate is to verify that v11 pg_dump
> is okay working against the v10 server.  Unfortunately the other
> combination can't be tested, but this'd be enough to exonerate the v10
> server side.
>
> regards, tom lane
>

Thanks for the suggestion, Tom.  I've verified that this looks to be a v10
server-side issue: using the Postgres 11 pg_dump.exe against the v10 server,
I still encounter the same hanging issue.

Here are the stack traces from each thread of pg_dump.exe.

ntoskrnl.exe!KiSwapContext+0x76
ntoskrnl.exe!KiSwapThread+0x17d
ntoskrnl.exe!KiCommitThreadWait+0x14f
ntoskrnl.exe!KeWaitForSingleObject+0x377
ntoskrnl.exe!KiSchedulerApc+0x231
ntoskrnl.exe!KiDeliverApc+0x22a
ntoskrnl.exe!KiSwapThread+0x364
ntoskrnl.exe!KiCommitThreadWait+0x14f
ntoskrnl.exe!KeWaitForSingleObject+0x377
ntoskrnl.exe!NtWaitForSingleObject+0xf8
ntoskrnl.exe!KiSystemServiceCopyEnd+0x13
ntdll.dll!NtWaitForSingleObject+0x14
mswsock.dll!SockWaitForSingleObject+0x133
mswsock.dll!WSPSelect+0x7e4
WS2_32.dll!select+0x1d3
pg_dump.exe+0x64ae
pg_dump.exe+0x65bd
pg_dump.exe+0x6228
pg_dump.exe+0x509f
pg_dump.exe+0x980b
pg_dump.exe+0x117cc
pg_dump.exe+0x6f3f
pg_dump.exe+0x30c22
pg_dump.exe+0x3cb77
KERNEL32.DLL!BaseThreadInitThunk+0x14
ntdll.dll!RtlUserThreadStart+0x21

--

ntoskrnl.exe!KiSwapContext+0x76
ntoskrnl.exe!KiSwapThread+0x17d
ntoskrnl.exe!KiCommitThreadWait+0x14f
ntoskrnl.exe!KeWaitForSingleObject+0x377
ntoskrnl.exe!KiSchedulerApc+0x231
ntoskrnl.exe!KiDeliverApc+0x22a
ntoskrnl.exe!KiSwapThread+0x364
ntoskrnl.exe!KiCommitThreadWait+0x14f
ntoskrnl.exe!KeWaitForSingleObject+0x377
ntoskrnl.exe!NtWaitForSingleObject+0xf8
ntoskrnl.exe!KiSystemServiceCopyEnd+0x13
ntdll.dll!NtWaitForSingleObject+0x14
mswsock.dll!SockWaitForSingleObject+0x133
mswsock.dll!WSPRecv+0x5d2
WS2_32.dll!recv+0x16c
pg_dump.exe+0x6746
pg_dump.exe+0x61ab
pg_dump.exe+0x5e57
pg_dump.exe+0x6a22
MSVCR120.dll!_callthreadstartex+0x17
MSVCR120.dll!_threadstartex+0x88
KERNEL32.DLL!BaseThreadInitThunk+0x14
ntdll.dll!RtlUserThreadStart+0x21

--

ntoskrnl.exe!KiSwapContext+0x76
ntoskrnl.exe!KiSwapThread+0x17d
ntoskrnl.exe!KiCommitThreadWait+0x14f
ntoskrnl.exe!KeWaitForSingleObject+0x377
ntoskrnl.exe!KiSchedulerApc+0x231
ntoskrnl.exe!KiDeliverApc+0x22a
ntoskrnl.exe!KiSwapThread+0x364
ntoskrnl.exe!KiCommitThreadWait+0x14f
ntoskrnl.exe!KeWaitForSingleObject+0x377
ntoskrnl.exe!NtWaitForSingleObject+0xf8
ntoskrnl.exe!KiSystemServiceCopyEnd+0x13
ntdll.dll!NtWaitForSingleObject+0x14
mswsock.dll!SockWaitForSingleObject+0x133
mswsock.dll!WSPRecv+0x5d2
WS2_32.dll!recv+0x16c
pg_dump.exe+0x6746
pg_dump.exe+0x61ab
pg_dump.exe+0x5e57
pg_dump.exe+0x6a22
MSVCR120.dll!_callthreadstartex+0x17
MSVCR120.dll!_threadstartex+0x88
KERNEL32.DLL!BaseThreadInitThunk+0x14
ntdll.dll!RtlUserThreadStart+0x21

--

ntoskrnl.exe!KiSwapContext+0x76
ntoskrnl.exe!KiSwapThread+0x17d
ntoskrnl.exe!KiCommitThreadWait+0x14f
ntoskrnl.exe!KeWaitForSingleObject+0x377
ntoskrnl.exe!KiSchedulerApc+0x231
ntoskrnl.exe!KiDeliverApc+0x22a
ntoskrnl.exe!KiSwapThread+0x364
ntoskrnl.exe!KiCommitThreadWait+0x14f
ntoskrnl.exe!KeWaitForSingleObject+0x377
ntoskrnl.exe!NtWaitForSingleObject+0xf8
ntoskrnl.exe!KiSystemServiceCopyEnd+0x13
ntdll.dll!NtWaitForSingleObject+0x14
mswsock.dll!SockWaitForSingleObject+0x133
mswsock.dll!WSPSelect+0x7e4
WS2_32.dll!select+0x1d3
LIBPQ.dll!PQenv2encoding+0x1301
LIBPQ.dll!PQenv2encoding+0x11a3

pg_dump (COPY) hanging intermittently

2019-06-27 Thread Ben Snaidero
06-27 14:37:55.067559+01 | 2019-06-27 14:37:54.448198+01 | ClientRead  | Client | COPY public.series8 (i) TO stdout;
 3180 | idle in transaction | 2019-06-27 14:37:49.537396+01 | 2019-06-27 14:37:49.53608+01  | ClientRead  | Client | SELECT at.attname, (SELECT pg_catal
 5364 | active              | 2019-06-27 14:37:50.151002+01 | 2019-06-27 14:37:50.151001+01 | ClientWrite | Client | COPY public.series2 (i) TO stdout;
(3 rows)
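
Output like the above comes from a pg_stat_activity query roughly along
these lines (a sketch only; the exact column list is an assumption chosen
to match the snapshot):

-- Sketch: wait_event/wait_event_type exist in 9.6+, backend_type in 10+.
SELECT pid,
       state,
       xact_start,
       query_start,
       wait_event,
       wait_event_type,
       query
  FROM pg_stat_activity
 WHERE backend_type = 'client backend'
   AND pid <> pg_backend_pid();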

D:\>dir dumphangtest
 Volume in drive D has no label.
 Volume Serial Number is C635-CDEF

 Directory of D:\dumphangtest

06/27/2019  09:37 AM  .
06/27/2019  09:37 AM  ..
06/27/2019  09:37 AM 2,114,899 2863.dat.gz
06/27/2019  09:37 AM81,920 2864.dat.gz
06/27/2019  09:37 AM 2,114,899 2865.dat.gz
06/27/2019  09:37 AM 2,114,899 2866.dat.gz
06/27/2019  09:37 AM 2,114,899 2867.dat.gz
06/27/2019  09:37 AM 2,114,899 2868.dat.gz
06/27/2019  09:37 AM 2,114,899 2869.dat.gz
06/27/2019  09:37 AM 2,114,899 2870.dat.gz
06/27/2019  09:37 AM 8,452 toc.dat
   9 File(s) 14,894,665 bytes
   2 Dir(s)  102,056,497,152 bytes free

--

And here is the stack trace from the thread of the hung process.

ntoskrnl.exe!KiSwapContext+0x76
ntoskrnl.exe!KiSwapThread+0x17d
ntoskrnl.exe!KiCommitThreadWait+0x14f
ntoskrnl.exe!KeWaitForSingleObject+0x377
ntoskrnl.exe!KiSchedulerApc+0x231
ntoskrnl.exe!KiDeliverApc+0x22a
ntoskrnl.exe!KiSwapThread+0x364
ntoskrnl.exe!KiCommitThreadWait+0x14f
ntoskrnl.exe!KeWaitForMultipleObjects+0x1fe
ntoskrnl.exe!ObWaitForMultipleObjects+0x2c7
ntoskrnl.exe!NtWaitForMultipleObjects+0xf9
ntoskrnl.exe!KiSystemServiceCopyEnd+0x13
ntdll.dll!NtWaitForMultipleObjects+0x14
KERNELBASE.dll!WaitForMultipleObjectsEx+0xef
KERNELBASE.dll!WaitForMultipleObjects+0xe
postgres.exe!WaitLatchOrSocket+0x456
postgres.exe!WaitEventSetWait+0x8e
postgres.exe!secure_write+0xeb
postgres.exe!pq_putbytes+0x4e5
postgres.exe!pq_putbytes+0x41b
postgres.exe!pq_putbytes+0x256
postgres.exe!CreateCopyDestReceiver+0x3784
postgres.exe!CreateCopyDestReceiver+0x17b2
postgres.exe!CreateCopyDestReceiver+0x144c
postgres.exe!CreateCopyDestReceiver+0x1098
postgres.exe!DoCopy+0x50a
postgres.exe!standard_ProcessUtility+0x37b
pg_stat_statements.dll!pg_finfo_pg_stat_statements+0xde1
postgres.exe!ProcessUtility+0x34
postgres.exe!PortalRunFetch+0x891
postgres.exe!PortalRunFetch+0xbae
postgres.exe!PortalRun+0x1a0
postgres.exe!get_stats_option_name+0x1335
postgres.exe!PostgresMain+0x635
postgres.exe!ShmemBackendArrayAllocation+0x2b3a
postgres.exe!SubPostmasterMain+0x275
postgres.exe!main+0x480
postgres.exe!pgwin32_popen+0x144f
KERNEL32.DLL!BaseThreadInitThunk+0x14
ntdll.dll!RtlUserThreadStart+0x21

Any ideas on changes in Postgres 10 that could cause this?  We were
previously running 9.6.7 and never encountered this issue on that version.

Thanks
Ben.

Ben Snaidero
*Geotab*
Senior Database Specialist
Direct +1 (289) 230-7749
Toll-free +1 (877) 436-8221
Visit www.geotab.com


Re: strange slow query performance

2019-01-17 Thread Ben Snaidero
On Thu, Jan 17, 2019 at 4:13 PM David G. Johnston <
david.g.johns...@gmail.com> wrote:

> On Thu, Jan 17, 2019 at 9:19 AM Ben Snaidero  wrote:
> > Any ideas as to why this is happening?
>
> Not really.  I would expect roughly double execution time, not an
> exponential increase.  I'm still not experienced enough to diagnose from
> what has been provided, but I will suggest you provide the version that
> you are running these queries against and confirm that the results are
> consistent across multiple runs, to discount the possibility that
> locking or other transient work is involved.
>
> Curious whether increasing work_mem helps, but it's mostly a
> trial-and-error thing for me (I wouldn't expect it to)...
>
> David J.
>

The query was tested multiple times with similar results.  I also tried
increasing work_mem, with the same results.  Memory did not increase or even
come close to the 10MB setting.  The query does seem to occupy one full CPU
for the duration of the run, though.  The query is running on Windows with
Postgres 9.6.7.
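
For reference, a per-session work_mem retest and a quick check for blocking
look roughly like this (the values and the placeholder query are illustrative
only, not what was actually run):

-- Raise work_mem for this session only, then re-run the slow statement.
SET work_mem = '64MB';
EXPLAIN ANALYZE SELECT 1;   -- placeholder: substitute the slow query here
RESET work_mem;

-- From a second session, rule out blocking while the query runs
-- (pg_blocking_pids() is available in 9.6+).
SELECT pid,
       pg_blocking_pids(pid) AS blocked_by,
       wait_event_type,
       wait_event,
       state,
       query
  FROM pg_stat_activity
 WHERE cardinality(pg_blocking_pids(pid)) > 0;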


strange slow query performance

2019-01-17 Thread Ben Snaidero
The following query runs as expected.

explain analyze
SELECT MainTable.gid AS MainTable_gid, MainTable.datetime AS MainTable_datetime,
       MainTable.objectid AS MainTable_objectid, MainTable.type AS MainTable_type
  FROM MainTable
 WHERE objectid = ANY(ARRAY(SELECT NE.objectid AS entityId
                              FROM (nodeobject NE
                                    INNER JOIN (Node N1NE
                                                INNER JOIN Node N2NE
                                                   ON ((N1NE.iLeft >= N2NE.iLeft) AND (N1NE.iLeft <= N2NE.iRight)))
                                       ON NE.inodeid = N1NE.iId)
                             WHERE N2NE.iId = ANY(ARRAY[14438,10814])))
   AND objectid = 3161;

                                                                      QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using ix_MainTable_objectid_datetime on MainTable  (cost=3254.91..3264.39 rows=1 width=32) (actual time=33.094..33.094 rows=0 loops=1)
   Index Cond: ((objectid = ANY ($3)) AND (objectid = 3161))
   InitPlan 1 (returns $3)
     ->  Nested Loop  (cost=1.10..3254.75 rows=16296 width=4) (actual time=0.403..26.147 rows=19042 loops=1)
           ->  Nested Loop  (cost=0.81..1784.15 rows=618 width=4) (actual time=0.249..10.485 rows=2082 loops=1)
                 ->  Index Scan using pk_node on node n2ne  (cost=0.41..16.80 rows=2 width=8) (actual time=0.103..0.233 rows=2 loops=1)
                       Index Cond: (iid = ANY ('{14438,10814}'::integer[]))
                 ->  Index Only Scan using ix_node on node n1ne  (cost=0.41..880.59 rows=309 width=8) (actual time=0.074..5.022 rows=1041 loops=2)
                       Index Cond: ((ileft >= n2ne.ileft) AND (ileft <= n2ne.iright))
                       Heap Fetches: 30
           ->  Index Only Scan using ix_nodeobject_inodeid_objectid on nodeobject ne  (cost=0.29..1.81 rows=57 width=8) (actual time=0.002..0.007 rows=9 loops=2082)
                 Index Cond: (inodeid = n1ne.iid)
                 Heap Fetches: 13973
 Planning time: 5.693 ms
 Execution time: 33.383 ms
(15 rows)

But when we add a second condition to the WHERE clause, query performance
drops dramatically.  The values inside ANY(ARRAY(...)) do not make a
difference; in this example the two subqueries are identical, but performance
is just as poor with different values.  Any ideas as to why this is happening?

explain analyze
SELECT MainTable.gid AS MainTable_gid, MainTable.datetime AS MainTable_datetime,
       MainTable.objectid AS MainTable_objectid, MainTable.type AS MainTable_type
  FROM MainTable
 WHERE objectid = ANY(ARRAY(SELECT NE.objectid AS entityId
                              FROM (nodeobject NE
                                    INNER JOIN (Node N1NE
                                                INNER JOIN Node N2NE
                                                   ON ((N1NE.iLeft >= N2NE.iLeft) AND (N1NE.iLeft <= N2NE.iRight)))
                                       ON NE.inodeid = N1NE.iId)
                             WHERE N2NE.iId = ANY(ARRAY[14438,10814])))
   AND objectid = ANY(ARRAY(SELECT NE.objectid AS entityId
                              FROM (nodeobject NE
                                    INNER JOIN (Node N1NE
                                                INNER JOIN Node N2NE
                                                   ON ((N1NE.iLeft >= N2NE.iLeft) AND (N1NE.iLeft <= N2NE.iRight)))
                                       ON NE.inodeid = N1NE.iId)
                             WHERE N2NE.iId = ANY(ARRAY[14438,10814])))
   AND objectid = 3161;


                                                                      QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using ix_MainTable_objectid_datetime on MainTable  (cost=6509.66..6534.02 rows=1 width=32) (actual time=16442.004..16442.004 rows=0 loops=1)
   Index Cond: ((objectid = ANY ($3)) AND (objectid = ANY ($7)) AND (objectid = 3161))
   InitPlan 1 (returns $3)
     ->  Nested Loop  (cost=1.10..3254.75 rows=16296 width=4) (actual time=0.438..28.484 rows=19042 loops=1)
           ->  Nested Loop  (cost=0.81..1784.15 rows=618 width=4) (actual time=0.172..7.141 rows=2082 loops=1)
                 ->  Index Scan using pk_node on node n2ne  (cost=0.41..16.80 rows=2 width=8) (actual time=0.046..0.111 rows=2 loops=1)
                       Index Cond: (iid = ANY ('{14438,10814}'::integer[]))
                 ->  Index Only Scan using ix_node on node n1ne  (cost=0.41..880.59 rows=309 width=8) (actual time=0.073..3.438 rows=1041 loops=2)
                       Index Cond: ((ileft >= n2ne.ileft) AND (ileft <= n2ne.iright))
                       Heap Fetches: 30
           ->  Index Only Scan using ix_nodeobject_inodeid_objectid on nodeobject ne  (cost=0.29..1.81 rows=57 width=8) (actual time=0.003..0.010 rows=9 loops=2082)
                 Index Cond: (inodeid = n1ne.iid)
                 Heap Fetches: 13973
   InitPlan 2 (returns $7)
     ->  Nested Loop  (cost=1.10..3254.75 rows=16296 width=4) (actual time=0.056..11.786 rows=19042 loops=1)
           ->  Nested Loop  (cost=0.81..1784.15 rows=618 width=4) (actual time=0.034..1.343 rows=2082 loops=1)
                 ->  Index Scan using pk_node on node n2ne_1  (cost=0.41..16.80 rows=2 width=8) (actual time=0.007..0.016 rows=2 loops=1)
                       Index Cond: (iid = ANY ('{14438,10814}'::integer[]))
                 ->  Index Only Scan using ix_node on node n1ne_1  (cost=0.41..880.59 rows=309 width=8) (actual time=0.012..0.581