Hi Craig, I've debugged the runaway process, though I'm not sure of the
solution yet.

My best interpretation is that an SSL client dirty disconnected while
running a request. This caused an infinite loop in pq_recvbuf(), calling
secure_read(), triggering my_sock_read() over and over. Calling
SSL_get_error() in secure_read() returns 10045 (either connection reset, or
WSAEOPNOTSUPP, I'm not sure) - after this, pq_recvbuf() appears to think
errno=EINTR has occurred, so it immediately tries again.

I'd appreciate any further interpretation and advice on this, how to
diagnose it further, or whether it's been addressed in a later version. Hope
you appreciate I've erred on the side of too much detail in the following
analysis.

As a side note, I had the postgres.exe process in debug mode for 2 hours,
and experienced no server downtime at all, which is good to know.

Regards,
-Brendan

=========================================
Main call stack:
=========================================

>       postgres.exe!pq_recvbuf()  Line 746 + 0x1a bytes
        postgres.exe!pq_getbyte()  Line 795 + 0x5 bytes
        postgres.exe!SocketBackend(StringInfoData * inBuf=0x00000000)  Line
317 + 0x5 bytes
        postgres.exe!PostgresMain(int argc=4, char * * argv=0x0155db48,
const char * username=0x01064e20)  Line 3595 + 0x23 bytes
        postgres.exe!BackendRun(Port * port=0x00bdfd40)  Line 3208
        postgres.exe!SubPostmasterMain(int argc=3, char * * argv=0x01063f28)
Line 3674 + 0x8 bytes
        postgres.exe!main(int argc=3, char * * argv=0x01063f28)  Line 165 +
0x7 bytes
        postgres.exe!__tmainCRTStartup()  Line 597 + 0x17 bytes
        kernel32.dll!_baseprocessst...@4()  + 0x23 bytes

Relevant variables:

PqRecvLength: 0

PqRecvPointer: 0

PqRecvBuffer: Q...,SELECT COUNT(*) FROM.   (SELECT tgargs from pg_trigger
tr.      LEFT JOIN p-g_depend dep ON dep.objid=tr.oid AND deptype = 'i'.
LEFT JOIN pg_constraint co ON refobjid = co.oid AND contype = 'f'.     WHERE
tgisconstraint AND co.oid IS NULL.     GROUP BY tgargs.    HAVING count(1) =
3) AS foo.M pg_namespace nsp.  LEFT OUTER JOIN pg_description des ON
des.objoid=nsp.oid. WHERE NOT ((nspname = 'pg_catalog' and (SELECT count(*)
FROM pg_class WHERE relname = 'pg_class' AND relnamespace = nsp.oid) > 0)
OR.(nspname = 'pgagent' and (SELECT count(*) FROM pg_class WHERE relname =
'pga_job' AND relnamespace = nsp.oid) > 0) OR.(nspname =
'information_schema' and (SELECT count(*) FROM pg_class WHERE relname =
'tables' AND relnamespace = nsp.oid) >0x00789E10  20 30 29 20 4f 52 0a 28 6e
73 70 6e 61 6d 65 20   0) OR.(nspname = 'dbo' and (SELECT count(*) FROM
pg_class WHERE relname = 'systables' AND relnamespace = nsp.oid) > 0)
OR.(nspname = 'sys' and (SELECT count(*) FROM pg_class WHERE relname =
'all_tables' AND relnamespace = nsp.oid) > 0)). ORDER BY 1, nspname

MyProcPort:     0x00bdfd40
        sock    2044
        proto   196608
        laddr   {addr={...} salen=16 }
        raddr   {addr={...} salen=16 }
        remote_host     0x0106acf0 "<removed>"
        remote_port     0x010694a0 "3307"
        canAcceptConnections    CAC_OK
        database_name   0x01064df8 "<removed>"
        user_name       0x01064e20 "<removed>"
        cmdline_options 0x00000000 <Bad Ptr>
        guc_options     0x00000000 {type=??? length=??? head=??? ...}
        auth_method     uaMD5   
        auth_arg        0x00000000 <Bad Ptr>
        md5Salt 0x00bdfe7c "<removed>"
        cryptSalt       0x00bdfe80 "UA"
        SessionStartTime        303975637670000
        default_keepalives_idle 0
        default_keepalives_interval     0
        default_keepalives_count        0
        keepalives_idle 0
        keepalives_interval     0
        keepalives_count        0
        gss     0x010694e0 {outbuf={...} cred=0x00000000 ctx=0x00000000 ...}
        ssl     0x003fbb78 {version=769 type=8192 method=0x1002b298 ...}
        peer    0x00000000 {cert_info=??? sig_alg=??? signature=??? ...}
        peer_dn 0x00bdfeb4 "(anonymous)"
        peer_cn 0x00bdff35 "(anonymous)"
        count   5287    unsigned long

pq_recvbuf() calls "secure_read(MyProcPort, PqRecvBuffer + PqRecvLength,
PQ_BUFFER_SIZE - PqRecvLength);", which returns -1. I've inferred from
control flow that errno=EINTR (can't debug the value unfortunately), so it
continues the loop.


=========================================
Within secure_read()
=========================================

SSL_get_error() returns 10054 (connection reset or WSAEOPNOTSUPP, I'm not
sure)

The switch() command jumps to SSL_ERROR_SYSCALL, then immediately jumps to
SSL_ERROR_NONE !?, and port->count NOT incremented - this was strange,
suggesting a source code mismatch or debugger problem perhaps.

secure_read() returns -1, popping back to pq_recvbuf()


=========================================
Within my_sock_read()
=========================================

secure_read() causes a call to my_sock_read(), which has a separate stack
trace (waiting on single object, I assume):

>       postgres.exe!my_sock_read(bio_st * h=0x003fbcf0, char *
buf=0x0165bc98, int size=5)  Line 443   C
        libeay32.dll!0026a78f()         
        [Frames below may be incorrect and/or missing, no symbols loaded for
libeay32.dll]   
        mswsock.dll!_wspeventsel...@16()  + 0x92 bytes  
        ffffffff()      

Within prepare_for_client_read(): DoingCommandRead = 1

Within EnableNotifyInterrupt(): IsTransactionOrTransactionBlock() = 0,
notifyInterruptOccurred = 0

Within EnableCatchupInterrupt(): catchupInterruptOccurred = 0

After "res = std_sock_read(h, buf, size);" (can't debug result,
unfortunately), (buf, size, h) hadn't change at all

Within client_read_ended(): DoingCommandRead = 1

Exiting my_sock_read() and wandering through some assembly, it finds itself
back into socket_read(), then back in pg_recvbuf(), and the loop continues.



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Reply via email to