On 18/12/2010 1:13 AM, Magnus Hagander wrote:
On Fri, Dec 17, 2010 at 17:42, Magnus Hagander <mag...@hagander.net> wrote:
On Fri, Dec 17, 2010 at 17:24, Craig Ringer <cr...@postnewspapers.com.au> wrote:
On 17/12/2010 7:17 PM, Magnus Hagander wrote:
Now, that's annoying. So clearly we can't use that function to
determine which version we're on. Seems it only works for "image help
api", and not the general thing.

According to http://msdn.microsoft.com/en-us/library/ms679294(v=vs.85).aspx,
we could look for:

SymEnumLines - if present, we have at least 6.1.

However, I don't see any function that appeared in 6.0 only.

Actually, I'm wrong - there are enough functions to determine the
version. So here's a patch that tries that.
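
To make the probing idea concrete, here is a minimal sketch in C. It is not the actual patch. It infers a minimum dbghelp version from which exports are present. The lookup is passed in as a callback so the logic can be exercised anywhere; on Windows the callback would presumably wrap GetProcAddress() on the loaded dbghelp.dll handle. The names min_dbghelp_version, has_symbol_fn and fake_has_symbol are made up for illustration. The 6.1 marker is the one named above, spelled SymEnumLines here since that is the DbgHelp export name as far as I know; the EnumDirTree / 5.2 pairing is an assumption that should be checked against the MSDN page.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Reports whether the library exports `name`. On Windows this would
 * presumably be GetProcAddress(hDll, name) != NULL (assumption). */
typedef int (*has_symbol_fn)(const char *name, void *ctx);

struct version_marker
{
    const char *symbol;     /* export that first appeared in this version */
    int         major;
    int         minor;
};

/* Newest first. The EnumDirTree entry is an assumption, not from the thread. */
static const struct version_marker markers[] = {
    {"SymEnumLines", 6, 1},
    {"EnumDirTree", 5, 2},
};

/* Returns the minimum version as major * 100 + minor, or 0 if unknown. */
int
min_dbghelp_version(has_symbol_fn has_symbol, void *ctx)
{
    size_t      i;

    assert(has_symbol != NULL);
    for (i = 0; i < sizeof(markers) / sizeof(markers[0]); i++)
    {
        if (has_symbol(markers[i].symbol, ctx))
            return markers[i].major * 100 + markers[i].minor;
    }
    return 0;
}

/* Stand-in lookup for testing: ctx is a NULL-terminated list of names. */
int
fake_has_symbol(const char *name, void *ctx)
{
    const char **exports = ctx;

    for (; *exports != NULL; exports++)
    {
        if (strcmp(*exports, name) == 0)
            return 1;
    }
    return 0;
}
```

Checking the newest marker first means the first hit gives the tightest lower bound; a result of 0 only says that none of the known markers were found.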

Great. I pulled the latest from your git tree, tested that, and got much better results. Crashdump size is back to what I expected. In my test code, fcinfo->args and fcinfo->argnull can be examined without problems. Backtraces look good; see below. It seems to be including backend private memory again now. Thanks _very_ much for your work on this.

fcinfo->flinfo is still inaccessible, but I suspect it's in shared memory, as it's at 0x00000135. Ditto fcinfo->resultinfo and fcinfo->context.

This has me wondering - is it going to be necessary to dump shared memory to make many backtraces useful? I just responded to Tom mentioning that the patch doesn't currently dump shared memory, but I hadn't realized the extent to which it's used for _lots_ more than just disk buffers. I'm not sure how to handle dumping shared_buffers when someone might be using multi-gigabyte shared_buffers, though. Dumping the whole lot would risk sudden out-of-disk-space issues, slowdowns as dumps are written, and the backend being "frozen" as it's being dumped could delay the system coming back up again. Trying to selectively dump critical parts could cause dumps to fail if the system is in early startup or a bad state.

The same concern applies to writing backend private memory; it's fine most of the time, but if you're doing data warehousing queries with 2GB of work_mem, it's going to be nasty having all that extra disk I/O and disk space use, not to mention the hold-up while the dump is written. If this is something we want to have people running in production "just in case" or to track down rare / hard to reproduce faults, that'll be a problem.

OTOH, we can't really go poking around in palloc contexts to decide what to dump.

I guess we could always do a small, minimalist minidump, then write _another_ dump that attempts to include select parts of shm and backend private memory.
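
A rough sketch of that two-stage idea in C, only to show the ordering: the small dump goes first into its own file, so a failure in the larger one cannot cost us the basic backtrace. Everything here (write_two_stage_dump, dump_writer, the flag names, the demo writer) is hypothetical; a real writer would presumably call MiniDumpWriteDump with different dump-type flags for each stage.

```c
#include <assert.h>
#include <stddef.h>

enum dump_kind
{
    DUMP_MINIMAL,               /* stacks and registers only */
    DUMP_EXTENDED               /* plus selected shm / private memory */
};

/* Hypothetical writer callback: returns 1 on success, 0 on failure. */
typedef int (*dump_writer)(enum dump_kind kind, const char *path);

#define WROTE_MINIMAL   1
#define WROTE_EXTENDED  2

/* Returns a bitmask of the stages that succeeded. */
int
write_two_stage_dump(dump_writer writer,
                     const char *minimal_path,
                     const char *extended_path)
{
    int         result = 0;

    assert(writer != NULL);

    /* Stage 1: small and quick, most likely to succeed. */
    if (writer(DUMP_MINIMAL, minimal_path))
        result |= WROTE_MINIMAL;

    /* Stage 2: separate file, so a failure here leaves stage 1 intact. */
    if (writer(DUMP_EXTENDED, extended_path))
        result |= WROTE_EXTENDED;

    return result;
}

/* Demo writer simulating an extended dump that fails (e.g. disk full). */
int
demo_writer_extended_fails(enum dump_kind kind, const char *path)
{
    (void) path;
    return kind == DUMP_MINIMAL;
}
```

This only covers a writer that reports failure by return value. If the extended dump itself faults, the minimal file is still on disk, but the handler needs its own protection against re-entry.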

I just thought of two other things, too:

- Is it possible for this handler to be called recursively if it fails during the handler call? If so, do we need to uninstall the handler before attempting a dump to avoid such recursion? I need to do some testing and dig around MSDN to find out more about this.

- Can asynchronous events like signals (or their win32 emulation) interrupt an executing crash handler, or are they blocked before the crash handler is called? If they're not blocked, do we need to try to block them before attempting a dump? Again, I need to do some reading on this.
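
For the recursion question, one option that does not depend on what Windows does is a re-entry guard in the handler itself. The following is a minimal sketch in portable C, assuming same-thread re-entry is the case that matters; the names (guarded_crash_handler, dump_fn, the demo_* helpers) are made up for illustration. On Windows one might additionally call SetUnhandledExceptionFilter(NULL) before dumping, but whether that is needed is exactly what still has to be checked. The guard does nothing about asynchronous signals; that question stays open.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Hypothetical stand-in for the code that writes the crash dump. */
typedef void (*dump_fn)(void);

/* Set once and never cleared: the process is going down anyway. */
static volatile sig_atomic_t in_crash_handler = 0;

/*
 * Returns 1 if a dump was attempted, 0 if the call was skipped because
 * the handler was already running (i.e. the dump code itself faulted).
 */
int
guarded_crash_handler(dump_fn write_dump)
{
    assert(write_dump != NULL);

    if (in_crash_handler)
        return 0;
    in_crash_handler = 1;

    write_dump();
    return 1;
}

/* Demo: a dump routine that "crashes" by re-entering the handler. */
int         demo_dump_calls = 0;
int         demo_nested_result = -1;

void
demo_faulting_dump(void)
{
    demo_dump_calls++;
    demo_nested_result = guarded_crash_handler(demo_faulting_dump);
}
```

Calling guarded_crash_handler(demo_faulting_dump) runs the dump routine once; the nested call made from inside it is refused rather than recursing.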


Anyway, here's an example of the backtraces I'm currently getting. They're clearly missing some parameters (in shm? Unsure) but provide source file+line, argument values where resolvable, and the call stack itself. Locals are accessible at all levels of the stack when you go poking around in windbg.

This dump file has an exception of interest stored in it.
The stored exception information can be accessed via .ecxr.
(930.12e8): Access violation - code c0000005 (first/second chance not available)
eax=00bce2c0 ebx=72d0e800 ecx=000002e4 edx=72cb81c8 esi=000000f0 edi=00000930
eip=771464f4 esp=00bce294 ebp=00bce2a4 iopl=0         nv up ei pl zr na pe nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00000246
ntdll!KiFastSystemCallRet:
771464f4 c3              ret
0:000> .ecxr
eax=00000000 ebx=00000000 ecx=015fd7d8 edx=7362100f esi=015fd7c8 edi=015fd804
eip=73621052 esp=00bcf284 ebp=015fd7c8 iopl=0         nv up ei pl zr na pe nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00010246
crashme!crashdump_crashme+0x2:
73621052 c70001000000    mov     dword ptr [eax],1    ds:0023:00000000=????????
0:000> kp
  *** Stack trace for last set context - .thread/.cxr resets it
ChildEBP RetAddr
00bcf280 0031c797 crashme!crashdump_crashme(struct FunctionCallInfoData * 
fcinfo = 0x015e3318)+0x2 
[c:\users\craig\developer\postgres\contrib\crashme\crashme.c @ 14]
00bcf2e4 0031c804 postgres!ExecMakeFunctionResult(struct FuncExprState * fcache = 
0x015e3318, struct ExprContext * econtext = 0x00319410, char * isNull = 0x00000000 
"", ExprDoneCond * isDone = 0x7362100f)+0x427 
[c:\users\craig\developer\postgres\src\backend\executor\execqual.c @ 1824]
00bcf30c 0031b760 postgres!ExecEvalFunc(struct FuncExprState * fcache = 0x00000000, 
struct ExprContext * econtext = 0x00000000, char * isNull = 0x00000000 "", 
ExprDoneCond * isDone = 0x00000000)+0x34 
[c:\users\craig\developer\postgres\src\backend\executor\execqual.c @ 2260]
00bcf338 0031ba83 postgres!ExecTargetList(struct List * targetlist = 0x00000000, struct 
ExprContext * econtext = 0x00000000, unsigned int * values = 0x00000000, char * isnull = 
0x00000000 "", ExprDoneCond * itemIsDone = 0x00000000, ExprDoneCond * isDone = 
0x00000000)+0x70 [c:\users\craig\developer\postgres\src\backend\executor\execqual.c @ 
5095]
00bcf378 0032f074 postgres!ExecProject(struct ProjectionInfo * projInfo = 
0x00000000, ExprDoneCond * isDone = 0x00000000)+0x173 
[c:\users\craig\developer\postgres\src\backend\executor\execqual.c @ 5312]
00bcf38c 00317e07 postgres!ExecResult(struct ResultState * node = <Memory access 
error>)+0x94 [c:\users\craig\developer\postgres\src\backend\executor\noderesult.c 
@ 157]
00bcf39c 00315ccd postgres!ExecProcNode(struct PlanState * node = <Memory access 
error>)+0x67 
[c:\users\craig\developer\postgres\src\backend\executor\execprocnode.c @ 361]
00bcf3b0 00316ace postgres!ExecutePlan(struct EState * estate = 0x015fd7c8, struct PlanState * planstate = 
<Memory access error>, CmdType operation = <Memory access error>, char sendTuples = <Memory 
access error>, long numberTuples = <Memory access error>, ScanDirection direction = 
NoMovementScanDirection (0n0), struct _DestReceiver * dest = <Memory access error>)+0x2d 
[c:\users\craig\developer\postgres\src\backend\executor\execmain.c @ 1236]
00bcf3e0 0041ec5d postgres!standard_ExecutorRun(struct QueryDesc * queryDesc = <Memory access 
error>, ScanDirection direction = <Memory access error>, long count = <Memory access 
error>)+0x8e [c:\users\craig\developer\postgres\src\backend\executor\execmain.c @ 288]
00bcf404 0041f270 postgres!PortalRunSelect(struct PortalData * portal = 0x00000000, char forward 
= <Memory access error>, long count = <Memory access error>, struct _DestReceiver * 
dest = <Memory access error>)+0x6d 
[c:\users\craig\developer\postgres\src\backend\tcop\pquery.c @ 953]
00bcf48c 0041c292 postgres!PortalRun(struct PortalData * portal = 0x015fb5b8, long count 
= 0n2147483647, char isTopLevel = 0n1 '', struct _DestReceiver * dest = 0x015e3418, 
struct _DestReceiver * altdest = 0x015e3418, char * completionTag = 0x00bcf500 
"")+0x190 [c:\users\craig\developer\postgres\src\backend\tcop\pquery.c @ 803]
00bcf540 0041cbc5 postgres!exec_simple_query(char * query_string = 0x015fd7d8 
"???")+0x3a2 [c:\users\craig\developer\postgres\src\backend\tcop\postgres.c @ 
1067]
00bcf5c4 003e2bdc postgres!PostgresMain(int argc = 0n2, char ** argv = 0x01555138, char * 
username = 0x00d484a0 "Craig")+0x575 
[c:\users\craig\developer\postgres\src\backend\tcop\postgres.c @ 3935]
00bcf5e4 003e58a9 postgres!BackendRun(struct Port * port = 0x00000000)+0x19c 
[c:\users\craig\developer\postgres\src\backend\postmaster\postmaster.c @ 3562]
00bcf788 003475bc postgres!SubPostmasterMain(int argc = 0n13900471, char ** 
argv = 0x00d41ac5)+0x2f9 
[c:\users\craig\developer\postgres\src\backend\postmaster\postmaster.c @ 4058]
00bcf7a0 0051845d postgres!main(int argc = 0n1990922644, char ** argv = 
0x7ffdf000)+0x1ec [c:\users\craig\developer\postgres\src\backend\main\main.c @ 
173]
00bcf7e4 76ab1194 postgres!__tmainCRTStartup(void)+0x10f 
[f:\dd\vctools\crt_bld\self_x86\crt\src\crtexe.c @ 586]
00bcf7f0 7715b495 kernel32!BaseThreadInitThunk+0xe
00bcf830 7715b468 ntdll!__RtlUserThreadStart+0x70
00bcf848 00000000 ntdll!_RtlUserThreadStart+0x1b

--
Craig Ringer

Tech-related writing at http://soapyfrogs.blogspot.com/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)