Re: [HACKERS] Re: Proposed Windows-specific change: Enable crash dumps (like core files)
On tor, 2010-12-16 at 15:16 +0100, Magnus Hagander wrote: Found another problem in it: when running with an older version of dbghelp.dll (which I was), it simply didn't work. We need to grab the version of dbghelp.dll at runtime and pick which things we're going to dump based on that. The attached version of the patch does that. Can you please test that it still generates the full dump on your system, which supposedly has a much more modern version of dbghelp.dll than mine? I was going through the GetLastError() calls to unify the printf formats, as discussed, and I stumbled across this: + write_stderr(could not write crash dump to %s: error code %08x\n, +dumpPath, GetLastError()); Is there a reason why this uses that particular format, unlike all other call sites? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Proposed Windows-specific change: Enable crash dumps (like core files)
On 11/07/2011 4:23 AM, Peter Eisentraut wrote: I was going through the GetLastError() calls to unify the printf formats, as discussed, and I stumbled across this: + write_stderr(could not write crash dump to %s: error code %08x\n, +dumpPath, GetLastError()); Is there a reason why this uses that particular format, unlike all other call sites? Other call site usages include: errmsg_internal(could not create signal event: %d, (int) GetLastError()) write_stderr(could not create signal listener pipe: error code %d; retrying\n, (int) GetLastError()) etc, so I presume you're referring to the use of %08x as a format specifier rather than %d ? The only reason is that ntstatus errors are typically reported this way - but, really, not all possible errors arising there would be ntstatus errors. As it's not consistent with the rest of Pg I don't see any reason not to change it to %d. -- Craig Ringer POST Newspapers 276 Onslow Rd, Shenton Park Ph: 08 9381 3088 Fax: 08 9388 2258 ABN: 50 008 917 717 http://www.postnewspapers.com.au/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Proposed Windows-specific change: Enable crash dumps (like core files)
On Sun, Dec 19, 2010 at 07:26, Craig Ringer cr...@postnewspapers.com.au wrote: On 18/12/2010 1:13 AM, Magnus Hagander wrote: On Fri, Dec 17, 2010 at 17:42, Magnus Hagandermag...@hagander.net wrote: On Fri, Dec 17, 2010 at 17:24, Craig Ringercr...@postnewspapers.com.au wrote: On 17/12/2010 7:17 PM, Magnus Hagander wrote: Now, that's annoying. So clearly we can't use that function to determine which version we're on. Seems it only works for image help api, and not the general thing. According to http://msdn.microsoft.com/en-us/library/ms679294(v=vs.85).aspx, we could look for: SysEnumLines - if present, we have at least 6.1. However, I don't see any function that appeared in 6.0 only.. Actually, I'm wrong - there are functions enough to determine the version. So here's a patch that tries that. Great. I pulled the latest from your git tree, tested that, and got much better results. Crashdump size is back to what I expected. In my test code, fcinfo-args and fcinfo-argnull can be examined without problems. Backtraces look good; see below. It seems to be including backend private memory again now. Thanks _very_ much for your work on this. Ok, great. I think that leaves us at least complete enough to commit - we can always improve things further as we get some more real world testing. fcinfo-flinfo is still inaccessible, but I suspect it's in shared memory, as it's at 0x0135 . Ditto fcinfo-resultinfo and fcinfo-context. Hmm. Not sure why those would be in shared memory, that seems strange. This has me wondering - is it going to be necessary to dump shared memory to make many backtraces useful? I just responded to Tom mentioning that the patch doesn't currently dump shared memory, but I hadn't realized the extent to which it's used for _lots_ more than just disk buffers. I'm not sure how to handle dumping shared_buffers when someone might be using multi-gigabyte shared_buffers, though. Dumping the whole lot would risk sudden out-of-disk-space issues, slowdowns as dumps are written, and the backend being frozen as it's being dumped could delay the system coming back up again. Trying to selectively dump critical parts could cause dumps to fail if the system is in early startup or a bad state. I don'tt hink a slowdown issue during dump being written is really a problem - it's not like this is something that happens normally. In fact, we're going to restart all the other backends after the crash anyway, and that's going to be a lot more disruptive. Filling up the disk, however, is a much bigger issue.. As is having a huge dump to mail around (for error reports), if it's not actually necessary... The same concern applies to writing backend private memory; it's fine most of the time, but if you're doing data warehousing queries with 2GB of work_mem, it's going to be nasty having all that extra disk I/O and disk space use, not to mention the hold-up while the dump is written. If this is something we want to have people running in production just in case or to track down rare / hard to reproduce faults, that'll be a problem. Potential problem, yes, but not a very big one I believe. Yes, if that very big backend crashes there will be a big dump - but how else are you going to find out what went wrong? In the shared memory case, *all* dumps will be huge, not just those that happen to use a lot of local memory. OTOH, we can't really go poking around in palloc contexts to decide what to dump. No, that's way more complex stuff than we dare execute in a crash handler. I guess we could always do a small, minimalist minidump, then write _another_ dump that attempts to include select parts of shm and backend private memory. I just thought of two other things, too: - Is it possible for this handler to be called recursively if it fails during the handler call? If so, do we need to uninstall the handler before attempting a dump to avoid such recursion? I need to do some testing and dig around MSDN to find out more about this. I'm pretty sure I read somewhere that it can't happen and we don't need to do that, but I can't find any reference to it atm :-( - Can asynchronous events like signals (or their win32 emulation) interrupt an executing crash handler, or are they blocked before the crash handler is called? If they're not blocked, do we need to try to block them before attempting a dump? Again, I need to do some reading on this. The exception handler itself is global and will run on the crashing thread. I'm fairly certain it stops the other threads while running, but again I can't find a reference to that right now. But ISTM it would have to. Anyway, here's an example of the backtraces I'm currently getting. They're clearly missing some parameters (in shm? Unsure) but provide source file+line, argument values where resolvable, and the call stack its self. Locals are accessible at all levels of the stack when you go poking around in
Re: [HACKERS] Re: Proposed Windows-specific change: Enable crash dumps (like core files)
On Sun, Dec 19, 2010 at 12:51, Magnus Hagander mag...@hagander.net wrote: On Sun, Dec 19, 2010 at 07:26, Craig Ringer cr...@postnewspapers.com.au wrote: On 18/12/2010 1:13 AM, Magnus Hagander wrote: On Fri, Dec 17, 2010 at 17:42, Magnus Hagandermag...@hagander.net wrote: On Fri, Dec 17, 2010 at 17:24, Craig Ringercr...@postnewspapers.com.au wrote: On 17/12/2010 7:17 PM, Magnus Hagander wrote: Now, that's annoying. So clearly we can't use that function to determine which version we're on. Seems it only works for image help api, and not the general thing. According to http://msdn.microsoft.com/en-us/library/ms679294(v=vs.85).aspx, we could look for: SysEnumLines - if present, we have at least 6.1. However, I don't see any function that appeared in 6.0 only.. Actually, I'm wrong - there are functions enough to determine the version. So here's a patch that tries that. Great. I pulled the latest from your git tree, tested that, and got much better results. Crashdump size is back to what I expected. In my test code, fcinfo-args and fcinfo-argnull can be examined without problems. Backtraces look good; see below. It seems to be including backend private memory again now. Thanks _very_ much for your work on this. Ok, great. I think that leaves us at least complete enough to commit - we can always improve things further as we get some more real world testing. Actually, looking through it again just before commit, I can't help but think the whole code about loading the DLL from different places is unnecessary. The windows DLL search order *always* has the directory of the EXE first, followed by either system dirs or CWD depending on version. Any reason not to just call LoadLibrary once? That makes the patch look like attached. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ diff --git a/doc/src/sgml/installation.sgml b/doc/src/sgml/installation.sgml index 0eeaa25..a480a8d 100644 --- a/doc/src/sgml/installation.sgml +++ b/doc/src/sgml/installation.sgml @@ -2734,6 +2734,22 @@ cc-1020 cc: ERROR File = pqcomm.c, Line = 427 under commandCMD.EXE/command, as the MSYS console has buffering issues. /para + + sect3 +titleCollecting crash dumps on Windows/title + +para + If PostgreSQL on Windows crashes, it has the ability to generate + productnameminidumps/ that can be used to track down the cause + for the crash, similar to core dumps on Unix. These dumps can be + read using the productnameWindows Debugger Tools/ or using + productnameVisual Studio/. To enable the generation of dumps + on Windows, create a subdirectory named filenamecrashdumps/filename + inside the cluster data directory. The dumps will then be written + into this directory with a unique name based on the identifier of + the crashing process and the current time of the crash. +/para + /sect3 /sect2 sect2 id=installation-notes-sco diff --git a/src/backend/main/main.c b/src/backend/main/main.c index 6cb70f2..5a56c25 100644 --- a/src/backend/main/main.c +++ b/src/backend/main/main.c @@ -79,6 +79,14 @@ main(int argc, char *argv[]) argv = save_ps_display_args(argc, argv); /* + * If supported on the current platform, set up a handler to be called if + * the backend/postmaster crashes with a fatal signal or exception. + */ +#ifdef WIN32 + pgwin32_install_crashdump_handler(); +#endif + + /* * Set up locale information from environment. Note that LC_CTYPE and * LC_COLLATE will be overridden later from pg_control if we are in an * already-initialized database. We set them here so that they will be diff --git a/src/backend/port/win32/Makefile b/src/backend/port/win32/Makefile index 8bf9f74..d00c334 100644 --- a/src/backend/port/win32/Makefile +++ b/src/backend/port/win32/Makefile @@ -12,6 +12,6 @@ subdir = src/backend/port/win32 top_builddir = ../../../.. include $(top_builddir)/src/Makefile.global -OBJS = timer.o socket.o signal.o security.o mingwcompat.o +OBJS = timer.o socket.o signal.o security.o mingwcompat.o crashdump.o include $(top_srcdir)/src/backend/common.mk diff --git a/src/backend/port/win32/crashdump.c b/src/backend/port/win32/crashdump.c new file mode 100644 index 000..ba6fca7 --- /dev/null +++ b/src/backend/port/win32/crashdump.c @@ -0,0 +1,174 @@ +/*- + * + * win32_crashdump.c + * Automatic crash dump creation for PostgreSQL on Windows + * + * The crashdump feature traps unhandled win32 exceptions produced by the + * backend, and tries to produce a Windows MiniDump crash + * dump for later debugging and analysis. The machine performing the dump + * doesn't need any special debugging tools; the user only needs to send + * the dump to somebody who has the same version of PostgreSQL and has debugging + * tools. + * + * crashdump module originally by Craig Ringer ring...@ringerc.id.au + * + *
Re: [HACKERS] Re: Proposed Windows-specific change: Enable crash dumps (like core files)
On 19/12/2010 7:51 PM, Magnus Hagander wrote: Great. I pulled the latest from your git tree, tested that, and got much better results. Crashdump size is back to what I expected. In my test code, fcinfo-args and fcinfo-argnull can be examined without problems. Backtraces look good; see below. It seems to be including backend private memory again now. Thanks _very_ much for your work on this. Ok, great. I think that leaves us at least complete enough to commit - we can always improve things further as we get some more real world testing. fcinfo-flinfo is still inaccessible, but I suspect it's in shared memory, as it's at 0x0135 . Ditto fcinfo-resultinfo and fcinfo-context. Hmm. Not sure why those would be in shared memory, that seems strange. OK, I'll have to do some more digging on that, then. I'm getting on a plane in about 2 hours, but will be bringing Visual Studio snapshots, a postgres git tree, an XP vm, etc with me, so time permitting I should be able to keep on with this. Anyway, here's an example of the backtraces I'm currently getting. They're clearly missing some parameters (in shm? Unsure) but provide source file+line, argument values where resolvable, and the call stack its self. Locals are accessible at all levels of the stack when you go poking around in windbg. Yeah, they're still very useful. Is that a release or debug build? That's a release build. I'm intentionally testing with release builds because I want something that's useful on real-world end-user systems, and I'm aware that few Windows users will build Pg for themselves. (That said, the Perl-based build scripts are some of the least painful build tools I've ever worked with on Windows. Only CMake can rival them for low-pain Windows compilation.) -- Craig Ringer Tech-related writing at http://soapyfrogs.blogspot.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Proposed Windows-specific change: Enable crash dumps (like core files)
On 19/12/2010 8:05 PM, Magnus Hagander wrote: Actually, looking through it again just before commit, I can't help but think the whole code about loading the DLL from different places is unnecessary. The windows DLL search order *always* has the directory of the EXE first, followed by either system dirs or CWD depending on version. Any reason not to just call LoadLibrary once? Good point. The program directory was added to the DLL search path with Windows XP. On any Windows version that Pg targets you can trust Windows to load a DLL from the app dir before system32. We don't really care about win2k/nt4 or 9x, where this is necessary. It's not a security concern for the reasons already outlined in the comments in the patch - in brief, because Pg keeps the cwd inside the datadir, and if someone can write malicious files in there you have big problems already. If it was a security concern our fallback would be an attempt to load %WINDIR%\system32\dbghelp.dll by full path, rather than using an unqualified name to search the path. So: I agree. We should just let Windows find dbghelp.dll on the normal search path; building an explicit path isn't necessary. -- Craig Ringer Tech-related writing at http://soapyfrogs.blogspot.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Proposed Windows-specific change: Enable crash dumps (like core files)
On Sun, Dec 19, 2010 at 13:57, Craig Ringer cr...@postnewspapers.com.au wrote: On 19/12/2010 7:51 PM, Magnus Hagander wrote: Great. I pulled the latest from your git tree, tested that, and got much better results. Crashdump size is back to what I expected. In my test code, fcinfo-args and fcinfo-argnull can be examined without problems. Backtraces look good; see below. It seems to be including backend private memory again now. Thanks _very_ much for your work on this. Ok, great. I think that leaves us at least complete enough to commit - we can always improve things further as we get some more real world testing. fcinfo-flinfo is still inaccessible, but I suspect it's in shared memory, as it's at 0x0135 . Ditto fcinfo-resultinfo and fcinfo-context. Hmm. Not sure why those would be in shared memory, that seems strange. OK, I'll have to do some more digging on that, then. I'm getting on a plane in about 2 hours, but will be bringing Visual Studio snapshots, a postgres git tree, an XP vm, etc with me, so time permitting I should be able to keep on with this. Ok. I still think what we have now is a great improvement over nothing. But improvements on top of it is obviously always good :-) Anyway, here's an example of the backtraces I'm currently getting. They're clearly missing some parameters (in shm? Unsure) but provide source file+line, argument values where resolvable, and the call stack its self. Locals are accessible at all levels of the stack when you go poking around in windbg. Yeah, they're still very useful. Is that a release or debug build? That's a release build. I'm intentionally testing with release builds because I want something that's useful on real-world end-user systems, and I'm aware that few Windows users will build Pg for themselves. Good - I just wanted to be sure that's what you were testing as well. (That said, the Perl-based build scripts are some of the least painful build tools I've ever worked with on Windows. Only CMake can rival them for low-pain Windows compilation.) :-) Thanks! They're not quite that painless to maintain, but it's not *too* bad. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Proposed Windows-specific change: Enable crash dumps (like core files)
On Sun, Dec 19, 2010 at 14:06, Craig Ringer cr...@postnewspapers.com.au wrote: On 19/12/2010 8:05 PM, Magnus Hagander wrote: Actually, looking through it again just before commit, I can't help but think the whole code about loading the DLL from different places is unnecessary. The windows DLL search order *always* has the directory of the EXE first, followed by either system dirs or CWD depending on version. Any reason not to just call LoadLibrary once? Good point. The program directory was added to the DLL search path with Windows XP. On any Windows version that Pg targets you can trust Windows to load a DLL from the app dir before system32. We don't really care about win2k/nt4 or 9x, where this is necessary. It's not a security concern for the reasons already outlined in the comments in the patch - in brief, because Pg keeps the cwd inside the datadir, and if someone can write malicious files in there you have big problems already. If it was a security concern our fallback would be an attempt to load %WINDIR%\system32\dbghelp.dll by full path, rather than using an unqualified name to search the path. So: I agree. We should just let Windows find dbghelp.dll on the normal search path; building an explicit path isn't necessary. I've committed the version that does this. Thanks! -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Proposed Windows-specific change: Enable crash dumps (like core files)
Magnus Hagander mag...@hagander.net writes: On Sun, Dec 19, 2010 at 07:26, Craig Ringer cr...@postnewspapers.com.au wrote: fcinfo-flinfo is still inaccessible, but I suspect it's in shared memory, as it's at 0x0135 . Ditto fcinfo-resultinfo and fcinfo-context. Hmm. Not sure why those would be in shared memory, that seems strange. FWIW, I don't believe that theory either. Offhand I cannot think of any case where an FmgrInfo would be in shared memory. It seems more likely from the above that you've misidentified where the fcinfo record is. There are some code paths where we don't bother to set up those fields, but I think we're always careful to set them to NULL if so. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Proposed Windows-specific change: Enable crash dumps (like core files)
Craig Ringer cr...@postnewspapers.com.au writes: On 17/12/2010 11:17 PM, Tom Lane wrote: There is absolutely nothing more frustrating than having a crash dump that hasn't got the information you need. Please turn it on by default. There have to be limits to that, though. dbghelp.dll can dump shared memory, too - but it's not going to be practical to write or work with many-gigabyte dumps most of the time, and neither my initial patch nor Magnus's recent reworking of it ask windbg.dll to include shared memory in the dump. Well, actually my comment above is based directly on pain in working with core dumps on Unix-oid systems that don't include shared memory. Linux *does* include shared memory, and I have not noticed that that was a problem. Of course, in development installations I don't try to raise shared_buffers to the moon ... It's possible to tell dbghelp.dll to include specific additional memory ranges, so it may well be worth including specific critical areas within the shared memory block while omitting the bulk of the buffers. Doing that safely in an unsafe we're-crashing environment might not be simple, though, especially if you're not even sure whether you got far into startup before crashing. I agree that having the crash dump code know anything specific about the contents of shared memory is a nonstarter --- far too fragile. But perhaps we could have some simple rule based only on what the kernel knows about the shmem block, like dump shmem if it's no more than 1GB. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Proposed Windows-specific change: Enable crash dumps (like core files)
On Sun, Dec 19, 2010 at 11:07 AM, Tom Lane t...@sss.pgh.pa.us wrote: I agree that having the crash dump code know anything specific about the contents of shared memory is a nonstarter --- far too fragile. But perhaps we could have some simple rule based only on what the kernel knows about the shmem block, like dump shmem if it's no more than 1GB. Not sure what knobs we have available, but would there be any value in trying to dump the FIRST 1GB? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Proposed Windows-specific change: Enable crash dumps (like core files)
On Sun, Dec 19, 2010 at 17:28, Robert Haas robertmh...@gmail.com wrote: On Sun, Dec 19, 2010 at 11:07 AM, Tom Lane t...@sss.pgh.pa.us wrote: I agree that having the crash dump code know anything specific about the contents of shared memory is a nonstarter --- far too fragile. But perhaps we could have some simple rule based only on what the kernel knows about the shmem block, like dump shmem if it's no more than 1GB. Not sure what knobs we have available, but would there be any value in trying to dump the FIRST 1GB? So such knob that I can find. Basically, we pick a combination of the flags at http://msdn.microsoft.com/en-us/library/ms680519(v=VS.85).aspx We could send it as a user stream, i guess, but it won't be available in the debugger at the same address space then - just as a blob of data to process manually. I doubt it's worth it in that case... -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Proposed Windows-specific change: Enable crash dumps (like core files)
Robert Haas robertmh...@gmail.com writes: On Sun, Dec 19, 2010 at 11:07 AM, Tom Lane t...@sss.pgh.pa.us wrote: I agree that having the crash dump code know anything specific about the contents of shared memory is a nonstarter --- far too fragile. But perhaps we could have some simple rule based only on what the kernel knows about the shmem block, like dump shmem if it's no more than 1GB. Not sure what knobs we have available, but would there be any value in trying to dump the FIRST 1GB? Probably not. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Proposed Windows-specific change: Enable crash dumps (like core files)
Magnus Hagander mag...@hagander.net writes: On Sun, Dec 19, 2010 at 17:28, Robert Haas robertmh...@gmail.com wrote: On Sun, Dec 19, 2010 at 11:07 AM, Tom Lane t...@sss.pgh.pa.us wrote: perhaps we could have some simple rule based only on what the kernel knows about the shmem block, like dump shmem if it's no more than 1GB. Not sure what knobs we have available, but would there be any value in trying to dump the FIRST 1GB? So such knob that I can find. Basically, we pick a combination of the flags at http://msdn.microsoft.com/en-us/library/ms680519(v=VS.85).aspx Yeah, I thought I was probably asking for something that couldn't practically be supported. Let's try it as-is for a few releases and see how it goes. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Proposed Windows-specific change: Enable crash dumps (like core files)
On 17/12/2010 11:17 PM, Tom Lane wrote: So I'd be happy to have it enabled - given that we want a default. There is absolutely nothing more frustrating than having a crash dump that hasn't got the information you need. Please turn it on by default. There have to be limits to that, though. dbghelp.dll can dump shared memory, too - but it's not going to be practical to write or work with many-gigabyte dumps most of the time, and neither my initial patch nor Magnus's recent reworking of it ask windbg.dll to include shared memory in the dump. It's possible to tell dbghelp.dll to include specific additional memory ranges, so it may well be worth including specific critical areas within the shared memory block while omitting the bulk of the buffers. Doing that safely in an unsafe we're-crashing environment might not be simple, though, especially if you're not even sure whether you got far into startup before crashing. -- Craig Ringer Tech-related writing at http://soapyfrogs.blogspot.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Proposed Windows-specific change: Enable crash dumps (like core files)
On 18/12/2010 1:13 AM, Magnus Hagander wrote: On Fri, Dec 17, 2010 at 17:42, Magnus Hagandermag...@hagander.net wrote: On Fri, Dec 17, 2010 at 17:24, Craig Ringercr...@postnewspapers.com.au wrote: On 17/12/2010 7:17 PM, Magnus Hagander wrote: Now, that's annoying. So clearly we can't use that function to determine which version we're on. Seems it only works for image help api, and not the general thing. According to http://msdn.microsoft.com/en-us/library/ms679294(v=vs.85).aspx, we could look for: SysEnumLines - if present, we have at least 6.1. However, I don't see any function that appeared in 6.0 only.. Actually, I'm wrong - there are functions enough to determine the version. So here's a patch that tries that. Great. I pulled the latest from your git tree, tested that, and got much better results. Crashdump size is back to what I expected. In my test code, fcinfo-args and fcinfo-argnull can be examined without problems. Backtraces look good; see below. It seems to be including backend private memory again now. Thanks _very_ much for your work on this. fcinfo-flinfo is still inaccessible, but I suspect it's in shared memory, as it's at 0x0135 . Ditto fcinfo-resultinfo and fcinfo-context. This has me wondering - is it going to be necessary to dump shared memory to make many backtraces useful? I just responded to Tom mentioning that the patch doesn't currently dump shared memory, but I hadn't realized the extent to which it's used for _lots_ more than just disk buffers. I'm not sure how to handle dumping shared_buffers when someone might be using multi-gigabyte shared_buffers, though. Dumping the whole lot would risk sudden out-of-disk-space issues, slowdowns as dumps are written, and the backend being frozen as it's being dumped could delay the system coming back up again. Trying to selectively dump critical parts could cause dumps to fail if the system is in early startup or a bad state. The same concern applies to writing backend private memory; it's fine most of the time, but if you're doing data warehousing queries with 2GB of work_mem, it's going to be nasty having all that extra disk I/O and disk space use, not to mention the hold-up while the dump is written. If this is something we want to have people running in production just in case or to track down rare / hard to reproduce faults, that'll be a problem. OTOH, we can't really go poking around in palloc contexts to decide what to dump. I guess we could always do a small, minimalist minidump, then write _another_ dump that attempts to include select parts of shm and backend private memory. I just thought of two other things, too: - Is it possible for this handler to be called recursively if it fails during the handler call? If so, do we need to uninstall the handler before attempting a dump to avoid such recursion? I need to do some testing and dig around MSDN to find out more about this. - Can asynchronous events like signals (or their win32 emulation) interrupt an executing crash handler, or are they blocked before the crash handler is called? If they're not blocked, do we need to try to block them before attempting a dump? Again, I need to do some reading on this. Anyway, here's an example of the backtraces I'm currently getting. They're clearly missing some parameters (in shm? Unsure) but provide source file+line, argument values where resolvable, and the call stack its self. Locals are accessible at all levels of the stack when you go poking around in windbg. This dump file has an exception of interest stored in it. The stored exception information can be accessed via .ecxr. (930.12e8): Access violation - code c005 (first/second chance not available) eax=00bce2c0 ebx=72d0e800 ecx=02e4 edx=72cb81c8 esi=00f0 edi=0930 eip=771464f4 esp=00bce294 ebp=00bce2a4 iopl=0 nv up ei pl zr na pe nc cs=001b ss=0023 ds=0023 es=0023 fs=003b gs= efl=0246 ntdll!KiFastSystemCallRet: 771464f4 c3 ret 0:000 .ecxr eax= ebx= ecx=015fd7d8 edx=7362100f esi=015fd7c8 edi=015fd804 eip=73621052 esp=00bcf284 ebp=015fd7c8 iopl=0 nv up ei pl zr na pe nc cs=001b ss=0023 ds=0023 es=0023 fs=003b gs= efl=00010246 crashme!crashdump_crashme+0x2: 73621052 c7000100mov dword ptr [eax],1ds:0023:= 0:000 kp *** Stack trace for last set context - .thread/.cxr resets it ChildEBP RetAddr 00bcf280 0031c797 crashme!crashdump_crashme(struct FunctionCallInfoData * fcinfo = 0x015e3318)+0x2 [c:\users\craig\developer\postgres\contrib\crashme\crashme.c @ 14] 00bcf2e4 0031c804 postgres!ExecMakeFunctionResult(struct FuncExprState * fcache = 0x015e3318, struct ExprContext * econtext = 0x00319410, char * isNull = 0x , ExprDoneCond * isDone = 0x7362100f)+0x427 [c:\users\craig\developer\postgres\src\backend\executor\execqual.c @ 1824] 00bcf30c 0031b760
Re: [HACKERS] Re: Proposed Windows-specific change: Enable crash dumps (like core files)
On 17/12/2010 3:46 PM, Magnus Hagander wrote: On Dec 17, 2010 8:02 AM, Craig Ringer cr...@postnewspapers.com.au mailto:cr...@postnewspapers.com.au wrote: On 16/12/10 21:01, Magnus Hagander wrote: Found another problem in it: when running with an older version of dbghelp.dll (which I was), it simply didn't work. We need to grab the version of dbghelp.dll at runtime and pick which things we're going to dump based on that. I was about to suggest dropping the fallback loading of the system dbghelp.dll, and just bundle a suitable version with Pg, then I saw how big the DLL is. It's 800kb (!!) of code that'll hopefully never get used on any given system. With a footprint like that, bundling it seems rather less attractive. I think Magnus is right: test the dbghelp.dll version and fall back to supported features - as far back as XP, anyway; who cares about anything earlier than that. An updated version can always be put in place by the user if the built-in one can't produce enough detail. Did you get a chance to test that it still produced a full dump on your system? I've just tested that and it looks ok so far. The dumps produced are smaller - 1.3MB instead of ~4MB for a dump produced by calling a simple crashme() function from SQL, the source for which follows at the end of this post. However, I'm getting reasonable stack traces, as shown by the windb.exe output below. That said, the dump appears to exclude the backend's private memory. I can't resolve *fcinfo from PG_FUNCTION_ARGS; it looks like the direct values of arguments are available as they're on the stack, as are function local variables, but the process heap isn't dumped so you can't see the contents of any pointers-to-struct. That'll make it harder to track down many kinds of error - but, on the other hand, means that dumps of processes with huge work_mem etc won't be massively bloated. That might be a safer starting point than including the private process memory, really. I'd like to offer a flag to turn on dumping of private process memory but I'm not sure how best to go about it, given that we'd want to avoid accessing GUCs etc if possible. later, I think; this is better for now. I'm happy with the patch as it stands, with the one exception that consideration of mingw is required before it can be committed. Symbol search path is: C:\Users\Craig\Developer\postgres\Release\postgres;SRV*c:\localsymbols*http://msdl.microsoft.com/download/symbols;C:\Program Files\PostgreSQL\9.0\symbols Executable search path is: Windows 7 Version 7600 MP (2 procs) Free x86 compatible Product: WinNt, suite: SingleUserTS Machine Name: Debug session time: Fri Dec 17 17:37:19.000 2010 (UTC + 8:00) System Uptime: not available Process Uptime: 0 days 0:01:09.000 This dump file has an exception of interest stored in it. The stored exception information can be accessed via .ecxr. (16b4.cd8): Access violation - code c005 (first/second chance not available) eax=02e8 ebx=03210aa8 ecx=0055e700 edx=03210a68 esi=03210a68 edi=0055ea14 eip=771464f4 esp=0055e6d4 ebp=0055e6e4 iopl=0 nv up ei pl zr na pe nc cs=001b ss=0023 ds=0023 es=0023 fs=003b gs= efl=0246 ntdll!KiFastSystemCallRet: 771464f4 c3 ret 0:000 .ecxr eax= ebx= ecx=0103fa88 edx=7001100f esi=0103fa78 edi=0103fab4 eip=70011052 esp=0055f62c ebp=0103fa78 iopl=0 nv up ei pl zr na pe nc cs=001b ss=0023 ds=0023 es=0023 fs=003b gs= efl=00010246 crashme!crashdump_crashme+0x2: 70011052 c7000100mov dword ptr [eax],1ds:0023:= 0:000 ~ . 0 Id: 16b4.cd8 Suspend: 0 Teb: 7ffdf000 Unfrozen 1 Id: 16b4.1798 Suspend: 0 Teb: 7ffde000 Unfrozen 2 Id: 16b4.6f4 Suspend: 0 Teb: 7ffdd000 Unfrozen 0:000 k *** Stack trace for last set context - .thread/.cxr resets it ChildEBP RetAddr 0055f628 0142c797 crashme!crashdump_crashme+0x2 [c:\users\craig\developer\postgres\contrib\crashme\crashme.c @ 14] 0055f68c 0142c804 postgres!ExecMakeFunctionResult+0x427 [c:\users\craig\developer\postgres\src\backend\executor\execqual.c @ 1824] 0055f6b4 0142b760 postgres!ExecEvalFunc+0x34 [c:\users\craig\developer\postgres\src\backend\executor\execqual.c @ 2260] 0055f6e0 0142ba83 postgres!ExecTargetList+0x70 [c:\users\craig\developer\postgres\src\backend\executor\execqual.c @ 5095] 0055f720 0143f074 postgres!ExecProject+0x173 [c:\users\craig\developer\postgres\src\backend\executor\execqual.c @ 5312] 0055f734 01427e07 postgres!ExecResult+0x94 [c:\users\craig\developer\postgres\src\backend\executor\noderesult.c @ 157] 0055f744 01425ccd postgres!ExecProcNode+0x67 [c:\users\craig\developer\postgres\src\backend\executor\execprocnode.c @ 361] 0055f758 01426ace postgres!ExecutePlan+0x2d [c:\users\craig\developer\postgres\src\backend\executor\execmain.c @ 1236] 0055f788 0152ec7d postgres!standard_ExecutorRun+0x8e
Re: [HACKERS] Re: Proposed Windows-specific change: Enable crash dumps (like core files)
On Fri, Dec 17, 2010 at 12:08, Craig Ringer cr...@postnewspapers.com.au wrote: On 17/12/2010 3:46 PM, Magnus Hagander wrote: On Dec 17, 2010 8:02 AM, Craig Ringer cr...@postnewspapers.com.au mailto:cr...@postnewspapers.com.au wrote: On 16/12/10 21:01, Magnus Hagander wrote: Found another problem in it: when running with an older version of dbghelp.dll (which I was), it simply didn't work. We need to grab the version of dbghelp.dll at runtime and pick which things we're going to dump based on that. I was about to suggest dropping the fallback loading of the system dbghelp.dll, and just bundle a suitable version with Pg, then I saw how big the DLL is. It's 800kb (!!) of code that'll hopefully never get used on any given system. With a footprint like that, bundling it seems rather less attractive. I think Magnus is right: test the dbghelp.dll version and fall back to supported features - as far back as XP, anyway; who cares about anything earlier than that. An updated version can always be put in place by the user if the built-in one can't produce enough detail. Did you get a chance to test that it still produced a full dump on your system? I've just tested that and it looks ok so far. The dumps produced are smaller - 1.3MB instead of ~4MB for a dump produced by calling a simple crashme() function from SQL, the source for which follows at the end of this post. However, I'm getting reasonable stack traces, as shown by the windb.exe output below. That said, the dump appears to exclude the backend's private memory. I can't resolve *fcinfo from PG_FUNCTION_ARGS; it looks like the direct values of arguments are available as they're on the stack, as are function local variables, but the process heap isn't dumped so you can't see the contents of any pointers-to-struct. That'll make it harder to track down many kinds of error - but, on the other hand, means that dumps of processes with huge work_mem etc won't be massively bloated. What version of dbghelp do you have? And what does the API report for it? The code is supposed to add the private memory if it finds version 6 or higher, perhaps that check is incorrect? That might be a safer starting point than including the private process memory, really. I'd like to offer a flag to turn on dumping of private process memory but I'm not sure how best to go about it, given that we'd want to avoid accessing GUCs etc if possible. later, I think; this is better for now. I'm not too happy with having a bunch of switches for it. People likely to tweak those can just install the debugger tools and use those, I think. So I'd be happy to have it enabled - given that we want a default. However, what I'm most interested in right now is finding out why it's not being included anymore in the new patch, because it's supposed to be.. I'm happy with the patch as it stands, with the one exception that consideration of mingw is required before it can be committed. What do you mean? That it has to be tested on mingw specifically, or something else? -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Proposed Windows-specific change: Enable crash dumps (like core files)
Magnus Hagander mag...@hagander.net writes: On Fri, Dec 17, 2010 at 12:08, Craig Ringer cr...@postnewspapers.com.au wrote: That might be a safer starting point than including the private process memory, really. I'm not too happy with having a bunch of switches for it. People likely to tweak those can just install the debugger tools and use those, I think. Yeah. So I'd be happy to have it enabled - given that we want a default. There is absolutely nothing more frustrating than having a crash dump that hasn't got the information you need. Please turn it on by default. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Proposed Windows-specific change: Enable crash dumps (like core files)
On 17/12/2010 7:17 PM, Magnus Hagander wrote: What version of dbghelp do you have? 6.1.7600.16385 by default, as shipped in Windows 7 32-bit, and what I was testing with. 6.12.0002.633 is what came with my copy of Debugging Tools for windows, from the windows SDK. The same version comes with Visual Studio 10 Express. 6.9.0003.113 is what came with Visual Studio 9 Express. I've now re-tested with the latest, 6.12.0002.633, and have the same result. And what does the API report for it? This: version = (*pApiVersion)(); write_stderr(version: %hu.%hu.%hu\n, version-MajorVersion, version-MinorVersion, version-Revision); outputs version: 4.0.5 when loading dbghelp.dll 6.12.0002.633 from c:\postgres91\bin\dbghelp.dll as verified by a write_stderr() just before LoadLibrary and a test verifying the LoadLibrary worked. Shouldn't dbghelp.dll just ignore any flags it doesn't understand? I'm surprised we can't just pass flags a particular version doesn't know about and have it deal with them. Still, I guess it's not wise to assume such things. The code is supposed to add the private memory if it finds version 6 or higher, perhaps that check is incorrect? It begins to look like the version check might be lying. I'll dig further in a bit; it's 12:20am now and I need to sleep. I'm going on holiday in about 48 hours, and will be away from Windows (phew!) unless I get a VM set up during that time. I'll see what I can do. I'm happy with the patch as it stands, with the one exception that consideration of mingw is required before it can be committed. What do you mean? That it has to be tested on mingw specifically, or something else? I'm not even sure it'll compile or link with MinGW, and if it does, I doubt it'll be useful. It should only be compiled and called for native VC++ builds. I'm not sure if backend\port\win32 is built for all win32, or only for VC++. Testing for defined(_MSC_VER) would do the trick. I'm not sure testing defined(WIN32_ONLY_COMPILER) is quite right, since I doubt dbghelp.dll would be much use for a Pg compiled with Borland or the like either if that were ever supported. -- Craig Ringer Tech-related writing at http://soapyfrogs.blogspot.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Proposed Windows-specific change: Enable crash dumps (like core files)
On Fri, Dec 17, 2010 at 17:24, Craig Ringer cr...@postnewspapers.com.au wrote: On 17/12/2010 7:17 PM, Magnus Hagander wrote: What version of dbghelp do you have? 6.1.7600.16385 by default, as shipped in Windows 7 32-bit, and what I was testing with. 6.12.0002.633 is what came with my copy of Debugging Tools for windows, from the windows SDK. The same version comes with Visual Studio 10 Express. 6.9.0003.113 is what came with Visual Studio 9 Express. I've now re-tested with the latest, 6.12.0002.633, and have the same result. And what does the API report for it? This: version = (*pApiVersion)(); write_stderr(version: %hu.%hu.%hu\n, version-MajorVersion, version-MinorVersion, version-Revision); outputs version: 4.0.5 when loading dbghelp.dll 6.12.0002.633 from c:\postgres91\bin\dbghelp.dll as verified by a write_stderr() just before LoadLibrary and a test verifying the LoadLibrary worked. Now, that's annoying. So clearly we can't use that function to determine which version we're on. Seems it only works for image help api, and not the general thing. According to http://msdn.microsoft.com/en-us/library/ms679294(v=vs.85).aspx, we could look for: SysEnumLines - if present, we have at least 6.1. However, I don't see any function that appeared in 6.0 only.. We're probably better off looking at the VERSIONINFO structure. Which has a not-so-nice API, but at least it's documented :) Shouldn't dbghelp.dll just ignore any flags it doesn't understand? I'm surprised we can't just pass flags a particular version doesn't know about and have it deal with them. Still, I guess it's not wise to assume such things. You'd think so, but if I include all the flags and run it with the default one on Windows 2003, it comes back with the infamous Parameter Is Incorrect error. The code is supposed to add the private memory if it finds version 6 or higher, perhaps that check is incorrect? It begins to look like the version check might be lying. I'll dig further in a bit; it's 12:20am now and I need to sleep. I'm going on holiday in about 48 hours, and will be away from Windows (phew!) unless I get a VM set up during that time. I'll see what I can do. I'll try to put together a version that uses the VERSIONINFO to determine it, for you to test. I'm happy with the patch as it stands, with the one exception that consideration of mingw is required before it can be committed. What do you mean? That it has to be tested on mingw specifically, or something else? I'm not even sure it'll compile or link with MinGW, and if it does, I doubt it'll be useful. It should only be compiled and called for native VC++ builds. I'm pretty sure it could be useful with mingw as well. not *as* useful, but still useful. You'd get proper stack traces and such for anything in msvcrt, and the actual reason for the exception etc. I'm not sure if backend\port\win32 is built for all win32, or only for VC++. It's for all win32. Testing for defined(_MSC_VER) would do the trick. I'm not sure testing defined(WIN32_ONLY_COMPILER) is quite right, since I doubt dbghelp.dll would be much use for a Pg compiled with Borland or the like either if that were ever supported. We don't support building the backend with borland - only libpq and psql. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Proposed Windows-specific change: Enable crash dumps (like core files)
On Fri, Dec 17, 2010 at 17:42, Magnus Hagander mag...@hagander.net wrote: On Fri, Dec 17, 2010 at 17:24, Craig Ringer cr...@postnewspapers.com.au wrote: On 17/12/2010 7:17 PM, Magnus Hagander wrote: Now, that's annoying. So clearly we can't use that function to determine which version we're on. Seems it only works for image help api, and not the general thing. According to http://msdn.microsoft.com/en-us/library/ms679294(v=vs.85).aspx, we could look for: SysEnumLines - if present, we have at least 6.1. However, I don't see any function that appeared in 6.0 only.. Actually, I'm wrong - there are functions enough to determine the version. So here's a patch that tries that. (BTW, the document is actually incorrect - partially because the version that's shipped with Windows 2003 which is 5.2, is more or less undocumented) Let me know how that one works for you, and if it doesn't, whether it does actually detect any of those higher versions by searching for the functions. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ diff --git a/doc/src/sgml/installation.sgml b/doc/src/sgml/installation.sgml index 0eeaa25..a480a8d 100644 --- a/doc/src/sgml/installation.sgml +++ b/doc/src/sgml/installation.sgml @@ -2734,6 +2734,22 @@ cc-1020 cc: ERROR File = pqcomm.c, Line = 427 under commandCMD.EXE/command, as the MSYS console has buffering issues. /para + + sect3 +titleCollecting crash dumps on Windows/title + +para + If PostgreSQL on Windows crashes, it has the ability to generate + productnameminidumps/ that can be used to track down the cause + for the crash, similar to core dumps on Unix. These dumps can be + read using the productnameWindows Debugger Tools/ or using + productnameVisual Studio/. To enable the generation of dumps + on Windows, create a subdirectory named filenamecrashdumps/filename + inside the cluster data directory. The dumps will then be written + into this directory with a unique name based on the identifier of + the crashing process and the current time of the crash. +/para + /sect3 /sect2 sect2 id=installation-notes-sco diff --git a/src/backend/main/main.c b/src/backend/main/main.c index 6cb70f2..5a56c25 100644 --- a/src/backend/main/main.c +++ b/src/backend/main/main.c @@ -79,6 +79,14 @@ main(int argc, char *argv[]) argv = save_ps_display_args(argc, argv); /* + * If supported on the current platform, set up a handler to be called if + * the backend/postmaster crashes with a fatal signal or exception. + */ +#ifdef WIN32 + pgwin32_install_crashdump_handler(); +#endif + + /* * Set up locale information from environment. Note that LC_CTYPE and * LC_COLLATE will be overridden later from pg_control if we are in an * already-initialized database. We set them here so that they will be diff --git a/src/backend/port/win32/Makefile b/src/backend/port/win32/Makefile index 8bf9f74..d00c334 100644 --- a/src/backend/port/win32/Makefile +++ b/src/backend/port/win32/Makefile @@ -12,6 +12,6 @@ subdir = src/backend/port/win32 top_builddir = ../../../.. include $(top_builddir)/src/Makefile.global -OBJS = timer.o socket.o signal.o security.o mingwcompat.o +OBJS = timer.o socket.o signal.o security.o mingwcompat.o crashdump.o include $(top_srcdir)/src/backend/common.mk diff --git a/src/backend/port/win32/crashdump.c b/src/backend/port/win32/crashdump.c new file mode 100644 index 000..82fb692 --- /dev/null +++ b/src/backend/port/win32/crashdump.c @@ -0,0 +1,222 @@ +/*- + * + * win32_crashdump.c + * Automatic crash dump creation for PostgreSQL on Windows + * + * The crashdump feature traps unhandled win32 exceptions produced by the + * backend, and tries to produce a Windows MiniDump crash + * dump for later debugging and analysis. The machine performing the dump + * doesn't need any special debugging tools; the user only needs to send + * the dump to somebody who has the same version of PostgreSQL and has debugging + * tools. + * + * crashdump module originally by Craig Ringer ring...@ringerc.id.au + * + * LIMITATIONS + * === + * This *won't* work in hard OOM situations or stack overflows. + * + * For those, it'd be necessary to take a much more complicated approach where + * the handler switches to a new stack (if it can) and forks a helper process + * to debug its self. + * + * POSSIBLE FUTURE WORK + * + * For bonus points, the crash dump format permits embedding of user-supplied data. + * If there's anything else (postgresql.conf? Last few lines of a log file?) that + * should always be supplied with a crash dump, it could potentially be added, + * though at the cost of a greater chance of the crash dump failing. + * + * + * Portions Copyright (c) 1996-2010, PostgreSQL Global Development Group + * + * IDENTIFICATION + *
Re: [HACKERS] Re: Proposed Windows-specific change: Enable crash dumps (like core files)
Hi all Updated patch attached. This one avoids the inline function stuff and instead just decides whether to compile unix_crashdump.c or win32_crashdump.c based on build system tests. It passes make check on nix and tests on win32, both with and without the crashdumps directory existing. The inadvertent duplication of the functon prototype is fixed. I haven't added any SGML documentation yet, as I'm not sure (a) to what level it should be documented, and (b) whether it should be in the main docs or just the developer docs plus the crash debugging guide on the wiki. Thoughts? You can also find this patch at: https://github.com/ringerc/postgres/tree/crashdump ie git://github.com/ringerc/postgres.git, branch crashdump . -- System Network Administrator POST Newspapers From b6b3052fab356971cf38a48d675c68b1767a7ad5 Mon Sep 17 00:00:00 2001 From: Craig Ringer cr...@ayaki.(none) Date: Thu, 16 Dec 2010 15:11:19 +0800 Subject: [PATCH] Add support for generating a crash dump on backend/postmaster crash On win32, services running under accounts without the right to interact with the desktop don't work well with the system's built-in crash dumping and error reporting facilities. For that reason, the crashdump code installs an unhandled exception handler to trap failures and have a crashing backend or postmaster debug its self. This handler loads and calls dbghelp.dll to generate a minidump when it is invoked. The minidump may be debugged after the backend has finished dying, or sent elsewhere for someone else to debug. Minidumps are only written if a directory called 'crashdumps' exists off the root of the datadir. If that directory exists, a backend crash will result in a minidump file being written to that directory, containing the pid of the crashing backend and a system tick count to prevent overwrites on pid wraparound. The filename of the dump is written to the postgresql log (if possible) before the backend resumes crashing. If the crashdumps directory does NOT exist, a log message is emitted from the crashing backend to indicate why no dump was written before it crashes. Minidumps may be debugged using windbg.exe or a paid version of Microsoft Visual Studio, though not with the free Visual Studio Express Edition. In future a helper process or even the postmaster could take responsibility for debugging a crashing process, so it doesn't need to try to debug its self. If the postmaster gains the ability to manage generic (non-worker-backend children) then a crash handler/reporter would be a useful lightweight option. This would also offer the opportunity to have log excerpts, config files, or other important resources automatically included in the crashdump. Crash dump support is handled via the ports infrastructure. Currently installation of a crash dump handler is a no-op on platforms other than win32, so this patch only meaningfully affects Windows systems. This infrastructure may theoretically be extended to support self-dumps on other platforms, though most UNIX platforms offer OS-level core dump features that remove the need for application software to worry about this. This is revision 4 of the patch. The static inline to no-op out the install call was replaced by a proper extern that calls an empty function, after comments by Tom Lane. An accidental duplication of the prototype in both port.h and postgres.h was fixed and the prototype was moved to postgres.h only. Trailing whitespace was fixed. Build integration was fixed to use the same mechanism as the rest of the ports infrastructure, and a new configure was generated to reflect the changes to configure.in. Squashed commit of the following: commit 4faef5aec63f3d3a6a01235b33f22ce0101c444d Author: Craig Ringer cr...@ayaki.(none) Date: Thu Dec 16 14:36:46 2010 +0800 Ignore autom4te.cache and the new pg_crashdump.c commit 9186496a2b444b67ab822ed45aea3a093269676e Author: Craig Ringer cr...@ayaki.(none) Date: Thu Dec 16 14:34:16 2010 +0800 New configure file to match configure.in changes for crashdump support As PostgreSQL versions the generated copy of `configure', re-generate it with autoconf-2.63 to reflect the changes to configure.in . The diff shows only changes related to the configure.in alterations, with no other random alterations made by configure. commit 2a2ad20a8bf4f4e3730b4e271b3956ebebfea87c Author: Craig Ringer cr...@ayaki.(none) Date: Thu Dec 16 14:02:24 2010 +0800 Correctly integrate crashdump support into the build system Use configure / Mkvcbuild to switch between win32_crashdump.c and unix_crashdump.c rather than trying to do it with Make conditionals. Document the right way to do this for the future reference of others. commit 19bd84f3ec459744fb9df1bc2b22522f031aee2d Author: Craig Ringer cr...@ayaki.(none) Date: Thu Dec 16 13:21:28 2010 +0800 Add support for generating a crash dump on backend/postmaster crash Crash dump support is handled via the ports
Re: [HACKERS] Re: Proposed Windows-specific change: Enable crash dumps (like core files)
BTW, related to the crash dump work: I wanted to see if I could get any integration into Windows Error Reporting for Pg, because it'd be nice to be able to use Microsoft's servers to aggregate crash reports and collect data on problems. WER (on Windows 7 and Vista) also provides automatic management of retained local copies of crash dumps, automatic bundling of related files, automatic out-of-process debugging, etc. WER has been significantly improved in Windows Visa / 7 and may be worth considering enabling its use when Pg is installed on those platforms. Do you think there'd be any interest in re-enabling WER when installation on win7 / vista is detected? I've seen mention that the win32 installers go to some lengths to disable WER because it tends to hang the server while waiting for a user prompt. I was wondering if/how the installer disables WER for Pg? There aren't any calls to AddERExcludedApplication() or WerAddExcludedApplication() in the Pg sources. Pg doesn't appear under: HKEY_CURRENT_USER\Software\Microsoft\Windows\Windows Error Reporting HKEY_LOCAL_MACHINE\Software\Microsoft\Windows\Windows Error Reporting within an ExcludedApplications key either. WHY WINDOWS ERROR REPORTING? WER on Windows 7 permits things like an out-of-process debugger to be invoked. Once the postmaster knows how to manage generic children (as has been discussed to permit better integration of PgAgent among other things) it could also become responsible for doing out-of-process crash dumping and error reporting on crashed children rather than having to do potentially unreliable self-debugging as is presently the case. It might not even be necessary to do that much, as WER on win7/vista offers so much more configurability. On Win7/vista you can also set WER up not to block execution while it waits for user input, and to prompt the user even when the user is running under a different security level. It can automatically bundle postgresql.conf, the most recent log file, etc when an error report is sent. By contrast to app-generated minidumps, most importantly it doesn't require the user to do anything more than click yes for an error report to be collected and become available for analysys of most frequent failures. The downside is that it requires a *business* to register for access to WER data, and that business must obtain a VeriSign code signing cert (US$99 if obtained for WER purposes). This might be something the EDB folks would be interested in, especially as it should also permit shipping of signed Pg binaries, which is great for protecting against AV software, making it easier to deploy in business environments, etc. Note that Windows Error Reporting doens't eliminate the utility of the crashdumps feature, as self-crashdumps are easily accessible to and configurable by the local user/admin and work fine on Windows XP. The current self-generated crash dumps also contain more information than you'll usually get from a WER dump, though it may be possible to customise WER so it includes the same information. I'd really like to know more about past experience with Windows Error Reporting, because I'd like to investigate whether it's worth using in future with a few tweaks for win7 use. -- System Network Administrator POST Newspapers -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Proposed Windows-specific change: Enable crash dumps (like core files)
On Thu, Dec 16, 2010 at 09:21, Craig Ringer cr...@postnewspapers.com.au wrote: Hi all Updated patch attached. This one avoids the inline function stuff and instead just decides whether to compile unix_crashdump.c or win32_crashdump.c based on build system tests. It passes make check on nix and tests on win32, both with and without the crashdumps directory existing. The inadvertent duplication of the functon prototype is fixed. I've updated this patch in quite a few ways, and you can find the latest on my github (will attach when I'm done with cleanups). Things I've changed so far: * I moved it all into backend/port/win32, and thus removed all the autoconf things, since they are not necessary. This is only for win32 at this point, and AFAIK we don't expect it to be for other platforms. If that's needed, we can move it back later, but for now let's keep it simple. * using elog() really isn't safe. We set the crash handler *long* before we have loaded the elog subsystem. It's better to get a log to eventlog than to not get it at all. * Plus a number of smaller changes One thing that I did notice, and I'm not entirely sure what to do with - which is why I wanted to post this while I'm still working on the cleanup: We use the existance of the crashdumps directory as an indication we want crashdumps. That's fine when the system is up. But what if we crash *in the postmaster before we have done chdir()*? Should we perhaps instead define a subdirectory of *where the .EXE file is*, and dump the file there? I haven't added any SGML documentation yet, as I'm not sure (a) to what level it should be documented, and (b) whether it should be in the main docs or just the developer docs plus the crash debugging guide on the wiki. Thoughts? It needs to be documented somewhere.Perhaps in 15.8 Platform Specific Notes? That's really about building it, but it might be reasonable there anyway? It does hold a number of things today that aren't related to building, for other platforms. (And yes, it should go in the wiki as well, of course) -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Proposed Windows-specific change: Enable crash dumps (like core files)
On Thu, Dec 16, 2010 at 8:01 AM, Magnus Hagander mag...@hagander.net wrote: We use the existance of the crashdumps directory as an indication we want crashdumps. That's fine when the system is up. But what if we crash *in the postmaster before we have done chdir()*? Should we perhaps instead define a subdirectory of *where the .EXE file is*, and dump the file there? That seems pretty ugly, for a pretty marginal case. The number of crashes that are going to happen in the postmaster before we've done chdir() should be extremely small, especially if we're talking about crashes in the field rather than during development. How about adding a command-line option to force a dump to be written in $CWD whether a crashdumps directory exists or not? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Proposed Windows-specific change: Enable crash dumps (like core files)
On Thu, Dec 16, 2010 at 14:01, Magnus Hagander mag...@hagander.net wrote: On Thu, Dec 16, 2010 at 09:21, Craig Ringer cr...@postnewspapers.com.au wrote: Hi all Updated patch attached. This one avoids the inline function stuff and instead just decides whether to compile unix_crashdump.c or win32_crashdump.c based on build system tests. It passes make check on nix and tests on win32, both with and without the crashdumps directory existing. The inadvertent duplication of the functon prototype is fixed. I've updated this patch in quite a few ways, and you can find the latest on my github (will attach when I'm done with cleanups). Things I've changed so far: Found another problem in it: when running with an older version of dbghelp.dll (which I was), it simply didn't work. We need to grab the version of dbghelp.dll at runtime and pick which things we're going to dump based on that. The attached version of the patch does that. Can you please test that it still generates the full dump on your system, which supposedly has a much more modern version of dbghelp.dll than mine? -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ *** a/src/backend/main/main.c --- b/src/backend/main/main.c *** *** 79,84 main(int argc, char *argv[]) --- 79,92 argv = save_ps_display_args(argc, argv); /* + * If supported on the current platform, set up a handler to be called if + * the backend/postmaster crashes with a fatal signal or exception. + */ + #ifdef WIN32 + pgwin32_install_crashdump_handler(); + #endif + + /* * Set up locale information from environment. Note that LC_CTYPE and * LC_COLLATE will be overridden later from pg_control if we are in an * already-initialized database. We set them here so that they will be *** a/src/backend/port/win32/Makefile --- b/src/backend/port/win32/Makefile *** *** 12,17 subdir = src/backend/port/win32 top_builddir = ../../../.. include $(top_builddir)/src/Makefile.global ! OBJS = timer.o socket.o signal.o security.o mingwcompat.o include $(top_srcdir)/src/backend/common.mk --- 12,17 top_builddir = ../../../.. include $(top_builddir)/src/Makefile.global ! OBJS = timer.o socket.o signal.o security.o mingwcompat.o crashdump.o include $(top_srcdir)/src/backend/common.mk *** /dev/null --- b/src/backend/port/win32/crashdump.c *** *** 0 --- 1,227 + /*- + * + * win32_crashdump.c + * Automatic crash dump creation for PostgreSQL on Windows + * + * The crashdump feature traps unhandled win32 exceptions produced by the + * backend, and tries to produce a Windows MiniDump crash + * dump for later debugging and analysis. The machine performing the dump + * doesn't need any special debugging tools; the user only needs to send + * the dump to somebody who has the same version of PostgreSQL and has debugging + * tools. + * + * crashdump module originally by Craig Ringer ring...@ringerc.id.au + * + * LIMITATIONS + * === + * This *won't* work in hard OOM situations or stack overflows. + * + * For those, it'd be necessary to take a much more complicated approach where + * the handler switches to a new stack (if it can) and forks a helper process + * to debug its self. + * + * POSSIBLE FUTURE WORK + * + * For bonus points, the crash dump format permits embedding of user-supplied data. + * If there's anything else (postgresql.conf? Last few lines of a log file?) that + * should always be supplied with a crash dump, it could potentially be added, + * though at the cost of a greater chance of the crash dump failing. + * + * + * Portions Copyright (c) 1996-2010, PostgreSQL Global Development Group + * + * IDENTIFICATION + * src/backend/port/win32/crashdump.c + * + *- + */ + + #include postgres.h + + #define WIN32_LEAN_AND_MEAN + #include windows.h + #include string.h + #include dbghelp.h + + /* + * Much of the following code is based on CodeProject and MSDN examples, + * particularly + * http://www.codeproject.com/KB/debug/postmortemdebug_standalone1.aspx + * + * Useful MSDN articles: + * + * http://msdn.microsoft.com/en-us/library/ff805116(v=VS.85).aspx + * http://msdn.microsoft.com/en-us/library/ms679294(VS.85).aspx + * + * Other useful articles on working with minidumps: + * http://www.debuginfo.com/articles/effminidumps.html + */ + + typedef LPAPI_VERSION (WINAPI *IMAGEHLPAPIVERSION)(void); + typedef BOOL (WINAPI *MINIDUMPWRITEDUMP)(HANDLE hProcess, DWORD dwPid, HANDLE hFile, MINIDUMP_TYPE DumpType, + CONST PMINIDUMP_EXCEPTION_INFORMATION ExceptionParam, + CONST PMINIDUMP_USER_STREAM_INFORMATION UserStreamParam, + CONST PMINIDUMP_CALLBACK_INFORMATION CallbackParam + ); + + /* +
Re: [HACKERS] Re: Proposed Windows-specific change: Enable crash dumps (like core files)
On Thu, Dec 16, 2010 at 15:07, Robert Haas robertmh...@gmail.com wrote: On Thu, Dec 16, 2010 at 8:01 AM, Magnus Hagander mag...@hagander.net wrote: We use the existance of the crashdumps directory as an indication we want crashdumps. That's fine when the system is up. But what if we crash *in the postmaster before we have done chdir()*? Should we perhaps instead define a subdirectory of *where the .EXE file is*, and dump the file there? That seems pretty ugly, for a pretty marginal case. The number of crashes that are going to happen in the postmaster before we've done chdir() should be extremely small, especially if we're talking about crashes in the field rather than during development. How about adding a command-line option to force a dump to be written in $CWD whether a crashdumps directory exists or not? Is that less ugly? ;) But yes, we are talking about in the field, so it's fairly small. But any crash during guc loading for example would go there, I think? We can do such a commandline. We don't have any platform-specific commandline options today. Is that something we've intentionally avoided, or just not needed before? -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Proposed Windows-specific change: Enable crash dumps (like core files)
On Thu, Dec 16, 2010 at 9:24 AM, Magnus Hagander mag...@hagander.net wrote: Is that less ugly? ;) Well, I thought it was or I would have suggested it, but it's obviously open to interpretation. But yes, we are talking about in the field, so it's fairly small. But any crash during guc loading for example would go there, I think? Probably. But how likely is that to happen only on Windows and only in the field? We can do such a commandline. We don't have any platform-specific commandline options today. Is that something we've intentionally avoided, or just not needed before? Beats me. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Proposed Windows-specific change: Enable crash dumps (like core files)
Robert Haas robertmh...@gmail.com writes: On Thu, Dec 16, 2010 at 9:24 AM, Magnus Hagander mag...@hagander.net wrote: We can do such a commandline. We don't have any platform-specific commandline options today. Is that something we've intentionally avoided, or just not needed before? Beats me. Yes, it's something we intentionally avoid: there is no convenient way to have conditional entries in getopt()'s option-letters string. I think the proposal for such a switch is unnecessary lily-gilding, and is far more likely to create problems than solve any. Please remember that a debugging facility has got to be minimal impact, or it creates its own Heisenbugs. (In the unlikely event that someone needed to debug such an early crash, surely they could just create a crashdump subdirectory in whatever directory they want to launch the postmaster from.) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Proposed Windows-specific change: Enable crash dumps (like core files)
On Thu, Dec 16, 2010 at 15:56, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Thu, Dec 16, 2010 at 9:24 AM, Magnus Hagander mag...@hagander.net wrote: We can do such a commandline. We don't have any platform-specific commandline options today. Is that something we've intentionally avoided, or just not needed before? Beats me. Yes, it's something we intentionally avoid: there is no convenient way to have conditional entries in getopt()'s option-letters string. I think the proposal for such a switch is unnecessary lily-gilding, and is far more likely to create problems than solve any. Please remember that a debugging facility has got to be minimal impact, or it creates its own Heisenbugs. (In the unlikely event that someone needed to debug such an early crash, surely they could just create a crashdump subdirectory in whatever directory they want to launch the postmaster from.) Or actually - run the postmaster from a debugger... I think the thing we lose is the ability for the DBA to say pre-emptively if this crashes tomorrow on restart i want a crashdump. Hmm. What we could do is have pg_ctl chdir() into the data directory on start. We don't set it to anything there now - so we rely on the If this parameter is NULL, the new process will have the same current drive and directory as the calling process. Is that likely to cause problems in other areas? -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Proposed Windows-specific change: Enable crash dumps (like core files)
Magnus Hagander mag...@hagander.net writes: On Thu, Dec 16, 2010 at 15:56, Tom Lane t...@sss.pgh.pa.us wrote: I think the proposal for such a switch is unnecessary lily-gilding, Hmm. What we could do is have pg_ctl chdir() into the data directory on start. See above. You're solving a problem that probably doesn't exist, and introducing a pile of new ones in the process --- for example, this will break commands that use relative paths for either the postmaster executable or the data directory itself. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Proposed Windows-specific change: Enable crash dumps (like core files)
Magnus Hagander mag...@hagander.net writes: Found another problem in it: when running with an older version of dbghelp.dll (which I was), it simply didn't work. We need to grab the version of dbghelp.dll at runtime and pick which things we're going to dump based on that. The attached version of the patch does that. Can you please test that it still generates the full dump on your system, which supposedly has a much more modern version of dbghelp.dll than mine? This version of the patch looks fairly sane in terms of how it connects to the rest of the code; but I'm unqualified to comment on the Windows-specific code. One thought is maybe the call point should be in startup_hacks() rather than the main line of main()? I'm not terribly set on it if you don't like that. Also, I notice that the comment for startup_hacks() claims that it isn't executed in subprocesses, which is evidently an obsolete statement --- that should get cleaned up independently of this. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Proposed Windows-specific change: Enable crash dumps (like core files)
On 16/12/10 21:01, Magnus Hagander wrote: Found another problem in it: when running with an older version of dbghelp.dll (which I was), it simply didn't work. We need to grab the version of dbghelp.dll at runtime and pick which things we're going to dump based on that. I was about to suggest dropping the fallback loading of the system dbghelp.dll, and just bundle a suitable version with Pg, then I saw how big the DLL is. It's 800kb (!!) of code that'll hopefully never get used on any given system. With a footprint like that, bundling it seems rather less attractive. I think Magnus is right: test the dbghelp.dll version and fall back to supported features - as far back as XP, anyway; who cares about anything earlier than that. An updated version can always be put in place by the user if the built-in one can't produce enough detail. * I moved it all into backend/port/win32, and thus removed all the autoconf things, since they are not necessary. This is only for win32 at this point, and AFAIK we don't expect it to be for other platforms. If that's needed, we can move it back later, but for now let's keep it simple. Yeah, fair call - especially as most other OSes have native crash dump facilities (kernel-generated core dumps, etc) that render explicit crash handling much less useful. * using elog() really isn't safe. We set the crash handler *long* before we have loaded the elog subsystem. It's better to get a log to eventlog than to not get it at all. Good point. It made some sense when running as a loadable module, as the crash dumper was only loaded quite late, and so long as it elog()'d only after writing the dump that was fine. Now, not so much. We use the existance of the crashdumps directory as an indication we want crashdumps. That's fine when the system is up. But what if we crash *in the postmaster before we have done chdir()*? Should we perhaps instead define a subdirectory of *where the .EXE file is*, and dump the file there? Well, on Windows it's pretty easy to find out where your executable is, and the crash dumper already does that to determine where to load dbghelp.dll from. It'd be no stretch to use a subdir of the postgresql install dir for crash dumps. OTOH, there are all sorts of exciting permissions issues to deal with then, as write permission for postgres won't be inherited from the datadir. It's also generally a big ugly and not very good practice to write to the application directory. If not writing to the datadir, for non-services you'd use something like: %LOCALAPPDATA%\postgres\crashdumps ... but for a service running under its own account that's not a good option either. Consequently, IMO the datadir remains the best place for crashdumps. Is this going to be a real-world issue? One simple answer might be not to load the crash dump handler until after the postmaster chdir()s, but honestly I'm not sure it matters. If it can't find somewhere to write dumps, it doesn't write a dump. The only possible issue I can imagine would be if the postmaster started with a cwd that a malicious user could write to, so they could create a 'crashdumps' dir, somehow cause an early start postmaster crash, therefore somehow leak info from the protected postgres account files that way. A quick look at the code suggests that the postmaster doesn't do anything interesting in the datadir before chdir()ing in there, so I don't see any danger in that. In any case, if a malicious user has write access to the postmaster's startup cwd, with the design of Windows PATH and library loading that means they can replace critical DLLs with their own malicious versions, so there's a whole lot more to worry about than leaking information via crashdumps. So: I say just use the cwd, whatever it is, and don't worry about it. An admin can manually create a crashdumps dir in the service start cwd and set appropriate permissions in the unlikely event they're trying to collect crash dumps for an early-crashing postmaster and they can't just use a debugger directly. It needs to be documented somewhere.Perhaps in 15.8 Platform Specific Notes? That's really about building it, but it might be reasonable there anyway? It does hold a number of things today that aren't related to building, for other platforms. Seems reasonable. -- System Network Administrator POST Newspapers -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Proposed Windows-specific change: Enable crash dumps (like core files)
On Dec 17, 2010 8:02 AM, Craig Ringer cr...@postnewspapers.com.au wrote: On 16/12/10 21:01, Magnus Hagander wrote: Found another problem in it: when running with an older version of dbghelp.dll (which I was), it simply didn't work. We need to grab the version of dbghelp.dll at runtime and pick which things we're going to dump based on that. I was about to suggest dropping the fallback loading of the system dbghelp.dll, and just bundle a suitable version with Pg, then I saw how big the DLL is. It's 800kb (!!) of code that'll hopefully never get used on any given system. With a footprint like that, bundling it seems rather less attractive. I think Magnus is right: test the dbghelp.dll version and fall back to supported features - as far back as XP, anyway; who cares about anything earlier than that. An updated version can always be put in place by the user if the built-in one can't produce enough detail. Did you get a chance to test that it still produced a full dump on your system? We use the existance of the crashdumps directory as an indication we want crashdumps. That's fine when the system is up. But what if we crash *in the postmaster before we have done chdir()*? Should we perhaps instead define a subdirectory of *where the .EXE file is*, and dump the file there? ... Is this going to be a real-world issue? Nah, I'm with tom on the fact that this is probably not. It needs to be documented somewhere.Perhaps in 15.8 Platform Specific Notes? That's really about building it, but it might be reasonable there anyway? It does hold a number of things today that aren't related to building, for other platforms. Seems reasonable. I'll put something together there then. /Magnus
Re: [HACKERS] Re: Proposed Windows-specific change: Enable crash dumps (like core files)
I've updated this entry in the CommitFest app to note that Craig had some implementation questions attached to his patch submission that I haven't seen anyone address yet, and to include a reference to Tom's latest question--which may make those questions moot, not sure. This pretty clearly need to sit on the stove a little bit longer before it's done regardless. I'm marking this one Returned With Feedback, and hopefully Craig will continue hammering on this to clean up the remaining details and resubmit in the next month. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services and Supportwww.2ndQuadrant.us PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Proposed Windows-specific change: Enable crash dumps (like core files)
On 12/15/2010 01:01 AM, Tom Lane wrote: Craig Ringercr...@postnewspapers.com.au writes: I've attached an updated patch that fixes a failure when compiling on gcc/linux. The no-op inline installCrashDumpHandler() for unsupported platforms was not declared static, so it was not being optimized out of objects it wasn't used in and was causing symbol collisions during linkage. Why in the world would you get involved in that portability mess for a function that is called only once? There's no possible performance justification for making it inline. The main concern I heard voiced when first suggesting this was about performance. Given that concern, if I could make it a no-op on unix/linux I thought that worth doing. I'm _much_ happier with a simple, non-ifdef'd extern function declaration and compilation of an empty function body on unsupported platforms. Given how concerned everyone was about *any* effect on backend startup, though, I was concerned that'd be turned down as unnecessary bloat. I've done it a nicer way now, and will post the updated patch once I've had a chance to re-test it on my Windows dev box. I'm also wondering why you have got conflicting declarations in postgres.h and port.h, and why none of these declarations follow ANSI C (write (void) not ()). For postgres.h : that's a good question, as I thought I removed that. I suspect it was reintroduced when reapplying the patch to my working tree to revise it. Whoops. As for the ansi C style - too much time with C++, though long ago now. I think I got the PostgreSQL rules for code formatting right, but missed the void param rule. -- Craig Ringer -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Proposed Windows-specific change: Enable crash dumps (like core files)
Craig Ringer cr...@postnewspapers.com.au writes: I've attached an updated patch that fixes a failure when compiling on gcc/linux. The no-op inline installCrashDumpHandler() for unsupported platforms was not declared static, so it was not being optimized out of objects it wasn't used in and was causing symbol collisions during linkage. Why in the world would you get involved in that portability mess for a function that is called only once? There's no possible performance justification for making it inline. I'm also wondering why you have got conflicting declarations in postgres.h and port.h, and why none of these declarations follow ANSI C (write (void) not ()). regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Proposed Windows-specific change: Enable crash dumps (like core files)
I've attached an updated patch that fixes a failure when compiling on gcc/linux. The no-op inline installCrashDumpHandler() for unsupported platforms was not declared static, so it was not being optimized out of objects it wasn't used in and was causing symbol collisions during linkage. make check now passes correctly on Linux and no warnings are generated during compilation. === All 124 tests passed. === Sorry about this mess-up; I don't recall having to use inline static when using inlines in headers with c++, but as that was a while ago (and C++) my recollection can't be trusted anyway. Your thoughts on the following questions from before would be much appreciated: I'm not sure I've taken the right approach in terms of how I've hooked the crash dump code into the build system. I'm fairly happy with when it's actually inited, and with the win32 portfile (ports/win32/crashdump.c) - what I'm not so sure about is how I've handled the conditional compilation by platform. Currently, src/backend/ports/win32/crashdump.c is compiled on win32. It exports a generic win32-garbage-free function: void installCrashDumpHandler(); I was just going to create a no-op version of the same function in a (new) file src/backend/ports/crashdump.c , but I was having problems figuring out how to sensibly express compile this file for everything except these ports. In addition, doing it this way means there's still a pointless no-op function call in each backend start and some useless symbols - maybe not a big worry, but not ideal. What I ended up doing was putting a conditionally compiled switch in port.h that produces an inline no-op version of installCrashDumpHandler for unsupported platforms, and an extern for supported platforms (currently win32). That'll optimize out completely on unsupported platforms, which is nice if unimportant. I'm just not sure if port.h is really the right place and if it's only used by the backend. I considered a new header included by postgres.h (since it's really backend-only stuff) but as it seems Pg generally prefers bigger headers over more little headers, I popped it in port.h . From de9a1f0e0f4ccfd8b33f37212aea61fd6ba29a89 Mon Sep 17 00:00:00 2001 From: Craig Ringer cr...@ayaki.(none) Date: Tue, 14 Dec 2010 13:47:09 +0800 Subject: [PATCH] Add support for generating a crash dump on backend/postmaster crash Crash dump support is handled via the ports infrastructure. Currently installation of a crash dump handler is a no-op on platforms other than win32. On win32, an unhandled exception handler is installed. This handler loads and calls dbghelp.dll to generate a minidump when it is invoked. The minidump may be debugged after the backend has finished dying, or sent elsewhere for someone else to debug. This is revision 2 of the patch, including a fix for a warning when compiling on Linux/gcc. --- src/backend/main/main.c|7 ++ src/backend/port/win32/Makefile|2 +- src/backend/port/win32/crashdump.c | 202 src/include/port.h | 13 +++ src/include/postgres.h |4 + 5 files changed, 227 insertions(+), 1 deletions(-) create mode 100644 src/backend/port/win32/crashdump.c diff --git a/src/backend/main/main.c b/src/backend/main/main.c index 6cb70f2..45a25f4 100644 --- a/src/backend/main/main.c +++ b/src/backend/main/main.c @@ -79,6 +79,13 @@ main(int argc, char *argv[]) argv = save_ps_display_args(argc, argv); /* +* If supported on the current platform, set up a handler to be called if +* the backend/postmaster crashes with a fatal signal or exception. +* See port/crashdump.c +*/ + installCrashDumpHandler(); + + /* * Set up locale information from environment. Note that LC_CTYPE and * LC_COLLATE will be overridden later from pg_control if we are in an * already-initialized database. We set them here so that they will be diff --git a/src/backend/port/win32/Makefile b/src/backend/port/win32/Makefile index 8bf9f74..d00c334 100644 --- a/src/backend/port/win32/Makefile +++ b/src/backend/port/win32/Makefile @@ -12,6 +12,6 @@ subdir = src/backend/port/win32 top_builddir = ../../../.. include $(top_builddir)/src/Makefile.global -OBJS = timer.o socket.o signal.o security.o mingwcompat.o +OBJS = timer.o socket.o signal.o security.o mingwcompat.o crashdump.o include $(top_srcdir)/src/backend/common.mk diff --git a/src/backend/port/win32/crashdump.c b/src/backend/port/win32/crashdump.c new file mode 100644 index 000..a5629ea --- /dev/null +++ b/src/backend/port/win32/crashdump.c @@ -0,0 +1,202 @@ +/*- + * + * crashdump.c + * Automatic crash dump creation for PostgreSQL on Windows + * + * The crashdump module traps unhandled exceptions produced by the backend + * the module is
Re: [HACKERS] Re: Proposed Windows-specific change: Enable crash dumps (like core files)
On 22/11/2010 7:37 PM, Magnus Hagander wrote: Finally getting to looking at this one - sorry about the very long delay. Ditto, I'm afraid. I agree with Heikki's earlier comment that it's better to have this included in the backend - but that's obviously not going to happen for already-released versions. I'd therefor advocate a contrib module for existing versions, and then in core for 9.1+. I think that was the conclusion the following discussion came to, as well. It's easy to distribute a binary .dll for older versions and if the EDB folks found this facility really useful they could even bundle it in the installer. There's no point backpatching it given how few win32 users will build from source. We should then have an option to turn it off (on by default). But we don't need to pay the overhead on every backend startup - we could just map the value into the parameter shared memory block, and require a full postmaster restart in order to change it. As discussed later, implemented by testing for a 'crashdumps' directory. If the directory exists, a dump is written. If the directory is missing a log message is emitted to explain why no dump was written; as this'll only happen when a backend dies, log noise isn't an issue and it'll help save sysadmin head-scratching given the magic behaviour of turning on/off based on directory existence. Do we want to backpatch it into contrib/? Adding a new module there seems kind of wrong - probably better to keep the source separate and just publish the DLL files for people who do debugging? +1 to just publishing the DLLs Looking at the code: * Why do we need to look at differnt versions of dbghelp.dll? Can't we always use the one with Windows? And in that case, can't we just use compile-time linking, do we need to bother with DLL loading at all? Newer versions than those shipped with an OS release contain updated and improved functionality as well as bug fixes. We *can* use the version shipped with the OS, as even in XP dbghelp.dll contains the required functionality, but it'd be better to bundle it. That's microsoft's recommendation when you use dbghelp.dll . * Per your comment about using elog() - a good idea in that case is to use write_stderr(). That will send the output to stderr if there is one, and otherwise the eventlog (when running as service). Hm, ok. The attached patch doesn't do that yet. Given the following comments, do you think the change still necessary? * And yes, in the crash handler, we should *definitely* not use elog(). Currently I'm actually using it, but only after the dump has been written or we've decided not to write a dump. At this point, if elog() fails we don't really care, we'd just be returning control to the OS's fatal exception cleanup anyway. It's IMO desirable for output to go to the same log as everything else, so admins can find out what's going on and can associate the crash dump files with events that happened around a certain time. * On Unix, the core file is dropped in the database directory, we don't have a separate directory for crashdumps. If we want to be consistent, we should do that here too. I do think that storing them in a directory like crashdumps is better, but I just wanted to raise the comment. Given the recent discussion about using the dir as an on/off switch, that'll be necessary. * However, when storing it in crashdumps, I think the code would need to create that directory if it does not exist, doesn't it? Not now that we're using it an on/off switch, but of course it would've made sense otherwise. * Right now, we overwrite old crashdumps. It is well known that PIDs are recycled pretty quickly on Windows - should we perhaps dump as postgres-pid-sequence.mdmp when there is a filename conflict? Good idea. Rather than try to test for existing files I've just taken the simple route (always good in code run during process crash) and appended the system ticks to the dump name. System ticks are a 32-bit quantity that wraps about every 40 days, so they should be easily unique enough in combination with the pid. The logs will report the crashdump filenames unless the backend is too confused to be able to elog(), and all the crash dump files have file system timestamps so there isn't much point trying to format a crash date/time into their names. I could use wall clock seconds (or ms) instead, but favour system ticks because they can be read with a simple int-returning call that doesn't require any more stack allocation. On Windows, getting wall clock time in seconds seems to involve a call to GetSystemTimeAsFileTime (http://msdn.microsoft.com/en-us/library/ms724397(v=VS.85).aspx) to populate a struct containing a fake-64-bit int as two 32-bit ints. It's not clear that it's free of timezone calculations and generally looks like something better avoided in a crash handler if it's not necessary. As GetSystemTicks seems quite sufficient, I
Re: [HACKERS] Re: Proposed Windows-specific change: Enable crash dumps (like core files)
On 6/12/2010 12:57 PM, Craig Ringer wrote: On 22/11/2010 7:37 PM, Magnus Hagander wrote: Finally getting to looking at this one - sorry about the very long delay. Ditto, I'm afraid. Oh, I forgot to mention in the patch email: I'm not sure I've taken the right approach in terms of how I've hooked the crash dump code into the build system. I'm fairly happy with when it's actually inited, and with the win32 portfile (ports/win32/crashdump.c) - what I'm not so sure about is how I've handled the conditional compilation by platform. Currently, src/backend/ports/win32/crashdump.c is compiled on win32. It exports a generic win32-garbage-free function: void installCrashDumpHandler(); I was just going to create a no-op version of the same function in a (new) file src/backend/ports/crashdump.c , but I was having problems figuring out how to sensibly express compile this file for everything except these ports. In addition, doing it this way means there's still a pointless no-op function call in each backend start and some useless symbols - maybe not a big worry, but not ideal. What I ended up doing was putting a conditionally compiled switch in port.h that produces an inline no-op version of installCrashDumpHandler for unsupported platforms, and an extern for supported platforms (currently win32). That'll optimize out completely on unsupported platforms, which is nice if unimportant. I'm just not sure if port.h is really the right place and if it's only used by the backend. I considered a new header included by postgres.h (since it's really backend-only stuff) but as it seems Pg generally prefers bigger headers over more little headers, I popped it in port.h . Comments? -- Craig Ringer -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Proposed Windows-specific change: Enable crash dumps (like core files)
Craig Ringer wrote: On 11/24/2010 05:18 AM, Magnus Hagander wrote: Or you set the handler always, and have the handler only actually create the dump if the directory exists. That way you can add the directory and still get a dump from both existing backends and the postmaster itself without a restart. That's way smarter. No extra filesystem access during startup, even if it is cheap. I added a commenting referencing this bit to the CF entry so it doesn't get forgotten. Magnus raised a few other issues in his earlier review too. Discussion of this patch seems to have jumped the gun a bit into talking about backpatching before the code for HEAD was completely done, then stalled there. Are we going to see an updated patch that addresses submitted feedback in this cycle? -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services and Supportwww.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Proposed Windows-specific change: Enable crash dumps (like core files)
However, I am not clear what benefit we get from moving this into core in 9.1. If it's still going to require a full postmaster restart, the GUC you have to change may as well be shared_preload_libraries as a new one. There's no reason it should require a postmaster restart. New backends spawned after the handler is turned on would enable it, and existing backends could be signalled to enable it as well. on-by-default is what we gain. I think that's fairly big... More than that. If a crash handler is in core, then: - It can be inited much earlier, catching crashes that happen during loading of libraries and during later backend init; and - It's basically free when the cost of shared library loading is removed, so it can be left on in production or even be on by default. I need to do some testing on this, but given the apparently near-zero cost of initing the handler I won't be surprised if testing a GUC to see if the handler should be on or not costs more than loading it does. I still wouldn't support on-by-default because you'd need an automatic process to weed out old crash dumps or limit the number stored. That's a bigger job. So I think the admin should have to turn it on, but it'd be good to make it easy to turn on in production without a restart; I don't see why that'd be hard. I'll try to put together a patch that does just that, time permitting. Things are kind of hectic ATM. -- Craig Ringer -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Proposed Windows-specific change: Enable crash dumps (like core files)
On 11/23/2010 01:30 AM, Tom Lane wrote: I'm not really sure why we're even considering the notion of back-patching this item. Doing so would not fit with any past practice or agreed-on project management practices, not even under our lax standards for contrib (and I keep hearing people claim that contrib is or should be as trustworthy as core, anyway). Since when do we back-patch significant features that have not been through a beta test cycle? I see no advantage to back-patching. It's easy to provide a drop-in binary DLL for older versions of Pg on Windows, who're the only people this will work for. If the EDB folks saw value in it, they could bundle the DLL with updated point releases of the installer for Windows. No back-patching is necessary. -- Craig Ringer -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Proposed Windows-specific change: Enable crash dumps (like core files)
On 11/23/2010 01:46 AM, Tom Lane wrote: * However, when storing it in crashdumps, I think the code would need to create that directory if it does not exist, doesn't it? If it didn't do so, then manual creation/removal of the directory could be used as an on/off switch for the feature. Yep. That's how I'd want to do it in 9.1 - test for the directory and use that to decide whether to set the handler during early backend startup. That way you don't need a GUC, and should be able to load it *very* early in backend startup. I haven't looked at the patch but this discussion makes it sound like the dumper is dependent on an uncomfortably large amount of backend code being functional. You need to minimize the number of assumptions of that sort. It doesn't need to have any backend code working, really; all it needs is a usable stack and the ability to allocate off the heap. It won't save you in total OOM situations, stack exhaustion, or severe stack corruption, but should work pretty much any other time. The crash dump facility in dbghelp.dll offers *optional* features where apps can stream in additional app-specific data like recent log excerpts, etc. I didn't try to implement anything like that in the initial module partly because I want to minimize the amount of the backend that must be working for a crash dump to succeed. -- Craig Ringer -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Proposed Windows-specific change: Enable crash dumps (like core files)
On 11/23/2010 01:56 AM, Heikki Linnakangas wrote: On 22.11.2010 19:47, Robert Haas wrote: I am as conservative about back-patching as anybody here, but debugging on Windows is not an easy thing to do, and I strongly suspect we are going to point people experiencing crashes on Windows to this code whether it's part of our official distribution or not. This whole thing makes me wonder: is there truly no reliable, simple method to tell Windows to create a core dump on crash, without writing custom code for it. I haven't seen one, but I find it amazing if there isn't. We can't be alone with this. Search for 'dbghelp.dll' on your average Windows system and you'll be surprised how many apps use it. Steam (the software distribution system) does, as does the Adobe Creative Suite on my machine. If you're running in interactive mode with access to the user's display you can use Windows error reporting. Services running in isolated user accounts don't seem to be able to use that. In any case, windows error reporting only collects *tiny* dumps with barely anything beyond the stack contents; they're a nightmare to use, and really require psychic powers and deep knowledge of scary Windows APIs for effective debugging. -- Craig Ringer -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Proposed Windows-specific change: Enable crash dumps (like core files)
On Tue, Nov 23, 2010 at 15:02, Craig Ringer cr...@postnewspapers.com.au wrote: However, I am not clear what benefit we get from moving this into core in 9.1. If it's still going to require a full postmaster restart, the GUC you have to change may as well be shared_preload_libraries as a new one. There's no reason it should require a postmaster restart. New backends spawned after the handler is turned on would enable it, and existing backends could be signalled to enable it as well. I think that came off my comment that we could store the on/off in the startup shared memory block. It'd then be the only way to get it into any existing backends. on-by-default is what we gain. I think that's fairly big... More than that. If a crash handler is in core, then: - It can be inited much earlier, catching crashes that happen during loading of libraries and during later backend init; and Yeah. - It's basically free when the cost of shared library loading is removed, so it can be left on in production or even be on by default. I need to do some testing on this, but given the apparently near-zero cost of initing the handler I won't be surprised if testing a GUC to see if the handler should be on or not costs more than loading it does. I'm fairly certain it does. The GUC would be there to be able to turn the whole thing off because you don't want the dumps, not for performance reasons. I still wouldn't support on-by-default because you'd need an automatic process to weed out old crash dumps or limit the number stored. That's a bigger job. So I think the admin should have to turn it on, but it'd be good to make it easy to turn on in production without a restart; I don't see why that'd be hard. I think the admin should deal with that - just like the admin has to clear out the old logs. I'll try to put together a patch that does just that, time permitting. Things are kind of hectic ATM. Let me know if you want me to look at adapting the patch for that - i can do that if you prefer. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Proposed Windows-specific change: Enable crash dumps (like core files)
On Tue, Nov 23, 2010 at 15:09, Craig Ringer cr...@postnewspapers.com.au wrote: On 11/23/2010 01:46 AM, Tom Lane wrote: * However, when storing it in crashdumps, I think the code would need to create that directory if it does not exist, doesn't it? If it didn't do so, then manual creation/removal of the directory could be used as an on/off switch for the feature. Yep. That's how I'd want to do it in 9.1 - test for the directory and use that to decide whether to set the handler during early backend startup. That way you don't need a GUC, and should be able to load it *very* early in backend startup. Or you set the handler always, and have the handler only actually create the dump if the directory exists. That way you can add the directory and still get a dump from both existing backends and the postmaster itself without a restart. The crash dump facility in dbghelp.dll offers *optional* features where apps can stream in additional app-specific data like recent log excerpts, etc. I didn't try to implement anything like that in the initial module partly because I want to minimize the amount of the backend that must be working for a crash dump to succeed. Yeah, we already have the logs in the logfile etc. Let's keep this uncomplicated so that it stays working :-) -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Proposed Windows-specific change: Enable crash dumps (like core files)
Magnus Hagander mag...@hagander.net writes: On Tue, Nov 23, 2010 at 15:09, Craig Ringer cr...@postnewspapers.com.au wrote: Yep. That's how I'd want to do it in 9.1 - test for the directory and use that to decide whether to set the handler during early backend startup. That way you don't need a GUC, and should be able to load it *very* early in backend startup. Or you set the handler always, and have the handler only actually create the dump if the directory exists. +1 for that way. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Proposed Windows-specific change: Enable crash dumps (like core files)
On 11/24/2010 05:18 AM, Magnus Hagander wrote: Or you set the handler always, and have the handler only actually create the dump if the directory exists. That way you can add the directory and still get a dump from both existing backends and the postmaster itself without a restart. That's way smarter. No extra filesystem access during startup, even if it is cheap. -- Craig Ringer -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Proposed Windows-specific change: Enable crash dumps (like core files)
On Tue, Oct 5, 2010 at 17:21, Craig Ringer cr...@postnewspapers.com.au wrote: OK, it's pretty much ready for proper testing now. If a few people are happy with the results and think it's a good idea I'll chuck it in the commitfest app. As built, the crash dump handler works with a stock PostgreSQL 9.0 (release build) as shipped in EDB's installer. Just drop crashdump.dll in lib\, optionally pop the dbghelp.dll redist in bin\, add 'crashdump' to shared_preload_libraries, and crash some backends however you feel like doing so. The current build of crashdump.dll for release versions of PostgreSQL 9.0 on 32-bit Windows is here: http://www.postnewspapers.com.au/~craig/webfiles/crashdump.dll If folks are happy with how this works, all it needs is: - Consideration of whether elog should be used or not. I'm inclined to suggest using elog to tell the user about the dump, but only after the crash dump has been written successfully. - Comments on whether this should be left as a loadable module, or integerated into core so it loads early in backend startup. The latter will permit crash dumping of early backend startup problems, and will have tiny overhead because there's no DLL to find and load. OTOH, it's harder to pull out if somehow something breaks. I'd want to start with loadable module in shared_preload_libraries and if that proves useful, only then consider integrating in core. I'm way too bad a programmer to want my code anywhere near Pg's core without plenty of real world testing. - (Maybe) a method to configure the crash dump type. I'm not too sure it's necessary given the set of dump flags I've landed up with, so I'd leave this be unless it proves to be necessary in real-world testing. Finally getting to looking at this one - sorry about the very long delay. I agree with Heikki's earlier comment that it's better to have this included in the backend - but that's obviously not going to happen for already-released versions. I'd therefor advocate a contrib module for existing versions, and then in core for 9.1+. We should then have an option to turn it off (on by default). But we don't need to pay the overhead on every backend startup - we could just map the value into the parameter shared memory block, and require a full postmaster restart in order to change it. Do we want to backpatch it into contrib/? Adding a new module there seems kind of wrong - probably better to keep the source separate and just publish the DLL files for people who do debugging? Looking at the code: * Why do we need to look at differnt versions of dbghelp.dll? Can't we always use the one with Windows? And in that case, can't we just use compile-time linking, do we need to bother with DLL loading at all? * Per your comment about using elog() - a good idea in that case is to use write_stderr(). That will send the output to stderr if there is one, and otherwise the eventlog (when running as service). * And yes, in the crash handler, we should *definitely* not use elog(). * On Unix, the core file is dropped in the database directory, we don't have a separate directory for crashdumps. If we want to be consistent, we should do that here too. I do think that storing them in a directory like crashdumps is better, but I just wanted to raise the comment. * However, when storing it in crashdumps, I think the code would need to create that directory if it does not exist, doesn't it? * Right now, we overwrite old crashdumps. It is well known that PIDs are recycled pretty quickly on Windows - should we perhaps dump as postgres-pid-sequence.mdmp when there is a filename conflict? -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Proposed Windows-specific change: Enable crash dumps (like core files)
On Mon, Nov 22, 2010 at 6:37 AM, Magnus Hagander mag...@hagander.net wrote: Do we want to backpatch it into contrib/? Adding a new module there seems kind of wrong - probably better to keep the source separate and just publish the DLL files for people who do debugging? If this works without changes to core, I see little reason not to back-patch it into contrib. Our primary concern with back-patching is to avoid doing things that might break existing installations, but if there aren't any core changes, I don't really see how that can be an issue here. It seems to me that it's probably simpler for us and our users to keep the debugging tools together with our main tree. However, I am not clear what benefit we get from moving this into core in 9.1. If it's still going to require a full postmaster restart, the GUC you have to change may as well be shared_preload_libraries as a new one. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Proposed Windows-specific change: Enable crash dumps (like core files)
Robert Haas robertmh...@gmail.com writes: However, I am not clear what benefit we get from moving this into core in 9.1. If it's still going to require a full postmaster restart, the GUC you have to change may as well be shared_preload_libraries as a new one. +1. I am not in favor of randomly repackaging functionality, unless there's some clear benefit gained by doing so. In this case it seems like something that could and should remain at arm's length forever, so a contrib module is the ideal packaging. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Proposed Windows-specific change: Enable crash dumps (like core files)
On Mon, Nov 22, 2010 at 9:49 AM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: However, I am not clear what benefit we get from moving this into core in 9.1. If it's still going to require a full postmaster restart, the GUC you have to change may as well be shared_preload_libraries as a new one. +1. I am not in favor of randomly repackaging functionality, unless there's some clear benefit gained by doing so. In this case it seems like something that could and should remain at arm's length forever, so a contrib module is the ideal packaging. Now, if we could make this so low-overhead that it doesn't need a switch, that would be a good argument for moving it into core, at least to my way of thinking. But if it's something that needs to be enabled with a postmaster restart anyway, meh. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Proposed Windows-specific change: Enable crash dumps (like core files)
On mån, 2010-11-22 at 12:37 +0100, Magnus Hagander wrote: Do we want to backpatch it into contrib/? It's not a bug fix or an upgrading aid, so no. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Proposed Windows-specific change: Enable crash dumps (like core files)
Peter Eisentraut pete...@gmx.net writes: On mån, 2010-11-22 at 12:37 +0100, Magnus Hagander wrote: Do we want to backpatch it into contrib/? It's not a bug fix or an upgrading aid, so no. I'm less than thrilled about back-patching this, too. It seems to fly in the face of all our historical practice. If you drop the bit about back-patching, then I don't particularly care whether it goes into core or contrib for 9.1 --- whichever packaging makes the most sense is fine. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Proposed Windows-specific change: Enable crash dumps (like core files)
On Mon, Nov 22, 2010 at 16:33, Tom Lane t...@sss.pgh.pa.us wrote: Peter Eisentraut pete...@gmx.net writes: On mån, 2010-11-22 at 12:37 +0100, Magnus Hagander wrote: Do we want to backpatch it into contrib/? It's not a bug fix or an upgrading aid, so no. I'm less than thrilled about back-patching this, too. It seems to fly in the face of all our historical practice. If you drop the bit about back-patching, then I don't particularly care whether it goes into core or contrib for 9.1 --- whichever packaging makes the most sense is fine. My suggestion in the first place was not to backpatch it - I just wanted to get peoples opinions. I'm perfectly happy if we keep it somewhere else for the time being - as long as we make the binaries easily available. But that can go on the wiki for example. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Proposed Windows-specific change: Enable crash dumps (like core files)
On Mon, Nov 22, 2010 at 15:15, Robert Haas robertmh...@gmail.com wrote: On Mon, Nov 22, 2010 at 6:37 AM, Magnus Hagander mag...@hagander.net wrote: Do we want to backpatch it into contrib/? Adding a new module there seems kind of wrong - probably better to keep the source separate and just publish the DLL files for people who do debugging? If this works without changes to core, I see little reason not to back-patch it into contrib. Our primary concern with back-patching is to avoid doing things that might break existing installations, but if there aren't any core changes, I don't really see how that can be an issue here. It seems to me that it's probably simpler for us and our users to keep the debugging tools together with our main tree. However, I am not clear what benefit we get from moving this into core in 9.1. If it's still going to require a full postmaster restart, the GUC you have to change may as well be shared_preload_libraries as a new one. on-by-default is what we gain. I think that's fairly big... -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Proposed Windows-specific change: Enable crash dumps (like core files)
Magnus Hagander mag...@hagander.net writes: on-by-default is what we gain. I think that's fairly big... Only if that's actually what we want, which is far from clear in this corner. There are good reasons why most Linux distros configure daemons not to dump core by default. It's annoying when we are trying to debug a Postgres crash, but that doesn't mean the reasons aren't good. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Proposed Windows-specific change: Enable crash dumps (like core files)
On Mon, Nov 22, 2010 at 10:17 AM, Peter Eisentraut pete...@gmx.net wrote: On mån, 2010-11-22 at 12:37 +0100, Magnus Hagander wrote: Do we want to backpatch it into contrib/? It's not a bug fix or an upgrading aid, so no. I don't see why an upgrading aid would be worthy of back-patching, but not a debugging aid. I'd certainly prioritize those in the other order. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Proposed Windows-specific change: Enable crash dumps (like core files)
Robert Haas robertmh...@gmail.com writes: I don't see why an upgrading aid would be worthy of back-patching, but not a debugging aid. I'd certainly prioritize those in the other order. I think the sort of upgrading aid Peter has in mind is the kind where it's entirely useless if it's not back-patched, because it has to run in the pre-upgraded server. We've discussed such things before in the context of in-place upgrade, though I believe there have been no actual instances as yet. I'm not really sure why we're even considering the notion of back-patching this item. Doing so would not fit with any past practice or agreed-on project management practices, not even under our lax standards for contrib (and I keep hearing people claim that contrib is or should be as trustworthy as core, anyway). Since when do we back-patch significant features that have not been through a beta test cycle? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Proposed Windows-specific change: Enable crash dumps (like core files)
On Mon, Nov 22, 2010 at 12:30 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: I don't see why an upgrading aid would be worthy of back-patching, but not a debugging aid. I'd certainly prioritize those in the other order. I think the sort of upgrading aid Peter has in mind is the kind where it's entirely useless if it's not back-patched, because it has to run in the pre-upgraded server. We've discussed such things before in the context of in-place upgrade, though I believe there have been no actual instances as yet. I'm not really sure why we're even considering the notion of back-patching this item. Doing so would not fit with any past practice or agreed-on project management practices, not even under our lax standards for contrib (and I keep hearing people claim that contrib is or should be as trustworthy as core, anyway). Since when do we back-patch significant features that have not been through a beta test cycle? I am as conservative about back-patching as anybody here, but debugging on Windows is not an easy thing to do, and I strongly suspect we are going to point people experiencing crashes on Windows to this code whether it's part of our official distribution or not. I don't see what we get out of insisting that people install it separately. This is a tool that is only intended to be used when PostgreSQL is CRASHING, so arguing that we shouldn't include the code because it might not be stable doesn't carry much water AFAICS. As far as I understand it, we don't back-patch new features because of the risk of breaking things, but in this case refusing to back-patch the code seems more likely to prevent adequate diagnosis of what is already broken. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Proposed Windows-specific change: Enable crash dumps (like core files)
Magnus Hagander mag...@hagander.net writes: * On Unix, the core file is dropped in the database directory, we don't have a separate directory for crashdumps. If we want to be consistent, we should do that here too. I do think that storing them in a directory like crashdumps is better, but I just wanted to raise the comment. Just a note on that - it's by no means universal that Unix systems will put the core files in $PGDATA. OS X likes to put them in /cores, which I think is a convention shared with some other BSDish systems. On Linux I believe it's possible to configure where the core goes via environment settings. * However, when storing it in crashdumps, I think the code would need to create that directory if it does not exist, doesn't it? If it didn't do so, then manual creation/removal of the directory could be used as an on/off switch for the feature. Which would have a number of advantages, not least that you don't need to have the crash dumper dependent on GUC working. I haven't looked at the patch but this discussion makes it sound like the dumper is dependent on an uncomfortably large amount of backend code being functional. You need to minimize the number of assumptions of that sort. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Proposed Windows-specific change: Enable crash dumps (like core files)
Robert Haas robertmh...@gmail.com writes: I am as conservative about back-patching as anybody here, but debugging on Windows is not an easy thing to do, and I strongly suspect we are going to point people experiencing crashes on Windows to this code whether it's part of our official distribution or not. I don't see what we get out of insisting that people install it separately. This is a tool that is only intended to be used when PostgreSQL is CRASHING, so arguing that we shouldn't include the code because it might not be stable doesn't carry much water AFAICS. As far as I understand it, we don't back-patch new features because of the risk of breaking things, but in this case refusing to back-patch the code seems more likely to prevent adequate diagnosis of what is already broken. Well, if we're going to hand out prebuilt DLLs to people, we can do that without having back-patched the code officially. But more to the point is that it's not clear that we're going to end up with a contrib module at all going forward (a core feature would clearly be a lot more reliable), and I really do not wish to get involved with maintaining two independent versions of this code. This argument seems to boil down to we have to have this yesterday, which I don't buy for a minute. If it's as critical as that, why did it take this long for someone to write it? I do NOT agree that this feature is important enough to justify a free pass around our normal management and quality assurance processes. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Proposed Windows-specific change: Enable crash dumps (like core files)
On 22.11.2010 19:47, Robert Haas wrote: I am as conservative about back-patching as anybody here, but debugging on Windows is not an easy thing to do, and I strongly suspect we are going to point people experiencing crashes on Windows to this code whether it's part of our official distribution or not. This whole thing makes me wonder: is there truly no reliable, simple method to tell Windows to create a core dump on crash, without writing custom code for it. I haven't seen one, but I find it amazing if there isn't. We can't be alone with this. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Proposed Windows-specific change: Enable crash dumps (like core files)
On Mon, Nov 22, 2010 at 18:46, Tom Lane t...@sss.pgh.pa.us wrote: Magnus Hagander mag...@hagander.net writes: * However, when storing it in crashdumps, I think the code would need to create that directory if it does not exist, doesn't it? If it didn't do so, then manual creation/removal of the directory could be used as an on/off switch for the feature. Which would have a number of advantages, not least that you don't need to have the crash dumper dependent on GUC working. I haven't looked at the patch but this discussion makes it sound like the dumper is dependent on an uncomfortably large amount of backend code being functional. You need to minimize the number of assumptions of that sort. No, it's dependent on close to zero backend functionality. Particularly if we take out the dependency on elog() (write_stderr is much simpler). In fact, the calls to elog() are the *only* places where it calls into the backend as it stands today. And yes, ISTM it could work reasonably well to use the creation/deletion of the directory as an on/off switch for it. Which is the default is of course up to the packager then as well ;) -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Proposed Windows-specific change: Enable crash dumps (like core files)
On Mon, Nov 22, 2010 at 18:54, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: I am as conservative about back-patching as anybody here, but debugging on Windows is not an easy thing to do, and I strongly suspect we are going to point people experiencing crashes on Windows to this code whether it's part of our official distribution or not. I don't see what we get out of insisting that people install it separately. This is a tool that is only intended to be used when PostgreSQL is CRASHING, so arguing that we shouldn't include the code because it might not be stable doesn't carry much water AFAICS. As far as I understand it, we don't back-patch new features because of the risk of breaking things, but in this case refusing to back-patch the code seems more likely to prevent adequate diagnosis of what is already broken. Well, if we're going to hand out prebuilt DLLs to people, we can do that without having back-patched the code officially. But more to the point is that it's not clear that we're going to end up with a contrib module at all going forward (a core feature would clearly be a lot more reliable), and I really do not wish to get involved with maintaining two independent versions of this code. I think the reasonable options are either don't backpatch at all or backpatch the same way as we put it in HEAD, which is probably included in backend. I agree that sticking it in contrib is a half-measure that we shouldn't do. *IF* we go with a contrib module for 9.1 as well, we could backpatch as contrib module, but I think that's the only case. This argument seems to boil down to we have to have this yesterday, which I don't buy for a minute. If it's as critical as that, why did it take this long for someone to write it? I do NOT agree that this feature is important enough to justify a free pass around our normal management and quality assurance processes. Agreed. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Proposed Windows-specific change: Enable crash dumps (like core files)
On Mon, Nov 22, 2010 at 18:56, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: On 22.11.2010 19:47, Robert Haas wrote: I am as conservative about back-patching as anybody here, but debugging on Windows is not an easy thing to do, and I strongly suspect we are going to point people experiencing crashes on Windows to this code whether it's part of our official distribution or not. This whole thing makes me wonder: is there truly no reliable, simple method to tell Windows to create a core dump on crash, without writing custom code for it. I haven't seen one, but I find it amazing if there isn't. We can't be alone with this. You can do it without custom code but it's quite hard. And AFAIK you need to install the debugger tools package from microsoft (separate download, and not something you'll normally find on a system). There is DrWatson and the Error Reporting service you can use. DrWatson is made for interactive programs (or was the last time I looked at it). The error reporting service requires setting up a local error reporting service and configure it for reports, if you want them sent anywhere other than to Microsoft. Neither one of them is a particularly good choie. The minidumps Craig's patch does are a lot more useful. We intentionally *disable* drwatson, because it's part of what pops up an error message you have to click Ok on before your server will continue running... -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Proposed Windows-specific change: Enable crash dumps (like core files)
Magnus Hagander mag...@hagander.net writes: On Mon, Nov 22, 2010 at 18:46, Tom Lane t...@sss.pgh.pa.us wrote: ... I haven't looked at the patch but this discussion makes it sound like the dumper is dependent on an uncomfortably large amount of backend code being functional. No, it's dependent on close to zero backend functionality. Particularly if we take out the dependency on elog() (write_stderr is much simpler). In fact, the calls to elog() are the *only* places where it calls into the backend as it stands today. Well, in the contrib-module guise, it's dependent on shared_preload_libraries or local_preload_libraries, which at least involves guc and dfmgr working pretty well (and not only in the postmaster, but during backend startup). regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Proposed Windows-specific change: Enable crash dumps (like core files)
On Mon, Nov 22, 2010 at 19:39, Tom Lane t...@sss.pgh.pa.us wrote: Magnus Hagander mag...@hagander.net writes: On Mon, Nov 22, 2010 at 18:46, Tom Lane t...@sss.pgh.pa.us wrote: ... I haven't looked at the patch but this discussion makes it sound like the dumper is dependent on an uncomfortably large amount of backend code being functional. No, it's dependent on close to zero backend functionality. Particularly if we take out the dependency on elog() (write_stderr is much simpler). In fact, the calls to elog() are the *only* places where it calls into the backend as it stands today. Well, in the contrib-module guise, it's dependent on shared_preload_libraries or local_preload_libraries, which at least involves guc and dfmgr working pretty well (and not only in the postmaster, but during backend startup). Yes, sorry. My mindset was in the version that'll go into 9.1. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Proposed Windows-specific change: Enable crash dumps (like core files)
On mån, 2010-11-22 at 19:56 +0200, Heikki Linnakangas wrote: This whole thing makes me wonder: is there truly no reliable, simple method to tell Windows to create a core dump on crash, without writing custom code for it. I haven't seen one, but I find it amazing if there isn't. We can't be alone with this. Well, there is no reliable and simple method to rename a file on Windows, so what can you expect ... ;-) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Proposed Windows-specific change: Enable crash dumps (like core files)
On Mon, Nov 22, 2010 at 1:28 PM, Magnus Hagander mag...@hagander.net wrote: I think the reasonable options are either don't backpatch at all or backpatch the same way as we put it in HEAD, which is probably included in backend. I agree that sticking it in contrib is a half-measure that we shouldn't do. *IF* we go with a contrib module for 9.1 as well, we could backpatch as contrib module, but I think that's the only case. I agree with this, certainly. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Proposed Windows-specific change: Enable crash dumps (like core files)
On 4/10/2010 8:27 PM, Heikki Linnakangas wrote: On 04.10.2010 15:21, Craig Ringer wrote: Thinking about it, a possibly better alternative is to ship it as a library as is done with the pl/pgsql debugger, and use (I think) shared_preload_libraries to load it when desired. That way there's zero cost if disabled, though a somewhat higher cost if enabled. Hmm. That'll let me put a test version together that'll work with 9.0 as well, so that seems like a no-brainer really, it's just a better way to do it. Time for a Pg-on-windows build, yay. If it's not a lot of code, it's better to have it built-in always. Loading extra libraries in debug-mode could lead to heisenbugs. It turns out that the basic minidumps are small (21kb here) and very fast to generate. It may well be worth leaving it enabled all the time after all. I just need to time how long the handler registration takes - though as LOAD of the current handler implemented as a contrib module takes 2.1ms, and LOAD of an module with an empty _PG_init() also takes 2.1ms, I'm guessing not long. I've attached an early version for people to play with if anyone's interested. It currently contains a bunch of elog() messages to report on its progress - though I'm not at all sure elog() should be used in the final version given that Pg is crashing at the time the handler is called and there's no guarantee elog() is safe to call. It also doesn't offer any way to set the dump type yet, it's always the minimal dump with exception info, registers and stack only. Finally, a crashme function is included to trigger a reliable unhandled exception on demand, for initial testing. Usage with Pg built from source on Windows: - Drop crashdump.c and the Makefile in contrib/crashdump - Run build.bat and install.bat - Create a crashdumps directory inside your datadir, and make sure the server has read/write/create permission on it. - add 'crashdump' to shared_preload_libraries in postgresql.conf - Start the server as normal - Start psql and issue: -- CREATE OR REPLACE FUNCTION crashdump_crashme() RETURNS void AS '$libdir/crashdump','crashdump_crashme' LANGUAGE C; -- SELECT crashdump_crashme(); Your backend should crash. In the error log (and, depending on how the buffering works out, maybe in psql) you should see: WARNING: loading dll WARNING: loading dll try 1, WARNING: loading dll try 2, 7388 WARNING: pdump: 738C70D8 WARNING: Generating dumppath WARNING: Generated dumppath WARNING: dumping to path: crashdumps\backend.dmp WARNING: outfile: 00B0 WARNING: about to dump NOTICE: crashdump: wrote crash dump to crashdumps\postgres-4341.mdmp Once you have the dump file, you can fire up windbg from the Debugging Tools for Windows and use File-Open Crash Dump to open it. The .excr command will report what the exception that caused the crash was, producing output like this: 0:000 .ecxr eax= ebx= ecx=02a1e290 edx=73831014 esi=02a1e188 edi=00c3f798 eip=738313b2 esp=00c3f724 ebp=02a1e280 iopl=0 nv up ei pl zr na pe nc cs=001b ss=0023 ds=0023 es=0023 fs=003b gs= efl=00010246 crashdump!crashdump_crashme+0x2: 738313b2 c7000100mov dword ptr [eax],1ds:0023:= and if possible opening the postgresql source file the crash happened in to the appropriate line: Datum crashdump_crashme(PG_FUNCTION_ARGS) { int * ptr = NULL; *ptr = 1;--- highlighted return NULL; } If you're using Visual Studio Professional (not the free Express edition, unfortunately) you should also be able to debug the crash dump that way. I don't have it to test with. -- Craig Ringer Tech-related writing at http://soapyfrogs.blogspot.com/ /*- * * crashdump.c *Automatic crash dump creation for PostgreSQL on Windows * *- */ #include postgres.h #include fmgr.h #define VC_EXTRALEAN #include windows.h #include string.h #include dbghelp.h PG_MODULE_MAGIC; #if !(defined(_WIN32) || defined(_WIN64)) /* Don't add lots of ifdefs here - build different files for different platforms instead, * if adding crash dump support for additional platforms. */ #error crashdump_win32.c is only supported on MS Windows #endif /* * Much of the following code is based on CodeProject and MSDN examples, * particularly http://www.codeproject.com/KB/debug/postmortemdebug_standalone1.aspx * * Useful MSDN articles: * * http://msdn.microsoft.com/en-us/library/ms679294(VS.85).aspx * http://msdn.microsoft.com/en-us/library/ms679291(VS.85).aspx * http://msdn.microsoft.com/en-us/library/ms680519(v=VS.85).aspx * http://msdn.microsoft.com/en-us/library/ms680360(VS.85).aspx */ // FIXME/TODO: elog is probably unsafe when we're crashing out. Should we just write to stderr? typedef BOOL (WINAPI
Re: [HACKERS] Re: Proposed Windows-specific change: Enable crash dumps (like core files)
OK, it's pretty much ready for proper testing now. If a few people are happy with the results and think it's a good idea I'll chuck it in the commitfest app. As built, the crash dump handler works with a stock PostgreSQL 9.0 (release build) as shipped in EDB's installer. Just drop crashdump.dll in lib\, optionally pop the dbghelp.dll redist in bin\, add 'crashdump' to shared_preload_libraries, and crash some backends however you feel like doing so. The current build of crashdump.dll for release versions of PostgreSQL 9.0 on 32-bit Windows is here: http://www.postnewspapers.com.au/~craig/webfiles/crashdump.dll If folks are happy with how this works, all it needs is: - Consideration of whether elog should be used or not. I'm inclined to suggest using elog to tell the user about the dump, but only after the crash dump has been written successfully. - Comments on whether this should be left as a loadable module, or integerated into core so it loads early in backend startup. The latter will permit crash dumping of early backend startup problems, and will have tiny overhead because there's no DLL to find and load. OTOH, it's harder to pull out if somehow something breaks. I'd want to start with loadable module in shared_preload_libraries and if that proves useful, only then consider integrating in core. I'm way too bad a programmer to want my code anywhere near Pg's core without plenty of real world testing. - (Maybe) a method to configure the crash dump type. I'm not too sure it's necessary given the set of dump flags I've landed up with, so I'd leave this be unless it proves to be necessary in real-world testing. Remember that with these crash dumps the user only has to email the (~4MB in my tests) crash dump. They don't need debugging tools or the ability to use them, only the recipient does. I'm using a tweaked set of minidump flags now, to generate considerably bigger dumps (around 4MB for the configuration I'm testing) that contain pretty much everything except shared memory contents, the contents of memory mapped files, and the loaded read-only code segments of the executables and DLLs. You can see the results of using it to debug that autovac crash at the end of this mail. When testing the script provided by Andrea Peri, when the autovacuum worker crashes, the message: 2010-10-05 22:23:44 WST 2040 WARNING: crashdump: wrote crash dump to crashdumps\postgres-2040.mdmp is emitted before the process resumes crashing out as it would've normally. Opening and displaying that dump in windbg shows a useful stack trace from the crashing process, and it's possible to examine the state of local/global variables etc. 0:000 .ecxr eax=90fe ebx=040af140 ecx=68f08610 edx=040af248 esi=011f64d4 edi=040b001c eip=015d5267 esp=0055f1cc ebp=011f5bc8 iopl=0 nv up ei pl zr na pe nc cs=001b ss=0023 ds=0023 es=0023 fs=003b gs= efl=00010246 postgres!pfree+0x7: 015d5267 8b5004 mov edx,dword ptr [eax+4] ds:0023:9102= 0:000 ~*k . 0 Id: 7f8.7b8 Suspend: 0 Teb: 7ffdf000 Unfrozen ChildEBP RetAddr 0055e210 75b4c1ee ntdll!KiFastSystemCallRet 0055e250 76e7100e KERNELBASE!FindClose+0x93 0055e514 679160c3 kernel32!_SEH_epilog4_GS+0xa 0055e6cc 779734e0 dbghelp!Win32LiveSystemProvider::OpenMapping+0x2c3 0055e7a8 7797353a ntdll!RtlpAllocateHeap+0xab2 0055e828 77965edc ntdll!RtlAllocateHeap+0x23a 0055e840 76e7123a ntdll!ZwWriteFile+0xc 0055e874 67916861 kernel32!WriteFileImplementation+0x76 0055e8ac 67916910 dbghelp!Win32LiveSystemProvider::ReadVirtual+0x71 0055e8d8 67908f09 dbghelp!Win32LiveSystemProvider::ReadAllVirtual+0x30 0055e910 679094b4 dbghelp!WriteMemoryFromProcess+0xa9 0055e9a8 6790d522 dbghelp!WriteThreadList+0x184 0055e9c0 6790e65a dbghelp!WriteDumpData+0x102 0055eb58 6790e9cb dbghelp!MiniDumpProvideDump+0x3fa 0055ebd0 73231353 dbghelp!MiniDumpWriteDump+0x1cb 0055ed14 76e82c4a crashdump!crashDumpHandler+0x183 [c:\users\craig\developer\postgresql-9.0.0\contrib\crashdump\crashdump.c @ 159] 0055ed9c 77995af4 kernel32!UnhandledExceptionFilter+0x127 0055eda4 7793d964 ntdll!__RtlUserThreadStart+0x62 0055edb8 7793d7fc ntdll!_EH4_CallFilterFunc+0x12 0055ede0 779665f9 ntdll!_except_handler4+0x8e 0055ee04 779665cb ntdll!ExecuteHandler2+0x26 0055eeb4 77966457 ntdll!ExecuteHandler+0x24 0055eeb4 015d5267 ntdll!KiUserExceptionDispatcher+0xf 0055f1c8 0139eee7 postgres!pfree+0x7 [c:\pginstaller-repo\postgres.windows\src\backend\utils\mmgr\mcxt.c @ 591] 0055f1e0 0139f14a postgres!examine_attribute+0x207 [c:\pginstaller-repo\postgres.windows\src\backend\commands\analyze.c @ 877] 0055f284 0139f94c postgres!do_analyze_rel+0x24a [c:\pginstaller-repo\postgres.windows\src\backend\commands\analyze.c @ 357] 0055f2ac 013eb74a postgres!analyze_rel+0x1bc [c:\pginstaller-repo\postgres.windows\src\backend\commands\analyze.c @ 232] 0055f330 014b30c6 postgres!vacuum+0x1ea
Re: [HACKERS] Re: Proposed Windows-specific change: Enable crash dumps (like core files)
On 04.10.2010 15:21, Craig Ringer wrote: Thinking about it, a possibly better alternative is to ship it as a library as is done with the pl/pgsql debugger, and use (I think) shared_preload_libraries to load it when desired. That way there's zero cost if disabled, though a somewhat higher cost if enabled. Hmm. That'll let me put a test version together that'll work with 9.0 as well, so that seems like a no-brainer really, it's just a better way to do it. Time for a Pg-on-windows build, yay. If it's not a lot of code, it's better to have it built-in always. Loading extra libraries in debug-mode could lead to heisenbugs. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Proposed Windows-specific change: Enable crash dumps (like core files)
On 4/10/2010 8:27 PM, Heikki Linnakangas wrote: On 04.10.2010 15:21, Craig Ringer wrote: Thinking about it, a possibly better alternative is to ship it as a library as is done with the pl/pgsql debugger, and use (I think) shared_preload_libraries to load it when desired. That way there's zero cost if disabled, though a somewhat higher cost if enabled. Hmm. That'll let me put a test version together that'll work with 9.0 as well, so that seems like a no-brainer really, it's just a better way to do it. Time for a Pg-on-windows build, yay. If it's not a lot of code, it's better to have it built-in always. Loading extra libraries in debug-mode could lead to heisenbugs. Good point. The built-in approach would also permit it to be turned on in an already-running server, which would be nice as for those fun only-shows-up-in-production bugs where you may not want to restart Pg. I'll chuck together a library version first, though, so it can be tested easily on existing installs of Pg 9.0. Compiling Pg on Windows isn't as quick and easy as on *nix so that should help for testing/consideration. If people are happy with the results I'll put together a patch to integrate it directly instead of using a library. -- Craig Ringer Tech-related writing at http://soapyfrogs.blogspot.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: Proposed Windows-specific change: Enable crash dumps (like core files)
On 4/10/2010 8:09 PM, Dave Page wrote: Any idea of the performance overhead? A GUC read and string compare when off, and (untested as yet) I expect little more when on. Thinking about it, a possibly better alternative is to ship it as a library as is done with the pl/pgsql debugger, and use (I think) shared_preload_libraries to load it when desired. That way there's zero cost if disabled, though a somewhat higher cost if enabled. Hmm. That'll let me put a test version together that'll work with 9.0 as well, so that seems like a no-brainer really, it's just a better way to do it. Time for a Pg-on-windows build, yay. -- Craig Ringer Tech-related writing at http://soapyfrogs.blogspot.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers