Re: Using ProcSignal to get memory context stats from a running backend

2018-01-23 Thread Alvaro Herrera
Did this go anywhere?

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Using ProcSignal to get memory context stats from a running backend

2017-12-27 Thread Maksim Milyutin

On 27.12.2017 10:44, Craig Ringer wrote:

On 22 December 2017 at 23:19, Maksim Milyutin wrote:


Noticing the interest in calling routines on a remote backend through
signals, I have proposed in a parallel thread [1] the ability to define
user-defined signal handlers in extensions. There is a patch there,
taken from the pg_query_state module.

1. https://www.postgresql.org/message-id/3f905f10-cf7a-d4e0-64a1-7fd9b8351592%40gmail.com




Haven't read the patch, but the idea seems useful if a bit hairy in 
practice. It'd be done on a "use at your own risk, and if it breaks 
you get to keep the pieces" basis, where no guarantees are made by Pg 
about how and when the function is called so it must be utterly 
defensive. The challenge there being that you can't always learn 
enough about the state of the system without doing things that might 
break it, so lots of things you could do would be fragile.


Yes, I agree with your point that the state of the system has to be
checked (and rechecked) before manipulating it in the signal handler,
and that the responsibility for providing the necessary level of
defensiveness lies with the extension developer. Perhaps it makes sense
to add a try/catch block around the signal handler to intercept any
errors raised from it.
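
Just to make the idea concrete, here is a minimal sketch of the kind of
defensive wrapper an extension could use (the names are purely illustrative,
and real code would need much more careful error-state handling):

/* Hypothetical routine, invoked from CHECK_FOR_INTERRUPTS(), not from the
 * low-level signal handler itself. */
static void
my_extension_diag_routine(void)
{
    PG_TRY();
    {
        /* re-check backend state before touching anything */
        collect_and_emit_diagnostics();     /* hypothetical extension function */
    }
    PG_CATCH();
    {
        /* swallow the error so the interrupted backend can carry on */
        FlushErrorState();
    }
    PG_END_TRY();
}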


--
Regards,
Maksim Milyutin



Re: Using ProcSignal to get memory context stats from a running backend

2017-12-26 Thread Craig Ringer
On 22 December 2017 at 23:19, Maksim Milyutin  wrote:

> On 22.12.2017 16:56, Craig Ringer wrote:
>
> On 22 December 2017 at 20:50, Maksim Milyutin wrote:
>
>> On 19.12.2017 16:54, Pavel Stehule wrote:
>>
>> Sorry for a small off-topic question: can this mechanism be used to log
>> the executed plan or the full query?
>>
>>
> That's a really good idea. I'd love to be able to pg_explain_backend(...)
>
> I left the mechanism as a generic diagnostic signal exactly so that we
> could add other things we wanted to be able to ask backends. I think a
> follow-on patch that adds support for dumping explain-format plans would be
> great, if it's practical to do that while a query's already running.
>
>
> Noticing the interest in calling routines on a remote backend through
> signals, I have proposed in a parallel thread [1] the ability to define
> user-defined signal handlers in extensions. There is a patch there, taken
> from the pg_query_state module.
>
> 1. https://www.postgresql.org/message-id/3f905f10-cf7a-d4e0-64a1-7fd9b8351592%40gmail.com
>
>
Haven't read the patch, but the idea seems useful if a bit hairy in
practice. It'd be done on a "use at your own risk, and if it breaks you get
to keep the pieces" basis, where no guarantees are made by Pg about how and
when the function is called so it must be utterly defensive. The challenge
there being that you can't always learn enough about the state of the
system without doing things that might break it, so lots of things you
could do would be fragile.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Using ProcSignal to get memory context stats from a running backend

2017-12-22 Thread Maksim Milyutin

On 22.12.2017 16:56, Craig Ringer wrote:

On 22 December 2017 at 20:50, Maksim Milyutin wrote:


On 19.12.2017 16:54, Pavel Stehule wrote:


Sorry for a small off-topic question: can this mechanism be used to log
the executed plan or the full query?



That's a really good idea. I'd love to be able to pg_explain_backend(...)

I left the mechanism as a generic diagnostic signal exactly so that we 
could add other things we wanted to be able to ask backends. I think a 
follow-on patch that adds support for dumping explain-format plans 
would be great, if it's practical to do that while a query's already 
running.


Noticing the interest in calling routines on a remote backend through
signals, I have proposed in a parallel thread [1] the ability to define
user-defined signal handlers in extensions. There is a patch there, taken
from the pg_query_state module.


1. 
https://www.postgresql.org/message-id/3f905f10-cf7a-d4e0-64a1-7fd9b8351592%40gmail.com


--
Regards,
Maksim Milyutin



Re: Using ProcSignal to get memory context stats from a running backend

2017-12-22 Thread Pavel Stehule
2017-12-22 13:50 GMT+01:00 Maksim Milyutin :

> On 19.12.2017 16:54, Pavel Stehule wrote:
>
> Hi
>
> 2017-12-19 14:44 GMT+01:00 Craig Ringer :
>
>> On 18 December 2017 at 10:05, Robert Haas  wrote:
>>
>>> On Thu, Dec 14, 2017 at 9:34 PM, Craig Ringer 
>>> wrote:
>>> > On 15 December 2017 at 09:24, Greg Stark  wrote:
>>> >> Another simpler option would be to open up a new file in the log
>>> >> directory
>>> >
>>> > ... if we have one.
>>> >
>>> > We might be logging to syslog, or whatever else.
>>> >
>>> > I'd rather keep it simple(ish).
>>>
>>> +1.  I still think just printing it to the log is fine.
>>>
>>>
>> Here we go. Implemented pretty much as outlined above. A new
>> pg_diag_backend(pid) function sends a new ProcSignalReason 
>> PROCSIG_DIAG_REQUEST.
>> It's handled by CHECK_FOR_INTERRUPTS() and logs MemoryContextStats() output
>> to elog(...).
>>
>> I didn't want to mess with the MemoryContextMethods and expose a
>> printf-wrapper style typedef in memnodes.h, so I went with a hook global.
>> It's a diagnostic routine so I don't think that's going to be a great
>> bother. By default it's set to write_stderr. That just writes to vfprintf
>> on unix so the outcome's the same as our direct use of fprintf now.
>>
>> On Windows, though, using write_stderr will make Pg attempt to write
>> memory context dumps to the eventlog with ERROR level if running as a
>> service with no console. That means we vsnprintf the string into a static
>> buffer first. I'm not 100% convinced of the wisdom of that because it could
>> flood the event log, which is usually size limited by # of events and
>> recycled. It'd be better to write one event than write one per memory
>> context line, but it's not safe to do that when OOM. I lean toward this
>> being a win: at least Windows users should have some hope of finding out
>> why Pg OOM'd, which currently they don't when it runs as a service. If we
>> don't, we should look at using SetStdHandle to write stderr to a secondary
>> logfile instead.
>>
>> I'm happy to just add a trivial vfprintf wrapper so we preserve exactly
>> the same behaviour instead, but I figured I'd start with reusing
>> write_stderr.
>>
>> I'd really like to preserve the writing to elog(...) not fprintf, because
>> on some systems simply finding where stderr is written can be a challenge,
>> if it's not going to /dev/null via some detached session. Is it in
>> journald? In some separate log? Being captured by the logging collector
>> (and probably silently eaten)? Who knows!
>>
>> Using elog(...) provides a log_line_prefix and other useful info to
>> associate the dump with the process of interest and what it's doing at the
>> time.
>>
>> Also, an astonishing number of deployed systems I've worked with actually
>> don't put the pid or anything useful in log_line_prefix to make grouping up
>> multi-line output practical. Which is insane. But it's only PGC_SIGHUP so
>> fixing it is easy enough.
>>
>
> Sorry for a small off-topic question: can this mechanism be used to log
> the executed plan or the full query?
>
>
> This idea (but without logging) is implemented in the work on the
> pg_progress command proposed by Remi Colinet [1] and in the
> pg_query_state extension [2].
>
> 1. https://www.postgresql.org/message-id/CADdR5nxQUSh5kCm9MKmNga8%2Bc1JLxLHDzLhAyXpfo9-Wmc6s5g%40mail.gmail.com
> 2. https://github.com/postgrespro/pg_query_state
>
>
I remember the discussion, and I hope that one day we will have some
EXPLAIN PROCESS pid command.

Using a signal and the PostgreSQL log could be a very simple solution that
is immediately available.

Regards

Pavel

> --
> Regards,
> Maksim Milyutin
>
>


Re: Using ProcSignal to get memory context stats from a running backend

2017-12-22 Thread Craig Ringer
On 22 December 2017 at 20:50, Maksim Milyutin  wrote:

> On 19.12.2017 16:54, Pavel Stehule wrote:
>
> Hi
>
> 2017-12-19 14:44 GMT+01:00 Craig Ringer :
>
>> On 18 December 2017 at 10:05, Robert Haas  wrote:
>>
>>> On Thu, Dec 14, 2017 at 9:34 PM, Craig Ringer 
>>> wrote:
>>> > On 15 December 2017 at 09:24, Greg Stark  wrote:
>>> >> Another simpler option would be to open up a new file in the log
>>> >> directory
>>> >
>>> > ... if we have one.
>>> >
>>> > We might be logging to syslog, or whatever else.
>>> >
>>> > I'd rather keep it simple(ish).
>>>
>>> +1.  I still think just printing it to the log is fine.
>>>
>>>
>> Here we go. Implemented pretty much as outlined above. A new
>> pg_diag_backend(pid) function sends a new ProcSignalReason 
>> PROCSIG_DIAG_REQUEST.
>> It's handled by CHECK_FOR_INTERRUPTS() and logs MemoryContextStats() output
>> to elog(...).
>>
>> I didn't want to mess with the MemoryContextMethods and expose a
>> printf-wrapper style typedef in memnodes.h, so I went with a hook global.
>> It's a diagnostic routine so I don't think that's going to be a great
>> bother. By default it's set to write_stderr. That just writes to vfprintf
>> on unix so the outcome's the same as our direct use of fprintf now.
>>
>> On Windows, though, using write_stderr will make Pg attempt to write
>> memory context dumps to the eventlog with ERROR level if running as a
>> service with no console. That means we vsnprintf the string into a static
>> buffer first. I'm not 100% convinced of the wisdom of that because it could
>> flood the event log, which is usually size limited by # of events and
>> recycled. It'd be better to write one event than write one per memory
>> context line, but it's not safe to do that when OOM. I lean toward this
>> being a win: at least Windows users should have some hope of finding out
>> why Pg OOM'd, which currently they don't when it runs as a service. If we
>> don't, we should look at using SetStdHandle to write stderr to a secondary
>> logfile instead.
>>
>> I'm happy to just add a trivial vfprintf wrapper so we preserve exactly
>> the same behaviour instead, but I figured I'd start with reusing
>> write_stderr.
>>
>> I'd really like to preserve the writing to elog(...) not fprintf, because
>> on some systems simply finding where stderr is written can be a challenge,
>> if it's not going to /dev/null via some detached session. Is it in
>> journald? In some separate log? Being captured by the logging collector
>> (and probably silently eaten)? Who knows!
>>
>> Using elog(...) provides a log_line_prefix and other useful info to
>> associate the dump with the process of interest and what it's doing at the
>> time.
>>
>> Also, an astonishing number of deployed systems I've worked with actually
>> don't put the pid or anything useful in log_line_prefix to make grouping up
>> multi-line output practical. Which is insane. But it's only PGC_SIGHUP so
>> fixing it is easy enough.
>>
>
> Sorry for a small off-topic question: can this mechanism be used to log
> the executed plan or the full query?
>
>
That's a really good idea. I'd love to be able to pg_explain_backend(...)

I left the mechanism as a generic diagnostic signal exactly so that we
could add other things we wanted to be able to ask backends. I think a
follow-on patch that adds support for dumping explain-format plans would be
great, if it's practical to do that while a query's already running.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Using ProcSignal to get memory context stats from a running backend

2017-12-22 Thread Maksim Milyutin

On 19.12.2017 16:54, Pavel Stehule wrote:


Hi

2017-12-19 14:44 GMT+01:00 Craig Ringer:


On 18 December 2017 at 10:05, Robert Haas wrote:

On Thu, Dec 14, 2017 at 9:34 PM, Craig Ringer wrote:
> On 15 December 2017 at 09:24, Greg Stark wrote:
>> Another simpler option would be to open up a new file in
the log
>> directory
>
> ... if we have one.
>
> We might be logging to syslog, or whatever else.
>
> I'd rather keep it simple(ish).

+1.  I still think just printing it to the log is fine.


Here we go. Implemented pretty much as outlined above. A new
pg_diag_backend(pid) function sends a new
ProcSignalReason PROCSIG_DIAG_REQUEST. It's handled by
CHECK_FOR_INTERRUPTS() and logs MemoryContextStats() output to
elog(...).

I didn't want to mess with the MemoryContextMethods and expose a
printf-wrapper style typedef in memnodes.h, so I went with a hook
global. It's a diagnostic routine so I don't think that's going to
be a great bother. By default it's set to write_stderr. That just
writes to vfprintf on unix so the outcome's the same as our direct
use of fprintf now.

On Windows, though, using write_stderr will make Pg attempt to
write memory context dumps to the eventlog with ERROR level if
running as a service with no console. That means we vsnprintf the
string into a static buffer first. I'm not 100% convinced of the
wisdom of that because it could flood the event log, which is
usually size limited by # of events and recycled. It'd be better
to write one event than write one per memory context line, but
it's not safe to do that when OOM. I lean toward this being a win:
at least Windows users should have some hope of finding out why Pg
OOM'd, which currently they don't when it runs as a service. If we
don't, we should look at using SetStdHandle to write stderr to a
secondary logfile instead.

I'm happy to just add a trivial vfprintf wrapper so we preserve
exactly the same behaviour instead, but I figured I'd start with
reusing write_stderr.

I'd really like to preserve the writing to elog(...) not fprintf,
because on some systems simply finding where stderr is written can
be a challenge, if it's not going to /dev/null via some detached
session. Is it in journald? In some separate log? Being captured
by the logging collector (and probably silently eaten)? Who knows!

Using elog(...) provides a log_line_prefix and other useful info
to associate the dump with the process of interest and what it's
doing at the time.

Also, an astonishing number of deployed systems I've worked with
actually don't put the pid or anything useful in log_line_prefix
to make grouping up multi-line output practical. Which is insane.
But it's only PGC_SIGHUP so fixing it is easy enough.


Sorry for a small off-topic question: can this mechanism be used to log
the executed plan or the full query?


This idea (but without logging) is implemented in the work on the
pg_progress command proposed by Remi Colinet [1] and in the
pg_query_state extension [2].


1. 
https://www.postgresql.org/message-id/CADdR5nxQUSh5kCm9MKmNga8%2Bc1JLxLHDzLhAyXpfo9-Wmc6s5g%40mail.gmail.com

2. https://github.com/postgrespro/pg_query_state

--
Regards,
Maksim Milyutin



Re: Using ProcSignal to get memory context stats from a running backend

2017-12-21 Thread Craig Ringer
On 22 December 2017 at 13:05, Craig Ringer  wrote:

> On 21 December 2017 at 15:55, Craig Ringer  wrote:
>
>> On 21 December 2017 at 15:24, Andres Freund  wrote:
>>
>>> Hi,
>>>
>>> On 2017-12-21 15:13:13 +0800, Craig Ringer wrote:
>>> > There tons of callers to  enlargeStringInfo, so a 'noerror' parameter
>>> would
>>> > be viable.
>>>
>>> Not sure what you mean with that sentence?
>>>
>>
>> Mangled in editing and sent prematurely, disregard.
>>
>> There are NOT tons of callers to  enlargeStringInfo, so adding a
>> parameter that allowed it to return a failure result rather than ERROR on
>> OOM seems to be a reasonable option. But it relies on repalloc(), which
>> will ERROR on OOM. AFAICS we don't have "no error" variants of the memory
>> allocation routines and I'm not eager to add them. Especially since we
>> can't trust that we're not on misconfigured Linux where the OOM killer will
>> go wild instead of giving us a nice NULL result.
>>
>> So I guess that means we should probably just do individual elog(...)s
>> and live with the ugliness of scraping the resulting mess out of the logs.
>> After all, with a log destination that could possibly copy and modify the
>> string being logged a couple of times, it's not a good idea to try to drop
>> the whole thing into the logs in one blob. And we can't trust things like
>> syslog with large messages.
>>
>> I'll follow up with a revised patch that uses individual elog()s soon.
>>
>
> I intend to add an elog_internal_raw
>


Er, I mean

errmsg_internal_raw(const char * errmsg)

i.e. like errmsg_internal, but with no varargs or format string, to be used
when we want to prepare a simple preformatted log entry.
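
A rough sketch of how a caller might use it, assuming that signature (the
helper does not exist yet, so the names and usage below are only
illustrative):

char    line[1024];

/* format one line ourselves, then hand it to the elog machinery verbatim */
snprintf(line, sizeof(line), "%s: %zu total in %d blocks",
         ctx_name, total_bytes, nblocks);   /* illustrative values */

ereport(LOG,
        (errmsg_internal_raw(line)));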

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Using ProcSignal to get memory context stats from a running backend

2017-12-21 Thread Craig Ringer
On 21 December 2017 at 15:55, Craig Ringer  wrote:

> On 21 December 2017 at 15:24, Andres Freund  wrote:
>
>> Hi,
>>
>> On 2017-12-21 15:13:13 +0800, Craig Ringer wrote:
>> > There tons of callers to  enlargeStringInfo, so a 'noerror' parameter
>> would
>> > be viable.
>>
>> Not sure what you mean with that sentence?
>>
>
> Mangled in editing and sent prematurely, disregard.
>
> There are NOT tons of callers to  enlargeStringInfo, so adding a parameter
> that allowed it to return a failure result rather than ERROR on OOM seems
> to be a reasonable option. But it relies on repalloc(), which will ERROR on
> OOM. AFAICS we don't have "no error" variants of the memory allocation
> routines and I'm not eager to add them. Especially since we can't trust
> that we're not on misconfigured Linux where the OOM killer will go wild
> instead of giving us a nice NULL result.
>
> So I guess that means we should probably just do individual elog(...)s and
> live with the ugliness of scraping the resulting mess out of the logs.
> After all, with a log destination that could possibly copy and modify the
> string being logged a couple of times, it's not a good idea to try to drop
> the whole thing into the logs in one blob. And we can't trust things like
> syslog with large messages.
>
> I'll follow up with a revised patch that uses individual elog()s soon.
>

I intend to add an elog_internal_raw for this, which takes a pre-formatted
string and bypasses EVALUATE_MESSAGE. In this case, one written to a static
buffer by vsnprintf.

That bypasses two rounds of allocations by elog - expand_fmt_string for %m
substitution, and the appendStringInfoVA for formatting. And it's a whole
lot cleaner than

char buffer[2048];
...
vsnprintf(buffer, sizeof(buffer), ...)
...
elog(LOG, "%s", buffer);

It still imposes a single-line length limit, but no worse than write_stderr
already does on win32. If that's not OK, preformatting each line into a
StringInfo before dumping straight to elog works too.

Complaints, or does this seem OK?

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Using ProcSignal to get memory context stats from a running backend

2017-12-20 Thread Craig Ringer
On 21 December 2017 at 15:24, Andres Freund  wrote:

> Hi,
>
> On 2017-12-21 15:13:13 +0800, Craig Ringer wrote:
> > There tons of callers to  enlargeStringInfo, so a 'noerror' parameter
> would
> > be viable.
>
> Not sure what you mean with that sentence?
>

Mangled in editing and sent prematurely, disregard.

There are NOT tons of callers to  enlargeStringInfo, so adding a parameter
that allowed it to return a failure result rather than ERROR on OOM seems
to be a reasonable option. But it relies on repalloc(), which will ERROR on
OOM. AFAICS we don't have "no error" variants of the memory allocation
routines and I'm not eager to add them. Especially since we can't trust
that we're not on misconfigured Linux where the OOM killer will go wild
instead of giving us a nice NULL result.

So I guess that means we should probably just do individual elog(...)s and
live with the ugliness of scraping the resulting mess out of the logs.
After all, with a log destination that could possibly copy and modify the
string being logged a couple of times, it's not a good idea to try to drop
the whole thing into the logs in one blob. And we can't trust things like
syslog with large messages.

I'll follow up with a revised patch that uses individual elog()s soon.

BTW, I also pgindented it in my tree, so it'll have formatting fixed up.
pgindent's handling of

static void printwrapper_stringinfo(void *extra, const char *fmt,...)
pg_attribute_printf(2, 3);

upsets me a little, since if I break the line it mangles it into

static void
printwrapper_stringinfo(void *extra, const char *fmt,...)
pg_attribute_printf(2, 3);

so it'll break line-length rules a little, but otherwise conform.

> > But I'm not convinced it's worth it personally. If we OOM in response to a
> > ProcSignal request for memory context output, we're having pretty bad luck.
> > The output is 8k in my test. But even if it were a couple of hundred kb,
> > happening to hit OOM just then isn't great luck on modern systems with many
> > gigabytes of RAM.
>
> I've seen plenty of memory dumps in the dozens to hundreds of
> megabytes. And imo such cases are more likely to invite use of this
> facility.
>

OK, that's more concerning then. Also impressively huge.

As written it's using MemoryContextStatsDetail(TopMemoryContext, 100) so it
shouldn't really be *getting* multimegabyte dumps, but I'm not thrilled by
hardcoding that. It doesn't merit a GUC though, so I'm not sure what to do
about it and figured I'd make it a "later" problem.
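
For reference, that call is simply the following (the constant name below is
made up for illustration; the patch currently just uses the bare literal):

#define DIAG_DUMP_MAX_CHILDREN 100      /* hypothetical name for the cap */

MemoryContextStatsDetail(TopMemoryContext, DIAG_DUMP_MAX_CHILDREN);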


> > If that *does* happen, repalloc(...) will call
> > MemoryContextStats(TopMemoryContext) before returning NULL. So we'll get
> > our memory context dump anyway, albeit to stderr.
>
> That would still abort the query that might otherwise continue to work,
> so that seems no excuse.


Eh. It'd probably die soon enough anyway. But yeah, I agree it's not safe
to hope we can sling around up to a couple of hundred MB under presumed
memory pressure.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Using ProcSignal to get memory context stats from a running backend

2017-12-20 Thread Andres Freund
Hi,

On 2017-12-21 15:13:13 +0800, Craig Ringer wrote:
> There tons of callers to  enlargeStringInfo, so a 'noerror' parameter would
> be viable.

Not sure what you mean with that sentence?


> But I'm not convinced it's worth it personally. If we OOM in response to a
> ProcSignal request for memory context output, we're having pretty bad luck.
> The output is 8k in my test. But even if it were a couple of hundred kb,
> happening to hit OOM just then isn't great luck on modern systems with many
> gigabytes of RAM.

I've seen plenty of memory dumps in the dozens to hundreds of
megabytes. And imo such cases are more likely to invite use of this
facility.


> If that *does* happen, repalloc(...) will call
> MemoryContextStats(TopMemoryContext) before returning NULL. So we'll get
> our memory context dump anyway, albeit to stderr.

That would still abort the query that might otherwise continue to work,
so that seems no excuse.

Greetings,

Andres Freund



Re: Using ProcSignal to get memory context stats from a running backend

2017-12-20 Thread Craig Ringer
On 20 December 2017 at 08:46, Craig Ringer  wrote:

> On 20 December 2017 at 02:35, Andres Freund  wrote:
>
>
>> > Yeah.  But please don't mess with MemoryContextStats per se ---
>> > I dunno about you guys but "call MemoryContextStats(TopMemoryContext)"
>> > is kinda wired into my gdb reflexes.  I think what'd make sense
>> > is a new function "MemoryContextStatsTo(context, function_pointer)".
>> > It's okay to redefine the APIs of the per-context-type functions
>> > these would call, though, because nobody calls those functions directly.
>>
>> We already have MemoryContextStatsDetail() - it seems to make sense to
>> expand that API and leave MemoryContextStats() alone. I don't think
>> there's a need for a third variant?
>>
>
> Cool, can do.
>
> I'll have to expose a typedef for the printf-wrapper callback in
> memnodes.h and add it to the stats method, which I thought would be more
> likely to get complaints than the global hook. I'm actually happier to do
> it with a passed callback.
>
> Will revise when I get a chance in the next couple of days.
>

Done.

It uses vfprintf unconditionally, even on Windows where we'd usually use
write_stderr, so it doesn't change the current MemoryContextStats behaviour.

2017-12-21 14:39:20.045 AWST [10588] pg_regress/misc_functions LOG:
 diagnostic dump requested; memory context info: TopMemoryContext: 67440
total in 5 blocks; 13376 free (6 chunks); 54064 used
  pgstat TabStatusArray lookup hash table: 8192 total in 1 blocks;
1376 free (0 chunks); 6816 used
  TopTransactionContext: 8192 total in 1 blocks; 7728 free (1
chunks); 464 used
  ...

To verify that stderr output still works properly I ran:


SELECT
  repeat('spamspamspam', 2000),
  repeat('spamspamspam', 2000),
  repeat('spamspamspam', 2000),
  repeat('spamspamspam', 2000),
  ... lots ...;


and got the expected

+ERROR:  out of memory
+DETAIL:  Failed on request of size 24004.

and memory context dump to stderr. I didn't add a TAP test for that because
I'd rather avoid exercising OOM in tests where we don't know what the
host's swap config is like, how its ulimit behaves, whether it has
craziness like the OOM killer, etc. But it might make sense to add a TAP
test to set a low ulimit and exercise OOM recovery at some point.

I've added a separate vfprintf wrapper for memory context use that doesn't
try to use the Windows error log like write_stderr(...) does. If we want to
change OOM behaviour on Windows it can be done in a followup.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From db5caa16cd3c78544f7a6e336c3d8b9118f7cdaf Mon Sep 17 00:00:00 2001
From: Craig Ringer 
Date: Tue, 19 Dec 2017 20:47:36 +0800
Subject: [PATCH v2] Add pg_diag_backend to print memory context info

The new pg_diag_backend(pid) function signals a backend to dump
a summary of its memory context state when it next calls
CHECK_FOR_INTERRUPTS(). The memory context information is
output to the server log.

This works on any backend that uses the standard
procsignal_sigusr1_handler or explicitly handles PROCSIG_DIAG_REQUEST in
its own handler. Currently this includes normal user backends, the
walsender and autovacuum workers but not the other system processes.
Background workers must explicitly handle SIGUSR1 with
procsignal_sigusr1_handler.
---
 doc/src/sgml/bgworker.sgml   | 10 -
 doc/src/sgml/func.sgml   | 13 ++
 src/backend/storage/ipc/procsignal.c |  3 ++
 src/backend/tcop/postgres.c  | 66 
 src/backend/utils/adt/misc.c | 16 +++
 src/backend/utils/init/globals.c |  1 +
 src/backend/utils/mmgr/aset.c| 10 +++--
 src/backend/utils/mmgr/mcxt.c| 50 -
 src/include/catalog/pg_proc.h|  2 +
 src/include/miscadmin.h  |  4 ++
 src/include/nodes/memnodes.h | 10 -
 src/include/storage/procsignal.h |  5 ++-
 src/include/utils/memutils.h |  4 +-
 src/test/regress/expected/misc_functions.out | 12 +
 src/test/regress/sql/misc_functions.sql  |  9 
 15 files changed, 194 insertions(+), 21 deletions(-)

diff --git a/doc/src/sgml/bgworker.sgml b/doc/src/sgml/bgworker.sgml
index 4bc2b69..50bb54f 100644
--- a/doc/src/sgml/bgworker.sgml
+++ b/doc/src/sgml/bgworker.sgml
@@ -185,7 +185,8 @@ typedef struct BackgroundWorker
when the process is started or exits.  It should be 0 for workers registered
at postmaster startup time, or when the backend registering the worker does
not wish to wait for the worker to start up.  Otherwise, it should be
-   initialized to MyProcPid.
+   initialized to MyProcPid. Typically this sets the
+   receiving process's latch via its signal handler.
   
 
  

Re: Using ProcSignal to get memory context stats from a running backend

2017-12-19 Thread Craig Ringer
On 20 December 2017 at 02:35, Andres Freund  wrote:


>
>
> > Yeah.  But please don't mess with MemoryContextStats per se ---
> > I dunno about you guys but "call MemoryContextStats(TopMemoryContext)"
> > is kinda wired into my gdb reflexes.  I think what'd make sense
> > is a new function "MemoryContextStatsTo(context, function_pointer)".
> > It's okay to redefine the APIs of the per-context-type functions
> > these would call, though, because nobody calls those functions directly.
>
> We already have MemoryContextStatsDetail() - it seems to make sense to
> expand that API and leave MemoryContextStats() alone. I don't think
> there's a need for a third variant?
>

Cool, can do.

I'll have to expose a typedef for the printf-wrapper callback in memnodes.h
and add it to the stats method, which I thought would be more likely to get
complaints than the global hook. I'm actually happier to do it with a
passed callback.
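
A rough sketch of what that could look like (hypothetical names; the final
form may well differ):

/* in memnodes.h */
typedef void (*MemoryStatsPrintFunc) (void *passthru, const char *fmt, ...)
            pg_attribute_printf(2, 3);

/* and the per-context stats method would grow matching parameters, e.g. */
void        (*stats) (MemoryContext context, int max_children,
                      MemoryStatsPrintFunc printfunc, void *passthru);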

Will revise when I get a chance in the next couple of days.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Using ProcSignal to get memory context stats from a running backend

2017-12-19 Thread Andres Freund
On 2017-12-19 13:17:52 -0500, Tom Lane wrote:
> Robert Haas  writes:
> > On Tue, Dec 19, 2017 at 8:44 AM, Craig Ringer  wrote:
> >> I didn't want to mess with the MemoryContextMethods and expose a
> >> printf-wrapper style typedef in memnodes.h, so I went with a hook global.
> 
> > That looks pretty grotty to me.  I think if you want to elog/ereport
> > this, you need to pass another argument to MemoryContextStats() or add
> > another memory context method.  This is pretty much a textbook example
> > of the wrong way to use a global variable, IMHO.

Agreed.


> Yeah.  But please don't mess with MemoryContextStats per se ---
> I dunno about you guys but "call MemoryContextStats(TopMemoryContext)"
> is kinda wired into my gdb reflexes.  I think what'd make sense
> is a new function "MemoryContextStatsTo(context, function_pointer)".
> It's okay to redefine the APIs of the per-context-type functions
> these would call, though, because nobody calls those functions directly.

We already have MemoryContextStatsDetail() - it seems to make sense to
expand that API and leave MemoryContextStats() alone. I don't think
there's a need for a third variant?

Greetings,

Andres Freund



Re: Using ProcSignal to get memory context stats from a running backend

2017-12-19 Thread Tom Lane
Robert Haas  writes:
> On Tue, Dec 19, 2017 at 8:44 AM, Craig Ringer  wrote:
>> I didn't want to mess with the MemoryContextMethods and expose a
>> printf-wrapper style typedef in memnodes.h, so I went with a hook global.

> That looks pretty grotty to me.  I think if you want to elog/ereport
> this, you need to pass another argument to MemoryContextStats() or add
> another memory context method.  This is pretty much a textbook example
> of the wrong way to use a global variable, IMHO.

Yeah.  But please don't mess with MemoryContextStats per se ---
I dunno about you guys but "call MemoryContextStats(TopMemoryContext)"
is kinda wired into my gdb reflexes.  I think what'd make sense
is a new function "MemoryContextStatsTo(context, function_pointer)".
It's okay to redefine the APIs of the per-context-type functions
these would call, though, because nobody calls those functions directly.

regards, tom lane



Re: Using ProcSignal to get memory context stats from a running backend

2017-12-19 Thread Robert Haas
On Tue, Dec 19, 2017 at 8:44 AM, Craig Ringer  wrote:
> I didn't want to mess with the MemoryContextMethods and expose a
> printf-wrapper style typedef in memnodes.h, so I went with a hook global.

That looks pretty grotty to me.  I think if you want to elog/ereport
this, you need to pass another argument to MemoryContextStats() or add
another memory context method.  This is pretty much a textbook example
of the wrong way to use a global variable, IMHO.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Using ProcSignal to get memory context stats from a running backend

2017-12-19 Thread Pavel Stehule
Hi

2017-12-19 14:44 GMT+01:00 Craig Ringer :

> On 18 December 2017 at 10:05, Robert Haas  wrote:
>
>> On Thu, Dec 14, 2017 at 9:34 PM, Craig Ringer 
>> wrote:
>> > On 15 December 2017 at 09:24, Greg Stark  wrote:
>> >> Another simpler option would be to open up a new file in the log
>> >> directory
>> >
>> > ... if we have one.
>> >
>> > We might be logging to syslog, or whatever else.
>> >
>> > I'd rather keep it simple(ish).
>>
>> +1.  I still think just printing it to the log is fine.
>>
>>
> Here we go. Implemented pretty much as outlined above. A new
> pg_diag_backend(pid) function sends a new ProcSignalReason 
> PROCSIG_DIAG_REQUEST.
> It's handled by CHECK_FOR_INTERRUPTS() and logs MemoryContextStats() output
> to elog(...).
>
> I didn't want to mess with the MemoryContextMethods and expose a
> printf-wrapper style typedef in memnodes.h, so I went with a hook global.
> It's a diagnostic routine so I don't think that's going to be a great
> bother. By default it's set to write_stderr. That just writes to vfprintf
> on unix so the outcome's the same as our direct use of fprintf now.
>
> On Windows, though, using write_stderr will make Pg attempt to write
> memory context dumps to the eventlog with ERROR level if running as a
> service with no console. That means we vsnprintf the string into a static
> buffer first. I'm not 100% convinced of the wisdom of that because it could
> flood the event log, which is usually size limited by # of events and
> recycled. It'd be better to write one event than write one per memory
> context line, but it's not safe to do that when OOM. I lean toward this
> being a win: at least Windows users should have some hope of finding out
> why Pg OOM'd, which currently they don't when it runs as a service. If we
> don't, we should look at using SetStdHandle to write stderr to a secondary
> logfile instead.
>
> I'm happy to just add a trivial vfprintf wrapper so we preserve exactly
> the same behaviour instead, but I figured I'd start with reusing
> write_stderr.
>
> I'd really like to preserve the writing to elog(...) not fprintf, because
> on some systems simply finding where stderr is written can be a challenge,
> if it's not going to /dev/null via some detached session. Is it in
> journald? In some separate log? Being captured by the logging collector
> (and probably silently eaten)? Who knows!
>
> Using elog(...) provides a log_line_prefix and other useful info to
> associate the dump with the process of interest and what it's doing at the
> time.
>
> Also, an astonishing number of deployed systems I've worked with actually
> don't put the pid or anything useful in log_line_prefix to make grouping up
> multi-line output practical. Which is insane. But it's only PGC_SIGHUP so
> fixing it is easy enough.
>

Sorry for a small off-topic question: can this mechanism be used to log
the executed plan or the full query?

Regards

Pavel

>
> --
>  Craig Ringer   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>


Re: Using ProcSignal to get memory context stats from a running backend

2017-12-19 Thread Craig Ringer
On 18 December 2017 at 10:05, Robert Haas  wrote:

> On Thu, Dec 14, 2017 at 9:34 PM, Craig Ringer 
> wrote:
> > On 15 December 2017 at 09:24, Greg Stark  wrote:
> >> Another simpler option would be to open up a new file in the log
> >> directory
> >
> > ... if we have one.
> >
> > We might be logging to syslog, or whatever else.
> >
> > I'd rather keep it simple(ish).
>
> +1.  I still think just printing it to the log is fine.
>
>
Here we go. Implemented pretty much as outlined above. A new
pg_diag_backend(pid) function sends a new
ProcSignalReason PROCSIG_DIAG_REQUEST. It's handled by
CHECK_FOR_INTERRUPTS() and logs MemoryContextStats() output to elog(...).

I didn't want to mess with the MemoryContextMethods and expose a
printf-wrapper style typedef in memnodes.h, so I went with a hook global.
It's a diagnostic routine so I don't think that's going to be a great
bother. By default it's set to write_stderr. That just writes to vfprintf
on unix so the outcome's the same as our direct use of fprintf now.
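
In outline, the hook amounts to something like this (illustrative names only,
not necessarily what the attached patch uses):

/* printf-shaped function pointer, defaulting to the old stderr behaviour */
typedef void (*MemoryStatsOutputFunc) (const char *fmt, ...) pg_attribute_printf(1, 2);

MemoryStatsOutputFunc memory_stats_output_hook = write_stderr;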

On Windows, though, using write_stderr will make Pg attempt to write memory
context dumps to the eventlog with ERROR level if running as a service with
no console. That means we vsnprintf the string into a static buffer first.
I'm not 100% convinced of the wisdom of that because it could flood the
event log, which is usually size limited by # of events and recycled. It'd
be better to write one event than write one per memory context line, but
it's not safe to do that when OOM. I lean toward this being a win: at least
Windows users should have some hope of finding out why Pg OOM'd, which
currently they don't when it runs as a service. If we don't, we should look
at using SetStdHandle to write stderr to a secondary logfile instead.

I'm happy to just add a trivial vfprintf wrapper so we preserve exactly the
same behaviour instead, but I figured I'd start with reusing write_stderr.

I'd really like to preserve the writing to elog(...) not fprintf, because
on some systems simply finding where stderr is written can be a challenge,
if it's not going to /dev/null via some detached session. Is it in
journald? In some separate log? Being captured by the logging collector
(and probably silently eaten)? Who knows!

Using elog(...) provides a log_line_prefix and other useful info to
associate the dump with the process of interest and what it's doing at the
time.

Also, an astonishing number of deployed systems I've worked with actually
don't put the pid or anything useful in log_line_prefix to make grouping up
multi-line output practical. Which is insane. But it's only PGC_SIGHUP so
fixing it is easy enough.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From 663e31ffbe471b5dac18ff6322c0ea45a948350a Mon Sep 17 00:00:00 2001
From: Craig Ringer 
Date: Tue, 19 Dec 2017 20:47:36 +0800
Subject: [PATCH v1] Add pg_diag_backend to print memory context info

The new pg_diag_backend(pid) function signals a backend to dump
a summary of its memory context state when it next calls
CHECK_FOR_INTERRUPTS(). The memory context information is
output to the server log.

This works on any backend that uses the standard
procsignal_sigusr1_handler or explicitly handles PROCSIG_DIAG_REQUEST in
its own handler. Currently this includes normal user backends, the
walsender and autovacuum workers but not the other system processes.
Background workers must explicitly handle SIGUSR1 with
procsignal_sigusr1_handler.
---
 doc/src/sgml/bgworker.sgml  | 10 -
 doc/src/sgml/func.sgml  | 13 ++
 src/backend/storage/ipc/procsignal.c|  3 ++
 src/backend/tcop/postgres.c | 70 +
 src/backend/utils/adt/misc.c| 16 
 src/backend/utils/init/globals.c|  1 +
 src/backend/utils/mmgr/aset.c   |  4 +-
 src/backend/utils/mmgr/mcxt.c   | 10 +++--
 src/include/catalog/pg_proc.h   |  2 +
 src/include/miscadmin.h |  4 ++
 src/include/storage/procsignal.h|  5 ++-
 src/include/utils/memutils.h|  4 ++
 src/test/regress/sql/misc_functions.sql |  9 +
 13 files changed, 143 insertions(+), 8 deletions(-)

diff --git a/doc/src/sgml/bgworker.sgml b/doc/src/sgml/bgworker.sgml
index 4bc2b69..50bb54f 100644
--- a/doc/src/sgml/bgworker.sgml
+++ b/doc/src/sgml/bgworker.sgml
@@ -185,7 +185,8 @@ typedef struct BackgroundWorker
when the process is started or exits.  It should be 0 for workers registered
at postmaster startup time, or when the backend registering the worker does
not wish to wait for the worker to start up.  Otherwise, it should be
-   initialized to MyProcPid.
+   initialized to MyProcPid. Typically this sets the
+   receiving process's latch via its signal handler.
   
 
   Once running, the process can connect to a 

Re: Using ProcSignal to get memory context stats from a running backend

2017-12-17 Thread Robert Haas
On Thu, Dec 14, 2017 at 9:34 PM, Craig Ringer  wrote:
> On 15 December 2017 at 09:24, Greg Stark  wrote:
>> Another simpler option would be to open up a new file in the log
>> directory
>
> ... if we have one.
>
> We might be logging to syslog, or whatever else.
>
> I'd rather keep it simple(ish).

+1.  I still think just printing it to the log is fine.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Using ProcSignal to get memory context stats from a running backend

2017-12-14 Thread Craig Ringer
On 15 December 2017 at 09:24, Greg Stark  wrote:

> Another simpler option would be to open up a new file in the log
> directory


... if we have one.

We might be logging to syslog, or whatever else.

I'd rather keep it simple(ish).

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Using ProcSignal to get memory context stats from a running backend

2017-12-14 Thread Greg Stark
Another simpler option would be to open up a new file in the log
directory, something like "debug_dump..txt", and print whatever you
want there. Then print out a reasonably sized log entry saying "Debug
dump output to file 'debug_dump..txt'". You could provide a
function that reads such files out of the log directory, or just
document how to access them using pg_read_file().

It's not perfect but it's got most of the advantages of communicating
with the requesting process without the complexities of a DSM segment
and it's a bit more flexible too. Users can have automated monitoring
tools watch for dumps for example. And regular tools can be used to
back up and clean out old files.



Re: Using ProcSignal to get memory context stats from a running backend

2017-12-12 Thread Andres Freund
On 2017-12-12 14:16:59 -0500, Robert Haas wrote:
> On Tue, Dec 12, 2017 at 2:12 PM, Andres Freund  wrote:
> > I agree that it's more reliable - I hope there's no meaningful safety
> > difference.  I think you overestimate users a bit however - far from
> > most of them are going to be able to extract a very long log entry from
> > a busy log file. There's no generally available easy way to copy a few
> > pages of text from a logfile that's a few gigabytes large...
> 
> Well, a lot of users will send us the whole logfile rather than just
> the relevant bits, but that doesn't bother me.

That doesn't really work on some busy servers, and also often in cases
where the log potentially contains sensitive (e.g. HIPAA) data.

I think a function returning the dump would be great, a function "just"
dumping to the server log still pretty good.

Greetings,

Andres Freund



Re: Using ProcSignal to get memory context stats from a running backend

2017-12-12 Thread Robert Haas
On Tue, Dec 12, 2017 at 2:12 PM, Andres Freund  wrote:
> I agree that it's more reliable - I hope there's no meaningful safety
> difference.  I think you overestimate users a bit however - far from
> most of them are going to be able to extract a very long log entry from
> a busy log file. There's no generally available easy way to copy a few
> pages of text from a logfile that's a few gigabytes large...

Well, a lot of users will send us the whole logfile rather than just
the relevant bits, but that doesn't bother me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Using ProcSignal to get memory context stats from a running backend

2017-12-12 Thread Andres Freund
On 2017-12-12 14:04:55 -0500, Robert Haas wrote:
> On Tue, Dec 12, 2017 at 1:50 PM, Andres Freund  wrote:
> > I do wonder if the right thing here wouldn't be to put the result into a
> > dsm segment, and then return that to the UDF on the requesting
> > side. Logging to the server log and then have the requestor dig that out
> > doesn't seem particularly user friendly.
> 
> I think that dumping it to the server log will be fine for most
> people, and it's *significantly* safer.

I agree that it's more reliable - I hope there's no meaningful safety
difference.  I think you overestimate users a bit however - far from
most of them are going to be able to extract a very long log entry from
a busy log file. There's no generally available easy way to copy a few
pages of text from a logfile that's a few gigabytes large...

I'd suggest adding two functions.


> Creating a DSM segment could fail, and the last thing we want is for
> interrogating a long-running backend to make it crap out.

It'd need to handle memory alloc failures gracefully, I agree. That
doesn't seem impossible.  Also don't think that's meaningfully different
in the non-dsm case, it'd be good to handle malloc failures gracefully
in that case as well.

The requesting side should create the dsm segment and preallocate a
reasonable amount of memory, if we go for it.


> +1 for this whole concept, just BTW.  As I've said before, I grow
> weary of asking customers to run gdb.

Indeed.

Greetings,

Andres Freund



Re: Using ProcSignal to get memory context stats from a running backend

2017-12-12 Thread Robert Haas
On Tue, Dec 12, 2017 at 1:50 PM, Andres Freund  wrote:
> I do wonder if the right thing here wouldn't be to put the result into a
> dsm segment, and then return that to the UDF on the requesting
> side. Logging to the server log and then have the requestor dig that out
> doesn't seem particularly user friendly.

I think that dumping it to the server log will be fine for most
people, and it's *significantly* safer.  Creating a DSM segment could
fail, and the last thing we want is for interrogating a long-running
backend to make it crap out.

+1 for this whole concept, just BTW.  As I've said before, I grow
weary of asking customers to run gdb.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Using ProcSignal to get memory context stats from a running backend

2017-12-12 Thread Andres Freund
On 2017-12-12 14:25:48 +0800, Craig Ringer wrote:
> If we were willing to wrap variadic calls in a callback and rely on
> vfprintf, we could:

I don't think there's problems with relying on variadic calls, we do
that in plenty places.


> and let the ProcSignal based caller pass an elog wrapper instead, or form a
> StringInfo with appendStringInfoVA then elog it in one go after the
> MemoryContextStatsDetail call returns.

I don't think we want a simple elog wrapper - outputting the context
stats as hundreds of log messages doesn't seem right. So at the least it
seems we should bunch it up in stringinfo - which seems to at least
require expanding the API to pass around a void *callback_data or such.

I do wonder if the right thing here wouldn't be to put the result into a
dsm segment, and then return that to the UDF on the requesting
side. Logging to the server log and then have the requestor dig that out
doesn't seem particularly user friendly.

Greetings,

Andres Freund



Re: Using ProcSignal to get memory context stats from a running backend

2017-12-11 Thread Craig Ringer
On 12 December 2017 at 12:43, Andres Freund  wrote:

> Hi,
>
> On 2017-12-12 11:57:41 +0800, Craig Ringer wrote:
> > TL;DR: Lets add a ProcSignalReason that makes a backend
> > call MemoryContextStats when it sees it and a C func that users can use
> to
> > set it on a proc. Sane?
>
> It's not unproblematic. procsignal_sigusr1_handler() runs in a signal
> handler, so you can't really rely on a lot of stuff being legal to
> do.
>
> It'd be easy to set a flag in the handler and then have
> CHECK_FOR_INTERRUPTS() do the MemoryContextStats() call.


Yes, definitely. That was my intention. Trying to write to stderr, mess
with memory contexts, etc from a signal handler context seems awfully hairy
and definitely not something I'd want to risk on a live system.


> But that'd have
> the disadvantage that it possibly would take a while till the
> MemoryContextStats() is executed. Not sure if that's still good enough
> for you?
>

Definitely. Sure, it won't be perfect, but it'd be a big improvement on
what we have.


> Another question is whether printing to stderr, bypassing proper
> logging!, is good enough for something like this.
>

I think the reason it prints to stderr now is that it's intended to run in
OOM situations.

Arguably that's not such a concern when being triggered by a procsignal. So
elog(...) in that context could make sense. I'd probably add a
print-wrapper callback arg to MemoryContextStatsDetail that you can use to
write to a stringinfo / elog / fprintf(stderr), as desired.
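
For the stringinfo case, such a wrapper could look roughly like this (a
sketch only; it follows the usual appendStringInfo grow-and-retry idiom and
glosses over OOM handling):

static void
printwrapper_stringinfo(void *extra, const char *fmt, ...)
{
    StringInfo  buf = (StringInfo) extra;

    for (;;)
    {
        va_list     args;
        int         needed;

        va_start(args, fmt);
        needed = appendStringInfoVA(buf, fmt, args);
        va_end(args);

        if (needed == 0)
            break;                          /* it fit */
        enlargeStringInfo(buf, needed);     /* grow and retry */
    }
}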


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Using ProcSignal to get memory context stats from a running backend

2017-12-11 Thread Tom Lane
Andres Freund  writes:
> On 2017-12-12 11:57:41 +0800, Craig Ringer wrote:
>> TL;DR: Lets add a ProcSignalReason that makes a backend
>> call MemoryContextStats when it sees it and a C func that users can use to
>> set it on a proc. Sane?

> It's not unproblematic. procsignal_sigusr1_handler() runs in a signal
> handler, so you can't really rely on a lot of stuff being legal to do.

Indeed, calling MemoryContextStats in a signal handler would be a damfool
idea, but Craig didn't propose that AFAICS.

> Another question is whether printing to stderr, bypassing proper
> logging!, is good enough for something like this.

Yeah, this is an issue.  MemoryContextStats is designed to print
to stderr in the (possibly vain) hope that it will work even when
we are up against an OOM failure.  That's not a property I much
want to give up, but you're right that it's not very desirable
if a user is trying to capture state during normal running.

regards, tom lane



Re: Using ProcSignal to get memory context stats from a running backend

2017-12-11 Thread Craig Ringer
On 12 December 2017 at 12:25, Tsunakawa, Takayuki <tsunakawa.ta...@jp.fujitsu.com> wrote:

> From: Craig Ringer [mailto:cr...@2ndquadrant.com]
> > TL;DR: Lets add a ProcSignalReason that makes a backend call
> > MemoryContextStats when it sees it and a C func that users can use to set
> > it on a proc. Sane?
>
> > So how about borrowing a ProcSignalReason entry for "dump a memory
> context
> > summary at your earliest convenience" ? We could name it a more generic
> > "dump debug data" in case we want to add things later.
> >
> > Then a new pg_log_debug_backend(int) function or something like that
> could
> > signal the proc and let CHECK_FOR_INTERRUPTS handle calling
> > MemoryContextStats next time it's called.
>
> +1
> That's one of the things I wanted to do.  It will be more useful on Windows.
> Would it work for autovac processes and background workers, etc. that
> connect to shared memory?
>

Anything that uses CHECK_FOR_INTERRUPTS() and is attached to PGXACT. So
yeah, pretty much anything attached to shmem.


> I have also wanted to dump stack traces.  Linux (glibc) has
> backtrace_symbols(), and Windows has StackWalk()/StackWalk64().  Is it sane
> to make the function a hook?
>

In-proc stack traces are immensely useful, and IMO relatively safe in a
proc that's already in a reasonable state. If your stack is mangled, making
it worse with an in-proc stack trace is rarely your biggest concern. I'd
LOVE to be able to do this.

However, I'd want to address anything like that quite separately to the
change I proposed to expose an existing facility.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Using ProcSignal to get memory context stats from a running backend

2017-12-11 Thread Andres Freund
Hi,

On 2017-12-12 11:57:41 +0800, Craig Ringer wrote:
> TL;DR: Lets add a ProcSignalReason that makes a backend
> call MemoryContextStats when it sees it and a C func that users can use to
> set it on a proc. Sane?

It's not unproblematic. procsignal_sigusr1_handler() runs in a signal
handler, so you can't really rely on a lot of stuff being legal to
do.

It'd be easy to set a flag in the handler and then have
CHECK_FOR_INTERRUPTS() do the MemoryContextStats() call. But that'd have
the disadvantage that it possibly would take a while till the
MemoryContextStats() is executed. Not sure if that's still good enough
for you?
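
Concretely, the flag-based flow would look roughly like this (the reason and
flag names are illustrative):

/* in procsignal_sigusr1_handler(), i.e. signal context: only set a flag */
if (CheckProcSignal(PROCSIG_DIAG_REQUEST))  /* hypothetical reason */
{
    DiagDumpPending = true;                 /* volatile sig_atomic_t flag */
    SetLatch(MyLatch);
}

/* later, from ProcessInterrupts() via CHECK_FOR_INTERRUPTS(): */
if (DiagDumpPending)
{
    DiagDumpPending = false;
    MemoryContextStats(TopMemoryContext);   /* safe outside the handler */
}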

Another question is whether printing to stderr, bypassing proper
logging!, is good enough for something like this.

Greetings,

Andres Freund



RE: Using ProcSignal to get memory context stats from a running backend

2017-12-11 Thread Tsunakawa, Takayuki
From: Craig Ringer [mailto:cr...@2ndquadrant.com]
> TL;DR: Lets add a ProcSignalReason that makes a backend call
> MemoryContextStats when it sees it and a C func that users can use to set
> it on a proc. Sane?

> So how about borrowing a ProcSignalReason entry for "dump a memory context
> summary at your earliest convenience" ? We could name it a more generic
> "dump debug data" in case we want to add things later.
> 
> Then a new pg_log_debug_backend(int) function or something like that could
> signal the proc and let CHECK_FOR_INTERRUPTS handle calling
> MemoryContextStats next time it's called.

+1
That's one of the things I wanted to do.  It will be more useful on Windows.  Would 
it work for autovac processes and background workers, etc. that connect to 
shared memory?

I have also wanted to dump stack traces.  Linux (glibc) has 
backtrace_symbols(), and Windows has StackWalk()/StackWalk64().  Is it sane to 
make the function a hook?
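
For the glibc side, a minimal sketch (this is plain glibc usage, not a
proposed PostgreSQL API):

#include <execinfo.h>
#include <unistd.h>

static void
dump_stack_to_stderr(void)
{
    void   *frames[64];
    int     nframes = backtrace(frames, 64);

    /* resolves symbol names and writes them straight to the fd, no malloc */
    backtrace_symbols_fd(frames, nframes, STDERR_FILENO);
}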

Regards
Takayuki Tsunakawa