Re: Send duration output to separate log files

2024-07-24 Thread Pavel Stehule
Hi

st 10. 7. 2024 v 17:58 odesílatel Greg Sabino Mullane 
napsal:

> Please find attached a patch to allow for durations to optionally be sent
> to separate log files. In other words, rather than cluttering up our
> postgres202007.log file with tons of output from
> log_min_duration_statement, duration lines are sent instead to the file
> postgres202007.duration.
>
> Over the years, durations have been the number one cause of very large log
> files, in which the more "important" items get buried in the noise. Also,
> programs that are scanning for durations typically do not care about the
> normal, non-duration output. Some people have a policy of logging
> everything, which in effect means setting log_min_duration_statement to 0,
> which in turn makes your log files nearly worthless for spotting
> non-duration items. This feature will also be very useful for those who
> need to temporarily turn on log_min_duration_statement, for some quick
> auditing of exactly what is being run on their database. When done, you can
> move or remove the duration file without messing up your existing log
> stream.
>
> This only covers the case when both the duration and statement are set on
> the same line. In other words, log_min_duration_statement output, but not
> log_duration (which is best avoided anyway). It also requires
> logging_collector to be on, obviously.
>
> Details:
>
> The edata structure is expanded to have a new message_type, with a
> matching function errmessagetype() created.
> [include/utils/elog.h]
> [backend/utils/elog.c]
>
> Any errors that have both a duration and a statement are marked via
> errmessagetype()
> [backend/tcop/postgres.c]
>
> A new GUC named "log_duration_destination" is created, which supports any
> combination of stderr, csvlog, and jsonlog. It does not need to match
> log_destination, in order to support different use cases. For example, the
> user wants durations sent to a CSV file for processing by some other tool,
> but still wants everything else going to a normal text log file.
>
> Code: [include/utils/guc_hooks.h] [backend/utils/misc/guc_tables.c]
> Docs: [sgml/config.sgml]  [backend/utils/misc/postgresql.conf.sample]
>
> Create a new flag called PIPE_PROTO_DEST_DURATION
> [include/postmaster/syslogger.h]
>
> Create new flags:
>   LOG_DESTINATION_DURATION,
>   LOG_DESTINATION_DURATION_CSV,
>   LOG_DESTINATION_DURATION_JSON
> [include/utils/elog.h]
>
> Routing and mapping LOG_DESTINATION to PIPE_PROTO
> [backend/utils/error/elog.c]
>
> Minor rerouting when using alternate forms
> [backend/utils/error/csvlog.c]
> [backend/utils/error/jsonlog.c]
>
> Create new filehandles, do log rotation, map PIPE_PROTO to
> LOG_DESTINATION. Rotation and entry into the "current_logfiles" file are
> the same as existing log files. The new names/suffixes are duration,
> duration.csv, and duration.json.
> [backend/postmaster/syslogger.c]
>
> Testing to ensure combinations of log_destination and
> log_duration_destination work as intended
> [bin/pg_ctl/meson.build]
> [bin/pg_ctl/t/005_log_duration_destination.pl]
>
> Questions I've asked along the way, and perhaps other might as well:
>
> What about logging other things? Why just duration?
>
> Duration logging is a very unique event in our logs. There is nothing
> quite like it - it's always client-driven, yet automatically generated. And
> it can be extraordinarily verbose. Removing it from the existing logging
> stream has no particular downsides. Almost any other class of log message
> would likely meet resistance as far as moving it to a separate log file,
> with good reason.
>
> Why not build a more generic log filtering case?
>
> I looked into this, but it would be a large undertaking, given the way our
> logging system works. And as per above, I don't think the pain would be
> worth it, as duration covers 99% of the use cases for separate logs.
> Certainly, nothing else other than a recurring ERROR from the client can
> cause massive bloat in the size of the files. (There is a nearby patch to
> exclude certain errors from the log file as a way to mitigate the error
> spam - I don't like that idea, but should mention it here as another effort
> to keep the log files a manageable size)
>
> Why not use an extension for this?
>
> I did start this as an extension, but it only goes so far. We can use
> emit_log_hook, but it requires careful coordination of open filehandles,
> has to do inefficient regex of every log message, and cannot do things like
> log rotation.
>
> Why not bitmap PIPE_PROTO *and* LOG_DESTINATION?
>
> I tried to do both as simple bitmaps (i.e. duration+csv = duration.csv),
> and not have to use e.g. LOG_DESTIATION_DURATION_CSV, but size_rotation_for
> ruined it for me. Since our PIPE always sends one thing at a time, a single
> new flag enables it to stay as a clean bits8 type.
>
> What about Windows?
>
> Untested. I don't have access to a Windows build, but I think in theory it
> should work fine.
>

I like the proposed fea

Re: Send duration output to separate log files

2024-07-22 Thread Alastair Turner
On Fri, 12 Jul 2024 at 15:58, Greg Sabino Mullane 
wrote:

> On Thu, Jul 11, 2024 at 6:47 AM Alastair Turner 
> wrote:
>
>>  The other category of logging which would benefit from a separate file
>> is audit. It also can create massive volumes of log content. Splitting
>> audit information off into a separate file for use by a separate team or
>> function is also a request I have heard from some financial institutions
>> adopting Postgres. With audit being provided by an extension, this would
>> become quite an intrusive change.
>>
>
> Thanks for the feedback. I agree pgaudit is another thing that can create
> massive log files, and should be solved at some point.  However, I wanted
> to keep this patch self-contained to in-core stuff. And pgaudit is already
> an odd duck, in that it puts CSV into your stderr stream (and into your
> json!). Ideally it would put a single CSV stream into a separate csv file.
> Perhaps even something that did not necessarily live in log_directory.
>
> Would an extension be able to safely modify the message_type field you
>>> have added using emit_log_hook? If so, the field becomes more of a log
>>> destination label than a type marker. If an extension could hook into the
>>> log file creation/rotation process, that would be a nice basis for enabling
>>> extensions (I'm particularly thinking of pgAudit) to manage separate
>>> logging destinations.
>>>
>>
> Yes, I had more than duration in mind when I created errmessagetype. A
> hook to set it would be the obvious next step, and then some sort of way of
> mapping that to arbitrary log files. But I see that as mostly orthagonal to
> this patch (and certainly a much larger endeavor).
>

Ok. This facility to separate out the logging of the duration messages is
useful on its own and I can see the reasoning for using the core
functionality for log rotation to manage these separate logs rather than
redoing all the file handling work in an extension. A broader framework for
mapping messages to arbitrary log files is a far larger set of changes
which can be tackled later if desired.

I've had a look at the patch. The test cases look comprehensive. The patch
applies cleanly. The newly supplied tests (13 of the 40) and the
test_misc/003_check_guc (1 - no parameters missing from
postgresql.conf.sample) fail.

To leave some runway for this idea to be extended on without disrupting the
user experience could the GUC name be feature qualified as
duration_log.log_destination? This would provide a clear naming structure
for the most obvious follow-on patch to this one - allowing users to set
log_directory separately for these duration logs - as well as any further
separate logging efforts. I know that these dot-separated GUC names are
generally associated with extensions, but I can't find a hard rule on the
issue anywhere, and it feels like a reasonable way to group up the purpose
(in this case logging duration messages) for which there are specific
values of a number of GUCs (log_destination, log_directory, even
log_filename, ...).

Cheers
Alastair

>


Re: Send duration output to separate log files

2024-07-12 Thread Greg Sabino Mullane
On Thu, Jul 11, 2024 at 6:47 AM Alastair Turner  wrote:

>  The other category of logging which would benefit from a separate file is
> audit. It also can create massive volumes of log content. Splitting audit
> information off into a separate file for use by a separate team or function
> is also a request I have heard from some financial institutions adopting
> Postgres. With audit being provided by an extension, this would become
> quite an intrusive change.
>

Thanks for the feedback. I agree pgaudit is another thing that can create
massive log files, and should be solved at some point.  However, I wanted
to keep this patch self-contained to in-core stuff. And pgaudit is already
an odd duck, in that it puts CSV into your stderr stream (and into your
json!). Ideally it would put a single CSV stream into a separate csv file.
Perhaps even something that did not necessarily live in log_directory.

Would an extension be able to safely modify the message_type field you have
>> added using emit_log_hook? If so, the field becomes more of a log
>> destination label than a type marker. If an extension could hook into the
>> log file creation/rotation process, that would be a nice basis for enabling
>> extensions (I'm particularly thinking of pgAudit) to manage separate
>> logging destinations.
>>
>
Yes, I had more than duration in mind when I created errmessagetype. A hook
to set it would be the obvious next step, and then some sort of way of
mapping that to arbitrary log files. But I see that as mostly orthagonal to
this patch (and certainly a much larger endeavor).

(wades in anyways). I'm not sure about hooking into the log rotation
process so much as registering something on startup, then letting Postgres
handle all the log files in the queue. Although as I alluded to above,
sometimes having large log files NOT live in the data directory (or more
specifically, not hang out with the log_directory crowd), could be a plus
for space, efficiency, and security reasons. That makes log rotation
harder, however. And do we / should we put extension-driven logfiles into
current_logfiles? Do we still fall back to stderr even for extension logs?
Lots to ponder. :)

Cheers,
Greg


Re: Send duration output to separate log files

2024-07-11 Thread Alastair Turner
On Wed, 10 Jul 2024 at 16:58, Greg Sabino Mullane 
wrote:
--snip--

> Why not build a more generic log filtering case?
>
> I looked into this, but it would be a large undertaking, given the way our
> logging system works. And as per above, I don't think the pain would be
> worth it, as duration covers 99% of the use cases for separate logs.
>

 The other category of logging which would benefit from a separate file is
audit. It also can create massive volumes of log content. Splitting audit
information off into a separate file for use by a separate team or function
is also a request I have heard from some financial institutions adopting
Postgres. With audit being provided by an extension, this would become
quite an intrusive change.

Why not use an extension for this?
>
> I did start this as an extension, but it only goes so far. We can use
> emit_log_hook, but it requires careful coordination of open filehandles,
> has to do inefficient regex of every log message, and cannot do things like
> log rotation.
>

Would an extension be able to safely modify the message_type field you have
added using emit_log_hook? If so, the field becomes more of a log
destination label than a type marker. If an extension could hook into the
log file creation/rotation process, that would be a nice basis for enabling
extensions (I'm particularly thinking of pgAudit) to manage separate
logging destinations.


Send duration output to separate log files

2024-07-10 Thread Greg Sabino Mullane
Please find attached a patch to allow for durations to optionally be sent
to separate log files. In other words, rather than cluttering up our
postgres202007.log file with tons of output from
log_min_duration_statement, duration lines are sent instead to the file
postgres202007.duration.

Over the years, durations have been the number one cause of very large log
files, in which the more "important" items get buried in the noise. Also,
programs that are scanning for durations typically do not care about the
normal, non-duration output. Some people have a policy of logging
everything, which in effect means setting log_min_duration_statement to 0,
which in turn makes your log files nearly worthless for spotting
non-duration items. This feature will also be very useful for those who
need to temporarily turn on log_min_duration_statement, for some quick
auditing of exactly what is being run on their database. When done, you can
move or remove the duration file without messing up your existing log
stream.

This only covers the case when both the duration and statement are set on
the same line. In other words, log_min_duration_statement output, but not
log_duration (which is best avoided anyway). It also requires
logging_collector to be on, obviously.

Details:

The edata structure is expanded to have a new message_type, with a matching
function errmessagetype() created.
[include/utils/elog.h]
[backend/utils/elog.c]

Any errors that have both a duration and a statement are marked via
errmessagetype()
[backend/tcop/postgres.c]

A new GUC named "log_duration_destination" is created, which supports any
combination of stderr, csvlog, and jsonlog. It does not need to match
log_destination, in order to support different use cases. For example, the
user wants durations sent to a CSV file for processing by some other tool,
but still wants everything else going to a normal text log file.

Code: [include/utils/guc_hooks.h] [backend/utils/misc/guc_tables.c]
Docs: [sgml/config.sgml]  [backend/utils/misc/postgresql.conf.sample]

Create a new flag called PIPE_PROTO_DEST_DURATION
[include/postmaster/syslogger.h]

Create new flags:
  LOG_DESTINATION_DURATION,
  LOG_DESTINATION_DURATION_CSV,
  LOG_DESTINATION_DURATION_JSON
[include/utils/elog.h]

Routing and mapping LOG_DESTINATION to PIPE_PROTO
[backend/utils/error/elog.c]

Minor rerouting when using alternate forms
[backend/utils/error/csvlog.c]
[backend/utils/error/jsonlog.c]

Create new filehandles, do log rotation, map PIPE_PROTO to LOG_DESTINATION.
Rotation and entry into the "current_logfiles" file are the same as
existing log files. The new names/suffixes are duration, duration.csv, and
duration.json.
[backend/postmaster/syslogger.c]

Testing to ensure combinations of log_destination and
log_duration_destination work as intended
[bin/pg_ctl/meson.build]
[bin/pg_ctl/t/005_log_duration_destination.pl]

Questions I've asked along the way, and perhaps other might as well:

What about logging other things? Why just duration?

Duration logging is a very unique event in our logs. There is nothing quite
like it - it's always client-driven, yet automatically generated. And it
can be extraordinarily verbose. Removing it from the existing logging
stream has no particular downsides. Almost any other class of log message
would likely meet resistance as far as moving it to a separate log file,
with good reason.

Why not build a more generic log filtering case?

I looked into this, but it would be a large undertaking, given the way our
logging system works. And as per above, I don't think the pain would be
worth it, as duration covers 99% of the use cases for separate logs.
Certainly, nothing else other than a recurring ERROR from the client can
cause massive bloat in the size of the files. (There is a nearby patch to
exclude certain errors from the log file as a way to mitigate the error
spam - I don't like that idea, but should mention it here as another effort
to keep the log files a manageable size)

Why not use an extension for this?

I did start this as an extension, but it only goes so far. We can use
emit_log_hook, but it requires careful coordination of open filehandles,
has to do inefficient regex of every log message, and cannot do things like
log rotation.

Why not bitmap PIPE_PROTO *and* LOG_DESTINATION?

I tried to do both as simple bitmaps (i.e. duration+csv = duration.csv),
and not have to use e.g. LOG_DESTIATION_DURATION_CSV, but size_rotation_for
ruined it for me. Since our PIPE always sends one thing at a time, a single
new flag enables it to stay as a clean bits8 type.

What about Windows?

Untested. I don't have access to a Windows build, but I think in theory it
should work fine.

Cheers,
Greg
From c6d19d07cfa7eac51e5ead79f1c1b230b5b2d364 Mon Sep 17 00:00:00 2001
From: Greg Sabino Mullane 
Date: Tue, 2 Jul 2024 15:13:44 -0400
Subject: [PATCH] Add new parameter log_duration_destination

This allows writing of items from log_min_duration_statement to be sent to separ