Re: New PostgreSQL Contributors

2024-10-04 Thread Greg Sabino Mullane
Congratulations to all! This seems the sort of thing we should crosspost to
-general and/or -announce, no?

Cheers,
Greg


Re: Truncate logs by max_log_size

2024-10-01 Thread Greg Sabino Mullane
On Fri, Sep 27, 2024 at 6:37 AM Andrey M. Borodin 
wrote:

> Consider max_log_size = 10Mb. The perspective might look very different.
> It’s not about WHERE anymore. It's a guard against heavy abuse.
>

Can you elaborate on this? Do you mean someone purposefully writing large
entries to your log file?


Re: Changing the default random_page_cost value

2024-09-30 Thread Greg Sabino Mullane
On Fri, Sep 27, 2024 at 12:03 PM Dagfinn Ilmari Mannsåker 
wrote:

> It might also be worth mentioning cloudy block storage (e.g. AWS' EBS),
> which is typically backed by SSDs, but has extra network latency.
>

That seems a little too in the weeds for me, but wording suggestions are
welcome. To get things moving forward, I made a doc patch which changes a
few things, namely:

* Mentions the distinction between ssd and hdd right up front.
* Moves the tablespace talk to the very end, as tablespace use is getting
rarer (again, thanks in part to ssds)
* Mentions the capability to set per-database and per-role since we mention
per-tablespace.
* Removes a lot of the talk of caches and justifications for the 4.0
setting. While those are interesting, I've been tuning this parameter for
many years and never really cared about the "90% cache rate". The proof is
in the pudding: rpc is the canonical "try it and see" parameter. Tweak.
Test. Repeat.
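
For reference, the per-database, per-role, and per-tablespace knobs
mentioned above look roughly like this (the database, role, and tablespace
names are made up for illustration):

$ psql -c "ALTER DATABASE appdb SET random_page_cost = 1.2"
$ psql -c "ALTER ROLE reporting SET random_page_cost = 1.2"
$ psql -c "ALTER TABLESPACE spinning_rust SET (random_page_cost = 4.0)"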

Cheers,
Greg


0001-Lower-random_page_cost-default-to-1.2-and-update-docs-about-it.patch
Description: Binary data


Changing the default random_page_cost value

2024-09-27 Thread Greg Sabino Mullane
tl;dr let's assume SSDs are popular and HDDs are the exception and flip our
default

As I write this email, it's the year 2024. I think it is time we lower our
"default" setting of random_page_cost (as set in postgresql.conf.sample and
the docs). Even a decade ago, the current default of 4 was considered
fairly conservative and often lowered. The git log shows that this value
was last touched in 2006, during the age of spinning metal. We are now in a
new era, the age of SSDs, and thus we should lower this default value to
reflect the fact that the vast majority of people using Postgres these days
are doing so on solid state drives. We tend to stay ultra-conservative in
all of our settings, but we also need to recognize when there has been a
major shift in the underlying hardware - and calculations that our defaults
are based on.

Granted, there are other factors involved, and yes, perhaps we should tweak
some of the similar settings as well, but random_page_cost is the one
setting most out of sync with today's hardware realities. So I'll be brave
and throw a number out there: 1.2. And change our docs to use wording like
"if you are using an older hard disk drive technology, you may want to try
raising rpc", replacing our fairly-hidden note about SSDs buried in the
last sentence - of the fourth paragraph - of the rpc docs.

Real data about performance on today's SSDs are welcome, and/or some way to
generate a more accurate default.
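
For anyone wanting to kick the tires, the basic experiment is cheap and can
be done per-session (the table and query here are just placeholders):

$ psql <<'SQL'
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM mytable WHERE indexed_col = 42;
SET random_page_cost = 1.2;
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM mytable WHERE indexed_col = 42;
SQL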

Cheers,
Greg


Re: Why mention to Oracle ?

2024-09-25 Thread Greg Sabino Mullane
> And if we create a page like
> https://www.postgresql.org/about/featurematrix/
> But instead of Postgres versions we have other vendors.
>

This sounds like something that would fit well on the Postgres wiki:

https://wiki.postgresql.org/

Cheers,
Greg


Re: Exit walsender before confirming remote flush in logical replication

2024-09-17 Thread Greg Sabino Mullane
Thanks for everyone's work on this, I am very interested in it getting into
a release. What is the status of this?

My use case is Patroni - when it needs to do a failover, it shuts down the
primary. However, large transactions can cause it to stay in the "shutting
down" state for a long time, which means your entire HA system is now
non-functional. I like the idea of a new flag. I'll test this out soon if
the original authors want to make a rebased patch. This thread is old, so
if I don't hear back in a bit, I'll create and test a new one myself. :)

Cheers,
Greg


Re: Jargon and acronyms on this mailing list

2024-09-10 Thread Greg Sabino Mullane
On Tue, Sep 3, 2024 at 11:50 AM Nathan Bossart 
wrote:

> Do you think these acronyms make it difficult for some to contribute to
> Postgres?  I've always felt that they were pretty easy to figure out and a
> nice way to save some typing for common phrases, but I'm not sure it's ever
> really been discussed
>

I do think it raises the bar a bit, especially for non-native-English
speakers. Granted, the learning curve is not super high, and context plus
web searching can usually get people there, but the lists are dense enough
already, so I wanted to help people out. Also, mailing lists in general are
a pretty foreign concept to young developers, and AFAICT [1], not all of
the acronyms have crossed over to the texting world.

Cheers,
Greg

[1] See what I did there? [2]

[2] Do people still learn about and use footnotes?


Jargon and acronyms on this mailing list

2024-08-30 Thread Greg Sabino Mullane
I normally wouldn't mention my blog entries here, but this one was about
the hackers mailing list, so wanted to let people know about it in case you
don't follow Planet Postgres. I scanned the last year's worth of posts and
gathered the most used acronyms and jargon. The most commonly used acronym
was IMO (in my opinion), followed by FWIW (for what it's worth), and IIUC
(if I understand correctly). The complete list can be found in the post
below, I'll refrain from copying everything here.

https://www.crunchydata.com/blog/understanding-the-postgres-hackers-mailing-list

Cheers,
Greg


Re: Better error message when --single is not the first arg to postgres executable

2024-08-27 Thread Greg Sabino Mullane
On Mon, Aug 26, 2024 at 11:43 AM Nathan Bossart 
wrote:

> On Sun, Aug 25, 2024 at 01:14:36PM -0400, Greg Sabino Mullane wrote:
> > I'm not happy about making this and the const char[] change their
> structure
> > based on the ifdefs - could we not just leave forkchild in? Their usage
> is
> > already protected by the ifdefs in the calling code.
>
> Here's an attempt at this.
>

Looks great, thank you.


Re: Enable data checksums by default

2024-08-27 Thread Greg Sabino Mullane
On Mon, Aug 26, 2024 at 3:46 PM Nathan Bossart 
wrote:

> Should we error if both --data-checksum and --no-data-checksums are
> specified?  IIUC with 0001, we'll use whichever is specified last.
>

Hmmm, that is a good question. We have never (to my recollection) flipped a
default quite like this before. I'm inclined to leave it as "last one
wins", as I can see automated systems appending their desired selection to
the end of the arg list, and expecting it to work.
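
In other words, assuming last-one-wins stays as 0001 has it, a wrapper
script can simply append its preference and not worry about what came
before:

$ initdb -D demo1 --data-checksums --no-data-checksums   # ends up off
$ initdb -D demo2 --no-data-checksums --data-checksums   # ends up on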

nitpick: these 4 patches are small enough that they could likely be
> combined and committed together.
>

This was split per request upthread, which I do agree with.

I think it's fair to say we should make the pg_upgrade experience nicer
> once the default changes, but IMHO that needn't block actually changing the
> default.
>

+1

Cheers,
Greg


Re: Better error message when --single is not the first arg to postgres executable

2024-08-25 Thread Greg Sabino Mullane
I'm not opposed to this new method, as long as the error code improves. :)

+typedef enum Subprogram
+{
+ SUBPROGRAM_CHECK,
+ SUBPROGRAM_BOOT,
+#ifdef EXEC_BACKEND
+ SUBPROGRAM_FORKCHILD,
+#endif

I'm not happy about making this and the const char[] change their structure
based on the ifdefs - could we not just leave forkchild in? Their usage is
already protected by the ifdefs in the calling code.

Heck, we could put SUBPROGRAM_FORKCHILD first in the list, keep the ifdef
in parse_subprogram, and start regular checking with i = 1;
This would reduce to a single #ifdef

Cheers,
Greg


Re: Enable data checksums by default

2024-08-23 Thread Greg Sabino Mullane
Rebased and reworked patches attached:

v2-0001-Add-new-initdb-argument-no-data-checksums-to-force-checksums-off.patch
- Adds --no-data-checksums to initdb.c, adjusts --help and sgml docs, adds
simple tests

v2-0002-Allow-tests-to-force-checksums-off-when-calling-init.patch
- Adjusts the Perl tests to use the new flag as needed

v2-0003-Change-initdb-to-default-to-using-data-checksums.patch
- Flips the boolean to true, adjusts the tests to match it, tweaks docs

v2-0004-Tweak-docs-to-reduce-possible-impact-of-data-checksums.patch
- Changes "noticeable" penalty to "small" penalty

Cheers,
Greg


v2-0004-Tweak-docs-to-reduce-possible-impact-of-data-checksums.patch
Description: Binary data


v2-0001-Add-new-initdb-argument-no-data-checksums-to-force-checksums-off.patch
Description: Binary data


v2-0003-Change-initdb-to-default-to-using-data-checksums.patch
Description: Binary data


v2-0002-Allow-tests-to-force-checksums-off-when-calling-init.patch
Description: Binary data


Re: [PATCH] GROUP BY ALL

2024-08-13 Thread Greg Sabino Mullane
On Tue, Jul 23, 2024 at 9:37 AM David Christensen  wrote:

> I'm not married to the exact syntax of this feature; anything else short
> and consistent could work if `ALL` is considered to potentially
> gain a different interpretation in the future.
>

GROUP BY *

Just kidding. But a big +1 to the whole concept. It would have been
extraordinarily useful over the years.

Cheers,
Greg


Re: Normalize queries starting with SET for pg_stat_statements

2024-08-13 Thread Greg Sabino Mullane
Now that I've spent some time away from this, I'm reconsidering why we are
going through all the trouble of semi-jumbling SET statements. Maybe we
just keep it simple and everything becomes "SET myvar = $1" or even "SET
myvar" full stop? I'm having a hard time finding a real-world situation in
which we need to distinguish different SET/RESET items within
pg_stat_statements.

Cheers,
Greg


Re: Enable data checksums by default

2024-08-13 Thread Greg Sabino Mullane
On Thu, Aug 8, 2024 at 6:11 AM Peter Eisentraut 
wrote:


> My understanding was that the reason for some hesitation about adopting
> data checksums was the performance impact.  Not the checksumming itself,
> but the overhead from hint bit logging.  The last time I looked into that,
> you could get performance impacts on the order of 5% tps.  Maybe that's
> acceptable, and you of course can turn it off if you want the extra
> performance.  But I think this should be discussed in this thread.
>

Fair enough. I think the performance impact is acceptable, as evidenced by
the large number of people that turn it on. And it is easy enough to turn
it off again, either via --no-data-checksums or pg_checksums --disable.
I've come across people who have regretted not throwing a -k into their
initial initdb, but have not yet come across someone who has the opposite
regret. When I did some measurements some time ago, I found numbers much
less than 5%, but of course it depends on a lot of factors.
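
To be concrete, turning them back off is a one-liner either way (the first
using the --no-data-checksums flag added earlier in this series; the second
requires the cluster to be cleanly shut down):

$ initdb -D newdata --no-data-checksums
$ pg_checksums --disable -D olddata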

About the claim that it's already the de-facto standard.  Maybe that is
> approximately true for "serious" installations.  But AFAICT, the popular
> packagings don't enable checksums by default, so there is likely a
> significant middle tier between "just trying it out" and serious
> production use that don't have it turned on.
>

I would push back on that "significant" a good bit. The number of Postgres
installations in the cloud is very likely to dwarf the total package
installations. Maybe not 10 years ago, but now? Maybe someone from Amazon
can share some numbers. Not that we have any way to compare against package
installs :) But anecdotally the number of people who mention RDS etc. on
the various fora has exploded.


> For those uses, this change would render pg_upgrade useless for upgrades
> from an old instance with default settings to a new instance with default
> settings.  And then users would either need to re-initdb with checksums
> turned back off, or I suppose run pg_checksums on the old instance before
> upgrading?  This is significant additional complication.
>

Meh, re-running initdb with --no-data-checksums seems a fairly low hurdle.


> And packagers who have built abstractions on top of pg_upgrade (such as
> Debian pg_upgradecluster) would also need to implement something to manage
> this somehow.
>

How does it deal with clusters with checksums enabled now?


> I'm thinking pg_upgrade could have a mode where it adds the checksum
> during the upgrade as it copies the files (essentially a subset
> of pg_checksums).  I think that would be useful for that middle tier of
> users who just want a good default experience.
>

Hm...might be a bad experience if it forces a switch out of --link mode.
Perhaps a warning at the end of pg_upgrade that suggests running
pg_checksums on your new cluster if you want to enable checksums?
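
Something along these lines, run against the shut-down new cluster (note it
rewrites every data file, so it is not free):

$ pg_checksums --enable -D /path/to/new/cluster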

Cheers,
Greg


Re: Subscription to Postgres Releases via ICS

2024-08-13 Thread Greg Sabino Mullane
Sounds like a good idea. You probably want to ask on pgsql-www. I imagine
it would have to be coded against this:

https://git.postgresql.org/gitweb/?p=pgweb.git

A working patch would probably be nice to get things started, but I
recognize it's a little bit of chicken-and-egg.

Cheers,
Greg


Re: Enable data checksums by default

2024-08-08 Thread Greg Sabino Mullane
Thank you for the feedback. Please find attached three separate patches.
One to add a new flag to initdb (--no-data-checksums), one to adjust the
tests to use this flag as needed, and the final to make the actual switch
of the default value (along with tests and docs).

Cheers,
Greg


0001-Add-new-initdb-argument-no-data-checksums-to-force-checksums-off.patch
Description: Binary data


0002-Allow-tests-to-force-checksums-off-when-calling-init.patch
Description: Binary data


0003-Change-initdb-to-default-to-using-data-checksums.patch
Description: Binary data


Re: Enable data checksums by default

2024-08-07 Thread Greg Sabino Mullane
On Wed, Aug 7, 2024 at 4:43 AM Michael Banck  wrote:

> I think the last time we dicussed this the consensus was that
> computational overhead of computing the checksums is pretty small for
> most systems (so the above change seems warranted regardless of whether
> we switch the default), but turning on wal_compression also turns on
> wal_log_hints, which can increase WAL by quite a lot. Maybe this is
> covered elsewhere in the documentation (I just looked at the patch), but
> if not, it probably should be added here as a word of caution.
>

Yeah, that seems like something beyond this patch? Certainly we should mention
wal_compression in the release notes if the default changes. I mean, I feel
wal_log_hints should probably default to on as well, but I've honestly
never really given it much thought because my fingers are trained to type
"initdb -k". I've been using data checksums for roughly a decade now. I
think the only time I've NOT used checksums was when I was doing checksum
overhead measurements, or hacking on the pg_checksums program.


> I think we usually do not mention when a feature was added/changed, do
> we? So I'd just write "(default: enabled)" or whatever is the style of
> the surrounding options.
>

+1


> > + {"no-data-checksums", no_argument, NULL, 20},
>
> Does it make sense to add -K (capital k) as a short-cut for this? I
> think this is how we distinguish on/off for pg_dump (-t/-T etc.) but
> maybe that is not wider project policy.
>

I'd rather not. Better to keep it explicit rather than some other weird
letter that has no mnemonic value.

Cheers,
Greg


Enable data checksums by default

2024-08-06 Thread Greg Sabino Mullane
Please find attached a patch to enable data checksums by default.

Currently, initdb only enables data checksums if passed the
--data-checksums or -k argument. There was some hesitation years ago when
this feature was first added, leading to the current situation where the
default is off. However, many years later, there is wide consensus that
this is an extraordinarily safe, desirable setting. Indeed, most (if not
all) of the major commercial and open source Postgres systems currently
turn this on by default. I posit you would be hard-pressed to find many
systems these days in which it has NOT been turned on. So basically we have
a de-facto standard, and I think it's time we flipped the switch to make it
on by default.

The patch is simple enough: literally flipping the boolean inside of
initdb.c, and adding a new argument '--no-data-checksums' for those
instances that truly want the old behavior. One place that still needs the
old behavior is our internal tests for pg_checksums and pg_amcheck, so I
added a new argument to init() in PostgreSQL/Test/Cluster.pm to allow those
to still pass their tests.

This is just the default - people are still more than welcome to turn it
off with the new flag. The pg_checksums program is another option that
actually argues for having the default "on", as switching to "off" once
initdb has been run is trivial.

Yes, I am aware of the previous discussions on this, but the world moves
fast - wal compression is better than in the past, vacuum is better now,
and data-checksums being on is such a complete default in the wild, it
feels weird and a disservice that we are not running all our tests like
that.
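
For anyone curious where an existing cluster stands, either of these will
tell you:

$ psql -Atc "SHOW data_checksums"
$ pg_controldata "$PGDATA" | grep -i checksum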

Cheers,
Greg
From 12ce067f5ba64414d1d14c5f2e763d04cdfacd13 Mon Sep 17 00:00:00 2001
From: Greg Sabino Mullane 
Date: Tue, 6 Aug 2024 18:18:56 -0400
Subject: [PATCH] Make initdb enable data checksums by default

---
 doc/src/sgml/ref/initdb.sgml  |  4 ++-
 src/bin/initdb/initdb.c   |  6 -
 src/bin/initdb/t/001_initdb.pl| 31 +--
 src/bin/pg_amcheck/t/003_check.pl |  2 +-
 src/bin/pg_amcheck/t/004_verify_heapam.pl |  2 +-
 src/bin/pg_checksums/t/002_actions.pl |  2 +-
 src/test/perl/PostgreSQL/Test/Cluster.pm  |  7 +
 7 files changed, 41 insertions(+), 13 deletions(-)

diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index bdd613e77f..511f489d34 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -267,12 +267,14 @@ PostgreSQL documentation

 Use checksums on data pages to help detect corruption by the
 I/O system that would otherwise be silent. Enabling checksums
-may incur a noticeable performance penalty. If set, checksums
+may incur a small performance penalty. If set, checksums
 are calculated for all objects, in all databases. All checksum
 failures will be reported in the
 
 pg_stat_database view.
 See  for details.
+As of version 18, checksums are enabled by default. They can be
+disabled by use of --no-data-checksums.

   
  
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index f00718a015..ce7d3e99e5 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -164,7 +164,7 @@ static bool noinstructions = false;
 static bool do_sync = true;
 static bool sync_only = false;
 static bool show_setting = false;
-static bool data_checksums = false;
+static bool data_checksums = true;
 static char *xlog_dir = NULL;
 static int	wal_segment_size_mb = (DEFAULT_XLOG_SEG_SIZE) / (1024 * 1024);
 static DataDirSyncMethod sync_method = DATA_DIR_SYNC_METHOD_FSYNC;
@@ -3121,6 +3121,7 @@ main(int argc, char *argv[])
 		{"waldir", required_argument, NULL, 'X'},
 		{"wal-segsize", required_argument, NULL, 12},
 		{"data-checksums", no_argument, NULL, 'k'},
+		{"no-data-checksums", no_argument, NULL, 20},
 		{"allow-group-access", no_argument, NULL, 'g'},
 		{"discard-caches", no_argument, NULL, 14},
 		{"locale-provider", required_argument, NULL, 15},
@@ -3319,6 +3320,9 @@ main(int argc, char *argv[])
 if (!parse_sync_method(optarg, &sync_method))
 	exit(1);
 break;
+			case 20:
+data_checksums = false;
+break;
 			default:
 /* getopt_long already emitted a complaint */
 pg_log_error_hint("Try \"%s --help\" for more information.", progname);
diff --git a/src/bin/initdb/t/001_initdb.pl b/src/bin/initdb/t/001_initdb.pl
index 06a35ac0b7..64395ec531 100644
--- a/src/bin/initdb/t/001_initdb.pl
+++ b/src/bin/initdb/t/001_initdb.pl
@@ -69,16 +69,11 @@ mkdir $datadir;
 	}
 }
 
-# Control file should tell that data checksums are disabled by default.
+# Control file should tell that data checksums are enabled by default.
 command_like(
 	[ 'pg_c

Normalize queries starting with SET for pg_stat_statements

2024-07-22 Thread Greg Sabino Mullane
I saw a database recently where some app was inserting the source port into
the application_name field, which meant that pg_stat_statements.max was
quickly reached and queries were simply pouring in and out of
pg_stat_statements, dominated by some "SET application_name = 'myapp
10.0.0.1:1234'" calls. Which got me thinking, is there really any value to
having non-normalized 'SET application_name' queries inside of
pg_stat_statements? Or any SET stuff, for that matter?

Attached please find a small proof-of-concept for normalizing/de-jumbling
certain SET queries. Because we only want to cover the VAR_SET_VALUE parts
of VariableSetStmt, a custom jumble func was needed. There are a lot of
funky SET things inside of gram.y as well that don't do the standard SET X
= Y formula (e.g. SET TIME ZONE, SET SCHEMA). I tried to handle those as
best I could, and carved a couple of exceptions for time zones and xml.

I'm not sure where else to possibly draw lines. Obviously calls to time
zone have a small and finite pool of possible values, so easy enough to
exclude them, while things like application_name and work_mem are fairly
infinite, so great candidates for normalizing. One could argue for simply
normalizing everything, as SET is trivially fast for purposes of
performance tracking via pg_stat_statements, so who cares if we don't have
the exact string? That's what regular logging is for, after all. Most
importantly, less unique queryids means less chance that errant SETs will
crowd out the more important stuff.

In summary, we want to change this:

SELECT calls, query from pg_stat_statements where query ~ 'set' order by 1;
 1 | set application_name = 'alice'
 1 | set application_name = 'bob'
 1 | set application_name = 'eve'
 1 | set application_name = 'mallory'

to this:

SELECT calls, query from pg_stat_statements where query ~ 'set' order by 1;
 4 | set application_name = $1

I haven't updated the regression tests yet, until we reach a consensus on
how thorough the normalizing should be. But there is a new test to exercise
the changes in gram.y.

Cheers,
Greg


0001-Normalize-queries-starting-with-SET.patch
Description: Binary data


Re: Set log_lock_waits=on by default

2024-07-19 Thread Greg Sabino Mullane
>
> Are there any others who have an opinion on this?


Big +1 to having it on by default. It's already one of the first things I
turn on by default on any system I come across. The log spam is minimal,
compared to all the other stuff that ends up in there. And unlike most of
that stuff, this is output you generally want and need, when problems start
occurring.
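
For anyone playing along at home, it can be flipped on without a restart:

$ psql -c "ALTER SYSTEM SET log_lock_waits = on"
$ psql -c "SELECT pg_reload_conf()"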

Cheers,
Greg


Re: Send duration output to separate log files

2024-07-12 Thread Greg Sabino Mullane
On Thu, Jul 11, 2024 at 6:47 AM Alastair Turner  wrote:

>  The other category of logging which would benefit from a separate file is
> audit. It also can create massive volumes of log content. Splitting audit
> information off into a separate file for use by a separate team or function
> is also a request I have heard from some financial institutions adopting
> Postgres. With audit being provided by an extension, this would become
> quite an intrusive change.
>

Thanks for the feedback. I agree pgaudit is another thing that can create
massive log files, and should be solved at some point.  However, I wanted
to keep this patch self-contained to in-core stuff. And pgaudit is already
an odd duck, in that it puts CSV into your stderr stream (and into your
json!). Ideally it would put a single CSV stream into a separate csv file.
Perhaps even something that did not necessarily live in log_directory.

Would an extension be able to safely modify the message_type field you have
>> added using emit_log_hook? If so, the field becomes more of a log
>> destination label than a type marker. If an extension could hook into the
>> log file creation/rotation process, that would be a nice basis for enabling
>> extensions (I'm particularly thinking of pgAudit) to manage separate
>> logging destinations.
>>
>
Yes, I had more than duration in mind when I created errmessagetype. A hook
to set it would be the obvious next step, and then some sort of way of
mapping that to arbitrary log files. But I see that as mostly orthogonal to
this patch (and certainly a much larger endeavor).

(wades in anyways). I'm not sure about hooking into the log rotation
process so much as registering something on startup, then letting Postgres
handle all the log files in the queue. Although as I alluded to above,
sometimes having large log files NOT live in the data directory (or more
specifically, not hang out with the log_directory crowd), could be a plus
for space, efficiency, and security reasons. That makes log rotation
harder, however. And do we / should we put extension-driven logfiles into
current_logfiles? Do we still fall back to stderr even for extension logs?
Lots to ponder. :)

Cheers,
Greg


Re: Logging which local address was connected to in log_line_prefix

2024-07-11 Thread Greg Sabino Mullane
Thanks for the review. Please find attached a new version with proper tabs
and indenting.

Cheers,
Greg


0002-Add-local-address-to-log_line_prefix.patch
Description: Binary data


Send duration output to separate log files

2024-07-10 Thread Greg Sabino Mullane
Please find attached a patch to allow for durations to optionally be sent
to separate log files. In other words, rather than cluttering up our
postgres202007.log file with tons of output from
log_min_duration_statement, duration lines are sent instead to the file
postgres202007.duration.

Over the years, durations have been the number one cause of very large log
files, in which the more "important" items get buried in the noise. Also,
programs that are scanning for durations typically do not care about the
normal, non-duration output. Some people have a policy of logging
everything, which in effect means setting log_min_duration_statement to 0,
which in turn makes your log files nearly worthless for spotting
non-duration items. This feature will also be very useful for those who
need to temporarily turn on log_min_duration_statement, for some quick
auditing of exactly what is being run on their database. When done, you can
move or remove the duration file without messing up your existing log
stream.

This only covers the case when both the duration and statement are set on
the same line. In other words, log_min_duration_statement output, but not
log_duration (which is best avoided anyway). It also requires
logging_collector to be on, obviously.

Details:

The edata structure is expanded to have a new message_type, with a matching
function errmessagetype() created.
[include/utils/elog.h]
[backend/utils/elog.c]

Any errors that have both a duration and a statement are marked via
errmessagetype()
[backend/tcop/postgres.c]

A new GUC named "log_duration_destination" is created, which supports any
combination of stderr, csvlog, and jsonlog. It does not need to match
log_destination, in order to support different use cases. For example, the
user wants durations sent to a CSV file for processing by some other tool,
but still wants everything else going to a normal text log file.

Code: [include/utils/guc_hooks.h] [backend/utils/misc/guc_tables.c]
Docs: [sgml/config.sgml]  [backend/utils/misc/postgresql.conf.sample]

Create a new flag called PIPE_PROTO_DEST_DURATION
[include/postmaster/syslogger.h]

Create new flags:
  LOG_DESTINATION_DURATION,
  LOG_DESTINATION_DURATION_CSV,
  LOG_DESTINATION_DURATION_JSON
[include/utils/elog.h]

Routing and mapping LOG_DESTINATION to PIPE_PROTO
[backend/utils/error/elog.c]

Minor rerouting when using alternate forms
[backend/utils/error/csvlog.c]
[backend/utils/error/jsonlog.c]

Create new filehandles, do log rotation, map PIPE_PROTO to LOG_DESTINATION.
Rotation and entry into the "current_logfiles" file are the same as
existing log files. The new names/suffixes are duration, duration.csv, and
duration.json.
[backend/postmaster/syslogger.c]

Testing to ensure combinations of log_destination and
log_duration_destination work as intended
[bin/pg_ctl/meson.build]
[bin/pg_ctl/t/005_log_duration_destination.pl]
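
Putting the pieces together, a typical setup using the new GUC described
above (alongside the existing log_destination) might look something like:

$ cat >> "$PGDATA/postgresql.conf" <<'EOF'
log_destination = 'stderr'              # normal log stream stays plain text
log_duration_destination = 'csvlog'     # durations go to *.duration.csv instead
log_min_duration_statement = 0          # log every statement's duration
EOF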

Questions I've asked along the way, and perhaps other might as well:

What about logging other things? Why just duration?

Duration logging is a very unique event in our logs. There is nothing quite
like it - it's always client-driven, yet automatically generated. And it
can be extraordinarily verbose. Removing it from the existing logging
stream has no particular downsides. Almost any other class of log message
would likely meet resistance as far as moving it to a separate log file,
with good reason.

Why not build a more generic log filtering case?

I looked into this, but it would be a large undertaking, given the way our
logging system works. And as per above, I don't think the pain would be
worth it, as duration covers 99% of the use cases for separate logs.
Certainly, nothing else other than a recurring ERROR from the client can
cause massive bloat in the size of the files. (There is a nearby patch to
exclude certain errors from the log file as a way to mitigate the error
spam - I don't like that idea, but should mention it here as another effort
to keep the log files a manageable size)

Why not use an extension for this?

I did start this as an extension, but it only goes so far. We can use
emit_log_hook, but it requires careful coordination of open filehandles,
has to do inefficient regex of every log message, and cannot do things like
log rotation.

Why not bitmap PIPE_PROTO *and* LOG_DESTINATION?

I tried to do both as simple bitmaps (i.e. duration+csv = duration.csv),
and not have to use e.g. LOG_DESTINATION_DURATION_CSV, but size_rotation_for
ruined it for me. Since our PIPE always sends one thing at a time, a single
new flag enables it to stay as a clean bits8 type.

What about Windows?

Untested. I don't have access to a Windows build, but I think in theory it
should work fine.

Cheers,
Greg
From c6d19d07cfa7eac51e5ead79f1c1b230b5b2d364 Mon Sep 17 00:00:00 2001
From: Greg Sabino Mullane 
Date: Tue, 2 Jul 2024 15:13:44 -0400
Subject: [PATCH] Add new parameter log_duration_destination

This allows writing

Re: Better error message when --single is not the first arg to postgres executable

2024-06-18 Thread Greg Sabino Mullane
If I am reading your patch correctly, we have lost the behavior of least
surprise in which the first "meta" argument overrides all others:

$ bin/postgres --version --boot --extrastuff
postgres (PostgreSQL) 16.2

What about just inlining --version and --help e.g.

else if (strcmp(argv[i], "--version") == 0 || strcmp(argv[i], "-V") == 0)
{
 fputs(PG_BACKEND_VERSIONSTR, stdout);
 exit(0);
}

I'm fine with being more persnickety about the other options; they are much
rarer and not unixy.

However, there's a complication:
> ...
> This remaining discrepancy might be okay, but I was really hoping to reduce
> the burden on users to figure out the correct ordering of options.  The
> situations in which I've had to use single-user mode are precisely the
> situations in which I'd rather not have to spend time learning these kinds
> of details.
>

Yes, that's unfortunate. But I'd be okay with the db-last requirement as
long as the error message is sane and points one in the right direction.

Cheers,
Greg


Re: cost delay brainstorming

2024-06-17 Thread Greg Sabino Mullane
On Mon, Jun 17, 2024 at 3:39 PM Robert Haas  wrote:

> So, a very simple algorithm would be: If the maximum number of workers
> have been running continuously for more than, say,
> 10 minutes, assume we're falling behind


Hmm, I don't know about the validity of this. I've seen plenty of cases
where we hit the max workers but all is just fine. On the other hand, I
don't have an alternative trigger point yet. But I do overall like the idea
of dynamically changing the delay. And agree it is pretty conservative.


> 2. If we decided to gradually increase the rate of vacuuming instead of
> just removing the throttling all at once, what formula would we use
> and why would that be the right idea?


Well, since the idea of disabling the delay is on the table, we could raise
the cost every minute by X% until we effectively reach an infinite cost /
zero delay situation. I presume this would only affect currently running
vacs, and future ones would get the default cost until things get triggered
again?

Cheers,
Greg


Re: RFC: adding pytest as a supported test framework

2024-06-15 Thread Greg Sabino Mullane
On Fri, Jun 14, 2024 at 5:09 PM Jelte Fennema-Nio 
wrote:

> Test::More on the other hand, while indeed still maintained, it's
> definitely not getting significant new feature development or
> improvements[2]. Especially when comparing it to pytest[3].
>

That's fair, although it's a little hard to tell if the lack of new
features is because they are not needed for a stable, mature project, or
because few people are asking for and developing new features. Probably a
bit of both. But I'll be the first to admit Perl is dying; I just don't
know what should replace it (or how - or when). Python has its quirks, but
all languages do, and your claim that it will encourage more and easier
test writing by developers is a good one.

Cheers,
Greg


Re: RFC: adding pytest as a supported test framework

2024-06-15 Thread Greg Sabino Mullane
On Sat, Jun 15, 2024 at 12:48 PM Jelte Fennema-Nio 
wrote:

> Afaict, there's a significant part of our current community who feel the
> same way (and I'm pretty sure every sub-30 year old person who
> newly joins the community would feel the exact same way too).
>

Those young-uns are also the same group who hold their nose when coding in
C, and are always clamoring for rewriting Postgres in Rust. And before
that, C++. And next year, some other popular language that is clearly
better and more popular than C.

And I agree with Robert that Python seems like the best choice for this
> other language, given its current popularity level. But as I said
> before, I'm open to other languages as well.
>

Despite my previous posts, I am open to other languages too, including
Python, but the onus is really on the new language promoters to prove that
the very large amount of time and trouble is worth it, and worth it for
language X.

Cheers,
Greg


Re: RFC: adding pytest as a supported test framework

2024-06-14 Thread Greg Sabino Mullane
On Thu, Jun 13, 2024 at 1:08 PM Jelte Fennema-Nio 
wrote:

> But Perl is at the next level of unmaintained infrastructure. It is
> actually clear how you can contribute to it, but still no new
> community members actually want to contribute to it. Also, it's not
> only unmaintained by us but it's also pretty much unmaintained by the
> upstream community.


I am not happy with the state of Perl, as it has made some MAJOR missteps
along the way, particularly in the last 5 years. But can we dispel this
strawman? There is a difference between "unpopular" and "unmaintained". The
latest version of Perl was released May 20, 2024. The latest release of
Test::More was April 25, 2024. Both are heavily used. Just not as heavily
as they used to be. :)

Cheers,
Greg


Re: RFC: adding pytest as a supported test framework

2024-06-13 Thread Greg Sabino Mullane
On Thu, Jun 13, 2024 at 9:38 AM Robert Haas  wrote:

> I agree with you, but I'm skeptical that solving it will be as easy as
> switching to Python. For whatever reason, it seems like every piece of
> infrastructure that the PostgreSQL community has suffers from severe
> neglect. Literally everything I know of either has one or maybe two
> very senior hackers maintaining it, or no maintainer at all.

...

> All of this stuff is critical project infrastructure and yet it feels like
> nobody wants to work on
> it.


I feel at least some of this is a visibility / marketing problem. I've not
seen any dire requests for help come across on the lists, nor things on the
various todos/road maps/ blog posts people make from time to time. If I
had, I would have jumped in. And for the record, I'm very proficient with
Perl.

Cheers,
Greg


Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs

2024-06-05 Thread Greg Sabino Mullane
On Wed, Jun 5, 2024 at 9:03 PM Robert Haas  wrote:

>  It's a funny use of "max" and "min", because the max is really what we're
> trying to do and the min is what we end
> up with, and those terms don't necessarily bring those ideas to mind.


requested_protocol_version and minimum_protocol_version?


Re: Better error message when --single is not the first arg to postgres executable

2024-06-05 Thread Greg Sabino Mullane
On Wed, Jun 5, 2024 at 3:18 PM Nathan Bossart 
wrote:

> Could we remove the requirement that --single must be first?  I'm not
> thrilled about adding a list of "must be first" options that needs to stay
> updated, but given this list probably doesn't change too frequently, maybe
> that's still better than a more invasive patch to allow specifying these
> options in any order...
>

It would be nice, and I briefly looked into removing the "first"
requirement, but src/backend/tcop/postgres.c for one assumes that --single
is always argv[1], and it seemed not worth the extra effort to make it work
for argv[N] instead of argv[1]. I don't mind it being the first argument,
but that confusing error message needs to go.

Thanks,
Greg


Better error message when --single is not the first arg to postgres executable

2024-06-05 Thread Greg Sabino Mullane
Please find attached a quick patch to prevent this particularly bad error
message for running "postgres", when making the common mistake of
forgetting to put the "--single" option first because you added an earlier
arg (esp. datadir)

Current behavior:

$ ~/pg/bin/postgres -D ~/pg/data --single
2024-06-05 18:30:40.296 GMT [22934] FATAL:  --single requires a value

Improved behavior:

$ ~/pg/bin/postgres -D ~/pg/data --single
--single must be first argument.

I applied it for all the "first arg only" flags (boot, check,
describe-config, and fork), as they suffer the same fate.

Cheers,
Greg


0001-Give-a-more-accurate-error-message-if-single-not-number-one.patch
Description: Binary data


Re: Logging which local address was connected to in log_line_prefix

2024-05-24 Thread Greg Sabino Mullane
Peter, thank you for the feedback. Attached is a new patch with "address"
rather than "interface", plus a new default of "local" if there is no
address. I also removed the questionable comment, and updated the
commitfest title.

Cheers,
Greg
From bfa69fc2fffcb29dee0c6acfa4fc3749f987b272 Mon Sep 17 00:00:00 2001
From: Greg Sabino Mullane 
Date: Fri, 24 May 2024 11:25:48 -0400
Subject: [PATCH] Add local address to log_line_prefix

---
 doc/src/sgml/config.sgml  |  5 
 src/backend/utils/error/elog.c| 25 +++
 src/backend/utils/misc/postgresql.conf.sample |  1 +
 3 files changed, 31 insertions(+)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 698169afdb..d0b5e4d9ea 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7470,6 +7470,11 @@ local0.*/var/log/postgresql
  Remote host name or IP address
  yes
 
+
+ %L
+ Local address
+ yes
+
 
  %b
  Backend type
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index d91a85cb2d..b1525d901c 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -67,6 +67,7 @@
 #endif
 
 #include "access/xact.h"
+#include "common/ip.h"
 #include "libpq/libpq.h"
 #include "libpq/pqformat.h"
 #include "mb/pg_wchar.h"
@@ -79,6 +80,7 @@
 #include "storage/ipc.h"
 #include "storage/proc.h"
 #include "tcop/tcopprot.h"
+#include "utils/builtins.h"
 #include "utils/guc_hooks.h"
 #include "utils/memutils.h"
 #include "utils/ps_status.h"
@@ -3023,6 +3025,29 @@ log_status_format(StringInfo buf, const char *format, ErrorData *edata)
 	appendStringInfoSpaces(buf,
 		   padding > 0 ? padding : -padding);
 break;
+			case 'L':
+if (MyProcPort
+	&& (MyProcPort->laddr.addr.ss_family == AF_INET
+		||
+		MyProcPort->laddr.addr.ss_family == AF_INET6)
+	)
+{
+	Port *port = MyProcPort;
+	char local_host[NI_MAXHOST];
+
+	local_host[0] = '\0';
+
+	if (0 == pg_getnameinfo_all(&port->laddr.addr, port->laddr.salen,
+			   local_host, sizeof(local_host),
+			   NULL, 0,
+			   NI_NUMERICHOST | NI_NUMERICSERV)
+		)
+		appendStringInfo(buf, "%s", local_host);
+}
+else
+	appendStringInfo(buf, "[local]");
+
+break;
 			case 'r':
 if (MyProcPort && MyProcPort->remote_host)
 {
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 83d5df8e46..85a9c59116 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -587,6 +587,7 @@
 	#   %d = database name
 	#   %r = remote host and port
 	#   %h = remote host
+#   %L = local address
 	#   %b = backend type
 	#   %p = process ID
 	#   %P = process ID of parallel group leader
-- 
2.30.2



Re: commitfest.postgresql.org is no longer fit for purpose

2024-05-17 Thread Greg Sabino Mullane
On Fri, May 17, 2024 at 7:11 AM Tomas Vondra 
wrote:

> It does seem to me a part of the solution needs to be helping to get
> those patches reviewed. I don't know how to do that, but perhaps there's
> a way to encourage people to review more stuff, or review stuff from a
> wider range of contributors. Say by treating reviews more like proper
> contributions.


This is a huge problem. I've been in the situation before where I had some
cycles to do a review, but actually finding one to review is
super-difficult. You simply cannot tell without clicking on the link and
wading through the email thread. Granted, it's easy as an
occasional reviewer to simply disregard potential patches if the email
thread is over a certain size, but it's still a lot of work. Having some
sort of summary/status field would be great, even if not everything was
labelled. It would also be nice if simpler patches were NOT picked up by
experienced hackers, as we want to encourage new/inexperienced people, and
having some "easy to review" patches available will help people gain
confidence and grow.


> Long time ago there was a "rule" that people submitting patches are
> expected to do reviews. Perhaps we should be more strict about this.
>

Big -1. How would we even be more strict about this? Public shaming?
Withholding a commit?

Cheers,
Greg


Re: Logging which interface was connected to in log_line_prefix

2024-05-01 Thread Greg Sabino Mullane
Thank you for taking the time to review this. I've attached a new rebased
version, which has no significant changes.


> There is a comment in the patch that states:
>
> /* We do not need clean_ipv6_addr here: just report verbatim */
>
> I am not quite sure what it means, but I am guessing it means that the
> patch does not need to format the IPv6 addresses in any specific way.


Yes, basically correct. There is a kluge (their word, not mine) in
utils/adt/network.c to strip the zone - see the comment for the
clean_ipv6_addr() function in that file. I added the patch comment in case
some future person wonders why we don't "clean up" the ipv6 address, like
other places in the code base do. We don't need to pass it back to anything
else, so we can simply output the correct version, zone and all.

Cheers,
Greg


add_local_interface_to_log_line_prefix.v2.patch
Description: Binary data


Re: Tarball builds in the new world order

2024-04-24 Thread Greg Sabino Mullane
On Tue, Apr 23, 2024 at 6:06 PM Tom Lane  wrote:

> This change seems like a good thing anyway for anyone who's tempted
> to use "make dist" manually, since they wouldn't necessarily want
> to package HEAD either.  Now, if we just do it exactly like that
> then trying to "make dist" without setting PG_COMMIT_HASH will
> fail, since "git archive" has no default for its <tree-ish>
> argument.  I can't quite decide if that's a good thing, or if we
> should hack the makefile a little further to allow PG_COMMIT_HASH
> to default to HEAD.
>

Just having it fail seems harsh. What if we had plain "make dist" at least
output a friendly hint about "please specify a hash"? That seems better
than an implicit HEAD default, as they can manually set it to HEAD
themselves per the hint.

Cheers,
Greg


Re: psql: Greatly speed up "\d tablename" when not using regexes

2024-04-10 Thread Greg Sabino Mullane
Patch looks good to me. Great idea overall, that forced regex has always
bugged me.

+   char   *regexChars = "|*+?()[]{}.^$\\";


One super minor optimization is that we technically do not need to scan for
')' and ']'. If they appear without their partner, the query will fail
anyway. :)
Cheers,
Greg


Re: Security lessons from liblzma

2024-04-04 Thread Greg Sabino Mullane
>
> It would be better if we created the required test files as part of the
> test run.  (Why not?  Too slow?)  Alternatively, I have been thinking
> that maybe we could make the output more reproducible by messing with
> whatever random seed OpenSSL uses.  Or maybe use a Python library to
> create the files.  Some things to think about.
>

I think this last idea is the way to go. I've hand-crafted GIF images and
PGP messages in the past; surely we have enough combined brain power around
here to craft our own SSL files? It may even be a wheel that someone has
invented already.

Cheers,
Greg


Re: Reports on obsolete Postgres versions

2024-04-04 Thread Greg Sabino Mullane
On Thu, Apr 4, 2024 at 2:23 PM Bruce Momjian  wrote:

> -end-of-life (EOL) and no longer supported.
> +after its initial release. After this, a final minor version will be
> released
> +and the software will then be unsupported (end-of-life).


Would be a shame to lose the EOL acronym.

> +Such upgrades might require manual changes to complete so always read
> +the release notes first.


Proposal:
"Such upgrades might require additional steps, so always read the release
notes first."

> I went with "frequently-encountered and low risk bugs".
>

But neither of those classifications are really true. Especially the "low
risk" part - I could see various ways a reader could wrongly interpret that.

Cheers,
Greg


Re: On disable_cost

2024-04-03 Thread Greg Sabino Mullane
On Wed, Apr 3, 2024 at 3:21 PM Robert Haas  wrote:

> It's also pretty clear to me that the fact that enable_indexscan
> and enable_indexonlyscan work completely differently from each other
> is surprising at best, wrong at worst, but here again, what this patch
> does about that is not above reproach.


Yes, that is wrong, surely there is a reason we have two vars. Thanks for
digging into this: if nothing else, the code will be better for this
discussion, even if we do nothing for now with disable_cost.

Cheers,
Greg


Re: On disable_cost

2024-04-02 Thread Greg Sabino Mullane
On Mon, Apr 1, 2024 at 7:54 PM Robert Haas  wrote:

> What I think we're mostly doing in the regression tests is shutting
> off every relevant type of plan except one. I theorize that what we
> actually want to do is tell the planner what we do want to happen,
> rather than what we don't want to happen, but we've got this weird set
> of GUCs that do the opposite of that and we're super-attached to them
> because they've existed forever.


So rather than listing all the things we don't want to happen, we need a
way to force (nay, highly encourage) a particular solution. As our costing
is a based on positive numbers, what if we did something like this in
costsize.c?

Cost    disable_cost = 1.0e10;
Cost    promotion_cost = 1.0e10; /* or higher or lower, depending on how
                                    strongly we want to "beat" disable_cost's
                                    effects */
...

if (!enable_seqscan)
    startup_cost += disable_cost;
else if (promote_seqscan)
    startup_cost -= promotion_cost;  /* or replace "promote" with "encourage"? */


Cheers,
Greg


Re: Possibility to disable `ALTER SYSTEM`

2024-03-27 Thread Greg Sabino Mullane
>
> The purpose of the setting is to prevent accidental
> modifications via ALTER SYSTEM in environments where


The emphasis on 'accidental' seems a bit heavy here, and odd. Surely, just
"to prevent modifications via ALTER SYSTEM in environments where..." is
enough?

Cheers,
Greg


Re: Adding comments to help understand psql hidden queries

2024-03-22 Thread Greg Sabino Mullane
On Fri, Mar 22, 2024 at 11:39 AM David Christensen 
wrote:

> I think it's easier to keep the widths balanced than constant (patch
> version included here)


Yeah, I'm fine with that, especially because nobody is translating it, nor
are they likely to, to be honest.

Cheers,
Greg


Re: Adding comments to help understand psql hidden queries

2024-03-22 Thread Greg Sabino Mullane
On Thu, Mar 21, 2024 at 6:20 PM Peter Eisentraut 
wrote:

> lines are supposed to align vertically.  With your patch, the first line
> would have variable length depending on the command.
>

Yes, that is a good point. Aligning those would be quite tricky, what if we
just kept a standard width for the closing query? Probably the 24 stars we
currently have to match "QUERY", which it appears nobody has changed for
translation purposes yet anyway. (If I am reading the code correctly, it
would be up to the translators to maintain the vertical alignment).

Cheers,
Greg


Re: Avoiding inadvertent debugging mode for pgbench

2024-03-20 Thread Greg Sabino Mullane
My mistake. Attached please find version 3, which should hopefully make
cfbot happy again.


pgbench.dash.d.or.not.dash.d.v3.patch
Description: Binary data


Re: Possibility to disable `ALTER SYSTEM`

2024-03-20 Thread Greg Sabino Mullane
>
> As a bonus, if that GUC is set, we could even check at server startup that
> all the configuration files are not writable by the postgres user,
> and print a warning or refuse to start up if they are.
>

Ugh, please let's not do this. This was bouncing around in my head last
night, and this is really a quite radical change - especially just to
handle the given ask, which is to prevent a specific command from running.
Not implement a brand new security system. There are so many ways this
could go wrong if we start having separate permissions for some of our
files. In addition to backups and other tools that need to write to the
conf files as the postgres user, what about systems that create a new
cluster automatically e.g. Patroni? It will now need elevated privs just to
create the conf files and assign the new ownership to them. Lots of moving
pieces there and ways things could go wrong. So a big -1 from me, as they
say. :)

Cheers,
Greg


Re: Avoiding inadvertent debugging mode for pgbench

2024-03-19 Thread Greg Sabino Mullane
Rebased version attached (v2), with another sentence in the sgml to explain
the optional use of -d

Cheers,
Greg


pgbench.dash.d.or.not.dash.d.v2.patch
Description: Binary data


Re: Possibility to disable `ALTER SYSTEM`

2024-03-19 Thread Greg Sabino Mullane
On Tue, Mar 19, 2024 at 12:05 PM Tom Lane  wrote:

> If you aren't willing to build a solution that blocks off mods
> using COPY TO FILE/PROGRAM and other readily-available-to-superusers
> tools (plpythonu for instance), I think you shouldn't bother asking
> for a feature at all.  Just trust your superusers.
>

There is a huge gap between using a well-documented standard tool like
ALTER SYSTEM and going out of your way to modify the configuration files
through trickery. I think we need to only solve the former as in "hey,
please don't do that because your changes will be overwritten"

Cheers,
Greg


Re: Possibility to disable `ALTER SYSTEM`

2024-03-18 Thread Greg Sabino Mullane
Going to agree with Robert Treat here about an extension being a great
solution. I resisted posting earlier as I wanted to see how this all pans
out, but I wrote a quick little POC extension some months ago that does
the disabling and works well (and cannot be easily worked around).

On Mon, Mar 18, 2024 at 4:59 PM Robert Haas  wrote:

> I think that all of this is true except for (c). I think we'd need a
> new hook to make it work.
>

Seems we can just use ProcessUtility and:
if (IsA(parsetree, AlterSystemStmt)) { ereport(ERROR, ...

When we know that a feature is
> widely-needed, it's better to have one good implementation of it in
> core than several perhaps not-so-good implementations out of core.
>

Meh, maybe. This one seems pretty dirt simple. Granted, I have expanded my
original POC to allow *some* things to be changed by ALTER SYSTEM, but the
original use case warrants a very small extension.

That allows us to focus all of our efforts on that one implementation
> instead of splitting them across several -- which is the whole selling
> point of open source, really -- and it makes it easier for users who
> want the feature to get access to it.
>

Well, yeah, but they have to wait until version 18 at best, while an
extension can run on any current version and probably be pretty
future-proof as well.

Cheers,
Greg


Re: Reports on obsolete Postgres versions

2024-03-11 Thread Greg Sabino Mullane
On Mon, Mar 11, 2024 at 4:38 PM Bruce Momjian  wrote:

> https://www.postgresql.org/support/versioning/
>
> This web page should correct the idea that "upgrades are more risky than
> staying with existing versions".  Is there more we can do?  Should we have
> a more consistent response for such reporters?
>

It could be helpful to remove this sentence:

"Upgrading to a minor release does not normally require a dump and restore"

While technically true, "not normally" is quite the understatement, as the
true answer is "never" or at least "not in the last few decades". I have a
hard time even imagining a scenario that would require a minor revision to
do a dump and restore - surely, that in itself would warrant a major
release?


> It would be a crazy idea to report something in the logs if a major
> version is run after a certain date, since we know the date when major
> versions will become unsupported.
>

Could indeed be useful to spit something out at startup. Heck, even minor
versions are fairly regular now. Sure would be nice to be able to point a
client at the database and say "See? Even Postgres itself thinks you should
upgrade from 11.3!!" (totally made up example, not at all related to an
actual production system /sarcasm)

Cheers,
Greg


Logging which interface was connected to in log_line_prefix

2024-03-06 Thread Greg Sabino Mullane
Someone on -general was asking about this, as they are listening on
multiple IPs and would like to know which exact one clients were hitting. I
took a quick look and we already have that information, so I grabbed some
stuff from inet_server_addr and added it as part of a "%L" (for 'local
interface'). Quick patch / POC attached.
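
For testing, something like this, with %L being the escape the attached
patch adds:

$ psql -c "ALTER SYSTEM SET log_line_prefix = '%m [%p] %h -> %L '"
$ psql -c "SELECT pg_reload_conf()"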

Cheers,
Greg


add_local_interface_to_log_line_prefix.v1.patch
Description: Binary data


Re: Reducing the log spam

2024-03-06 Thread Greg Sabino Mullane
On Tue, Mar 5, 2024 at 7:55 AM Laurenz Albe 
wrote:

> My experience from the field is that a lot of log spam looks like
>
>   database/table/... "xy" does not exist
>   duplicate key value violates unique constraint "xy"


Forcibly hiding those at the Postgres level seems a heavy hammer for what
is ultimately an application problem.

Tell me about a system that logs different classes of errors to different
log files, and I'm interested again.

Cheers,
Greg


Avoiding inadvertent debugging mode for pgbench

2024-02-29 Thread Greg Sabino Mullane
Attached please find a patch to adjust the behavior of the pgbench program
and make it behave like the other programs that connect to a database
(namely, psql and pg_dump). Specifically, add support for using -d and
--dbname to specify the name of the database. This means that -d can no
longer be used to turn on debugging mode, and the long option --debug must
be used instead.

This removes a long-standing footgun, in which people assume that the -d
option behaves the same as other programs. Indeed, because it takes no
arguments, and because the first non-option argument is the database name,
it still appears to work. However, people then wonder why pgbench is so
darn verbose all the time! :)

This is a breaking change, but fixing it this way seems to have the least
total impact, as the number of people using the debug mode of pgbench is
likely quite small. Further, those already using the long option are
unaffected, and those using the short one simply need to replace '-d' with
'--debug', arguably making their scripts a little more self-documenting in
the process.

Cheers,
Greg


pgbench.dash.d.or.not.dash.d.v1.patch
Description: Binary data


Re: An improved README experience for PostgreSQL

2024-02-28 Thread Greg Sabino Mullane
+1 on the general idea. Maybe make that COPYRIGHT link go to an absolute
URI, like all the other links, in case this file gets copied somewhere?
Perhaps point it to https://www.postgresql.org/about/licence/
Cheers,
Greg


Re: What about Perl autodie?

2024-02-08 Thread Greg Sabino Mullane
>
> 2. Don't wait, migrate them all now.  This would mean requiring
> Perl 5.10.1 or later to run the TAP tests, even in back branches.
>

#2 please. For context, meson did not even exist in 2009.

Cheers,
Greg


Re: What about Perl autodie?

2024-02-07 Thread Greg Sabino Mullane
On Wed, Feb 7, 2024 at 9:05 AM Peter Eisentraut 
wrote:

> I came across the Perl autodie pragma
> (https://perldoc.perl.org/autodie).  This seems pretty useful; is this
> something we can use?  Any drawbacks?  Any minimum Perl version?


Big +1

No drawbacks. I've been using it heavily for many, many years. Came out in
5.10.1,
which should be available everywhere at this point (2009 was the year of
release)

Cheers,
Greg


Adding comments to help understand psql hidden queries

2023-12-11 Thread Greg Sabino Mullane
The use of the --echo-hidden flag in psql is used to show people the way
psql performs its magic for its backslash commands. None of them has more
magic than "\d relation", but it suffers from needing a lot of separate
queries to gather all of the information it needs. Unfortunately, those
queries can get overwhelming and hard to figure out which one does what,
especially for those not already very familiar with the system catalogs.
Attached is a patch to add a small SQL comment to the top of each SELECT
query inside describeOneTableDetail. All other functions use a single
query, and thus need no additional context. But "\d mytable" has the
potential to run over a dozen SQL queries! The new format looks like this:

/******** QUERY *********/
/* Get information about row-level policies */
SELECT pol.polname, pol.polpermissive,
  CASE WHEN pol.polroles = '{0}' THEN NULL ELSE
pg_catalog.array_to_string(array(select rolname from pg_catalog.pg_roles
where oid = any (pol.polroles) order by 1),',') END,
  pg_catalog.pg_get_expr(pol.polqual, pol.polrelid),
  pg_catalog.pg_get_expr(pol.polwithcheck, pol.polrelid),
  CASE pol.polcmd
WHEN 'r' THEN 'SELECT'
WHEN 'a' THEN 'INSERT'
WHEN 'w' THEN 'UPDATE'
WHEN 'd' THEN 'DELETE'
END AS cmd
FROM pg_catalog.pg_policy pol
WHERE pol.polrelid = '134384' ORDER BY 1;
/**************************/
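
To see this output yourself, start psql with --echo-hidden (or -E), or flip
the variable inside an existing session before running the backslash command:

\set ECHO_HIDDEN on
\d mytable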

Cheers,
Greg


psql.echo.hidden.comments.v1.patch
Description: Binary data


Adding ordering to list of available extensions

2023-10-30 Thread Greg Sabino Mullane
Please find attached a patch to provide some basic ordering to the system
views pg_available_extensions and pg_available_extension_versions. It is
sorely tempting to add ORDER BYs to many of the other views in that file,
but I understand that would be contentious as there are reasons for not
adding an ORDER BY. However, in the case of pg_available_extensions, it's a
very, very small resultset, with an obvious default ordering, and extremely
unlikely to be a part of a larger complex query. It's much more likely
people like myself are just doing a "SELECT * FROM pg_available_extensions"
and then get annoyed at the random ordering.
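
For reference, this is roughly what the patch gives you by default (ordering
by name, the obvious choice), versus typing it out by hand today:

SELECT name, default_version, installed_version
FROM pg_available_extensions
ORDER BY name;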

Cheers,
Greg


show_extensions_in_natural_order.pg.patch
Description: Binary data


Re: [PATCH] Add inline comments to the pg_hba_file_rules view

2023-09-26 Thread Greg Sabino Mullane
Also a reluctant -1, as the comment-at-EOL style is very rare in my
experience over the years of seeing many a pg_hba file.


Re: Make --help output fit within 80 columns per line

2023-09-18 Thread Greg Sabino Mullane
On Fri, Sep 15, 2023 at 11:11 AM torikoshia 
wrote:

> I do not intend to adhere to this rule(my terminals are usually bigger
> than 80 chars per line), but wouldn't it be a not bad direction to use
> 80 characters for all commands?
>

Well, that's the question du jour, isn't it? The 80 character limit is
based on punch cards, and really has no place in modern systems. While gnu
systems are stuck in the past, many other ones have moved on to more
sensible defaults:

$ wget --help | wc -L
110

$ gcloud --help | wc -L
122

$ yum --help | wc -L
122

git is an interesting one, as they force things through a pager for their
help, but if you look at their raw help text files, they have plenty of
times they go past 80 when needed:

$ wc -L git/Documentation/git-*.txt | sort -g | tail -20
109 git-filter-branch.txt
109 git-rebase.txt
116 git-diff-index.txt
116 git-http-fetch.txt
117 git-restore.txt
122 git-checkout.txt
122 git-ls-tree.txt
129 git-init-db.txt
131 git-push.txt
132 git-update-ref.txt
142 git-maintenance.txt
144 git-interpret-trailers.txt
146 git-cat-file.txt
148 git-repack.txt
161 git-config.txt
162 git-notes.txt
205 git-stash.txt
251 git-submodule.txt

So in summary, I think 80 is a decent soft limit, but let's not stress out
about some lines going over that, and make a hard limit of perhaps 120.

See also: https://hilton.org.uk/blog/source-code-line-length

Cheers,
Greg

P.S. I know this won't change anything right away, but it will get the
conversation started, so we can escape the inertia of punch cards / VT100
terminals someday. :)


Re: Make --help output fit within 80 columns per line

2023-09-13 Thread Greg Sabino Mullane
On Tue, Jul 4, 2023 at 9:47 PM torikoshia 
wrote:

> Since it seems preferable to have consistent line break policy and some
> people use 80-column terminal, wouldn't it be better to make all commands
> in 80
> columns per line?
>

All this seems an awful lot of work to support this mythical 80-column
terminal user. It's 2023, perhaps it's time to widen the default assumption
past 80 characters?

Cheers,
Greg


Re: Possibility to disable `ALTER SYSTEM`

2023-09-13 Thread Greg Sabino Mullane
Seems to be some resistance to getting this in core, so why not just use an
extension? I was able to create a quick POC to do just that. Hook into PG
and look for AlterSystemStmt, throw a "Sorry, ALTER SYSTEM is not currently
allowed" error. Put into shared_preload_libraries and you're done. As a
bonus, works on all supported versions, so no need to wait for Postgres 17
- or Postgres 18/19 given the feature drift this thread is experiencing :)
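
With the library in shared_preload_libraries, the effect is simply this (the
particular GUC here is arbitrary):

postgres=# ALTER SYSTEM SET work_mem = '64MB';
ERROR:  Sorry, ALTER SYSTEM is not currently allowed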

Cheers,
Greg


Re: Prevent psql \watch from running queries that return no rows

2023-08-22 Thread Greg Sabino Mullane
Thank you for the feedback, everyone. Attached is version 4 of the patch,
featuring a few tests and minor rewordings.

Cheers,
Greg


psql_watch_exit_on_zero_rows_v4.patch
Description: Binary data


Re: Improve pg_stat_statements by making jumble handle savepoint names better

2023-07-25 Thread Greg Sabino Mullane
On Mon, Jul 24, 2023 at 6:46 PM Michael Paquier  wrote:

> Shouldn't this new field be marked as query_jumble_location
>

Yes, it should. I had some trouble getting it to work that way in the first
place, but now I realize it was just my unfamiliarity with this part of the
code. So thanks for the hint: v2 of the patch is much simplified by adding
two attributes to the TransactionStmt node. I've also added some tests per
your suggestion.

Unrelated to this patch, I'm struggling with meson testing. Why doesn't
this update the postgres test binary?:

meson test --suite pg_stat_statements

It runs "ninja" as expected, but it does not put a new
build/tmp_install/home/greg/pg/17/bin/postgres in place until I do a "meson
test"

Cheers,
Greg


savepoint_jumble.v2.patch
Description: Binary data


Improve pg_stat_statements by making jumble handle savepoint names better

2023-07-24 Thread Greg Sabino Mullane
Please find attached a patch to jumble savepoint names, to prevent certain
transactional commands from filling up pg_stat_statements.
This has been a problem with some busy systems that use Django, which likes
to wrap everything in uniquely named savepoints. Soon, over 50% of
your pg_stat_statements buffer is filled with savepoint stuff, pushing out
the more useful queries. As each query is unique, it looks like this:

postgres=# select calls, query, queryid from pg_stat_statements where query
~ 'save|release|rollback' order by 2;

 calls |query |   queryid
---+--+--
 1 | release b900150983cd24fb0d6963f7d28e17f7 |  8797482500264589878
 1 | release ed9407630eb1000c0f6b63842defa7de | -9206510099095862114
 1 | rollback | -2049453941623996126
 1 | rollback to c900150983cd24fb0d6963f7d28e17f7 | -5335832667999552746
 1 | savepoint b900150983cd24fb0d6963f7d28e17f7   | -117254996647181
 1 | savepoint c47bce5c74f589f4867dbd57e9ca9f80   |   355123032993044571
 1 | savepoint c900150983cd24fb0d6963f7d28e17f7   | -5921314469994822125
 1 | savepoint d8f8e0260c64418510cefb2b06eee5cd   |  -981090856656063578
 1 | savepoint ed9407630eb1000c0f6b63842defa7de   |   -25952890433218603
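
For context, the pattern generating these entries boils down to the
following, with a freshly generated name on every call (names here are
just illustrative):

BEGIN;
SAVEPOINT s_b900150983cd24fb;
-- ... application work ...
RELEASE SAVEPOINT s_b900150983cd24fb;
COMMIT;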

As the actual name of the savepoint is not particularly useful, the patch
will basically ignore the savepoint name and allow things to be collapsed:

 calls | query  |   queryid
---++--
 2 | release $1 | -7998168840889089775
 1 | rollback   |  3749380189022910195
 1 | rollback to $1 | -1816677871228308673
 5 | savepoint $1   |  6160699978368237767

Without the patch, the only solution is to keep raising
pg_stat_statements.max to larger and larger values to compensate for the
pollution of the
statement pool.

Cheers,
Greg


savepoint_jumble.v1.patch
Description: Binary data


Re: Forgive trailing semicolons inside of config files

2023-07-11 Thread Greg Sabino Mullane
On Tue, Jul 11, 2023 at 11:04 AM Isaac Morland 
wrote:

> Please, no!
>
> There is no end to accepting sloppy syntax. What next, allow "SET
> random_page_cost = 2.5;" (with or without semicolon) in config files?
>

Well yes, there is an end. A single, trailing semicolon. Full stop. It's
not a slippery slope in which we end up asking the AI parser to interpret
our haikus to derive the actual value. The postgresql.conf file is not some
finicky YAML/JSON beast - we already support some looseness in quoting or
not quoting values, optional whitespace, etc. Think of the trailing
semicolon as whitespace, if you like. You can see from the patch that this
does not replace EOL/EOF.


> I'd be more interested in improvements in visibility of errors. For
> example, maybe if I try to start the server and there is a config file
> problem, I could somehow get a straightforward error message right in the
> terminal window complaining about the line of the configuration which is
> wrong.
>

That ship has long since sailed. We already send a detailed error message
with the line number, but in today's world of "service start", "systemctl
start", and higher level of control such as Patroni and Kubernetes, getting
things to show in a terminal window isn't happening. We can't work around
2>&1.


> Or maybe there could be a "check configuration" subcommand which checks
> the configuration.
>

There are things inside of Postgres once it has started, but yeah,
something akin to visudo would be nice for editing config files.


> But I think it would be way more useful than a potentially never-ending
> series of patches to liberalize the config parser.
>

It's a single semicolon, not a sign of the parser apocalypse. I've no plans
for future enhancements, but if they do no harm and make Postgres more user
friendly, I will support them.

Cheers,
Greg


Forgive trailing semicolons inside of config files

2023-07-11 Thread Greg Sabino Mullane
This has been a long-standing annoyance of mine. Who hasn't done something
like this?:

psql> SET random_page_cost = 2.5;
(do some stuff, realize that rpc was too high)

Let's put that inside of postgresql.conf:

#------------------------------------------------------------------------------
# CUSTOMIZED OPTIONS
#------------------------------------------------------------------------------

# Add settings for extensions here

random_page_cost = 2.5;


Boom! Server will not start. Surely, we can be a little more liberal in
what we accept? Attached patch allows a single trailing semicolon to be
silently discarded. As this parsing happens before the logging collector
starts up, the error about the semicolon is often buried somewhere in a
separate logfile or journald - so let's just allow postgres to start up
since there is no ambiguity about what random_page_cost (or any other GUC)
is meant to be set to.

I also considered doing an additional ereport(LOG) when we find one, but it
seemed better on reflection to simply ignore it.

Cheers,
Greg


postgresql.conf_allow_trailing_semicolon.v1.patch
Description: Binary data


Re: Prevent psql \watch from running queries that return no rows

2023-07-05 Thread Greg Sabino Mullane
Thanks for the feedback!

On Wed, Jul 5, 2023 at 5:51 AM Daniel Gustafsson  wrote:

>
> The comment on ExecQueryAndProcessResults() needs to be updated with an
> explanation of what this parameter is.
>

I added a comment in the place where min_rows is used, but not sure what
you mean by adding it to the main comment at the top of the function?  None
of the other args are explained there, even the non-intuitive ones  (e.g.
svpt_gone_p)

-   return cancel_pressed ? 0 : success ? 1 : -1;
> +   return (cancel_pressed || return_early) ? 0 : success ? 1 : -1;
>
> I think this is getting tangled up enough that it should be replaced with
> separate if() statements for the various cases.
>

Would like to hear others weigh in; I think it's still only three states
plus a default, so I'm not convinced it warrants multiple statements yet. :)

+   HELP0("  \\watch [[i=]SEC] [c=N] [m=ROW]\n");
> +   HELP0("  execute query every SEC seconds,
> up to N times\n");
> +   HELP0("  stop if less than ROW minimum
> rows are rerturned\n");
>
> "less than ROW minimum rows" reads a bit awkward IMO, how about calling it
> [m=MIN] and describe as "stop if less than MIN rows are returned"?  Also,
> there
> is a typo: s/rerturned/returned/.
>

Great idea: changed and will attach a new patch

Cheers,
Greg


psql_watch_exit_on_zero_rows_v3.patch
Description: Binary data


Re: Bytea PL/Perl transform

2023-06-22 Thread Greg Sabino Mullane
>
> So I decided to propose a simple transform extension to pass bytea as
> native Perl octet strings.


Quick review, mostly housekeeping things:

* Needs a rebase, minor failure on Mkvcbuild.pm
* Code needs standardized formatting, esp. bytea_plperl.c
* Needs to be meson-i-fied (i.e. add a "meson.build" file)
* Do all of these transforms need to be their own contrib modules? So much
duplicated code across contrib/*_plperl already (and *plpython too for that
matter) ...

Cheers,
Greg


Re: Bypassing shared_buffers

2023-06-17 Thread Greg Sabino Mullane
On Thu, Jun 15, 2023 at 4:16 AM Vladimir Churyukin 
wrote:

> We're trying to see what is the worst performance in terms of I/O, i.e.
>> when the database just started up or the data/indexes being queried are not
>> cached at all.
>
>
You could create new tables that are copies of the existing ones (CREATE
TABLE foo as SELECT * FROM ...), create new indexes, and run a query on
those. Use schemas and search_path to keep the queries the same. No restart
needed! (just potentially lots of I/O, time, and disk space :) Don't forget
to do explain (analyze, buffers) to double check things.
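
A minimal sketch of that approach (schema, table, and column names are
illustrative):

CREATE SCHEMA cold;
CREATE TABLE cold.foo AS SELECT * FROM public.foo;  -- fresh copy of the data
CREATE INDEX ON cold.foo (id);                      -- recreate needed indexes
SET search_path = cold, public;
EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM foo WHERE id < 1000;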


Re: Let's make PostgreSQL multi-threaded

2023-06-08 Thread Greg Sabino Mullane
On Thu, Jun 8, 2023 at 8:44 AM Hannu Krosing  wrote:

> Do we have any statistics for the distribution of our user base ?
>
> My gut feeling says that for performance-critical use the non-Linux is
> in low single digits at best.
>

Stats are probably not possible, but based on years of consulting, as well
as watching places like SO, Slack, IRC, etc. over the years, IMO that's a
very accurate gut feeling. I'd hazard 1% or less for non-Linux systems.

Cheers,
Greg


Re: Prevent psql \watch from running queries that return no rows

2023-06-04 Thread Greg Sabino Mullane
On Sat, Jun 3, 2023 at 5:58 PM Michael Paquier  wrote:

>
> Wouldn't something like a target_rows be more flexible?  You could use
> this parameter with a target number of rows to expect, zero being one
> choice in that.
>

Thank you! That does feel better to me. Please see attached a new v2 patch
that uses
a min_rows=X syntax (defaults to 0). Also added some help.c changes.
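
For example, to poll a progress view every two seconds and stop once it
comes back empty (option spelling per this version of the patch; later
revisions shorten it to m=):

SELECT * FROM pg_stat_progress_create_index;
\watch i=2 min_rows=1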

Cheers,
Greg


psql_watch_exit_on_zero_rows_v2.patch
Description: Binary data


Prevent psql \watch from running queries that return no rows

2023-06-02 Thread Greg Sabino Mullane
Attached is a patch to allow a new behavior for the \watch command in psql.
When enabled, this instructs \watch to stop running once the query returns
zero rows. The use case is the scenario in which you are watching the
output of something long-running such as pg_stat_progress_create_index, but
once it finishes you don't need thousands of runs showing empty rows from
the view.

This adds a new argument "zero" to the existing i=SEC and c=N arguments.

Notes:

* Not completely convinced of the name "zero" (better than
"stop_when_no_rows_returned"). Considered adding a new x=y argument, or
overloading c (c=-1) but neither seemed very intuitive. On the other hand,
it's tempting to stick to a single method moving forward, although this is
a boolean option, not an x=y one like the other two.

* Did not update help.c on purpose - no need to make \watch span two lines
there.

* Considered leaving early (e.g. don't display the last empty result), but it
seemed better to show the final empty result as an explicit confirmation as
to why it stopped.

* Quick way to test:
select * from pg_stat_activity where backend_start > now() - '20
seconds'::interval;
\watch zero

Cheers,
Greg


psql_watch_exit_on_zero_rows_v1.patch
Description: Binary data


Re: PostgreSQL 14 press release draft

2021-09-22 Thread Greg Sabino Mullane
Some super quick nitpicks; feel free to ignore/apply/laugh off.

...
administrators to deploy their data-backed applications. PostgreSQL
continues to
add innovations on complex data types, including more conveniences for
accessing
JSON and support for noncontiguous ranges of data. This latest release also
adds
to PostgreSQL's trend on improvements for high performance and distributed
data workloads, with advances in support for connection concurrency,
high-write
workloads, query parallelism and logical replication.

>> add innovations on complex data types -> add innovations to? "on" sounds
odd

>> comma after "query parallelism" please

now works. This aligns PostgreSQL with commonly recognized syntax for
retrieving information from JSON data. The subscripting framework added to
PostgreSQL 14 can be generally extended to other nested data structures,
and is
also applied to the `hstore` data type in this release.

>> with commonly recognized syntax -> with the commonly recognized syntax

>> hyperlink hstore?


[Range types](https://www.postgresql.org/docs/14/rangetypes.html), also
first
released in PostgreSQL 9.2, now have support for noncontiguous ranges
through
the introduction of the "[multirange](
https://www.postgresql.org/docs/14/rangetypes.html#RANGETYPES-BUILTIN)".
A multirange is an ordered list of ranges that are nonoverlapping, which
allows
for developers to write simpler queries for dealing with complex sequences
of

>> introduction of the multirange -> introduction of the multirange type

>> which allows for developers to write -> which lets developers write


on the recent improvements to the overall management of B-tree indexes by

>> to the overall management -> to the management

operations. As this is a client-side feature, you can use pipeline mode
with any
modern PostgreSQL database so long as you use the version 14 client.

>> more complicated than "version 14 client", more like - if the
application has explicit
>> support for it and was compiled via libpq against PG 14.
>> Just don't want to overpromise here

[Foreign data wrappers](
https://www.postgresql.org/docs/14/sql-createforeigndatawrapper.html),
which are used for working with federated workloads across PostgreSQL and
other

>> which are used for -> used for

In addition to supporting query parallelism, `postgres_fdw` can now also
bulk
insert data on foreign tables and import table partitions with the
[`IMPORT FOREIGN SCHEMA`](
https://www.postgresql.org/docs/14/sql-importforeignschema.html)
directive.

>> can now also bulk insert -> can now bulk insert

>> on foreign tables and import -> on foreign tables, and can import

PostgreSQL 14 extends its performance gains to the [vacuuming](
https://www.postgresql.org/docs/14/routine-vacuuming.html)
system, including optimizations for reducing overhead from B-Trees.

>> B-Tree or B-tree - pick one (latter used earlier in this doc)

lets you uniquely track a query through several PostgreSQL systems,
including
[`pg_stat_activity`](
https://www.postgresql.org/docs/14/monitoring-stats.html#MONITORING-PG-STAT-ACTIVITY-VIEW
),
[`EXPLAIN VERBOSE`](https://www.postgresql.org/docs/14/sql-explain.html),
and
through several logging functions.

>> "several PostgreSQL systems" sounds weird.

>> "several logging functions" - not sure what this means


parallel queries when using the `RETURN QUERY` directive, and enabling

>> directive -> command

now benefit from incremental sorts, a feature that was introduced in
[PostgreSQL 13](
https://www.postgresql.org/about/news/postgresql-13-released-2077/).

>> that was introduced in -> introduced in

function. This release also adds the SQL conforming
[`SEARCH`](
https://www.postgresql.org/docs/14/queries-with.html#QUERIES-WITH-SEARCH)
and [`CYCLE`](
https://www.postgresql.org/docs/14/queries-with.html#QUERIES-WITH-CYCLE)
directives to help with ordering and cycle detection for recursive
[common table expressions](
https://www.postgresql.org/docs/14/queries-with.html#QUERIES-WITH-RECURSIVE
).

>> directives -> clauses


PostgreSQL 14 makes it convenient to assign read-only and write-only
privileges

>> convenient -> easy

companies and organizations. Built on over 30 years of engineering,
starting at

>> companies -> companies,

>> we claim 25 years at the top, 30 here. That's 5 non-open-source years?
:)


Re: Add option --drop-cascade for pg_dump/restore

2021-08-12 Thread Greg Sabino Mullane
On Wed, Aug 11, 2021 at 10:53 PM Wu Haotian  wrote:

> Maybe we can add checks like "option --clean requires plain text format"?
> If so, should I start a new mail thread for this?
>

Shrug. To me, that seems related enough it could go into the existing
patch/thread.

Cheers,
Greg


Re: make MaxBackends available in _PG_init

2021-08-12 Thread Greg Sabino Mullane
On Wed, Aug 11, 2021 at 10:08 AM Tom Lane  wrote:

> You must not have enabled EXEC_BACKEND properly.  It's a compile-time
> #define that affects multiple modules, so it's easy to get wrong.
> The way I usually turn it on is
>

Thank you. I was able to get it all working, and withdraw any objections to
that bit of the patch. :)

Cheers,
Greg


Re: make MaxBackends available in _PG_init

2021-08-11 Thread Greg Sabino Mullane
On Mon, Aug 9, 2021 at 8:22 PM Bossart, Nathan  wrote:

> > Is this going to get tripped by a call from restore_backend_variables?
>
> I ran 'make check-world' with EXEC_BACKEND with no problems, so I
> don't think so.
>

v3 looks good, but I'm still not sure how to test the bit mentioned above.
I'm not familiar with this part of the code (SubPostmasterMain etc.), but
running make check-world with EXEC_BACKEND does not seem to execute that
code, as I added exit(1) to restore_backend_variables() and the tests still
ran fine. Further digging shows that even though the #ifdef EXEC_BACKEND
path is triggered, no --fork argument was being passed. Is there something
else one needs to provide to force that --fork (see line 189 of
src/backend/main/main.c) when testing?

Cheers,
Greg


Re: Add option --drop-cascade for pg_dump/restore

2021-08-10 Thread Greg Sabino Mullane
On Fri, Jul 16, 2021 at 9:40 AM Tom Lane  wrote:

> That would require pg_restore to try to edit the DROP commands during
> restore, which sounds horribly fragile.  I'm inclined to think that
> supporting this option only during initial dump is safer.
>

Safer, but not nearly as useful. Maybe see what the OP (Wu Haotian) can
come up with as a first implementation?

Cheers,
Greg


Re: Avoid stuck of pbgench due to skipped transactions

2021-08-10 Thread Greg Sabino Mullane
Apologies, just saw this. I found no problems, those "failures" were just
me missing checkboxes on the commitfest interface. +1 on the patch.

Cheers,
Greg


Re: make MaxBackends available in _PG_init

2021-08-09 Thread Greg Sabino Mullane
On Sat, Aug 7, 2021 at 2:01 PM Bossart, Nathan  wrote:

> Here is a rebased version of the patch.
>

Giving this a review.

Patch applies cleanly and `make check` works as of
e12694523e7e4482a052236f12d3d8b58be9a22c

Overall looks very nice and tucks MaxBackends safely away.
I have a few suggestions:

> + size = add_size(size, mul_size(GetMaxBackends(0),
sizeof(BTOneVacInfo)));

The use of GetMaxBackends(0) looks weird - can we add another constant in
there
for the "default" case? Or just have GetMaxBackends() work?


> + GMB_MAX_PREPARED_XACTS = 1 << 0, /* include max_prepared_xacts */
s/include/add in/


>  +typedef enum GMBOption
> +{
> + GMB_MAX_PREPARED_XACTS = 1 << 0, /* include max_prepared_xacts */
> + GMB_NUM_AUXILIARY_PROCS = 1 << 1 /* include NUM_AUXILIARY_PROCS */
> +} GMBOption;

Is a typedef enum really needed? As opposed to something like this style:

#define WL_LATCH_SET (1 << 0)
#define WL_SOCKET_READABLE   (1 << 1)
#define WL_SOCKET_WRITEABLE  (1 << 2)


> -  (MaxBackends + max_prepared_xacts + 1));
> +  (GetMaxBackends(GMB_MAX_PREPARED_XACTS) + 1));

This is a little confusing - there is no indication to the reader that this
is an additive function. Perhaps a little more intuitive name:

> +  (GetMaxBackends(GMB_PLUS_MAX_PREPARED_XACTS) + 1));

However, the more I went through this patch, the more the GetMaxBackends(0)
nagged at me. The vast majority of the calls are with "0". I'd argue for
just having no arguments at all, which removes a bit of code and actually
makes things like this easier to read:

Original change:
> - uint32  TotalProcs = MaxBackends + NUM_AUXILIARY_PROCS +
max_prepared_xacts;
> + uint32  TotalProcs = GetMaxBackends(GMB_NUM_AUXILIARY_PROCS |
GMB_MAX_PREPARED_XACTS);

Versus:
> - uint32  TotalProcs = MaxBackends + NUM_AUXILIARY_PROCS +
max_prepared_xacts;
> + uint32  TotalProcs = GetMaxBackends() + NUM_AUXILIARY_PROCS +
max_prepared_xacts;


> + * This must be called after modules have had the change to alter GUCs in
> + * shared_preload_libraries, and before shared memory size is determined.
s/change/chance/;


> +void
> +SetMaxBackends(int max_backends)
> +{
> + if (MaxBackendsInitialized)
> +  elog(ERROR, "MaxBackends already initialized");

Is this going to get tripped by a call from restore_backend_variables?

Cheers,
Greg


Mention --enable-tap-tests in the TAP section page

2021-07-01 Thread Greg Sabino Mullane
Quick patch to add mention of the need for compiling with
--enable-tap-tests on the TAP section page

https://www.postgresql.org/docs/current/regress-tap.html

Searching about the TAP tests often leads to this page, but there is no
easy link or mention of the fact that the sample invocations will not work
without the special config flag.

Cheers,
Greg


mention-enable-tap.patch
Description: Binary data


Re: Speed up pg_checksums in cases where checksum already set

2021-06-29 Thread Greg Sabino Mullane
On Tue, Jun 29, 2021 at 2:59 AM Michael Paquier  wrote:

> Does that look fine to you?
>

Looks great, I appreciate the renaming.

Cheers,
Greg


Re: Remove useless int64 range checks on BIGINT sequence MINVALUE/MAXVALUE values

2021-06-22 Thread Greg Sabino Mullane
Those code comments look good.

Re: Avoid stuck of pbgench due to skipped transactions

2021-06-22 Thread Greg Sabino Mullane
The following review has been posted through the commitfest application:
make installcheck-world:  tested, failed
Implements feature:   tested, failed
Spec compliant:   not tested
Documentation:not tested

Looks fine to me, as a way of catching this edge case.

Re: Speed up pg_checksums in cases where checksum already set

2021-06-18 Thread Greg Sabino Mullane
On Fri, Jun 18, 2021 at 1:57 AM Michael Paquier  wrote:

> This doc addition is a bit confusing, as it could mean that each file
> has just one single checksum.  We could be more precise, say:
> "When enabling checksums, each relation file block with a changed
> checksum is rewritten in place."
>

Agreed, I like that wording. New patch attached.


> Should we also mention that the sync happens even if no blocks are
> rewritten based on the reasoning of upthread (aka we'd better do the
> final flush as an interrupted pg_checksums may let a portion of the
> files as not flushed)?
>

I don't know that we need to bother: the default is already to sync and one
has to go out of one's way using the -N argument to NOT sync, so I think
it's a pretty safe assumption for everyone (except those who read my first
version of my patch!) that syncing always happens.

Cheers,
Greg


003.pg_checksums.optimize.writes.and.always.sync.patch
Description: Binary data


Re: Update maintenance_work_mem/autovacuum_work_mem to reflect the 1GB limitation with VACUUM

2021-06-17 Thread Greg Sabino Mullane
The following review has been posted through the commitfest application:
make installcheck-world:  not tested
Implements feature:   not tested
Spec compliant:   not tested
Documentation:not tested

Latest patch looks fine to me, to be clear.

The new status of this patch is: Ready for Committer


Re: Update maintenance_work_mem/autovacuum_work_mem to reflect the 1GB limitation with VACUUM

2021-06-04 Thread Greg Sabino Mullane
Should we say "currently has"?

Re: Speed up pg_checksums in cases where checksum already set

2021-06-02 Thread Greg Sabino Mullane
Newer version attached that adds a small documentation tweak as well.

Cheers,
Greg


pg_checksums.optimize.writes.and.always.sync.patch
Description: Binary data


Re: Speed up pg_checksums in cases where checksum already set

2021-06-02 Thread Greg Sabino Mullane
Fair enough; thanks for the feedback. Attached is a new version that does
an unconditional sync (well, unless do_sync is false, a flag I am not
particularly fond of).

Cheers,
Greg


pg_checksums.optimize.writes.always.sync.patch
Description: Binary data


Re: Update maintenance_work_mem/autovacuum_work_mem to reflect the 1GB limitation with VACUUM

2021-06-01 Thread Greg Sabino Mullane
s/Node/Note/

Other than that, +1 to the patch and +1 to backpatching.

The new status of this patch is: Waiting on Author


Re: Add option --drop-cascade for pg_dump/restore

2021-05-28 Thread Greg Sabino Mullane
Overall the patch looks good, but I did notice a few small things:

1. In pg_dumpall.c, in the section /* Add long options to the pg_dump argument
list */, we are now passing along the --drop-cascade option. However, --clean
is not passed in, so any call to pg_dumpall using --drop-cascade fails at the
pg_dump step. You'll note that --if-exists is not passed along either; because
we are dropping the whole database, we don't need to have pg_dump worry about
dropping objects at all. So I think that --drop-cascade should NOT be passed
along from pg_dumpall to pg_dump.

2. I'm not even sure if --drop-cascade makes sense for pg_dumpall, as you
cannot cascade global things like databases and roles.

3. In the file pg_backup_archiver.c, the patch does a
"stmtEnd = strstr(mark + strlen(buffer), ";");" and then spits out things
"past" the semicolon as the final %s in the appendPQExpBuffer line. I'm not
clear why: are we expecting more things to appear after the semicolon? Why
not just append a "\n" manually as part of the previous %s?

Cheers,
Greg

The new status of this patch is: Waiting on Author


Re: Reduce lock level for ALTER TABLE ... ADD CHECK .. NOT VALID

2021-05-28 Thread Greg Sabino Mullane
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   tested, passed
Documentation:tested, passed

Looks fine to me

The new status of this patch is: Ready for Committer


Re: Speed up pg_checksums in cases where checksum already set

2021-05-27 Thread Greg Sabino Mullane
Thanks for the quick replies, everyone.

On Wed, May 26, 2021 at 10:17 PM Michael Paquier 
wrote:

>
> -if (do_sync)
> +if (do_sync && total_files_modified)
> Here, I am on the edge.  It could be an advantage to force a flush of
> the data folder anyway, no?


I was originally on the fence about including this as well, but since the
database is shut down and already in a consistent state, there seems to be
no advantage to syncing if we have not made any changes. Things
are no better or worse than when we arrived. However, the real-world use
case of running pg_checksums --enable and getting no changed blocks is
probably fairly rare, so if there is a strong objection, I'm happy
reverting to just (do_sync). (I'm not sure how cheap a sync is, I assume
it's low impact as the database is shut down, I guess it becomes a "might
as well while we are here"?)

On Wed, May 26, 2021 at 10:29 PM Justin Pryzby  wrote:

> In one of the checksum patches, there was an understanding that the pages
> should be written even if the checksum is correct, to handle replicas.
> ...
> Does your patch complicate things for the "stop all the clusters before
> switching them all" case?
>

I cannot imagine how it would, but, like Michael, I'm not really
understanding the reasoning here. We only run when safely shutdown, so no
WAL or dirty buffers need concern us :). Of course, once the postmaster is
up and running, fiddling with checksums becomes vastly more complicated, as
evidenced by that thread. I'm happy sticking to and speeding up the offline
version for now.

Cheers,
Greg


Speed up pg_checksums in cases where checksum already set

2021-05-26 Thread Greg Sabino Mullane
The attached patch makes an optimization to pg_checksums which prevents
rewriting the block if the checksum is already what we expect. This can
lead to much faster runs in cases where it is already set (e.g. enabled ->
disabled -> enabled, external helper process, interrupted runs, future
parallel processes). There is also an effort to not sync the data directory
if no changes were written. Finally, added a bit more output on how many
files were actually changed, e.g.:

Checksum operation completed
Files scanned:   1236
Blocks scanned:  23283
Files modified:  38
Blocks modified: 19194
pg_checksums: syncing data directory
pg_checksums: updating control file
Checksums enabled in cluster

Cheers,
Greg


pg_checksums.optimize.writes.patch
Description: Binary data


Re: psql \df choose functions by their arguments

2021-04-07 Thread Greg Sabino Mullane
I like the wildcard aspect, but I have a few issues with the patch:

* It doesn't respect some common abbreviations that work elsewhere (e.g.
CREATE FUNCTION). So while "int4" works, "int" does not. Nor does "float",
which thus requires the mandatory-double-quoted "double precision"

* Adding commas to the args, as returned by psql itself via \df, provides
no matches.

* There seems to be no way (?) to limit the functions returned if they
share a common root. The previous incantation allowed you to pull out
foo(int) from foo(int, bigint). This was a big motivation for writing this
patch.

* SQL error on \df foo a..b as well as one on \df foo (bigint bigint)

Cheers,
Greg


Re: psql \df choose functions by their arguments

2021-01-19 Thread Greg Sabino Mullane
Ha ha ha, my bad, I am not sure why I left those out. Here is a new patch
with int2, int4, and int8. Thanks for the email.

Cheers,
Greg


v6-psql-df-pick-function-by-type.patch
Description: Binary data


Re: psql \df choose functions by their arguments

2021-01-14 Thread Greg Sabino Mullane
Thanks for the feedback: new version v5 (attached) has int8, plus the
suggested code formatting.

Cheers,
Greg


v5-psql-df-pick-function-by-type.patch
Description: Binary data


Re: psql \df choose functions by their arguments

2021-01-06 Thread Greg Sabino Mullane
On Sat, Jan 2, 2021 at 1:56 AM Thomas Munro  wrote:

> ...
> It looks like there is a collation dependency here that causes the
> test to fail on some systems:
>

Thanks for pointing that out. I tweaked the function definitions to
hopefully sidestep the ordering issue - attached is v4.

Cheers,
Greg


v4-psql-df-pick-function-by-type.patch
Description: Binary data


Re: psql \df choose functions by their arguments

2020-12-30 Thread Greg Sabino Mullane
Attached is the latest patch against HEAD - basically fixes a few typos.

Cheers,
Greg


v3-psql-df-pick-function-by-type.patch
Description: Binary data

