Re: CREATE FUNCTION ... SEARCH { DEFAULT | SYSTEM | SESSION }
On 12.08.23 04:35, Jeff Davis wrote:
> The attached patch implements a new SEARCH clause for CREATE FUNCTION. The
> SEARCH clause controls the search_path used when executing functions that
> were created without a SET clause.

I don't understand this. This adds a new option for cases where the existing option wasn't specified. Why not specify the existing option then? Is it not good enough? Can we improve it?
Re: Extract numeric field in JSONB more effectively
Update with the correct patch.

v8-0001-optimize-casting-jsonb-to-a-given-type.patch Description: Binary data
Re: Remove distprep
On 14.07.23 11:48, Tom Lane wrote:
> Peter Eisentraut writes:
>> Ah, there was a reason. The headerscheck and cpluspluscheck targets need
>> jsonpath_gram.h to be built first.
>
> Which previously happened indirectly somehow?

I have fixed this in the new patch version. I also fixed the issue that Álvaro reported nearby.

> Have we yet run this concept past the packagers list? I'm still wondering
> if they will raise any objection to getting rid of all the prebuilt files.

So far there hasn't been any feedback from packagers that would appear to affect the outcome here.

> Also, I personally don't like the fact that you have removed the
> distinction between distclean and maintainer-clean. I use
> distclean-and-reconfigure quite a lot to avoid having to rebuild
> bison/flex outputs. This patch seems to have destroyed that workflow
> optimization in return for not much.

The distclean target has a standard meaning that is baked into downstream build systems, so it would be pretty disruptive if distclean didn't actually clean everything back down to what was in the distribution tarball. We could add a different clean target that cleans not quite everything, if you can suggest a definition of what that should do.
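For reference, the layering being discussed follows the usual GNU make conventions; a sketch of what an intermediate target could look like (target names, variables, and file lists here are illustrative assumptions, not part of the patch):

```make
# Hypothetical sketch of clean-target layering, GNU make conventions.
# "mostlyclean" removes rebuilt objects but keeps expensive generated files.
mostlyclean:
	rm -f $(OBJS)

# distclean keeps its standard meaning: back to distribution-tarball state,
# including removal of configure output.
distclean: mostlyclean
	rm -f config.status config.cache GNUmakefile

# Only maintainer-clean removes the bison/flex outputs that are slow to
# regenerate, which is what the distclean-and-reconfigure workflow relies on.
maintainer-clean: distclean
	rm -f gram.c scan.c
```

The workflow question above is exactly which of these layers the bison/flex outputs should live in.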
Re: WIP: new system catalog pg_wait_event
On Wed, Aug 16, 2023 at 07:04:53AM +0200, Drouvot, Bertrand wrote:
> I'd prefer the singular form. There are a lot of places where it's already
> used (pg_database, pg_user, pg_namespace... to name a few) and it looks
> like uses of the plural form are the exceptions.

Okay, fine by me. Also, I would remove the "wait_event_" prefixes from the field names for the attribute names.

> Yeah, now that af720b4c50 is done, I'll add the custom wait events handling
> in v6.

Thanks. I guess that the hash tables had better remain local to wait_event.c.
--
Michael

signature.asc Description: PGP signature
Re: Extract numeric field in JSONB more effectively
On Tue, Aug 15, 2023 at 1:24 PM Pavel Stehule wrote:
> Hi
>
> On Tue, Aug 15, 2023 at 5:24 AM, Andy Fan wrote:
>>
>> jsonb_extract_xx_type just cares about the argtype, but
>> 'explain select xx' will still access the const->constvalue.
>> const->constvalue is 0, which is set by makeNullConst currently,
>> and that is ok for the currently supported types.
>>
>> The exception is the numeric data type, whose constvalue can't be 0,
>> so I hacked it with the lines below. Maybe not good enough, but I
>> have no better solution now.
>>
>> +	Const *target = makeNullConst(fexpr->funcresulttype, -1, InvalidOid);
>>
>> +	/*
>> +	 * Since all the above functions are strict, we can't input
>> +	 * a NULL value.
>> +	 */
>> +	target->constisnull = false;
>> +
>> +	Assert(target->constbyval || target->consttype == NUMERICOID);
>> +
>> +	/* Mock a valid datum for !constbyval type. */
>> +	if (fexpr->funcresulttype == NUMERICOID)
>> +		target->constvalue = DirectFunctionCall1(numeric_in, CStringGetDatum("0"));
>
> Personally I think this workaround is too dirty, and it would be better to
> use a strict function (I believe the overhead for NULL values is
> acceptable).

In the patch v8, I created a new routine named makeDummyConst, which sits right beside makeNullConst. It may be helpful to some extent: a) the code is self-documenting for the user/reader; b) we have a central place to maintain this routine.

Besides the framework, the trouble for a reviewer may be whether the code has some corner-case issue or behavior change, especially since I did some code refactoring when working on jsonb_extract_path. The attached test.sql is designed for this: I have compared the results between the master and patched versions, and I think reviewers can do some extra testing with it.

v8 is the finished version in my mind, so I think it is ready for review now.

--
Best Regards
Andy Fan

v8-0001-optimize-casting-jsonb-to-a-given-type.patch.bak Description: Binary data
test.sql Description: Binary data
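To make the discussed shape concrete, here is a standalone sketch of a makeDummyConst-style helper. The types and the helper name are simplified stand-ins (the real routine would live beside makeNullConst in makefuncs.c and operate on PostgreSQL's actual Const node); the point illustrated is that a pass-by-reference result type such as numeric must get a real dummy datum, so that code inspecting constvalue (e.g. EXPLAIN) never dereferences a 0 "pointer".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for PostgreSQL's Oid/Datum/Const. */
typedef unsigned int Oid;
typedef uintptr_t Datum;

#define NUMERICOID 1700			/* OID of numeric in pg_type */
#define INT4OID    23			/* OID of int4 in pg_type */

typedef struct Const
{
	Oid		consttype;
	int		consttypmod;
	Datum	constvalue;
	bool	constisnull;
	bool	constbyval;
} Const;

/*
 * Sketch: populate a non-null Const carrying a harmless dummy value.  For a
 * by-value type any bit pattern is safe; for a pass-by-reference type the
 * value must point at real storage (the v8 patch uses numeric_in("0")).
 */
static Const *
make_dummy_const(Oid consttype, int consttypmod, Const *c)
{
	static const char dummy_numeric[] = "0";	/* stand-in storage */

	c->consttype = consttype;
	c->consttypmod = consttypmod;
	c->constisnull = false;		/* the callers' functions are strict */
	c->constbyval = (consttype != NUMERICOID);	/* simplified rule */
	if (c->constbyval)
		c->constvalue = (Datum) 0;
	else
		c->constvalue = (Datum) (uintptr_t) dummy_numeric;
	return c;
}
```

The self-documenting benefit argued above is that every caller says `make_dummy_const(...)` instead of repeating the constisnull/constvalue dance inline.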
some code cleanup in index.c and indexcmds.c
Here is a patch set with some straightforward code cleanup in index.c and indexcmds.c and some adjacent places.

First, I have added const qualifiers to all the function prototypes as appropriate. This didn't require any additional casts or tricks.

Then, I have renamed some function arguments for clarity. For example, several functions had an argument like

    Oid *classObjectId

This is confusing in more than one way: the "class" is actually the operator class, not the pg_class entry, and the argument is actually an array, not a single value as the name would suggest. The amended version

    const Oid *opclassIds

should be much clearer. Also, about half the code in these files already used the better naming system, so this change also makes everything within these files more consistent.

Third, I removed some line breaks around the places that I touched anyway. In some cases, with the renaming, the lines didn't seem long enough anymore to warrant a line break.

From dfa595a003ea237a7431603824b9e5f28c5b20fa Mon Sep 17 00:00:00 2001
From: Peter Eisentraut
Date: Wed, 16 Aug 2023 07:45:28 +0200
Subject: [PATCH 1/3] Add const decorations in index.c and indexcmds.c and
 some adjacent places.

This especially makes it easier to understand for some complicated function signatures which are the input and the output arguments.
---
 src/backend/bootstrap/bootstrap.c |  2 +-
 src/backend/catalog/heap.c        |  2 +-
 src/backend/catalog/index.c       | 66 +++
 src/backend/catalog/namespace.c   | 10 ++---
 src/backend/commands/indexcmds.c  | 58 +--
 src/include/bootstrap/bootstrap.h |  2 +-
 src/include/catalog/heap.h        |  2 +-
 src/include/catalog/index.h       | 28 ++---
 src/include/catalog/namespace.h   | 10 ++---
 src/include/commands/defrem.h     |  8 ++--
 10 files changed, 94 insertions(+), 94 deletions(-)

diff --git a/src/backend/bootstrap/bootstrap.c b/src/backend/bootstrap/bootstrap.c
index 49e956b2c5..4cc2efa95c 100644
--- a/src/backend/bootstrap/bootstrap.c
+++ b/src/backend/bootstrap/bootstrap.c
@@ -914,7 +914,7 @@ AllocateAttribute(void)
 void
 index_register(Oid heap,
 			   Oid ind,
-			   IndexInfo *indexInfo)
+			   const IndexInfo *indexInfo)
 {
 	IndexList  *newind;
 	MemoryContext oldcxt;
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 4c30c7d461..c89b0bfa75 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -697,7 +697,7 @@ void
 InsertPgAttributeTuples(Relation pg_attribute_rel,
 						TupleDesc tupdesc,
 						Oid new_rel_oid,
-						Datum *attoptions,
+						const Datum *attoptions,
 						CatalogIndexState indstate)
 {
 	TupleTableSlot **slot;
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index eb2b8d84c3..2a30eb3a32 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -106,20 +106,20 @@ typedef struct
 /* non-export function prototypes */
 static bool relationHasPrimaryKey(Relation rel);
 static TupleDesc ConstructTupleDescriptor(Relation heapRelation,
-										  IndexInfo *indexInfo,
-										  List *indexColNames,
+										  const IndexInfo *indexInfo,
+										  const List *indexColNames,
 										  Oid accessMethodObjectId,
-										  Oid *collationObjectId,
-										  Oid *classObjectId);
+										  const Oid *collationObjectId,
+										  const Oid *classObjectId);
 static void InitializeAttributeOids(Relation indexRelation,
 									int numatts, Oid indexoid);
-static void AppendAttributeTuples(Relation indexRelation, Datum *attopts);
+static void AppendAttributeTuples(Relation indexRelation, const Datum *attopts);
 static void UpdateIndexRelation(Oid indexoid, Oid heapoid,
 								Oid parentIndexId,
-								IndexInfo *indexInfo,
-								Oid *collation
Re: Synchronizing slots from primary to standby
On Mon, Aug 14, 2023 at 8:38 PM shveta malik wrote:
> On Mon, Aug 14, 2023 at 3:22 PM shveta malik wrote:
> > On Tue, Aug 8, 2023 at 11:11 AM Drouvot, Bertrand wrote:
> > > Hi,
> > >
> > > On 8/8/23 7:01 AM, shveta malik wrote:
> > > > On Mon, Aug 7, 2023 at 3:17 PM Drouvot, Bertrand wrote:
> > > > > On 8/4/23 1:32 PM, shveta malik wrote:
> > > > > > On Fri, Aug 4, 2023 at 2:44 PM Drouvot, Bertrand wrote:
> > > > > > > On 7/28/23 4:39 PM, Bharath Rupireddy wrote:
> > > >
> > > > Agreed. That is why in the v10, v11 patches, we have different infra
> > > > for the sync-slot worker, i.e. it is not relying on "logical
> > > > replication background worker" anymore.
> > >
> > > yeah saw that, looks like the right way to go to me.
> > >
> > > > > Maybe we should start some tests/benchmark with only one sync
> > > > > worker to get numbers and start from there?
> > > >
> > > > Yes, we can do that performance testing to figure out the difference
> > > > between the two modes. I will try to get some statistics on this.
> > >
> > > Great, thanks!
> >
> > We (myself and Ajin) performed the tests to compute the lag in standby
> > slots as compared to primary slots with different numbers of slot-sync
> > workers configured.
> >
> > 3 DBs were created, each with 30 tables and each table having one
> > logical-pub/sub configured. So this made a total of 90 logical
> > replication slots to be synced. Then the workload was run for approx 10
> > mins. During this workload, at regular intervals, primary and standby
> > slots' lsns were captured (from pg_replication_slots) and compared. At
> > each capture, the intent was to know how much each standby slot is
> > lagging behind the corresponding primary slot, by taking the distance
> > between the confirmed_flush_lsn of the primary and standby slot.
Then we took the average (integer value) of this distance over the span of the 10-min workload, and this is what we got:

> I have attached the scripts for schema-setup, running the workload, and
> capturing lag. Please go through the Readme for details.

I did some more tests for 10, 20 and 40 slots to calculate the average lsn distance between slots, comparing 1 worker and 3 workers. My results are as follows:

10 slots
1 worker: 5529.75527426 (average lsn distance between primary and standby per slot)
3 workers: 2224.57589134

20 slots
1 worker: 9592.87234043
3 workers: 3194.6293

40 slots
1 worker: 20566.093
3 workers: 7885.80952381

90 slots
1 worker: 36706.8405797
3 workers: 10236.6393162

regards,
Ajin Cherian
Fujitsu Australia
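For reference, the per-capture distance described above can be computed from pg_replication_slots with pg_wal_lsn_diff(); a sketch (the LSN literals and filter are illustrative, not from the actual test scripts):

```sql
-- On the primary and on the standby, capture each logical slot's position:
SELECT slot_name, confirmed_flush_lsn
FROM pg_replication_slots
WHERE slot_type = 'logical';

-- The per-slot lag is then the byte difference between the two captured
-- LSNs, e.g.:
SELECT pg_wal_lsn_diff('0/3000060'::pg_lsn, '0/3000000'::pg_lsn) AS lag_bytes;
-- lag_bytes = 96 for this pair
```

Averaging lag_bytes per slot over all captures gives numbers of the kind reported above.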
Re: Memory consumed by child SpecialJoinInfo in partitionwise join planning
On Fri, Aug 4, 2023 at 2:11 PM Ashutosh Bapat wrote:
> Attached patchset fixing those.
> 0001 - patch to report planning memory, with the explain.out regression
> output fix. We may consider committing this as well.
> 0002 - with your comment addressed above.

0003 - Added this patch for handling SpecialJoinInfos for inner joins. These SpecialJoinInfos are created on the fly for parent joins. They can be created on the fly for child joins as well without requiring any translations. Thus they do not require any new memory. This patch is intended to be merged into 0002 finally.

Added to the upcoming commitfest https://commitfest.postgresql.org/44/4500/

--
Best Wishes,
Ashutosh Bapat

From 1c39902456e715902e0e24b7cfdaf7135c472ae8 Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat
Date: Mon, 7 Aug 2023 11:00:53 +0530
Subject: [PATCH 3/3] Do not translate dummy SpecialJoinInfos

Plain inner joins do not have SpecialJoinInfos associated with them.
They are crafted on the fly for parent joins. Do the same for child
joins.

Ashutosh Bapat
---
 src/backend/optimizer/path/costsize.c | 37 ++
 src/backend/optimizer/path/joinrels.c | 72 ++-
 src/include/optimizer/paths.h         |  2 +
 3 files changed, 54 insertions(+), 57 deletions(-)

diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index d6ceafd51c..43e0c9621c 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -5021,23 +5021,7 @@ compute_semi_anti_join_factors(PlannerInfo *root,
 	/*
 	 * Also get the normal inner-join selectivity of the join clauses.
 	 */
-	norm_sjinfo.type = T_SpecialJoinInfo;
-	norm_sjinfo.min_lefthand = outerrel->relids;
-	norm_sjinfo.min_righthand = innerrel->relids;
-	norm_sjinfo.syn_lefthand = outerrel->relids;
-	norm_sjinfo.syn_righthand = innerrel->relids;
-	norm_sjinfo.jointype = JOIN_INNER;
-	norm_sjinfo.ojrelid = 0;
-	norm_sjinfo.commute_above_l = NULL;
-	norm_sjinfo.commute_above_r = NULL;
-	norm_sjinfo.commute_below_l = NULL;
-	norm_sjinfo.commute_below_r = NULL;
-	/* we don't bother trying to make the remaining fields valid */
-	norm_sjinfo.lhs_strict = false;
-	norm_sjinfo.semi_can_btree = false;
-	norm_sjinfo.semi_can_hash = false;
-	norm_sjinfo.semi_operators = NIL;
-	norm_sjinfo.semi_rhs_exprs = NIL;
+	make_dummy_sjinfo(&norm_sjinfo, outerrel, innerrel);
 
 	nselec = clauselist_selectivity(root,
 									joinquals,
@@ -5190,23 +5174,8 @@ approx_tuple_count(PlannerInfo *root, JoinPath *path, List *quals)
 	/*
 	 * Make up a SpecialJoinInfo for JOIN_INNER semantics.
 	 */
-	sjinfo.type = T_SpecialJoinInfo;
-	sjinfo.min_lefthand = path->outerjoinpath->parent->relids;
-	sjinfo.min_righthand = path->innerjoinpath->parent->relids;
-	sjinfo.syn_lefthand = path->outerjoinpath->parent->relids;
-	sjinfo.syn_righthand = path->innerjoinpath->parent->relids;
-	sjinfo.jointype = JOIN_INNER;
-	sjinfo.ojrelid = 0;
-	sjinfo.commute_above_l = NULL;
-	sjinfo.commute_above_r = NULL;
-	sjinfo.commute_below_l = NULL;
-	sjinfo.commute_below_r = NULL;
-	/* we don't bother trying to make the remaining fields valid */
-	sjinfo.lhs_strict = false;
-	sjinfo.semi_can_btree = false;
-	sjinfo.semi_can_hash = false;
-	sjinfo.semi_operators = NIL;
-	sjinfo.semi_rhs_exprs = NIL;
+	make_dummy_sjinfo(&sjinfo, path->outerjoinpath->parent,
+					  path->innerjoinpath->parent);
 
 	/* Get the approximate selectivity */
 	foreach(l, quals)
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index 1a341c9b73..540eda612f 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -44,7 +44,8 @@ static void try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1,
 								   List *parent_restrictlist);
 static void build_child_join_sjinfo(PlannerInfo *root,
 									SpecialJoinInfo *parent_sjinfo,
-									Relids left_relids, Relids right_relids,
+									RelOptInfo *left_child,
+									RelOptInfo *right_child,
 									SpecialJoinInfo *child_sjinfo);
 static void free_child_sjinfo_members(SpecialJoinInfo *child_sjinfo);
 static void compute_partition_bounds(PlannerInfo *root, RelOptInfo *rel1,
@@ -655,6 +656,32 @@ join_is_legal(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
 	return true;
 }
 
+/*
+ * make_dummy_sjinfo
+ *	  Populate the given SpecialJoinInfo for a plain inner join, between rel1
+ *	  and rel2, which does not have a SpecialJoinInfo associated with it.
+ */
+void
+make_dummy_sjinfo(SpecialJoinInfo *sjinfo, RelOptInfo *rel1, RelOptInfo *rel2)
+{
+	sjinfo->type = T_SpecialJoinInfo;
+	sjinfo->min_lefthand = rel1->relids;
+	sjinfo->min_righthand = rel2->relids;
+	sjinfo->syn_lefthand = rel1->relids;
+	sjinfo->syn_righthand = rel2->relids;
+	sjinfo->jointype = JOIN_INNER;
+	sjinfo->ojrelid = 0;
+	sjinfo->commute_above_l = NULL;
+	sjinfo->commute_above_r = NULL;
+	sjinfo->commute_below_l = NULL;
+	sjinfo->commute_below_r = NULL;
+	/* we don'
Re: WIP: new system catalog pg_wait_event
Hi,

On 8/14/23 6:37 AM, Michael Paquier wrote:
> On Thu, Aug 10, 2023 at 08:09:34PM +0200, Drouvot, Bertrand wrote:
>> Agree that's worth it given the fact that iterating one more time is not
>> that costly here.
>
> I have reviewed v4, and finished by putting my hands on it to see what I
> could do.

Thanks!

> There are two things that we could do:
> - Hide that behind a macro defined in wait_event_funcs.c.
> - Feed the data generated here into a static structure, like:
> +static const struct
> +{
> +	const char *type;
> +	const char *name;
> +	const char *description;
> +}
> After experimenting with both, I've found the latter a tad cleaner, so the
> attached version does that.

Yeah, looking at what you've done in v5, I also agree that's better than what was done in v4 (I also think it will be easier to maintain).

> I am not sure that "pg_wait_event" is a good idea for the name of the new
> view. How about "pg_wait_events" instead, in plural form? There is more
> than one wait event listed.

I'd prefer the singular form. There are a lot of places where it's already used (pg_database, pg_user, pg_namespace... to name a few) and it looks like uses of the plural form are the exceptions.

> One log entry in Solution.pm has missed the addition of a reference to
> wait_event_funcs_data.c.

Oh right, thanks for fixing it in v5.

> And.. src/backend/Makefile missed two updates for maintainer-clean & co, no?

Oh right, thanks for fixing it in v5.

> One thing that the patch is still missing is the handling of custom wait
> events for extensions.

Yeah, now that af720b4c50 is done, I'll add the custom wait events handling in v6.

> So this still requires more code. I was thinking about listing these
> events as:
> - Type: "Extension"
> - Name: What a module has registered.
> - Description: "Custom wait event \"%Name%\" defined by extension".

That sounds good to me, I'll do that.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
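As a minimal standalone sketch of the static-structure approach chosen for v5: the array below stands in for the generated wait_event_funcs_data.c contents (the entries here are illustrative, not the real generated data), and the lookup shows how a consumer such as the view's SRF could walk it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Sketch of the generated table: in the patch, an array of this shape is
 * produced at build time into wait_event_funcs_data.c and consumed by the
 * function backing the pg_wait_event view.  Entries are illustrative only.
 */
static const struct
{
	const char *type;
	const char *name;
	const char *description;
}			wait_events[] =
{
	{"Activity", "ArchiverMain", "Waiting in main loop of archiver process"},
	{"Client", "ClientRead", "Waiting to read data from the client"},
	{"Extension", "MyCustomWait", "Custom wait event \"MyCustomWait\" defined by extension"},
};

/* Look up one event's description by type and name; NULL if absent. */
static const char *
wait_event_description(const char *type, const char *name)
{
	for (size_t i = 0; i < sizeof(wait_events) / sizeof(wait_events[0]); i++)
	{
		if (strcmp(wait_events[i].type, type) == 0 &&
			strcmp(wait_events[i].name, name) == 0)
			return wait_events[i].description;
	}
	return NULL;
}
```

The "Extension" entry mirrors the description format proposed above for custom wait events.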
Re: remaining sql/json patches
On Tue, Aug 15, 2023 at 5:58 PM jian he wrote:
> Hi.
> in v11, json_query:
>
> +    The returned data_type has the same semantics
> +    as for constructor functions like json_objectagg;
> +    the default returned type is text.
> +    The ON EMPTY clause specifies the behavior if the
> +    path_expression yields no value at all; the
> +    default when ON ERROR is not specified is to return a
> +    null value.
>
> the default returned type is jsonb?

You are correct.

> Also in the above quote, the second last line should be ON EMPTY?

Correct too.

> Other than that, the doc looks good.

Thanks for the review. I will post a new version after finishing a few other improvements I am working on.

--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com
Re: pg_logical_emit_message() misses a XLogFlush()
Hi,

On 2023-08-16 12:37:21 +0900, Michael Paquier wrote:
> I won't fight much if there are objections to backpatching, but that's
> not really wise (no idea how much EDB's close flavor of BDR relies on
> that).

To be clear: I don't just object to backpatching, I also object to making existing invocations flush WAL in HEAD. I do not at all object to adding a parameter that indicates flushing, or a separate function to do so. The latter might be better, because it allows you to flush a group of messages, rather than a single one.

Greetings,

Andres Freund
Re: pg_logical_emit_message() misses a XLogFlush()
Hi,

On 2023-08-16 03:20:53 +0200, Tomas Vondra wrote:
> On 8/16/23 02:33, Andres Freund wrote:
> > On 2023-08-16 06:58:56 +0900, Michael Paquier wrote:
> >> On Tue, Aug 15, 2023 at 11:37:32AM +0200, Tomas Vondra wrote:
> >>> Shouldn't the flush be done only for non-transactional messages? The
> >>> transactional case will be flushed by regular commit flush.
> >>
> >> Indeed, that would be better. I am sending an updated patch.
> >>
> >> I'd like to backpatch that, would there be any objections to that?
> >
> > Yes, I object. This would completely cripple the performance of some
> > uses of logical messages - a slowdown of several orders of magnitude.
> > It's not clear to me that flushing would be the right behaviour if it
> > weren't released, but it certainly doesn't seem right to make such a
> > change in a minor release.
>
> So are you objecting to adding the flush in general, or just to the
> backpatching part?

Both, I think. I don't object to adding a way to trigger flushing, but I think it needs to be optional.

> IMHO we either guarantee durability of non-transactional messages, in
> which case this would be a clear bug - and I'd say a fairly serious one.
> I'm curious what the workload that'd see order of magnitude of slowdown
> does with logical messages, but even if such workload exists, would it
> really be enough to fix any other durability bug?

Not sure what you mean with the last sentence?

I've e.g. used non-transactional messages for:

- A non-transactional queuing system, where sometimes one would dump a portion of tables into messages, with something like
    SELECT pg_logical_emit_message(false, 'app:', to_json(r)) FROM r;
  Obviously flushing after every row would be bad. This is useful when you need to coordinate with other systems in a non-transactional way, e.g. letting other parts of the system know that files on disk (or in S3 or ...) were created/deleted, since a database rollback wouldn't unlink/revive the files.
- Audit logging, when you want to log in a way that isn't undone by rolling back a transaction - just flushing every pg_logical_emit_message() would increase the WAL flush rate many times, because instead of once per transaction, you'd now flush once per modified row. It'd basically make it impractical to use for such things.

- Optimistic locking. Emitting things that need to be locked on logical replicas, to be able to commit on the primary. A pre-commit hook would wait for the WAL to be replayed sufficiently - but only once per transaction, not once per object.

> Or perhaps we don't want to guarantee durability for such messages, in
> which case we don't need to fix it at all (even in master).

Well, I can see adding an option to flush, or perhaps a separate function to flush, to master.

> The docs are not very clear on what to expect, unfortunately. It says
> that non-transactional messages are "written immediately" which I could
> interpret in either way.

Yea, the docs certainly should be improved, regardless what we end up with.

Greetings,

Andres Freund
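As a sketch of the first two patterns above (the prefixes and table name are made-up examples; pg_logical_emit_message(transactional, prefix, content) is the real function and returns the LSN of the emitted record):

```sql
-- Queue-style dump: one non-transactional message per row.  Forcing a WAL
-- flush inside every call would mean one flush per row instead of one per
-- transaction.
SELECT pg_logical_emit_message(false, 'app:queue', r::text)
FROM my_table r;

-- Audit-style message: emitted non-transactionally, so it stays in WAL
-- even though the surrounding transaction rolls back.
BEGIN;
SELECT pg_logical_emit_message(false, 'audit', 'user 42 viewed record 7');
ROLLBACK;
```

The durability question in this thread is whether such records are guaranteed to be flushed to disk before a crash, not whether they are written to WAL at all.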
Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication
On Fri, Aug 11, 2023 at 11:45 PM Melih Mutlu wrote:
> Again, I couldn't reproduce the cases where you saw significantly degraded
> performance. I wonder if I'm missing something. Did you do anything not
> included in the test scripts you shared? Do you think v26-0001 will
> perform 84% worse than HEAD, if you try again? I just want to be sure that
> it was not a random thing.
> Interestingly, I also don't see an improvement in the above results as big
> as in your results when the inserts/tx ratio is smaller, even though it
> certainly is improved in such cases.

TEST ENVIRONMENTS

I am running the tests on a high-spec machine:

-- NOTE: Nobody else is using this machine during our testing, so there are no unexpected influences messing up the results.

Linux
Architecture: x86_64
CPU(s): 120
Thread(s) per core: 2
Core(s) per socket: 15

          total    used    free    shared    buff/cache    available
Mem:       755G    5.7G    737G       49M           12G         748G
Swap:      4.0G      0B    4.0G

~~~

The results I am seeing are not random. HEAD+v26-0001 is consistently worse than HEAD, but only for some settings. With these settings, I see bad results consistently every time using the dedicated test machine. Hou-san also reproduced bad results using a different high-spec machine. Vignesh also reproduced bad results using just his laptop, but in his case it did *not* occur every time. As discussed elsewhere, the problem is timing-related, so sometimes you may be lucky and sometimes not.

~

I expect you are running everything correctly, but if you are using just a laptop (like Vignesh) then, like him, you might need to try multiple times before you can hit the problem in your environment. Anyway, in case there is some other reason you are not seeing the bad results, I have re-attached the scripts and re-described every step below.

==

BUILDING

-- NOTE: I have a very minimal configuration without any optimization/debug flags etc.
See config.log
$ ./configure --prefix=/home/peter/pg_oss

-- NOTE: Of course, make sure to be running using the correct Postgres:
echo 'set environment variables for OSS work'
export PATH=/home/peter/pg_oss/bin:$PATH

-- NOTE: Be sure to do git stash or whatever so you don't accidentally build a patched version thinking it is the HEAD version.
-- NOTE: Be sure to do a full clean build and apply (or don't apply) v26-0001 according to the test you wish to run.

STEPS
1. sudo make clean
2. make
3. sudo make install

==

SCRIPTS & STEPS

SCRIPTS
testrun.sh
do_one_test_setup.sh
do_one_test_PUB.sh
do_one_test_SUB.sh

---

STEPS

Step-1. Edit the testrun.sh
tables=( 100 )
workers=( 2 4 8 16 )
size="0"
prefix="0816headbusy"  <-- edit to differentiate each test run

~

Step-2. Edit the do_one_test_PUB.sh
IF commit_counter = 1000 THEN  <-- edit this if needed. I wanted 1000 inserts/tx, so nothing to do

~

Step-3: Check nothing else is running. If yes, then clean it up.
[peter@localhost testing_busy]$ ps -eaf | grep postgres
peter 111924 100103 0 19:31 pts/0 00:00:00 grep --color=auto postgres

~

Step-4: Run the tests
[peter@localhost testing_busy]$ ./testrun.sh
num_tables=100, size=0, num_workers=2, run #1  <-- check the echo matches the config you set in Step-1
waiting for server to shut down done
server stopped
waiting for server to shut down done
server stopped
num_tables=100, size=0, num_workers=2, run #2
waiting for server to shut down done
server stopped
waiting for server to shut down done
server stopped
num_tables=100, size=0, num_workers=2, run #3
...

~

Step-5: Sanity check
When the test completes the current folder will be full of .log and .dat* files.
Check for sanity that no errors happened:
[peter@localhost testing_busy]$ cat *.log | grep ERROR
[peter@localhost testing_busy]$

~

Step-6: Collect the results
The results are output (by the do_one_test_SUB.sh) into the *.dat_SUB files. Use grep to extract them:
[peter@localhost testing_busy]$ cat 0816headbusy_100t_0_2w_*.dat_SUB | grep RESULT | grep -v duration | awk '{print $3}'
11742.019
12157.355
11773.807
11582.981
12220.962
12546.325
12210.713
12614.892
12015.489
13527.05

Repeat the grep for the other files:
$ cat 0816headbusy_100t_0_4w_*.dat_SUB | grep RESULT | grep -v duration | awk '{print $3}'
$ cat 0816headbusy_100t_0_8w_*.dat_SUB | grep RESULT | grep -v duration | awk '{print $3}'
$ cat 0816headbusy_100t_0_16w_*.dat_SUB | grep RESULT | grep -v duration | awk '{print $3}'

~

Step-7: Summarise the results
Now I just cut/paste the results from Step-6 into a spreadsheet and report the median of the runs. For example, for the above HEAD run, it was:

     2w     4w    8w    16w
1  11742  5996  1919  1582
2  12157  5960  1871  1469
3  11774  5926  2101  1571
4  11583  6155  1883  1671
5  12221  6310  1895  1707
6  12
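The spreadsheet step can also be scripted; a small sketch (my own helper, not part of the shared test scripts) that reads one RESULT value per line and prints the median:

```shell
# Hypothetical helper: read one numeric result per line on stdin and print
# the median.  For an even count it averages the two middle values.
median() {
  sort -n | awk '{ a[NR] = $1 }
    END {
      if (NR % 2) print a[(NR + 1) / 2];
      else printf "%g\n", (a[NR / 2] + a[NR / 2 + 1]) / 2
    }'
}

# Example: feed it the values extracted with the grep pipeline above.
printf '%s\n' 11742.019 12157.355 11773.807 | median   # prints 11773.807
```

It can be chained directly after the `grep RESULT | ... | awk '{print $3}'` pipeline from Step-6.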
Re: pg_logical_emit_message() misses a XLogFlush()
On Wed, Aug 16, 2023 at 03:20:53AM +0200, Tomas Vondra wrote:
> So are you objecting to adding the flush in general, or just to the
> backpatching part?
>
> IMHO we either guarantee durability of non-transactional messages, in
> which case this would be a clear bug - and I'd say a fairly serious one.
> I'm curious what the workload that'd see order of magnitude of slowdown
> does with logical messages, but even if such workload exists, would it
> really be enough to fix any other durability bug?

Yes, I also think that this is a pretty serious issue to not ensure durability in the non-transactional case if you have solutions designed around that.

> Or perhaps we don't want to guarantee durability for such messages, in
> which case we don't need to fix it at all (even in master).

I mean, just look at the first message of the thread I am mentioning at the top of this thread: it lists three cases where something like pg_logical_emit_message() was wanted. And it looks like a serious issue to me for the first two at least (out-of-order messages and inter-node communication), because an application may want to send something from node1 to node2, and node1 may just forget about it entirely if it crashes, finishing WAL redo at an earlier point than the record inserted.

> The docs are not very clear on what to expect, unfortunately. It says
> that non-transactional messages are "written immediately" which I could
> interpret in either way.

Agreed. The original thread never mentions "flush", "sync" or "durab".

I won't fight much if there are objections to backpatching, but that's not really wise (no idea how much EDB's close flavor of BDR relies on that).
--
Michael

signature.asc Description: PGP signature
Re: [PATCH] Add function to_oct
On Wed, Aug 16, 2023 at 12:17 AM Nathan Bossart wrote:
> On Tue, Aug 15, 2023 at 01:53:25PM +0700, John Naylor wrote:
> > -	*ptr = '\0';
> > +	Assert(base == 2 || base == 8 || base == 16);
> >
> > +	*ptr = '\0';
> >
> > Spurious whitespace change?
>
> I think this might just be a weird artifact of the diff algorithm.

Don't believe everything you think. :-)

```
	*ptr = '\0';

	do
```

to

```
	*ptr = '\0';
	do
```

> > -	char	buf[32];	/* bigger than needed, but reasonable */
> > +	char   *buf = palloc(sizeof(uint64) * BITS_PER_BYTE + 1);
> >
> > Why is this no longer allocated on the stack? Maybe needs a comment about
> > the size calculation.
>
> It really should be. IIRC I wanted to avoid passing a pre-allocated buffer
> to convert_to_base(), but I don't remember why. I fixed this in v5.

Now I'm struggling to understand why each and every instance has its own nominal buffer, passed down to the implementation. All we care about is the result -- is there some reason not to confine the buffer declaration to the general implementation? -- John Naylor EDB: http://www.enterprisedb.com
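To illustrate the shape being argued for, here is a standalone sketch: a single shared implementation owning a stack buffer sized for the worst case (a binary uint64, i.e. sizeof(uint64) * BITS_PER_BYTE digits plus the terminator). The function name matches the thread's convert_to_base, but the signature and surrounding code are simplified stand-ins, not the patch's actual internals.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BITS_PER_BYTE 8

/*
 * Sketch of a shared convert_to_base(): the working buffer is declared here,
 * in the one place that needs it, rather than in each per-base caller
 * (to_bin/to_oct/to_hex would just pass their base).
 */
static void
convert_to_base(uint64_t value, int base, char *out, size_t outlen)
{
	/* Worst case: 64 binary digits plus the NUL terminator. */
	char		buf[sizeof(uint64_t) * BITS_PER_BYTE + 1];
	char	   *ptr = buf + sizeof(buf);

	assert(base == 2 || base == 8 || base == 16);

	*--ptr = '\0';
	do
	{
		*--ptr = "0123456789abcdef"[value % base];
		value /= base;
	} while (value != 0);

	/* Copy only the digits actually produced into the caller's result. */
	strncpy(out, ptr, outlen);
	out[outlen - 1] = '\0';
}
```

With this shape, callers only supply where the result should go; no per-caller buffer sizing is needed.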
RE: [PoC] pg_upgrade: allow to upgrade publisher node
Dear Hou,

Thanks for posting the patch! I want to open a question to gather opinions from others.

> > It was primarily for upgrade purposes only. So, as we can't see a good
> > reason to go via pg_dump, let's do it in upgrade unless someone thinks
> > otherwise.
>
> Removed the new option in pg_dump and modified pg_upgrade to directly use
> the slot info to restore the slots in the new cluster.

In this version, creations of logical slots are serialized, whereas the old approach parallelized them per db. Do you think it should be parallelized again? I have tested locally and it seems harmless. Also, this approach allows logging the executed SQLs.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
Re: pg_logical_emit_message() misses a XLogFlush()
On 8/16/23 02:33, Andres Freund wrote:
> Hi,
>
> On 2023-08-16 06:58:56 +0900, Michael Paquier wrote:
>> On Tue, Aug 15, 2023 at 11:37:32AM +0200, Tomas Vondra wrote:
>>> Shouldn't the flush be done only for non-transactional messages? The
>>> transactional case will be flushed by regular commit flush.
>>
>> Indeed, that would be better. I am sending an updated patch.
>>
>> I'd like to backpatch that, would there be any objections to that?
>
> Yes, I object. This would completely cripple the performance of some uses
> of logical messages - a slowdown of several orders of magnitude. It's not
> clear to me that flushing would be the right behaviour if it weren't
> released, but it certainly doesn't seem right to make such a change in a
> minor release.

So are you objecting to adding the flush in general, or just to the backpatching part?

IMHO we either guarantee durability of non-transactional messages, in which case this would be a clear bug - and I'd say a fairly serious one. I'm curious what the workload that'd see order of magnitude of slowdown does with logical messages, but even if such workload exists, would it really be enough to fix any other durability bug?

Or perhaps we don't want to guarantee durability for such messages, in which case we don't need to fix it at all (even in master).

The docs are not very clear on what to expect, unfortunately. It says that non-transactional messages are "written immediately", which I could interpret in either way.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: Test case for parameterized remote path in postgres_fdw
On Tue, Aug 15, 2023 at 7:50 PM Etsuro Fujita wrote: > So we should have modified the second one as well? Attached is a > small patch for that. Agreed, nice catch! +1 to the patch. Thanks Richard
Re: pg_logical_emit_message() misses a XLogFlush()
Hi, On 2023-08-16 06:58:56 +0900, Michael Paquier wrote: > On Tue, Aug 15, 2023 at 11:37:32AM +0200, Tomas Vondra wrote: > > Shouldn't the flush be done only for non-transactional messages? The > > transactional case will be flushed by regular commit flush. > > Indeed, that would be better. I am sending an updated patch. > > I'd like to backpatch that, would there be any objections to that? Yes, I object. This would completely cripple the performance of some uses of logical messages - a slowdown of several orders of magnitude. It's not clear to me that flushing would be the right behaviour if it weren't released, but it certainly doesn't seem right to make such a change in a minor release. Greetings, Andres Freund
Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication
Here is another review comment about patch v26-0001. The tablesync worker processes include the 'relid' as part of their name. See launcher.c:

snprintf(bgw.bgw_name, BGW_MAXLEN,
		 "logical replication tablesync worker for subscription %u sync %u",
		 subid, relid);

~~ And if that worker is "reused" by v26-0001 to process another relation there is a LOG:

if (reuse_worker)
	ereport(LOG,
			errmsg("logical replication table synchronization worker for subscription \"%s\" will be reused to sync table \"%s\" with relid %u.",
				   MySubscription->name,
				   get_rel_name(MyLogicalRepWorker->relid),
				   MyLogicalRepWorker->relid));

AFAICT, when being "reused" the original process name remains unchanged, and so I think it will continue to appear to any user looking at it that the tablesync process is just taking a very long time handling the original 'relid'. Won't the stale process name cause confusion to the users? -- Kind Regards, Peter Smith. Fujitsu Australia
Re: should frontend tools use syncfs() ?
On Tue, Aug 08, 2023 at 01:06:06PM -0700, Nathan Bossart wrote: > I ran a couple of tests for pg_upgrade with 100k tables (created using the > script here [0]) in order to demonstrate the potential benefits of this > patch. That shows some nice numbers with many files, indeed. How does the size of each file influence the difference in time?

+else
+{
+	while (errno = 0, (de = readdir(dir)) != NULL)
+	{
+		char		subpath[MAXPGPATH * 2];
+
+		if (strcmp(de->d_name, ".") == 0 ||
+			strcmp(de->d_name, "..") == 0)
+			continue;

It seems to me that there is no need to do that for in-place tablespaces. There are relative paths in pg_tblspc/, so they would be taken care of by the syncfs() done on the main data folder. This does not really check whether the mount points of the tablespaces are different, either. For example, imagine that you have two tablespaces on the same disk: syncfs() would then be called twice. Perhaps the current logic is OK anyway as long as the behavior is optional, but it should be explained in the docs, at least. I'm finding it a bit confusing that fsync_pgdata() is coded in such a way that it does a silent fallback to the cascading syncs through walkdir() when syncfs is specified but not available in the build. Perhaps an error is more helpful because one would then know that they are trying something that's not around?

+	pg_log_error("could not synchronize file system for file \"%s\": %m", path);
+	(void) close(fd);
+	exit(EXIT_FAILURE);

walkdir() reports errors and does not consider these fatal. Why the early exit()? I am a bit concerned about the amount of duplication this patch introduces in the docs. Perhaps this had better be moved into a new section of the docs to explain the tradeoffs, with each tool linking to it? Do we actually need --no-sync at all if --sync-method is around? We could have an extra --sync-method=none at option level with --no-sync still around mainly for compatibility? Or perhaps that's just over-designing things?
-- Michael signature.asc Description: PGP signature
Re: Avoid a potential unstable test case: xmlmap.sql
> > > Please look at the bug #18014: > > https://www.postgresql.org/message-id/flat/18014-28c81cb79d44295d%40postgresql.org > There were other aspects of the xmlmap test failure discussed in that > thread as well. > Thank you, Alexander, for the information; I will go through the discussion in that thread. -- Best Regards Andy Fan
Re: Logging of matching pg_hba.conf entry during auth skips trust auth, potential security issue
On Tue, Aug 15, 2023 at 03:39:10PM -0700, Jacob Champion wrote: > I'm not super comfortable with saying "connection authenticated" when > it explicitly hasn't been (nor with switching the meaning of a > non-NULL SYSTEM_USER from "definitely authenticated somehow" to "who > knows; parse it apart to see"). But adding a log entry ("connection > trusted:" or some such?) with the pointer to the HBA line that made it > happen seems like a useful audit helper to me. Yeah, thanks for confirming. That's also the impression I get after reading again the original thread and the idea of how this code path is handled in this commit. We could do something like a LOG "connection: method=%s user=%s (%s:%d)", without the "authenticated" and "identity" terms from set_authn_id(). Just to drop an idea. -- Michael signature.asc Description: PGP signature
Re: Would it be possible to backpatch Close support in libpq (28b5726) to PG16?
On 2023-Aug-16, Michael Paquier wrote: > > Personally I think backpatching 28b5726 has a really low risk of > > breaking anything. > > I agree about the low-risk argument, though. This is just new code. Here's a way to think about it. If 16.1 was already out, would we add libpq support for Close to 16.2? -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "La rebeldía es la virtud original del hombre" (Arthur Schopenhauer)
Re: Logging of matching pg_hba.conf entry during auth skips trust auth, potential security issue
On Tue, Aug 15, 2023 at 3:24 PM Michael Paquier wrote: > The first message from Jacob outlines the idea behind the handling of > trust. We could perhaps add one extra set_authn_id() for the uaTrust > case (not uaCert!) if that's more helpful. I'm not super comfortable with saying "connection authenticated" when it explicitly hasn't been (nor with switching the meaning of a non-NULL SYSTEM_USER from "definitely authenticated somehow" to "who knows; parse it apart to see"). But adding a log entry ("connection trusted:" or some such?) with the pointer to the HBA line that made it happen seems like a useful audit helper to me. --Jacob
Re: Would it be possible to backpatch Close support in libpq (28b5726) to PG16?
On Wed, Aug 16, 2023 at 12:14:21AM +0200, Jelte Fennema wrote: > 28b5726 allows sending Close messages from libpq, as opposed to > sending DEALLOCATE queries to deallocate prepared statements. Without > support for Close messages, libpq based clients won't be able to > deallocate prepared statements on PgBouncer, because PgBouncer does > not parse SQL queries and only looks at protocol level messages (i.e. > Close messages for deallocation). The RMT has the final word on anything related to the release, but we are discussing adding something new to a branch that has gone through two beta cycles with a GA targeted around the end of September ~ beginning of October based on the trends of the recent years. That's out of the picture, IMO. This comes once every year. > Personally I think backpatching 28b5726 has a really low risk of > breaking anything. I agree about the low-risk argument, though. This is just new code. -- Michael signature.asc Description: PGP signature
Re: Logging of matching pg_hba.conf entry during auth skips trust auth, potential security issue
On Tue, Aug 15, 2023 at 04:49:47PM -0500, Shaun Thomas wrote: > The switch statement that decodes port->hba->auth_method ends by > simply setting status = STATUS_OK; with no supplementary output since > it never calls set_authn_id. So in theory, a malicious user could add > a trust line to pg_hba.conf and have unlimited unlogged access to the > database. That's the same as giving access to your data folder. Updating pg_hba.conf is only the tip of the iceberg if one has write access to your data folder. > Unless you happen to notice that the "connection > authenticated" line has disappeared, it would look like normal > activity. "trust" is not really an authentication method because it does nothing, it just authorizes things to go through, so you cannot really say that it can have an authn ID (grep for "pedantic" around here): https://www.postgresql.org/message-id/c55788dd1773c521c862e8e0dddb367df51222be.ca...@vmware.com > Would it make sense to decouple the hba info from set_authn_id so that > it is always logged even when new auth methods get added in the > future? Or alternatively create a function call specifically for that > output so it can be produced from the trust case statement and > anywhere else that needs to tag the auth line. I personally would love > to see if someone got in through a trust line, ESPECIALLY if it isn't > supposed to be there. Like: > > 2023-08-15 13:26:03.159 PDT [692166] postgres@snip LOG: connection > authenticated: identity="postgres" method=trust > (/etc/postgresql/15/main/pg_hba.conf:1) You mean outside the switch/case in ClientAuthentication(). Some methods have an authn that is implementation-dependent, like ldap. I am not sure if switching the logic would lead to a gain, like calling set_authn_id() once per method vs. passing a string up to set in a single call of set_authn_id(). > I read through the discussion, and it doesn't seem like the security > aspect of simply hiding trust auths from the log was considered.
Since > this is a new capability, I suppose nothing is really different from > say Postgres 14 and below. Still, it never hurts to ask. The first message from Jacob outlines the idea behind the handling of trust. We could perhaps add one extra set_authn_id() for the uaTrust case (not uaCert!) if that's more helpful. -- Michael signature.asc Description: PGP signature
Would it be possible to backpatch Close support in libpq (28b5726) to PG16?
Hi, I know that this will probably get a staunch "No" as an answer, but... I'm still going to ask: Would it be possible to backport 28b5726 to the PG16 branch? Even though it's clearly a new feature? I'm working on named prepared statement support in PgBouncer: https://github.com/pgbouncer/pgbouncer/pull/845 That PR is pretty close to mergable (IMO) and my intention is to release a PgBouncer version with prepared statement support within a few months. 28b5726 allows sending Close messages from libpq, as opposed to sending DEALLOCATE queries to deallocate prepared statements. Without support for Close messages, libpq based clients won't be able to deallocate prepared statements on PgBouncer, because PgBouncer does not parse SQL queries and only looks at protocol level messages (i.e. Close messages for deallocation). Personally I think backpatching 28b5726 has a really low risk of breaking anything. And since PgBouncer is used a lot in the Postgres ecosystem, especially with libpq based clients, IMO it might be worth deviating from the rule of not backporting features after a STABLE branch has been cut. Otherwise all libpq based clients will have only limited support for prepared statements with PgBouncer until PG17 is released. Jelte
Re: pg_logical_emit_message() misses a XLogFlush()
On Tue, Aug 15, 2023 at 11:37:32AM +0200, Tomas Vondra wrote: > Shouldn't the flush be done only for non-transactional messages? The > transactional case will be flushed by regular commit flush. Indeed, that would be better. I am sending an updated patch. I'd like to backpatch that, would there be any objections to that? This may depend on how much logical solutions depend on this routine. -- Michael

diff --git a/src/backend/replication/logical/message.c b/src/backend/replication/logical/message.c
index c5de14afc6..cb3a806114 100644
--- a/src/backend/replication/logical/message.c
+++ b/src/backend/replication/logical/message.c
@@ -47,6 +47,7 @@ LogLogicalMessage(const char *prefix, const char *message, size_t size,
 				  bool transactional)
 {
 	xl_logical_message xlrec;
+	XLogRecPtr	lsn;
 
 	/*
 	 * Force xid to be allocated if we're emitting a transactional message.
@@ -71,7 +72,15 @@ LogLogicalMessage(const char *prefix, const char *message, size_t size,
 	/* allow origin filtering */
 	XLogSetRecordFlags(XLOG_INCLUDE_ORIGIN);
 
-	return XLogInsert(RM_LOGICALMSG_ID, XLOG_LOGICAL_MESSAGE);
+	lsn = XLogInsert(RM_LOGICALMSG_ID, XLOG_LOGICAL_MESSAGE);
+
+	/*
+	 * Make sure that the message hits disk before leaving if not emitting a
+	 * transactional message.
+	 */
+	if (!transactional)
+		XLogFlush(lsn);
+	return lsn;
 }
 
 /*
signature.asc Description: PGP signature
Logging of matching pg_hba.conf entry during auth skips trust auth, potential security issue
Hi everyone, This started as a conversation on Discord. Someone asked if Postgres logs which line in pg_hba.conf matched against a certain login attempt, and I said no. That's not quite right, as enabling log_connections includes a line like this: 2023-08-15 13:26:03.159 PDT [692166] postgres@snip LOG: connection authenticated: identity="postgres" method=md5 (/etc/postgresql/15/main/pg_hba.conf:107) But I wasn't getting that output. I finally gave up and looked at the code, where I found that this particular output is only generated by the set_authn_id function. So if that function is never called, there's no message saying which line from the pg_hba.conf file matched a particular login. The switch statement that decodes port->hba->auth_method ends by simply setting status = STATUS_OK; with no supplementary output since it never calls set_authn_id. So in theory, a malicious user could add a trust line to pg_hba.conf and have unlimited unlogged access to the database. Unless you happen to notice that the "connection authenticated" line has disappeared, it would look like normal activity. Would it make sense to decouple the hba info from set_authn_id so that it is always logged even when new auth methods get added in the future? Or alternatively create a function call specifically for that output so it can be produced from the trust case statement and anywhere else that needs to tag the auth line. I personally would love to see if someone got in through a trust line, ESPECIALLY if it isn't supposed to be there. Like: 2023-08-15 13:26:03.159 PDT [692166] postgres@snip LOG: connection authenticated: identity="postgres" method=trust (/etc/postgresql/15/main/pg_hba.conf:1) Perhaps I'm being too paranoid; it just seemed to be an odd omission.
Euler Taveira clued me into the initial patch which introduced the pg_hba.conf tattling: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=9afffcb833d3c5e59a328a2af674fac7e7334fc1 I read through the discussion, and it doesn't seem like the security aspect of simply hiding trust auths from the log was considered. Since this is a new capability, I suppose nothing is really different from say Postgres 14 and below. Still, it never hurts to ask. Cheers! -- Shaun Thomas High Availability Architect EDB www.enterprisedb.com
Re: Using defines for protocol characters
> On Wed, Aug 16, 2023 at 06:25:09AM +0900, Tatsuo Ishii wrote: >> Currently pqcomm.h needs c.h which is not problem for Pgpool-II. But >> what about other middleware? > > Why do you need to include directly c.h? There are definitions in > there that are not intended to be exposed. pqcomm.h has this:

#define UNIXSOCK_PATH(path, port, sockdir) \
	(AssertMacro(sockdir), \
	 AssertMacro(*(sockdir) != '\0'), \
	 snprintf(path, sizeof(path), "%s/.s.PGSQL.%d", \
			  (sockdir), (port)))

AssertMacro is defined in c.h. Best regards, -- Tatsuo Ishii SRA OSS LLC English: http://www.sraoss.co.jp/index_en/ Japanese: http://www.sraoss.co.jp
Re: Using defines for protocol characters
On 2023-Aug-16, Michael Paquier wrote: > On Wed, Aug 16, 2023 at 06:25:09AM +0900, Tatsuo Ishii wrote: > > Currently pqcomm.h needs c.h which is not problem for Pgpool-II. But > > what about other middleware? > > Why do you need to include directly c.h? There are definitions in > there that are not intended to be exposed. What this argument says is that these new defines should be in a separate file, not in pqcomm.h. IMO that makes sense, precisely because these defines should be usable by third parties. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
Re: Using defines for protocol characters
On Wed, Aug 16, 2023 at 06:25:09AM +0900, Tatsuo Ishii wrote: > Currently pqcomm.h needs c.h which is not problem for Pgpool-II. But > what about other middleware? Why do you need to include directly c.h? There are definitions in there that are not intended to be exposed. -- Michael signature.asc Description: PGP signature
Re: Using defines for protocol characters
> On Tue, Aug 15, 2023 at 02:46:24PM +0900, Tatsuo Ishii wrote: >> Is it possible to put the new define stuff into protocol.h then let >> pqcomm.h include protocol.h? This makes Pgpool-II and other middle >> ware/drivers written in C easier to use the defines so that they only >> include protocol.h to use the defines. > > It is possible, of course, but are there any reasons such programs couldn't > include pqcomm.h? It looks relatively inexpensive to me. That being said, > I'm fine with the approach you describe if the folks in this thread agree. Currently pqcomm.h needs c.h, which is not a problem for Pgpool-II. But what about other middleware? Best regards, -- Tatsuo Ishii SRA OSS LLC English: http://www.sraoss.co.jp/index_en/ Japanese: http://www.sraoss.co.jp
Re: run pgindent on a regular basis / scripted manner
On Fri, Aug 11, 2023 at 01:59:40PM -0700, Peter Geoghegan wrote: > I'm starting to have doubts about this policy. There have now been > quite a few follow-up "fixes" to indentation issues that koel > complained about. None of these fixups have been included in > .git-blame-ignore-revs. If things continue like this then "git blame" > is bound to become much less usable over time. Should we add those? Patch attached. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com

>From f4f158e153f9027330ce841ca79b5650dfd24a0e Mon Sep 17 00:00:00 2001
From: Nathan Bossart
Date: Tue, 15 Aug 2023 13:24:18 -0700
Subject: [PATCH v1 1/1] Add a few recent commits to .git-blame-ignore-revs.

---
 .git-blame-ignore-revs | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/.git-blame-ignore-revs b/.git-blame-ignore-revs
index 00b548c7b0..2ff86c8d0d 100644
--- a/.git-blame-ignore-revs
+++ b/.git-blame-ignore-revs
@@ -14,6 +14,27 @@
 #
 # $ git log --pretty=format:"%H # %cd%n# %s" $PGINDENTGITHASH -1 --date=iso
 
+89be0b89ae60c63856fd26d82a104781540e2312 # 2023-08-15 17:45:00 +0900
+# Fix code indentation vioaltion introduced in commit 9e9931d2b.
+
+5dc456b7dda4f7d0d7735b066607c190c74d174a # 2023-08-11 20:43:34 +0900
+# Fix code indentation violations introduced by recent commit
+
+62e9af4c63fbd36fb9af8450fb44bece76d7766f # 2023-07-25 12:35:58 +0530
+# Fix code indentation vioaltion introduced in commit d38ad8e31d.
+
+4e465aac36ce9a9533c68dbdc83e67579880e628 # 2023-07-18 14:04:31 +0900
+# Fix indentation in twophase.c
+
+328f492d2565cfbe383f13a69425d751fd79415f # 2023-07-13 22:26:10 +0900
+# Fix code indentation violation in commit b6e1157e7d
+
+69a674a170058e63e8178aec8a36a673efce8801 # 2023-07-06 11:49:18 +0530
+# Fix code indentation vioaltion introduced in commit cc32ec24fd.
+
+a4cfeeca5a97f2b5969c31aa69ba775af95ee5a3 # 2023-07-03 12:47:49 +0200
+# Fix code indentation violations
+
 b334612b8aee9f9a34378982d8938b201dfad323 # 2023-06-20 09:50:43 -0400
 # Pre-beta2 mechanical code beautification.
-- 
2.25.1
Re: [PATCH] Add function to_oct
On Tue, Aug 15, 2023 at 01:53:25PM +0700, John Naylor wrote: > If we're not going to have a general SQL conversion function, here are some > comments on the current patch. I appreciate the review. > +static char *convert_to_base(uint64 value, int base); > > Not needed if the definition is above the callers. Done. > + * Workhorse for to_binary, to_oct, and to_hex. Note that base must be > either > + * 2, 8, or 16. > > Why wouldn't it work with any base <= 16? You're right. I changed this in v5. > - *ptr = '\0'; > + Assert(base == 2 || base == 8 || base == 16); > > + *ptr = '\0'; > > Spurious whitespace change? I think this might just be a weird artifact of the diff algorithm. > - char buf[32]; /* bigger than needed, but reasonable */ > + char *buf = palloc(sizeof(uint64) * BITS_PER_BYTE + 1); > > Why is this no longer allocated on the stack? Maybe needs a comment about > the size calculation. It really should be. IIRC I wanted to avoid passing a pre-allocated buffer to convert_to_base(), but I don't remember why. I fixed this in v5. > +static char * > +convert_to_base(uint64 value, int base) > > On my machine this now requires a function call and a DIV instruction, even > though the patch claims not to support anything but a power of two. It's > tiny enough to declare it inline so the compiler can specialize for each > call site. This was on my list of things to check before committing. I assumed that it would be automatically inlined, but given your analysis, I went ahead and added the inline keyword. My compiler took the hint and inlined the function, and it used SHR instead of DIV instructions. The machine code for to_hex32/64 is still a couple of instructions longer than before (probably because of the uint64 casts), but I don't know if we need to worry about micro-optimizing these functions any further. > +{ oid => '5101', descr => 'convert int4 number to binary', > > Needs to be over 8000. Done. 
-- Nathan Bossart Amazon Web Services: https://aws.amazon.com

>From 2ad83f8781a013ef5e5e5a62a422f3b73c6c5f2d Mon Sep 17 00:00:00 2001
From: Nathan Bossart
Date: Tue, 25 Jul 2023 16:09:01 -0700
Subject: [PATCH v5 1/1] add to_binary() and to_oct()

---
 doc/src/sgml/func.sgml                |  42 +++
 src/backend/utils/adt/varlena.c       | 103 +++---
 src/include/catalog/pg_proc.dat       |  12 +++
 src/test/regress/expected/strings.out |  26 ++-
 src/test/regress/sql/strings.sql      |   9 ++-
 5 files changed, 163 insertions(+), 29 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index be2f54c914..23b5ac7457 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -3737,6 +3737,48 @@ repeat('Pg', 4) PgPgPgPg
+
+ to_binary
+
+to_binary ( integer )
+text
+
+to_binary ( bigint )
+text
+
+Converts the number to its equivalent binary representation.
+
+to_binary(2147483647)
+111
+
+ to_oct
+
+to_oct ( integer )
+text
+
+to_oct ( bigint )
+text
+
+Converts the number to its equivalent octal representation.
+
+to_oct(2147483647)
+177
+

diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index b1ec5c32ce..b955e54611 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -4919,54 +4919,105 @@ array_to_text_internal(FunctionCallInfo fcinfo, ArrayType *v,
 	return result;
 }
 
-#define HEXBASE 16
 /*
- * Convert an int32 to a string containing a base 16 (hex) representation of
- * the number.
+ * We size the buffer for to_binary's longest possible return value, including
+ * the terminating '\0'.
  */
-Datum
-to_hex32(PG_FUNCTION_ARGS)
+#define CONVERT_TO_BASE_BUFSIZE (sizeof(uint64) * BITS_PER_BYTE + 1)
+
+/*
+ * Workhorse for to_binary, to_oct, and to_hex.  Note that base must be > 1 and
+ * <= 16.  buf must be at least CONVERT_TO_BASE_BUFSIZE bytes.
+ */
+static inline char *
+convert_to_base(uint64 value, int base, char *buf)
 {
-	uint32		value = (uint32) PG_GETARG_INT32(0);
-	char	   *ptr;
 	const char *digits = "0123456789abcdef";
-	char		buf[32];		/* bigger than needed, but reasonable */
+	char	   *ptr = buf + CONVERT_TO_BASE_BUFSIZE - 1;
 
-	ptr = buf + sizeof(buf) - 1;
-	*ptr = '\0';
+	Assert(base > 1);
+	Assert(base <= 16);
 
+	*ptr = '\0';
 	do
 	{
-		*--ptr = digits[value % HEXBASE];
-		value /= HEXBASE;
+		*--ptr = digits[value % base];
+		value /= base;
 	} while (ptr > buf && value);
 
-	PG_RETURN_TEXT_P(cstring_to_text(ptr));
+	return ptr;
+}
+
+/*
+ * Convert an integer to a string containing a base-2 (binary) representation
+ * of the number.
+
Re: initial pruning in parallel append
On Wed, Aug 9, 2023 at 8:57 AM Amit Langote wrote: > I checked enough to be sure that IsParallelWorker() is reliable at the > time of ExecutorStart() / ExecInitNode() in ParallelQueryMain() in a > worker. However, ParallelWorkerContext is not available at that > point. Here's the relevant part of ParallelQueryMain(): > > /* Start up the executor */ > queryDesc->plannedstmt->jitFlags = fpes->jit_flags; > ExecutorStart(queryDesc, fpes->eflags); > > /* Special executor initialization steps for parallel workers */ > queryDesc->planstate->state->es_query_dsa = area; > if (DsaPointerIsValid(fpes->param_exec)) > { > char *paramexec_space; > > paramexec_space = dsa_get_address(area, fpes->param_exec); > RestoreParamExecParams(paramexec_space, queryDesc->estate); > } > pwcxt.toc = toc; > pwcxt.seg = seg; > ExecParallelInitializeWorker(queryDesc->planstate, &pwcxt); > > BTW, we do also use IsParallelWorker() in ExecGetRangeTableRelation() > which also probably only runs during ExecInitNode(), same as > ExecInitPartitionPruning() that this patch adds it to. I don't know if that's a great idea, but I guess if we're already doing it, it doesn't hurt to expand the use a little bit. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Faster "SET search_path"
On Mon, Aug 7, 2023 at 7:24 PM Jeff Davis wrote: > I might just avoid guc.c entirely and directly set > namespace_search_path and baseSearchPathValid=false. The main thing we > lose there is the GUC stack (AtEOXact_GUC()), but there's already a > PG_TRY/PG_FINALLY block in fmgr_security_definer, so we can use that to > change it back safely. (I think? I need to convince myself that it's > safe to skip the work in guc.c, at least for the special case of > search_path, and that it won't be too fragile.) I suspect that dodging the GUC stack machinery is not a very good idea. The timing of when TRY/CATCH blocks are called is different from when subtransactions are aborted, and that distinction has messed me up more than once when trying to code various things. Specifically, changes that happen in a CATCH block happen before we actually reach Abort(Sub)Transaction, which sort of makes it sound like you'd be reverting back to the old value too early. The GUC stack is also pretty complicated because of the SET/SAVE and LOCAL/non-LOCAL stuff. I can't say there isn't a way to make something like what you're talking about here work, but I bet it will be difficult to get all of the corner cases right, and I suspect that trying to speed up the existing mechanism is a better plan than trying to go around it. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Extending SMgrRelation lifetimes
On 14/08/2023 05:41, Thomas Munro wrote: The new idea is to overload smgrrelease(it) so that it also clears the owner, which means that AtEOXact_SMgr() will eventually smgrclose(it), unless it is re-owned by a relation before then. That choice stems from the complete lack of information available via sinval in the case of an overflow. We must (1) close all descriptors because any file might have been unlinked, (2) keep all pointers valid and yet (3) not leak dropped smgr objects forever. In this patch, smgrreleaseall() achieves those goals. Makes sense. Some of the smgrclose() calls could perhaps still be smgrclose(). If you can guarantee that no-one is holding the SMgrRelation, it's still OK to call smgrclose(). There's little gain from closing earlier, though. Proof-of-concept patch attached. Are there holes in this scheme? Better ideas welcome. In terms of spelling, another option would be to change the behaviour of smgrclose() to work as described, ie it would call smgrrelease() and then also disown, so we don't have to change most of those callers, and then add a new function smgrdestroy() for the few places that truly need it. Or something like that. If you change smgrclose() to do what smgrrelease() does now, then it will apply automatically to extensions. If an extension is currently using smgropen()/smgrclose() correctly, this patch alone won't make it wrong, so it's not very critical for extensions to adopt the change. However, if after this we consider it OK to hold a pointer to SMgrRelation for longer, and start doing that in the backend, then extensions need to be adapted too. While studying this I noticed a minor thinko in smgrrelease() in 15+16, so here's a fix for that also. I haven't figured out a sequence that makes anything bad happen, but we should really invalidate smgr_targblock when a relfilenode is reused, since it might point past the end. This becomes more obvious once smgrrelease() is used for truncation, as proposed here. +1. 
You can move the smgr_targblock clearing out of the loop, though. -- Heikki Linnakangas Neon (https://neon.tech)
Re: Using defines for protocol characters
On Tue, Aug 15, 2023 at 02:46:24PM +0900, Tatsuo Ishii wrote: > Is it possible to put the new define stuff into protocol.h then let > pqcomm.h include protocol.h? This makes Pgpool-II and other middle > ware/drivers written in C easier to use the defines so that they only > include protocol.h to use the defines. It is possible, of course, but are there any reasons such programs couldn't include pqcomm.h? It looks relatively inexpensive to me. That being said, I'm fine with the approach you describe if the folks in this thread agree. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: [PATCH] Add function to_oct
On Tue, Aug 15, 2023 at 07:58:17AM +0200, Vik Fearing wrote: > On 8/15/23 06:11, Nathan Bossart wrote: >> If there are no objections, I'd like to commit this patch soon. > > I just took a look at this (and the rest of the thread). I am a little bit > disappointed that we don't have a generic base conversion function, but even > if we did I think these specialized functions would still be useful. > > No objection from me. Thanks for taking a look. I don't mean for this to preclude a generic base conversion function that would supersede the functions added by this patch. However, I didn't want to hold up $SUBJECT because of something that may or may not happen after lots of discussion. If it does happen within the v17 development cycle, we could probably just remove to_oct/binary. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: pgbench - adding pl/pgsql versions of tests
On Tue, Aug 15, 2023 at 09:46:59AM +0200, Fabien COELHO wrote: > I'm unclear about what variety of scripts that could be provided given the > tables made available with pgbench. ISTM that other scenarios would involve > both an initialization and associated scripts, and any proposal would be > barred because it would open the door to anything. Why's that? I'm not aware of any project policy that prohibits such enhancements to pgbench. It might take some effort to gather consensus on a proposal like this, but IMHO that doesn't mean we shouldn't try. If the prevailing wisdom is that we shouldn't add more built-in scripts because there is an existing way to provide custom ones, then it's not clear that we should proceed with $SUBJECT, anyway. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Replace known_assigned_xids_lck by memory barrier
On Tue, Aug 15, 2023 at 12:29:24PM +0200, Michail Nikolaev wrote: >> What sort of benefits do you see from this patch? It might be worthwhile >> in itself to remove spinlocks when possible, but IME it's much easier to >> justify such changes when there is a tangible benefit we can point to. > > Oh, it is not an easy question :) > > The answer, probably, looks like this: > 1) performance benefits of spin lock acquire removing in > KnownAssignedXidsGetOldestXmin and KnownAssignedXidsSearch > 2) it is closing 13-year-old tech debt > > But in reality, it is not easy to measure performance improvement > consistently for this change. Okay. Elsewhere, it seems like folks are fine with patches that reduce shared memory space via atomics or barriers even if there's no immediate benefit [0], so I think it's fine to proceed. >> Are the assignments in question guaranteed to be atomic? IIUC we assume >> that aligned 4-byte loads/stores are atomic, so we should be okay as long >> as we aren't handling anything larger. > > Yes, 4-byte assignments are atomic; locking is used to ensure memory > write ordering in this place. Yeah, it looks like both the values that are protected by known_assigned_xids_lck are integers, so this should be okay. One remaining question I have is whether it is okay if we see an updated value for one of the head/tail variables but not the other. It looks like the tail variable is only updated with ProcArrayLock held exclusively, which IIUC wouldn't prevent such mismatches even today, since we use a separate spinlock for reading them in some cases. [0] https://postgr.es/m/20230524214958.mt6f5xokpumvnrio%40awork3.anarazel.de -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Converting sql anywhere to postgres
Hi there, I am trying to convert a SQL Anywhere database to Postgres. Within SQL Anywhere a field can have a default value of 'last user'. This means that when you perform an update on a table, if the field is not explicitly set then the current user is used. So for instance, if I have a field called mod_user in a table and I do an update on the table without setting mod_user, then SQL Anywhere sets the field to the current user. I have tried to replicate this using a Postgres trigger in the BEFORE UPDATE. However, if I do not set the value then it automatically picks up the value that was already in the field. Is there a way to tell the difference between me setting the value to the same as the previous value and Postgres automatically picking it up? If the field myfield contains the word 'me', can I tell the difference between: UPDATE table1 SET field1='something', myfield='me' and UPDATE table1 SET field1='something'?
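For what it's worth, inside a row-level trigger there is no way to distinguish a column that was explicitly set to its old value from one that was simply carried over: both arrive in NEW looking identical. A common workaround is to adopt a convention on the client side, e.g. applications that want the 'last user' behavior leave the column NULL (or set it to DEFAULT) and let the trigger fill it in. A minimal sketch, using the table and column names from the question:

```sql
-- Sketch only: emulates SQL Anywhere's "last user" default under the
-- convention that callers who do NOT want to set mod_user set it to NULL.
CREATE OR REPLACE FUNCTION set_mod_user() RETURNS trigger AS $$
BEGIN
    IF NEW.mod_user IS NULL THEN
        NEW.mod_user := current_user;
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER table1_set_mod_user
    BEFORE INSERT OR UPDATE ON table1
    FOR EACH ROW
    EXECUTE FUNCTION set_mod_user();
```

With this convention, `UPDATE table1 SET field1='something', mod_user=NULL` gets the current user, while `UPDATE table1 SET field1='something', mod_user='me'` keeps the explicit value. An UPDATE that omits mod_user entirely keeps the old value, which is exactly the case a trigger cannot detect on its own.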
Re: Avoid a potential unstable test case: xmlmap.sql
Hi Andy, 15.08.2023 14:09, Andy Fan wrote: Hi: In the test case of xmlmap.sql, we have the query below under schema_to_xml. Please look at the bug #18014: https://www.postgresql.org/message-id/flat/18014-28c81cb79d44295d%40postgresql.org There were other aspects of the xmlmap test failure discussed in that thread as well. Best regards, Alexander
Re: BUG #17946: LC_MONETARY & DO LANGUAGE plperl - BUG
On 01/08/2023 16:48, Joe Conway wrote: Any further comments on the posted patch[1]? I would like to apply/push this prior to the beta and minor releases next week. I'm not sure about the placement of the uselocale() calls. In plperl_spi_exec(), for example, I think we should switch to the global locale right after the check_spi_usage_allowed() call. Otherwise, if an error happens in BeginInternalSubTransaction() or in pg_verifymbstr(), the error would be processed with the perl locale. Maybe that's harmless, error processing hardly cares about LC_MONETARY, but it seems wrong in principle. Hmm, come to think of it, if BeginInternalSubTransaction() throws an error, we just jump out of the perl interpreter? That doesn't seem cool. But that's not new with this patch. If I'm reading correctly, compile_plperl_function() calls select_perl_context(), which calls plperl_trusted_init(), which calls uselocale(). So it leaves the locale set to the perl locale. Who sets it back? How about adding a small wrapper around eval_pl() that sets and restores the locale, just when we enter the interpreter? It's easier to see that we are doing the calls in the right places if we make them as close as possible to entering/exiting the interpreter. Are there other functions in addition to eval_pl() that need to be called with the perl locale? /* * plperl_xact_callback --- cleanup at main-transaction end. */ static void plperl_xact_callback(XactEvent event, void *arg) { /* ensure global locale is the current locale */ if (uselocale((locale_t) 0) != LC_GLOBAL_LOCALE) perl_locale_obj = uselocale(LC_GLOBAL_LOCALE); } So the assumption is that if the current locale is not LC_GLOBAL_LOCALE, then it was the perl locale. Seems true today, but this could cause confusion if anything else calls uselocale(). In particular, if another PL implementation copies this, and you use plperl and the other PL at the same time, they would get mixed up. 
I think we need another "bool perl_locale_obj_in_use" variable to track explicitly whether the perl locale is currently active. If we are careful to put the uselocale() calls in the right places so that we never ereport() while in perl locale, this callback isn't needed. Maybe it's still a good idea, though, to be extra sure that things get reset to a sane state if something unexpected happens. If multiple interpreters are used, is the single perl_locale_obj variable still enough? Each interpreter can have their own locale I believe. PS. please pgindent -- Heikki Linnakangas Neon (https://neon.tech)
Re: postgres_fdw: wrong results with self join + enable_nestloop off
Hi Etsuro, all The commit[1] seems to break some queries in Citus[2], an extension that relies on set_join_pathlist_hook. Although the comment says "/* Finally, give extensions a chance to manipulate the path list. */", we use it to extract lots of information about the joins and do the planning based on that information. Now, for some joins where consider_join_pushdown=false, we cannot get the information that we used to get, which prevents doing distributed planning for certain queries. We wonder if it is possible to allow extensions to access the join info under all circumstances, as it used to be? Basically, removing the additional check: diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c index 03b3185984..080e76cbe9 100644 --- a/src/backend/optimizer/path/joinpath.c +++ b/src/backend/optimizer/path/joinpath.c @@ -349,8 +349,7 @@ add_paths_to_joinrel(PlannerInfo *root, /* * 6. Finally, give extensions a chance to manipulate the path list. */ - if (set_join_pathlist_hook && - consider_join_pushdown) + if (set_join_pathlist_hook) set_join_pathlist_hook(root, joinrel, outerrel, innerrel, jointype, &extra); } Thanks, Onder [1]: https://github.com/postgres/postgres/commit/b0e390e6d1d68b92e9983840941f8f6d9e083fe0 [2]: https://github.com/citusdata/citus/issues/7119 Etsuro Fujita wrote on Tue, 15 Aug 2023 at 11:05: > On Tue, Aug 8, 2023 at 6:30 PM Richard Guo wrote: > > On Tue, Aug 8, 2023 at 4:40 PM Etsuro Fujita > wrote: > >> I modified the code a bit further to use an if-test to avoid a useless > >> function call, and added/tweaked comments and docs further. Attached > >> is a new version of the patch. I am planning to commit this, if there > >> are no objections. > > > +1 to the v4 patch. It looks good to me. > > Pushed after some copy-and-paste editing of comments/documents. > > Thanks! > > Best regards, > Etsuro Fujita
Re: [PoC] Improve dead tuple storage for lazy vacuum
On Tue, Aug 15, 2023 at 9:34 AM Masahiko Sawada wrote: > BTW cfbot reported that some regression tests failed due to OOM. I've > attached the patch to fix it. Seems worth doing now rather than later, so added this and squashed most of the rest together. I wonder if that test uses too much memory in general. Maybe using the full uint64 is too much. > On Mon, Aug 14, 2023 at 8:05 PM John Naylor > wrote: > > This is possibly faster than v38-0010, but looking like not worth the complexity, assuming the other way avoids the bug going forward. > > I prefer 0010 but is it worth testing with other compilers such as clang? Okay, keeping 0010 with a comment, and leaving out 0011 for now. Clang is aggressive about unrolling loops, so may be worth looking globally at some point. > > v38-0012-Use-branch-free-coding-to-skip-new-element-index.patch > > Within noise level of v38-0011, but it's small and simple, so I like it, at least for small arrays. > > Agreed. Keeping 0012 and not 0013. -- John Naylor EDB: http://www.enterprisedb.com v39-ART.tar.gz Description: application/gzip
Test case for parameterized remote path in postgres_fdw
Hi, While working on the join pushdown issue, I noticed this bit in commit e4106b252: --- parameterized remote path +-- parameterized remote path for foreign table EXPLAIN (VERBOSE, COSTS false) - SELECT * FROM ft2 a, ft2 b WHERE a.c1 = 47 AND b.c1 = a.c2; + SELECT * FROM "S 1"."T 1" a, ft2 b WHERE a."C 1" = 47 AND b.c1 = a.c2; SELECT * FROM ft2 a, ft2 b WHERE a.c1 = 47 AND b.c1 = a.c2; + The first statement was modified to test the intended behavior, but the second one was not. The second one as-is performs a foreign join: EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft2 a, ft2 b WHERE a.c1 = 47 AND b.c1 = a.c2; QUERY PLAN --- Foreign Scan Output: a.c1, a.c2, a.c3, a.c4, a.c5, a.c6, a.c7, a.c8, b.c1, b.c2, b.c3, b.c4, b.c5, b.c6, b.c7, b.c8 Relations: (public.ft2 a) INNER JOIN (public.ft2 b) Remote SQL: SELECT r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8, r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8 FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1.c2 = r2."C 1")) AND ((r1."C 1" = 47 (4 rows) So we should have modified the second one as well? Attached is a small patch for that. Best regards, Etsuro Fujita postgres-fdw-parameterized-path-test-case.patch Description: Binary data
Re: Avoid a potential unstable test case: xmlmap.sql
I overlooked the fact that even in the bitmap index scan loose mode, the recheck is still executed before the qual, so bitmap index scan is OK in this case. Sort Output: oid, relname Sort Key: pg_class.relname -> Bitmap Heap Scan on pg_catalog.pg_class Output: oid, relname Recheck Cond: (pg_class.relnamespace = '28601'::oid) Filter: (has_table_privilege(pg_class.oid, 'SELECT'::text) AND (pg_class.relkind = ANY ('{r,m,v}'::"char"[]))) -> Bitmap Index Scan on pg_class_relname_nsp_index Index Cond: (pg_class.relnamespace = '28601'::oid) v2 attached. -- Best Regards Andy Fan v2-0001-Avoid-a-potential-unstable-testcase.patch Description: Binary data
Avoid a potential unstable test case: xmlmap.sql
Hi: In the test case of xmlmap.sql, we have the query below under schema_to_xml. explain (costs off, verbose) SELECT oid FROM pg_catalog.pg_class WHERE relnamespace = 28601 AND relkind IN ('r','m','v') AND pg_catalog.has_table_privilege (oid, 'SELECT') ORDER BY relname; If the query is using a SeqScan, the execution order of the quals is: has_table_privilege(pg_class.oid, 'SELECT'::text) AND (pg_class.relnamespace = '28601'::oid) AND (pg_class.relkind = ANY ('{r,m,v}'::"char"[])) based on the current cost settings and algorithm. With this plan, has_table_privilege(pg_class.oid, 'SELECT'::text) may be executed against all the relations (not just the given namespace), so if a tuple in pg_class is scanned and the relation is dropped before has_table_privilege is called, then we will get the error: ERROR: relation with OID xxx does not exist To overcome this, we disable seqscan; then only an index scan on relnamespace is possible, so relnamespace = '28601'::oid is filtered first, before calling has_table_privilege. And in this test case we are sure that relations belonging to the current namespace will never be dropped, so no error is possible. Here are the plans for reference. Seq Scan: Sort Output: oid, relname Sort Key: pg_class.relname -> Seq Scan on pg_catalog.pg_class Output: oid, relname Filter: (has_table_privilege(pg_class.oid, 'SELECT'::text) AND (pg_class.relnamespace = '28601'::oid) AND (pg_class.relkind = ANY ('{r,m,v}'::"char"[]))) With enable_seqscan set to off: QUERY PLAN -- Index Scan using pg_class_relname_nsp_index on pg_catalog.pg_class Output: oid, relname Index Cond: (pg_class.relnamespace = '28601'::oid) Filter: (has_table_privilege(pg_class.oid, 'SELECT'::text) AND (pg_class.relkind = ANY ('{r,m,v}'::"char"[]))) Patch is attached. -- Best Regards Andy Fan v1-0001-Avoid-a-potential-unstable-testcase.patch Description: Binary data
Re: Replace known_assigned_xids_lck by memory barrier
Hello, Nathan. > What sort of benefits do you see from this patch? It might be worthwhile > in itself to remove spinlocks when possible, but IME it's much easier to > justify such changes when there is a tangible benefit we can point to. Oh, it is not an easy question :) The answer, probably, looks like this: 1) the performance benefit of removing the spinlock acquisition in KnownAssignedXidsGetOldestXmin and KnownAssignedXidsSearch 2) it is closing 13-year-old tech debt But in reality, it is not easy to measure performance improvement consistently for this change. > Are the assignments in question guaranteed to be atomic? IIUC we assume > that aligned 4-byte loads/stores are atomic, so we should be okay as long > as we aren't handling anything larger. Yes, 4-byte assignments are atomic; locking is used to ensure memory write ordering in this place. > This use of pg_write_barrier() looks correct to me, but don't we need > corresponding read barriers wherever we obtain the pointers? FWIW I tend > to review src/backend/storage/lmgr/README.barrier in its entirety whenever > I deal with this stuff. Oh, yeah, you're right! [1] I'll prepare an updated version of the patch soon. I don't know why I was assuming pg_write_barrier is enough (⊙_⊙') [1]: https://github.com/postgres/postgres/blob/master/src/backend/storage/lmgr/README.barrier#L125
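For readers following along, the pairing rule from README.barrier that is being discussed can be illustrated with a standalone C11 sketch (this is not the actual KnownAssignedXids code): the release store on the head variable plays the role of pg_write_barrier(), and the matching acquire load on the reader side plays the role of the read barrier the first patch version was missing.

```c
#include <stdatomic.h>

/*
 * Standalone illustration of write/read barrier pairing using C11
 * atomics.  A writer fills a slot and then publishes it by advancing
 * `head`; a reader that observes the new head must also observe the
 * slot contents.
 */
static int xids[8];
static atomic_int head;                 /* number of published entries */

static void
writer_add(int xid)
{
    int h = atomic_load_explicit(&head, memory_order_relaxed);

    xids[h] = xid;                      /* fill the slot first ... */
    /* ... then publish: release orders the slot write before head */
    atomic_store_explicit(&head, h + 1, memory_order_release);
}

static int
reader_latest(void)
{
    /* acquire pairs with the writer's release, making the slot visible */
    int h = atomic_load_explicit(&head, memory_order_acquire);

    return (h > 0) ? xids[h - 1] : -1;
}
```

Without the acquire on the reader side, another CPU could in principle observe the incremented head but a stale slot, which is exactly the mismatch the spinlock (and the proposed barrier pair) prevents.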
Re: cataloguing NOT NULL constraints
On 2023-Aug-15, Dean Rasheed wrote: > I think perhaps for ALTER TABLE INHERIT, it should check that the > child has a NOT NULL constraint, and error out if not. That's the > current behaviour, and also matches other constraints types (e.g., > CHECK constraints). Yeah, I reached the same conclusion yesterday while trying it out, so that's what I implemented. I'll post later today. > More generally though, I'm worried that this is starting to get very > complicated. I wonder if there might be a different, simpler approach. > One vague idea is to have a new attribute on the column that counts > the number of constraints (local and inherited PK and NOT NULL > constraints) that make the column not null. Hmm. I grant that this is different, but I don't see that it is simpler. > Something else I noticed when reading the SQL standard is that a > user-defined CHECK (col IS NOT NULL) constraint should be recognised > by the system as also making the column not null (setting its > "nullability characteristic" to "known not nullable"). I agree with this view actually, but I've refrained from implementing it(*) because our SQL-standards people have advised against it. Insider knowledge? I don't know. I think this is a comparatively smaller consideration though, and we can adjust for it afterwards. (*) Rather: at some point I removed the implementation of that from the patch. > I'm also wondering whether creating a pg_constraint entry for *every* > not-nullable column is actually going too far. If we were to > distinguish between "defined as NOT NULL" and being not null as a > result of one or more constraints, in the way that the standard seems > to suggest, perhaps the former (likely to be much more common) could > simply be a new attribute stored on the column. I think we actually > only need to create pg_constraint entries if a constraint name or any > additional constraint properties such as NOT VALID are specified. 
That > would lead to far fewer new constraints, less catalog bloat, and less > noise in the \d output. There is a problem if we do this, though, which is that we cannot use the constraints for the things that we want them for -- for example, remove_useless_groupby_columns() would like to use unique constraints, not just primary keys; but it depends on the NOT NULL rows being there for invalidation reasons (namely: if the NOT NULL constraint is dropped, we need to be able to replan. Without catalog rows, we don't have a mechanism to let that happen). If we don't add all those redundant catalog rows, then this is all for naught. -- Álvaro HerreraBreisgau, Deutschland — https://www.EnterpriseDB.com/ Bob [Floyd] used to say that he was planning to get a Ph.D. by the "green stamp method," namely by saving envelopes addressed to him as 'Dr. Floyd'. After collecting 500 such letters, he mused, a university somewhere in Arizona would probably grant him a degree. (Don Knuth)
Re: cataloguing NOT NULL constraints
On Fri, 11 Aug 2023 at 14:54, Alvaro Herrera wrote: > > Right, in the end I got around to that point of view. I abandoned the > idea of adding these dependency links, and I'm back at relying on the > coninhcount/conislocal markers. But there were a couple of bugs in the > accounting for that, so I've fixed some of those, but it's not yet > complete: > > - ALTER TABLE parent ADD PRIMARY KEY > needs to create NOT NULL constraints in children. I added this, but > I'm not yet sure it works correctly (for example, if a child already > has a NOT NULL constraint, we need to bump its inhcount, but we > don't.) > - ALTER TABLE parent ADD PRIMARY KEY USING index > Not sure if this is just as above or needs separate handling > - ALTER TABLE DROP PRIMARY KEY > needs to decrement inhcount or drop the constraint if there are no > other sources for that constraint to exist. I've adjusted the drop > constraint code to do this. > - ALTER TABLE INHERIT > needs to create a constraint on the new child, if parent has PK. Not > implemented > - ALTER TABLE NO INHERIT > needs to delink any constraints (decrement inhcount, possibly drop > the constraint). > I think perhaps for ALTER TABLE INHERIT, it should check that the child has a NOT NULL constraint, and error out if not. That's the current behaviour, and also matches other constraints types (e.g., CHECK constraints). More generally though, I'm worried that this is starting to get very complicated. I wonder if there might be a different, simpler approach. One vague idea is to have a new attribute on the column that counts the number of constraints (local and inherited PK and NOT NULL constraints) that make the column not null. Something else I noticed when reading the SQL standard is that a user-defined CHECK (col IS NOT NULL) constraint should be recognised by the system as also making the column not null (setting its "nullability characteristic" to "known not nullable"). 
I think that's more than just an artefact of how they say NOT NULL constraints should be implemented, because the effect of such a CHECK constraint should be exposed in the "columns" view of the information schema -- the value of "is_nullable" should be "NO" if the column is "known not nullable". In this sense, the standard does allow multiple not null constraints on a column, independently of whether the column is "defined as NOT NULL". My understanding of the standard is that ALTER COLUMN ... SET/DROP NOT NULL change whether or not the column is "defined as NOT NULL", and manage a single system-generated constraint, but there may be any number of other user-defined constraints that also make the column "known not nullable", and they need to be tracked in some way. I'm also wondering whether creating a pg_constraint entry for *every* not-nullable column is actually going too far. If we were to distinguish between "defined as NOT NULL" and being not null as a result of one or more constraints, in the way that the standard seems to suggest, perhaps the former (likely to be much more common) could simply be a new attribute stored on the column. I think we actually only need to create pg_constraint entries if a constraint name or any additional constraint properties such as NOT VALID are specified. That would lead to far fewer new constraints, less catalog bloat, and less noise in the \d output. Regards, Dean
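The information_schema point can be made concrete with a quick sketch (the table is made up). As things stand, only a column defined as NOT NULL is reported as not nullable; an equivalent CHECK constraint is not recognized:

```sql
CREATE TABLE t (
    a int NOT NULL,
    b int CHECK (b IS NOT NULL)
);

SELECT column_name, is_nullable
FROM information_schema.columns
WHERE table_name = 't'
ORDER BY column_name;

-- Currently this reports is_nullable = 'NO' for a but 'YES' for b,
-- whereas under the standard's "known not nullable" reading both
-- should arguably report 'NO'.
```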
Re: pgbench: allow to exit immediately when any client is aborted
About pgbench exit on abort v4: Patch applies cleanly, compiles, local make check ok, doc looks ok. This looks ok to me. -- Fabien.
Re: pg_logical_emit_message() misses a XLogFlush()
On 8/15/23 08:38, Michael Paquier wrote: > Hi all, > > While playing with pg_logical_emit_message() and WAL replay, I have > noticed that LogLogicalMessage() inserts a record but forgets to make > sure that the record has been flushed. So, for example, if the system > crashes the message inserted can get lost. > > I was writing some TAP tests for it for the sake of a bug, and I have > found this the current behavior annoying because one cannot really > rely on it when emulating crashes. > > This has been introduced in 3fe3511 (from 2016), and there is no > mention of that on the original thread that led to this commit: > https://www.postgresql.org/message-id/flat/5685F999.6010202%402ndquadrant.com > > This could be an issue for anybody using LogLogicalMessage() out of > core, as well, because it would mean some records lost. So, perhaps > this should be treated as a bug, sufficient for a backpatch? > Shouldn't the flush be done only for non-transactional messages? The transactional case will be flushed by regular commit flush. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
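For reference, the distinction being drawn maps directly onto the function's first argument at the SQL level (a sketch; the prefix and payload strings are arbitrary):

```sql
-- Non-transactional: the record is inserted immediately, outside any
-- commit, so nothing else forces its WAL to disk; this is the case
-- that can be lost on a crash without an explicit flush.
SELECT pg_logical_emit_message(false, 'my-prefix', 'could be lost on crash');

-- Transactional: the message is decoded as part of the transaction,
-- and the commit's own WAL flush covers it.
BEGIN;
SELECT pg_logical_emit_message(true, 'my-prefix', 'flushed at commit');
COMMIT;
```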
Re: LLVM 16 (opaque pointers)
[...] Further changes are already needed for their "main" branch (LLVM 17-to-be), so this won't quite be enough to shut seawasp up. For information, the physical server which was hosting my 2 bf animals (seawasp and moonjelly) has given up rebooting after a power cut a few weeks/months ago, and I have not setup a replacement (yet). -- Fabien.
Generating memory trace from Postgres backend process
Hi, Has anyone tried generating a dynamic memory trace of a backend Postgres process while it's running a query? I want to characterize the memory access pattern of the Postgres database engine when it's running any given query. The usual way to do this would be to attach a dynamic instrumentation tool like DynamoRIO or Intel Pin to the backend process running the query (?). How could I find the exact backend process to attach to? Also, if there's any prior experience with generating such memory traces, it would be helpful to know what tools were used. best regards, Muneeb Khan
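In case it helps with the "which process" question: each client connection is served by exactly one backend process, and its PID is visible from SQL (a sketch using the standard system function and view):

```sql
-- From the session that will run the query to be traced:
SELECT pg_backend_pid();

-- Or, from another session, list client backends and what they are running:
SELECT pid, usename, state, query
FROM pg_stat_activity
WHERE backend_type = 'client backend';
```

With that PID you can point an attach-mode instrumentation tool at the backend, e.g. something along the lines of `pin -pid <pid> -t mytool.so` for Intel Pin; treat the exact flags as an approximation, since they depend on the Pin version.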
Re: remaining sql/json patches
Hi. In v11, json_query: +The returned data_type has the same semantics +as for constructor functions like json_objectagg; +the default returned type is text. +The ON EMPTY clause specifies the behavior if the +path_expression yields no value at all; the +default when ON ERROR is not specified is to return a +null value. Shouldn't the default returned type be jsonb? Also, in the second-to-last quoted line, shouldn't it say ON EMPTY rather than ON ERROR? Other than that, the doc looks good.
Re: New WAL record to detect the checkpoint redo location
On Fri, Jul 14, 2023 at 8:46 PM Andres Freund wrote: > > Hi, > > As I think I mentioned before, I like this idea. However, I don't like the > implementation too much. Thanks for looking into it. > This might work right now, but doesn't really seem maintainable. Nor do I like > adding branches into this code a whole lot. Okay, Now I have moved the XlogInsert for the special record outside the WalInsertLock so we don't need this special handling here. > > @@ -6597,6 +6612,32 @@ CreateCheckPoint(int flags) > > I think the commentary above this function would need a fair bit of > revising... > > >*/ > > RedoRecPtr = XLogCtl->Insert.RedoRecPtr = checkPoint.redo; > > > > + /* > > + * Insert a special purpose CHECKPOINT_REDO record as the first > > record at > > + * checkpoint redo lsn. Although we have the checkpoint record that > > + * contains the exact redo lsn, there might have been some other > > records > > + * those got inserted between the redo lsn and the actual checkpoint > > + * record. So when processing the wal, we cannot rely on the > > checkpoint > > + * record if we want to stop at the checkpoint-redo LSN. > > + * > > + * This special record, however, is not required when we doing a > > shutdown > > + * checkpoint, as there will be no concurrent wal insertions during > > that > > + * time. So, the shutdown checkpoint LSN will be the same as > > + * checkpoint-redo LSN. > > + * > > + * This record is guaranteed to be the first record at checkpoint > > redo lsn > > + * because we are inserting this while holding the exclusive wal > > insertion > > + * lock. > > + */ > > + if (!shutdown) > > + { > > + int dummy = 0; > > + > > + XLogBeginInsert(); > > + XLogRegisterData((char *) &dummy, sizeof(dummy)); > > + recptr = XLogInsert(RM_XLOG_ID, XLOG_CHECKPOINT_REDO); > > + } > > It seems to me that we should be able to do better than this. > > I suspect we might be able to get rid of the need for exclusive inserts > here. 
If we get rid of that, we could determine the redo location based on the > LSN determined by the XLogInsert(). Yeah, good idea. Actually, we can do this insert outside of the exclusive insert lock and set the LSN of this insert as the checkpoint redo location. So now we do not need to compute the checkpoint redo based on the current insertion point; we can directly use the LSN of this record. I have modified this, and I have also moved some other code that is not required to be inside the WAL insertion lock. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com v2-0001-New-WAL-record-for-checkpoint-redo-location.patch Description: Binary data
Re: postgres_fdw: wrong results with self join + enable_nestloop off
On Tue, Aug 8, 2023 at 6:30 PM Richard Guo wrote: > On Tue, Aug 8, 2023 at 4:40 PM Etsuro Fujita wrote: >> I modified the code a bit further to use an if-test to avoid a useless >> function call, and added/tweaked comments and docs further. Attached >> is a new version of the patch. I am planning to commit this, if there >> are no objections. > +1 to the v4 patch. It looks good to me. Pushed after some copy-and-paste editing of comments/documents. Thanks! Best regards, Etsuro Fujita
Re: pgbench - adding pl/pgsql versions of tests
Hello Nathan, 1. so I don't have to create the script and function manually each time I want to test mainly the database (instead of the client-database system) 2. so that new users of PostgreSQL can easily see how much better OLTP workloads perform when packaged up as a server-side function I'm not sure we should add micro-optimized versions of the existing scripts to pgbench. Your point about demonstrating the benefits of server-side functions seems reasonable, but it also feels a bit like artificially improving pgbench numbers. I think I'd rather see some more variety in the built-in scripts so that folks can more easily test a wider range of common workloads. Perhaps this could include a test that is focused on server-side functions. ISTM that your argument suggests keeping the tpcb-like PL/pgSQL version. It is the more beneficial one anyway, as it merges 4-5 commands into one call, so it demonstrates the effect of investing in this kind of approach. I'm unclear about what variety of scripts could be provided given the tables made available with pgbench. ISTM that other scenarios would involve both an initialization and associated scripts, and any proposal would be barred because it would open the door to anything. In any case, it looks like there is unaddressed feedback for this patch, so I'm marking it as "waiting on author." Indeed. -- Fabien.
Re: Extract numeric filed in JSONB more effectively
On Tue, 15 Aug 2023 at 9:05, Andy Fan wrote: > > >> a) effectiveness. The ending performance should be similar to your >> current patch, but without the necessity of using the planner support API. >> > > So the cost is we need to create a new & different framework. > yes, it can be less work and code than, for example, the introduction of "anycompatible". > >> > b) because you can write only var := j->'f', and plpgsql forces cast >> function execution, but not via planner. >> > > var a := 1 needs to go through the planner, IIUC, same with j->'f'. > I was wrong: the planner work is full, but the executor work is reduced. > > c) nothing else. It should not require modifying cast function >> definitions >> > > If you look at how the planner support function works, that is > pretty simple: just modify the prosupport attribute. I'm not sure > this should be called an issue, or that avoiding it can be described > as a benefit. > > I don't think the current case is as bad as the other ones, like > users needing to modify their queries or the type-safety system > being broken. So personally I'm not willing to create something > new & heavy. However, I'm open to seeing what others say. > ok regards Pavel > > -- > Best Regards > Andy Fan >
Re: Extract numeric filed in JSONB more effectively
> > a) effectiveness. The ending performance should be similar to your > current patch, but without the necessity of using the planner support API. > So the cost is we need to create a new & different framework. > > b) because you can write only var := j->'f', and plpgsql forces cast > function execution, but not via planner. > var a := 1 needs to go through the planner, IIUC, same with j->'f'. c) nothing else. It should not require modifying cast function > definitions > If you look at how the planner support function works, that is pretty simple: just modify the prosupport attribute. I'm not sure this should be called an issue, or that avoiding it can be described as a benefit. I don't think the current case is as bad as the other ones, like users needing to modify their queries or the type-safety system being broken. So personally I'm not willing to create something new & heavy. However, I'm open to seeing what others say. -- Best Regards Andy Fan
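For readers unfamiliar with the mechanism being referenced: attaching a planner support function is a small amount of DDL once the support function itself exists. A hedged sketch (both function names below are hypothetical; the support function must be written in C and implement the SupportRequest* protocol):

```sql
-- Attach a support function at creation time ...
CREATE FUNCTION my_jsonb_to_numeric(jsonb) RETURNS numeric
    AS 'MODULE_PATHNAME', 'my_jsonb_to_numeric'
    LANGUAGE C STRICT
    SUPPORT my_support_fn;

-- ... or attach one to an existing function afterwards:
ALTER FUNCTION my_jsonb_to_numeric(jsonb) SUPPORT my_support_fn;
```

This is what "just modify the prosupport attribute" amounts to in practice: the planner then consults my_support_fn when it plans calls to the target function.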