Re: Unresolved replication hang and stop problem.
On Tue, Aug 10, 2021 at 8:15 PM Ha Ka wrote:
> > On Sat, Jun 19, 2021 at 12:14 Amit Kapila wrote:
>
> We increased logical_decoding_work_mem for our production database
> from 64 to 192 MB and it looks like the issue still persists. The
> frequency with which replication hangs has remained the same.

Sounds strange. I think one thing to identify is whether there were a very large number of in-progress transactions at the time the slowdown happened, because the profile shared last time seems to be spending more time in hash_seq_search than in actually serializing the xact. Another possibility to try out for your case is to just always serialize the current xact and see what happens; this might not be an actual solution but can help in diagnosing the problem.

> Do you need any additional perf reports after our change?

It might be good if you can share the WALSender portion of perf, as shared in one of the emails above.

--
With Regards,
Amit Kapila.
Re: SI messages sent when executing ROLLBACK PREPARED command
On Tue, Aug 03, 2021 at 09:29:48AM +, liuhuail...@fujitsu.com wrote:
> There was a problem with the before patch when testing.
> So resubmit it.

FWIW, I see no problems with patch version 1 or 2, as long as you apply patch version 1 with a command like patch -p2. One thing about patch 2 is that git diff --check complains because of a whitespace. Anyway, I also think that you are right here and that there is no need to run this code path with ROLLBACK PREPARED. It is worth noting that Tom's point about local invalidation messages in PREPARE comes from PostPrepare_Inval(). I would just tweak the comment block at the top of what's being changed, as per the attached. Please let me know if there are any objections.

--
Michael

diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index 6d3efb49a4..2156de187c 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -1520,13 +1520,17 @@ FinishPreparedTransaction(const char *gid, bool isCommit)
 	 * Handle cache invalidation messages.
 	 *
 	 * Relcache init file invalidation requires processing both before and
-	 * after we send the SI messages. See AtEOXact_Inval()
+	 * after we send the SI messages, only when committing. See
+	 * AtEOXact_Inval().
 	 */
-	if (hdr->initfileinval)
-		RelationCacheInitFilePreInvalidate();
-	SendSharedInvalidMessages(invalmsgs, hdr->ninvalmsgs);
-	if (hdr->initfileinval)
-		RelationCacheInitFilePostInvalidate();
+	if (isCommit)
+	{
+		if (hdr->initfileinval)
+			RelationCacheInitFilePreInvalidate();
+		SendSharedInvalidMessages(invalmsgs, hdr->ninvalmsgs);
+		if (hdr->initfileinval)
+			RelationCacheInitFilePostInvalidate();
+	}
 
 	/*
 	 * Acquire the two-phase lock. We want to work on the two-phase callbacks
Re: Added schema level support for publication.
On Mon, Aug 9, 2021 at 9:50 PM Mark Dilger wrote:
>
> > On Aug 6, 2021, at 1:32 AM, vignesh C wrote:
> >
> > the attached v19 patch
>
> With v19 applied, a schema owner can publish the contents of a table
> regardless of ownership or permissions on that table:
>
> +CREATE ROLE user1;
> +GRANT CREATE ON DATABASE regression TO user1;
> +CREATE ROLE user2;
> +GRANT CREATE ON DATABASE regression TO user2;
> +SET SESSION AUTHORIZATION user1;
> +CREATE SCHEMA user1schema;
> +GRANT CREATE, USAGE ON SCHEMA user1schema TO user2;
> +RESET SESSION AUTHORIZATION;
> +SET SESSION AUTHORIZATION user2;
> +CREATE TABLE user1schema.user2private (junk text);
> +REVOKE ALL PRIVILEGES ON user1schema.user2private FROM PUBLIC;
> +REVOKE ALL PRIVILEGES ON user1schema.user2private FROM user1;
> +CREATE TABLE user1schema.user2public (junk text);
> +GRANT SELECT ON user1schema.user2public TO PUBLIC;
> +RESET SESSION AUTHORIZATION;
> +SET SESSION AUTHORIZATION user1;
> +SELECT * FROM user1schema.user2private;
> +ERROR:  permission denied for table user2private
> +SELECT * FROM user1schema.user2public;
> + junk
> +------
> +(0 rows)
> +
> +CREATE PUBLICATION user1pub;
> +WARNING:  wal_level is insufficient to publish logical changes
> +HINT:  Set wal_level to logical before creating subscriptions.
> +ALTER PUBLICATION user1pub
> +    ADD TABLE user1schema.user2public;
> +ERROR:  must be owner of table user2public
> +ALTER PUBLICATION user1pub
> +    ADD TABLE user1schema.user2private, user1schema.user2public;
> +ERROR:  must be owner of table user2private
> +SELECT * FROM pg_catalog.pg_publication_tables
> +    WHERE pubname = 'user1pub';
> + pubname | schemaname | tablename
> +---------+------------+-----------
> +(0 rows)
> +
> +ALTER PUBLICATION user1pub ADD SCHEMA user1schema;
> +SELECT * FROM pg_catalog.pg_publication_tables
> +    WHERE pubname = 'user1pub';
> + pubname  | schemaname  |  tablename
> +----------+-------------+--------------
> + user1pub | user1schema | user2private
> + user1pub | user1schema | user2public
> +(2 rows)
>
> It is a bit counterintuitive that schema owners do not have administrative
> privileges over tables within their schemas, but that's how it is. The
> design of this patch seems to assume otherwise. Perhaps ALTER PUBLICATION
> ... ADD SCHEMA should be restricted to superusers, just as FOR ALL TABLES?

I will handle this in the next version of the patch. Additionally, I will add this check for "Alter publication add schema" and "Alter publication set schema". I'm not planning to add this check for "Alter publication drop schema", to keep the behavior similar to "Alter publication drop table".

Also, "Alter publication drop table" succeeds even when the user is not the owner of the table. Is this behavior correct?

create table tbl1(c1 int);
create table tbl2(c1 int);
create publication mypub1 for table tbl1,tbl2;
SET SESSION AUTHORIZATION user2;
alter table tbl2 owner to user2;
RESET SESSION AUTHORIZATION;
postgres=> alter publication mypub1 drop table tbl2;
ALTER PUBLICATION
postgres=> alter publication mypub1 add table tbl2;
ERROR:  must be owner of table tbl2

Thoughts?

Regards,
Vignesh
Re: Skipping logical replication transactions on subscriber side
On Tue, Aug 10, 2021 at 10:27 PM Greg Nancarrow wrote:
>
> On Tue, Aug 10, 2021 at 3:07 PM Masahiko Sawada wrote:
> >
> > I've attached the latest patches that incorporated all comments I got
> > so far. Please review them.
> >
> Some initial review comments on the v6-0001 patch:

Thanks for reviewing the patch!

> src/backend/replication/logical/proto.c:
> (1)
> + TimestampTz committs;
>
> I think it looks better to name "committs" as "commit_ts", which is also
> more consistent with the naming of the other member "remote_xid".

Fixed.

> src/backend/replication/logical/worker.c:
> (2) To be consistent with all other function headers, should start
> sentence with capital: "get" -> "Get"
>
> + * get string representing LogicalRepMsgType.

Fixed.

> (3) It looks a bit cumbersome and repetitive to set/update the members
> of apply_error_callback_arg in numerous places.
>
> I suggest making the "set_apply_error_context..." and
> "reset_apply_error_context..." functions "static inline void"
> functions (moving them to the top part of the source file, and
> removing the existing function declarations for these).
>
> Also, can add something similar to below:
>
> static inline void
> set_apply_error_callback_xid(TransactionId xid)
> {
>     apply_error_callback_arg.remote_xid = xid;
> }
>
> static inline void
> set_apply_error_callback_xid_info(TransactionId xid, TimestampTz commit_ts)
> {
>     apply_error_callback_arg.remote_xid = xid;
>     apply_error_callback_arg.commit_ts = commit_ts;
> }
>
> so that instances of, for example:
>
>     apply_error_callback_arg.remote_xid = prepare_data.xid;
>     apply_error_callback_arg.committs = prepare_data.commit_time;
>
> can be:
>
>     set_apply_error_callback_tx_info(prepare_data.xid, prepare_data.commit_time);

Okay. I've added a set_apply_error_callback_xact() function to set transaction information for the apply error callback. Also, I inlined those helper functions since we call them for every change.

> (4) The apply_error_callback() function is missing a function header/comment.

Added.

The fixes for the above comments are incorporated in the v7 patch I just submitted [1].

Regards,

[1] https://www.postgresql.org/message-id/CAD21AoALAq_0q_Zz2K0tO%3DkuUj8aBrDdMJXbey1P6t4w8snpQQ%40mail.gmail.com

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
Re: Support reset of Shared objects statistics in "pg_stat_reset" function
On Wed, 11 Aug 2021 at 09:17, Dilip Kumar wrote:
>
> On Tue, Aug 10, 2021 at 10:49 PM Mahendra Singh Thalor wrote:
> >
> > I checked this and found that we already have one process "stats
> > collector" and in the v02 patch, we are sending requests to collect stats
> > two times. I don't know how costly one call is, but I feel that we can
> > avoid the 2nd call by keeping the handling of the shared tables in
> > pgstat_recv_resetcounter.
>
> I think let's not make the stats collector aware of what we want to
> achieve, so the stats collector will only reset what we have asked
> for, and let the caller or the signal sender decide what they want to
> do.

Thanks Dilip for your opinion.

If we want to use pgstat_recv_resetcounter with an invalid database OID, then we should update all the comments of the pgstat_recv_resetcounter function, because we would then be calling this function with both valid and invalid database OIDs. Passing an invalid database OID means we want to reset stats for shared tables.

1)
/*
 * Lookup the database in the hashtable. Nothing to do if not there.
 */
dbentry = pgstat_get_db_entry(msg->m_databaseid, false);

We should update the above comment, and if m_databaseid is invalid, then we should always get a dbentry. Assert(msg->m_databaseid == 0 && dbentry) and some more sanity checks.

2)
/*
 * We simply throw away all the database's table entries by recreating a
 * new hash table for them.
 */

I think we should update this also.

3)
 * Reset the statistics for the specified database.

This should be updated.

--
Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com
Re: [BUG]Update Toast data failure in logical replication
On Tue, Aug 10, 2021 at 8:08 PM Alvaro Herrera wrote:
>
> On 2021-Jul-30, Amit Kapila wrote:
>
> Reading Dilip's last posted patch that day, I had some reservations
> about the API of ExtractReplicaIdentity. The new argument makes for a
> very strange-to-explain behavior: "return the key values if they are
> unchanged, *or* if they are toasted" ... ???

I think we can say it as "Return the key values if they are changed *or* if they are toasted". Currently, we have an exception for Deletes, where the caller always passes key_changed as true, so maybe we can have a similar exception when the tuple has toasted values.

Can we think of changing the flag to "key_required" instead of "key_changed" and let the caller identify and set its value? For Deletes, it will work the same, but for Updates, the caller needs to compute it by checking whether any of the key columns are modified or have a toast value. We can try to see if the caller can identify it cheaply along with determining the modified_attrs, as at that time we will anyway check the replica key attrs.

Currently, in the proposed patch, we first check whether the tuple has any toast values, and then deform and form the new key tuple. After that, it checks whether the key has any toast values and only then decides to return the tuple. If, as described in the previous paragraph, we can cheaply identify whether the key has toasted values, then we can avoid the deform/form cost in some cases.

Also, I think we need to change the Replica Identity description in the docs [1].

[1] - https://www.postgresql.org/docs/devel/sql-altertable.html

--
With Regards,
Amit Kapila.
Re: Fix around conn_duration in pgbench
On 2021/08/05 18:02, Yugo NAGATA wrote:
> this is a no-op because finishCon() is already called at CSTATE_ABORTED or
> CSTATE_FINISHED. Therefore, in the end, the disconnection delay is not
> measured even in v13.

Yes, but I was thinking that's a bug that we should fix. IOW, I was thinking that, in v13, both connection and disconnection delays should be measured whether -C is specified or not, *per spec*. But, in v13, the disconnection delays are not measured either with or without -C. So I was thinking that this is a bug and we should fix both cases. But you're thinking that, in v13, the disconnection delays don't need to be measured because they are not measured now?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
Allow parallel DISTINCT
Back in March 2016, e06a38965 added support for parallel aggregation. IIRC, because it was fairly late in the release cycle, I dropped parallel DISTINCT to reduce the scope a little. It's been on my list of things to fix since then; I just didn't get around to it until today.

The patch is just some plumbing work to connect all the correct paths up to make it work. It's all fairly trivial. I thought about refactoring things a bit more to get rid of the additional calls to grouping_is_sortable() and grouping_is_hashable(), but I just don't think it's worth making the code ugly for. We'll only call them again if we're considering a parallel plan, in which case it's most likely not a trivial query. Those functions are pretty cheap anyway.

I understand that there's another patch in the September commitfest that does some stuff with Parallel DISTINCT, but that goes about things a completely different way, by creating multiple queues to distribute values by hash. I don't think there's any overlap here. We'd likely still want the planner to consider both methods if we get that patch sometime.

David

parallel_distinct.patch
Description: Binary data
Re: use-regular-expressions-to-simplify-less_greater-and-not_equals.patch
On Tue, Aug 10, 2021 at 8:36 PM Tom Lane wrote:
> "孙诗浩(思才)" writes:
> > we can use regular expressions (<>|!=) to cover "<>" and "!=". There is no
> > need to have two definitions, less_greater and not_equals, because it will
> > confuse developers.
> > So, we can use only not_equals to cover this operator set.
>
> I do not find this an improvement.

Agreed, though mostly on a separation-of-responsibilities principle. Labelling the distinctive tokens and then assigning them meaning are two different things.

> Yeah, it's a bit shorter, but it's less clear; not least because the comment
> explaining the <>-means-!= behavior is no longer anywhere near the code that
> implements that behavior.

The patch should just remove the comment for not_equals as well: if both tokens are given the same name they have to be treated identically. Giving them the same name is clearer in that the equivalency is immediate and impossible to miss (whereas otherwise one has to go and check how "less_greater" and "not_equals" are interpreted).

> It would also get in the way if we ever had reason to treat <> and != as
> something other than exact equivalents.

This is unconvincing, both on the odds of ever being able to treat them differently and because the effort to overcome this obstacle seems minimal - about the same as reverting this patch, not accounting for the coding of the new behavior.

In short, for me: is a principled separation of concerns worth the slight loss in clarity? I'll stick to my status quo choice, but not strongly. Some more context as to exactly what kind of confusion this setup is causing could help tip the scale.

David J.
Re: Support reset of Shared objects statistics in "pg_stat_reset" function
On Tue, Aug 10, 2021 at 10:49 PM Mahendra Singh Thalor wrote: > I checked this and found that we already have one process "stats > collector" and in v02 patch, we are sending requests to collect stats > two times. I don't know how costly one call is but I feel that we can > avoid the 2nd call by keeping handling of the shared tables into > pgstat_recv_resetcounter. > I think let's not make the stat collector aware of what we want to achieve, so the stat collector will only reset for what we have asked for and let the caller or the signal sender decide what they want to do. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: use-regular-expressions-to-simplify-less_greater-and-not_equals.patch
"孙诗浩(思才)" writes:
> we can use regular expressions (<>|!=) to cover "<>" and "!=". There is no
> need to have two definitions, less_greater and not_equals, because it will
> confuse developers.
> So, we can use only not_equals to cover this operator set.

I do not find this an improvement. Yeah, it's a bit shorter, but it's less clear; not least because the comment explaining the <>-means-!= behavior is no longer anywhere near the code that implements that behavior. It would also get in the way if we ever had reason to treat <> and != as something other than exact equivalents.

regards, tom lane
use-regular-expressions-to-simplify-less_greater-and-not_equals.patch
We can use regular expressions (<>|!=) to cover "<>" and "!=". There is no need to have two definitions, less_greater and not_equals, because it will confuse developers. So, we can use only not_equals to cover this operator set.

Please review my patch. Thanks.

use-regular-expressions-to-simplify-less_greater-and-not_equals.patch
Description: Binary data
Re: Add option --drop-cascade for pg_dump/restore
On Tue, Aug 10, 2021 at 10:57 PM Greg Sabino Mullane wrote:
>
> On Fri, Jul 16, 2021 at 9:40 AM Tom Lane wrote:
>>
>> That would require pg_restore to try to edit the DROP commands during
>> restore, which sounds horribly fragile. I'm inclined to think that
>> supporting this option only during initial dump is safer.
>
> Safer, but not nearly as useful. Maybe see what the OP (Wu Haotian) can come
> up with as a first implementation?
>
> Cheers,
> Greg

pg_restore already tries to edit the DROP commands during restore in order to support `--if-exists`.

> supporting this option only during initial dump is safer.

pg_dump and pg_restore use the same function to inject `IF EXISTS` (and `DROP ... CASCADE` in this patch). Supporting this option only during pg_dump would not make it safer, as the logic is the same.
Re: Postgres perl module namespace
On 8/10/21 10:26 PM, Andrew Dunstan wrote: > On 8/10/21 10:13 PM, Mark Dilger wrote: >>> On Aug 10, 2021, at 7:11 PM, Andrew Dunstan wrote: >>> >>> If we were publishing them on CPAN that would be reasonable. But we're >>> not, nor are we likely to, I believe. >> I'm now trying to understand the purpose of the renaming. I thought the >> problem was that RPM packagers wanted something that was unlikely to >> collide. Publishing on CPAN would be the way to claim the namespace. >> >> What's the purpose of this idea then? If there isn't one, I'd rather just >> keep the current names. > > > Yes we want them to be in a namespace where they are unlikely to collide > with anything else. No, you don't have to publish on CPAN to achieve that. > Incidentally, not publishing on CPAN was a major reason given a few years ago for using fairly lax perlcritic policies. If we publish these on CPAN now some people at least might want to revisit that decision. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: Postgres perl module namespace
On 8/10/21 10:13 PM, Mark Dilger wrote: > >> On Aug 10, 2021, at 7:11 PM, Andrew Dunstan wrote: >> >> If we were publishing them on CPAN that would be reasonable. But we're >> not, nor are we likely to, I believe. > I'm now trying to understand the purpose of the renaming. I thought the > problem was that RPM packagers wanted something that was unlikely to collide. > Publishing on CPAN would be the way to claim the namespace. > > What's the purpose of this idea then? If there isn't one, I'd rather just > keep the current names. Yes we want them to be in a namespace where they are unlikely to collide with anything else. No, you don't have to publish on CPAN to achieve that. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: Postgres perl module namespace
On 8/10/21 9:25 PM, Michael Paquier wrote: > On Tue, Aug 10, 2021 at 11:02:13AM -0400, Andrew Dunstan wrote: >> I can live with that. I guess to be consistent we would also rename >> PostgresVersion to PgVersion > Are RewindTest.pm and SSLServer.pm things you are considering for a > renaming as well? It may be more consistent to put everything in the > same namespace if this move is done. It could be very easily done. But I doubt these will make their way into packages, which is how this discussion started. There's good reason to package src/test/perl, so you can use those modules to write TAP tests for extensions. The same reasoning doesn't apply to SSLServer.pm and RewindTest.pm. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: Postgres perl module namespace
> On Aug 10, 2021, at 7:11 PM, Andrew Dunstan wrote: > > If we were publishing them on CPAN that would be reasonable. But we're > not, nor are we likely to, I believe. I'm now trying to understand the purpose of the renaming. I thought the problem was that RPM packagers wanted something that was unlikely to collide. Publishing on CPAN would be the way to claim the namespace. What's the purpose of this idea then? If there isn't one, I'd rather just keep the current names. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Postgres perl module namespace
On 8/10/21 9:37 PM, Mark Dilger wrote:
>
>> On Aug 10, 2021, at 7:10 AM, Andrew Dunstan wrote:
>>
>> use PgTest::Utils;
>> use PgTest::PostgresNode;
>
> Checking CPAN, it seems there are three older modules with names starting
> with "Postgres":
>
> Postgres
> Postgres::Handler
> Postgres::Handler::HTML
>
> It would be confusing to combine official PostgreSQL modules with those
> third-party ones, so perhaps we can claim the PostgreSQL namespace for
> official project modules. How about:
>
> PostgreSQL::Test::Cluster
> PostgreSQL::Test::Lib
> PostgreSQL::Test::Utils
>
> and then if we ever wanted to have official packages for non-test purposes,
> we could start another namespace under PostgreSQL.

If we were publishing them on CPAN that would be reasonable. But we're not, nor are we likely to, I believe.

I'd rather not have to add two levels of directory hierarchy for this, and this also seems a bit long-winded:

my $node = PostgreSQL::Test::Cluster->new('nodename');

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
Re: Postgres perl module namespace
On Wed, Aug 11, 2021 at 9:37 AM Mark Dilger wrote: > > > On Aug 10, 2021, at 7:10 AM, Andrew Dunstan wrote: > > > > use PgTest::Utils; > > use PgTest::PostgresNode; > > Checking CPAN, it seems there are three older modules with names starting > with "Postgres": > > Postgres > Postgres::Handler > Postgres::Handler::HTML > > It would be confusing to combine official PostgreSQL modules with those third > party ones, so perhaps we can claim the PostgreSQL namespace for official > project modules. How about: > > PostgreSQL::Test::Cluster > PostgreSQL::Test::Lib > PostgreSQL::Test::Utils > > and then if we ever wanted to have official packages for non-test purposes, > we could start another namespace under PostgreSQL. Maybe it's me but I would find that more confusing. Also, to actually claim PostgreSQL namespace, we would have to actually publish them on CPAN right?
Re: Postgres perl module namespace
> On Aug 10, 2021, at 7:10 AM, Andrew Dunstan wrote: > > use PgTest::Utils; > use PgTest::PostgresNode; Checking CPAN, it seems there are three older modules with names starting with "Postgres": Postgres Postgres::Handler Postgres::Handler::HTML It would be confusing to combine official PostgreSQL modules with those third party ones, so perhaps we can claim the PostgreSQL namespace for official project modules. How about: PostgreSQL::Test::Cluster PostgreSQL::Test::Lib PostgreSQL::Test::Utils and then if we ever wanted to have official packages for non-test purposes, we could start another namespace under PostgreSQL. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Postgres perl module namespace
On Tue, Aug 10, 2021 at 11:02:13AM -0400, Andrew Dunstan wrote:
> I can live with that. I guess to be consistent we would also rename
> PostgresVersion to PgVersion

Are RewindTest.pm and SSLServer.pm things you are considering for a renaming as well? It may be more consistent to put everything in the same namespace if this move is done.

--
Michael
Re: Tab completion for CREATE SCHEMAAUTHORIZATION
On Mon, Aug 09, 2021 at 07:00:02PM +0100, Dagfinn Ilmari Mannsåker wrote:
> Thanks for the review. Updated patch attached, with the CURRENT/SESSION
> ROLE/USER changes for other commands separated out.

+#define Query_for_list_of_owner_roles \
+Query_for_list_of_roles \
 " UNION ALL SELECT 'CURRENT_ROLE'"\
 " UNION ALL SELECT 'CURRENT_USER'"\
 " UNION ALL SELECT 'SESSION_USER'"

I don't object to the refactoring you are doing here, with three Query_for_list_of_*_roles each depending on the other, for clarity. Neither do I really object to not using COMPLETE_WITH_QUERY() with some extra UNION ALL hardcoded in each code path, as there are 6 cases for _owner_, 6 for _grant_ and 6 for _roles, if my count is right. Still, if I may ask, wouldn't it be better to document a bit what the expectation is behind each one of them? Perhaps the names of the queries are too generic for the purposes where they are used (say, _grant_ for CREATE USER MAPPING looks confusing)?

+   else if (Matches("CREATE", "SCHEMA", "AUTHORIZATION"))
+       COMPLETE_WITH_QUERY(Query_for_list_of_owner_roles);
+   else if (Matches("CREATE", "SCHEMA", MatchAny, "AUTHORIZATION"))
+       COMPLETE_WITH_QUERY(Query_for_list_of_owner_roles);
+   else if (Matches("CREATE", "SCHEMA", MatchAny, "AUTHORIZATION", MatchAny))
+       COMPLETE_WITH("CREATE", "GRANT");
+   else if (Matches("CREATE", "SCHEMA", MatchAny))
+       COMPLETE_WITH("AUTHORIZATION", "CREATE", "GRANT");

Looks like you forgot the case "CREATE SCHEMA AUTHORIZATION MatchAny", which should be completed by GRANT and CREATE.

--
Michael
Re: Postgres picks suboptimal index after building of an extended statistics
On Wed, Jun 23, 2021 at 7:19 AM Andrey V. Lepikhov wrote: > Ivan Frolkov reported a problem with choosing a non-optimal index during > a query optimization. This problem appeared after building of an > extended statistics. Any thoughts on this, Tomas? -- Peter Geoghegan
Re: [PATCH] test/ssl: rework the sslfiles Makefile target
On Tue, Aug 10, 2021 at 09:36:14AM +0200, Daniel Gustafsson wrote:
> I personally think the increased readability and usability from what we have
> today warrants the changes. Is it the use of .SECONDARY or the change in the
> global Makefile you object to (or both)?

The part I am mainly objecting to is the change in Makefile.global.in, but I have to admit, after thinking about it, that enforcing SECONDARY may not be a good idea if other parts of the system rely on that, so encouraging the use of clean_intermediates may be dangerous (Tom's point from upthread). I have not tried, so I am not sure, but perhaps we should just focus on reducing the number of openssl commands rather than making the integration of new files easier? It could then be possible to close the gap for the addition of new files with some more documentation for future hackers.

--
Michael
Re: Next Steps with Hash Indexes
On Thu, Jul 15, 2021 at 9:41 AM Simon Riggs wrote: > It would be very desirable to allow Hash Indexes to become Primary Key > Indexes, which requires both > amroutine->amcanunique = true; > amroutine->amcanmulticol = true; Why do you say that? I don't think it's self-evident that it's desirable. In general I don't think that hash indexes are all that compelling compared to B-Trees. In practice the flexibility of B-Trees tends to win out, even if B-Trees are slightly slower than hash indexes with certain kinds of benchmarks that are heavy on point lookups and have no locality. I have no reason to object to any of this, and I don't object. I'm just asking. -- Peter Geoghegan
Re: Quirk of pg_temp schemas ...
On Tue, Aug 10, 2021 at 11:30:35AM -0400, Tom Lane wrote:
> Greg Stark writes:
>> While fixing up a patch I had dealing with temporary tables I noticed
>> a bit of a quirk with pg_temp schemas. Namely that we have no actual
>> metadata marking them as temporary aside from their names. And we
>> don't do anything to protect that -- a superuser can happily issue ALTER
>> SCHEMA RENAME to rename it to a name that doesn't match pg_temp*.

The fun does not stop here. Here is one: drop the existing temporary schema as superuser, keep the connection that dropped it open, and play with various temporary objects, even types or functions.

> This seems to me to be not very different from the 1001 other ways that
> a superuser can break a database. If *non* superusers could rename
> those schemas then I'd agree there's a problem to be solved.

If non-superusers could do anything that changes what's stored in pg_namespace and makes it inconsistent with the backend-specific state stored in memory, we would be in trouble.

--
Michael
Re: ECPG bug fix: DECLARE STATEMENT and DEALLOCATE, DESCRIBE
On Tue, Aug 10, 2021 at 09:31:37AM +0200, Michael Meskes wrote:
>> that it could be a good thing. declare.pgc seems to rely on that
>> already, but the tests are incorrect as I mentioned in [2]. For
>> DESCRIBE, which provides data about a result set, I find the assignment
>> of a connection a bit strange, and even if this would allow the use of
>> the same statement name for multiple connections, it seems to me that
>> there is a risk of breaking existing applications. There should not
>> be that many, so perhaps that's fine anyway.
>
> I don't think we'd break anything given that DECLARE STATEMENT is new.

Sure, DECLARE does not matter as it is new. However, please note that the specific point I was trying to make with my link [2] from upthread is related to the use of cached connection names with DEALLOCATE, as of this line in the new test declare.pgc:

EXEC SQL DEALLOCATE PREPARE stmt_2;

And DEALLOCATE is far from being new.

> Also please keep in mind that you can use EXEC SQL AT ... DESCRIBE ...;
> already anyway. Again, not very meaningful, but why should we accept a
> connection one way but not the other?

No objections to that.

--
Michael
Re: Worth using personality(ADDR_NO_RANDOMIZE) for EXEC_BACKEND on linux?
On Wed, Aug 11, 2021 at 7:07 AM Thomas Munro wrote:
> On Wed, Aug 11, 2021 at 2:12 AM Andres Freund wrote:
> > On Tue, Aug 10, 2021, at 15:19, Thomas Munro wrote:
> > > Yeah, make check always fails for me on macOS 11. With the attached
> > > experimental hack, it fails only occasionally (1 in 8 runs or so). I
> > > don't know why.
> >
> > I suspect you'd need to use the hack in pg_ctl to make it reliable. The
> > layout of a normally started position-independent postmaster can be
> > incompatible with the non-ASLR spawned child.
>
> Yeah, but the patch already changes both pg_ctl.c and postmaster.c.

/me stares at vmmap output for a while

Oooh. It's working perfectly (for example if you export PATH=binarys:$PATH, pg_ctl -D pgdata start, make installcheck), but pg_regress.c has its own separate fork/exec to launch the temporary cluster that needs to be similarly hacked. Unfortunately I have to give this Macintosh back and go and do some real work on a different computer now. That does seem to be a working solution to the problem, though, and could be polished into proposable form.

I saw claims that you can also link with -Wl,-no_pie or toggle the PIE bit on your executable and libraries, but that didn't work for me on 11, Intel (no effect) or ARM (linker option gone).
Re: Use extended statistics to estimate (Var op Var) clauses
On 8/9/21 9:19 PM, Mark Dilger wrote: On Jul 20, 2021, at 11:28 AM, Tomas Vondra wrote: Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company <0001-Handling-Expr-op-Expr-clauses-in-extended-stats-20210720.patch> Hi Tomas, I tested this patch against master looking for types of clauses that uniformly get worse with the patch applied. I found some. The tests are too large to attach, but the scripts that generate them are not. To perform the tests: git checkout master perl ./gentest.pl > src/test/regress/sql/gentest.sql cat /dev/null > src/test/regress/expected/gentest.out echo "test: gentest" >> src/test/regress/parallel_schedule ./configure && make && make check cp src/test/regress/results/gentest.out src/test/regress/expected/gentest.out patch -p 1 < 0001-Handling-Expr-op-Expr-clauses-in-extended-stats-20210720.patch make check cat src/test/regress/regression.diffs | perl ./check.pl This shows patterns of conditions that get worse, such as: better:0, worse:80: A < B and A <> A or not A < A better:0, worse:80: A < B and not A <= A or A <= A better:0, worse:80: A < B or A = A better:0, worse:80: A < B or A = A or not A >= A better:0, worse:80: A < B or A >= A better:0, worse:80: A < B or A >= A and not A <> A better:0, worse:80: A < B or not A < A better:0, worse:80: A < B or not A <> A better:0, worse:80: A < B or not A <> A or A <= A better:0, worse:80: A < B or not A >= A or not A < A It seems things get worse when the conditions contain a column compared against itself. I suspect that is being handled incorrectly. Thanks for this testing! I took a quick look, and I think this is mostly due to luck in how the (default) range estimates combine without and with extended statistics. 
Consider for example this simple example: create table t (a int, b int); insert into t select mod(i,10), mod(i,20) from generate_series(1,100) s(i); Without stats, the first clauses example is estimated like this: explain (timing off, analyze) select * from t where (A < B and A <> A) or not A < A; QUERY PLAN -- Seq Scan on t (cost=0.00..21925.00 rows=55 width=8) (actual rows=100 loops=1) Filter: (((a < b) AND (a <> a)) OR (a >= a)) Planning Time: 0.054 ms Execution Time: 80.485 ms (4 rows) and with MCV on (a,b) it gets estimates like this: QUERY PLAN -- Seq Scan on t (cost=0.00..21925.00 rows=33 width=8) (actual rows=100 loops=1) Filter: (((a < b) AND (a <> a)) OR (a >= a)) Planning Time: 0.152 ms Execution Time: 79.917 ms (4 rows) So with the statistics, the estimate gets a bit worse. The reason is fairly simple - if you look at the two parts of the OR clause, we get this: clause actualno statswith stats --- (A < B and A <> A)0 331667 1 not (A < A) 100 3333 This clearly shows that the first clause is clearly improved, while the (A < A) is estimated the same way, because the clause has a single Var so it's considered to be "simple" so we ignore the MCV selectivity and just use the simple_sel calculated by clause_selectivity_ext. And the 33 and 331667 just happen to be closer to the actual row count. But that's mostly by luck, clearly. But now that I think about it, maybe the problem really is in how statext_mcv_clauselist_selectivity treats this clause - the definition of "simple" clauses as "has one attnum" was appropriate when only clauses (Var op Const) were supported. But with (Var op Var) that's probably not correct anymore. And indeed, commenting out the if condition on line 1933 (so ignoring simple_sel) and that does improve the estimates for this query. But perhaps I'm missing something, this needs more thought. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
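The way the two arms of the OR combine can be sketched numerically. This is a toy model of the independence-assumption rule (s1 + s2 - s1*s2) that the planner uses for OR clauses; the per-clause selectivities and the 100-row table below are illustrative assumptions, not the numbers from the plans above:

```python
# Toy sketch of how the planner combines selectivities for "c1 OR c2":
# s_or = s1 + s2 - s1 * s2, assuming the two clauses are independent.
# All numbers here are illustrative, not taken from the plans above.

def or_selectivity(s1: float, s2: float) -> float:
    """Independence-assumption combination used for OR clauses."""
    return s1 + s2 - s1 * s2

nrows = 100

# Hypothetical per-clause selectivities:
s_complex = 0.0  # (A < B AND A <> A): MCV stats can tell this matches nothing
s_simple = 1.0   # NOT (A < A): always true, but default estimates miss that

# Even with a perfect estimate for the first arm, the OR estimate is
# dominated by whatever the "simple" single-Var clause estimate says:
print(round(or_selectivity(s_complex, 1.0 / 3.0) * nrows))  # -> 33
print(round(or_selectivity(s_complex, s_simple) * nrows))   # -> 100
```

This illustrates the point above: when one arm of the OR is a single-Var clause whose selectivity bypasses the MCV path, improving the other arm's estimate barely moves the combined result.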
Re: Autovacuum on partitioned table (autoanalyze)
On 2021-Aug-09, Alvaro Herrera wrote:
> > 3) What is the goal of the autovac_refresh_stats() after the loop doing
> >    pgstat_report_anl_ancestors()? I think it'll be common that the stats
> >    collector hasn't even processed the incoming messages by that point, not to
> >    speak of actually having written out a new stats file. If it took less than
> >    10ms (PGSTAT_RETRY_DELAY) to get to autovac_refresh_stats(),
> >    backend_read_statsfile() will not wait for a new stats file to be written
> >    out, and we'll just re-read the state we previously did.
> >
> >    It's pretty expensive to re-read the stats file in some workloads, so I'm a
> >    bit concerned that we end up significantly increasing the amount of stats
> >    updates/reads, without actually gaining anything reliable?
>
> This is done once per autovacuum run and the point is precisely to let
> the next block absorb the updates that were sent. In manual ANALYZE we
> do it to inform future autovacuum runs.
>
> Note that the PGSTAT_RETRY_DELAY limit is used by the autovac launcher
> only, and this code is running in the worker; we do flush out the old
> data. Yes, it's expensive, but we're not doing it once per table, just
> once per worker run.

I misunderstood what you were talking about here -- I thought it was about the delay in autovac_refresh_stats (STATS_READ_DELAY, 1s). Now that I look at this again I realize what your point is, and you're right: there isn't sufficient time for the collector to absorb the messages we sent before the next pg_class scan starts.

-- 
Álvaro Herrera  PostgreSQL Developer — https://www.EnterpriseDB.com/
"Cada quien es cada cual y baja las escaleras como quiere" (JMSerrat)
Re: DROP relation IF EXISTS Docs and Tests - Bug Fix
On Tue, Aug 10, 2021 at 12:04 PM Robert Haas wrote: > On Tue, Mar 9, 2021 at 10:09 AM David G. Johnston > wrote: > > Frankly, I am hoping for a bit more constructive feedback and even > collaboration from a committer, specifically Tom, on this one given the > outstanding user complaints received on the topic, our disagreement > regarding fixing it (which motivates the patch to better document and add > tests), and professional courtesy given to a fellow consistent community > contributor. > > > > So, no, making it just go away because one of the dozens of committers > can’t make time to try and make it work doesn’t sit well with me. If a > committer wants to actively reject the patch with an explanation then so be > it. > > I have reviewed this patch and my opinion is that we should reject it, > Thank you for the feedback. > So, the only change here is deleting the word "essentially." I do tend to find this wishy-washy language to be more annoying than the community at large. > I'd suggest keeping > the existing text and adding a sentence like: "Note that the command > can still fail for other reasons; for example, it is an error if a > type with the provided name exists but is not a domain." > I would at least like to see this added in response to the various bug reports that found the shared namespace among types, and the fact that it causes an error, to be a surprise. > The final portion of the patch adds new regression tests. I'm not > going to say that this is completely without merit, because it can be > useful to know if some patch changes the behavior, but I guess I don't > really see a whole lot of value in it, either. > I'd say the Bug # 16492 tests warrant keeping independent of the opinion that demonstrating the complicated interplay between 10+ SQL commands isn't worth the test suite time. I'd say that probably half of the tests are demonstrating non-intuitive behavior from my perspective. 
The bug test noted above, plus the one demonstrating that a table in a non-first schema of the search_path will not prevent a CREATE command from succeeding but will cause a DROP ... IF EXISTS to error out. Does it need to test all five types? Probably not, but it should at least test DROP VIEW IF EXISTS test_rel_exists.

What about the inherent confusion of having both DROP DOMAIN and DROP TYPE when DROP TYPE will also drop domains? The doc change for that doesn't really fit into your buckets. It would include:

drop_domain.sgml
+ This duplicates the functionality provided by
+but restricts
+ the type's type to domain.

drop_type.sgml
+ This includes domains, though they can be removed specifically
+ by using the command.

Adding sql-droptype to "See Also" on all the domain-related command pages as well.

After looking at this again I will say I do agree that the procedural nature of the doc changes for the main issue was probably overkill, and an "oh-by-the-way" note about the fact that we ERROR on a namespace conflict would adequately address that concern in a user-facing way. Looking back at what I went through to put the test script together, I don't regret doing the work, and I feel that someone like myself would benefit from its existence as a whole. It's more useful than a README that would communicate the same information.

David J.
Re: Autovacuum on partitioned table (autoanalyze)
On 2021-Aug-10, Alvaro Herrera wrote:
> I bring a radical proposal that may be sufficient to close this
> particular hole. What if we made partitions affect only their
> top-level parents for auto-analyze purposes, and not any intermediate
> ancestors? Any intermediate partitioned partitions could be analyzed
> manually if the user wished, and perhaps some reloption could enable
> autovacuum to do it (with the caveat that it'd cause multiple sampling
> of partitions). I don't yet have a clear picture on how to implement
> this, but I'll explore it while waiting for opinions on the idea.

So, with this patch (a quick and dirty job) we no longer sample all partitions twice, and we no longer propagate the tuple counts to p_0. We don't have stats on p_0 anymore, only on p and on the individual partitions.

I didn't move the new #include to a more decent place because:
1. that stuff is going to move to partition.c as a new function, including the new include;
2. that new function also needs to read the reloptions for p_0 to allow the user to enable stat acquisition for p_0 with "alter table p_0 set (autovacuum_enabled=1)";
3. we need to avoid reporting ancestors of a partition repeatedly, which forestalls the performance objection about reading reloptions too frequently.

-- 
Álvaro Herrera  PostgreSQL Developer — https://www.EnterpriseDB.com/

>From 064bc88bf94b6b4e1bfc16f0639e1500b17b9bf5 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera
Date: Tue, 10 Aug 2021 13:05:59 -0400
Subject: [PATCH] Propagate counts up only to topmost ancestor

Ignore intermediate partitions, to avoid redundant sampling of
partitions. If needed, those intermediate partitions can be analyzed
manually.
---
 src/backend/postmaster/pgstat.c | 21 -
 src/include/pgstat.h            |  9 +++--
 2 files changed, 23 insertions(+), 7 deletions(-)

diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 1b54ef74eb..a003966cc8 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -1684,6 +1684,7 @@ pgstat_report_analyze(Relation rel,
  * counts from the partition to its ancestors.  This is necessary so that
  * other processes can decide whether to analyze the partitioned tables.
  */
+#include "utils/lsyscache.h"
 void
 pgstat_report_anl_ancestors(Oid relid)
 {
@@ -1700,19 +1701,25 @@ pgstat_report_anl_ancestors(Oid relid)
 	foreach(lc, ancestors)
 	{
 		Oid			ancestor = lfirst_oid(lc);
+		bool		ispartition;
 
-		msg.m_ancestors[msg.m_nancestors] = ancestor;
-		if (++msg.m_nancestors >= PGSTAT_NUM_ANCESTORENTRIES)
+		ispartition = get_rel_relispartition(ancestor);
+
+		msg.m_ancestors[msg.m_nancestors].m_ancestor_id = ancestor;
+		msg.m_ancestors[msg.m_nancestors].m_propagate_up = !ispartition;
+		msg.m_nancestors++;
+
+		if (msg.m_nancestors >= PGSTAT_NUM_ANCESTORENTRIES)
 		{
 			pgstat_send(&msg, offsetof(PgStat_MsgAnlAncestors, m_ancestors[0]) +
-						msg.m_nancestors * sizeof(Oid));
+						msg.m_nancestors * sizeof(PgStat_AnlAncestor));
 			msg.m_nancestors = 0;
 		}
 	}
 
 	if (msg.m_nancestors > 0)
 		pgstat_send(&msg, offsetof(PgStat_MsgAnlAncestors, m_ancestors[0]) +
-					msg.m_nancestors * sizeof(Oid));
+					msg.m_nancestors * sizeof(PgStat_AnlAncestor));
 
 	list_free(ancestors);
 }
@@ -5415,9 +5422,13 @@ pgstat_recv_anl_ancestors(PgStat_MsgAnlAncestors *msg, int len)
 
 	for (int i = 0; i < msg->m_nancestors; i++)
 	{
-		Oid			ancestor_relid = msg->m_ancestors[i];
+		Oid			ancestor_relid;
 		PgStat_StatTabEntry *ancestor;
 
+		if (!msg->m_ancestors[i].m_propagate_up)
+			continue;
+
+		ancestor_relid = msg->m_ancestors[i].m_ancestor_id;
 		ancestor = pgstat_get_tab_entry(dbentry, ancestor_relid, true);
 		ancestor->changes_since_analyze +=
 			tabentry->changes_since_analyze -
 			tabentry->changes_since_analyze_reported;
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 2068a68a5f..46ef88e73b 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -438,9 +438,14 @@ typedef struct PgStat_MsgAnalyze
  *	analyze counters
  * --
  */
+typedef struct PgStat_AnlAncestor
+{
+	Oid			m_ancestor_id;
+	bool		m_propagate_up;
+} PgStat_AnlAncestor;
 #define PGSTAT_NUM_ANCESTORENTRIES	\
 	((PGSTAT_MSG_PAYLOAD - sizeof(Oid) - sizeof(Oid) - sizeof(int)) \
-	 / sizeof(Oid))
+	 / sizeof(PgStat_AnlAncestor))
 
 typedef struct PgStat_MsgAnlAncestors
 {
@@ -448,7 +453,7 @@ typedef struct PgStat_MsgAnlAncestors
 	Oid			m_databaseid;
 	Oid			m_tableoid;
 	int			m_nancestors;
-	Oid			m_ancestors[PGSTAT_NUM_ANCESTORENTRIES];
+	PgStat_AnlAncestor m_ancestors[PGSTAT_NUM_ANCESTORENTRIES];
 } PgStat_MsgAnlAncestors;
 
 /* --
-- 
2.20.1
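The macro change in the hunk above only re-derives how many ancestor entries fit into one fixed-size stats message once each entry grows from a bare Oid to an (Oid, bool) struct. A quick sketch of the arithmetic — the sizes used here are assumptions for a typical 64-bit build (1000-byte message budget minus an 8-byte header, 4-byte Oid, 8-byte padded struct), not values taken from a real build:

```python
# Sketch of the PGSTAT_NUM_ANCESTORENTRIES arithmetic from the patch above.
# All sizes are assumed values for a typical 64-bit build.

PGSTAT_MSG_PAYLOAD = 1000 - 8  # assumed: 1000-byte budget minus 8-byte header
SIZEOF_OID = 4
SIZEOF_INT = 4
SIZEOF_ANL_ANCESTOR = 8  # struct { Oid; bool; } pads from 5 to 8 bytes

# Fixed fields of PgStat_MsgAnlAncestors: m_databaseid, m_tableoid, m_nancestors
fixed = SIZEOF_OID + SIZEOF_OID + SIZEOF_INT

entries_before = (PGSTAT_MSG_PAYLOAD - fixed) // SIZEOF_OID           # bare Oids
entries_after = (PGSTAT_MSG_PAYLOAD - fixed) // SIZEOF_ANL_ANCESTOR  # structs

print(entries_before, entries_after)  # -> 245 122
```

Under these assumed sizes, padding halves the per-message capacity, which is why the send sites in the .c hunk also switch from `sizeof(Oid)` to `sizeof(PgStat_AnlAncestor)` when computing the bytes to send.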
Re: ECPG bug fix: DECLARE STATEMENT and DEALLOCATE, DESCRIBE
On Tue, Aug 10, 2021 at 1:46 PM Andrew Dunstan wrote: > No, you're right, although I think it's implied. Maybe we need a > statement along these lines: I agree with that, but to me it's more in the scope of what is expected of committers in general. At a very high level. So it's not something that I'd expect to see on the RMT Postgres Wiki page. I would expect to see it on the committers Wiki page, somewhere like that. > If they are fine by you then I accept that. After all, the reason we > want you to deal with this is not only that you made the original commit > but because you're the expert in this area. +1. Nobody questioned the original commit, so it would be premature (if not totally arbitrary) to change our approach now, at the first sign of trouble. To the best of my knowledge there is no special risk with applying this patch to address the behavioral inconsistencies, nor is there any known special risk with any other fix. Including even deciding to *not* fix the inconsistency in Postgres 14 based on practical considerations -- for all I know Michael might be perfectly justified in interpreting the patch as new feature work that's out of scope now. I don't feel qualified to even offer an opinion. -- Peter Geoghegan
Re: add operator ^= to mean not equal (like != and <>)
Hi, On Tue, Aug 10, 2021 at 11:13:03AM +0200, Daniel Gustafsson wrote: > > On 10 Aug 2021, at 11:10, Andreas Karlsson wrote: > > What is he reason you want to add ^= is there any other databases > > which uses ^= for inequality? > > I assume it's because of Oracle compatibility which AFAIK is the only > database supporting ^=. DB2 also supports (or supported) it, but it's deprecated: https://www.ibm.com/docs/en/db2/9.7?topic=predicates-basic-predicate We encountered it at least in one customer setting, so we added it to db2fce: https://github.com/credativ/db2fce/blob/master/db2fce.sql#L588 Michael -- Michael Banck Projektleiter / Senior Berater Tel.: +49 2166 9901-171 Fax: +49 2166 9901-100 Email: michael.ba...@credativ.de credativ GmbH, HRB Mönchengladbach 12080 USt-ID-Nummer: DE204566209 Trompeterallee 108, 41189 Mönchengladbach Geschäftsführung: Dr. Michael Meskes, Sascha Heuer, Geoff Richardson, Peter Lilley Unser Umgang mit personenbezogenen Daten unterliegt folgenden Bestimmungen: https://www.credativ.de/datenschutz
Re: add operator ^= to mean not equal (like != and <>)
On 10/08/21 8:27 pm, 孙诗浩(思才) wrote:
> Hi everyone, I am doing some jobs in postgres. I want to add "^=" like "!=" and "<>".

One problem is that '^' and '^=' are already used as the exclusive-OR operator (and its assignment form) in programming languages such as C, Java, JavaScript, and Python. See:

https://www.tutorialspoint.com/java/java_basic_operators.htm
https://www.tutorialspoint.com/cprogramming/c_operators.htm
https://docs.python.org/3/library/operator.html
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Expressions_and_Operators

Please don't confuse people unnecessarily!

Cheers,
Gavin
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
On Thu, Aug 5, 2021 at 6:20 AM Paul Guo wrote: > Rebased. The commit message for 0001 is not clear enough for me to understand what problem it's supposed to be fixing. The code comments aren't really either. They make it sound like there's some problem with copying symlinks but mostly they just talk about callbacks, which doesn't really help me understand what problem we'd have if we just didn't commit this (or reverted it later). I am not really convinced by Álvaro's claim that 0004 is a "fix"; I think I'd call it an improvement. But either way I agree that could just be committed. I haven't analyzed 0002 and 0003 yet. -- Robert Haas EDB: http://www.enterprisedb.com
Re: ECPG bug fix: DECLARE STATEMENT and DEALLOCATE, DESCRIBE
On 8/10/21 2:16 PM, Michael Meskes wrote: >> Agreed. How is this, which already exists? >> >> https://wiki.postgresql.org/wiki/Release_Management_Team > That I know, but I don't think it covers the issues we, or I, had up- > thread. Or do I miss something? No, you're right, although I think it's implied. Maybe we need a statement along these lines: Committers are responsible for the resolution of open items that relate to commits they have made. Action needs to be taken in a timely fashion, and if there is any substantial delay in dealing with an item the committer should provide a date by which they expect action to be completed. The RMT will follow up where these requirements are not being complied with. > > Speaking of RMT, Andrew, Michael, Peter, will you make the final > decision whether we commit Kuroda-san's patches? > > They are fine by me. Another pair of eyes would be nice, though. maybe > you could have another look, Horiguchi-san? > If they are fine by you then I accept that. After all, the reason we want you to deal with this is not only that you made the original commit but because you're the expert in this area. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: How is this possible "publication does not exist"
Reviving this thread.

2021-08-10 19:05:09.096 UTC [3738] LOG:  logical replication apply worker for subscription "sub_mycluster_alltables" has started
2021-08-10 19:05:09.107 UTC [3739] LOG:  logical replication table synchronization worker for subscription "sub_mycluster_alltables", table "t_random" has started
2021-08-10 19:05:12.222 UTC [3739] LOG:  logical replication table synchronization worker for subscription "sub_mycluster_alltables", table "t_random" has finished
2021-08-10 19:05:14.806 UTC [3738] ERROR:  could not receive data from WAL stream: ERROR:  publication "sub_mycluster_alltables" does not exist
	CONTEXT:  slot "sub_mycluster_alltables", output plugin "pgoutput", in the change callback, associated LSN 0/4015DF0
2021-08-10 19:05:14.811 UTC [175] LOG:  background worker "logical replication worker" (PID 3738) exited with exit code 1

select * from pg_publication;
-[ RECORD 1 ]+------------------------
oid          | 16415
pubname      | sub_mycluster_alltables
pubowner     | 10
puballtables | t
pubinsert    | t
pubupdate    | t
pubdelete    | t
pubtruncate  | t

select * from pg_replication_slots;
-[ RECORD 1 ]-------+--------------------------------
slot_name           | mycluster_cjvq_68cf55677c_6vgcf
plugin              |
slot_type           | physical
datoid              |
database            |
temporary           | f
active              | t
active_pid          | 433
xmin                |
catalog_xmin        |
restart_lsn         | 0/D00
confirmed_flush_lsn |
-[ RECORD 2 ]-------+--------------------------------
slot_name           | sub_mycluster_alltables
plugin              | pgoutput
slot_type           | logical
datoid              | 16395
database            | mycluster
temporary           | f
active              | t
active_pid          | 8799
xmin                |
catalog_xmin        | 500
restart_lsn         | 0/40011C0
confirmed_flush_lsn | 0/40011C0

I'm at a loss as to where to even look at this point.

Dave
Re: Worth using personality(ADDR_NO_RANDOMIZE) for EXEC_BACKEND on linux?
On Wed, Aug 11, 2021 at 2:12 AM Andres Freund wrote: > On Tue, Aug 10, 2021, at 15:19, Thomas Munro wrote: > > Yeah, make check always fails for me on macOS 11. With the attached > > experimental hack, it fails only occasionally (1 in 8 runs or so). I > > don't know why. > > I suspect you'd need to use the hack in pg_ctl to make it reliable. The > layout of normally stayed position independent postmaster can be incompatible > with the non ASLR spawned child. Yeah, but the patch already changes both pg_ctl.c and postmaster.c.
Re: DROP relation IF EXISTS Docs and Tests - Bug Fix
On Tue, Mar 9, 2021 at 10:09 AM David G. Johnston wrote:
> Frankly, I am hoping for a bit more constructive feedback and even
> collaboration from a committer, specifically Tom, on this one given the
> outstanding user complaints received on the topic, our disagreement regarding
> fixing it (which motivates the patch to better document and add tests), and
> professional courtesy given to a fellow consistent community contributor.
>
> So, no, making it just go away because one of the dozens of committers can’t
> make time to try and make it work doesn’t sit well with me. If a committer
> wants to actively reject the patch with an explanation then so be it.

I have reviewed this patch and my opinion is that we should reject it, because in my opinion, the documentation changes are not improvements, and there is no really clear need for the regression test additions.

According to my view of the situation, there are three kinds of changes in this patch. The first set of hunks make minor wording adjustments. Typical is this:

 CREATE DOMAIN creates a new domain. A domain is
- essentially a data type with optional constraints (restrictions on
+ a data type with optional constraints (restrictions on
 the allowed set of values).

So, the only change here is deleting the word "essentially." Now, it's possible that if someone different had written the original text, they might have chosen to leave that word out. Personally, I would have chosen to include it, but it's a judgement call. However, I find it extremely difficult to imagine that anybody will be confused because of the presence of that word there.

- The domain name must be unique among the types and domains existing
- in its schema.
+ The domain name must be unique among all types (not just domains)
+ existing in its schema.

Similarly here.
It is arguable which way the text reads better, but the stated purpose of the patch is to make the behavior more clear, and I cannot imagine someone reading the existing text and getting confused, and then reading the patched text and being not confused.

- Do not throw an error if the domain does not exist. A notice is issued
- in this case.
+ This parameter instructs PostgreSQL to search
+ for the first instance of any type with the provided name.
+ If no type is found a notice is issued and the command ends.
+ If a type is found then one of two things will happen:
+ if the type is a domain it is dropped, otherwise the command fails.

This is the second kind of hunk present in this patch. There are a whole bunch that are very similar, adjusting the documentation for various object types. I appreciate that it does sometimes confuse users that a DROP ... IF EXISTS command can still fail for some other reason, so maybe we should try to clarify that in some way, but I find this explanation too complex and technical to be helpful. If we feel it's necessary to be more clear here, I'd suggest keeping the existing text and adding a sentence like: "Note that the command can still fail for other reasons; for example, it is an error if a type with the provided name exists but is not a domain."

I feel that it's unnecessary to try to clarify the command's behavior relative to search_path, but if it were agreed that we need to do so, I still would not like this way of doing it. First, you never actually use the term "search_path". I expect a lot of people would be confused by the reference to searching "for the first instance" because they won't know what is being searched. Second, this entire bit of text is inside the description of "IF EXISTS", but the behavior in question mostly applies whether or not "IF EXISTS" is specified.
Moreover, it's not only not specific to IF EXISTS, it's not even specific to this particular command. The behavior of looking through the search_path for the first instance of some object type is one that we use practically everywhere, in all kinds of situations. I think we are going to go crazy if we try to re-explain that in every place where it might be relevant.

The final portion of the patch adds new regression tests. I'm not going to say that this is completely without merit, because it can be useful to know if some patch changes the behavior, but I guess I don't really see a whole lot of value in it, either. It's easy to end up with a ton of tests that take up a little bit of time every time someone runs 'make check' but only ever find behavior changes that the developer was intentionally trying to make. I think it's possible that these changes would end up falling into that category. I wouldn't complain about them getting committed, or about committing them myself, if there were a chorus of people convinced that they were worth having, but there isn't, and I don't find them valuable enough myself to justify a commit.

-- 
Robert Haas
EDB: http://www.enterprisedb.com
Re: ECPG bug fix: DECLARE STATEMENT and DEALLOCATE, DESCRIBE
> Agreed. How is this, which already exists?
>
> https://wiki.postgresql.org/wiki/Release_Management_Team

That I know, but I don't think it covers the issues we, or I, had upthread. Or am I missing something?

Speaking of the RMT: Andrew, Michael, Peter, will you make the final decision on whether we commit Kuroda-san's patches?

They are fine by me. Another pair of eyes would be nice, though. Maybe you could have another look, Horiguchi-san?

Michael
--
Michael Meskes
Michael at Fam-Meskes dot De
Michael at Meskes dot (De|Com|Net|Org)
Meskes at (Debian|Postgresql) dot Org
Re: Corruption during WAL replay
On Thu, Mar 4, 2021 at 10:01 PM Kyotaro Horiguchi wrote: > The patch assumed that CHKPT_START/COMPLETE barrier are exclusively > used each other, but MarkBufferDirtyHint which delays checkpoint start > is called in RelationTruncate while delaying checkpoint completion. > That is not a strange nor harmful behavior. I changed delayChkpt to a > bitmap integer from an enum so that both barrier are separately > triggered. > > I'm not sure this is the way to go here, though. This fixes the issue > of a crash during RelationTruncate, but the issue of smgrtruncate > failure during RelationTruncate still remains (unless we treat that > failure as PANIC?). I like this patch. As I understand it, we're currently cheating by allowing checkpoints to complete without necessarily flushing all of the pages that were dirty at the time we fixed the redo pointer out to disk. We think this is OK because we know that those pages are going to get truncated away, but it's not really OK because when the system starts up, it has to replay WAL starting from the checkpoint's redo pointer, but the state of the page is not the same as it was at the time when the redo pointer was the end of WAL, so redo fails. In the case described in http://postgr.es/m/byapr06mb63739b2692dc6dbb3c5f186cab...@byapr06mb6373.namprd06.prod.outlook.com modifications are made to the page before the redo pointer is fixed and those changes never make it to disk, but the truncation also never makes it to the disk either. With this patch, that can't happen, because no checkpoint can intervene between when we (1) decide we're not going to bother writing those dirty pages and (2) actually truncate them away. So either the pages will get written as part of the checkpoint, or else they'll be gone before the checkpoint completes. In the latter case, I suppose redo that would have modified those pages will just be skipped, thus dodging the problem. 
In RelationTruncate, I suggest that we ought to clear the delay-checkpoint flag before rather than after calling FreeSpaceMapVacuumRange. Since the free space map is not fully WAL-logged, anything we're doing there should be non-critical. Also, I think it might be better if MarkBufferDirtyHint stays closer to the existing coding and just uses a Boolean and an if-test to decide whether to clear the bit, instead of inventing a new mechanism. I don't really see anything wrong with the new mechanism, but I think it's better to keep the patch minimal. As you say, this doesn't fix the problem that truncation might fail. But as Andres and Sawada-san said, the solution to that is to get rid of the comments saying that it's OK for truncation to fail and make it a PANIC. However, I don't think that change needs to be part of this patch. Even if we do that, we still need to do this. And even if we do this, we still need to do that. -- Robert Haas EDB: http://www.enterprisedb.com
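The enum-to-bitmap change Horiguchi-san describes upthread — letting one backend hold the "delay checkpoint start" and "delay checkpoint completion" barriers at the same time — can be sketched as two independently set and cleared flag bits. The flag names and the call sites mentioned in the comments are illustrative, not necessarily those used in the patch:

```python
# Sketch of turning delayChkpt into a bitmask so both checkpoint barriers
# can be held simultaneously by one backend. Names are illustrative.

DELAY_CHKPT_START = 0x1     # e.g. held by MarkBufferDirtyHint
DELAY_CHKPT_COMPLETE = 0x2  # e.g. held across RelationTruncate

delay_chkpt = 0

# RelationTruncate delays checkpoint completion...
delay_chkpt |= DELAY_CHKPT_COMPLETE
# ...and a MarkBufferDirtyHint inside it delays checkpoint start too.
delay_chkpt |= DELAY_CHKPT_START

assert delay_chkpt & DELAY_CHKPT_START
assert delay_chkpt & DELAY_CHKPT_COMPLETE

# Clearing one barrier leaves the other in place -- the property an
# exclusive enum value could not provide.
delay_chkpt &= ~DELAY_CHKPT_START
assert not (delay_chkpt & DELAY_CHKPT_START)
assert delay_chkpt & DELAY_CHKPT_COMPLETE
```

With an exclusive enum, setting the second barrier would have clobbered the first; the bitmask keeps the two nested "delay" scopes independent.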
Re: ECPG bug fix: DECLARE STATEMENT and DEALLOCATE, DESCRIBE
On Tue, Aug 10, 2021 at 08:05:29PM +0200, Michael Meskes wrote: > > I think my point was that committers should be required to understand > > the RMT process, and if we need to communicate that better, let's do > > that. I don't think it should be the responsibility of RMT members > > to > > communicate the RMT process every time they communicate with someone, > > unless someone asks. > > Completely agreed, my point was that documenting the process to some > extend would be helpful. For obvious reasons I'm the wrong person to do > that, though. :) Agreed. How is this, which already exists? https://wiki.postgresql.org/wiki/Release_Management_Team -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Re: ECPG bug fix: DECLARE STATEMENT and DEALLOCATE, DESCRIBE
> I think my point was that committers should be required to understand > the RMT process, and if we need to communicate that better, let's do > that. I don't think it should be the responsibility of RMT members > to > communicate the RMT process every time they communicate with someone, > unless someone asks. Completely agreed, my point was that documenting the process to some extend would be helpful. For obvious reasons I'm the wrong person to do that, though. :) Michael -- Michael Meskes Michael at Fam-Meskes dot De Michael at Meskes dot (De|Com|Net|Org) Meskes at (Debian|Postgresql) dot Org
Re: Support reset of Shared objects statistics in "pg_stat_reset" function
On Tue, 10 Aug 2021 at 22:32, Mahendra Singh Thalor wrote: > > On Tue, 10 Aug 2021 at 21:53, Sadhuprasad Patro wrote: > > > > > As of now, we are adding handling inside pg_stat_reset for shared > > > tables but I think we can add a new function with the name of > > > pg_stat_reset_shared_tables to reset stats for all the shared tables. > > > New function will give more clarity to the users also. We already have > > > a pg_stat_reset_shared(text) function for "archiver", "bgwriter", or > > > "wal". > > > > > > Thoughts? > > > > In my opinion, it is better to extend the functionality of > > "pg_stat_reset" call because a new function just to reset shared table > > data may not be needed. Where we already have a reset shared function > > "pg_stat_reset_shared" in place. > > > > All of applicable comments are implemented in the patch below: > > > > Hi Sadhu, > I can see that you forgot to include "catalog.h" so I am getting below > warning: >> >> pgstat.c: In function ‘pgstat_recv_resetsinglecounter’: >> pgstat.c:5216:7: warning: implicit declaration of function >> ‘IsSharedRelation’; did you mean ‘InvalidRelation’? >> [-Wimplicit-function-declaration] >> if (!IsSharedRelation(msg->m_objectid)) >>^~~~ >>InvalidRelation > > > 1) Please add the .h file. > --- a/src/backend/postmaster/pgstat.c > +++ b/src/backend/postmaster/pgstat.c > @@ -38,6 +38,7 @@ > #include "access/transam.h" > #include "access/twophase_rmgr.h" > #include "access/xact.h" > +#include "catalog/catalog.h" > #include "catalog/partition.h" > > 2) > @@ -1442,6 +1443,10 @@ pgstat_reset_counters(void) > pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_RESETCOUNTER); > msg.m_databaseid = MyDatabaseId; > pgstat_send(&msg, sizeof(msg)); > + > + /* Reset the stat counters for Shared tables also. */ > + msg.m_databaseid = InvalidOid; > + pgstat_send(&msg, sizeof(msg)); > > I will look into this part again. 
If pgstat_send forks a new process, then I > think, it will be better if we can reset stats in pgstat_recv_resetcounter > for shared tables also because shared tables are not much in counting so it > will be good if we reset in one function only. I will debug this part more > and will see. > I checked this and found that we already have one "stats collector" process, and in the v02 patch we are sending the reset request two times. I don't know how costly one call is, but I feel that we can avoid the 2nd call by keeping the handling of shared tables in pgstat_recv_resetcounter. Thoughts? -- Thanks and Regards Mahendra Singh Thalor EnterpriseDB: http://www.enterprisedb.com
Re: ECPG bug fix: DECALRE STATEMENT and DEALLOCATE, DESCRIBE
On Tue, Aug 10, 2021 at 09:37:19AM +0200, Michael Meskes wrote: > > I do think all committers need to understand the role of the RMT so > > they > > can respond appropriately. Do we need to communicate this better? > > Being the one who assumed a different procedure, yes please. :) I think my point was that committers should be required to understand the RMT process, and if we need to communicate that better, let's do that. I don't think it should be the responsibility of RMT members to communicate the RMT process every time they communicate with someone, unless someone asks. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Re: Support reset of Shared objects statistics in "pg_stat_reset" function
On Tue, 10 Aug 2021 at 21:53, Sadhuprasad Patro wrote: > > > As of now, we are adding handling inside pg_stat_reset for shared > > tables but I think we can add a new function with the name of > > pg_stat_reset_shared_tables to reset stats for all the shared tables. > > New function will give more clarity to the users also. We already have > > a pg_stat_reset_shared(text) function for "archiver", "bgwriter", or > > "wal". > > > > Thoughts? > > In my opinion, it is better to extend the functionality of > "pg_stat_reset" call because a new function just to reset shared table > data may not be needed. Where we already have a reset shared function > "pg_stat_reset_shared" in place. > > All of applicable comments are implemented in the patch below: > Hi Sadhu, I can see that you forgot to include "catalog.h" so I am getting below warning: > pgstat.c: In function ‘pgstat_recv_resetsinglecounter’: > pgstat.c:5216:7: warning: implicit declaration of function > ‘IsSharedRelation’; did you mean ‘InvalidRelation’? > [-Wimplicit-function-declaration] > if (!IsSharedRelation(msg->m_objectid)) >^~~~ >InvalidRelation > 1) Please add the .h file. --- a/src/backend/postmaster/pgstat.c +++ b/src/backend/postmaster/pgstat.c @@ -38,6 +38,7 @@ #include "access/transam.h" #include "access/twophase_rmgr.h" #include "access/xact.h" +#include "catalog/catalog.h" #include "catalog/partition.h" 2) @@ -1442,6 +1443,10 @@ pgstat_reset_counters(void) pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_RESETCOUNTER); msg.m_databaseid = MyDatabaseId; pgstat_send(&msg, sizeof(msg)); + + /* Reset the stat counters for Shared tables also. */ + msg.m_databaseid = InvalidOid; + pgstat_send(&msg, sizeof(msg)); I will look into this part again. If pgstat_send forks a new process, then I think, it will be better if we can reset stats in pgstat_recv_resetcounter for shared tables also because shared tables are not much in counting so it will be good if we reset in one function only. 
I will debug this part more and will see. -- Thanks and Regards Mahendra Singh Thalor EnterpriseDB: http://www.enterprisedb.com
Re: Support reset of Shared objects statistics in "pg_stat_reset" function
> As of now, we are adding handling inside pg_stat_reset for shared > tables but I think we can add a new function with the name of > pg_stat_reset_shared_tables to reset stats for all the shared tables. > New function will give more clarity to the users also. We already have > a pg_stat_reset_shared(text) function for "archiver", "bgwriter", or > "wal". > > Thoughts? In my opinion, it is better to extend the functionality of the "pg_stat_reset" call, because a new function just to reset shared table data may not be needed, given that we already have the reset function "pg_stat_reset_shared" in place. All applicable comments are implemented in the patch below: Thanks & Regards SadhuPrasad EnterpriseDB: http://www.enterprisedb.com v2-0001-DB-1318-pg_stat_reset-and-pg_stat_reset_single_ta.patch Description: Binary data
Re: Support reset of Shared objects statistics in "pg_stat_reset" function
> 3) I am attaching a .sql file. We can add similar types of test cases for > shared tables. > Ex: first reset stats for all shared tables using pg_stat_reset and then > check stats for all shared tables(all should be zero) Adding a new test case for this looks difficult, as the results are not consistent when the test case is added to automation. There may be some delay, since the collector process has to reset the statistics in the background... So I am not adding any test case, and I think for the same reason there are no existing test cases for this stat reset. Thanks & Regards SadhuPrasad EnterpriseDB: http://www.enterprisedb.com
Re: Quirk of pg_temp schemas ...
Greg Stark writes: > While fixing up a patch I had dealing with temporary tables I noticed > a bit of a quirk with pg_temp schemas. Namely that we have no actual > meta data marking them as temporary aside from their names. And we > don't do anything to protect that -- superuser can happily issue ALTER > SCHEMA RENAME to rename it to a name that doesn't match pg_temp*. This seems to me to be not very different from the 1001 other ways that a superuser can break a database. If *non* superusers could rename those schemas then I'd agree there's a problem to be solved. regards, tom lane
Re: [PATCH] test/ssl: rework the sslfiles Makefile target
Michael Paquier writes: > Regarding 0002, I am not sure. Even if this reduces a lot of > duplication, which is really nice, enforcing .SECONDARY to not trigger > with a change impacting Makefile.global.in does not sound very > appealing to me in the long-run, TBH. Yeah, I don't like that change either. It would be one thing if we had several places in which suppressing .SECONDARY was useful, but if there's only one then it seems better to design around it. As a concrete example of why this might be a bad idea, how sure are you that noplace in Makefile.global depends on that behavior? regards, tom lane
Re: when the startup process doesn't (logging startup delays)
On Tue, Aug 10, 2021 at 9:28 AM Nitin Jadhav wrote: > Please find the updated patch attached. I think this is getting close. The output looks nice. However, I still see a bunch of issues. You mentioned previously that you would add documentation, but I do not see it here. startup_progress_timer_expired should be declared as sig_atomic_t like we do in other cases (see interrupt.c). The default value of the new GUC is 10s in postgresql.conf.sample, but -1 in guc.c. They should both be 10s, and the one in postgresql.conf.sample should be commented out. I suggest making the GUC GUC_UNIT_MS rather than GUC_UNIT_S, but expressing the default in postgresql.conf.sample as 10s rather than in milliseconds. I tried values measured in milliseconds just for testing purposes and didn't initially understand why it wasn't working. I don't think there's any reason it can't work. I would prefer to see log_startup_progress_interval declared and defined in startup.c/startup.h rather than guc.c/guc.h. There's no precedent in the tree for the use of ##__VA_ARGS__. On my system it seems to work fine if I just leave out the ##. Any reason not to do that? Two of the declarations in startup.h forgot the leading "extern", while the other two that are right next to them have it, matching project style. I'm reasonably happy with most of the identifier names now, but I think init_startup_progress() is confusing. The reason I think that is that we call it more than once, which is not really what people think about when they think of an "init" function, I think. It's not initializing the startup progress facility in general; it's preparing for the next phase of startup progress reporting. How about renaming it to begin_startup_progress_phase()? And then startup_process_op_start_time could be startup_process_phase_start_time to match. SyncDataDirectory() potentially walks over the data directory three times: once to call do_syncfs(), once to call pre_sync_fname(), and once to call datadir_fsync_fname(). 
With this patch, the first and third will emit progress messages if the operation runs long, but the second will not. I think they should all be treated the same. Also, the locations where you've inserted calls to init_startup_progress() don't really make it clear with what code that's associated. I'd put them *immediately* before the call to do_syncfs() or walkdir(). Remember that PostgreSQL comments are typically written "telegraph style," so function comments should say "Does whatever" not "It does whatever". Most of them are correct, but there's one sentence you need to fix. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Why does the owner of a publication need CREATE privileges on the database?
Amit Kapila writes: > On Tue, Jul 27, 2021 at 11:29 PM Mark Dilger > wrote: >> The documentation for ALTER PUBLICATION ... OWNER TO ... claims the new >> owner must have CREATE privilege on the database, though superuser can >> change the ownership in spite of this restriction. No explanation is given >> for this requirement. > I am not aware of the original thought process behind this but current > behavior seems reasonable because if users need to have CREATE > privilege on the database while Create Publication, the same should be > true while we change the owner to a new owner. I think that for most (all?) forms of ALTER, we say that you need the same privileges as you would need to drop the existing object and create a new one with the new properties. From the standpoint of the privilege system, ALTER is just a shortcut for that. regards, tom lane
Re: RFC: Logging plan of the running query
On 2021/08/10 21:22, torikoshia wrote: I have updated the patch in this way. Thanks for updating the patch! In this patch, getting the plan to the DO statement is as follows. Looks good to me. Any thoughts? + ereport(LOG_SERVER_ONLY, + (errmsg("plan of the query running on backend with PID %d is:\n%s", + MyProcPid, es->str->data), +errhidestmt(true))); Shouldn't we hide context information by calling errhidecontext(true)? While "make installcheck" regression test was running, I repeated executing pg_log_current_query_plan() and got the failure of join_hash test with the following diff. This means that pg_log_current_query_plan() could cause the query that should be completed successfully to fail with the error. Isn't this a bug? I *guess* that the cause of this issue is that ExplainNode() can call InstrEndLoop() more than once unexpectedly. -- $$ select count(*) from simple r join simple s using (id); $$); - initially_multibatch | increased_batches ---+--- - f| f -(1 row) - +ERROR: InstrEndLoop called on running node +CONTEXT: PL/pgSQL function hash_join_batches(text) line 6 at FOR over EXECUTE statement rollback to settings; -- parallel with parallel-oblivious hash join savepoint settings; @@ -687,11 +684,9 @@ left join (select b1.id, b1.t from join_bar b1 join join_bar b2 using (id)) ss on join_foo.id < ss.id + 1 and join_foo.id > ss.id - 1; $$); - multibatch - - t -(1 row) - +ERROR: InstrEndLoop called on running node +CONTEXT: parallel worker +PL/pgSQL function hash_join_batches(text) line 6 at FOR over EXECUTE statement rollback to settings; -- single-batch with rescan, parallel-aware savepoint settings; -- Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Re: Another regexp performance improvement: skip useless paren-captures
> On Aug 9, 2021, at 7:20 PM, Tom Lane wrote: > > So I took another look at the code, and it doesn't seem that hard > to make it act this way. The attached passes regression, but > I've not beat on it with random strings. I expect to get back around to testing this in a day or so. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Another regexp performance improvement: skip useless paren-captures
I wrote: > So AFAICS Perl is acting in the way I'm attributing to POSIX. > But maybe we should actually read POSIX ... I went to look at the POSIX spec, and was reminded that it lacks backrefs altogether. (POSIX specifies the "BRE" and "ERE" regex flavors as described in our docs, but not "ARE".) So there's no help to be had there. The fact that Perl doesn't throw an error is probably the most useful precedent available. regards, tom lane
Quirk of pg_temp schemas ...
While fixing up a patch I had dealing with temporary tables I noticed a bit of a quirk with pg_temp schemas. Namely that we have no actual meta data marking them as temporary aside from their names. And we don't do anything to protect that -- superuser can happily issue ALTER SCHEMA RENAME to rename it to a name that doesn't match pg_temp*. The rest of the system then treats it as a perfectly normal schema that just happens to contain temporary tables postgres=# create temporary table t(i integer); CREATE TABLE postgres=# \d t Table "pg_temp_4.t" Column | Type | Collation | Nullable | Default +-+---+--+- i | integer | | | postgres=# alter schema pg_temp_4 rename to fnord; ALTER SCHEMA postgres=# \d t Table "fnord.t" Column | Type | Collation | Nullable | Default +-+---+--+- i | integer | | | We could, I suppose, say this is just not a problem. Most, perhaps all, of the existing code doesn't seem bothered by this situation. But it seems a bit fragile. The worst side effect I've found is that autovacuum won't drop orphaned temp tables because it can't check if the backend is still alive connected to them. A related point is that super-user is allowed to drop the temp schema. If super-user does do this we still allow new temporary tables to be created in the now-nonexistent schema resulting in tables that don't print correctly: postgres=# drop schema pg_temp_3 cascade; NOTICE: drop cascades to table t3 DROP SCHEMA postgres=# create temporary table t4( i integer); CREATE TABLE postgres=# \d t4 Table ".t4" Column | Type | Collation | Nullable | Default +-+---+--+- i | integer | | | I suspect there are sites that will try to fprintf NULL using %s here which on glibc prints "(null)" but may crash elsewhere... At the very least we should probably disallow creating temporary tables if the temporary schema has been dropped. That's just creating broken references in the catalog tables. Alternately we could rig something so that dropping the schema unsets myTempNamespace. 
The real fix seems to me to be adding "istemp" and "backendid" columns to pg_namespace and not depending on the actual name of the schema to store this info in. But I guess the current scheme has worked fine for ages, so I dunno. Perhaps the global temp table work will have to invest in this area. -- greg
Re: Postgres perl module namespace
On 8/10/21 10:40 AM, Tom Lane wrote: > Andrew Dunstan writes: >> I will undertake to do the work, once we get the bike-shedding part done. >> I'll kick that off by suggesting we move all of these to the namespace >> PgTest, and rename TestLib to Utils, so instead of >> use TestLib; >> use PostgresNode; >> you would say >> use PgTest::Utils; >> use PgTest::PostgresNode; > Using both "Pg" and "Postgres" seems a bit inconsistent. > Maybe "PgTest::PgNode"? > > More generally, I've never thought that "Node" was a great name > here; it's a very vague term. While we're renaming, maybe we > could change it to "PgTest::PgCluster" or the like? > > I can live with that. I guess to be consistent we would also rename PostgresVersion to PgVersion cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: Add option --drop-cascade for pg_dump/restore
On Fri, Jul 16, 2021 at 9:40 AM Tom Lane wrote: > That would require pg_restore to try to edit the DROP commands during > restore, which sounds horribly fragile. I'm inclined to think that > supporting this option only during initial dump is safer. > Safer, but not nearly as useful. Maybe see what the OP (Wu Haotian) can come up with as a first implementation? Cheers, Greg
Re: Avoid stuck of pbgench due to skipped transactions
Apologies, just saw this. I found no problems, those "failures" were just me missing checkboxes on the commitfest interface. +1 on the patch. Cheers, Greg
Re: Unresolved repliaction hang and stop problem.
On Sat, 19 Jun 2021 at 12:14, Amit Kapila wrote: > > On Fri, Jun 18, 2021 at 11:22 AM Kyotaro Horiguchi > wrote: > > > > At Thu, 17 Jun 2021 12:56:42 -0400, Alvaro Herrera > > wrote in > > > On 2021-Jun-17, Kyotaro Horiguchi wrote: > > > > > > > I don't see a call to hash_*seq*_search there. Instead, I see one in > > > > ReorderBufferCheckMemoryLimit(). > > > > > > Doh, of course -- I misread. > > > > > > ReorderBufferCheckMemoryLimit is new in pg13 (cec2edfa7859) so now at > > > least we have a reason why this workload regresses in pg13 compared to > > > earlier releases. > > > > > > Looking at the code, it does seem that increasing the memory limit as > > > Amit suggests might solve the issue. Is that a practical workaround? > > > > I believe so generally. I'm not sure about the op, though. > > > > Just increasing the working memory to certain size would solve the > > problem. There might be a potential issue that it might be hard like > > this case for users to find out that the GUC works for their issue (if > > actually works). pg_stat_replication_slots.spill_count / spill_txns > > could be a guide for setting logical_decoding_work_mem. Is it worth > > having additional explanation like that for the GUC variable in the > > documentation? > > > > I don't see any harm in doing that but note that we can update it only > for PG-14. The view pg_stat_replication_slots is introduced in PG-14. > > -- > With Regards, > Amit Kapila. We increased logical_decoding_work_mem for our production database from 64 to 192 MB and it looks like the issue still persists. The frequency with which replication hangs has remained the same. Do you need any additional perf reports after our change? -- Regards, Hubert Klasa.
Re: Postgres perl module namespace
On 2021-Aug-10, Andrew Dunstan wrote: > I'll kick that off by suggesting we move all of these to the namespace > PgTest, and rename TestLib to Utils, so [...] you would say > > use PgTest::Utils; > use PgTest::PostgresNode; WFM. -- Álvaro Herrera Valdivia, Chile — https://www.EnterpriseDB.com/ "Hay que recordar que la existencia en el cosmos, y particularmente la elaboración de civilizaciones dentro de él no son, por desgracia, nada idílicas" (Ijon Tichy)
Re: Postgres perl module namespace
Andrew Dunstan writes: > I will undertake to do the work, once we get the bike-shedding part done. > I'll kick that off by suggesting we move all of these to the namespace > PgTest, and rename TestLib to Utils, so instead of > use TestLib; > use PostgresNode; > you would say > use PgTest::Utils; > use PgTest::PostgresNode; Using both "Pg" and "Postgres" seems a bit inconsistent. Maybe "PgTest::PgNode"? More generally, I've never thought that "Node" was a great name here; it's a very vague term. While we're renaming, maybe we could change it to "PgTest::PgCluster" or the like? regards, tom lane
Re: [BUG]Update Toast data failure in logical replication
On 2021-Jul-30, Amit Kapila wrote: > I was thinking of using toast pointer but that won't work because it > can be different on the subscriber-side. I don't see any better ideas > to fix this issue. This problem seems to be from the time Logical > Replication has been introduced, so adding others (who are generally > involved in this area) to see what they think about this bug? I think > people might not be using toasted columns for Replica Identity due to > which this problem has been reported yet but I feel this is quite a > fundamental issue and we should do something about this. In the evening before going offline a week ago I was looking at this and my conclusion was that this was a legitimate problem: the original implementation is faulty in that the full detoasted value is required to be transmitted in order for downstream to be able to read the value. I am not sure if at the level of logical decoding it is a problem theoretically, but at least for logical replication it is clearly a practical problem. Reading Dilip's last posted patch that day, I had some reservations about the API of ExtractReplicaIdentity. The new argument makes for a very strange to explain behavior "return the key values if they are unchanged, *or* if they are toasted" ... ??? I tried to make sense of that, and tried to find a concept that would make sense to the whole, but couldn't find any obvious angle in the short time I looked at it. I haven't looked at it again. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "El Maquinismo fue proscrito so pena de cosquilleo hasta la muerte" (Ijon Tichy en Viajes, Stanislaw Lem)
Re: Worth using personality(ADDR_NO_RANDOMIZE) for EXEC_BACKEND on linux?
Hi, On Tue, Aug 10, 2021, at 15:19, Thomas Munro wrote: > On Tue, Aug 10, 2021 at 5:43 AM Robert Haas wrote: > > On Mon, Aug 9, 2021 at 1:30 PM Alvaro Herrera > > wrote: > > > How common is to get a failure? I know I've run tests under > > > EXEC_BACKEND and not seen any failures. Not many runs though. > > > > On macOS, failures are extremely common. Sometimes I have to run > > simple tests many times to get even one success. The proposal on the > > table won't help with that problem since it's Linux-specific, but if > > there's any way to do something similar on macOS it would be a _huge_ > > help. > > Yeah, make check always fails for me on macOS 11. With the attached > experimental hack, it fails only occasionally (1 in 8 runs or so). I > don't know why. I suspect you'd need to use the hack in pg_ctl to make it reliable. The layout of the normally started, position-independent postmaster can be incompatible with the non-ASLR spawned child. Andres
Re: Postgres perl module namespace
On 5/20/21 5:18 PM, Tom Lane wrote: > Andrew Dunstan writes: >> While solving a problem with the Beta RPMs, I noticed that they export >> our perl test modules as capabilities like this: >> [andrew@f34 x86_64]$ rpm -q --provides -p >> postgresql14-devel-14-beta1_PGDG.fc34.x86_64.rpm | grep ^perl >> perl(PostgresNode) >> perl(PostgresVersion) >> perl(RecursiveCopy) >> perl(SimpleTee) >> perl(TestLib) >> I don't think we should be putting this stuff in a global namespace like >> that. We should invent a namespace that's not likely to conflict with >> other people, like, say, 'PostgreSQL::Test' to put these modules. That >> would require moving some code around and adjusting a bunch of scripts, >> but it would not be difficult. Maybe something to be done post-14? > Agreed that we ought to namespace these better. It looks to me like most > of these are several versions old. Given the lack of field complaints, > I'm content to wait for v15 for a fix, rather than treating it as an open > item for v14. So now the dev tree is open for v15 it's time to get back to this item. I will undertake to do the work, once we get the bike-shedding part done. I'll kick that off by suggesting we move all of these to the namespace PgTest, and rename TestLib to Utils, so instead of use TestLib; use PostgresNode; you would say use PgTest::Utils; use PgTest::PostgresNode; cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: replay of CREATE TABLESPACE eats data at wal_level=minimal
On Mon, Aug 9, 2021 at 9:23 PM Noah Misch wrote: > > I don't presently have a specific idea about how to fix this. > > Can't recovery just not delete the directory, create it if doesn't exist, and > be happy if it does exist? Like the attached WIP. If we think it's possible > for a crash during mkdir to leave a directory having the wrong permissions, > adding a chmod would be in order. Oh, yeah, I think that works, actually. I was imagining a few problems here, but I don't think they really exist. The redo routines for files within the directory can't possibly care about having the old files erased for them, since that wouldn't be something that would normally happen, if there were no recent CREATE TABLESPACE involved. And there's code further down to remove and recreate the symlink, just in case. So I think your proposed patch might be all we need. -- Robert Haas EDB: http://www.enterprisedb.com
Re: when the startup process doesn't (logging startup delays)
> I'd really like to avoid this. I don't see why it's necessary. You say > it causes a problem, but you don't explain what problem it causes. > enable_timeout_at() will disable the timer if not already done. I > think all we need to do is set scheduled_startup_progress_timeout = 0 > before calling reset_startup_progress_timeout() in the "init" case and > don't do that for the non-init case. If that's not quite right, maybe > you can work out something that does work. But adding an is_init flag > to a function and having no common code between the is_init = true > case and the is_init = false case is exactly the kind of thing that I > don't want here. I want as much common code as possible. Setting scheduled_startup_progress_timeout = 0 in the "init" case solves the problem. The problem was that if we call init_startup_progress() repeatedly, the first call to reset_startup_progress_timeout() sets the value of scheduled_startup_progress_timeout to "now + interval", but a later call to reset_startup_progress_timeout() uses the previously set value of scheduled_startup_progress_timeout, which was not correct, so it was not behaving as expected. I could see that the first log got printed far later than the expected interval. > To some degree, we tend to use names_like_this() for low-level > functions and NamesLikeThis() for higher-level functions, but that is > not a very consistent practice. Ok. Thanks for the information. > But this strategy cannot be used if the drift is larger than the > interval. If we are trying to log a message every 1000ms and the timer > doesn't fire for 14ms, we can wait only 986ms the next time. If it > doesn't fire for 140ms, we can wait only 860ms the next time. But if > the timer doesn't fire for 1400ms, we cannot wait for -400ms the next > time. So what should we do? My proposal is to just wait for the > configured interval, 1000ms, essentially giving up on drift > correction. 
Now you could argue that we ought to just wait for 600ms > in the hopes of making it 2 * 1000ms after the previous status > message, but I'm not sure that really has any value, and it doesn't > seem especially likely to work. The only way timer interrupts are > likely to be that badly delayed is if the system is horrifically > overloaded, and if that's the case the next timer interrupt isn't > likely to fire on schedule anyway. Trying to correct for drift in such > a situation seems more likely to be confusing than to produce any > helpful result. This is what I was trying to convey in case-2. I agree that it is better to consider "now + interval" in such a case instead of trying to adjust the drift. Please find the updated patch attached. On Tue, Aug 10, 2021 at 1:06 AM Robert Haas wrote: > > On Mon, Aug 9, 2021 at 11:20 AM Nitin Jadhav > wrote: > > Modified the reset_startup_progress_timeout() to take the second > > parameter which indicates whether it is called for initialization or > > for resetting. Without this parameter there is a problem if we call > > init_startup_progress() more than one time if there is no call to > > ereport_startup_progress() in between as the code related to disabling > > the timer has been removed. > > I'd really like to avoid this. I don't see why it's necessary. You say > it causes a problem, but you don't explain what problem it causes. > enable_timeout_at() will disable the timer if not already done. I > think all we need to do is set scheduled_startup_progress_timeout = 0 > before calling reset_startup_progress_timeout() in the "init" case and > don't do that for the non-init case. If that's not quite right, maybe > you can work out something that does work. But adding an is_init flag > to a function and having no common code between the is_init = true > case and the is_init = false case is exactly the kind of thing that I > don't want here. I want as much common code as possible. 
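The drift-handling rule discussed above, shorten the next wait by the observed lateness, but fall back to the full interval once the lateness exceeds it, can be sketched as a tiny helper (illustrative only; the names are made up and are not from the patch):

```c
#include <assert.h>
#include <stdio.h>

/*
 * Given the time (in ms) at which the previous progress message was
 * scheduled, the time the timer actually fired, and the configured
 * interval, return how long to wait before the next message.  Small
 * drift is corrected by shortening the next wait; if the timer fired
 * so late that the next scheduled time is already in the past, give
 * up on drift correction and just wait the full interval again.
 */
static long
next_progress_delay(long scheduled, long fired, long interval)
{
	long		next = scheduled + interval;

	if (next <= fired)
		return interval;		/* drift exceeds interval: no correction */
	return next - fired;		/* shorten wait to stay on schedule */
}

int
main(void)
{
	/* Robert's examples: message due at t=1000, interval 1000ms. */
	assert(next_progress_delay(1000, 1014, 1000) == 986);	/* 14 ms late */
	assert(next_progress_delay(1000, 1140, 1000) == 860);	/* 140 ms late */
	assert(next_progress_delay(1000, 2400, 1000) == 1000);	/* 1400 ms late */
	puts("ok");
	return 0;
}
```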
> > > > This makes sense, but I think I'd like to have all the functions in > > > this patch use names_like_this() rather than NamesLikeThis(). > > > > I have changed all the function names accordingly. But I would like to > > know why it should be names_like_this() as there are many functions > > with the format NamesLikeThis(). I would like to understand when to > > use what, just to incorporate in the future patches. > > There is, unfortunately, no hard-and-fast rule, but we want to > maintain as much consistency with the existing style as we can. I > wasn't initially sure what would work best for this particular patch, > but after we committed to a name like ereport_startup_progress() that > to me was a strong hint in favor of using names_like_this() > throughout. It seems impossible to imagine punctuating it as > EreportStartupProgress() or something since that would be wildly > inconsistent with the existing function name, and there seems to be no > good reason why this patch can't be internally consistent. > > To some degree, we tend to use names_like_this() for low-level > functio
Re: Autovacuum on partitioned table (autoanalyze)
On 2021-Aug-09, Andres Freund wrote: > I don't agree. There's a difference between this happening after a manual > ANALYZE on partition roots, and this continuously happening in production > workloads due to auto-analyzes... Hmm. That's not completely untrue. I bring a radical proposal that may be sufficient to close this particular hole. What if we made partitions only cause their top-level parents to become auto-analyzed, and not any intermediate ancestors? Any intermediate partitioned partitions could be analyzed manually if the user wished, and perhaps some reloption could enable autovacuum to do it (with the caveat that it'd cause multiple sampling of partitions). I don't yet have a clear picture on how to implement this, but I'll explore it while waiting for opinions on the idea. -- Álvaro Herrera Valdivia, Chile — https://www.EnterpriseDB.com/ "Nadie está tan esclavizado como el que se cree libre no siéndolo" (Goethe)
Re: OpenSSL 3.0.0 compatibility
> On 6 Aug 2021, at 21:17, Tom Lane wrote: > > Daniel Gustafsson writes: >> Until there is an animal running OpenSSL 3.0.0 in the buildfarm I think this >> should be HEAD only. Further down the line we need to support OpenSSL 3 in >> all >> backbranches IMO since they are all equally likely to be compiled against it, >> but not until we can regularly test against it in the farm. > > Works for me. These have now been committed, when OpenSSL 3.0.0 ships and there is coverage in the buildfarm I’ll revisit this for the backbranches. -- Daniel Gustafsson https://vmware.com/
Re: Skipping logical replication transactions on subscriber side
On Tue, Aug 10, 2021 at 3:07 PM Masahiko Sawada wrote: > > I've attached the latest patches that incorporated all comments I got > so far. Please review them. > Some initial review comments on the v6-0001 patch: src/backend/replication/logical/proto.c: (1) + TimestampTz committs; I think it looks better to name "committs" as "commit_ts", which is also more consistent with the naming of the other member "remote_xid". src/backend/replication/logical/worker.c: (2) To be consistent with all other function headers, the sentence should start with a capital: "get" -> "Get" + * get string representing LogicalRepMsgType. (3) It looks a bit cumbersome and repetitive to set/update the members of apply_error_callback_arg in numerous places. I suggest making the "set_apply_error_context..." and "reset_apply_error_context..." functions "static inline void" functions (moving them to the top part of the source file, and removing the existing function declarations for these). Also, can add something similar to below: static inline void set_apply_error_callback_xid(TransactionId xid) { apply_error_callback_arg.remote_xid = xid; } static inline void set_apply_error_callback_xid_info(TransactionId xid, TimestampTz commit_ts) { apply_error_callback_arg.remote_xid = xid; apply_error_callback_arg.commit_ts = commit_ts; } so that instances of, for example: apply_error_callback_arg.remote_xid = prepare_data.xid; apply_error_callback_arg.committs = prepare_data.commit_time; can be: set_apply_error_callback_xid_info(prepare_data.xid, prepare_data.commit_time); (4) The apply_error_callback() function is missing a function header/comment. Regards, Greg Nancarrow Fujitsu Australia
Re: Worth using personality(ADDR_NO_RANDOMIZE) for EXEC_BACKEND on linux?
On Tue, Aug 10, 2021 at 5:43 AM Robert Haas wrote: > On Mon, Aug 9, 2021 at 1:30 PM Alvaro Herrera wrote: > > How common is to get a failure? I know I've run tests under > > EXEC_BACKEND and not seen any failures. Not many runs though. > > On macOS, failures are extremely common. Sometimes I have to run > simple tests many times to get even one success. The proposal on the > table won't help with that problem since it's Linux-specific, but if > there's any way to do something similar on macOS it would be a _huge_ > help. Yeah, make check always fails for me on macOS 11. With the attached experimental hack, it fails only occasionally (1 in 8 runs or so). I don't know why. From 9bcedede452c1f37dd790f86bc587353cc455e3f Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Tue, 10 Aug 2021 22:05:00 +1200 Subject: [PATCH] Try to make EXEC_BACKEND more convenient on macOS. Use posix_spawn() instead of fork() + execv(), with a special undocumented flag that disables ASLR, if you build with -DEXEC_BACKEND -DUSE_POSIX_SPAWN -DUSE_POSIX_SPAWN_DISABLE_ASLR. XXX Experiment only... 
XXX Still fails make check occasionally :-( --- src/backend/postmaster/postmaster.c | 33 +++ src/bin/pg_ctl/pg_ctl.c | 51 - 2 files changed, 83 insertions(+), 1 deletion(-) diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c index fc0bc8d99e..3a1e9ae8f8 100644 --- a/src/backend/postmaster/postmaster.c +++ b/src/backend/postmaster/postmaster.c @@ -93,6 +93,10 @@ #include #endif +#ifdef USE_POSIX_SPAWN +#include +#endif + #include "access/transam.h" #include "access/xlog.h" #include "catalog/pg_control.h" @@ -4585,6 +4589,10 @@ internal_forkexec(int argc, char *argv[], Port *port) char tmpfilename[MAXPGPATH]; BackendParameters param; FILE *fp; +#ifdef USE_POSIX_SPAWN + posix_spawn_file_actions_t spawn_file_actions; + posix_spawnattr_t spawnattrs; +#endif if (!save_backend_variables(¶m, port)) return -1;/* log made by save_backend_variables */ @@ -4642,6 +4650,30 @@ internal_forkexec(int argc, char *argv[], Port *port) /* Insert temp file name after --fork argument */ argv[2] = tmpfilename; +#ifdef USE_POSIX_SPAWN + posix_spawn_file_actions_init(&spawn_file_actions); + posix_spawnattr_init(&spawnattrs); +#ifdef USE_POSIX_SPAWN_DISABLE_ASLR + /* + * Undocumented magic. See bsd/sys/spawn.h and bsd/kern/kern_exec.c in the + * Darwin sources at https://github.com/apple/darwin-xnu. 
+ */ + if ((errno = posix_spawnattr_setflags(&spawnattrs, 0x0100)) != 0) + elog(ERROR, "could not set ASLR disable flag when spawning backend: %m"); +#endif + errno = posix_spawn(&pid, + postgres_exec_path, + &spawn_file_actions, + &spawnattrs, + argv, + NULL); + if (errno != 0) + { + ereport(LOG, + (errmsg("could not spawn server process \"%s\": %m", + postgres_exec_path))); + } +#else /* Fire off execv in child */ if ((pid = fork_process()) == 0) { @@ -4654,6 +4686,7 @@ internal_forkexec(int argc, char *argv[], Port *port) exit(1); } } +#endif return pid; /* Parent returns pid, or -1 on fork failure */ } diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c index 7985da0a94..f105e483c4 100644 --- a/src/bin/pg_ctl/pg_ctl.c +++ b/src/bin/pg_ctl/pg_ctl.c @@ -18,6 +18,10 @@ #include #include +#ifdef USE_POSIX_SPAWN +#include +#endif + #ifdef HAVE_SYS_RESOURCE_H #include #include @@ -442,15 +446,59 @@ free_readfile(char **optlines) static pgpid_t start_postmaster(void) { +#ifdef USE_POSIX_SPAWN + posix_spawn_file_actions_t spawn_file_actions; + posix_spawnattr_t spawnattrs; +#else char cmd[MAXPGPATH]; +#endif #ifndef WIN32 - pgpid_t pm_pid; + pid_t pm_pid; /* Flush stdio channels just before fork, to avoid double-output problems */ fflush(stdout); fflush(stderr); +#ifdef USE_POSIX_SPAWN + posix_spawn_file_actions_init(&spawn_file_actions); + posix_spawnattr_init(&spawnattrs); +#ifdef USE_POSIX_SPAWN_DISABLE_ASLR + /* + * Undocumented magic. See bsd/sys/spawn.h and bsd/kern/kern_exec.c in the + * Darwin sources at https://github.com/apple/darwin-xnu. + */ + if ((errno = posix_spawnattr_setflags(&spawnattrs, 0x0100)) != 0) + write_stderr(_("could not set undocumented ASLR disable flag when spawning postmaster: %s"), + strerror(errno)); +#endif + { + /* XXX:HACK this is incomplete, doesn't include the postopts... 
*/ + /* XXX Theory here was that shell exec style used by the exec code + * might drop the flag, hence spawning executable directly, but doing + * that properly would involve unpicking the options and their quotes + * etc... */ + char *args[] = { + exec_path, + "-D", + getenv("PGDATA"), + NULL + }; + errno = posix_spawn(&pm_pid, + exec_path, + &spawn_file_actions, + &spawnattrs, + args, + NULL); + } + if (errno != 0) + { + write_stderr(_("%s: could not start server: %s\
Re: Next Steps with Hash Indexes
On Fri, Jul 23, 2021 at 6:16 PM Simon Riggs wrote: > > On Thu, 22 Jul 2021 at 06:10, Amit Kapila wrote: > Complete patch for hash_multicol.v3.patch attached, slightly updated > from earlier patch. > Docs, tests, passes make check. I was looking into hash_multicol.v3.patch, and I have a question. - Hash indexes support only single-column indexes and do not allow - uniqueness checking. + Hash indexes support uniqueness checking. + Hash indexes support multi-column indexes, but only store the hash value + for the first column, so multiple columns are useful only for uniqueness + checking. The above comments say that we store the hash value only for the first column; my question is why don't we store it for the other columns as well? I mean, we can search the bucket based on the first column's hash, but the hashes for the other columns could be payload data and we could use them to match the hash values for the other key columns before accessing the heap, as discussed here[1]. IMHO, this would further reduce heap access. [1] https://www.postgresql.org/message-id/7192.1506527843%40sss.pgh.pa.us -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: RFC: Logging plan of the running query
On 2021-07-28 20:44, torikoshia wrote: On 2021-07-28 03:45, Pavel Stehule wrote: On Tue, Jul 27, 2021 at 20:34, Fujii Masao wrote: On 2021/07/09 14:05, torikoshia wrote: On 2021-07-02 23:21, Bharath Rupireddy wrote: On Tue, Jun 22, 2021 at 8:00 AM torikoshia wrote: Updated the patch. Thanks for the patch. Here are some comments on the v4 patch: Thanks for your comments and suggestions! I agree with you and updated the patch. On Thu, Jul 1, 2021 at 3:34 PM Fujii Masao wrote: DO $$ BEGIN PERFORM pg_sleep(100); END$$; When I called pg_log_current_query_plan() to send the signal to the backend executing the above query, I got the following log message. I think that this is not the expected message. I guess this issue happened because the information about the query text and plan is retrieved from ActivePortal. If this understanding is right, ISTM that we should implement a new mechanism so that we can retrieve that information even while a nested query is being executed. I'm now working on this comment. One idea is to define a new global pointer, e.g., "QueryDesc *ActiveQueryDesc;". This global pointer is set to queryDesc in ExecutorRun() (also maybe ExecutorStart()). And this is reset to NULL in ExecutorEnd() and when an error is thrown. Then ProcessLogCurrentPlanInterrupt() can get the plan of the currently running query from that global pointer instead of ActivePortal, and log it. Thoughts? It cannot work - there can be a lot of nested queries, and at the end you cannot reset to NULL; you should restore the pointer to the outer query. Thanks for your comment! I'm wondering if we can avoid this problem by saving one outer-level QueryDesc in addition to the current one. I'm going to try it. I have updated the patch in this way. In this patch, getting the plan of a DO statement works as follows.
- (pid:76608)=# DO $$ BEGIN PERFORM pg_sleep(15); END$$; (pid:74482)=# SELECT pg_log_current_query_plan(76608); LOG: 0: plan of the query running on backend with PID 76608 is: Query Text: SELECT pg_sleep(15) Result (cost=0.00..0.01 rows=1 width=4) Output: pg_sleep('15'::double precision) -- pid:76608 finished DO statement: (pid:74482)=# SELECT pg_log_current_query_plan(76608); LOG: 0: backend with PID 76608 is not running a query - Any thoughts? -- Regards, -- Atsushi Torikoshi NTT DATA CORPORATIONFrom 74b6cdd70de2c6ed0f4c8370af892584ad9d9a4f Mon Sep 17 00:00:00 2001 From: Atsushi Torikoshi Date: Tue, 10 Aug 2021 15:38:57 +0900 Subject: [PATCH v7] Add function to log the untruncated query string and its plan for the query currently running on the backend with the specified process ID. Currently, we have to wait for the query execution to finish before we check its plan. This is not so convenient when investigating long-running queries on production environments where we cannot use debuggers. To improve this situation, this patch adds pg_log_current_query_plan() function that requests to log the plan of the specified backend process. Only superusers are allowed to request to log the plans because allowing any users to issue this request at an unbounded rate would cause lots of log messages and which can lead to denial of service. On receipt of the request, at the next CHECK_FOR_INTERRUPTS(), the target backend logs its plan at LOG_SERVER_ONLY level, so that these plans will appear in the server log but not be sent to the client. Since some codes, tests and comments of pg_log_current_query_plan() are the same with pg_log_backend_memory_contexts(), this patch also refactors them to make them common. 
Reviewed-by: Bharath Rupireddy, Fujii Masao, Dilip Kumar, Masahiro Ikeda --- doc/src/sgml/func.sgml | 46 +++ src/backend/commands/explain.c | 83 src/backend/executor/execMain.c | 10 +++ src/backend/storage/ipc/procsignal.c | 4 + src/backend/storage/ipc/signalfuncs.c| 61 ++ src/backend/tcop/postgres.c | 7 ++ src/backend/utils/adt/mcxtfuncs.c| 54 ++--- src/backend/utils/init/globals.c | 1 + src/include/catalog/pg_proc.dat | 6 ++ src/include/commands/explain.h | 2 + src/include/miscadmin.h | 1 + src/include/storage/procsignal.h | 1 + src/include/storage/signalfuncs.h| 22 ++ src/include/tcop/pquery.h| 1 + src/test/regress/expected/misc_functions.out | 16 +++- src/test/regress/sql/misc_functions.sql | 12 +-- 16 files changed, 270 insertions(+), 57 deletions(-) create mode 100644 src/include/storage/signalfuncs.h diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 78812b2dbe..13b44b42cb 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -25281,6 +25281,27 @@
Re: Bug in huge simplehash
Ranier Vilela писал 2021-08-10 14:21: Em ter., 10 de ago. de 2021 às 05:53, Yura Sokolov escreveu: I went to check SH_GROW and It is `SH_GROW(SH_TYPE *tb, uint32 newsize)` :-((( Therefore when `tb->size == SH_MAX_SIZE/2` and we call `SH_GROW(tb, tb->size * 2)`, then SH_GROW(tb, 0) is called due to truncation. And SH_COMPUTE_PARAMETERS is also accepts `uint32 newsize`. Ahh... ok, patch is updated to fix this as well. It seems that we need to fix the function prototype too. /* void _grow(_hash *tb) */ -SH_SCOPE void SH_GROW(SH_TYPE * tb, uint32 newsize); +SH_SCOPE void SH_GROW(SH_TYPE * tb, uint64 newsize); Ahh... Thank you, Ranier. Attached v2. regards, - Yura SokolovFrom 82f449896d62be8440934d955d4e368f057005a6 Mon Sep 17 00:00:00 2001 From: Yura Sokolov Date: Tue, 10 Aug 2021 11:51:16 +0300 Subject: [PATCH] Fix new size and sizemask computaton in simplehash.h Fix couple of 32/64bit related errors in simplehash.h: - size of SH_GROW and SH_COMPUTE_PARAMETERS arguments - computation of tb->sizemask. --- src/include/lib/simplehash.h | 14 +- 1 file changed, 5 insertions(+), 9 deletions(-) diff --git a/src/include/lib/simplehash.h b/src/include/lib/simplehash.h index da51781e98e..adda5598338 100644 --- a/src/include/lib/simplehash.h +++ b/src/include/lib/simplehash.h @@ -198,8 +198,8 @@ SH_SCOPE void SH_DESTROY(SH_TYPE * tb); /* void _reset(_hash *tb) */ SH_SCOPE void SH_RESET(SH_TYPE * tb); -/* void _grow(_hash *tb) */ -SH_SCOPE void SH_GROW(SH_TYPE * tb, uint32 newsize); +/* void _grow(_hash *tb, uint64 newsize) */ +SH_SCOPE void SH_GROW(SH_TYPE * tb, uint64 newsize); /* *_insert(_hash *tb, key, bool *found) */ SH_SCOPE SH_ELEMENT_TYPE *SH_INSERT(SH_TYPE * tb, SH_KEY_TYPE key, bool *found); @@ -302,7 +302,7 @@ SH_SCOPE void SH_STAT(SH_TYPE * tb); * the hashtable. 
*/ static inline void -SH_COMPUTE_PARAMETERS(SH_TYPE * tb, uint32 newsize) +SH_COMPUTE_PARAMETERS(SH_TYPE * tb, uint64 newsize) { uint64 size; @@ -322,11 +322,7 @@ SH_COMPUTE_PARAMETERS(SH_TYPE * tb, uint32 newsize) /* now set size */ tb->size = size; - - if (tb->size == SH_MAX_SIZE) - tb->sizemask = 0; - else - tb->sizemask = tb->size - 1; + tb->sizemask = (uint32)(size - 1); /* * Compute the next threshold at which we need to grow the hash table @@ -476,7 +472,7 @@ SH_RESET(SH_TYPE * tb) * performance-wise, when known at some point. */ SH_SCOPE void -SH_GROW(SH_TYPE * tb, uint32 newsize) +SH_GROW(SH_TYPE * tb, uint64 newsize) { uint64 oldsize = tb->size; SH_ELEMENT_TYPE *olddata = tb->data; -- 2.32.0
Re: Added schema level support for publication.
On Mon, Aug 9, 2021 at 11:31 AM vignesh C wrote: > > On Fri, Aug 6, 2021 at 2:00 PM Masahiko Sawada wrote: > > > > --- > > Suppose that a parent table and its child table are defined in > > different schemas, there is a publication for the schema where only > > the parent table is defined, and the subscriber subscribes to the > > publication, should changes for its child table be replicated to the > > subscriber? > > I felt that in this case only the table data that is present in the > published schema should be sent to the subscriber. Since the child table's > schema is not part of the publication, I felt this child table data > should not be replicated. > But, as pointed out by Sawada-san, the same is true for the FOR TABLE case. I think we should be consistent here and should publish the data for the child table if the parent table's schema is published. > I have kept the same behavior as above in the case of a publication created > using the PUBLISH_VIA_PARTITION_ROOT option, i.e. the child table data will > not be sent. But now I'm feeling we should send the child table data > since it is being sent through the parent table which is part of the > publication. Also this way users can use this option if they have > a table with partitions designed across schemas. > This sounds fine to me. -- With Regards, Amit Kapila.
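To make the scenario under discussion concrete, here is a sketch using the FOR ALL TABLES IN SCHEMA syntax as it was eventually committed (PostgreSQL 15); the patch under review at the time may have spelled this differently:

```sql
-- Parent partitioned table in schema s1, its partition in schema s2.
CREATE SCHEMA s1;
CREATE SCHEMA s2;
CREATE TABLE s1.parent (a int) PARTITION BY RANGE (a);
CREATE TABLE s2.child PARTITION OF s1.parent FOR VALUES FROM (0) TO (100);

-- Publish only the parent's schema.
CREATE PUBLICATION pub_s1 FOR ALL TABLES IN SCHEMA s1;

-- The question at hand: is an INSERT routed into s2.child replicated,
-- given that only s1 (the parent's schema) is in the publication?
INSERT INTO s1.parent VALUES (42);
```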
Re: Bug in huge simplehash
Em ter., 10 de ago. de 2021 às 05:53, Yura Sokolov escreveu: > Good day, hackers. > > Our client caught process stuck in tuplehash_grow. There was a query > like > `select ts, count(*) from really_huge_partitioned_table group by ts`, > and > planner decided to use hash aggregation. > > Backtrace shows that oldsize were 2147483648 at the moment. While > newsize > were optimized, looks like it were SH_MAX_SIZE. > > #0 0x00603d0c in tuplehash_grow (tb=0x7f18c3c284c8, > newsize=) at ../../../src/include/lib/simplehash.h:457 > hash = 2147483654 > startelem = 1 > curelem = 1 > oldentry = 0x7f00c299e0d8 > oldsize = 2147483648 > olddata = 0x7f00c299e048 > newdata = 0x32e0448 > i = 6 > copyelem = 6 > > EXPLAIN shows that there are 2604186278 rows in all partitions, but > planner > thinks there will be only 200 unique rows after group by. Looks like we > was > mistaken. > > Finalize GroupAggregate (cost=154211885.42..154211936.09 rows=200 > width=16) > Group Key: really_huge_partitioned_table.ts > -> Gather Merge (cost=154211885.42..154211932.09 rows=400 > width=16) >Workers Planned: 2 >-> Sort (cost=154210885.39..154210885.89 rows=200 width=16) > Sort Key: really_huge_partitioned_table.ts > -> Partial HashAggregate > (cost=154210875.75..154210877.75 rows=200 width=16) >Group Key: really_huge_partitioned_table.ts >-> Append (cost=0.43..141189944.36 > rows=2604186278 width=8) > -> Parallel Index Only Scan using > really_huge_partitioned_table_001_idx2 on > really_huge_partitioned_table_001 (cost=0.43..108117.92 rows=2236977 > width=8) > -> Parallel Index Only Scan using > really_huge_partitioned_table_002_idx2 on > really_huge_partitioned_table_002 (cost=0.43..114928.19 rows=2377989 > width=8) > and more than 400 partitions more > > After some investigation I found bug that is present in simplehash from > its > beginning: > > - sizemask is set only in SH_COMPUTE_PARAMETERS . 
And it is set in this > way: > > /* now set size */ > tb->size = size; > > if (tb->size == SH_MAX_SIZE) > tb->sizemask = 0; > else > tb->sizemask = tb->size - 1; > >that means, when we are resizing to SH_MAX_SIZE, sizemask becomes > zero. > > - then sizemask is used to SH_INITIAL_BUCKET and SH_NEXT to compute > initial and >next position: > >SH_INITIAL_BUCKET(SH_TYPE * tb, uint32 hash) > return hash & tb->sizemask; >SH_NEXT(SH_TYPE * tb, uint32 curelem, uint32 startelem) > curelem = (curelem + 1) & tb->sizemask; > > - and then SH_GROW stuck in element placing loop: > >startelem = SH_INITIAL_BUCKET(tb, hash); >curelem = startelem; >while (true) > curelem = SH_NEXT(tb, curelem, startelem); > > There is Assert(curelem != startelem) in SH_NEXT, but since no one test > it > with 2 billion elements, it were not triggered. And Assert is not > compiled > in production code. > > Attached patch fixes it with removing condition and type casting: > > /* now set size */ > tb->size = size; > tb->sizemask = (uint32)(size - 1); > > > OOPS > > While writting this letter, I looke at newdata in the frame of > tuplehash_grow: > > newdata = 0x32e0448 > > It is bellow 4GB border. Allocator does not allocate many-gigabytes > chunks > (and we certainly need 96GB in this case) in sub 4GB address space. > Because > mmap doesn't do this. > > I went to check SH_GROW and It is `SH_GROW(SH_TYPE *tb, uint32 > newsize)` > :-((( > Therefore when `tb->size == SH_MAX_SIZE/2` and we call `SH_GROW(tb, > tb->size * 2)`, > then SH_GROW(tb, 0) is called due to truncation. > And SH_COMPUTE_PARAMETERS is also accepts `uint32 newsize`. > > Ahh... ok, patch is updated to fix this as well. > It seems that we need to fix the function prototype too. /* void _grow(_hash *tb) */ -SH_SCOPE void SH_GROW(SH_TYPE * tb, uint32 newsize); +SH_SCOPE void SH_GROW(SH_TYPE * tb, uint64 newsize); regards, Ranier Vilela
Re: Skipping logical replication transactions on subscriber side
On Tue, Aug 10, 2021 at 3:29 PM Amit Kapila wrote: > > On Tue, Aug 10, 2021 at 10:37 AM Masahiko Sawada > wrote: > > > > I've attached the latest patches that incorporated all comments I got > > so far. Please review them. > > > > I am not able to apply the latest patch > (v6-0001-Add-errcontext-to-errors-happening-during-applyin) on HEAD, > getting the below error: > patching file src/backend/replication/logical/worker.c > Hunk #11 succeeded at 1195 (offset 50 lines). > Hunk #12 succeeded at 1253 (offset 50 lines). > Hunk #13 succeeded at 1277 (offset 50 lines). > Hunk #14 succeeded at 1305 (offset 50 lines). > Hunk #15 succeeded at 1330 (offset 50 lines). > Hunk #16 succeeded at 1362 (offset 50 lines). > Hunk #17 succeeded at 1508 (offset 50 lines). > Hunk #18 succeeded at 1524 (offset 50 lines). > Hunk #19 succeeded at 1645 (offset 50 lines). > Hunk #20 succeeded at 1671 (offset 50 lines). > Hunk #21 succeeded at 1772 (offset 50 lines). > Hunk #22 succeeded at 1828 (offset 50 lines). > Hunk #23 succeeded at 1934 (offset 50 lines). > Hunk #24 succeeded at 1962 (offset 50 lines). > Hunk #25 succeeded at 2399 (offset 50 lines). > Hunk #26 FAILED at 2405. > Hunk #27 succeeded at 3730 (offset 54 lines). > 1 out of 27 hunks FAILED -- saving rejects to file > src/backend/replication/logical/worker.c.rej > Sorry, I forgot to rebase the patches to the current HEAD. Since stream_prepare is introduced, I'll add some tests to the patches. I’ll submit the new patches tomorrow that also incorporates your comments on v6-0001 patch. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
Re: add operator ^= to mean not equal (like != and <>)
On Tue, Aug 10, 2021 at 18:41, Daniel Gustafsson wrote: > > On 10 Aug 2021, at 12:21, David Rowley wrote: > > > > I'm strongly against inheriting warts from Oracle for no apparently good > > reason. At the very least, anyone who's using ^= for some other > > purpose is very likely to be upset with us. Anyone else who really > > needs this for compatibility reasons can just create a set of > > operators for themselves. > > Agreed, if it’s because of Oracle compatibility then this seems like > something > which has a better fit in orafce or a similar extension like that. > +1 >
Re: add operator ^= to mean not equal (like != and <>)
> On 10 Aug 2021, at 12:21, David Rowley wrote: > > On Tue, 10 Aug 2021 at 21:13, Daniel Gustafsson wrote: >> >>> On 10 Aug 2021, at 11:10, Andreas Karlsson wrote: >> >>> What is the reason you want to add ^=? Are there any other databases which >>> use ^= for inequality? >> >> I assume it's because of Oracle compatibility which AFAIK is the only >> database >> supporting ^=. > > Seems likely. > > I'm strongly against inheriting warts from Oracle for no apparently good > reason. At the very least, anyone who's using ^= for some other > purpose is very likely to be upset with us. Anyone else who really > needs this for compatibility reasons can just create a set of > operators for themselves. Agreed, if it’s because of Oracle compatibility then this seems like something which has a better fit in orafce or a similar extension like that. -- Daniel Gustafsson https://vmware.com/
Re: add operator ^= to mean not equal (like != and <>)
On Tue, 10 Aug 2021 at 21:13, Daniel Gustafsson wrote: > > > On 10 Aug 2021, at 11:10, Andreas Karlsson wrote: > > > What is the reason you want to add ^=? Are there any other databases which > > use ^= for inequality? > > I assume it's because of Oracle compatibility which AFAIK is the only database > supporting ^=. Seems likely. I'm strongly against inheriting warts from Oracle for no apparently good reason. At the very least, anyone who's using ^= for some other purpose is very likely to be upset with us. Anyone else who really needs this for compatibility reasons can just create a set of operators for themselves. David
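As a sketch of David's suggestion, a user-side ^= can be defined per data type on top of the existing inequality support function; for example, for int4 (assuming the built-in int4ne catalog function, which backs int4 <>):

```sql
-- Hypothetical user-side definition; one such operator per data type
-- would be needed.  int4ne is the built-in function behind int4 <>.
CREATE OPERATOR ^= (
    LEFTARG  = int4,
    RIGHTARG = int4,
    FUNCTION = int4ne
);

SELECT 1 ^= 2;   -- would then return true
```

This keeps the compatibility wart out of core while remaining a few lines for anyone who actually needs it.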
Re: Skipping logical replication transactions on subscriber side
On Tue, Aug 10, 2021 at 11:59 AM Amit Kapila wrote: > > On Tue, Aug 10, 2021 at 10:37 AM Masahiko Sawada > wrote: > > > > I've attached the latest patches that incorporated all comments I got > > so far. Please review them. > > > > I am not able to apply the latest patch > (v6-0001-Add-errcontext-to-errors-happening-during-applyin) on HEAD, > getting the below error: > Few comments on v6-0001-Add-errcontext-to-errors-happening-during-applyin == 1. While applying DML operations, we are setting up the error context multiple times due to which the context information is not appropriate. The first is set in apply_dispatch and then during processing, we set another error callback slot_store_error_callback in slot_store_data and slot_modify_data. When I forced one of the errors in slot_store_data(), it displays the below information in CONTEXT which doesn't make much sense. 2021-08-10 15:16:39.887 IST [6784] ERROR: incorrect binary data format in logical replication column 1 2021-08-10 15:16:39.887 IST [6784] CONTEXT: processing remote data for replication target relation "public.test1" column "id" during apply of "INSERT" for relation "public.test1" in transaction with xid 740 committs 2021-08-10 14:44:38.058174+05:30 2. I think we can slightly change the new context information as below: Before during apply of "INSERT" for relation "public.test1" in transaction with xid 740 committs 2021-08-10 14:44:38.058174+05:30 After during apply of "INSERT" for relation "public.test1" in transaction id 740 with commit timestamp 2021-08-10 14:44:38.058174+05:30 3. +/* Struct for saving and restoring apply information */ +typedef struct ApplyErrCallbackArg +{ + LogicalRepMsgType command; /* 0 if invalid */ + + /* Local relation information */ + char*nspname; + char*relname; ... ... + +static ApplyErrCallbackArg apply_error_callback_arg = +{ + .command = 0, + .relname = NULL, + .nspname = NULL, Let's initialize the struct members in the order they are declared. 
The order of relname and nspname should be the other way around. 4. + + TransactionId remote_xid; + TimestampTz committs; +} ApplyErrCallbackArg; It might be better to add a comment like "remote xact information" above these structure members. 5. +static void +apply_error_callback(void *arg) +{ + StringInfoData buf; + + if (apply_error_callback_arg.command == 0) + return; + + initStringInfo(&buf); At the end of this call, it is better to free the buffer (pfree(buf.data)). 6. In the commit message, you might want to indicate that this additional information can be used by a future patch to skip the conflicting transaction. -- With Regards, Amit Kapila.
Re: Bug in huge simplehash
On Tue, 10 Aug 2021 at 20:53, Yura Sokolov wrote: > EXPLAIN shows that there are 2604186278 rows in all partitions, but > the planner > thinks there will be only 200 unique rows after group by. Looks like we > were > mistaken. This looks unrelated. Looks like the planner used DEFAULT_NUM_DISTINCT. > /* now set size */ > tb->size = size; > > if (tb->size == SH_MAX_SIZE) > tb->sizemask = 0; > else > tb->sizemask = tb->size - 1; Ouch. That's not great. > /* now set size */ > tb->size = size; > tb->sizemask = (uint32)(size - 1); That fix seems fine. > I went to check SH_GROW and it is `SH_GROW(SH_TYPE *tb, uint32 > newsize)` > :-((( > Therefore when `tb->size == SH_MAX_SIZE/2` and we call `SH_GROW(tb, > tb->size * 2)`, > then SH_GROW(tb, 0) is called due to truncation. > And SH_COMPUTE_PARAMETERS also accepts `uint32 newsize`. Yeah. Agreed. I don't see anything wrong with your fix for that. I'm surprised nobody has hit this before. I guess having that many groups is not common. Annoyingly this just missed the window for being fixed in the minor releases going out soon. We'll need to wait a few days before patching. David
RE: ECPG bug fix: DECALRE STATEMENT and DEALLOCATE, DESCRIBE
Dear Meskes and any hackers, > Yes, at least technically. I think DESCRIBE should accept the cached > connection name, although it won't really matter. I searched the docs too and found that Pro*C has the same policy, so it might be reasonable: https://docs.oracle.com/en/database/oracle/oracle-database/21/lnpcc/Oracle-dynamic-SQL.html#GUID-0EB50EB7-D4C8-401D-AFCD-340D281711C4 Anyway, I revised the patches again under the current spec. I separated them into 6 parts: 0001: moves "connection = NULL" to the top rule. This is per Wang. 0002: adds support for the deallocate statement. 0003: adds support for the describe statement. The above and this are the main parts. 0004: adds a warning when the connection is overwritten. This is per Horiguchi-san. 0005: adds a warning when the connection is overwritten. This is per Horiguchi-san and Paquier. 0006: adds some tests. 0001 is the solution for the following: https://www.postgresql.org/message-id/OSBPR01MB42141999ED8EFDD4D8FDA42CF2E29%40OSBPR01MB4214.jpnprd01.prod.outlook.com This bug is caused because "connection = NULL" is missing in some cases, so I forcibly substitute NULL in the statement: rule, the top level of the parse tree. I also removed the substitution from output.c because that line was duplicated. If you think this change is too large, I can drop 0001 and add a substitution to the end of the ECPGCursorStmt rule. That approach also resolves the bug and its impact is very small. 0004 is an optional patch; it is not related to DEALLOCATE and DESCRIBE. We were discussing how things should work when the following is precompiled and executed: > EXEC SQL AT conn1 DECLARE stmt STATEMENT; > EXEC SQL AT conn2 EXECUTE stmt INTO ..; Currently line 2 will execute at conn1 without any warnings (and this is Oracle's spec) but Horiguchi-san says it is non-obvious. So I added a precompiler warning for this situation. More discussion might be needed here, but this is not the main part.
About 0005, see the previous discussion: > Since we don't allow descriptors with the same name even if they are > for the different connections, I think we can set the current > connection if any (which is set either by AT option or statement-bound > one) to i->connection silently if i->connection is NULL in > lookup_descriptor(). What do you think about this? Thoughts? Best Regards, Hayato Kuroda FUJITSU LIMITED Attachments: v04-0001-move-toplevel.patch, v04-0002-allow-deallocate.patch, v04-0003-allow-describe.patch, v04-0004-add-warning-in-check_declared_list.patch, v04-0005-descriptor.patch, v04-0006-add-test.patch
Re: add operator ^= to mean not equal (like != and <>)
> On 10 Aug 2021, at 11:10, Andreas Karlsson wrote: > What is he reason you want to add ^= is there any other databases which uses > ^= for inequality? I assume it's because of Oracle compatibility which AFAIK is the only database supporting ^=. -- Daniel Gustafsson https://vmware.com/
Re: add operator ^= to mean not equal (like != and <>)
On 8/10/21 10:27 AM, 孙诗浩(思才) wrote: Before sending the patch for review, I want to know whether the postgres maintainers will approve my changes. So, please give me some advice. Welcome! I do not think that is a feature which will get much interest from the developers, since it is unclear to me what the advantage of yet another alias for not-equal would be. It just takes up yet another operator and means that there is yet another thing for users to remember. Personally I feel it is bad enough that we have two ways of writing it. What is the reason you want to add ^=? Are there any other databases which use ^= for inequality? Andreas
Bug in huge simplehash
Good day, hackers. Our client caught a process stuck in tuplehash_grow. There was a query like `select ts, count(*) from really_huge_partitioned_table group by ts`, and the planner decided to use hash aggregation. The backtrace shows that oldsize was 2147483648 at the moment. While newsize was optimized out, it looks like it was SH_MAX_SIZE. #0 0x00603d0c in tuplehash_grow (tb=0x7f18c3c284c8, newsize=) at ../../../src/include/lib/simplehash.h:457 hash = 2147483654 startelem = 1 curelem = 1 oldentry = 0x7f00c299e0d8 oldsize = 2147483648 olddata = 0x7f00c299e048 newdata = 0x32e0448 i = 6 copyelem = 6 EXPLAIN shows that there are 2604186278 rows in all partitions, but the planner thinks there will be only 200 unique rows after group by. Looks like we were mistaken. Finalize GroupAggregate (cost=154211885.42..154211936.09 rows=200 width=16) Group Key: really_huge_partitioned_table.ts -> Gather Merge (cost=154211885.42..154211932.09 rows=400 width=16) Workers Planned: 2 -> Sort (cost=154210885.39..154210885.89 rows=200 width=16) Sort Key: really_huge_partitioned_table.ts -> Partial HashAggregate (cost=154210875.75..154210877.75 rows=200 width=16) Group Key: really_huge_partitioned_table.ts -> Append (cost=0.43..141189944.36 rows=2604186278 width=8) -> Parallel Index Only Scan using really_huge_partitioned_table_001_idx2 on really_huge_partitioned_table_001 (cost=0.43..108117.92 rows=2236977 width=8) -> Parallel Index Only Scan using really_huge_partitioned_table_002_idx2 on really_huge_partitioned_table_002 (cost=0.43..114928.19 rows=2377989 width=8) and more than 400 partitions more After some investigation I found a bug that has been present in simplehash from its beginning: - sizemask is set only in SH_COMPUTE_PARAMETERS. And it is set in this way: /* now set size */ tb->size = size; if (tb->size == SH_MAX_SIZE) tb->sizemask = 0; else tb->sizemask = tb->size - 1; that means, when we are resizing to SH_MAX_SIZE, sizemask becomes zero.
- then sizemask is used by SH_INITIAL_BUCKET and SH_NEXT to compute the initial and next positions: SH_INITIAL_BUCKET(SH_TYPE * tb, uint32 hash) return hash & tb->sizemask; SH_NEXT(SH_TYPE * tb, uint32 curelem, uint32 startelem) curelem = (curelem + 1) & tb->sizemask; - and then SH_GROW gets stuck in the element-placing loop: startelem = SH_INITIAL_BUCKET(tb, hash); curelem = startelem; while (true) curelem = SH_NEXT(tb, curelem, startelem); There is Assert(curelem != startelem) in SH_NEXT, but since no one tested it with 2 billion elements, it was never triggered. And Asserts are not compiled into production code. The attached patch fixes it by removing the condition and adding a cast: /* now set size */ tb->size = size; tb->sizemask = (uint32)(size - 1); OOPS While writing this letter, I looked at newdata in the frame of tuplehash_grow: newdata = 0x32e0448 It is below the 4GB border. The allocator does not allocate multi-gigabyte chunks (and we certainly need 96GB in this case) in the sub-4GB address space, because mmap doesn't do this. I went to check SH_GROW and it is `SH_GROW(SH_TYPE *tb, uint32 newsize)` :-((( Therefore when `tb->size == SH_MAX_SIZE/2` and we call `SH_GROW(tb, tb->size * 2)`, then SH_GROW(tb, 0) is called due to truncation. And SH_COMPUTE_PARAMETERS also accepts `uint32 newsize`. Ahh... ok, patch is updated to fix this as well. regards. - Yura Sokolov y.soko...@postgrespro.ru funny.fal...@gmail.comFrom a8283d3a17c630a65e1b42f8617e07873a30fbc7 Mon Sep 17 00:00:00 2001 From: Yura Sokolov Date: Tue, 10 Aug 2021 11:51:16 +0300 Subject: [PATCH] Fix new size and sizemask computaton in simplehash.h Fix couple of 32/64bit related errors in simplehash.h: - size of SH_GROW and SH_COMPUTE_PARAMETERS arguments - computation of tb->sizemask.
---
 src/include/lib/simplehash.h | 10 +++---
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/src/include/lib/simplehash.h b/src/include/lib/simplehash.h
index da51781e98e..2287601cfa1 100644
--- a/src/include/lib/simplehash.h
+++ b/src/include/lib/simplehash.h
@@ -302,7 +302,7 @@ SH_SCOPE void SH_STAT(SH_TYPE * tb);
  * the hashtable.
  */
 static inline void
-SH_COMPUTE_PARAMETERS(SH_TYPE * tb, uint32 newsize)
+SH_COMPUTE_PARAMETERS(SH_TYPE * tb, uint64 newsize)
 {
 	uint64		size;
 
@@ -322,11 +322,7 @@ SH_COMPUTE_PARAMETERS(SH_TYPE * tb, uint32 newsize)
 
 	/* now set size */
 	tb->size = size;
-
-	if (tb->size == SH_MAX_SIZE)
-		tb->sizemask = 0;
-	else
-		tb->sizemask = tb->size - 1;
+	tb->sizemask = (uint32)(size - 1);
 
 	/*
 	 * Compute the next threshold at which we need to grow th
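Not PostgreSQL code, but the arithmetic behind both bugs can be checked in a few lines of Python (an illustrative model: SH_MAX_SIZE and the mask computations are transcribed from simplehash.h into plain integers):

```python
SH_MAX_SIZE = 1 << 32    # simplehash's maximum table size (2^32 buckets)

def sizemask_old(size):
    # pre-patch SH_COMPUTE_PARAMETERS: the mask collapses to 0 at SH_MAX_SIZE
    return 0 if size == SH_MAX_SIZE else size - 1

def sizemask_fixed(size):
    # patched logic: plain truncation to uint32, valid for every size
    return (size - 1) & 0xFFFFFFFF

def sh_next(curelem, sizemask):
    # SH_NEXT's linear-probing step
    return (curelem + 1) & sizemask

# Bug 1: with sizemask == 0 the probe never leaves bucket 0, so the
# element-placing loop in SH_GROW can never reach a free slot elsewhere.
mask = sizemask_old(SH_MAX_SIZE)
assert mask == 0
assert sh_next(5, mask) == 0 and sh_next(0, mask) == 0

# The fix restores a full 32-bit mask at maximum size.
assert sizemask_fixed(SH_MAX_SIZE) == 0xFFFFFFFF

# Bug 2: SH_GROW takes uint32, so growing from SH_MAX_SIZE/2 requests
# newsize = SH_MAX_SIZE, which truncates to 0 in the uint32 argument.
newsize = ((SH_MAX_SIZE // 2) * 2) & 0xFFFFFFFF
assert newsize == 0
```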
Re: [PATCH] Hooks at XactCommand level
On 30/07/2021 at 23:49, Tom Lane wrote:
> Andres Freund writes:
>> On 2021-07-30 13:58:51 -0400, Tom Lane wrote:
>>> I've not read this version of the patch, but I see from the cfbot's
>>> results that it's broken postgres_fdw.
>> I think this may partially be an issue with the way that postgres_fdw
>> uses the callback than with the patch. It disconnects from the server
>> *regardless* of the XactEvent passed to the callback. That makes it
>> really hard to extend the callback mechanism to further events...
> Perhaps. Nonetheless, I thought upthread that adding these events as
> Xact/SubXactCallback events was the wrong design, and I still think
> that. A new hook would be a more sensible way.
>> I'm *very* unconvinced it makes sense to implement a feature like this
>> in an extension / that we should expose API for that purpose. To me
>> the top-level transaction state is way too tied to our internals for
>> it to be reasonably dealt with in an extension.
> Yeah, that's the other major problem --- the use-case doesn't seem very
> convincing. I'm not even sold on the goal, let alone on trying to
> implement it by hooking into these particular places. I think that'll
> end up being buggy and fragile as well as not very performant.

I've attached the new version v5 of the patch, which uses a hook instead of a xact callback. Compared to the first implementation, the calls to the hook have been extracted from the start_xact_command() function. The test extension has also been updated.

If I understand the last discussions correctly, there is no chance of having this hook included. If there is no contrary opinion, I will withdraw the patch from the commitfest. In any case, thank you so much for having taken the time to review this proposal.
Best regards,
--
Gilles Darold

diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 530caa520b..bc62a2cb98 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -207,6 +207,8 @@ static void log_disconnections(int code, Datum arg);
 static void enable_statement_timeout(void);
 static void disable_statement_timeout(void);
 
+/* Hooks for plugins to get control at end of start_xact_command() */
+XactCommandStart_hook_type start_xact_command_hook = NULL;
 
 /*
  * routines to obtain user input
@@ -989,6 +991,13 @@ exec_simple_query(const char *query_string)
 	 */
 	start_xact_command();
 
+	/*
+	 * Now give loadable modules a chance to execute code
+	 * before a transaction command is processed.
+	 */
+	if (start_xact_command_hook)
+		(*start_xact_command_hook) ();
+
 	/*
 	 * Zap any pre-existing unnamed statement. (While not strictly necessary,
 	 * it seems best to define simple-Query mode as if it used the unnamed
@@ -1082,6 +1091,13 @@ exec_simple_query(const char *query_string)
 	/* Make sure we are in a transaction command */
 	start_xact_command();
 
+	/*
+	 * Now give loadable modules a chance to execute code
+	 * before a transaction command is processed.
+	 */
+	if (start_xact_command_hook)
+		(*start_xact_command_hook) ();
+
 	/*
 	 * If using an implicit transaction block, and we're not already in a
 	 * transaction block, start an implicit block to force this statement
@@ -1361,6 +1377,13 @@ exec_parse_message(const char *query_string, /* string to execute */
 	 */
 	start_xact_command();
 
+	/*
+	 * Now give loadable modules a chance to execute code
+	 * before a transaction command is processed.
+	 */
+	if (start_xact_command_hook)
+		(*start_xact_command_hook) ();
+
 	/*
 	 * Switch to appropriate context for constructing parsetrees.
 	 *
@@ -1647,6 +1670,13 @@ exec_bind_message(StringInfo input_message)
 	 */
 	start_xact_command();
 
+	/*
+	 * Now give loadable modules a chance to execute code
+	 * before a transaction command is processed.
+	 */
+	if (start_xact_command_hook)
+		(*start_xact_command_hook) ();
+
 	/* Switch back to message context */
 	MemoryContextSwitchTo(MessageContext);
 
@@ -2140,6 +2170,13 @@ exec_execute_message(const char *portal_name, long max_rows)
 	 */
 	start_xact_command();
 
+	/*
+	 * Now give loadable modules a chance to execute code
+	 * before a transaction command is processed.
+	 */
+	if (start_xact_command_hook)
+		(*start_xact_command_hook) ();
+
 	/*
 	 * If we re-issue an Execute protocol request against an existing portal,
 	 * then we are only fetching more rows rather than completely re-executing
@@ -2546,6 +2583,13 @@ exec_describe_statement_message(const char *stmt_name)
 	 */
 	start_xact_command();
 
+	/*
+	 * Now give loadable modules a chance to execute code
+	 * before a transaction command is processed.
+	 */
+	if (start_xact_command_hook)
+		(*start_xact_command_hook) ();
+
 	/* Switch back to message context */
 	MemoryContextSwitchTo(MessageContext);
 
@@ -2641,6 +2685,13 @@ exec_describe_portal_message(const char *portal_name)
 	 */
 	start_xact_command();
 
+	/*
+	 * Now give loadable modules a chance to execute cod
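For readers who haven't used PostgreSQL-style hooks, the pattern the patch follows — a global function pointer, NULL by default, assigned by an extension's _PG_init() and invoked at the call site only when set — can be sketched as a small Python simulation (illustrative only; the real interface is the C one shown in the diff above):

```python
# Simulated hook slot: None until an "extension" installs a callback.
start_xact_command_hook = None
calls = []

def start_xact_command():
    calls.append("core work")            # the core's own processing

def exec_simple_query(query):
    start_xact_command()
    # the patched call site: invoke the hook only if one is installed
    if start_xact_command_hook is not None:
        start_xact_command_hook()

exec_simple_query("select 1")            # no hook installed yet

def my_hook():                           # what an extension would register
    calls.append("hook ran")

start_xact_command_hook = my_hook        # what _PG_init() would do in C
exec_simple_query("select 2")

assert calls == ["core work", "core work", "hook ran"]
```

The NULL check at the call site is what makes the hook zero-cost for installations that never load such an extension.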
add operator ^= to mean not equal (like != and <>)
Hi everyone,

I am doing some work in Postgres. I want to add "^=" to behave like "!=" and "<>", so I modified the code in scan.l.

Plan 1:

	equals_greater	"=>"
	less_equals		"<="
	greater_equals	">="
	less_greater	"<>"
	not_equals		(!=|\^=)

Maybe I can delete some code to make it simpler.

Plan 2:

	equals_greater	"=>"
	less_equals		"<="
	greater_equals	">="
	less_greater	"<>"	(delete this definition)
	not_equals		(!=|\^=|<>)

Before sending a patch for review, I want to know whether the Postgres maintainers would approve my changes. So, please give me some advice. Thank you!
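As a quick sanity check of the Plan 2 alternation, the combined pattern can be exercised as an ordinary regular expression (a hedged illustration in Python — scan.l uses flex rules, not Python regexes, but the alternation semantics are the same):

```python
import re

# Plan 2: one rule matching all three not-equal spellings.
not_equals = re.compile(r"!=|\^=|<>")

assert not_equals.fullmatch("!=")
assert not_equals.fullmatch("^=")
assert not_equals.fullmatch("<>")
assert not_equals.fullmatch("==") is None

# In a scanner, each spelling would be normalized to the same token, the
# way PostgreSQL already treats <> as the same operator as !=.
tokens = not_equals.findall("a != b AND c ^= d AND e <> f")
assert tokens == ["!=", "^=", "<>"]
```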
Re: [PATCH] Hooks at XactCommand level
On 31/07/2021 at 01:28, Andres Freund wrote:
>>> I'm *very* unconvinced it makes sense to implement a feature like this
>>> in an extension / that we should expose API for that purpose. To me
>>> the top-level transaction state is way too tied to our internals for
>>> it to be reasonably dealt with in an extension.
>> Yeah, that's the other major problem --- the use-case doesn't seem very
>> convincing. I'm not even sold on the goal, let alone on trying to
>> implement it by hooking into these particular places. I think that'll
>> end up being buggy and fragile as well as not very performant.
> I'm more favorable than you on the overall goal. Migrations to PG are a
> frequent and good thing and as discussed before, lots of PG forks ended
> up implementing a version of this. Clearly there's demand.

Sorry for the delayed response. I had thought about adding this odd hook to be able to implement this feature through an extension, because I don't think this is something that should be implemented in core. There were also earlier patch proposals, which were all rejected.

We usually implement the feature on the client side, which is IMO enough for the use cases. But the problem is that this is a catastrophe in terms of performance. I have done a small benchmark to illustrate the problem, using a single-process client on the same host as the PG backend.

For 10,000 tuples inserted with 50% failures and rollback at statement level handled on the client side:

	Expected: 5001, Count: 5001
	DML insert took: 13 wallclock secs ( 0.53 usr + 0.94 sys = 1.47 CPU)

Now with rollback at statement level handled on the server side using the hook and the extension:

	Expected: 5001, Count: 5001
	DML insert took: 4 wallclock secs ( 0.27 usr + 0.32 sys = 0.59 CPU)

And with 100,000 tuples it is worse.
Without the extension:

	Expected: 50001, Count: 50001
	DML insert took: 1796 wallclock secs (14.95 usr + 20.29 sys = 35.24 CPU)

With server-side rollback at statement level:

	Expected: 50001, Count: 50001
	DML insert took: 372 wallclock secs ( 4.85 usr + 5.45 sys = 10.30 CPU)

I think this is not such an uncommon use case, and it shows the value of such an extension.

> However, I think a proper implementation would require a substantial
> amount of effort. Including things like optimizing the subtransaction
> logic so that switching the feature on doesn't lead to xid wraparound
> issues. Adding odd hooks doesn't move us towards a real solution imo.

I would like to help with this part, but unfortunately I have no idea how we can improve that.

Best regards,
--
Gilles Darold
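The client-side pattern being benchmarked — wrapping each statement in its own savepoint so a failure rolls back only that statement — looks roughly like this (an illustrative sketch using Python's stdlib sqlite3 as a stand-in for a PostgreSQL connection; the per-statement savepoint round trips are what the server-side approach avoids):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.isolation_level = None      # autocommit mode: we issue BEGIN/COMMIT ourselves
conn.execute("create table t (id integer primary key)")

rows = [1, 2, 2, 3, 3]           # the duplicates will violate the primary key
inserted = 0

conn.execute("begin")
for r in rows:
    # client-side statement-level rollback: one savepoint per statement,
    # which costs extra server round trips on a real network connection
    conn.execute("savepoint s")
    try:
        conn.execute("insert into t values (?)", (r,))
        inserted += 1
    except sqlite3.IntegrityError:
        conn.execute("rollback to savepoint s")   # undo only this statement
    conn.execute("release savepoint s")
conn.execute("commit")

count = conn.execute("select count(*) from t").fetchone()[0]
assert (inserted, count) == (3, 3)    # 1, 2, 3 survived; failed inserts were undone
```

On a real network link every savepoint/release adds a round trip, which is where the roughly 3-5x wallclock difference in the benchmark comes from.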
Re: Added schema level support for publication.
On Fri, Aug 6, 2021 at 6:32 PM vignesh C wrote:
>
> Thanks for the comments, the attached v19 patch has the fixes for the
> comments.
>

Some more review comments, this time for the v19 patch:

(1) In patch v19-0002, there are still a couple of instances where it says "publication type" instead of "publication kind".

(2) src/backend/catalog/pg_publication.c

"This should only be used for normal publications."

What exactly is meant by that -- what kind is considered normal? Maybe that comment should be more specific.

(3) src/backend/catalog/pg_publication.c

GetSchemaPublications

The function header says "Gets list of publication oids for publications marked as FOR SCHEMA." Shouldn't it say something like: "Gets the list of FOR SCHEMA publication oids associated with a specified schema oid."? (since the function accepts a schemaid parameter)

(4) src/backend/commands/publicationcmds.c

In AlterPublicationSchemas(), I notice that the two error cases "cannot be added to or dropped ..." don't check stmt->action for DEFELEM_ADD/DEFELEM_DROP. Is that OK? (i.e. should these cases error out if stmt->action is not DEFELEM_ADD/DEFELEM_DROP?) Also, I assume that the else part (not DEFELEM_ADD/DEFELEM_DROP) is the "Set" case? Maybe a comment should be added at the top of the else part.

(5) src/backend/commands/publicationcmds.c

Typo (same in 2 places): "automaically" -> "automatically"

+	 * will be released automaically at the end of create publication

See functions:
(i) CreatePublication
(ii) AlterPublicationSchemas

(6) src/backend/commands/publicationcmds.c

LockSchemaList

The function header says "are locked in ShareUpdateExclusiveLock mode", but the code calls LockDatabaseObject using "AccessShareLock".

Regards,
Greg Nancarrow
Fujitsu Australia
Re: ECPG bug fix: DECALRE STATEMENT and DEALLOCATE, DESCRIBE
> I do think all committers need to understand the role of the RMT so
> they can respond appropriately. Do we need to communicate this better?

Being the one who assumed a different procedure, yes please. :)

Thanks,
Michael
--
Michael Meskes
Michael at Fam-Meskes dot De
Michael at Meskes dot (De|Com|Net|Org)
Meskes at (Debian|Postgresql) dot Org
Re: [PATCH] test/ssl: rework the sslfiles Makefile target
> On 10 Aug 2021, at 09:22, Michael Paquier wrote:
>
> On Fri, Jul 30, 2021 at 03:11:49PM +, Jacob Champion wrote:
>> No worries, it's easy enough to unroll the expansion manually. The
>> annoyances without secondary expansion are the duplicated lines for
>> each individual CA and the need to introduce .INTERMEDIATE targets so
>> that cleanup works as intended.
>>
>> Attached is a v3 that does that, and introduces a fallback in case
>> openssl isn't on the PATH. I also missed a Makefile dependency on
>> cas.config the first time through, which has been fixed. The patch you
>> pulled out earlier is 0001 in the set.
>
> Patch 0001 is a good cleanup. Daniel, are you planning to apply that?

Yes, it's on my todo for today.

> Regarding 0002, I am not sure. Even if this reduces a lot of
> duplication, which is really nice, enforcing .SECONDARY to not trigger
> with a change impacting Makefile.global.in does not sound very
> appealing to me in the long-run, TBH.

I personally think the increased readability and usability over what we have today warrant the changes. Is it the use of .SECONDARY or the change in the global Makefile that you object to (or both)?

--
Daniel Gustafsson		https://vmware.com/
Re: ECPG bug fix: DECALRE STATEMENT and DEALLOCATE, DESCRIBE
> Okay. So you mean to change DESCRIBE and DEALLOCATE to be able to
> handle cached connection names, as of [1]? For [DE]ALLOCATE, I agree

Yes, at least technically. I think DESCRIBE should accept the cached connection name, although it won't really matter.

> that it could be a good thing. declare.pgc seems to rely on that
> already but the tests are incorrect as I mentioned in [2]. For
> DESCRIBE, that provides data about a result set, I find the assignment
> of a connection a bit strange, and even if this would allow the use of
> the same statement name for multiple connections, it seems to me that
> there is a risk of breaking existing applications. There should not
> be that many, so perhaps that's fine anyway.

I don't think we'd break anything, given that DECLARE STATEMENT is new. Also please keep in mind that you can already use EXEC SQL AT ... DESCRIBE ...; anyway. Again, not very meaningful, but why should we accept a connection one way but not the other?

Michael
--
Michael Meskes
Michael at Fam-Meskes dot De
Michael at Meskes dot (De|Com|Net|Org)
Meskes at (Debian|Postgresql) dot Org
Re: ECPG bug fix: DECALRE STATEMENT and DEALLOCATE, DESCRIBE
Peter,

> I think that there was a snowballing effect here. We both made
> mistakes that compounded. I apologize for my role in this whole mess.

Completely agreed. I think we both took things for granted that the other one didn't take into account at all. I'm sorry about that, too.

Michael
--
Michael Meskes
Michael at Fam-Meskes dot De
Michael at Meskes dot (De|Com|Net|Org)
Meskes at (Debian|Postgresql) dot Org
Re: [PATCH] test/ssl: rework the sslfiles Makefile target
On Fri, Jul 30, 2021 at 03:11:49PM +, Jacob Champion wrote:
> No worries, it's easy enough to unroll the expansion manually. The
> annoyances without secondary expansion are the duplicated lines for
> each individual CA and the need to introduce .INTERMEDIATE targets so
> that cleanup works as intended.
>
> Attached is a v3 that does that, and introduces a fallback in case
> openssl isn't on the PATH. I also missed a Makefile dependency on
> cas.config the first time through, which has been fixed. The patch you
> pulled out earlier is 0001 in the set.

Patch 0001 is a good cleanup. Daniel, are you planning to apply that?

Regarding 0002, I am not sure. Even if this reduces a lot of duplication, which is really nice, enforcing .SECONDARY to not trigger with a change impacting Makefile.global.in does not sound very appealing to me in the long run, TBH.
--
Michael
Re: Segment fault when excuting SPI function On PG with commit 41c6a5be
On Fri, Jul 30, 2021 at 11:22:43AM -0400, Tom Lane wrote:
> Happy to make it so. Anyone have suggestions about the wording of
> the message?

For the archives, this has been applied as of ef12f32, and the new message seems explicit enough:

+	if (unlikely(portal == NULL))
+		elog(ERROR, "cannot execute SQL without an outer snapshot or portal");

--
Michael
Re: fix DECLARE tab completion
On Tue, Aug 03, 2021 at 01:58:44AM +, shinya11.k...@nttdata.com wrote:
> In the discussion of [1], the option ASENSITIVE was added to DECLARE.
> I have improved its tab completion and attached a patch.
>
> Do you think?

Thanks, fixed.
--
Michael