Re: [HACKERS] Error with index on unlogged table
On December 10, 2015 5:02:27 AM GMT+01:00, Michael Paquier wrote:
>On Wed, Dec 9, 2015 at 9:07 PM, Andres Freund wrote:
>> On 2015-12-09 21:03:47 +0900, Michael Paquier wrote:
>>> Oh, OK. I didn't read through your lines correctly. So you basically
>>> mean that we would look at the init files that are on disk, and check
>>> if they are empty. If they are, we simply use XLogReadBufferExtended
>>> to fetch the INIT_FORKNUM content and fill in another buffer for the
>>> MAIN_FORKNUM. More or less right?
>>
>> We would not just do so if they're empty, we would just generally copy
>> the file via shared buffers, instead of copy_file(). But we'd get the
>> file size from the filesystem (which is fine, we make sure it is
>> correct during replay).
>
>So, this suggestion is basically implementing the reverse operation of
>GetRelationPath() to be able to rebuild a RelFileNode from scratch and
>then look at the shared buffers needed. Isn't it a bit brittle for
>back-branches? RelFileNode stuff is available easily through records
>when replaying individual FPIs, but not really in this code path.

Oh, sorry. I'm definitely not thinking about doing this for the back branches. That was more musing about a way to optimize this.

Andres

---
Please excuse brevity and formatting - I am writing this on my mobile phone.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.
Hello,

At Thu, 10 Dec 2015 15:28:14 +0900, Michael Paquier wrote in
> On Thu, Dec 10, 2015 at 4:40 AM, Robert Haas wrote:
> > On Mon, Dec 7, 2015 at 2:41 AM, Amit Langote wrote:
> >> Hm, I guess progress messages would not change that much over the course
> >> of a long-running command. How about we pass only the array(s) that we
> >> change and NULL for others:
> >>
> >> With the following prototype for pgstat_report_progress:
> >>
> >> void pgstat_report_progress(uint32 *uint32_param, int num_uint32_param,
> >>                             bool *message_param[], int num_message_param);
> >>
> >> If we just wanted to change, say scanned_heap_pages, then we report that
> >> using:
> >>
> >> uint32_param[1] = scanned_heap_pages;
> >> pgstat_report_progress(uint32_param, 3, NULL, 0);
> >>
> >> Also, pgstat_report_progress() should check each of its parameters for
> >> NULL before iterating over to copy. So in most reports over the course of
> >> a command, the message_param would be NULL and hence not copied.
> >
> > It's going to be *really* important that this facility provides a
> > lightweight way of updating progress, so I think this whole API is
> > badly designed. VACUUM, for example, is going to want a way to update
> > the individual counter for the number of pages it's scanned after
> > every page. It should not have to pass all of the other information
> > that is part of a complete report. It should just be able to say
> > pgstat_report_progress_update_counter(1, pages_scanned) or something
> > of this sort. Don't marshal all of the data and then make
> > pgstat_report_progress figure out what's changed. Provide a series of
> > narrow APIs where the caller tells you exactly what they want to
> > update, and you update only that. It's fine to have a
> > pgstat_report_progress() function to update everything at once, for
> > the use at the beginning of a command, but the incremental updates
> > within the command should do something lighter-weight.
> [first time looking really at the patch and catching up with this thread]
>
> Agreed. As far as I can guess from the code, those should be as
> follows, to bloat the pgstat message queue a minimum:
>
> +       pgstat_report_activity_flag(ACTIVITY_IS_VACUUM);
>         /*
>          * Loop to process each selected relation.
>          */
>
> Defining a new routine for this purpose is a bit surprising. Cannot we
> just use pgstat_report_activity with a new BackendState STATE_INVACUUM
> or similar if we pursue the progress tracking approach?

The author might want to know the vacuum status *after* it has finished, but it is reset just after the end of a vacuum. One concern is that a new BackendState adds a new value for pg_stat_activity.state, which might confuse someone using it, but I don't see any other issue with it.

> A couple of comments:
> - The relation OID should be reported and not its name. In case of a
> relation rename that would not be cool for tracking, and most users
> are surely going to join with other system tables using it.

+1

> - The progress tracking facility adds a whole level of complexity for
> very little gain, and IMO this should *not* be part of PgBackendStatus
> because in most cases its data finishes wasted. We don't expect
> backends to run such progress reports frequently, do we?

I strongly thought the same thing, but I haven't come up with a better place for it to be stored. But,

> My opinion on
> the matter is that we should define a different collector data for
> vacuum, with something like PgStat_StatVacuumEntry, then have on top
> of it a couple of routines dedicated to feeding data into it when
> some work is done on a vacuum job.

+many. But I can't guess the appropriate number of entries for it, or a suitable replacement policy for an excessive number of vacuums. Although sane users won't run vacuum in so many backends.

> In short, it seems to me that this patch needs a rework, and should be
> returned with feedback. Other opinions?
This is an important feature for DBAs, so I agree with you.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center
Re: [HACKERS] Speedup twophase transactions
On Wed, Dec 9, 2015 at 10:44 AM, Stas Kelvich wrote:
> Hello.
>
> While working with cluster stuff (DTM, tsDTM) we noted that postgres 2pc
> transactions are approximately two times slower than an ordinary commit on
> a workload with fast transactions — few single-row updates and COMMIT or
> PREPARE/COMMIT. Perf top showed that a lot of time is spent in the kernel
> on fopen/fclose, so it is worth trying to reduce file operations with 2pc tx.

I've tested this through my testing harness which forces the database to go through endless runs of crash recovery and checks for consistency, and so far it has survived perfectly.

...

> Now the results of the benchmark are as follows (dual 6-core xeon server):
>
> Current master without 2PC: ~42 ktps
> Current master with 2PC: ~22 ktps
> Patched master with 2PC: ~36 ktps

Can you give the full command line? -j, -c, etc.

> Benchmark done with following script:
>
> \set naccounts 10 * :scale
> \setrandom from_aid 1 :naccounts
> \setrandom to_aid 1 :naccounts
> \setrandom delta 1 100
> \set scale :scale+1

Why are you incrementing :scale ? I very rapidly reach a point where most of the updates are against tuples that don't exist, and then get integer overflow problems.

> BEGIN;
> UPDATE pgbench_accounts SET abalance = abalance - :delta WHERE aid = :from_aid;
> UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :to_aid;
> PREPARE TRANSACTION ':client_id.:scale';
> COMMIT PREPARED ':client_id.:scale';

Cheers,

Jeff
Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.
On Thu, Dec 10, 2015 at 4:40 AM, Robert Haas wrote:
> On Mon, Dec 7, 2015 at 2:41 AM, Amit Langote wrote:
>> Hm, I guess progress messages would not change that much over the course
>> of a long-running command. How about we pass only the array(s) that we
>> change and NULL for others:
>>
>> With the following prototype for pgstat_report_progress:
>>
>> void pgstat_report_progress(uint32 *uint32_param, int num_uint32_param,
>>                             bool *message_param[], int num_message_param);
>>
>> If we just wanted to change, say scanned_heap_pages, then we report that
>> using:
>>
>> uint32_param[1] = scanned_heap_pages;
>> pgstat_report_progress(uint32_param, 3, NULL, 0);
>>
>> Also, pgstat_report_progress() should check each of its parameters for
>> NULL before iterating over to copy. So in most reports over the course of
>> a command, the message_param would be NULL and hence not copied.
>
> It's going to be *really* important that this facility provides a
> lightweight way of updating progress, so I think this whole API is
> badly designed. VACUUM, for example, is going to want a way to update
> the individual counter for the number of pages it's scanned after
> every page. It should not have to pass all of the other information
> that is part of a complete report. It should just be able to say
> pgstat_report_progress_update_counter(1, pages_scanned) or something
> of this sort. Don't marshal all of the data and then make
> pgstat_report_progress figure out what's changed. Provide a series of
> narrow APIs where the caller tells you exactly what they want to
> update, and you update only that. It's fine to have a
> pgstat_report_progress() function to update everything at once, for
> the use at the beginning of a command, but the incremental updates
> within the command should do something lighter-weight.

[first time looking really at the patch and catching up with this thread]

Agreed.
As far as I can guess from the code, those should be as follows, to bloat the pgstat message queue a minimum:

+       pgstat_report_activity_flag(ACTIVITY_IS_VACUUM);
        /*
         * Loop to process each selected relation.
         */

Defining a new routine for this purpose is a bit surprising. Cannot we just use pgstat_report_activity with a new BackendState STATE_INVACUUM or similar if we pursue the progress tracking approach?

A couple of comments:
- The relation OID should be reported and not its name. In case of a relation rename that would not be cool for tracking, and most users are surely going to join with other system tables using it.
- The progress tracking facility adds a whole level of complexity for very little gain, and IMO this should *not* be part of PgBackendStatus because in most cases its data finishes wasted. We don't expect backends to run such progress reports frequently, do we? My opinion on the matter is that we should define a different collector data for vacuum, with something like PgStat_StatVacuumEntry, then have on top of it a couple of routines dedicated to feeding data into it when some work is done on a vacuum job.

In short, it seems to me that this patch needs a rework, and should be returned with feedback. Other opinions?
--
Michael
Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics
On Wed, Dec 9, 2015 at 2:17 PM, Alexander Korotkov <a.korot...@postgrespro.ru> wrote:
> On Tue, Dec 8, 2015 at 6:00 PM, Amit Kapila wrote:
>> On Tue, Dec 8, 2015 at 3:56 PM, Alexander Korotkov wrote:
>>> Agree. This patch needs to be carefully verified. Current experiments
>>> just show that it is a promising direction for improvement. I'll come
>>> up with a better version of this patch.
>>>
>>> Also, after testing on large machines I have another observation to
>>> share. For now, LWLock doesn't guarantee that an exclusive lock would
>>> ever be acquired (assuming each shared lock duration is finite). That
>>> is because when there is no exclusive lock, new shared locks aren't
>>> queued and the LWLock state is changed directly. Thus, a process which
>>> tries to acquire an exclusive lock has to wait for a gap in shared
>>> locks.
>>
>> I think this has the potential to starve exclusive lockers in the worst
>> case.
>>
>>> But with high concurrency for the shared lock that could happen very
>>> rarely, maybe never.
>>>
>>> We did see this on a big Intel machine in practice. pgbench -S takes
>>> the shared ProcArrayLock very frequently. Once some number of
>>> connections is reached, new connections hang on getting the exclusive
>>> ProcArrayLock. I think we could do some workaround for this problem.
>>> For instance, when an exclusive lock waiter reaches some timeout it
>>> could set some special bit which prevents others from getting new
>>> shared locks.
>>
>> I think a timeout based solution would lead to giving priority to
>> exclusive lock waiters (assume a case where each of the exclusive
>> lock waiters times out one after another) and make shared lockers
>> wait, and a timer based solution might turn out to be costly for
>> general cases where the wait is not so long.
>
> Since all lwlock waiters are ordered in the queue, we can let only the
> first waiter set this bit.

That's okay, but still, every time an exclusive locker wakes up, the threshold time for its wait might already be over and it will set the bit. In theory that looks okay, but compared to the current algorithm it will make more shared lockers be added to the wait queue.

> Anyway, once the bit is set, shared lockers would be added to the queue.
> They would get the lock in queue order.

Yes, that's right, but I think in general the solution to this problem should be: don't let any exclusive locker starve, and still allow as many shared lockers as possible. I think here it is important how we define starving; should it be based on time or something else? I find a timer based solution somewhat less suitable, but maybe it is okay, if there is no other better way.

>> Another way could be to check if the exclusive locker needs to go for
>> repeated waits a couple of times, then we can set such a bit.
>
> I'm not sure what you mean by repeated wait. Do you mean the exclusive
> locker was woken up twice by timeout?

I mean to say that once the exclusive locker is woken up, it again re-tries to acquire the lock as it does today, but if it finds that the number of retries is greater than a certain threshold (let us say 10), then we set the bit.

> Because now, without a timeout, the exclusive locker wouldn't be woken
> up until all shared locks are released.

Does LWLockWakeup() work that way? I thought it works such that once an exclusive locker is encountered in the wait queue, it just wakes that one and won't try to wake any further waiters.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
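[The retry-threshold idea discussed above can be sketched as a single-threaded toy model. This is purely illustrative — the names (ToyLWLock, FAIRNESS_RETRIES, try_shared, ...) are invented and this is not the real LWLock code — but it shows the mechanism: an exclusive waiter that has been passed over too many times sets a fairness bit, which makes new shared lockers queue instead of joining directly.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Single-threaded toy model of the proposed LWLock fairness bit.
 * Hypothetical names; the real lock uses atomics and a wait queue. */
#define FAIRNESS_RETRIES 10

typedef struct {
    uint32_t shared_count;       /* number of shared holders */
    bool     exclusive_held;
    bool     fairness_bit;       /* set by a starving exclusive waiter */
    int      exclusive_retries;  /* failed exclusive-acquire attempts */
} ToyLWLock;

static bool try_shared(ToyLWLock *l) {
    if (l->exclusive_held || l->fairness_bit)
        return false;            /* must queue behind the exclusive waiter */
    l->shared_count++;
    return true;
}

static void release_shared(ToyLWLock *l) {
    l->shared_count--;
}

static bool try_exclusive(ToyLWLock *l) {
    if (l->shared_count == 0 && !l->exclusive_held) {
        l->exclusive_held = true;
        l->fairness_bit = false;     /* lock obtained, clear the bit */
        l->exclusive_retries = 0;
        return true;
    }
    if (++l->exclusive_retries >= FAIRNESS_RETRIES)
        l->fairness_bit = true;      /* stop the stream of new shared lockers */
    return false;
}
```

Once the bit is set, the stream of new shared lockers dries up, the existing holders drain, and the exclusive waiter gets its turn — matching the "set the bit after N retries" variant rather than the timeout variant.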
Re: [HACKERS] pg_hba_lookup function to get all matching pg_hba.conf entries
On Thu, Dec 10, 2015 at 4:33 PM, Amit Kapila wrote:
> On Thu, Dec 10, 2015 at 9:51 AM, Haribabu Kommi wrote:
>> On Thu, Dec 10, 2015 at 2:29 PM, Amit Kapila wrote:
>> > How about creating "hba parser context" and "ident parser context"
>> > at the beginning of their respective functions and don't change
>> > anything in tokenize_file()?
>>
>> The tokenize file cxt is deleted after a successful load of pg_hba.conf or
>> pg_ident.conf files. we don't need this memory once the pg_hba.conf
>> or pg_ident file is loaded, because of this reason, it is created as a
>> separate context and deleted later.
>
> What about the error case?

Yes. One error case is possible, when the length of the string crosses the MAX_LINE size. If we allocate the tokenize file cxt inside CurrentMemoryContext (i.e. MessageContext) instead of TopMemoryContext, it will automatically be freed later in the error case.

Regards,
Hari Babu
Fujitsu Australia
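[The point about parenting the context under CurrentMemoryContext can be illustrated with a toy model of PostgreSQL's context hierarchy — hypothetical names, not the actual mcxt.c implementation: deleting a context recursively reclaims its children, which is why a "tokenize file cxt" created under a short-lived parent such as MessageContext is cleaned up automatically when the error path resets the parent.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy model of parent/child memory contexts (illustrative only). */
typedef struct ToyContext {
    struct ToyContext *first_child;
    struct ToyContext *next_sibling;
    bool *deleted_flag;          /* lets the caller observe deletion */
} ToyContext;

static ToyContext *context_create(ToyContext *parent, bool *flag) {
    ToyContext *c = calloc(1, sizeof(ToyContext));
    c->deleted_flag = flag;
    if (parent) {
        /* link into the parent's child list */
        c->next_sibling = parent->first_child;
        parent->first_child = c;
    }
    return c;
}

static void context_delete(ToyContext *c) {
    /* children go down with the parent, recursively */
    while (c->first_child) {
        ToyContext *child = c->first_child;
        c->first_child = child->next_sibling;
        context_delete(child);
    }
    if (c->deleted_flag)
        *c->deleted_flag = true;
    free(c);
}
```

With this behavior, an explicit delete of the tokenize context remains the normal path, but an error that resets the parent reclaims the child for free.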
Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.
On 2015/12/10 4:40, Robert Haas wrote:
> It's going to be *really* important that this facility provides a
> lightweight way of updating progress, so I think this whole API is
> badly designed. VACUUM, for example, is going to want a way to update
> the individual counter for the number of pages it's scanned after
> every page. It should not have to pass all of the other information
> that is part of a complete report. It should just be able to say
> pgstat_report_progress_update_counter(1, pages_scanned) or something
> of this sort. Don't marshal all of the data and then make
> pgstat_report_progress figure out what's changed. Provide a series of
> narrow APIs where the caller tells you exactly what they want to
> update, and you update only that. It's fine to have a
> pgstat_report_progress() function to update everything at once, for
> the use at the beginning of a command, but the incremental updates
> within the command should do something lighter-weight.

How about something like the following:

/*
 * index: in the array of uint32 counters in the beentry
 * counter: new value of the (index+1)th counter
 */
void pgstat_report_progress_update_counter(int index, uint32 counter);

/*
 * msg: new value of the (index+1)th message (with trailing null byte)
 */
void pgstat_report_progress_update_message(int index, const char *msg);

Actually updating a counter or message would look like:

pgstat_increment_changecount_before(beentry);
// update the counter or message at index in beentry->st_progress_*
pgstat_increment_changecount_after(beentry);

Other interface functions which are called at the beginning:

void pgstat_report_progress_set_command(int commandId);
void pgstat_report_progress_set_command_target(const char *target_name);

or

void pgstat_report_progress_set_command_target(Oid target_oid);

And then a SQL-level,

void pgstat_reset_local_progress();

which simply sets beentry->st_command to some invalid value which signals a progress view function to ignore this backend.
Thanks,
Amit
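[The update functions sketched above lean on the changecount protocol already used for PgBackendStatus: the writer bumps the counter to an odd value before touching the slot and back to an even value afterwards, and a lock-free reader retries until it sees the same even value on both sides of its copy. A minimal self-contained model of that protocol — hypothetical names, not the actual pgstat.c code — looks like this:]

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy model of the st_changecount protocol (illustrative names). */
typedef struct {
    volatile uint32_t changecount;
    uint32_t progress_param[4];
} ToyBackendSlot;

static void update_counter(ToyBackendSlot *s, int index, uint32_t value) {
    s->changecount++;                 /* odd: update in progress */
    s->progress_param[index] = value;
    s->changecount++;                 /* even: consistent again */
}

/* Returns 1 and stores a consistent snapshot in *out, or 0 if the
 * writer kept the slot busy for the whole retry budget. */
static int read_counter(const ToyBackendSlot *s, int index, uint32_t *out) {
    for (int tries = 0; tries < 1000; tries++) {
        uint32_t before = s->changecount;
        uint32_t value = s->progress_param[index];
        uint32_t after = s->changecount;
        if (before == after && (before & 1) == 0) {
            *out = value;
            return 1;
        }
    }
    return 0;
}
```

The per-page update is then just two increments and one store — no locking and no marshalling of the full report, which is what makes the narrow API cheap enough to call once per heap page. (The real protocol also needs memory barriers between the steps; this single-threaded model omits them.)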
Re: [HACKERS] Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
On Thu, Dec 10, 2015 at 6:46 AM, Alvaro Herrera wrote:
> I've been giving RecoveryTest.pm a look. I wonder if we really need that
> as a separate package. My first thought was that we could have another
> class that inherits from PostgresNode (say RecoveryNode). But later it
> occurred to me that we could have the new functions just be part of
> PostgresNode itself directly; so we would have some new PostgresNode
> methods:
> $node->enable_streaming
> $node->enable_restoring
> $node->enable_archiving

Sure.

> $node->wait (your RecoveryTest::wait_for_node; better name for this?)

wait_for_access?

> and some additional constructors:
> make_master
> make_stream_standby
> make_archive_standby

I have done that a little bit differently. Those are completely removed, then init() and init_from_backup() are extended with a new set of parameters to enable archiving, streaming or restore on a node. Which gives the patch attached.
--
Michael

20151210_recovery_suite.patch
Description: binary/octet-stream
Re: [HACKERS] pg_hba_lookup function to get all matching pg_hba.conf entries
On Thu, Dec 10, 2015 at 9:51 AM, Haribabu Kommi wrote: > > On Thu, Dec 10, 2015 at 2:29 PM, Amit Kapila wrote: > > How about creating "hba parser context" and "ident parser context" > > at the beginning of their respective functions and don't change > > anything in tokenize_file()? > > The tokenize file cxt is deleted after a successful load of pg_hba.conf or > pg_ident.conf files. we don't need this memory once the pg_hba.conf > or pg_ident file is loaded, because of this reason, it is created as a > separate context and deleted later. > What about the error case? With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] parallel joins, and better parallel explain
On Wed, Dec 9, 2015 at 11:51 PM, Robert Haas wrote:
> On Fri, Dec 4, 2015 at 3:07 AM, Amit Kapila wrote:
> > I think the problem is at the Gather node: the number of buffers (read + hit)
> > is greater than the number of pages in the relation. The reason why it
> > is doing so is that in workers (ParallelQueryMain()), it starts the buffer
> > usage accumulation before ExecutorStart(), whereas in the master backend
> > it always calculates it for the ExecutorRun() phase; on changing it to
> > accumulate for the ExecutorRun() phase the above problem is fixed. The
> > attached patch fixes the problem.
>
> Why is it a bad thing to capture the cost of doing ExecutorStart() in
> the worker? I can see there's an argument that changing this would be
> more consistent, but I'm not totally convinced. The overhead of
> ExecutorStart() in the leader isn't attributable to any specific
> worker, but the overhead of ExecutorStart() in the worker can fairly
> be blamed on Gather, I think.

This boils down to the question of why, currently, buffer usage or other similar stats (time for node execution) for a node don't include the cost of ExecutorStart(). I think the reason is that ExecutorStart() does some other miscellaneous work like accessing system tables or initialization of nodes which we might not even execute, so accumulating the cost of such work doesn't seem to be meaningful. Looking at references to InstrStartNode(), which actually marks the beginning of buffer usage stats, it is clear that buffer usage stats are not counted for the ExecutorStart() phase, so I think following the same for worker stats seems to be more accurate.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
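[The distinction being argued — whether a worker's accumulation window covers ExecutorStart() too, or only ExecutorRun() as in the leader — can be shown with a toy model. All names here are illustrative, not the real executor or instrument.c API; the point is only that the reported figure depends on where the window opens.]

```c
#include <assert.h>
#include <stdint.h>

/* Toy model: a session-global counter, like pgBufferUsage. */
static uint64_t session_buffers_read;

static void toy_executor_start(void) { session_buffers_read += 5; }   /* catalog access, node init */
static void toy_executor_run(void)   { session_buffers_read += 100; } /* the actual scan */

/* Accumulate only the delta of the global counter across one phase. */
static uint64_t measure_phase(void (*phase)(void)) {
    uint64_t before = session_buffers_read;
    phase();
    return session_buffers_read - before;
}
```

A worker that opens its window before ExecutorStart() reports the combined figure, while the leader reports only the run-phase figure for the same plan — which is exactly the mismatch the patch removes.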
Re: [HACKERS] Passing initially_valid values instead of !skip_validation to StoreRelCheck() in AddRelationNewConstraints()
>On Thursday, 10 December 2015 10:13 AM, Amit Langote wrote:
>On 2015/12/10 13:38, Amit Langote wrote:
>>
>> Take a look at a similar code block in transformFKConstraints
>> (parse_utilcmd.c: line 1935), or transformIndexConstraint
>> (parse_utilcmd.c: line 1761).
>
>Oops, forget the second one.

No issue; the first one makes sense. The updated patch is attached.

Thanks & Regards,
Amul Sul

transformCheckConstraints-function-to-overrid-not-valid_V2.patch
Description: Binary data
Re: [HACKERS] Patch: Implement failover on libpq connect level.
On Mon, 07 Dec 2015 15:26:33 -0500 Korry Douglas wrote:
> The problem seems to be in PQconnectPoll() in the case for
> CONNECTION_AUTH_OK, specifically this code:
>
>    /* We can release the address list now. */
>    pg_freeaddrinfo_all(conn->addrlist_family, conn->addrlist);
>    conn->addrlist = NULL;
>    conn->addr_cur = NULL;
>
> That frees up the list of alternative host addresses. The state
> machine then progresses to CONNECTION_CHECK_RO (which invokes
> pg_is_in_recovery()), then CONNECTION_CHECK_RW (waiting for the

Thank you for pointing out this problem; I had overlooked it. Probably I should improve my testing scenario. I'm attaching a new version of the patch, which, hopefully, handles address list freeing correctly.

--
Victor Wagner

diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index 9c0e4c8..162bd4f 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -792,7 +792,7 @@
 host=localhost port=5432 dbname=mydb connect_timeout=10
 The general form for a connection URI is:
-postgresql://[user[:password]@][netloc][:port][/dbname][?param1=value1&...]
+postgresql://[user[:password]@][netloc][:port][,netloc[:port]...][/dbname][?param1=value1&...]
@@ -809,6 +809,7 @@
 postgresql://localhost/mydb
 postgresql://user@localhost
 postgresql://user:secret@localhost
 postgresql://other@localhost/otherdb?connect_timeout=10&application_name=myapp
+postgresql://node1,node2:5433,node3:4432,node4/mydb?hostorder=random&readonly=1
 Components of the hierarchical part of the URI can also be given as parameters. For example:
@@ -831,7 +832,9 @@
 postgresql:///mydb?host=localhost&port=5433
 For improved compatibility with JDBC connection URIs, instances of
 parameter ssl=true are translated into
-sslmode=require.
+sslmode=require and
+loadBalanceHosts=true into
+hostorder=random.
@@ -841,6 +844,10 @@
 postgresql:///mydb?host=localhost&port=5433
 postgresql://[2001:db8::1234]/database
+
+   There can be several host specifications, optionally accompanied
+   by a port, separated by commas.
+
 The host component is interpreted as described for the parameter
 ... PostgreSQL was built). On machines without Unix-domain sockets, the default is to connect to localhost.
+
+   There can be more than one host parameter in
+   the connect string. In this case these hosts are considered
+   alternate entries into the same database, and if connecting to the
+   first one fails, the library tries to connect to the second, etc.
+   This can be used for a high-availability cluster or for load
+   balancing. See the hostorder parameter.
+
+   A network host name can be accompanied by a port number, separated
+   by a colon. If so, this port number is used only when connecting to
+   this host. If there is no port number, the port specified in the
+   port parameter is used.
+
@@ -942,8 +963,44 @@
 postgresql://%2Fvar%2Flib%2Fpostgresql/dbname
+
+   hostorder
+
+   Specifies how to choose a host from the list of alternate hosts
+   specified in the host parameter.
+
+   If the value of this argument is sequential (the
+   default), the library connects to the hosts in the order they are
+   specified, and tries to connect to the second one only if connection
+   to the first fails.
+
+   If the value is random, the host to connect to is randomly
+   picked from the list. This allows the load to be balanced between
+   several cluster nodes. However, currently PostgreSQL doesn't support
+   multimaster clusters, so without use of third-party products,
+   only read-only connections can take advantage of the
+   load balancing. See the readonly parameter.
+
+   readonly
+
+   If this parameter is 0 (the default), upon successful connection the
+   library checks whether the host is in recovery state, and if so,
+   tries the next host in the connect string. If this parameter is
+   non-zero, connections to warm standby nodes are considered
+   successful.
+
+   port
@@ -985,7 +1042,6 @@
 postgresql://%2Fvar%2Flib%2Fpostgresql/dbname
-
 connect_timeout
@@ -996,7 +1052,27 @@
 postgresql://%2Fvar%2Flib%2Fpostgresql/dbname
+
+   failover_timeout
+
+   Maximum time to cyclically retry all the hosts in the connect string
+   (as a decimal integer number of seconds). If not specified, the
+   hosts are tried just once.
+
+   If we have a replicating cluster, and the master node fails, it
+   might take some time to promote one of the standby nodes to the new
+   master. So clients which notice that connecting to the master fails
+   can already give up attempt to reestablish
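[For illustration, the multi-host syntax the patch introduces — host[:port],host[:port],... with a fallback to a default port — could be parsed along these lines. This is a hypothetical standalone sketch, not the patch's actual libpq code:]

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy parser for a "host[:port],host[:port],..." list (illustrative). */
#define MAX_HOST_LEN 64

typedef struct {
    char host[MAX_HOST_LEN];
    int  port;
} HostSpec;

static int parse_host_list(const char *spec, int default_port,
                           HostSpec *out, int max)
{
    int n = 0;
    while (*spec != '\0' && n < max) {
        /* each element runs up to the next comma (or end of string) */
        const char *comma = strchr(spec, ',');
        size_t len = comma ? (size_t) (comma - spec) : strlen(spec);
        /* an optional ":port" suffix applies only to this element */
        const char *colon = memchr(spec, ':', len);
        size_t hostlen = colon ? (size_t) (colon - spec) : len;

        if (hostlen >= MAX_HOST_LEN)
            hostlen = MAX_HOST_LEN - 1;
        memcpy(out[n].host, spec, hostlen);
        out[n].host[hostlen] = '\0';
        out[n].port = colon ? atoi(colon + 1) : default_port;
        n++;
        spec = comma ? comma + 1 : spec + len;
    }
    return n;
}
```

Applied to the example URI from the patch's documentation, `node1,node2:5433,node3:4432,node4` yields four entries, with `node1` and `node4` falling back to the default port.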
Re: [HACKERS] Passing initially_valid values instead of !skip_validation to StoreRelCheck() in AddRelationNewConstraints()
On 2015/12/10 13:38, Amit Langote wrote:
> Take a look at a similar code block in transformFKConstraints
> (parse_utilcmd.c: line 1935), or transformIndexConstraint
> (parse_utilcmd.c: line 1761).

Oops, forget the second one.

Thanks,
Amit
Re: [HACKERS] Passing initially_valid values instead of !skip_validation to StoreRelCheck() in AddRelationNewConstraints()
On 2015/12/10 13:12, amul sul wrote:
>> On Thursday, 10 December 2015 8:22 AM, Amit Langote wrote:
>> You forgot to put braces around the if block.
>
> Is this really required?

If nothing else, for consistency with surrounding code. Take a look at a similar code block in transformFKConstraints (parse_utilcmd.c: line 1935), or transformIndexConstraint (parse_utilcmd.c: line 1761).

Thanks,
Amit
Re: [HACKERS] pg_hba_lookup function to get all matching pg_hba.conf entries
On Thu, Dec 10, 2015 at 2:29 PM, Amit Kapila wrote: > On Thu, Dec 10, 2015 at 6:46 AM, Haribabu Kommi > wrote: >> >> On Thu, Dec 10, 2015 at 4:22 AM, Alvaro Herrera >> wrote: >> > Haribabu Kommi wrote: >> > >> >> Reverting the context release patch is already provided in the first >> >> mail of this >> >> thread [1]. Forgot to mention about the same in further mails. >> >> >> >> Here I attached the same patch. This patch needs to be applied first >> >> before >> >> pg_hba_lookup patch. I tested it in windows version also. >> > >> > So if you change the file and reload repeatedly, we leak all the memory >> > allocated for HBA lines in TopMemoryContext? This doesn't sound great. >> > Perhaps we need a dedicated context which can be reset at will so that >> > it can be refilled with the right info when we reload the file. >> >> No. There is no leaks associated with pg_hba.conf parsing. we already have >> a memory context called "hba parser context" allocated from Postmaster >> context. The "revert_hba_context_release_in_backend" patch changes it to >> TopMemoryContext. The memory required for parsing and storing parsed >> hba lines is obtained from this context. >> > > tokenize_file() is called before creation of hba parser context, so below > change would be problem. > > *** 386,392 tokenize_file(const char *filename, FILE *file, > > MemoryContext linecxt; > > MemoryContext oldcxt; > > > > ! linecxt = AllocSetContextCreate(CurrentMemoryContext, > > "tokenize file cxt", > > ALLOCSET_DEFAULT_MINSIZE, > > ALLOCSET_DEFAULT_INITSIZE, > > --- 386,392 > > MemoryContext linecxt; > > MemoryContext oldcxt; > > > > ! linecxt = AllocSetContextCreate(TopMemoryContext, > > "tokenize file cxt", > > ALLOCSET_DEFAULT_MINSIZE, > > ALLOCSET_DEFAULT_INITSIZE, > > > How about creating "hba parser context" and "ident parser context" > at the beginning of their respective functions and don't change > anything in tokenize_file()? 
The tokenize file cxt is deleted after a successful load of the pg_hba.conf or pg_ident.conf files. We don't need this memory once the pg_hba.conf or pg_ident file is loaded; for this reason, it is created as a separate context and deleted later.

Regards,
Hari Babu
Fujitsu Australia
Re: [HACKERS] Passing initially_valid values instead of !skip_validation to StoreRelCheck() in AddRelationNewConstraints()
>On Thursday, 10 December 2015 8:22 AM, Amit Langote wrote:
>You forgot to put braces around the if block.

Is this really required?

Regards,
Amul Sul
Re: [HACKERS] Error with index on unlogged table
On Wed, Dec 9, 2015 at 9:07 PM, Andres Freund wrote:
> On 2015-12-09 21:03:47 +0900, Michael Paquier wrote:
>> Oh, OK. I didn't read through your lines correctly. So you basically
>> mean that we would look at the init files that are on disk, and check
>> if they are empty. If they are, we simply use XLogReadBufferExtended
>> to fetch the INIT_FORKNUM content and fill in another buffer for the
>> MAIN_FORKNUM. More or less right?
>
> We would not just do so if they're empty, we would just generally copy
> the file via shared buffers, instead of copy_file(). But we'd get the
> file size from the filesystem (which is fine, we make sure it is
> correct during replay).

So, this suggestion is basically implementing the reverse operation of GetRelationPath() to be able to rebuild a RelFileNode from scratch and then look at the shared buffers needed. Isn't it a bit brittle for back-branches? RelFileNode stuff is available easily through records when replaying individual FPIs, but not really in this code path.
--
Michael
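[The "reverse GetRelationPath()" operation described above amounts to parsing an on-disk relation path back into its spcNode/dbNode/relNode triple. A toy sketch covering only the default-tablespace layout — hypothetical code, not from PostgreSQL — shows that the mapping itself is mechanical, while hinting at the brittleness: each tablespace layout (global/, pg_tblspc/ with version directories) needs its own case.]

```c
#include <assert.h>
#include <stdio.h>

/* Toy reverse mapping of a relation path (illustrative only). */
typedef struct {
    unsigned spcNode;   /* tablespace OID */
    unsigned dbNode;    /* database OID */
    unsigned relNode;   /* relfilenode */
} ToyRelFileNode;

#define DEFAULTTABLESPACE_OID 1663  /* pg_default's OID */

/* Handles only the "base/<db>/<rel>" layout; returns 1 on success. */
static int parse_relation_path(const char *path, ToyRelFileNode *node)
{
    unsigned db, rel;
    if (sscanf(path, "base/%u/%u", &db, &rel) == 2) {
        node->spcNode = DEFAULTTABLESPACE_OID;
        node->dbNode = db;
        node->relNode = rel;
        return 1;
    }
    return 0;  /* global/ and pg_tblspc/ layouts would need more cases */
}
```

Even this trivial case ignores fork suffixes (`_init`, `_fsm`, `_vm`) and segment numbers, which is part of why doing it in the back branches looks brittle.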
Re: [HACKERS] pg_hba_lookup function to get all matching pg_hba.conf entries
On Wed, Dec 9, 2015 at 2:35 PM, Haribabu Kommi wrote: > > > Reverting the context release patch is already provided in the first > mail of this > thread [1]. Forgot to mention about the same in further mails. > Thanks, that is helpful. However, I think it is better if you always include links to the related patches at the end of the mail. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] pg_hba_lookup function to get all matching pg_hba.conf entries
On Thu, Dec 10, 2015 at 6:46 AM, Haribabu Kommi wrote: > > On Thu, Dec 10, 2015 at 4:22 AM, Alvaro Herrera > wrote: > > Haribabu Kommi wrote: > > > >> Reverting the context release patch is already provided in the first > >> mail of this > >> thread [1]. Forgot to mention about the same in further mails. > >> > >> Here I attached the same patch. This patch needs to be applied first before > >> pg_hba_lookup patch. I tested it in windows version also. > > > > So if you change the file and reload repeatedly, we leak all the memory > > allocated for HBA lines in TopMemoryContext? This doesn't sound great. > > Perhaps we need a dedicated context which can be reset at will so that > > it can be refilled with the right info when we reload the file. > > No. There are no leaks associated with pg_hba.conf parsing. We already have > a memory context called "hba parser context" allocated from the Postmaster > context. The "revert_hba_context_release_in_backend" patch changes it to > TopMemoryContext. The memory required for parsing and storing parsed > hba lines is obtained from this context. >

tokenize_file() is called before creation of the hba parser context, so the below change would be a problem.

*** 386,392 ****
  tokenize_file(const char *filename, FILE *file,
  	MemoryContext linecxt;
  	MemoryContext oldcxt;
! 	linecxt = AllocSetContextCreate(CurrentMemoryContext,
  									"tokenize file cxt",
  									ALLOCSET_DEFAULT_MINSIZE,
  									ALLOCSET_DEFAULT_INITSIZE,
--- 386,392 ----
  	MemoryContext linecxt;
  	MemoryContext oldcxt;
! 	linecxt = AllocSetContextCreate(TopMemoryContext,
  									"tokenize file cxt",
  									ALLOCSET_DEFAULT_MINSIZE,
  									ALLOCSET_DEFAULT_INITSIZE,

How about creating the "hba parser context" and "ident parser context" at the beginning of their respective functions and not changing anything in tokenize_file()? With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] psql: add \pset true/false
On Wed, Dec 9, 2015 at 8:50 PM, Michael Paquier wrote: > On Tue, Dec 8, 2015 at 8:51 PM, Michael Paquier > wrote: >> On Tue, Dec 8, 2015 at 7:18 PM, Kyotaro HORIGUCHI >> wrote: >>> At Sat, 5 Dec 2015 21:05:29 +0900, Michael Paquier >>> wrote in >>> >>> > Regarding the patch, I >>> > would tend to think that we should just reject it and try to cruft >>> > something that could be more pluggable if there is really a need. >>> > Thoughts? >>> >>> Honestly saying, I feel similarly with you:p I personally will do >>> something like the following for the original objective. >> >> Are there other opinions? The -1 team is in majority at the end of this >> thread.. > > So, marking the patch as rejected? Any objections? Done so. Alea jacta est, as one guy 2000 years ago would have said. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
On Mon, Dec 07, 2015 at 02:34:39PM +0900, Michael Paquier wrote: > On Mon, Dec 7, 2015 at 12:06 PM, Noah Misch wrote: > > On Wed, Dec 02, 2015 at 06:59:09PM -0300, Alvaro Herrera wrote:

> >> +sub DESTROY
> >> +{
> >> +	my $self = shift;
> >> +	return if not defined $self->{_pid};
> >> +	print "# signalling QUIT to $self->{_pid}\n";
> >> +	kill 'QUIT', $self->{_pid};
> >
> > On Windows, that kill() is the wrong thing. I suspect "pg_ctl kill" will be
> > the right thing, but that warrants specific testing.
>
> I don't directly see any limitation with the use of kill on Windows..
> http://perldoc.perl.org/functions/kill.html
> But indeed using pg_ctl kill directly seems like a better fit for the
> PG infrastructure.

From http://perldoc.perl.org/perlwin32.html, "Using signals under this port should currently be considered unsupported." Windows applications cannot handle SIGQUIT: https://msdn.microsoft.com/en-us/library/xdkz3x12.aspx. The PostgreSQL backend does not generate or expect Windows signals; see its signal.c emulation facility. > > Postmaster log file names became less informative. Before the commit: > > Should nodes get a name, so we instead see master_57834.log? > > I am not sure that this is necessary. In general, you'd need to cross-reference the main log file to determine which postmaster log corresponds to which action within the test. I did plenty of "grep $PATTERN src/bin/pg_rewind/tmp_check/log/master.log" while debugging that test. I'd like to be able to use /*master*.log, not rely on timestamps or on scraping regress_log_002_databases to determine which logs are master logs. Feel free to skip this point if I'm the only one minding, though. > There is definitely a limitation > regarding the fact that log files of nodes get overwritten after each > test, hence I would tend with just appending the test name in front of > the node_* files similarly to the other files. They got appended, not overwritten.
I like how you changed that to not happen, but it doesn't address what I was reporting. nm -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Passing initially_valid values instead of !skip_validation to StoreRelCheck() in AddRelationNewConstraints()
On 2015/12/09 18:07, amul sul wrote: >> On Wednesday, 9 December 2015 12:55 PM, Amit Langote >> wrote: > >> Thoughts? > > Wondering, have you noticed failed regression tests? Here is the updated patch. > I have tried with the new transformCheckConstraints() function & there will be a > small fix in gram.y. > > > Have a look at the attached patch & please share your thoughts and/or suggestions.

The transformCheckConstraints approach may be better after all. By the way,

> @@ -1915,6 +1922,32 @@ transformIndexConstraint(Constraint *constraint,
> CreateStmtContext *cxt)
...
> +	if (skipValidation)
> +		foreach(ckclist, cxt->ckconstraints)
> +		{
> +			Constraint *constraint = (Constraint *) lfirst(ckclist);
> +
> +			constraint->skip_validation = true;
> +			constraint->initially_valid = true;
> +		}

You forgot to put braces around the if block.

Thanks,
Amit

diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 7d7d062..04c4f8f 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -2349,7 +2349,7 @@ AddRelationNewConstraints(Relation rel,
 			 * OK, store it.
 			 */
 			constrOid =
-				StoreRelCheck(rel, ccname, expr, !cdef->skip_validation, is_local,
+				StoreRelCheck(rel, ccname, expr, cdef->initially_valid, is_local,
 							  is_local ? 0 : 1, cdef->is_no_inherit, is_internal);
 			numchecks++;
diff --git a/src/backend/parser/parse_utilcmd.c b/src/backend/parser/parse_utilcmd.c
index 344a40c..f0c3e76 100644
--- a/src/backend/parser/parse_utilcmd.c
+++ b/src/backend/parser/parse_utilcmd.c
@@ -562,6 +562,11 @@ transformColumnDefinition(CreateStmtContext *cxt, ColumnDef *column)
 			break;
 		case CONSTR_CHECK:
+			/*
+			 * When being added as part of a column definition, the
+			 * following always holds.
+			 */
+			constraint->initially_valid = true;
 			cxt->ckconstraints = lappend(cxt->ckconstraints, constraint);
 			break;
@@ -687,6 +692,10 @@ transformTableConstraint(CreateStmtContext *cxt, Constraint *constraint)
 			break;
 		case CONSTR_CHECK:
+			/* Is this better done in a transformCheckConstraint? */
+			if (!cxt->isalter)
+				constraint->initially_valid = true;
+
 			cxt->ckconstraints = lappend(cxt->ckconstraints, constraint);
 			break;
@@ -935,6 +944,7 @@ transformTableLikeClause(CreateStmtContext *cxt, TableLikeClause *table_like_cla
 	n->conname = pstrdup(ccname);
 	n->raw_expr = NULL;
 	n->cooked_expr = nodeToString(ccbin_node);
+	n->initially_valid = true;
 	cxt->ckconstraints = lappend(cxt->ckconstraints, n);
 	/* Copy comment on constraint */

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Rework the way multixact truncations work
On Wed, Dec 09, 2015 at 11:08:32AM -0500, Robert Haas wrote: > On Wed, Dec 9, 2015 at 9:43 AM, Noah Misch wrote: > > I hope those who have not already read commit 4f627f8 will not waste time > > reading it. They should instead ignore multixact changes from commit > > 4f627f8 > > through its revert. The 2015-09-26 commits have not appeared in a supported > > release, and no other work has built upon them. They have no tenure. (I am > > glad you talked the author out of back-patching; otherwise, 9.4.5 and 9.3.10 > > would have introduced a data loss bug.) Nobody reported a single defect > > before my review overturned half the patch. A revert will indeed impose on > > those who invested time to understand commit 4f627f8, but the silence about > > its defects suggests the people understanding it number either zero or one. > > Even as its author and reviewers, you would do better to set aside what you > > thought you knew about this code. > > I just don't find this a realistic model of how people use the git > log. Maybe you use it this way; I don't. I don't *want* git blame to > make it seem as if 4f627f8 is not part of the history. For better or > worse, it is. I would like to understand how you use git, then. What's one of your models of using "git log" and/or "git blame" in which you foresee the revert making history less clear, not more clear? By the way, it occurs to me that I should also make pg_upgrade blacklist the range of catversions that might have data loss. No sense in putting ourselves in the position of asking whether data files of a 9.9.3 cluster spent time in a 9.5beta2 cluster. > Ripping it out and replacing it monolithically will not > change that; it will only make the detailed history harder to > reconstruct, and I *will* want to reconstruct it. What's something that might happen six months from now and lead you to inspect master or 9.5 multixact.c between 4f627f8 and its revert? 
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_hba_lookup function to get all matching pg_hba.conf entries
On Thu, Dec 10, 2015 at 4:22 AM, Alvaro Herrera wrote: > Haribabu Kommi wrote: > >> Reverting the context release patch is already provided in the first >> mail of this >> thread [1]. Forgot to mention about the same in further mails. >> >> Here I attached the same patch. This patch needs to be applied first before >> pg_hba_lookup patch. I tested it in windows version also. > > So if you change the file and reload repeatedly, we leak all the memory > allocated for HBA lines in TopMemoryContext? This doesn't sound great. > Perhaps we need a dedicated context which can be reset at will so that > it can be refilled with the right info when we reload the file. No. There are no leaks associated with pg_hba.conf parsing. We already have a memory context called "hba parser context" allocated from the Postmaster context. The "revert_hba_context_release_in_backend" patch changes it to TopMemoryContext. The memory required for parsing and storing parsed hba lines is obtained from this context. Regards, Hari Babu Fujitsu Australia -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Should TIDs be typbyval = FLOAT8PASSBYVAL to speed up CREATE INDEX CONCURRENTLY?
On Tue, Nov 17, 2015 at 7:33 PM, Corey Huinker wrote: > I'm willing, but I'm too new to the codebase to be an effective reviewer > (without guidance). The one thing I can offer in the mean time is this: my > company/client nearly always has a few spare AWS machines on the largish > side where I can compile uncommitted patches and benchmark stuff for y'all. I think that this particular patch is close to being a slam-dunk, so I don't think it's particularly needed here. But thanks. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Given a view relation OID, how to construct a Query?
On Wed, Dec 9, 2015 at 5:07 PM Tom Lane wrote: > > FWIW, it's exposed in 9.4 and up. But in older branches you could > probably just copy it, it's not that big. > That's good to know, thanks. I did copy it and it's almost 3x faster than going through SPI. Thanks again for the pointer. eric
Re: [HACKERS] Given a view relation OID, how to construct a Query?
Eric Ridge writes: > On Wed, Dec 9, 2015 at 4:04 PM Tom Lane wrote: >> Open the relation and use get_view_query(), perhaps. > I figured there was something simple, but I couldn't find it. Thanks! > Sadly, it's static. FWIW, it's exposed in 9.4 and up. But in older branches you could probably just copy it, it's not that big. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] Failed to generate plan on lateral subqueries
Merlin Moncure writes: > Aside from the functional issues, could your changes result in > performance regressions? (if so, I'd argue not to backpatch unless > there were cases that returned bad data as opposed to spurious > errors). I can't say that I think planner failures on valid queries is something that's optional to fix. However, I believe that this will typically not change the selected plan in cases where the planner didn't fail before. I did notice one change in an existing regression test, where the planner pushed a qual clause further down in the plan than it did before; but that seems like a better plan anyway. (The reason that happened is that the changes to enlarge the minimum parameterization of some base rels result in choosing to push qualifiers further down, since a qual clause will be evaluated at the lowest plan level that the selected parameterization allows.) It's a little bit harder to gauge the impact on planner speed. The transitive closure calculation could be expensive in a query with many lateral references, but that doesn't seem likely to be common; and anyway we'll buy back some of that cost due to simpler tests later. I'm optimistic that we'll come out ahead in HEAD/9.5 after the removal of LateralJoinInfo setup. It might be roughly a wash in the back branches. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Given a view relation OID, how to construct a Query?
On Wed, Dec 9, 2015 at 4:04 PM Tom Lane wrote: > Eric Ridge writes: > > I'm doing some extension development (in C) and have a situation where I > > need to examine the target list of a view, but all I have is the view's > oid. > > Open the relation and use get_view_query(), perhaps. I figured there was something simple, but I couldn't find it. Thanks! Sadly, it's static. eric
Re: [HACKERS] Speedup twophase transactions
On Thu, Dec 10, 2015 at 3:44 AM, Stas Kelvich wrote: > Most of these ideas were already mentioned in a 2009 thread by Michael Paquier > http://www.postgresql.org/message-id/c64c5f8b0908062031k3ff48428j824a9a46f2818...@mail.gmail.com > where he suggested to store 2pc data in shared memory. > At that time the patch was declined because no significant speedup was observed. > Now I see performance improvements from my patch of about 60%. Probably the old > benchmark's overall tps was lower and it was harder to hit filesystem > fopen/fclose limits. Glad to see this patch is given a second life 6 years later.

> Now results of benchmark are following (dual 6-core xeon server):
>
> Current master without 2PC: ~42 ktps
> Current master with 2PC: ~22 ktps
> Current master with 2PC: ~36 ktps

That's nice.

+	XLogRecPtr	prepare_xlogptr;	/* XLOG offset of prepare record start
+									 * or NULL if twophase data moved to file
+									 * after checkpoint.
+									 */

This had better be InvalidXLogRecPtr if unused.

+	if (gxact->prepare_lsn)
+	{
+		XlogReadTwoPhaseData(gxact->prepare_xlogptr, &buf, NULL);
+	}

Perhaps you mean prepare_xlogptr here? -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] Failed to generate plan on lateral subqueries
On Wed, Dec 9, 2015 at 4:18 PM, Tom Lane wrote: > David Fetter writes: >> On Tue, Dec 08, 2015 at 12:13:41PM -0500, Tom Lane wrote: >>> Hm. At least in the first of these cases, the problem is that the >>> code I committed yesterday doesn't account for indirect lateral >>> dependencies. That is, if S1 depends on S2 which depends on the >>> inner side of an outer join, it now knows not to join S2 directly to >>> the outer side of the outer join, but it doesn't realize that the >>> same must apply to S1. >>> >>> Maybe we should redefine lateral_relids as the transitive closure of >>> a rel's lateral dependencies? Not sure. > >> That seems like it would fix the problem once and for good. Is there >> any reason to believe that the lateral dependencies could go in a >> loop, i.e. is there a reason to put in checks for same in the >> transitive closure construction? > > It should not be possible to have a loop, since rels can only have lateral > references to rels that appeared syntactically before them. But an Assert > about it doesn't seem a bad thing. > > After working on this for awhile, I've found that indeed converting > lateral_relids to a transitive closure makes things better. But there was > another bug (or two other bugs, depending on how you count) exposed by > Andreas' examples. I was right after all to suspect that the "hazardous > PHV" condition needs to be accounted for by join_is_legal; as things > stood, join_is_legal might allow a join for which every possible path > would be rejected by check_hazardous_phv. More, if we were insisting that > a join PHV be calculated before we could pass it to some other relation, > we didn't actually do anything to ensure that that join would get built. > Given an appropriate set of conditions, the clauseless-join heuristics > could decide that that join wasn't interesting and never build it, again > possibly leading to planner failure. 
So join_is_legal's cousins > have_join_order_restriction and has_join_restriction also need to know > about this issue. > > (This is a bit of a mess; I'm beginning to wonder if we shouldn't bite > the bullet and teach the executor how to handle non-Var NestLoopParams, > so that the planner doesn't have to work around that. But I'd rather > not back-patch such a change.) > > I also noticed that lateral_referencers should be a transitive closure > as well. I think that oversight doesn't lead to any observable bug, > but it would lead to constructing index paths that cannot be used, so > fixing it should save some planner cycles. > > So that leads to the attached patch, which I think is good as-is for > the back branches. In this state of the code, the LateralJoinInfo > list is darn near useless: the only thing we use it for is detecting > whether two relations have a direct (not transitive closure) lateral > reference. I'm strongly tempted to replace that logic by keeping a > copy of the pre-closure lateral_relids sets, and then we could drop > LateralJoinInfo altogether, which would make create_lateral_join_info > quite a bit shorter and faster. That could go into HEAD/9.5, but > I'm afraid to back-patch it further than 9.5 for fear that third-party > code somewhere might be relying on the LateralJoinInfo data structure. Aside from the functional issues, could your changes result in performance regressions? (if so, I'd argue not to backpatch unless there were cases that returned bad data as opposed to spurious errors). merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Given a view relation OID, how to construct a Query?
Eric Ridge writes: > I'm doing some extension development (in C) and have a situation where I > need to examine the target list of a view, but all I have is the view's oid. Open the relation and use get_view_query(), perhaps. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Making tab-complete.c easier to maintain
On Thu, Dec 10, 2015 at 12:49 AM, Greg Stark wrote: > On Wed, Dec 9, 2015 at 2:27 PM, David Fetter wrote: >> Agreed that the "whole new language" aspect seems like way too big a >> hammer, given what it actually does. > > Which would be easier to update when things change? Regarding that, both patches are equal compared to the current method using strcmp. > Which would be possible to automatically generate from gram.y? Neither of those patches takes this approach. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Given a view relation OID, how to construct a Query?
I'm doing some extension development (in C) and have a situation where I need to examine the target list of a view, but all I have is the view's oid. An approach that works is (pseudocode):

SPI_connect();
"SELECT ev_action FROM pg_catalog.pg_rewrite WHERE rulename = '_RETURN' and ev_class=?oid";
Query *query = linitial(stringToNode(ev_action));
...
SPI_finish();

I backed into this by tracing through pg_getviewdef(). Is there a more direct way to do this without going through SPI? I also looked at using postgres.c#pg_analyze_and_rewrite() against a query like "SELECT * FROM viewname" but the target list of the actual query wasn't what I was expecting (individual entry tags don't match those of the SPI approach above). Thanks for your time! eric
Re: [HACKERS] Re: Reusing abbreviated keys during second pass of ordered [set] aggregates
On Wed, Dec 9, 2015 at 2:15 PM, Peter Geoghegan wrote: > I think that you're missing that patch 0001 formally forbids > abbreviated keys that are pass-by-value Sorry. I do of course mean it forbids abbreviated keys that are *not* pass-by-value (that are pass-by-reference). -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] Failed to generate plan on lateral subqueries
David Fetter writes: > On Tue, Dec 08, 2015 at 12:13:41PM -0500, Tom Lane wrote: >> Hm. At least in the first of these cases, the problem is that the >> code I committed yesterday doesn't account for indirect lateral >> dependencies. That is, if S1 depends on S2 which depends on the >> inner side of an outer join, it now knows not to join S2 directly to >> the outer side of the outer join, but it doesn't realize that the >> same must apply to S1. >> >> Maybe we should redefine lateral_relids as the transitive closure of >> a rel's lateral dependencies? Not sure. > That seems like it would fix the problem once and for good. Is there > any reason to believe that the lateral dependencies could go in a > loop, i.e. is there a reason to put in checks for same in the > transitive closure construction? It should not be possible to have a loop, since rels can only have lateral references to rels that appeared syntactically before them. But an Assert about it doesn't seem a bad thing. After working on this for awhile, I've found that indeed converting lateral_relids to a transitive closure makes things better. But there was another bug (or two other bugs, depending on how you count) exposed by Andreas' examples. I was right after all to suspect that the "hazardous PHV" condition needs to be accounted for by join_is_legal; as things stood, join_is_legal might allow a join for which every possible path would be rejected by check_hazardous_phv. More, if we were insisting that a join PHV be calculated before we could pass it to some other relation, we didn't actually do anything to ensure that that join would get built. Given an appropriate set of conditions, the clauseless-join heuristics could decide that that join wasn't interesting and never build it, again possibly leading to planner failure. So join_is_legal's cousins have_join_order_restriction and has_join_restriction also need to know about this issue. 
(This is a bit of a mess; I'm beginning to wonder if we shouldn't bite the bullet and teach the executor how to handle non-Var NestLoopParams, so that the planner doesn't have to work around that. But I'd rather not back-patch such a change.) I also noticed that lateral_referencers should be a transitive closure as well. I think that oversight doesn't lead to any observable bug, but it would lead to constructing index paths that cannot be used, so fixing it should save some planner cycles. So that leads to the attached patch, which I think is good as-is for the back branches. In this state of the code, the LateralJoinInfo list is darn near useless: the only thing we use it for is detecting whether two relations have a direct (not transitive closure) lateral reference. I'm strongly tempted to replace that logic by keeping a copy of the pre-closure lateral_relids sets, and then we could drop LateralJoinInfo altogether, which would make create_lateral_join_info quite a bit shorter and faster. That could go into HEAD/9.5, but I'm afraid to back-patch it further than 9.5 for fear that third-party code somewhere might be relying on the LateralJoinInfo data structure. Comments? regards, tom lane

diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index 0f04033..53d8fdd 100644
*** a/src/backend/optimizer/path/joinpath.c
--- b/src/backend/optimizer/path/joinpath.c
*** allow_star_schema_join(PlannerInfo *root
*** 255,308 ****
  }

  /*
- * There's a pitfall for creating parameterized nestloops: suppose the inner
- * rel (call it A) has a parameter that is a PlaceHolderVar, and that PHV's
- * minimum eval_at set includes the outer rel (B) and some third rel (C).
- * We might think we could create a B/A nestloop join that's parameterized by
- * C. But we would end up with a plan in which the PHV's expression has to be
- * evaluated as a nestloop parameter at the B/A join; and the executor is only
- * set up to handle simple Vars as NestLoopParams. Rather than add complexity
- * and overhead to the executor for such corner cases, it seems better to
- * forbid the join. (Note that existence of such a PHV probably means there
- * is a join order constraint that will cause us to consider joining B and C
- * directly; so we can still make use of A's parameterized path with B+C.)
- * So we check whether any PHVs used in the query could pose such a hazard.
- * We don't have any simple way of checking whether a risky PHV would actually
- * be used in the inner plan, and the case is so unusual that it doesn't seem
- * worth working very hard on it.
- *
- * This case can occur whether or not the join's remaining parameterization
- * overlaps param_source_rels, so we have to check for it separately from
- * allow_star_schema_join, even though it looks much like a star-schema case.
- */
- static inline bool
- check_hazardous_phv(PlannerInfo *root,
- 					Path *outer_path,
- 					Path *inner_path)
- {
- 	Relids i
Re: [HACKERS] Re: Reusing abbreviated keys during second pass of ordered [set] aggregates
On Wed, Dec 9, 2015 at 11:31 AM, Robert Haas wrote: > I find the references to a "void" representation in this patch to be > completely opaque. I see that there are some such references in > tuplesort.c already, and most likely they were put there by commits > that I did, so I guess I have nobody but myself to blame, but I don't > know what this means, and I don't think we should let this terminology > proliferate. > > My understanding is that the "void" representation is intended to be > whatever Datum we originally got, which might be a pointer. Why not > just say that instead of referring to it this way? That isn't what is intended. "void" is the state that macros like index_getattr() leave NULL leading attributes (that go in the SortTuple.datum1 field) in. However, the function tuplesort_putdatum() requires callers to initialize their Datum to 0 now, which is new. A "void" representation is a would-be NULL pointer in the case of pass-by-value types, and a NULL pointer for pass-by-reference types. > My understanding is also that it's OK if the abbreviated key stays the > same even though the value has changed, but that the reverse could > cause queries to return wrong answers. The first part of that > justifies why this is safe when no abbreviation is available: we'll > return an abbreviated value of 0 for everything, but that's fine. > However, using the original Datum (which might be a pointer) seems > unsafe, because two binary-identical values could be stored at > different addresses and thus have different pointer representations. > > I'm probably missing something here, so clue me in... I think that you're missing that patch 0001 formally forbids abbreviated keys that are pass-by-value
This is already something that is all but forbidden, although the datum case does tacitly acknowledge the possibility by not allowing abbreviation to work with the pass-by-value-and-yet-abbreviated case. I think that this revision is also useful for putting abbreviated keys in indexes, something that may happen yet. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Speedup twophase transactions
Thanks, Kevin.

> I assume that last one should have been *Patched master with 2PC"?

Yes, this list should look like this:

Current master without 2PC: ~42 ktps
Current master with 2PC: ~22 ktps
Patched master with 2PC: ~36 ktps

I have also created a CommitFest entry for this patch. -- Stas Kelvich Postgres Professional: http://www.postgrespro.com The Russian Postgres Company

> On 10 Dec 2015, at 00:37, Kevin Grittner wrote:
>
> On Wed, Dec 9, 2015 at 12:44 PM, Stas Kelvich
> wrote:
>
>> Now 2PC in postgres does the following:
>> * on prepare, 2pc data (subxacts, commitrels, abortrels, invalmsgs) is saved to
>> xlog and to a file, but the file is not fsynced
>> * on commit, the backend reads the data from the file
>> * if a checkpoint occurs before commit, then the files are fsynced during the
>> checkpoint
>> * in case of crash, replay will move data from xlog to files
>>
>> In this patch I've changed these procedures to the following:
>> * on prepare, the backend writes data only to xlog and stores a pointer to the start
>> of the xlog record
>> * if commit occurs before a checkpoint, then the backend reads data from xlog by
>> this pointer
>> * on checkpoint, 2pc data is copied to files and fsynced
>> * if commit happens after a checkpoint, then the backend reads the files
>> * in case of crash, replay will move data from xlog to files (as it was
>> before the patch)
>
> That sounds like a very good plan to me.
>
>> Now results of benchmark are following (dual 6-core xeon server):
>>
>> Current master without 2PC: ~42 ktps
>> Current master with 2PC: ~22 ktps
>> Current master with 2PC: ~36 ktps
>
> I assume that last one should have been *Patched master with 2PC"?
>
> Please add this to the January CommitFest.
>
> --
> Kevin Grittner
> EDB: http://www.enterprisedb.com

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] mdnblocks() sabotages error checking in _mdfd_getseg()
Robert Haas writes: > The comment in mdnblocks.c says this: > * Because we pass O_CREAT, we will create the > next segment (with > * zero length) immediately, if the last > segment is of length > * RELSEG_SIZE. While perhaps not strictly > necessary, this keeps > * the logic simple. > I don't really see how this "keeps the logic simple". My (vague) recollection is that this is defending some assumptions elsewhere in md.c. You sure you haven't broken anything with this? Relation extension across a segment boundary, perhaps? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Redundant sentence within INSERT documentation page (exclusion constraints)
On Wed, Dec 9, 2015 at 11:13 AM, Robert Haas wrote: > I don't know nearly as much about ON CONFLICT DO UPDATE as Andres, but > even I can spot a redundancy when somebody shoves the evidence right > under my nose. So, committed and back-patched to 9.5. Thanks. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] mdnblocks() sabotages error checking in _mdfd_getseg()
This is per a report by an EnterpriseDB customer and a bunch of off-list analysis by Kevin Grittner and Rahila Syed. Suppose you have a large relation with OID 123456. There are segment files 123456, 123456.1, and 123456.2. Due to some kind of operating system malfeasance, 123456.1 disappears; you are officially in trouble. Now, a funny thing happens. The next time you call mdnblocks() on this relation, which will probably happen pretty quickly since every sequential scan does one, it will see that 123456 is a complete segment and it will create an empty 123456.1. It and any future mdnblocks() calls will report that the length of the relation is equal to the length of one full segment, because they don't check for the next segment unless the current segment is completely full. Now, if subsequent to this an index scan happens to sweep through and try to fetch a block in 123456.2, it will work! This happens because _mdfd_getseg() doesn't care about the length of the segments; it only cares whether or not they exist. If 123456.1 were actually missing, then we'd never test whether 123456.2 exists and we'd get an error. But because mdnblocks() created 123456.1, _mdfd_getseg() is now quite happy to fetch blocks in 123456.2; it considers the empty 123456.1 file to be a sufficient reason to look for 123456.2, and seeing that file, and finding the block it wants therein, it happily returns that block, blithely ignoring the fact that it passed over a completely empty .1 segment before returning a block from .2. This is maybe not the smartest thing ever. The comment in mdnblocks.c says this: * Because we pass O_CREAT, we will create the next segment (with * zero length) immediately, if the last segment is of length * RELSEG_SIZE. While perhaps not strictly necessary, this keeps * the logic simple. I don't really see how this "keeps the logic simple". 
What it does is allow sequential scans and index scans to have two completely different notions of how many accessible blocks there are in the relation. Granted, this kind of thing really shouldn't happen, but sometimes bad things do happen. Therefore, I propose the attached patch. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company RM36310.patch Description: binary/octet-stream -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
I've been giving RecoveryTest.pm a look. I wonder if we really need that as a separate package. My first thought was that we could have another class that inherits from PostgresNode (say RecoveryNode). But later it occurred to me that we could have the new functions just be part of PostgresNode itself directly; so we would have some new PostgresNode methods: $node->enable_streaming $node->enable_restoring $node->enable_archiving $node->wait (your RecoveryTest::wait_for_node; better name for this?) and some additional constructors: make_master make_stream_standby make_archive_standby -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Speedup twophase transactions
On Wed, Dec 9, 2015 at 12:44 PM, Stas Kelvich wrote: > Now 2PC in postgres does following: > * on prepare 2pc data (subxacts, commitrels, abortrels, invalmsgs) saved to > xlog and to file, but file not is not fsynced > * on commit backend reads data from file > * if checkpoint occurs before commit, then files are fsynced during checkpoint > * if case of crash replay will move data from xlog to files > > In this patch I’ve changed this procedures to following: > * on prepare backend writes data only to xlog and store pointer to the start > of the xlog record > * if commit occurs before checkpoint then backend reads data from xlog by > this pointer > * on checkpoint 2pc data copied to files and fsynced > * if commit happens after checkpoint then backend reads files > * in case of crash replay will move data from xlog to files (as it was before > patch) That sounds like a very good plan to me. > Now results of benchmark are following (dual 6-core xeon server): > > Current master without 2PC: ~42 ktps > Current master with 2PC: ~22 ktps > Current master with 2PC: ~36 ktps I assume that last one should have been *Patched master with 2PC"? Please add this to the January CommitFest. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [DOCS] max_worker_processes on the standby
Robert Haas wrote: > On Mon, Dec 7, 2015 at 8:33 AM, Fujii Masao wrote: > > So firstly you will push those "latest" changes soon? > > It seems like these changes haven't been pushed yet, and unfortunately > that's probably a beta blocker. I'm on this. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Some questions about the array.
On Wed, Dec 2, 2015 at 3:01 PM, Merlin Moncure wrote: > On Tue, Dec 1, 2015 at 8:46 AM, YUriy Zhuravlev > wrote: >> On Tuesday 01 December 2015 08:38:21 you wrote: >>> it (zero >>> based indexing support) doesn't meet the standard of necessity for >>> adding to the core API and as stated it's much to magical. >> >> We do not touch the arrays, we simply create a function to access them with a >> comfortable behavior. Creating a separate array types in the form extension >> is >> very difficult IMHO. > > Correct; what I'm saying is that we don't need core API support for > zero based array indexing. Yes. I think adding new functions that use an indexing convention inconsistent with the one we're using for everything else ought to be completely out of the question. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [DOCS] max_worker_processes on the standby
On Mon, Dec 7, 2015 at 8:33 AM, Fujii Masao wrote: > On Sat, Dec 5, 2015 at 12:56 AM, Alvaro Herrera > wrote: >> Fujii Masao wrote: >> >>> Sorry for not reviewing the patch before you push it... >>> >>> In HEAD, I ran very simple test case: >>> >>> 1. enable track_commit_timestamp >>> 2. start the server >>> 3. run some transactions >>> 4. execute pg_last_committed_xact() -- returns non-null values >>> 5. shutdown the server with immdiate mode >>> 6. restart the server -- crash recovery happens >>> 7. execute pg_last_committed_xact() >>> >>> The last call of pg_last_committed_xact() returns NULL values, which means >>> that the xid and timestamp information of the last committed transaction >>> disappeared by crash recovery. Isn't this a bug? >> >> Hm, not really, because the status of the "last" transaction is kept in >> shared memory as a cache and not expected to live across a restart. >> However, I tested the equivalent scenario: >> >> alvherre=# create table fg(); >> CREATE TABLE >> >> alvherre=# select ts.* from pg_class,pg_xact_commit_timestamp(xmin) ts where >> relname = 'fg'; >> ts >> --- >> 2015-12-04 12:41:48.017976-03 >> (1 fila) >> >> then crash the server, and after recovery the data is gone: >> >> alvherre=# select ts.*, xmin, c.relname from pg_class >> c,pg_xact_commit_timestamp(xmin) ts where relname = 'fg'; >> ts | xmin | relname >> +--+- >> | 630 | fg >> (1 fila) >> >> Not sure what is going on; my reading of the code certainly says that >> the data should be there. I'm looking into it. >> >> I also noticed that I didn't actually push the whole of the patch >> yesterday -- I neglected to "git add" the latest changes, the ones that >> fix the promotion scenario :-( so the commit messages is misleading >> because it describes something that's not there. > > So firstly you will push those "latest" changes soon? It seems like these changes haven't been pushed yet, and unfortunately that's probably a beta blocker. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.
On Mon, Dec 7, 2015 at 2:41 AM, Amit Langote wrote: > On 2015/12/03 19:05, Kyotaro HORIGUCHI wrote: >> At Thu, 3 Dec 2015 16:24:32 +0900, Amit Langote >> wrote >>> By the way, there are some non-st_progress_* fields that I was talking >>> about in my previous message. For example, st_activity_flag (which I have >>> suggested to rename to st_command instead). It needs to be set once at the >>> beginning of the command processing and need not be touched again. I think >>> it may be a better idea to do the same for table name or OID. It also >>> won't change over the duration of the command execution. So, we could set >>> it once at the beginning where that becomes known. Currently in the patch, >>> it's reported in st_progress_message[0] by every pgstat_report_progress() >>> invocation. So, the table name will be strcpy()'d to shared memory for >>> every scanned block that's reported. >> >> If we don't have dedicate reporting functions for each phase >> (that is, static data phase and progress phase), the one function >> pgstat_report_progress does that by having some instruction from >> the caller and it would be a bitfield. >> >> Reading the flags for making decision whether every field to copy >> or not and branching by that seems too much for int32 (or maybe >> 64?) fields. Alhtough it would be in effective when we have >> progress_message fields, I don't think it is a good idea without >> having progress_message. >> >>> pgstat_report_progress(uint32 *param1, uint16 param1_bitmap, >>>char param2[][..], uint16 param2_bitmap) >>> { >>> ... >>> for(i = 0; i < 16 ; i++) >>> { >>> if (param1_bitmap & (1 << i)) >>>beentry->st_progress_param[i] = param1[i]; >>> if (param2_bitmap & (1 << i)) >>>strcpy(beentry->..., param2[i]); >>> } >> >> Thoughts? > > Hm, I guess progress messages would not change that much over the course > of a long-running command. 
How about we pass only the array(s) that we > change and NULL for others: > > With the following prototype for pgstat_report_progress: > > void pgstat_report_progress(uint32 *uint32_param, int num_uint32_param, >bool *message_param[], int num_message_param); > > If we just wanted to change, say scanned_heap_pages, then we report that > using: > > uint32_param[1] = scanned_heap_pages; > pgstat_report_progress(uint32_param, 3, NULL, 0); > > Also, pgstat_report_progress() should check each of its parameters for > NULL before iterating over to copy. So in most reports over the course of > a command, the message_param would be NULL and hence not copied. It's going to be *really* important that this facility provides a lightweight way of updating progress, so I think this whole API is badly designed. VACUUM, for example, is going to want a way to update the individual counter for the number of pages it's scanned after every page. It should not have to pass all of the other information that is part of a complete report. It should just be able to say pgstat_report_progress_update_counter(1, pages_scanned) or something of this sort. Don't marshal all of the data and then make pgstat_report_progress figure out what's changed. Provide a series of narrow APIs where the caller tells you exactly what they want to update, and you update only that. It's fine to have a pgstat_report_progress() function to update everything at once, for the use at the beginning of a command, but the incremental updates within the command should do something lighter-weight. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Reusing abbreviated keys during second pass of ordered [set] aggregates
On Sat, Oct 10, 2015 at 9:03 PM, Peter Geoghegan wrote: > On Fri, Sep 25, 2015 at 2:39 PM, Jeff Janes wrote: >> This needs a rebase, there are several conflicts in >> src/backend/executor/nodeAgg.c > > I attached a revised version of the second patch in the series, fixing > this bitrot. > > I also noticed a bug in tuplesort's TSS_SORTEDONTAPE case with the > previous patch, where no final on-the-fly merge step is required (no > merge step is required whatsoever, because replacement selection > managed to produce only one run). The function mergeruns() previously > only "abandoned" abbreviated ahead of any merge step iff there was > one. When there was only one run (not requiring a merge) it happened > to continue to keep its state consistent with abbreviated keys still > being in use. It didn't matter before, because abbreviated keys were > only for tuplesort to compare, but that's different now. > > That bug is fixed in this revision by reordering things within > mergeruns(). The previous order of the two things that were switched > is not at all significant (I should know, I wrote that code). I find the references to a "void" representation in this patch to be completely opaque. I see that there are some such references in tuplesort.c already, and most likely they were put there by commits that I did, so I guess I have nobody but myself to blame, but I don't know what this means, and I don't think we should let this terminology proliferate. My understanding is that the "void" representation is intended to be whatever Datum we originally got, which might be a pointer. Why not just say that instead of referring to it this way? My understanding is also that it's OK if the abbreviated key stays the same even though the value has changed, but that the reverse could cause queries to return wrong answers. The first part of that justifies why this is safe when no abbreviation is available: we'll return an abbreviated value of 0 for everything, but that's fine. 
However, using the original Datum (which might be a pointer) seems unsafe, because two binary-identical values could be stored at different addresses and thus have different pointer representations. I'm probably missing something here, so clue me in... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Logical replication and multimaster
On 2015-12-03 09:54:23 +0300, konstantin knizhnik wrote: > But right now performance of Multimaster is not limited by logical > replication protocol - if I remove DTM and use asynchronous replication > (lightweight version of BDR:) > then I get 38k TPS instead of 12k. My guess is that that's to a large degree because BDR 'batches' WAL flushes / fsyncs over several connections. As the data is only applied in one connection, whereas the primary can use multiple backends, it is important not to constantly flush, as that's synchronous. What I did for bdr was to register the 'desired' flush position whenever replaying a commit (c.f. dlist_push_tail(&bdr_lsn_association, &flushpos->node); in process_remote_commit()) and whenever feedback is sent figure out how far the WAL actually has been flushed (c.f. bdr_get_flush_position()). Now that cannot trivially be done with 2PC, but it doesn't look all that hard to change the 2PC API to allow at least some batching of the fsyncs. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Logical replication and multimaster
On Sun, Dec 6, 2015 at 10:24 PM, Craig Ringer wrote: > * An API to enumerate currently registered bgworkers > * A way to securely make a libpq connection from a bgworker without messing > with passwords etc. Generate one-time cookies, sometihng like that. > * (unimportant but nice API fix): BGW_NO_RESTART_ON_CRASH Why would you have the bgworker connect to the database via TCP instead of just doing whatever it wants to do directly? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Redundant sentence within INSERT documentation page (exclusion constraints)
On Sun, Dec 6, 2015 at 9:36 PM, Peter Geoghegan wrote: > There is a redundant second appearance of more or less the same > sentence at one point in the new INSERT documentation -- I missed this > during the recent overhaul of the 9.5+ INSERT documentation. > > Attached is a trivial patch that fixes this issue. I don't know nearly as much about ON CONFLICT DO UPDATE as Andres, but even I can spot a redundancy when somebody shoves the evidence right under my nose. So, committed and back-patched to 9.5. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_hba_lookup function to get all matching pg_hba.conf entries
Haribabu Kommi wrote: > Reverting the context release patch is already provided in the first > mail of this > thread [1]. Forgot to mention about the same in further mails. > > Here I attached the same patch. This patch needs to be applied first before > pg_hba_lookup patch. I tested it in windows version also. So if you change the file and reload repeatedly, we leak all the memory allocated for HBA lines in TopMemoryContext? This doesn't sound great. Perhaps we need a dedicated context which can be reset at will so that it can be refilled with the right info when we reload the file. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Speedup twophase transactions
Hello. While working with cluster stuff (DTM, tsDTM) we noted that postgres 2pc transactions are approximately two times slower than an ordinary commit on a workload with fast transactions — few single-row updates and COMMIT or PREPARE/COMMIT. Perf top showed that a lot of time is spent in the kernel on fopen/fclose, so it was worth a try to reduce file operations for 2pc transactions. Now 2PC in postgres does the following: * on prepare, 2pc data (subxacts, commitrels, abortrels, invalmsgs) is saved to xlog and to a file, but the file is not fsynced * on commit, the backend reads data from the file * if a checkpoint occurs before commit, then the files are fsynced during the checkpoint * in case of crash, replay will move data from xlog to files In this patch I’ve changed these procedures to the following: * on prepare, the backend writes data only to xlog and stores a pointer to the start of the xlog record * if commit occurs before a checkpoint, then the backend reads data from xlog by this pointer * on checkpoint, 2pc data is copied to files and fsynced * if commit happens after a checkpoint, then the backend reads the files * in case of crash, replay will move data from xlog to files (as it was before the patch) Most of these ideas were already mentioned in a 2009 thread by Michael Paquier http://www.postgresql.org/message-id/c64c5f8b0908062031k3ff48428j824a9a46f2818...@mail.gmail.com where he suggested to store 2pc data in shared memory. At that time the patch was declined because no significant speedup was observed. Now I see performance improvements from my patch of about 60%. Probably the old benchmark's overall tps was lower, so it was harder to hit filesystem fopen/fclose limits. 
Now results of benchmark are following (dual 6-core xeon server): Current master without 2PC: ~42 ktps Current master with 2PC: ~22 ktps Current master with 2PC: ~36 ktps Benchmark done with following script: \set naccounts 10 * :scale \setrandom from_aid 1 :naccounts \setrandom to_aid 1 :naccounts \setrandom delta 1 100 \set scale :scale+1 BEGIN; UPDATE pgbench_accounts SET abalance = abalance - :delta WHERE aid = :from_aid; UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :to_aid; PREPARE TRANSACTION ':client_id.:scale'; COMMIT PREPARED ':client_id.:scale'; 2pc_xlog.diff Description: Binary data --- Stas Kelvich Postgres Professional: http://www.postgrespro.com Russian Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Unicode collations in FreeBSD 11, DragonFly BSD 4.4 without ICU
On Sun, Dec 6, 2015 at 4:45 PM, Thomas Munro wrote: > Since the ICU patch from the BSD ports trees has been discussed on > this list a few times I thought people might be interested in this > news: > > https://lists.freebsd.org/pipermail/freebsd-current/2015-October/057781.html > https://www.dragonflybsd.org/release44/ > https://forums.freebsd.org/threads/this-is-how-i-like-open-source.53943/ > > (Thanks to pstef on #postgresql for pointing me at this). Nice. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Freeze avoidance of very large table.
On Mon, Nov 30, 2015 at 12:58 PM, Bruce Momjian wrote: > On Mon, Nov 30, 2015 at 10:48:04PM +0530, Masahiko Sawada wrote: >> On Sun, Nov 29, 2015 at 2:21 AM, Jeff Janes wrote: >> > On Tue, Nov 24, 2015 at 3:13 PM, Masahiko Sawada >> > wrote: >> >> >> >> Yeah, we need to consider to compute checksum if enabled. >> >> I've changed the patch, and attached. >> >> Please review it. >> > >> > Thanks for the update. This now conflicts with the updates doesn to >> > fix pg_upgrade out-of-space issue on Windows. I've fixed (I think) the >> > conflict in order to do some testing, but I'd like to get an updated >> > patch from the author in case I did it wrong. I don't want to find >> > bugs that I just introduced myself. >> > >> >> Thank you for having a look. > > I would not bother mentioning this detail in the pg_upgrade manual page: > > + Since the format of visibility map has been changed in version 9.6, > + pg_upgrade creates and rewrite new > '_vm' > + file even if upgrading from 9.5 or before to 9.6 or later with link mode > (-k). Really? I know we don't always document things like this, but it seems like a good idea to me that we do so. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Making tab-complete.c easier to maintain
On Wed, Dec 09, 2015 at 03:49:20PM +, Greg Stark wrote: > On Wed, Dec 9, 2015 at 2:27 PM, David Fetter wrote: > > Agreed that the "whole new language" aspect seems like way too big a > > hammer, given what it actually does. > > Which would be easier to update when things change? This question seems closer to being on point with the patch sets proposed here. > Which would be possible to automatically generate from gram.y? This seems like it goes to a wholesale context-aware reworking of tab completion rather than the myopic ("What has happened within the past N tokens?", for slowly increasing N) versions of tab completions in both the current code and in the two proposals here. A context-aware tab completion wouldn't care how many columns you were into a target list, or a FROM list, or whatever, as it would complete based on the (possibly nested) context ("in a target list", e.g.) rather than on inferences made from some slightly variable number of previous tokens. Cheers, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] parallel joins, and better parallel explain
On Fri, Dec 4, 2015 at 3:07 AM, Amit Kapila wrote: > Do you think it will be useful to display in a similar way if worker > is not able to execute plan (like before it starts execution, the other > workers have already finished the work)? Maybe, but it would clutter the output a good deal. I think we should instead have the Gather node itself display the number of workers that it actually managed to launch, and then the user can infer that any execution nodes that don't mention those workers were not executed by them. > Other than that parallel-explain-v2.patch looks good. OK, I'm going to go ahead and commit that part. > I think the problem is at Gather node, the number of buffers (read + hit) > are greater than the number of pages in relation. The reason why it > is doing so is that in Workers (ParallelQueryMain()), it starts the buffer > usage accumulation before ExecutorStart() whereas in master backend > it always calculate it for ExecutorRun() phase, on changing it to accumulate > for ExecutorRun() phase above problem is fixed. Attached patch fixes the > problem. Why is it a bad thing to capture the cost of doing ExecutorStart() in the worker? I can see there's an argument that changing this would be more consistent, but I'm not totally convinced. The overhead of ExecutorStart() in the leader isn't attributable to any specific worker, but the overhead of ExecutorStart() in the worker can fairly be blamed on Gather, I think. I'm not dead set against this change but I'm not totally convinced, either. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Using quicksort for every external sort run
On 09/12/15 00:02, Jeff Janes wrote: > The second one consumes that giant tape run along with 232 small tape > runs. In terms of number of comparisons, binary merge works best when the inputs are of similar length. I'd assume the same goes for n-ary merge, but I don't know if comparison count is an issue here. -- Cheers, Jeremy -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Confusing results with lateral references
On Fri, Dec 4, 2015 at 10:23 AM, Tom Lane wrote: > Ashutosh Bapat writes: >> I am seeing different results with two queries which AFAIU have same >> semantics and hence are expected to give same results. > >> postgres=# select * from t1, (select distinct val, val2 from t2) t2ss where >> t1.val = t2ss.val for update of t1; > >> postgres=# select * from t1, lateral (select distinct val, val2 from t2 >> where t2.val = t1.val) t2ss for update of t1; > > (I renamed your inline sub-selects to avoid confusion between them and the > table t2.) > > I'm skeptical that those should be claimed to have identical semantics. > > In the first example, after we've found the join row (1,1,1,1), we block > to see if the pending update on t1 will commit. After it does, we recheck > the join condition using the updated row from t1 (and the original row > from t2ss). The condition fails, so the updated row is not output. Check. > The same thing happens in the second example, ie, we consider the updated > row from t1 and the non-updated row from t2ss (NOT t2). There are no join > conditions to recheck (in the outer query level), so the row passes, and > we output it. What's surprising is that t2.val = t1.val isn't rechecked here. I think that's not really possible, because of the DISTINCT operation, which prevents us from identifying a single row from t2 that accounts for the subquery's output row. Not sure whether it would work without the DISTINCT. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Foreign join pushdown vs EvalPlanQual
On Wed, Dec 9, 2015 at 3:22 AM, Etsuro Fujita wrote: > Sorry, my explanation might be not enough, but I'm not saying to hide the > subplan. I think it would be better to show the subplan somewhere in the > EXPLAIN outout, but I'm not sure that it's a good idea to show that in the > current form. We have two plan trees; one for normal query execution and > another for EvalPlanQual testing. I think it'd be better to show the > EXPLAIN output the way that allows users to easily identify each of the plan > trees. It's hard to do that because we don't identify that internally anywhere. Like I said before, the possibility of a ForeignScan having an outer subplan is formally independent of the new EPQ stuff, and I'd prefer to maintain that separation and just address this with documentation. Getting this bug fixed has been one of the more exhausting experiences of my involvement with PostgreSQL, and to be honest, I think I'd like to stop spending too much time on this now and work on getting the feature that this is intended to support working. Right now, the only people who can have an opinion on this topic are those who are following this thread in detail, and there really aren't that many of those. If we get the feature - join pushdown for postgres_fdw - working, then we might get some feedback from users about what they like about it or don't, and certainly if this is a frequent complaint then that bolsters the case for doing something about it, and possibly also helps us figure out what that thing should be. On the other hand, if we don't get the feature because we're busy debating interface details related to this patch, then none of these details matter anyway because nobody except developers is actually running the code in question. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Rework the way multixact truncations work
On 2015-12-09 11:18:39 -0500, Robert Haas wrote: > If I correctly understand the scenario that you are describing, that > does happen - not for the same MXID, but for different ones. At least > the last time I checked, and I'm not sure if we've fixed this, it > could happen because the SLRU page that contains the multixact wasn't > flushed out of the SLRU buffers yet. That should be fixed, with the brute force solution of flushing buffers before searching for files on disk. > But apart from that, it could > happen any time there's a gap in the sequence of files, and that sure > doesn't seem like a can't-happen situation. We know that, on 9.3, > there's definitely a sequence of events that leads to a file > followed by a gap followed by the series of files that are still live. > Given the number of other bugs we've fixed in this area, I would not > like to bet on that being the only scenario where this crops up. It > *shouldn't* happen, and as far as we know, if you start and end on a > version newer than 4f627f8 and aa29c1c, it won't. Older branches, > though, I wouldn't like to bet on. Ok, fair enough. andres
Re: [HACKERS] Rework the way multixact truncations work
On Wed, Dec 9, 2015 at 10:41 AM, Andres Freund wrote: >> (I am glad you talked the author out of back-patching; otherwise, >> 9.4.5 and 9.3.10 would have introduced a data loss bug.) > > Isn't that a bug in a, as far as we know, impossible scenario? Unless I > miss something there's no known case where it's "expected" that > find_multixact_start() fails after initially succeeding? Sure, it sucks > that the bug survived review and that it was written in the first > place. But it not showing up during testing isn't meaningful, given it's > a should-never-happen scenario. If I correctly understand the scenario that you are describing, that does happen - not for the same MXID, but for different ones. At least the last time I checked, and I'm not sure if we've fixed this, it could happen because the SLRU page that contains the multixact wasn't flushed out of the SLRU buffers yet. But apart from that, it could happen any time there's a gap in the sequence of files, and that sure doesn't seem like a can't-happen situation. We know that, on 9.3, there's definitely a sequence of events that leads to a file followed by a gap followed by the series of files that are still live. Given the number of other bugs we've fixed in this area, I would not like to bet on that being the only scenario where this crops up. It *shouldn't* happen, and as far as we know, if you start and end on a version newer than 4f627f8 and aa29c1c, it won't. Older branches, though, I wouldn't like to bet on. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Rework the way multixact truncations work
On Wed, Dec 9, 2015 at 9:43 AM, Noah Misch wrote: > Andres writing the patch that became commit 4f627f8 and you reviewing it were > gifts to Alvaro and to the community. Aware of that, I have avoided[1] saying > that I was shocked to see that commit's defects. Despite a committer-author > and _two_ committer reviewers, the patch was rife with wrong new comments, > omitted updates to comments it caused to become wrong, and unsolicited > whitespace churn. (Anyone could have missed the data loss bug, but these > collectively leap off the page.) This in beleaguered code critical to data > integrity. You call this thread's latest code a patch submission, but I call > it bandaging the tree after a recent commit that never should have reached the > tree. Hey, if you'd like me to post the traditional patch files, that's easy. > It would have been easier for me. I posted branches because it gives more > metadata to guide review. As for the choice to revert and redo ... Yes, I'd like patch files, one per topic. I wasn't very happy with the way that patch was written; it seemed to me that it touched too much code and moved a lot of things around unnecessarily, and I said so at the time. I would have preferred something more incremental, and I asked for it and didn't get it. Well, I'm not giving up: I'm asking for the same thing here. I didn't think it was a good idea for Andres to rearrange that much code in a single commit, because it was hard to review, and I don't think it's a good idea for you to do it, either. To the extent that you found bugs, I think that proves the point that large commits are hard to review and small commits that change things just a little bit at a time are the way to go. > I hope those who have not already read commit 4f627f8 will not waste time > reading it. They should instead ignore multixact changes from commit 4f627f8 > through its revert.
The 2015-09-26 commits have not appeared in a supported > release, and no other work has built upon them. They have no tenure. (I am > glad you talked the author out of back-patching; otherwise, 9.4.5 and 9.3.10 > would have introduced a data loss bug.) Nobody reported a single defect > before my review overturned half the patch. A revert will indeed impose on > those who invested time to understand commit 4f627f8, but the silence about > its defects suggests the people understanding it number either zero or one. > Even as its author and reviewers, you would do better to set aside what you > thought you knew about this code. I just don't find this a realistic model of how people use the git log. Maybe you use it this way; I don't. I don't *want* git blame to make it seem as if 4f627f8 is not part of the history. For better or worse, it is. Ripping it out and replacing it monolithically will not change that; it will only make the detailed history harder to reconstruct, and I *will* want to reconstruct it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Making tab-complete.c easier to maintain
On Wed, Dec 9, 2015 at 2:27 PM, David Fetter wrote: > Agreed that the "whole new language" aspect seems like way too big a > hammer, given what it actually does. Which would be easier to update when things change? Which would be possible to automatically generate from gram.y? -- greg
Re: [HACKERS] Rework the way multixact truncations work
On 2015-12-09 09:43:19 -0500, Noah Misch wrote: > Aware of that, I have avoided[1] saying that I was shocked to see that > commit's defects. Despite a committer-author and _two_ committer > reviewers, the patch was rife with wrong new comments, omitted updates > to comments it caused to become wrong, It's not like that patch wasn't posted for review for months... > and unsolicited whitespace churn. Whitespace churn? The commit includes a pgindent run, because Alvaro asked me to do that, but that just affected a handful of lines. If you mean the variable ordering: given several variables were renamed anyway, additionally putting them in an easier-to-understand order seems rather painless. If you mean 'pgindent immune' long lines - multixact.c is far from the only one with those, and they're pretty harmless. > You call this thread's latest code a patch > submission, but I call it bandaging the tree after a recent commit > that never should have reached the tree. Oh, for Christ's sake. > Hey, if you'd like me to > post the traditional patch files, that's easy. It would have been > easier for me. You've been asked that, repeatedly. At least if you take 'traditional patch files' to include traditional, iterative patches on top of the current tree. > I hope those who have not already read commit 4f627f8 will not waste time > reading it. We have to, who knows what's hiding in there. Your git log even shows that you had conflicts in your approach (83cb04 Conflicts: src/backend/access/transam/multixact.c). > They should instead ignore multixact changes from commit 4f627f8 > through its revert. The 2015-09-26 commits have not appeared in a supported > release, and no other work has built upon them. > They have no tenure. Man. > (I am glad you talked the author out of back-patching; otherwise, > 9.4.5 and 9.3.10 would have introduced a data loss bug.) Isn't that a bug in a, as far as we know, impossible scenario?
Unless I miss something there's no known case where it's "expected" that find_multixact_start() fails after initially succeeding? Sure, it sucks that the bug survived review and that it was written in the first place. But it not showing up during testing isn't meaningful, given it's a should-never-happen scenario. I'm actually kinda inclined to rip the whole "previous pass" logic out altogether, and replace it with a PANIC. It's a hard-to-test, should-never-happen scenario. If it happens, things have already seriously gone sour. > > That's not a one-time problem for Andres during the course of review; > > that is a problem for every single person who looks at the commit > > history from now until the end of time. > > It's a service to future readers that no line of "git blame master <...>" will > refer to a 2015-09-26 multixact commit. And a disservice for everyone doing git log, or git blame for intermediate states of the tree. The benefits for git blame are almost nonexistent; not seeing a couple of newlines changed, or not seeing some intermediate commits, isn't really important. > Blame reports will instead refer to > replacement commits designed to be meaningful for study in isolation. If I > instead structured the repairs as you ask, the blame would find one of 4f627f8 > or its various repair commits, none of which would be a self-contained unit of > development. So what? That's how development in general works. And how it actually happened in this specific case.
Re: [HACKERS] Include ppc64le build type for back branches
On Wed, Dec 9, 2015 at 2:54 AM, Tom Lane wrote: > Robert Haas writes: >> With respect to this particular thing, it's hard for me to imagine >> that anything will go wrong on ppcle that we wouldn't consider a >> back-patchable fix, so there might be no harm in adjusting >> config.guess and config.sub. > > FWIW, I also suspect that supporting ppc64le would not really take > much more than updating config.guess/config.sub; there's no evidence > in the git logs that we had to fix anything else in the newer branches. > > My concern here is about establishing project policy about whether > we will or won't consider back-patching support for newer platforms. > I think that the default answer should be "no", and I'd like to > see us set down some rules about what it takes to override that. > > Obviously, setting up a buildfarm member helps a good deal. But > is that sufficient, or necessary? I would say it's necessary but not sufficient. The other criterion I'd lay down is that you shouldn't need to really change anything in the code except maybe to fix up s_lock.h breakage or some equally minor wart. For example, suppose that somebody wanted a new platform which works just like UNIX except that open() takes 4 arguments instead of 3, and we've always got to pass 0 for the last one. Well, even though that's minor and easily done, I'm not going to argue for back-patching that to all supported branches. I don't think that would be good material for a back-patch; the necessary changes could flummox third-party code or just turn out to be buggy, and apparently this new platform wasn't that important up until now, so whatever. But I feel differently about this case because we basically already do support the platform, or so it seems. We just didn't know we supported it. Really, we expect our stuff to work anywhere that has a reasonably UNIX-like environment and, hey, it'll even use atomics automatically if it has a reasonably modern gcc. So, win. 
I don't think that we really gain anything by refusing to admit that the product works on a platform where it does work. We've put a lot of effort into being portable and I don't think we should feel bad about that. If we test PostgreSQL on a new platform and it works with no problems (or only trivial adjustments that seem like good back-patch candidates anyway), then I think it's fine to just say, yep, it works. I don't think that sets a precedent that we'll be willing to accept arbitrary changes in back-branches to make it work when it doesn't already. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Support for N synchronous standby servers - take 2
On Wed, Nov 18, 2015 at 2:06 PM, Masahiko Sawada wrote: > On Tue, Nov 17, 2015 at 7:52 PM, Kyotaro HORIGUCHI > wrote: >> Oops. >> >> At Tue, 17 Nov 2015 19:40:10 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI >> wrote in >> <20151117.194010.17198448.horiguchi.kyot...@lab.ntt.co.jp> >>> Hello, >>> >>> At Tue, 17 Nov 2015 18:13:11 +0900, Masahiko Sawada >>> wrote in >>> >>> > >> One question is that what is different between the leading "n" in >>> > >> s_s_names and the leading "n" of "n-priority"? >>> > > >>> > > Ah. Sorry for the ambiguous description. 'n' in s_s_names >>> > > representing an arbitrary integer number and that in "n-priority" >>> > > is literally an "n", meaning "a format with any number of >>> > > priority hosts" as a whole. As an instance, >>> > > >>> > > synchronous_replication_method = "n-priority" >>> > > synchronous_standby_names = "2, mercury, venus, earth, mars, jupiter" >>> > > >>> > > I added "n-" of "n-priority" to distinguish with "1-priority" so >>> > > if we won't provide "1-priority" for backward compatibility, >>> > > "priority" would be enough to represent the type. >>> > > >>> > > By the way, s_r_method is not essentially necessary but it would >>> > > be important to avoid complexity of autodetection of formats >>> > > including currently undefined ones. >>> > >>> > Thank you for your explanation, I understood that. >>> > >>> > It means that the format of s_s_names will be changed, which would be not >>> > good. >>> >>> I believe that the format of definition of "replication set"(?) >>> is not fixed and it would be more complex format to support >>> nested definition. This should be in very different format from >>> the current simple list of names. This is a selection among three >>> or possibly more designs in order to be tolerable for future >>> changes, I suppose. >>> >>> 1. Additional formats of definition in future will be stored in >>> elsewhere of s_s_names. >>> >>> 2.
Additional format will be stored in s_s_names, the format will >>> be automatically detected. >>> >>> 3. (ditto), the format is designated by s_r_method. >>> >>> 4. Any other way? >>> >>> I chose the third way. What do you think about future expansion >>> of the format? >>> > > I agree with the #3 way and the s_s_name format you suggested. > I think that it's extensible and is tolerable for future changes. > I'm going to implement the patch based on this idea if other hackers > agree with this design. > Please find the attached draft patch which supports multiple synchronous replication. This patch adds a GUC parameter synchronous_replication_method, which represents the method of synchronous replication. [Design of replication method] synchronous_replication_method has two values, 'priority' and '1-priority', for now. We can expand the kinds of value (e.g. 'quorum', 'json' etc.) in the future. * s_r_method = '1-priority' This method is for backward compatibility, so the syntax of s_s_names is the same as today. The behavior is the same as well. * s_r_method = 'priority' This method is for multiple synchronous replication using the priority method. The syntax of s_s_names is: <num_sync>, <standby_name> [, ...] For example, s_r_method = 'priority' and s_s_names = '2, node1, node2, node3' means that the master waits for acknowledgement from at least the 2 lowest-priority-number servers. If 4 standbys (node1 - node4) are available, the master server waits for acknowledgement from 'node1' and 'node2'. The status of each WAL sender is:

=# select application_name, sync_state from pg_stat_replication order by application_name;
 application_name | sync_state
------------------+------------
 node1            | sync
 node2            | sync
 node3            | potential
 node4            | async
(4 rows)

After 'node2' crashed, the master will wait for acknowledgement from 'node1' and 'node3'.
The status of each WAL sender is:

=# select application_name, sync_state from pg_stat_replication order by application_name;
 application_name | sync_state
------------------+------------
 node1            | sync
 node3            | sync
 node4            | async
(3 rows)

[Changing replication method] When we want to change the replication method, we have to change s_r_method first and then do pg_reload_conf(). After changing the replication method, we can change s_s_names. [Expanding replication method] If we want to add a new replication method, we need to implement two functions for each replication method: * int SyncRepGetSynchronousStandbysXXX(int *sync_standbys) This function obtains the list of standbys considered as synchronous at that time, and returns its length. * bool SyncRepGetSyncLsnXXX(XLogRecPtr *write_pos, XLogRecPtr *flush_pos) This function obtains the LSNs (write, flush) considered as synced. Also, debug code remains in this patch; you can trace this behavior by enabling the DEBUG_REPLICATION macro. Please give me feedback. Regards, -- Masahiko Sawada 000_multi_sync_replication_v1.patch Description: Binary data
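The selection rule described above (the first `<num_sync>` connected standbys in list order count as "sync", later connected ones are "potential", unlisted ones are "async") can be sketched as follows. This is an illustrative model of the proposed behavior, not code from the patch; the function name and data shapes are invented for the example.

```python
# Hypothetical model of the proposed 'priority' method: given the patch's
# s_s_names format '<num_sync>, name1, name2, ...' and the set of currently
# connected standbys, classify each standby the way pg_stat_replication's
# sync_state column would show it.
def classify_standbys(s_s_names, connected):
    parts = [p.strip() for p in s_s_names.split(",")]
    num_sync = int(parts[0])          # leading number: how many sync standbys
    priority_list = parts[1:]         # earlier position = higher priority
    states = {}
    synced = 0
    for name in priority_list:
        if name not in connected:
            continue                  # crashed/absent standbys are skipped
        if synced < num_sync:
            states[name] = "sync"
            synced += 1
        else:
            states[name] = "potential"
    for name in connected - set(priority_list):
        states[name] = "async"        # not listed: never synchronous
    return states
```

With s_s_names = '2, node1, node2, node3' and node1-node4 connected this reproduces the first table above, and dropping node2 reproduces the second (node3 is promoted from "potential" to "sync").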
Re: [HACKERS] [sqlsmith] Failed to generate plan on lateral subqueries
On Tue, Dec 08, 2015 at 12:13:41PM -0500, Tom Lane wrote: > Andreas Seltenreich writes: > >> I no longer see "failed to build any n-way joins" after pulling, > >> but there are still instances of "could not devise a query plan". > >> Samples below. > > > sorry, I spoke too soon: nine of the former have been logged > > through the night. I'm attaching a larger set of sample queries > > this time in case that there are still multiple causes for the > > observed errors. > > Hm. At least in the first of these cases, the problem is that the > code I committed yesterday doesn't account for indirect lateral > dependencies. That is, if S1 depends on S2 which depends on the > inner side of an outer join, it now knows not to join S2 directly to > the outer side of the outer join, but it doesn't realize that the > same must apply to S1. > > Maybe we should redefine lateral_relids as the transitive closure of > a rel's lateral dependencies? Not sure. That seems like it would fix the problem once and for all. Is there any reason to believe that the lateral dependencies could go in a loop, i.e. is there a reason to put in checks for same in the transitive closure construction? Cheers, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
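What "redefine lateral_relids as the transitive closure" means can be sketched with a small fixed-point computation. This is not planner code, just an illustration under invented names: each relation maps to the set of relations it laterally references, and we keep propagating until nothing changes. A cycle would simply leave relations in their own closure rather than loop forever, which is one way to see why an explicit cycle check is a separate question from termination.

```python
# Hypothetical sketch: transitive closure of lateral dependencies.
# deps maps a rel to the set of rels it directly laterally references.
def lateral_closure(deps):
    closure = {r: set(d) for r, d in deps.items()}
    changed = True
    while changed:                      # iterate to a fixed point
        changed = False
        for r in closure:
            extra = set()
            for d in closure[r]:
                extra |= closure.get(d, set())
            if not extra <= closure[r]:
                closure[r] |= extra     # pull in indirect dependencies
                changed = True
    return closure
```

In Tom's example, S1 depends on S2 and S2 depends on the outer join's inner side J; the closure gives S1 the dependency on J as well, so the "don't join directly to the outer side" rule applies to S1 too.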
Re: [HACKERS] Rework the way multixact truncations work
On Tue, Dec 08, 2015 at 01:05:03PM -0500, Robert Haas wrote: > On Tue, Dec 8, 2015 at 6:43 AM, Andres Freund wrote: > > On 2015-12-04 21:55:29 -0500, Noah Misch wrote: > >> On Thu, Dec 03, 2015 at 07:03:21PM +0100, Andres Freund wrote: > >> > Sorry, but I really just want to see these changes as iterative patches > >> > ontop of 9.5/HEAD instead of this process. I won't revert the reversion > >> > if you push it anyway, but I think it's a rather bad idea. > >> > >> I hear you. > > > > Not just me. > > > >> I evaluated your request and judged that the benefits you cited > >> did not make up for the losses I cited. Should you wish to change my mind, > >> your best bet is to find defects in the commits I proposed. If I > >> introduced > >> juicy defects, that discovery would lend much weight to your conjectures. > > > > I've absolutely no interest in "proving you wrong". And my desire to > > review patches that are in a, in my opinion, barely reviewable format is > > pretty low as well. > > I agree. Noah, it seems to me that you are offering a novel theory of > how patches should be submitted, reviewed, and committed, but you've > got three people, two of them committers, telling you that we don't > like that approach. I seriously doubt you're going to find anyone who > does. Andres writing the patch that became commit 4f627f8 and you reviewing it were gifts to Alvaro and to the community. Aware of that, I have avoided[1] saying that I was shocked to see that commit's defects. Despite a committer-author and _two_ committer reviewers, the patch was rife with wrong new comments, omitted updates to comments it caused to become wrong, and unsolicited whitespace churn. (Anyone could have missed the data loss bug, but these collectively leap off the page.) This in beleaguered code critical to data integrity. You call this thread's latest code a patch submission, but I call it bandaging the tree after a recent commit that never should have reached the tree. 
Hey, if you'd like me to post the traditional patch files, that's easy. It would have been easier for me. I posted branches because it gives more metadata to guide review. As for the choice to revert and redo ... > When stuff gets committed to the tree, people want to be > able to answer the question "what has just now changed?" and it is > indisputable that what you want to do here will make that harder. I hope those who have not already read commit 4f627f8 will not waste time reading it. They should instead ignore multixact changes from commit 4f627f8 through its revert. The 2015-09-26 commits have not appeared in a supported release, and no other work has built upon them. They have no tenure. (I am glad you talked the author out of back-patching; otherwise, 9.4.5 and 9.3.10 would have introduced a data loss bug.) Nobody reported a single defect before my review overturned half the patch. A revert will indeed impose on those who invested time to understand commit 4f627f8, but the silence about its defects suggests the people understanding it number either zero or one. Even as its author and reviewers, you would do better to set aside what you thought you knew about this code. > That's not a one-time problem for Andres during the course of review; > that is a problem for every single person who looks at the commit > history from now until the end of time. It's a service to future readers that no line of "git blame master <...>" will refer to a 2015-09-26 multixact commit. Blame reports will instead refer to replacement commits designed to be meaningful for study in isolation. If I instead structured the repairs as you ask, the blame would find one of 4f627f8 or its various repair commits, none of which would be a self-contained unit of development. What's to enjoy about discovering that history? > I don't think you have the > right to force your proposed approach through in the face of concerted > opposition. That's correct; I do not have that right.
Your objection still worries me. nm [1] http://www.postgresql.org/message-id/20151029065903.gc770...@tornado.leadboat.com
Re: [HACKERS] Making tab-complete.c easier to maintain
On Wed, Dec 09, 2015 at 08:49:22PM +0900, Michael Paquier wrote: > On Wed, Dec 9, 2015 at 8:17 PM, Thomas Munro > wrote: > > On Mon, Dec 7, 2015 at 8:41 PM, Michael Paquier > > wrote: > >> On Tue, Nov 17, 2015 at 12:19 AM, Alvaro Herrera > >> wrote: > >>> Thomas Munro wrote: > New version attached, merging recent changes. > >>> > >>> I wonder about the TailMatches and Matches macros --- wouldn't it be > >>> better to have a single one, renaming TailMatches to Matches and > >>> replacing the current Matches() with an initial token that corresponds > >>> to anchoring to start of command? Just wondering, not terribly attached > >>> to the idea. > >> > >> + /* TODO:TM -- begin temporary, not part of the patch! */ > >> + Assert(!word_matches(NULL, "")); > >> + [...] > >> + Assert(!word_matches("foo", "")); > >> + /* TODO:TM -- end temporary */ > >> > >> Be sure to not forget to remove that later. > > > > Thanks for looking at this Michael. It's probably not much fun to > > review! Here is a new version with that bit removed. More responses > > inline below. > > I had a hard time not sleeping when reading it... That's very mechanical. > > > I agree that would probably be better but Alvaro suggested following > > the existing logic in the first pass, which was mostly based on tails, > > and then considering simpler head-based patterns in a future pass. > > Fine with me. > > So what do we do now? There is your patch, which is already quite big, > but as well a second patch based on regexps, which is far bigger. 
And > at the end they provide a similar result: > > Here is for example what the regexp patch does for some complex > checks, like ALTER TABLE RENAME: > /* ALTER TABLE xxx RENAME yyy */ > -else if (pg_strcasecmp(prev4_wd, "TABLE") == 0 && > - pg_strcasecmp(prev2_wd, "RENAME") == 0 && > - pg_strcasecmp(prev_wd, "CONSTRAINT") != 0 && > - pg_strcasecmp(prev_wd, "TO") != 0) > +else if (MATCH("TABLE #id RENAME !CONSTRAINT|TO")) > > And what Thomas's patch does: > +else if (TailMatches5("ALTER", "TABLE", MatchAny, "RENAME", MatchAny) && > + !TailMatches1("CONSTRAINT|TO")) > > The regexp patch makes the negative checks somewhat easier to read > (there are 19 positions in tab-complete.c doing that), but still, inventing > a new language and doing a heavy refactoring just for tab completion of > psql seems a bit too much IMO, so my heart balances in favor of > Thomas' stuff. Thoughts from others? Agreed that the "whole new language" aspect seems like way too big a hammer, given what it actually does. Cheers, David.
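The semantics of the TailMatchesN style quoted above (the real thing is a set of C preprocessor macros in tab-complete.c) can be modeled in a few lines: compare the last N words of the input case-insensitively against a pattern list, where a wildcard matches anything and `|` separates alternatives. This is a hedged Python model for illustration, not the patch's code.

```python
# Rough model of Thomas's TailMatches idea: MATCH_ANY plays the role of
# the patch's MatchAny, and "A|B" patterns match either alternative,
# case-insensitively, as in the quoted examples.
MATCH_ANY = object()

def word_matches(pattern, word):
    if pattern is MATCH_ANY:
        return True
    return any(alt.lower() == word.lower() for alt in pattern.split("|"))

def tail_matches(words, *patterns):
    """True if the last len(patterns) input words match the patterns."""
    if len(words) < len(patterns):
        return False
    tail = words[-len(patterns):]
    return all(word_matches(p, w) for p, w in zip(patterns, tail))
```

The ALTER TABLE RENAME check above then reads as: tail matches ("ALTER", "TABLE", any, "RENAME", any) and the last word does not match "CONSTRAINT|TO".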
Re: [HACKERS] Patch: ResourceOwner optimization for tables with many partitions
Hello, Robert Thanks for your review. I believe I fixed items 1, 2 and 3 (see attachment). Also I would like to clarify item 4. > 4. It mixes together multiple ideas in a single patch, not only > introducing a hashing concept but also striping a brand-new layer of > abstraction across the resource-owner mechanism. I am not sure that > layer of abstraction is a very good idea, but if it needs to be done, > I think it should be a separate patch. Do I understand correctly that you suggest the following? The current patch should be split in two parts. In the first patch we create and use ResourceArray with an array-based implementation (abstraction layer). Then we apply a second patch which changes the ResourceArray implementation to a hashing-based one (optimization). Best regards, Aleksander On Tue, 8 Dec 2015 17:30:33 -0500 Robert Haas wrote: > On Fri, Dec 4, 2015 at 7:15 AM, Aleksander Alekseev > wrote: > > Current implementation of ResourceOwner uses arrays to store > > resources like TupleDescs, Snapshots, etc. When we want to release > > one of these resources ResourceOwner finds it with linear scan. > > Granted, resource arrays are usually small so it's not a problem most > > of the time. But it appears to be a bottleneck when we are working > > with tables which have a lot of partitions. > > > > To reproduce this issue: > > 1. run `./gen.pl 1 | psql my_database postgres` > > 2. run `pgbench -j 8 -c 8 -f q.sql -T 100 my_database` > > 3. in second terminal run `sudo perf top -u postgres` > > > > Both gen.pl and q.sql are attached to this message. > > > > You will see that postgres spends a lot of time in > > ResourceOwnerForget* procedures: > > > > 32.80% postgres [.] list_nth > > 20.29% postgres [.] ResourceOwnerForgetRelationRef > > 12.87% postgres [.] find_all_inheritors > > 7.90% postgres [.] get_tabstat_entry > > 6.68% postgres [.] ResourceOwnerForgetTupleDesc > > 1.17% postgres [.] hash_search_with_hash_value > > ... < 1% ...
> > > > I would like to suggest a patch (see attachment) which fixes this > > bottleneck. Also I discovered that there is a lot of code > > duplication in ResourceOwner. Patch fixes this too. The idea is > > quite simple. I just replaced arrays with something that could be > > considered hash tables, but with some specific optimizations. > > > > After applying this patch we can see that bottleneck is gone: > > > > 42.89% postgres [.] list_nth > > 18.30% postgres [.] find_all_inheritors > > 10.97% postgres [.] get_tabstat_entry > > 1.82% postgres [.] hash_search_with_hash_value > > 1.21% postgres [.] SearchCatCache > > ... < 1% ... > > > > For tables with thousands of partitions we gain on average 32.5% more > > TPS. > > > > As far as I can see, at the same time the patch doesn't make anything > > worse. `make check` passes with asserts enabled and disabled. There > > is no performance degradation according to both standard pgbench > > benchmark and benchmark described above for tables with 10 and 100 > > partitions. > > I do think it's interesting to try to speed up the ResourceOwner > mechanism when there are many resources owned. We've had some forays > in that direction in the past, and I think it's useful to continue > that work. But I also think that this patch, while interesting, is > not something we can seriously consider committing in its current > form, because: > > 1. It lacks adequate comments and submission notes to explain the > general theory of operation of the patch. > > 2. It seems to use uint8 * rather freely as a substitute for char * or > void *, which doesn't seem like a great idea. > > 3. It doesn't follow PostgreSQL formatting conventions (uncuddled > curly braces, space after if and before open paren, etc.). > > 4. It mixes together multiple ideas in a single patch, not only > introducing a hashing concept but also striping a brand-new layer of > abstraction across the resource-owner mechanism.
I am not sure that > layer of abstraction is a very good idea, but if it needs to be done, > I think it should be a separate patch. > diff --git a/src/backend/utils/resowner/resowner.c b/src/backend/utils/resowner/resowner.c index 0e7acbf..d2b37d5 100644 --- a/src/backend/utils/resowner/resowner.c +++ b/src/backend/utils/resowner/resowner.c @@ -29,6 +29,323 @@ #include "utils/snapmgr.h" /* + * ResourceArray is a common structure for storing different types of resources. + * + * ResourceOwner can own `HeapTuple`s, `Relation`s, `Snapshot`s, etc. For + * each resource type there are procedures ResourceOwnerRemember* and + * ResourceOwnerForget*. There are also ResourceOwnerEnlarge* procedures + * which should be called before corresponding ResourceOwnerRemember* calls + * (see below). Internally each type of resource is stored in separate + * ResourceArray. + * + * There are two major reasons for u
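The core of the optimization discussed in this thread is replacing the linear-scan forget path with a hash table, so that releasing one of thousands of remembered resources no longer costs O(n). The sketch below is a toy open-addressing table in Python, not the patch's C code; names, the growth policy, and the simplifying assumption that each resource is remembered at most once are all invented for illustration.

```python
# Toy illustration of a hash-based resource array: remember/forget in
# roughly O(1) instead of the O(n) linear scan of a plain array.
class ResourceArray:
    def __init__(self, capacity=8):
        self.capacity = capacity
        self.slots = [None] * capacity
        self.count = 0

    def _probe(self, res):
        # linear probing: walk until we find res or an empty slot
        i = hash(res) % self.capacity
        while self.slots[i] is not None and self.slots[i] != res:
            i = (i + 1) % self.capacity
        return i

    def remember(self, res):
        if (self.count + 1) * 2 > self.capacity:  # keep load factor <= 0.5
            self._grow()
        self.slots[self._probe(res)] = res
        self.count += 1

    def forget(self, res):
        i = self._probe(res)
        if self.slots[i] is None:
            raise KeyError(res)                   # forgetting an unknown resource
        self.slots[i] = None
        self.count -= 1
        # re-insert the rest of the probe chain so later lookups still work
        i = (i + 1) % self.capacity
        while self.slots[i] is not None:
            moved, self.slots[i] = self.slots[i], None
            self.count -= 1
            self.remember(moved)
            i = (i + 1) % self.capacity

    def _grow(self):
        old = [r for r in self.slots if r is not None]
        self.capacity *= 2
        self.slots = [None] * self.capacity
        self.count = 0
        for r in old:
            self.remember(r)
```

This also shows why Robert's item 4 is easy to satisfy: the abstraction (remember/forget over an opaque container) and the container's implementation (array scan vs. hash) are separable, which is exactly the two-patch split Aleksander proposes.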
Re: [HACKERS] W-TinyLfu for cache eviction
On Wed, Dec 9, 2015 at 11:31 AM, Konstantin Knizhnik wrote: > I expect synchronization issues with implementation of this algorithm. It > seems to be hard to avoid some global critical section which can cause > significant performance degradation at MPP systems (see topic "Move > PinBuffer and UnpinBuffer to atomics"). The sketch updates don't really need to be globally consistent; anything that is faster than cycling of buffers through the LRU tier would be enough. In principle atomic increments could be used, but funneling all buffer hits onto a small number of cache lines and then doing 4x the amount of writes would result in extremely nasty cache line ping-ponging. The Caffeine implementation appears to solve this by batching the frequency sketch updates using a striped set of circular buffers. Re-implementing this is not exactly trivial, and some prior experience with lock-free programming seems well advised. Based on the paper it looks like this algorithm can work with a low-quality primary eviction algorithm. Swapping our GCLOCK out for something simpler like plain CLOCK could simplify an atomics-based PinBuffer approach. Another interesting avenue for research would be to use ideas in TinyLFU to implement a tier of "nailed" buffers that have backend-local or striped pin-unpin accounting. But for checking whether the replacement policy implemented by W-TinyLFU is good, it isn't necessary to have a performant locking solution. I think a global lock would be good enough for a proof of concept that only evaluates cache hit ratios. Regards, Ants Aasma
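For reference, the frequency sketch mentioned above is a count-min sketch with periodic aging, which TinyLFU uses to estimate access frequency in constant space. A minimal single-threaded sketch follows (the hash choice, width, and depth are arbitrary for illustration; a concurrent implementation would batch updates through striped ring buffers as Caffeine does):

```python
import hashlib

class CountMinSketch:
    """Count-min sketch: per-key frequency estimates that never underestimate."""
    def __init__(self, width=1024, depth=4):
        self.width = width
        self.rows = [[0] * width for _ in range(depth)]

    def _columns(self, key):
        # Derive one column index per row from independent slices of a hash.
        digest = hashlib.blake2b(key.encode(), digest_size=16).digest()
        for r in range(len(self.rows)):
            yield int.from_bytes(digest[4 * r:4 * r + 4], "little") % self.width

    def increment(self, key):
        for r, c in enumerate(self._columns(key)):
            self.rows[r][c] += 1

    def estimate(self, key):
        # Minimum across rows bounds the damage from hash collisions.
        return min(self.rows[r][c] for r, c in enumerate(self._columns(key)))

    def halve(self):
        # TinyLFU-style aging: periodically halve every counter so stale
        # popularity decays and the sketch tracks the recent workload.
        for row in self.rows:
            for c in range(len(row)):
                row[c] >>= 1
```

Because each estimate is a minimum over independently hashed counters, slightly stale per-CPU copies only make estimates a little noisier, which is why globally consistent updates are unnecessary here.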
Re: [HACKERS] exposing pg_controldata and pg_config as functions
On Thu, Sep 17, 2015 at 7:12 AM, Andres Freund wrote: > On 2015-08-20 09:59:25 -0400, Andrew Dunstan wrote: >> Is there any significant interest in either of these? >> >> Josh Berkus tells me that he would like pg_controldata information, and I >> was a bit interested in pg_config information, for this reason: I had a >> report of someone who had configured using --with-libxml but the xml tests >> actually returned the results that are expected without xml being >> configured. The regression tests thus passed, but should not have. It >> occurred to me that if we had a test like >> >> select pg_config('configure') ~ '--with-libxml' as has_xml; >> >> in the xml tests then this failure mode would be detected. > > On my reading of the thread there seems to be a tentative agreement that > pg_controldata is useful and still controversy around pg_config. Can we > split committing this? Yeah, the last version of the patch dates from August, and there is visible agreement that the pg_controldata information provided at the SQL level is useful, while the data of pg_config is proving to be less interesting for remote users. Could the patch be rebased and split as suggested above? -- Michael
Re: [HACKERS] Error with index on unlogged table
On 2015-12-09 21:03:47 +0900, Michael Paquier wrote: > Oh, OK. I didn't read though your lines correctly. So you basically > mean that we would look at the init files that are on disk, and check > if they are empty. If they are, we simply use XLogReadBufferExtended > to fetch the INIT_FORKNUM content and fill in another buffer for the > MAIN_FORKNUM. More or less right? We'd not just do so if they're empty, we'd just generally copy the file via shared buffers, instead of copy_file(). But we'd get the file size from the filesystem (which is fine, we make sure it is correct during replay).
Re: [HACKERS] Error with index on unlogged table
On Wed, Dec 9, 2015 at 8:04 PM, Andres Freund wrote: > On 2015-12-09 19:36:11 +0900, Michael Paquier wrote: >> On Wed, Dec 9, 2015 at 4:41 PM, Andres Freund wrote: >> > On 2015-12-09 16:30:47 +0900, Michael Paquier wrote: >> >> > I'm kinda wondering if it wouldn't have been better to go through shared >> >> > buffers in ResetUnloggedRelationsInDbspaceDir() instead of using >> >> > copy_file(). >> >> >> >> For deployment with large shared_buffers settings, wouldn't that be >> >> actually more costly than the current way of doing? We would need to >> >> go through all the buffers at least once and look for the INIT_FORKNUM >> >> present to flush them. >> > >> > We could just check the file sizes on disk, and the check for the >> > contents of all the pages for each file. >> >> By doing it at replay, the flushes are spread across time. And by >> doing it at the end of recovery, all the flushes would be grouped. Do >> you think that's fine? > > The point is that we'd no flushes, because the data would come directly > from shared buffers... Oh, OK. I didn't read though your lines correctly. So you basically mean that we would look at the init files that are on disk, and check if they are empty. If they are, we simply use XLogReadBufferExtended to fetch the INIT_FORKNUM content and fill in another buffer for the MAIN_FORKNUM. More or less right? -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] psql: add \pset true/false
On Tue, Dec 8, 2015 at 8:51 PM, Michael Paquier wrote: > On Tue, Dec 8, 2015 at 7:18 PM, Kyotaro HORIGUCHI > wrote: >> At Sat, 5 Dec 2015 21:05:29 +0900, Michael Paquier >> wrote in >> >> > Regarding the patch, I >> > would tend to think that we should just reject it and try to cruft >> > something that could be more pluggable if there is really a need. >> > Thoughts? >> >> Honestly saying, I feel similarly with you:p I personally will do >> something like the following for the original objective. > > Are there other opinions? The -1 team is in majority at the end of this > thread.. So, marking the patch as rejected? Any objections? -- Michael
Re: [HACKERS] Making tab-complete.c easier to maintain
On Wed, Dec 9, 2015 at 8:17 PM, Thomas Munro wrote: > On Mon, Dec 7, 2015 at 8:41 PM, Michael Paquier > wrote: >> On Tue, Nov 17, 2015 at 12:19 AM, Alvaro Herrera >> wrote: >>> Thomas Munro wrote: New version attached, merging recent changes. >>> >>> I wonder about the TailMatches and Matches macros --- wouldn't it be >>> better to have a single one, renaming TailMatches to Matches and >>> replacing the current Matches() with an initial token that corresponds >>> to anchoring to start of command? Just wondering, not terribly attached >>> to the idea. >> >> + /* TODO:TM -- begin temporary, not part of the patch! */ >> + Assert(!word_matches(NULL, "")); >> + [...] >> + Assert(!word_matches("foo", "")); >> + /* TODO:TM -- end temporary */ >> >> Be sure to not forget to remove that later. > > Thanks for looking at this Michael. It's probably not much fun to > review! Here is a new version with that bit removed. More responses > inline below. I had a hard time not sleeping when reading it... That's very mechanical. > I agree that would probably be better but Alvaro suggested following > the existing logic in the first pass, which was mostly based on tails, > and then considering simpler head-based patterns in a future pass. Fine with me. So what do we do now? There is your patch, which is already quite big, but as well a second patch based on regexps, which is far bigger. 
And at the end they provide a similar result: Here is for example what the regexp patch does for some complex checks, like ALTER TABLE RENAME: /* ALTER TABLE xxx RENAME yyy */ -else if (pg_strcasecmp(prev4_wd, "TABLE") == 0 && - pg_strcasecmp(prev2_wd, "RENAME") == 0 && - pg_strcasecmp(prev_wd, "CONSTRAINT") != 0 && - pg_strcasecmp(prev_wd, "TO") != 0) +else if (MATCH("TABLE #id RENAME !CONSTRAINT|TO")) And what Thomas's patch does: +else if (TailMatches5("ALTER", "TABLE", MatchAny, "RENAME", MatchAny) && + !TailMatches1("CONSTRAINT|TO")) The regexp patch makes the negative checks somewhat easier to read (there are 19 positions in tab-complete.c doing that); still, inventing a new language and heavily refactoring just the tab completion of psql seems a bit too much IMO, so my heart balances in favor of Thomas' stuff. Thoughts from others? -- Michael
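For readers following along, the semantics both patches implement — match the last N words of the buffer case-insensitively, with a wildcard and '|'-separated alternatives — can be modeled independently of the C macros. A toy Python sketch (the names mirror Thomas's TailMatches/MatchAny, but this is not the patch code):

```python
MatchAny = object()  # wildcard sentinel: matches any single word

def word_matches(pattern, word):
    """Case-insensitive word match; '|' in the pattern separates alternatives."""
    if pattern is MatchAny:
        return True
    return word.upper() in pattern.upper().split("|")

def tail_matches(words, *patterns):
    """True if the last len(patterns) words match the patterns pairwise."""
    if len(words) < len(patterns):
        return False
    tail = words[-len(patterns):]
    return all(word_matches(p, w) for p, w in zip(patterns, tail))
```

The ALTER TABLE example from the thread then reads as: tail_matches(words, "ALTER", "TABLE", MatchAny, "RENAME", MatchAny) and not tail_matches(words, "CONSTRAINT|TO").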
Re: [HACKERS] Passing initially_valid values instead of !skip_validation to StoreRelCheck() in AddRelationNewConstraints()
On Wednesday, 9 December 2015, amul sul wrote: > >On Wednesday, 9 December 2015 12:55 PM, Amit Langote < > langote_amit...@lab.ntt.co.jp > wrote: > > >Thoughts? > > Wondering, have you notice failed regression tests? I did, I couldn't send an updated patch today before leaving for the day. > I have tried with new transformCheckConstraints() function & there will be > small fix in gram.y. > > Have look into attached patch & please share your thoughts and/or > suggestions. > Will take a look. Thanks, Amit
Re: [HACKERS] Making tab-complete.c easier to maintain
On Mon, Dec 7, 2015 at 8:41 PM, Michael Paquier wrote: > On Tue, Nov 17, 2015 at 12:19 AM, Alvaro Herrera > wrote: >> Thomas Munro wrote: >>> New version attached, merging recent changes. >> >> I wonder about the TailMatches and Matches macros --- wouldn't it be >> better to have a single one, renaming TailMatches to Matches and >> replacing the current Matches() with an initial token that corresponds >> to anchoring to start of command? Just wondering, not terribly attached >> to the idea. > > + /* TODO:TM -- begin temporary, not part of the patch! */ > + Assert(!word_matches(NULL, "")); > + [...] > + Assert(!word_matches("foo", "")); > + /* TODO:TM -- end temporary */ > > Be sure to not forget to remove that later. Thanks for looking at this Michael. It's probably not much fun to review! Here is a new version with that bit removed. More responses inline below. > - else if (pg_strcasecmp(prev5_wd, "DEFAULT") == 0 && > -pg_strcasecmp(prev4_wd, "PRIVILEGES") == 0 && > -(pg_strcasecmp(prev3_wd, "FOR") == 0 || > - pg_strcasecmp(prev3_wd, "IN") == 0)) > - { > - static const char *const > list_ALTER_DEFAULT_PRIVILEGES_REST[] = > - {"GRANT", "REVOKE", NULL}; > - > - COMPLETE_WITH_LIST(list_ALTER_DEFAULT_PRIVILEGES_REST); > - } > + else if (TailMatches5("DEFAULT", "PRIVILEGES", "FOR", "ROLE", > MatchAny) || > +TailMatches5("DEFAULT", "PRIVILEGES", "IN", > "SCHEMA", MatchAny)) > + CompleteWithList2("GRANT", "REVOKE"); > For this chunk I think that you need to check for ROLE|USER and not only > ROLE. Right, done. > + else if (TailMatches4("ALTER", "DOMAIN", MatchAny, "RENAME")) > { > static const char *const list_ALTERDOMAIN[] = > {"CONSTRAINT", "TO", NULL}; > I think you should remove COMPLETE_WITH_LIST here for consistency with the > rest. Right, done. 
> - else if (pg_strcasecmp(prev5_wd, "DOMAIN") == 0 && > -pg_strcasecmp(prev3_wd, "RENAME") == 0 && > -pg_strcasecmp(prev2_wd, "CONSTRAINT") == 0) > + else if (TailMatches3("RENAME", "CONSTRAINT", MatchAny)) > COMPLETE_WITH_CONST("TO"); > Perhaps you are missing DOMAIN here? Right, done. > - else if (pg_strcasecmp(prev4_wd, "ALTER") == 0 && > -pg_strcasecmp(prev3_wd, "SEQUENCE") == 0 && > -pg_strcasecmp(prev_wd, "NO") == 0) > - { > - static const char *const list_ALTERSEQUENCE2[] = > - {"MINVALUE", "MAXVALUE", "CYCLE", NULL}; > - > - COMPLETE_WITH_LIST(list_ALTERSEQUENCE2); > - } > + else if (TailMatches4("ALTER", "SEQUEMCE", MatchAny, "NO")) > + CompleteWithList3("MINVALUE", "MAXVALUE", "CYCLE"); > Typo here: s/SEQUEMCE/SEQUENCE. Oops, fixed. > - else if (pg_strcasecmp(prev5_wd, "TABLE") == 0 && > -pg_strcasecmp(prev3_wd, "RENAME") == 0 && > -(pg_strcasecmp(prev2_wd, "COLUMN") == 0 || > - pg_strcasecmp(prev2_wd, "CONSTRAINT") == 0) && > -pg_strcasecmp(prev_wd, "TO") != 0) > + else if (TailMatches6("ALTER", "TABLE", MatchAny, "RENAME", > "COLUMN|CONSTRAINT", MatchAny) && > +!TailMatches1("TO")) > This should use TailMatches5 without ALTER for consistency with the existing > code? Ok, done. > - else if (pg_strcasecmp(prev_wd, "CLUSTER") == 0 && > -pg_strcasecmp(prev2_wd, "WITHOUT") != 0) > + else if (TailMatches1("CLUSTER") && !TailMatches2("WITHOUT", > "CLUSTER")) > Here removing CLUSTER should be fine. Ok. > - else if (pg_strcasecmp(prev2_wd, "CLUSTER") == 0 && > -pg_strcasecmp(prev_wd, "ON") != 0 && > -pg_strcasecmp(prev_wd, "VERBOSE") != 0) > - { > + else if (TailMatches2("CLUSTER", MatchAny) && > !TailMatches1("VERBOSE")) > Handling of ON has been forgotten. Right, fixed. 
> - else if (pg_strcasecmp(prev3_wd, "CREATE") == 0 && > -!(pg_strcasecmp(prev2_wd, "USER") == 0 && > pg_strcasecmp(prev_wd, "MAPPING") == 0) && > -(pg_strcasecmp(prev2_wd, "ROLE") == 0 || > - pg_strcasecmp(prev2_wd, "GROUP") == 0 || > pg_strcasecmp(prev2_wd, "USER") == 0)) > + else if (TailMatches3("CREATE", "ROLE|GROUP|USER", MatchAny) && > +!TailMatches3("CREATE", "USER", "MAPPING")) > !TailMatches2("USER", "MAPPING")? Ok. > - else if (pg_strcasecmp(prev4_wd, "CREATE") == 0 && > -pg_strcasecmp(prev3_wd, "MATERIALIZED") == 0 && > -pg_strcasecmp(prev2_wd, "VIEW") == 0) > + else if (TailMatches3("CREATE", "MATERIALIZED", "VIEW")) > Fo
Re: [HACKERS] Error with index on unlogged table
On 2015-12-09 19:36:11 +0900, Michael Paquier wrote: > On Wed, Dec 9, 2015 at 4:41 PM, Andres Freund wrote: > > On 2015-12-09 16:30:47 +0900, Michael Paquier wrote: > >> > I'm kinda wondering if it wouldn't have been better to go through shared > >> > buffers in ResetUnloggedRelationsInDbspaceDir() instead of using > >> > copy_file(). > >> > >> For deployment with large shared_buffers settings, wouldn't that be > >> actually more costly than the current way of doing? We would need to > >> go through all the buffers at least once and look for the INIT_FORKNUM > >> present to flush them. > > > > We could just check the file sizes on disk, and the check for the > > contents of all the pages for each file. > > By doing it at replay, the flushes are spread across time. And by > doing it at the end of recovery, all the flushes would be grouped. Do > you think that's fine? The point is that we'd no flushes, because the data would come directly from shared buffers... -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Error with index on unlogged table
On Wed, Dec 9, 2015 at 4:41 PM, Andres Freund wrote: > On 2015-12-09 16:30:47 +0900, Michael Paquier wrote: >> > I'm kinda wondering if it wouldn't have been better to go through shared >> > buffers in ResetUnloggedRelationsInDbspaceDir() instead of using >> > copy_file(). >> >> For deployment with large shared_buffers settings, wouldn't that be >> actually more costly than the current way of doing? We would need to >> go through all the buffers at least once and look for the INIT_FORKNUM >> present to flush them. > > We could just check the file sizes on disk, and the check for the > contents of all the pages for each file. By doing it at replay, the flushes are spread across time. And by doing it at the end of recovery, all the flushes would be grouped. Do you think that's fine? -- Michael
Re: [HACKERS] W-TinyLfu for cache eviction
On 03.12.2015 10:27, Vladimir Sitnikov wrote: I've recently noticed the W-TinyLfu cache admission policy (see [1]) being used for caffeine, a "high performance caching library for Java 8". It demonstrates high cache hit ratios (see [2]) and makes it possible to build high-throughput caches (see caffeine in [3]). The authors explicitly allow implementations of the algorithm (see [4]). Does it make sense to evaluate the algorithm for buffer replacement? I expect synchronization issues with the implementation of this algorithm. It seems to be hard to avoid some global critical section which can cause significant performance degradation on MPP systems (see the topic "Move PinBuffer and UnpinBuffer to atomics"). [1]: http://arxiv.org/pdf/1512.00727v1.pdf [2]: https://github.com/ben-manes/caffeine/wiki/Efficiency [3]: https://github.com/ben-manes/caffeine/wiki/Benchmarks [4]: https://github.com/ben-manes/caffeine/issues/23#issuecomment-161536706 Vladimir Sitnikov
Re: [HACKERS] Passing initially_valid values instead of !skip_validation to StoreRelCheck() in AddRelationNewConstraints()
>On Wednesday, 9 December 2015 12:55 PM, Amit Langote > wrote: >Thoughts? Wondering, have you noticed the failed regression tests? I have tried with a new transformCheckConstraints() function, and there will be a small fix in gram.y. Have a look at the attached patch and please share your thoughts and/or suggestions. Thanks, Amul Sul transformCheckConstraints-function-to-overrid-not-valid.patch Description: Binary data
Re: [HACKERS] pg_hba_lookup function to get all matching pg_hba.conf entries
On Wed, Dec 9, 2015 at 5:31 PM, Amit Kapila wrote: > > Another bigger issue I see in the above part of code is that it doesn't > seem to be safe to call load_hba() at that place in PostgresMain() as > currently load_hba() is using a context created from PostmasterContext > to perform the parsing and some other stuff, the PostmasterContext > won't be available at that time. It is deleted immediately after > InitPostgres > is completed. So either we need to make PostmasterContext don't go > away after InitPostgres() or load_hba shouldn't use it and rather use > CurrentMemroyContext similar to ProcessConfigFile or may be use > TopMemoryContext instead of PostmasterContext if possible. I think > this needs some more thoughts. > > Apart from above, this patch doesn't seem to work on Windows or > for EXEC_BACKEND builds as we are loading the hba file in a > temporary context (PostmasterContext, refer PerformAuthentication) > which won't be alive for the backends life. This works on linux because > of fork/exec mechanism which allows to inherit the parsed file > (parsed_hba_lines). This is slightly tricky problem to solve and we > have couple of options (a) use TopMemoryContext instead of Postmaster > Context to load hba; (b) Use CurrentMemoryContext (c) pass the parsed > hba file for Windows/Exec_Backend using save_backend_variables/ > restore_backend_variables mechanism or if you have any other idea. > If you don't have any better idea, then you can evaluate above ideas > and see which one makes more sense. Reverting the context release patch is already provided in the first mail of this thread [1]. Forgot to mention about the same in further mails. Here I attached the same patch. This patch needs to be applied first before pg_hba_lookup patch. I tested it in windows version also. 
[1] - http://www.postgresql.org/message-id/cajrrpgffyf45mfk7ub+qhwhxn_ttmknrvhtudefqzuzzrwe...@mail.gmail.com Regards, Hari Babu Fujitsu Australia revert_hba_context_release_in_backend.patch Description: Binary data
Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics
On Tue, Dec 8, 2015 at 6:00 PM, Amit Kapila wrote: > On Tue, Dec 8, 2015 at 3:56 PM, Alexander Korotkov < > a.korot...@postgrespro.ru> wrote: >> >> Agree. This patch need to be carefully verified. Current experiments >> just show that it is promising direction for improvement. I'll come with >> better version of this patch. >> >> Also, after testing on large machines I have another observation to >> share. For now, LWLock doesn't guarantee that exclusive lock would be ever >> acquired (assuming each shared lock duration is finite). It because when >> there is no exclusive lock, new shared locks aren't queued and LWLock state >> is changed directly. Thus, process which tries to acquire exclusive lock >> have to wait for gap in shared locks. >> > > I think this has the potential to starve exclusive lockers in worst case. > > >> But with high concurrency for shared lock that could happen very rare, >> say never. >> >> We did see this on big Intel machine in practice. pgbench -S gets shared >> ProcArrayLock very frequently. Since some number of connections is >> achieved, new connections hangs on getting exclusive ProcArrayLock. I think >> we could do some workaround for this problem. For instance, when exclusive >> lock waiter have some timeout it could set some special bit which prevents >> others to get new shared locks. >> >> > I think timeout based solution would lead to giving priority to > exclusive lock waiters (assume a case where each of exclusive > lock waiter timesout one after another) and make shared lockers > wait and a timer based solution might turn out to be costly for > general cases where wait is not so long. > Since all lwlock waiters are ordered in the queue, we can let only first waiter to set this bit. Anyway, once bit is set, shared lockers would be added to the queue. They would get the lock in queue order. 
> Another way could be to > check if the Exclusive locker needs to go for repeated wait for a > couple of times, then we can set such a bit. I'm not sure what you mean by repeated wait. Do you mean the exclusive locker was woken up twice by timeout? Because now, without timeout, the exclusive locker wouldn't be woken up until all shared locks are released. -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
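The scheme sketched in this exchange — the first queued exclusive waiter sets a bit that stops new shared lockers, so the writer only waits for the readers already inside — can be modeled with a condition variable. PostgreSQL's LWLocks use atomics and an explicit wait queue instead; this sketch shows only the policy, not the mechanism:

```python
import threading

class WriterPriorityLock:
    """Shared/exclusive lock where a pending exclusive waiter blocks *new*
    shared acquisitions, bounding how long the writer can starve."""
    def __init__(self):
        self._cond = threading.Condition()
        self._shared = 0
        self._writer_pending = False
        self._writer_active = False

    def acquire_shared(self):
        with self._cond:
            # New readers queue behind a pending writer instead of streaming in.
            while self._writer_pending or self._writer_active:
                self._cond.wait()
            self._shared += 1

    def release_shared(self):
        with self._cond:
            self._shared -= 1
            if self._shared == 0:
                self._cond.notify_all()

    def acquire_exclusive(self):
        with self._cond:
            while self._writer_active or self._writer_pending:
                self._cond.wait()
            self._writer_pending = True   # the "special bit" from the thread
            while self._shared > 0:       # drain only the readers already in
                self._cond.wait()
            self._writer_pending = False
            self._writer_active = True

    def release_exclusive(self):
        with self._cond:
            self._writer_active = False
            self._cond.notify_all()
```

Without the pending flag, a steady stream of shared lockers can keep the count above zero indefinitely; with it, the writer's wait is bounded by the readers that were already holding the lock when it queued.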
Re: [HACKERS] Foreign join pushdown vs EvalPlanQual
On 2015/12/09 13:26, Kouhei Kaigai wrote: On Tue, Dec 8, 2015 at 10:00 PM, Etsuro Fujita wrote: I think the actual regression test outputs are fine, and that your desire to suppress part of the plan tree from showing up in the EXPLAIN output is misguided. I like it just the way it is. To prevent user confusion, I think that when we add support to postgres_fdw for this we might also want to add some documentation explaining how to interpret this EXPLAIN output, but I don't think there's any problem with the output itself. I'm not sure that that's a good idea. One reason for that is I think that that would be more confusing to users when more than two foreign tables are involved in a foreign join, as shown in the following example. Note that the outer plans will be shown recursively. Another reason is there is no consistency between the costs of the outer plans and that of the main plan. I still don't really see a problem here, but, regardless, the solution can't be to hide nodes that are in fact present from the user. We can talk about making further changes here, but hiding the nodes altogether is categorically out in my mind. If you really want to hide the alternative sub-plan, you can move the outer planstate into some private field in BeginForeignScan, then kick ExecProcNode() at the ForeignRecheck callback by itself. Explain walks down the sub-plan if outerPlanState(planstate) is valid. So, as long as your extension keeps the planstate privately, it is not visible from the EXPLAIN. Of course, I don't recommend it. Sorry, my explanation might not be enough, but I'm not saying we should hide the subplan. I think it would be better to show the subplan somewhere in the EXPLAIN output, but I'm not sure that it's a good idea to show it in the current form. We have two plan trees; one for normal query execution and another for EvalPlanQual testing. I think it'd be better to show the EXPLAIN output in a way that allows users to easily identify each of the plan trees.
Best regards, Etsuro Fujita