Re: [HACKERS] Hang in pldebugger after git commit : 98a64d0
On Wed, Dec 14, 2016 at 4:34 PM, Ashutosh Sharma wrote:
> Attached is the patch based on Amit's explanation above. Please have a
> look and let me know for any concerns. Thank you Michael and Andres
> for sharing your thoughts and Amit for your valuable inputs.

 }
+event = &set->events[0];
+event->events &= ~(WL_SOCKET_READABLE);
 #ifndef WIN32

I bet that this patch breaks many things on any non-WIN32 platform. What I think you want to do is modify the event flags associated with the socket read/write event so that they are updated in WaitEventAdjustWin32(), which gets called from ModifyWaitEvent(). By the way, position 0 refers to a socket for FeBeWaitSet, but that's a mandatory thing when a list of events is created with AddWaitEventToSet.
-- 
Michael
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hang in pldebugger after git commit : 98a64d0
Hi,

> Okay, but I think we need to re-enable the existing event handle for
> required event (FD_READ) by using WSAEventSelect() to make it work
> sanely. We have tried something on those lines and it resolved the
> problem. Ashutosh will post a patch on those lines later today. Let
> us know if you have something else in mind.

Attached is the patch based on Amit's explanation above. Please have a look and let me know for any concerns. Thank you Michael and Andres for sharing your thoughts and Amit for your valuable inputs.

reenable_event_handle.patch
Re: [HACKERS] Declarative partitioning - another take
On 2016/12/09 19:46, Maksim Milyutin wrote:
> I would like to work on two tasks:
> - insert (and eventually update) tuple routing for foreign partition.
> - the ability to create an index on the parent and have all of the
>   children inherit it;
>
> The first one has been implemented in pg_pathman somehow, but the code
> relies on dirty hacks, so the FDW API has to be improved. As for the
> extended index support, it doesn't look like a super-hard task.

That would be great! I'd like to help review the first one.

Best regards,
Etsuro Fujita
Re: [HACKERS] Indirect indexes
On Tue, Dec 13, 2016 at 11:11 PM, Pavan Deolasee wrote: > To be fair to myself, I did try to find patches with equal or more > complexity. But most of them had (multiple) reviewers assigned and were > being discussed for weeks and months. I did not think I could contribute > positively to those discussions, mostly because meaningful review at that > point would require much more in-depth knowledge of the area. So I picked > couple of patches which did not have any reviewers. May be not the best > decision, but I did what I thought was correct. I'm a little tired right now because it's almost 2am here, so maybe I should go to bed instead of writing emails when I might be excessively grumpy, but I just don't buy this. There are indeed a number of patches which have provoked lots of discussion already, but there are also many that have had virtually no review at all. Many of the parallel query patches fall into this category, and there are lots of others. CommitFest 2017-01 currently has 62 patches in the "Needs Review" state, and it is just not the case that all 62 of those are under active discussion and review. I bet there are a dozen that have had no replies at all and another dozen that have had only a handful of fairly casual replies without any in-depth discussion. Maybe more. Now, I will agree that diving into a patch in an area that you've never seen before can be challenging and intimidating. But you're not asking any less for your own patch. You would like someone to dive into your patch and figure it out, whether they know that area well or not, and let's face it: most people don't. I don't. I've read the email threads about WARM, but the concept hasn't sunk in for me yet. I understand HOT and I understand the heap and indexes pretty well so I'll figure it out eventually, but I don't have it figured out now and that's only going to get solved if I go put in a bunch of time to go learn. I imagine virtually everyone is in the same position. 
Similarly, I just spent a ton of time reviewing Amit Kapila's work on hash indexes and I don't know a heck of a lot about hash indexes -- virtually nobody does, because that area has been largely ignored for the last decade or so. Yet if everybody uses that as an excuse, then we don't get anywhere. To put that another way, reviewing patches is hard, but we still have to do it if we want to produce a quality release with good features. If we don't do a good job reviewing and therefore don't commit things, we'll have a disappointing release. If we don't do a good job reviewing but commit things anyway, then we'll have a horribly buggy release. If the patch we fail to review adequately happens to be WARM, then those will very possibly be data-corrupting bugs, which are particularly bad for our project's reputation for reliability. Those aren't good options, so we had better try to make sure that we do lots of high-quality review of that patch and all the others. If we can't turn to the people who are writing the patches to help review them, even though that's difficult, then I think that much-needed review won't happen.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] [OSSTEST PATCH 0/1] PostgreSQL db: Retry on constraint violation [and 2 more messages]
On Tue, Dec 13, 2016 at 1:00 PM, Ian Jackson wrote:
> The conversion is as follows: if a scenario is affected by the caveat,
> then there must be at least one transaction T which firstly produces
> "impossible" results I, and in which some later statement S produces a
> serialisation failure.
>
> To exhibit the corresponding unavoidable bug: Execute an identical
> scenario, with exactly the same sequence of steps in the same order,
> up to S. However, instead of S, execute ROLLBACK.

I am having a hard time understanding exactly what the argument on this thread is about, but I want to comment on this point. Saying that a set of transactions is serializable is equivalent to the statement that there is some serial order of execution which would have produced results equivalent to the actual results. That is, there must be at least one schedule (T1, ..., Tn) such that running the transactions one after another in that order would have produced the same results that were obtained by running them concurrently. Any transactions that roll back, whether due to serialization anomalies or manual user intervention or any other reason, are not part of the schedule, which considers only successfully committed transactions. The rolled-back transactions effectively didn't happen, and serializability as a concept makes no claim about the behavior of such transactions. That's not a PostgreSQL thing: that's database theory.

However, in practice, the scenario that you mention should generally work fine in PostgreSQL, I think. If T performed any writes, the rollback throws them away, so imagine removing the actual T from the mix and replacing it with a substitute version T' which performs the same reads but no writes and then tries to COMMIT where T tried to ROLLBACK. T' will succeed, because we never roll back a read-only transaction at commit time. If it were to fail, it would have to fail *while performing one of the reads*, not later. 
But imagine a hypothetical database system in which anomalies are never detected until commit time. We somehow track the global must-happen-before graph and refuse the commit of any transaction which will create a cycle. Let's also suppose that this system uses snapshots to implement MVCC. In such a system, read-only transactions will sometimes fail at commit time if they've seen a view of the world that is inconsistent with the only remaining possible serial schedules. For example, suppose T1 updates A -> A' and reads B. Concurrently, T2 updates B -> B'; thus, T1 must precede T2. Next, T2 commits. Now, T3 begins and reads B', so that T2 must precede T3. Next T1 commits. T3, which took its snapshot before the commit of T1, now reads A. Thus T3 has to precede T1. That's a cycle, so T3 won't be allowed to commit, and yet it's done a couple of reads and hasn't failed yet... because of an implementation detail of the system. That's probably annoying from a user perspective -- if a transaction is certainly going to fail we'd like to report the failure as early as possible -- and it's probably crushingly slow, but according to my understanding it's unarguably a correct implementation of serializability (assuming there are no bugs), yet it doesn't deliver the guarantee you're asking for here.

I have not read any database literature on the interaction of serializability with subtransactions. This seems very thorny. Suppose T1 reads A and B and updates A -> A' while concurrently T2 reads A and B and updates B -> B'. This is obviously not serializable; if either transaction had executed before the other in a serial schedule, the second transaction in the schedule would have had to have seen (A, B') or (A', B) rather than (A, B), but that's not what happened. But what if each of T1 and T2 did the reads in a subtransaction, rolled it back, and then did the write in the main transaction and committed? The database system has two options. 
First, it could assume that the toplevel transaction may have relied on the results of the aborted subtransaction. But if it does that, then any serialization failure which afflicts a subtransaction must necessarily also kill the main transaction, which seems pedantic and unhelpful. If you'd wanted the toplevel transaction to be killed, you wouldn't have used a subtransaction, right? Second, it could assume that the toplevel transaction in no way relied on or used the values obtained from the aborted subtransaction. However, that defeats the whole purpose of having subtransactions in the first place. What's the point of being able to start subtransactions if you can't roll them back and then decide to do something else instead? It seems to me that what the database system should do is make that second assumption, and what the user should do is understand that to the degree that transactions depend on the results of aborted subtransactions, there may be serialization anomalies that go undetected. And the user should put up with that because t
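The commit-time cycle described earlier (T1 must precede T2, T2 must precede T3, T3 must precede T1) can be sketched as a tiny must-happen-before graph. This is an illustrative simulation only, not PostgreSQL's SSI machinery; the graph representation and function names are invented for the example.

```python
# Illustrative sketch (not PostgreSQL code): build the must-happen-before
# graph from the T1/T2/T3 scenario and check for a cycle at commit time.

def has_cycle(edges):
    """Detect a cycle in a directed graph given as {node: set_of_successors};
    every node must appear as a key."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {n: WHITE for n in edges}

    def visit(n):
        color[n] = GRAY
        for m in edges.get(n, ()):
            if color[m] == GRAY:
                return True          # back edge: cycle found
            if color[m] == WHITE and visit(m):
                return True
        color[n] = BLACK
        return False

    return any(visit(n) for n in edges if color[n] == WHITE)

# T1 updates A and reads B   -> T1 must precede T2 (which overwrote B)
# T3 reads B' written by T2  -> T2 must precede T3
# T3 reads pre-T1 A from its snapshot -> T3 must precede T1
must_precede = {
    "T1": {"T2"},
    "T2": {"T3"},
    "T3": {"T1"},
}
print(has_cycle(must_precede))  # True: the cycle T1 -> T2 -> T3 -> T1
```

A hypothetical commit-time checker would refuse T3's commit on seeing this cycle, which is exactly the "fails only at COMMIT, after successful reads" behavior described above.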
Re: [HACKERS] Hang in pldebugger after git commit : 98a64d0
On Tue, Dec 13, 2016 at 9:49 PM, Andres Freund wrote:
> On 2016-12-12 16:46:38 +0900, Michael Paquier wrote:
>> OK, I think that I have spotted an issue after additional read of the
>> code. When a WSA event is used for read/write activity on a socket,
>> the same WSA event gets reused again and again. That's fine for
>> performance reasons
>
> It's actually also required to not lose events,
> i.e. correctness. Windows events are edge, not level triggered.

Are all Windows events edge triggered? From what I read on MSDN [1], it doesn't appear so. I am quoting text from the MSDN page [1] which seems to be saying the event FD_READ we use in this case is level triggered.

"For FD_READ, FD_OOB, and FD_ACCEPT network events, network event recording and event object signaling are level-triggered."

> The previous implementation wasn't correct. So just resetting things
> ain't ok.

Okay, but I think we need to re-enable the existing event handle for the required event (FD_READ) by using WSAEventSelect() to make it work sanely. We have tried something on those lines and it resolved the problem. Ashutosh will post a patch on those lines later today. Let us know if you have something else in mind.

[1] - https://msdn.microsoft.com/en-us/library/windows/desktop/ms741576(v=vs.85).aspx

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
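The edge- versus level-triggered distinction being debated above can be shown with a toy model (this is not the Win32 API, just an invented simulation): a level-triggered poll keeps reporting readiness while data remains buffered, whereas an edge-triggered poll fires only on the arrival transition, so a consumer that does a partial read and never re-arms the event silently strands data.

```python
# Toy model (not the Windows API): contrast level- and edge-triggered
# readiness reporting for a buffered "socket".

class ToySocket:
    def __init__(self):
        self.buffered = 0        # bytes waiting to be read
        self._edge_armed = True  # edge mode reports only new arrivals

    def arrive(self, nbytes):
        self.buffered += nbytes
        self._edge_armed = True  # new data re-arms the edge event

    def read(self, nbytes):
        self.buffered = max(0, self.buffered - nbytes)

    def poll_level(self):
        # Level-triggered: ready as long as anything is buffered.
        return self.buffered > 0

    def poll_edge(self):
        # Edge-triggered: ready only once per arrival "edge".
        ready = self._edge_armed and self.buffered > 0
        self._edge_armed = False
        return ready

s = ToySocket()
s.arrive(100)
s.read(10)             # partial read: 90 bytes still buffered
print(s.poll_level())  # True: level mode still reports readable
print(s.poll_edge())   # True: first poll after the arrival fires
print(s.poll_edge())   # False: no new arrival, remaining data stranded
```

Under edge semantics, losing the "armed" state without draining the buffer is exactly the lost-wakeup hazard Andres describes; under level semantics, merely resetting and re-polling would still see the buffered data.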
Re: [HACKERS] Indirect indexes
On Thu, Dec 8, 2016 at 3:11 AM, Robert Haas wrote:
> On Thu, Oct 20, 2016 at 11:30 AM, Pavan Deolasee wrote:
> > We have a design to convert WARM chains back to HOT and that will
> > increase the percentage of WARM updates much beyond 50%. I was waiting
> > for feedback on the basic patch before putting in more efforts, but it
> > went unnoticed last CF.
>
> While you did sign up to review one patch in the last CF, the amount
> of review you did for that patch is surely an order of magnitude less
> than what WARM will require. Maybe more than that. I don't mean to
> point the finger at you specifically -- there are lots of people
> slinging patches into the CommitFest who aren't doing as much review
> as their own patches will require.

I understand the point you're trying to make and I am not complaining that WARM did not receive a review, even though I would very much like that to happen, because I see it as the right step forward in solving some of the problems Postgres users face. But I also understand that every committer and developer is busy with the stuff that they see as important and interesting.

To be fair to myself, I did try to find patches with equal or more complexity. But most of them had (multiple) reviewers assigned and were being discussed for weeks and months. I did not think I could contribute positively to those discussions, mostly because meaningful review at that point would require much more in-depth knowledge of the area. So I picked a couple of patches which did not have any reviewers. Maybe not the best decision, but I did what I thought was correct.

I'll try to do more review in the next CF. Obviously that does not guarantee that WARM will see a reviewer, but hopefully I would have done my bit.

Thanks,
Pavan

-- 
Pavan Deolasee
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Declarative partitioning - another take
On Mon, Dec 12, 2016 at 3:06 PM, Amit Langote wrote: > > Hi, > > On 2016/12/11 10:02, Venkata B Nagothi wrote: > > On Fri, Dec 9, 2016 at 11:11 PM, Amit Langote > > wrote: > >> On Fri, Dec 9, 2016 at 3:16 PM, Venkata B Nagothi > >> wrote: > >>> I am testing the partitioning feature from the latest master and got > the > >>> following error while loading the data - > >>> > >>> db01=# create table orders_y1993 PARTITION OF orders FOR VALUES FROM > >>> ('1993-01-01') TO ('1993-12-31'); > >>> CREATE TABLE > >>> > >>> db01=# copy orders from '/data/orders-1993.csv' delimiter '|'; > >>> ERROR: could not read block 6060 in file "base/16384/16412": read only > >> 0 of > >>> 8192 bytes > >>> CONTEXT: COPY orders, line 376589: > >>> "9876391|374509|O|54847|1997-07-16|3-MEDIUM > >> |Clerk#01993|0|ithely > >>> regular pack" > >> > >> Hmm. Could you tell what relation the file/relfilenode 16412 belongs > to? > >> > > > > db01=# select relname from pg_class where relfilenode=16412 ; > >relname > > -- > > orders_y1997 > > (1 row) > > > > > > I VACUUMED the partition and then re-ran the copy command and no luck. > > > > db01=# vacuum orders_y1997; > > VACUUM > > > > db01=# copy orders from '/data/orders-1993.csv' delimiter '|'; > > ERROR: could not read block 6060 in file "base/16384/16412": read only 0 > > of 8192 bytes > > CONTEXT: COPY orders, line 376589: > > "9876391|374509|O|54847|1997-07-16|3-MEDIUM > |Clerk#01993|0|ithely > > regular pack" > > > > I do not quite understand the below behaviour as well. I VACUUMED 1997 > > partition and then i got an error for 1992 partition and then after 1996 > > and then after 1994 and so on. > > > > [ ... 
]

> > db01=# vacuum orders_y1997;
> > VACUUM
> > db01=# copy orders from '/data/orders-1993.csv' delimiter '|';
> > ERROR: could not read block 6060 in file "base/16384/16412": read only 0
> > of 8192 bytes
> > CONTEXT: COPY orders, line 376589:
> > "9876391|374509|O|54847|1997-07-16|3-MEDIUM |Clerk#01993|0|ithely
> > regular pack"
> > db01=#
> >
> > Am I not understanding anything here ?
>
> I could not reproduce this issue. Also, I could not say what might have
> gone wrong based only on the information I have seen so far.
>
> Have you tried inserting the same data using insert?

I can load the data into appropriate partitions using INSERT. So, no issues there.

db01=# CREATE TABLE orders2(
         o_orderkey INTEGER,
         o_custkey INTEGER,
         o_orderstatus CHAR(1),
         o_totalprice REAL,
         o_orderdate DATE,
         o_orderpriority CHAR(15),
         o_clerk CHAR(15),
         o_shippriority INTEGER,
         o_comment VARCHAR(79)) partition by (o_orderdate);

db01=# insert into orders2 select * from orders where o_orderdate='1995-10-11';
INSERT 0 3110

> create table orders_unpartitioned (like orders);
> copy orders_unpartitioned from '/data/orders-1993.csv';
> insert into orders select * from orders_unpartitioned;

Loading the data into a normal table is not an issue (in fact the CSV is generated from the table itself). The issue is occurring only when I am trying to load the data from the CSV file into a partitioned table:

db01=# CREATE TABLE orders_y1992 PARTITION OF orders2 FOR VALUES FROM ('1992-01-01') TO ('1992-12-31');
CREATE TABLE
db01=# copy orders2 from '/data/orders-1993.csv' delimiter '|';
ERROR: could not read block 6060 in file "base/16384/16407": read only 0 of 8192 bytes
CONTEXT: COPY orders2, line 376589: "9876391|374509|O|54847|1997-07-16|3-MEDIUM |Clerk#01993|0|ithely regular pack"

Not sure why COPY is failing.

Regards,
Venkata B N
Database Consultant
Re: [HACKERS] pg_background contrib module proposal
On 13 Dec. 2016 20:54, "amul sul" wrote:

postgres=> select * from pg_background_result(67069) as (x text);
ERROR: terminating connection due to administrator command
CONTEXT: background worker, pid 67069
postgres=>

It'll also want to handle cancellation due to conflict with recovery if you intend it to be used on a standby, probably by making use of procsignal_sigusr1_handler. The rest of the work is done by CHECK_FOR_INTERRUPTS(). This only matters if it's meant to work on standbys, of course. I haven't checked if you write to catalogs or otherwise do non-standby-friendly things.
Re: [HACKERS] [OSSTEST PATCH 0/1] PostgreSQL db: Retry on constraint violation
On Mon, Dec 12, 2016 at 4:46 PM, Thomas Munro wrote:
> On Tue, Dec 13, 2016 at 10:47 AM, Kevin Grittner wrote:
>> Barring objections I will back-patch to 9.2 through 9.5 tomorrow.
>> (9.1 is out of support and the fix is already in 9.6 and forward.)
>
> +1

Done.

I will add a test case based on Ian's work to the master branch to show the bug if this were to be reverted, but first want to look for similar cases, so we know the scope of the issue and we fix what we can. I probably won't get to that until early next month, since I'm on vacation the last two weeks in December and am trying to tie up some loose ends on other things first.

Thanks, Ian, for the example to show that this should be considered a bug!

-- 
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Hang in pldebugger after git commit : 98a64d0
On Wed, Dec 14, 2016 at 1:14 AM, Ashutosh Sharma wrote:
> I have been working on this issue for the last few days and it seems like
> I have almost found the reason for this failure. My findings till date
> are as follows.
>
> Just to confirm if the problem is happening due to reusability of the
> WaitEventSet structure, I had replaced the calls to ModifyWaitEvent()
> and WaitEventSetWait(), which make use of an already existing
> WaitEventSet to store event handles, with WaitLatchOrSocket(), and it
> worked. Actually WaitLatchOrSocket() creates a new WaitEventSet
> structure every time it has to wait for an event and is also being
> used in the old code. This clearly shows that we do have some problem
> just because we are reusing the same set of object handles throughout
> a backend session. I am further investigating this and would post
> the final analysis along with the fix very soon. Attached is the
> patch that has the changes described above. Any suggestions or inputs
> would be appreciated.

So it would mean that there is a mismatch between what should be used for the wait event and what is actually used. One simple idea if you want to compare both would be to generate some elog(LOG) entries or whatever on the fields we are interested in and see what is mismatching. That may help in understanding what's wrong in the wait events used.
-- 
Michael
Re: [HACKERS] Logical Replication WIP
On 13/12/16 21:42, Peter Eisentraut wrote:
> On 12/10/16 2:48 AM, Petr Jelinek wrote:
>> Attached new version with your updates and rebased on top of the current
>> HEAD (the partitioning patch produced quite a few conflicts).
>
> I have attached a few more "fixup" patches, mostly with some editing of
> documentation and comments and some compiler warnings.
>
> In 0006 in the protocol documentation I have left a "XXX ???" where I
> didn't understand what it was trying to say.

Okay, I'll address that separately, thanks.

> All issues from (my) previous reviews appear to have been addressed.
>
> Comments besides that:
>
> 0003-Add-SUBSCRIPTION-catalog-and-DDL-v12.patch
>
> Still wondering about the best workflow with pg_dump, but it seems all
> the pieces are there right now, and the interfaces can be tweaked later.

Right, either way there needs to be some special handling for subscriptions; having to request them specifically seems the safest option to me, but I am open to suggestions there.

> DROP SUBSCRIPTION requires superuser, but should perhaps be owner check
> only?

Hmm, I am not sure that it requires superuser; I actually think it mistakenly didn't require anything. In any case I will make sure it just does an owner check.

> DROP SUBSCRIPTION IF EXISTS crashes if the subscription does not in fact
> exist.

Right, missing return.

> Maybe write the grammar so that SLOT does not need to be a new key word.
> The changes you made for CREATE PUBLICATION should allow that.

Hmm, what would that look like? Would the opt_drop_slot become IDENT IDENT? Or maybe you want me to add the WITH (definition) kind of thing?

> The tests are not added to serial_schedule. Intentional? If so, document?

Not intentional, will fix. I never use it, so it's easy to forget about.

> 0004-Define-logical-replication-protocol-and-output-plugi-v12.patch
>
> Not sure why pg_catalog is encoded as a zero-length string. I guess it
> saves some space. Maybe that could be explained in a brief code comment?
Yes, it's to save space, mainly for built-in types.

> 0005-Add-logical-replication-workers-v12.patch
>
> The way the executor stuff is organized now looks better to me.
>
> The subscriber crashes if max_replication_slots is 0:
>
> TRAP: FailedAssertion("!(max_replication_slots > 0)", File: "origin.c",
> Line: 999)
>
> The documentation says that replication slots are required on the
> subscriber, but from a user's perspective, it's not clear why that is.

Yeah, honestly I think origins should not depend on max_replication_slots. They are not really connected (you can have many of one and none of the other and vice versa). Also max_replication_slots should IMHO default to max_wal_senders at this point. (In an ideal world all three of those would be in DSM instead of SHM and only governed by some implementation maximum, which is probably 2^16, and the GUCs would be removed.) But yes, as it is, we should check for that, probably both during CREATE SUBSCRIPTION and during apply start.

> Dropping a table that is part of a live subscription results in log
> messages like
>
> WARNING: leaked hash_seq_search scan for hash table 0x7f9d2a807238
>
> I was testing replicating into a temporary table, which failed like this:
>
> FATAL: the logical replication target public.test1 not found
> LOG: worker process: (PID 2879) exited with exit code 1
> LOG: starting logical replication worker for subscription 16392
> LOG: logical replication apply for subscription mysub started
>
> That's okay, but those messages were repeated every few seconds or so
> and would create quite some log volume. I wonder if that needs to be
> reined in somewhat.

It retries every 5s or so I think; I am not sure how that could be improved besides using wal_retrieve_retry_interval instead of the hardcoded 5s (or maybe better, adding a GUC for apply). Maybe some kind of backoff algorithm could be added as well. 
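The backoff idea floated above is easy to sketch. This is purely illustrative (not code from the patch, and the initial/cap values are invented): exponential backoff starting from the current 5s retry, capped so a long-broken subscription doesn't back off forever.

```python
# Illustrative sketch of the suggested backoff (not from the patch):
# exponential retry delays for apply-worker restarts, capped at 5 minutes.

def backoff_delays(initial=5.0, cap=300.0, factor=2.0):
    """Yield successive retry delays: 5s, 10s, 20s, ... capped at `cap`."""
    delay = initial
    while True:
        yield delay
        delay = min(delay * factor, cap)

gen = backoff_delays()
print([next(gen) for _ in range(8)])
# [5.0, 10.0, 20.0, 40.0, 80.0, 160.0, 300.0, 300.0]
```

A successful connection would reset the generator, so transient failures still retry quickly while persistent ones stop flooding the log.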
-- 
Petr Jelinek
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Logical Replication WIP
On 13/12/16 22:05, Andres Freund wrote:
> Hi,
>
> On 2016-12-13 15:42:17 -0500, Peter Eisentraut wrote:
>> I think this is getting very close to the point where it's committable.
>> So if anyone else has major concerns about the whole approach and
>> perhaps the way the new code in 0005 is organized, now would be the time ...
>
> Uh. The whole cache invalidation thing is completely unresolved, and
> that's just the publication patch. I've not looked in detail at later
> patches. So no, I don't think so.

I already have code for that. I'll submit the next version once I go over PeterE's review. BTW the relcache thing is not as bad as it seems from the publication patch, because the output plugin has to deal with relcache/publication cache invalidations and handles most of the updates correctly. But there was still a problem in terms of the write filtering, so the publications still have to reset the relcache too.

-- 
Petr Jelinek
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Re: [sqlsmith] FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)", File: "xlog.c", Line: 10200)
On Wed, Dec 14, 2016 at 9:25 AM, Kyotaro HORIGUCHI wrote:
> Ah, sorry for the confusion.
>
> At Tue, 13 Dec 2016 22:43:22 +0900, Michael Paquier wrote in
>> On Tue, Dec 13, 2016 at 7:15 PM, Kyotaro HORIGUCHI wrote:
>> > Thank you for the comment.
>>
>> Er, it is good to see more people interested in this problem... As the
>> former author, still working actively on this patch, perhaps it would
>> be better if I continue to code this patch, no? It is a waste to step
>> on each other's toes.
>
> You are the author of this patch. I addressed that since the
> comments are mainly on typos, not function. I think you do better
> than me so the v6 from me should be abandoned.

No problem. Thanks for the help!
-- 
Michael
Re: [HACKERS] Logical Replication WIP
On 14/12/16 01:26, Peter Eisentraut wrote:
> On 12/12/16 7:33 PM, Andres Freund wrote:
>> +-- test switching between slots in a session
>> +SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot1', 'test_decoding', true);
>> +SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot2', 'test_decoding', true);
>> +SELECT * FROM pg_logical_slot_get_changes('regression_slot1', NULL, NULL);
>> +SELECT * FROM pg_logical_slot_get_changes('regression_slot2', NULL, NULL);
>>
>> Can we actually output something? Right now this doesn't test that
>> much...
>
> This test was added because an earlier version of the patch would crash
> on this.

I did improve the test as part of the test improvements that were sent to the committers list, btw.

-- 
Petr Jelinek
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Re: [sqlsmith] FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)", File: "xlog.c", Line: 10200)
Ah, sorry for the confusion.

At Tue, 13 Dec 2016 22:43:22 +0900, Michael Paquier wrote in
> On Tue, Dec 13, 2016 at 7:15 PM, Kyotaro HORIGUCHI wrote:
> > Thank you for the comment.
>
> Er, it is good to see more people interested in this problem... As the
> former author, still working actively on this patch, perhaps it would
> be better if I continue to code this patch, no? It is a waste to step
> on each other's toes.

You are the author of this patch. I addressed that since the comments are mainly on typos, not function. I think you do better than me, so the v6 from me should be abandoned.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
Re: [HACKERS] Logical Replication WIP
On 12/12/16 7:33 PM, Andres Freund wrote:
> +-- test switching between slots in a session
> +SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot1', 'test_decoding', true);
> +SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot2', 'test_decoding', true);
> +SELECT * FROM pg_logical_slot_get_changes('regression_slot1', NULL, NULL);
> +SELECT * FROM pg_logical_slot_get_changes('regression_slot2', NULL, NULL);
>
> Can we actually output something? Right now this doesn't test that
> much...

This test was added because an earlier version of the patch would crash on this.

-- 
Peter Eisentraut
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] exposing wait events for non-backends (was: Tracking wait event for latches)
On Tue, Dec 13, 2016 at 11:40:54AM -0500, Robert Haas wrote:
> Let's confine ourselves to fixing one problem at a time. I think we
> can get where we want to be in this case by adding one new column and
> some new rows to pg_stat_activity.

Agreed. Let's also remove the abuse of WAL senders with the query field at the same time.

> Michael, is that something you're
> going to do? If not, one of my esteemed colleagues here at
> EnterpriseDB will have a try.

If I had received feedback on the other thread, I would have coded a proposal of patch already. But as long as SCRAM is not done I will refrain from taking on an extra project. I am fine to do reviews, as I already looked at ways to solve the problem. So if anybody has room to do it, please be my guest.

Regarding the way to solve things, I think that having in ProcGlobal an array of PGPROC entries for each auxiliary process is the way to go, the position of each entry in the array defining what the process type is. That would waste some shared memory, but we are not talking about that much here. That would as well remove the need of having checkpointerLatch, walWriteLatch and the startup fields in ProcGlobal.
-- 
Michael
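The scheme sketched above — one reserved slot per auxiliary process type, so the array position alone identifies the type — can be illustrated with a toy model (the type names and table structure here are invented for the example, not PostgreSQL's actual ProcGlobal layout):

```python
# Illustrative sketch (invented names, not PostgreSQL code): reserve one
# fixed slot per auxiliary process type; the index identifies the type,
# at the cost of a little memory for slots that stay empty.
from enum import IntEnum

class AuxProcType(IntEnum):
    CHECKPOINTER = 0
    BGWRITER = 1
    WALWRITER = 2
    STARTUP = 3

class AuxProcTable:
    def __init__(self):
        # One preallocated entry per type; None means "not running".
        self.slots = [None] * len(AuxProcType)

    def register(self, proc_type, pid):
        self.slots[proc_type] = {"pid": pid}

    def type_of(self, slot_index):
        # The position alone tells us what kind of process this is.
        return AuxProcType(slot_index)

table = AuxProcTable()
table.register(AuxProcType.WALWRITER, 4242)
print(table.type_of(2).name)                  # WALWRITER
print(table.slots[AuxProcType.CHECKPOINTER])  # None: reserved but empty
```

The trade-off is exactly the one named in the email: a few always-allocated slots in exchange for dropping the per-process latch fields and type bookkeeping.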
Re: [HACKERS] Parallel safety of CURRENT_* family
On Thu, Dec 1, 2016 at 3:46 PM, Tom Lane wrote: > Robert Haas writes: >> On Thu, Dec 1, 2016 at 2:32 PM, Tom Lane wrote: >>> ... well, they would be if we passed down xactStartTimestamp to parallel >>> workers, but I can't find any code that does that. In view of the fact that >>> transaction_timestamp() is marked as parallel-safe, this is a bug in 9.6. > >> Yeah. Do you think we should arrange to pass that down, or change the >> marking? > > We can't fix the marking in existing 9.6 installations, so I think we > have to pass it down. (Which would be a better response anyway.) I happened across this thread today and took a look at what it would take to fix this. I quickly ran up against the fact that SerializeTransactionState() and RestoreTransactionState() are not exactly brilliantly designed, relying on the notion that each individual value that we want to serialize will be no wider than a TransactionId, which won't be true for timestamps. Even apart from that, the whole design of those functions is pretty lame, and I'm pretty sure I wrote all of that code myself, so I have nobody to blame but me. Anyway, here's a proposed patch to refactor that code into something a little more reasonable. It doesn't fix the actual problem here, but I think it's a good first step. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c index d643216..9c13717 100644 --- a/src/backend/access/transam/xact.c +++ b/src/backend/access/transam/xact.c @@ -65,6 +65,23 @@ #include "utils/timestamp.h" #include "pg_trace.h" +/* + * Serialization format for transaction state information. 
+ */ +typedef struct SerializedTransactionState +{ + int xactIsoLevel; + bool xactDeferrable; + TransactionId xactTopTransactionId; + TransactionId currentTransactionId; + CommandId currentCommandId; + int nCurrentXids; + TransactionId currentXids[FLEXIBLE_ARRAY_MEMBER]; +} SerializedTransactionState; + +#define SIZE_OF_SERIALIZED_TRANSACTION_STATE(n_current_xids) \ + offsetof(SerializedTransactionState, currentXids) + \ + (n_current_xids * sizeof(TransactionId)) /* * User-tweakable parameters @@ -4812,8 +4829,7 @@ Size EstimateTransactionStateSpace(void) { TransactionState s; - Size nxids = 6; /* iso level, deferrable, top & current XID, - * command counter, XID count */ + Size nxids = 0; for (s = CurrentTransactionState; s != NULL; s = s->parent) { @@ -4823,7 +4839,7 @@ EstimateTransactionStateSpace(void) } nxids = add_size(nxids, nParallelCurrentXids); - return mul_size(nxids, sizeof(TransactionId)); + return SIZE_OF_SERIALIZED_TRANSACTION_STATE(nxids); } /* @@ -4847,16 +4863,17 @@ SerializeTransactionState(Size maxsize, char *start_address) TransactionState s; Size nxids = 0; Size i = 0; - Size c = 0; TransactionId *workspace; - TransactionId *result = (TransactionId *) start_address; + SerializedTransactionState *result; + + result = (SerializedTransactionState *) start_address; - result[c++] = (TransactionId) XactIsoLevel; - result[c++] = (TransactionId) XactDeferrable; - result[c++] = XactTopTransactionId; - result[c++] = CurrentTransactionState->transactionId; - result[c++] = (TransactionId) currentCommandId; - Assert(maxsize >= c * sizeof(TransactionId)); + Assert(maxsize >= SIZE_OF_SERIALIZED_TRANSACTION_STATE(0)); + result->xactIsoLevel = XactIsoLevel; + result->xactDeferrable = XactDeferrable; + result->xactTopTransactionId = XactTopTransactionId; + result->currentTransactionId = CurrentTransactionState->transactionId; + result->currentCommandId = currentCommandId; /* * If we're running in a parallel worker and launching a parallel worker @@ -4865,9 
+4882,10 @@ SerializeTransactionState(Size maxsize, char *start_address) */ if (nParallelCurrentXids > 0) { - result[c++] = nParallelCurrentXids; - Assert(maxsize >= (nParallelCurrentXids + c) * sizeof(TransactionId)); - memcpy(&result[c], ParallelCurrentXids, + result->nCurrentXids = nParallelCurrentXids; + Assert(maxsize >= + SIZE_OF_SERIALIZED_TRANSACTION_STATE(nParallelCurrentXids)); + memcpy(&result->currentXids, ParallelCurrentXids, nParallelCurrentXids * sizeof(TransactionId)); return; } @@ -4882,7 +4900,7 @@ SerializeTransactionState(Size maxsize, char *start_address) nxids = add_size(nxids, 1); nxids = add_size(nxids, s->nChildXids); } - Assert((c + 1 + nxids) * sizeof(TransactionId) <= maxsize); + Assert(SIZE_OF_SERIALIZED_TRANSACTION_STATE(nxids) <= maxsize); /* Copy them to our scratch space. */ workspace = palloc(nxids * sizeof(TransactionId)); @@ -4900,8 +4918,8 @@ SerializeTransactionState(Size maxsize, char *start_address) qsort(workspace, nxids, sizeof(TransactionId), xidComparator); /* Copy data into output area. */ - result[c++] = (TransactionId) nxids; - memcpy(&result[c], workspace, nxids * sizeof(TransactionId)); + result->nCur
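[For readers unfamiliar with the idiom the patch above relies on: sizing a variable-length struct with a C99 flexible array member via offsetof — what the patch's SIZE_OF_SERIALIZED_TRANSACTION_STATE macro does — is a standard pattern. A minimal, self-contained sketch with hypothetical names, not the actual xact.c code:]

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uint32_t Xid;

/* Fixed-size header followed by a variable number of xids, mirroring the
 * patch's SerializedTransactionState layout. */
typedef struct DemoState
{
    int  isoLevel;
    int  nXids;
    Xid  xids[];                /* C99 flexible array member */
} DemoState;

/* Space needed for n trailing xids -- the analogue of the patch's
 * SIZE_OF_SERIALIZED_TRANSACTION_STATE(n) macro. */
#define DEMO_STATE_SIZE(n) \
    (offsetof(DemoState, xids) + (n) * sizeof(Xid))

/* Serialize into a caller-provided buffer of at least DEMO_STATE_SIZE(n). */
void demo_serialize(char *buf, int isoLevel, const Xid *xids, int n)
{
    DemoState *s = (DemoState *) buf;

    s->isoLevel = isoLevel;
    s->nXids = n;
    memcpy(s->xids, xids, n * sizeof(Xid));
}

/* Round-trip three xids through a heap buffer and report the count read
 * back (malloc guarantees suitable alignment for the struct overlay). */
int demo_roundtrip(void)
{
    Xid  xs[3] = {10, 11, 12};
    char *buf = malloc(DEMO_STATE_SIZE(3));
    int  n;

    demo_serialize(buf, 1, xs, 3);
    n = ((DemoState *) buf)->nXids;
    free(buf);
    return n;
}
```

[The advantage over the old scheme criticized in the message is that each field keeps its natural type, instead of every value being squeezed through a TransactionId-sized slot.]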
Re: [HACKERS] Logical Replication WIP
On 2016-12-13 06:55:31 +0100, Petr Jelinek wrote: > >> This is a quadratic algorithm - that could bite us... Not sure if we > >> need to care. If we want to fix it, one approach would be to use > >> RangeVarGetRelid() instead, and then do a qsort/deduplicate before > >> actually opening the relations. > >> > > > > I guess it could get really slow only with a big inheritance tree; I'll > > look into how much work the other way of doing things is (this is not > > exactly a hot code path). > > > > Actually, looking at it, it only processes user input, so I don't think > it's very problematic in terms of performance. You'd have to pass many > thousands of tables in a single DDL statement to notice. Well, at least we should put a CHECK_FOR_INTERRUPTS() there. At the moment it's IIRC uninterruptible, which isn't good for something directly triggered by the user. A comment that it's known to be O(n^2), but considered acceptable, would be good too. Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
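[For illustration, the qsort/deduplicate approach suggested above replaces an O(n^2) scan-per-element with an O(n log n) sort followed by one linear pass. A generic sketch, not the actual patch code:]

```c
#include <stdlib.h>

typedef unsigned int Oid;       /* stand-in for PostgreSQL's Oid */

static int oid_cmp(const void *a, const void *b)
{
    Oid x = *(const Oid *) a;
    Oid y = *(const Oid *) b;

    return (x > y) - (x < y);
}

/* Sort and de-duplicate in place, returning the new element count.
 * O(n log n) overall, versus the O(n^2) membership scan the thread
 * worries about; a CHECK_FOR_INTERRUPTS()-style cancellation check
 * could be placed in the final loop to keep it interruptible. */
size_t sort_dedup_oids(Oid *oids, size_t n)
{
    size_t out = 0;
    size_t i;

    if (n == 0)
        return 0;
    qsort(oids, n, sizeof(Oid), oid_cmp);
    for (i = 1; i < n; i++)
    {
        if (oids[i] != oids[out])
            oids[++out] = oids[i];
    }
    return out + 1;
}
```

[After this, "already seen" checks become a binary search, or disappear entirely since duplicates are gone before the relations are opened.]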
Re: [HACKERS] Logical Replication WIP
Hi, On 2016-12-13 15:42:17 -0500, Peter Eisentraut wrote: > I think this is getting very close to the point where it's committable. > So if anyone else has major concerns about the whole approach and > perhaps the way the new code in 0005 is organized, now would be the time ... Uh. The whole cache invalidation thing is completely unresolved, and that's just the publication patch. I've not looked in detail at later patches. So no, I don't think so. I think after the invalidation issue is resolved the publication patch might be close to being ready. I'm doubtful the later patches are. Greetings, Andres Freund
Re: [HACKERS] Logical Replication WIP
On 12/10/16 2:48 AM, Petr Jelinek wrote: > Attached new version with your updates and rebased on top of the current > HEAD (the partitioning patch produced quite a few conflicts). I have attached a few more "fixup" patches, mostly with some editing of documentation and comments and some compiler warnings. In 0006 in the protocol documentation I have left a "XXX ???" where I didn't understand what it was trying to say. All issues from (my) previous reviews appear to have been addressed. Comments besides that: 0003-Add-SUBSCRIPTION-catalog-and-DDL-v12.patch Still wondering about the best workflow with pg_dump, but it seems all the pieces are there right now, and the interfaces can be tweaked later. DROP SUBSCRIPTION requires superuser, but should perhaps be owner check only? DROP SUBSCRIPTION IF EXISTS crashes if the subscription does not in fact exist. Maybe write the grammar so that SLOT does not need to be a new key word. The changes you made for CREATE PUBLICATION should allow that. The tests are not added to serial_schedule. Intentional? If so, document? 0004-Define-logical-replication-protocol-and-output-plugi-v12.patch Not sure why pg_catalog is encoded as a zero-length string. I guess it saves some space. Maybe that could be explained in a brief code comment? 0005-Add-logical-replication-workers-v12.patch The way the executor stuff is organized now looks better to me. The subscriber crashes if max_replication_slots is 0: TRAP: FailedAssertion("!(max_replication_slots > 0)", File: "origin.c", Line: 999) The documentation says that replication slots are required on the subscriber, but from a user's perspective, it's not clear why that is. 
Dropping a table that is part of a live subscription results in log messages like WARNING: leaked hash_seq_search scan for hash table 0x7f9d2a807238 I was testing replicating into a temporary table, which failed like this: FATAL: the logical replication target public.test1 not found LOG: worker process: (PID 2879) exited with exit code 1 LOG: starting logical replication worker for subscription 16392 LOG: logical replication apply for subscription mysub started That's okay, but those messages were repeated every few seconds or so and would create quite some log volume. I wonder if that needs to be reigned in somewhat. I think this is getting very close to the point where it's committable. So if anyone else has major concerns about the whole approach and perhaps the way the new code in 0005 is organized, now would be the time ... -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services From df169d3fec6d67c3daf301d9ed9903545358dd7e Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Mon, 12 Dec 2016 12:00:00 -0500 Subject: [PATCH 2/8] fixup! Add PUBLICATION catalogs and DDL --- src/backend/commands/publicationcmds.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/src/backend/commands/publicationcmds.c b/src/backend/commands/publicationcmds.c index 954b2bd65d..f1d7404ffe 100644 --- a/src/backend/commands/publicationcmds.c +++ b/src/backend/commands/publicationcmds.c @@ -389,7 +389,6 @@ AlterPublication(AlterPublicationStmt *stmt) { Relation rel; HeapTuple tup; - AclResult aclresult; rel = heap_open(PublicationRelationId, RowExclusiveLock); @@ -564,12 +563,11 @@ PublicationAddTables(Oid pubid, List *rels, bool if_not_exists, foreach(lc, rels) { Relation rel = (Relation) lfirst(lc); - AclResult aclresult; ObjectAddress obj; /* Must be owner of the table or superuser. 
*/ if (!pg_class_ownercheck(RelationGetRelid(rel), GetUserId())) - aclcheck_error(aclresult, ACL_KIND_CLASS, + aclcheck_error(ACLCHECK_NOT_OWNER, ACL_KIND_CLASS, RelationGetRelationName(rel)); obj = publication_add_relation(pubid, rel, if_not_exists); -- 2.11.0 From 2cab724505487663b53f8af09c8e16a70c52014d Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Fri, 9 Dec 2016 12:00:00 -0500 Subject: [PATCH 4/8] fixup! Add SUBSCRIPTION catalog and DDL --- doc/src/sgml/catalogs.sgml | 28 ++--- doc/src/sgml/ref/alter_subscription.sgml | 25 +--- doc/src/sgml/ref/create_subscription.sgml | 47 ++ doc/src/sgml/ref/drop_subscription.sgml| 23 ++- doc/src/sgml/ref/psql-ref.sgml | 6 +-- src/backend/catalog/pg_subscription.c | 2 +- src/backend/commands/subscriptioncmds.c| 6 +-- src/backend/parser/gram.y | 2 +- .../libpqwalreceiver/libpqwalreceiver.c| 2 +- src/backend/tcop/utility.c | 5 +++ src/bin/pg_dump/pg_dump.c | 7 +--- src/test/regress/parallel_schedule | 2 +- 12 files changed, 75 insertions(+), 80 deletions(-) diff --git a/doc/src/sgml/catalogs.sgml b/doc/src
Re: [HACKERS] New design for FK-based join selectivity estimation
ronan.dunk...@dalibo.com writes: > On mardi 13 décembre 2016 09:10:47 CET Adrien Nayrat wrote: >> The commit 100340e2dcd05d6505082a8fe343fb2ef2fa5b2a introduces an >> estimation error : > The problem is, for semi and anti joins, we assume that we have nothing to do > (costsize.c:4253): > else if (jointype == JOIN_SEMI || jointype == JOIN_ANTI) > { > /* >* For JOIN_SEMI and JOIN_ANTI, the selectivity is > defined as the >* fraction of LHS rows that have matches. If the > referenced >* table is on the inner side, that means the > selectivity is 1.0 >* (modulo nulls, which we're ignoring for now). We > already >* covered the other case, so no work here. >*/ > } > This results in assuming that the whole outerrel will match, no matter the > selectivity of the innerrel. Yeah. In the terms of this example, the FK means that every outer row would have a match, if the query were select * from t3 where j in (select * from t4); But actually it's select * from t3 where j in (select * from t4 where j<10); so of course we should not expect a match for every row. > If I understand it correctly and the above is right, I think we should ignore > SEMI or ANTI joins altogether when considering FKs, and keep the > corresponding > restrictinfos for later processing since they are already special-cased later > on. That seems like an overreaction. While the old code happens to get this example exactly right, eqjoinsel_semi is still full of assumptions and approximations, and it doesn't do very well at all if it lacks MCV lists for both sides. I'm inclined to think that what we want to have happen in this case is to estimate the fraction of outer rows having a match as equal to the selectivity of the inner query's WHERE clauses, i.e. the semijoin selectivity should be sizeof(inner result) divided by sizeof(inner relation).
regards, tom lane
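[The estimate Tom proposes — fraction of outer rows with a match ≈ sizeof(inner result) / sizeof(inner relation) — is simple enough to sketch in isolation. Hypothetical names, not actual planner code:]

```c
/* Semijoin selectivity under an FK guarantee, per the estimate proposed
 * above: the FK says every outer row matches some row of the *unfiltered*
 * inner relation, so the match fraction is scaled down by the fraction of
 * inner rows that survive the inner query's WHERE clauses. */
double fk_semijoin_selectivity(double inner_result_rows,
                               double inner_relation_rows)
{
    double s;

    if (inner_relation_rows <= 0.0)
        return 1.0;             /* degenerate: assume every outer row matches */
    s = inner_result_rows / inner_relation_rows;
    return (s > 1.0) ? 1.0 : s;
}

/* Expected outer row count surviving the semijoin. */
double fk_semijoin_rows(double outer_rows,
                        double inner_result_rows,
                        double inner_relation_rows)
{
    return outer_rows *
        fk_semijoin_selectivity(inner_result_rows, inner_relation_rows);
}
```

[In the thread's example, if "where j<10" keeps 10 of 100 inner rows, 1000 outer rows are estimated to produce about 100 matches rather than all 1000.]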
Re: [HACKERS] [OSSTEST PATCH 0/1] PostgreSQL db: Retry on constraint violation [and 2 more messages]
On Tue, Dec 13, 2016 at 12:00 PM, Ian Jackson wrote: > Kevin Grittner writes: > I still hope to be able to convince you that the definition of > SERIALIZABLE (in the pgsql docs) ought to be a stronger version, which > covers even non-committed transactions. That doesn't seem likely. The stronger definition would perform so poorly that even if you managed to convince me, the PostgreSQL community as a whole has already rejected it. > "you must discard any results from a transaction which later throws >a serialization failure" You argue that saying "If A happens, you must not use the results" means that "If A doesn't happen you can use the results." That does not logically follow; your argument based on it has no merit. Basically, a serializable transaction must successfully commit in order to consider any results from it to be valid. The quote above is discussing one way that can happen, which does not preclude other ways. >>> 1. Declare that all spurious failures, in SERIALIZABLE transactions, >>> are bugs. >> >> It's not clear to me what you mean here. > > I mean that the pgsql documentation should simply say that for > SERIALIZABLE transactions, there is always a coherent serialisation. > Even if the transaction failed due to a constraint violation, or > serialisation failure, or database disconnection, or whatever. > > I think there is no "useful" weaker guarantee. That's wrong on the face of it, and certainly at odds with all research papers and standards in the area. > By which I mean that for any reasonable weaker guarantee: any scenario > in which pgsql violates the stronger guarantee, can be converted into > a scenario where pgsql violates the weaker guarantee. You have failed to provide any evidence of that. Twisting statements and the principles of logical inference don't advance your position. > The fact that pgsql might generate "spurious" constraint violations, > even in SERIALIZABLE transactions, has been known for many years. 
Yes, it was clearly discussed during development. > See the URLs etc. in my patch, which include a reference from 2009. This paper from 2007 was among those cited during development of the PostgreSQL serializable implementation: Automating the Detection of Snapshot Isolation Anomalies. Sudhir Jorwekar, Alan Fekete, Krithi Ramamritham, S. Sudarshan. VLDB ‘07, September 23-28, 2007, Vienna, Austria. https://www.cse.iitb.ac.in/~sudarsha/Pubs-dir/VLDB07-snapshot.pdf Among relevant statements in that paper: | The database system ensures the preservation of some integrity | constraints which are explicitly declared to the system in the | schema definition, such as uniqueness of primary key and | referential integrity. Some of the SI anomalies are avoided due | to the dbms enforcement of these constraints. Of course, both pre-date development of the feature itself -- which was released as part of PostgreSQL 9.1, in September, 2011. > Many people have always regarded this as a bug. The pgsql > authors have not agreed. Right. For some people, a system has a bug if it does not behave as they would most like it to. In my world, it is not a bug if it is functioning as defined, intended, and documented. The change requested is a feature request, and a feature I would also like to see. It is not one that anyone has offered to pay to have developed, which is a significant barrier to implementation. >> The behavior we have been providing, modulus the bug you just found >> and any related bugs of the same ilk, is "we don't allow >> serialization anomalies into the database or in results from >> serializable transactions which commit." That is enough to be very >> useful to many. > > As I explain, I think that if you make that promise, you will have to > do all the work necessary to make the stronger promise anyway. 
That is absolutely *not* true of your suggestion that results returned from a serializable transaction which does not subsequently commit must be free from serialization anomalies. You are just factually wrong there. > I think all scenarios of this kind can be shown to allow serialisation > anomalies. Which kind? Catching an exception from a declarative constraint? Not all of them can, especially after the patch that went into 9.6 and that I'm intending to back-patch to other supported branches shortly. Other exceptions? I have seen no evidence that any others can, and I have a lot of reasons to believe that the special properties of the accesses from constraint implementations which allow those exceptions to be problematic are not present for other exceptions. Now, that is not an assertion that it is impossible for the code to have some other bug, but if there is such a bug we would need to find it to be able to fix it. > I'm afraid I have no idea. But if that is a complete description of > the bug (if it doesn't affect "INSERT ... ON CONFLICT" for example) > then that is good. A bug was found in the initial release of INSERT ...
Re: [HACKERS] New design for FK-based join selectivity estimation
On mardi 13 décembre 2016 09:10:47 CET Adrien Nayrat wrote: > Hi hackers, > > The commit 100340e2dcd05d6505082a8fe343fb2ef2fa5b2a introduces an > estimation error : [] > > The estimated row count is 10x larger since 100340e2d > > Regards, Hello, I think I understand what the problem is. In get_foreign_key_join_selectivity, we remove the restrictinfo clauses which match a foreign key. This is done so that the selectivity is not applied twice (once in the function itself, once when processing the restrictinfos). The problem is, for semi and anti joins, we assume that we have nothing to do (costsize.c:4253): else if (jointype == JOIN_SEMI || jointype == JOIN_ANTI) { /* * For JOIN_SEMI and JOIN_ANTI, the selectivity is defined as the * fraction of LHS rows that have matches. If the referenced * table is on the inner side, that means the selectivity is 1.0 * (modulo nulls, which we're ignoring for now). We already * covered the other case, so no work here. */ } This results in assuming that the whole outerrel will match, no matter the selectivity of the innerrel. If I understand it correctly and the above is right, I think we should ignore SEMI or ANTI joins altogether when considering FKs, and keep the corresponding restrictinfos for later processing since they are already special-cased later on. Regards, -- Ronan Dunklau
Re: [HACKERS] jsonb problematic operators
Nico Williams writes: > On Tue, Dec 13, 2016 at 10:26:24AM +0900, Michael Paquier wrote: >> On Mon, Dec 12, 2016 at 10:22 PM, Greg Stark wrote: >>> One option might be for Postgres to define duplicate operator names >>> using ¿ or something else. >> Are you sure that using a non-ASCII character is a good idea for an >> in-core operator? I would think no. > Eventually language designers will cross that Rubicon in mainstream > languages. And why not? Because it will create all sorts of character-set-conversion hazards. In the current design of Postgres it flat out would not work, because template0 has to be encoding-agnostic. regards, tom lane
Re: [HACKERS] Declarative partitioning - another take
On Tue, Dec 13, 2016 at 12:22 PM, Ildar Musin wrote: > We've noticed that PartitionDispatch object is built on every INSERT query > and that it could create unnecessary overhead. Wouldn't it be better to keep > it in relcache? You might be able to cache some of that data in the relcache, but List *keystate is pointing to query-lifespan data, so you can't cache that. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Hash Indexes
On Mon, Dec 12, 2016 at 9:21 PM, Amit Kapila wrote: > The reason is to make the operations consistent in master and standby. > In the WAL patch, for clearing the SPLIT_CLEANUP flag, we write a WAL > record, and if we don't release the lock after writing it, the operation > can appear on the standby even before it does on the master. Currently, > without WAL, there is no benefit of doing so and we can fix it by using > MarkBufferDirty; however, I thought it might be simpler to keep it > this way as this is required for enabling WAL. OTOH, we can leave > that for the WAL patch. Let me know which way you prefer? It's not required for enabling WAL. You don't *have* to release and reacquire the buffer lock; in fact, that just adds overhead. You *do* have to be aware that the standby will perhaps see the intermediate state, because it won't hold the lock throughout. But that does not mean that the master must release the lock. >> I don't think we should be afraid to call MarkBufferDirty() instead of >> going through these (fairly stupid) hash-AM-specific APIs. > > Right, and anyway we need to use it at many more call sites when we > enable WAL for hash indexes. I propose the attached patch, which removes _hash_wrtbuf() and causes those functions which previously called it to do MarkBufferDirty() directly. Aside from hopefully fixing the bug we're talking about here, this makes the logic in several places noticeably less contorted.
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company diff --git a/src/backend/access/hash/hash.c b/src/backend/access/hash/hash.c index f1511d0..0eeb37d 100644 --- a/src/backend/access/hash/hash.c +++ b/src/backend/access/hash/hash.c @@ -635,7 +635,8 @@ loop_top: num_index_tuples = metap->hashm_ntuples; } - _hash_wrtbuf(rel, metabuf); + MarkBufferDirty(metabuf); + _hash_relbuf(rel, metabuf); /* return statistics */ if (stats == NULL) @@ -724,7 +725,6 @@ hashbucketcleanup(Relation rel, Bucket cur_bucket, Buffer bucket_buf, OffsetNumber deletable[MaxOffsetNumber]; int ndeletable = 0; bool retain_pin = false; - bool curr_page_dirty = false; vacuum_delay_point(); @@ -805,7 +805,7 @@ hashbucketcleanup(Relation rel, Bucket cur_bucket, Buffer bucket_buf, { PageIndexMultiDelete(page, deletable, ndeletable); bucket_dirty = true; - curr_page_dirty = true; + MarkBufferDirty(buf); } /* bail out if there are no more pages to scan. */ @@ -820,15 +820,7 @@ hashbucketcleanup(Relation rel, Bucket cur_bucket, Buffer bucket_buf, * release the lock on previous page after acquiring the lock on next * page */ - if (curr_page_dirty) - { - if (retain_pin) -_hash_chgbufaccess(rel, buf, HASH_WRITE, HASH_NOLOCK); - else -_hash_wrtbuf(rel, buf); - curr_page_dirty = false; - } - else if (retain_pin) + if (retain_pin) _hash_chgbufaccess(rel, buf, HASH_READ, HASH_NOLOCK); else _hash_relbuf(rel, buf); @@ -862,6 +854,7 @@ hashbucketcleanup(Relation rel, Bucket cur_bucket, Buffer bucket_buf, bucket_opaque = (HashPageOpaque) PageGetSpecialPointer(page); bucket_opaque->hasho_flag &= ~LH_BUCKET_NEEDS_SPLIT_CLEANUP; + MarkBufferDirty(bucket_buf); } /* @@ -873,7 +866,7 @@ hashbucketcleanup(Relation rel, Bucket cur_bucket, Buffer bucket_buf, _hash_squeezebucket(rel, cur_bucket, bucket_blkno, bucket_buf, bstrategy); else - _hash_chgbufaccess(rel, bucket_buf, HASH_WRITE, HASH_NOLOCK); + _hash_chgbufaccess(rel, bucket_buf, HASH_READ, HASH_NOLOCK); } void diff --git 
a/src/backend/access/hash/hashinsert.c b/src/backend/access/hash/hashinsert.c index 572146a..59c4213 100644 --- a/src/backend/access/hash/hashinsert.c +++ b/src/backend/access/hash/hashinsert.c @@ -208,11 +208,12 @@ restart_insert: (void) _hash_pgaddtup(rel, buf, itemsz, itup); /* - * write and release the modified page. if the page we modified was an + * dirty and release the modified page. if the page we modified was an * overflow page, we also need to separately drop the pin we retained on * the primary bucket page. */ - _hash_wrtbuf(rel, buf); + MarkBufferDirty(buf); + _hash_relbuf(rel, buf); if (buf != bucket_buf) _hash_dropbuf(rel, bucket_buf); diff --git a/src/backend/access/hash/hashovfl.c b/src/backend/access/hash/hashovfl.c index e2d208e..8fbf494 100644 --- a/src/backend/access/hash/hashovfl.c +++ b/src/backend/access/hash/hashovfl.c @@ -149,10 +149,11 @@ _hash_addovflpage(Relation rel, Buffer metabuf, Buffer buf, bool retain_pin) /* logically chain overflow page to previous page */ pageopaque->hasho_nextblkno = BufferGetBlockNumber(ovflbuf); + MarkBufferDirty(buf); if ((pageopaque->hasho_flag & LH_BUCKET_PAGE) && retain_pin) - _hash_chgbufaccess(rel, buf, HASH_WRITE, HASH_NOLOCK); + _hash_chgbufaccess(rel, buf, HASH_READ, HASH_NOLOCK); else - _hash_wrtbuf(rel, buf); + _hash_relbuf(rel, buf); return ovflbuf; } @@ -304,7 +305,8 @@ found: /* mark page "in use" in t
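[To make the locking discipline behind the patch concrete for readers: the invariant is simply that the buffer must be marked dirty while its content lock is still held, and the lock need not be dropped and re-taken around the dirtying. A toy single-buffer model — hypothetical names, not the real buffer-manager API:]

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a shared buffer: dirtying is only legal under the lock. */
typedef struct ToyBuffer
{
    bool locked;
    bool dirty;
    int  value;
} ToyBuffer;

static void toy_lock(ToyBuffer *b)
{
    assert(!b->locked);         /* no recursive locking in this toy */
    b->locked = true;
}

static void toy_unlock(ToyBuffer *b)
{
    assert(b->locked);
    b->locked = false;
}

/* Analogue of MarkBufferDirty(): the caller must hold the lock. */
static void toy_mark_dirty(ToyBuffer *b)
{
    assert(b->locked);
    b->dirty = true;
}

/* The pattern the patch moves to: modify, mark dirty, then release --
 * one lock acquisition, no intermediate unlock/relock round trip. */
bool toy_update_demo(void)
{
    ToyBuffer b = {false, false, 0};

    toy_lock(&b);
    b.value = 7;
    toy_mark_dirty(&b);
    toy_unlock(&b);

    return b.dirty && !b.locked && b.value == 7;
}
```

[This is why removing the separate write-and-release helper simplifies the call sites: every path becomes "dirty if modified, then release once".]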
Re: [HACKERS] [OSSTEST PATCH 0/1] PostgreSQL db: Retry on constraint violation [and 2 more messages]
Kevin Grittner writes ("Re: [HACKERS] [OSSTEST PATCH 0/1] PostgreSQL db: Retry on constraint violation [and 2 more messages]"): > [various things] I get the feeling from your message that I have irritated you. I'm sorry if I have. In particular, I wanted to say that I certainly don't expect bugs to be fixed immediately. I appreciate your attention to detail. I also appreciate you taking the time to deal on your -hackers list with someone who really isn't a database hacker! I still hope to be able to convince you that the definition of SERIALIZABLE (in the pgsql docs) ought to be a stronger version, which covers even non-committed transactions. > [Ian Jackson:] > > But I have just demonstrated a completely general technique which can > > be used to convert any "spurious" constraint failure into a nonsense > > transaction actually committed to the database. > > Really? I see a lot of daylight between "you must discard any > results from a transaction which later throws a serialization > failure" and "if you catch the exception from a constraint > violation within a subtransaction it is possible to create a > serialization anomaly". I don't actually see how one relates to > the other. What am I missing? I don't think it can be useful to state (in the pgsql docs) the caveat you propose, to wit: "you must discard any results from a transaction which later throws a serialization failure" I think any scenario where this caveat declares a behaviour to be "not a bug", can be converted into a scenario where there is undoubtedly a bug. So this caveat does not actually reduce the difficulty of implementing the promises made by the documentation. The conversion is as follows: if a scenario is affected by the caveat, then there must be at least one transaction T which firstly produces "impossible" results I, and in which some later statement S produces a serialisation failure.
To exhibit the corresponding unavoidable bug: Execute an identical scenario, with exactly the same sequence of steps in the same order, up to S. However, instead of S, execute ROLLBACK. Now this anomalous transaction T' is not "a transaction which later throws a serialization failure". There is no requirement to discard the "impossible" results I. The results I are supposed to be correct, but they are not. (I am making the hopefully-warranted assumption that ROLLBACK cannot itself cause a serialisation failure.) In theory this argument does not prove that every bug which the caveat is trying to address, is as important as the same bug without the caveat. This is because perhaps T' is an unrealistic transaction somehow. But I don't really think that the primary documentation should be complicated, and should give weaker statements about the system behaviour, simply as an aid to bug prioritisation. > > 1. Declare that all spurious failures, in SERIALIZABLE transactions, > > are bugs. > > It's not clear to me what you mean here. I mean that the pgsql documentation should simply say that for SERIALIZABLE transactions, there is always a coherent serialisation. Even if the transaction failed due to a constraint violation, or serialisation failure, or database disconnection, or whatever. I think there is no "useful" weaker guarantee. By which I mean that for any reasonable weaker guarantee: any scenario in which pgsql violates the stronger guarantee, can be converted into a scenario where pgsql violates the weaker guarantee. Conversely, any correct implementation of the weaker guarantee necessarily implements the stronger guarantee too. So the weaker guarantee does not make implementation easier. All it does is complicate application code. > > 2. State that the SERIALIZABLE guarantee in Postgresql only applies to > > transactions which (a) complete successfully and (b) contain only > > very simple pgsql constructs.
> > About 24 hours ago you reported a bug which was hard enough to hit > that it took 5 years for anyone to find it. Well, of course I have a slightly different point of view. The fact that pgsql might generate "spurious" constraint violations, even in SERIALIZABLE transactions, has been known for many years. See the URLs etc. in my patch, which include a reference from 2009. Many people have always regarded this as a bug. The pgsql authors have not agreed. From my point of view, I have developed a trick which allows me to convince you that it has been a bug all along :-). > The behavior we have been providing, modulus the bug you just found > and any related bugs of the same ilk, is "we don't allow > serialization anomalies into the database or in results from > serializable transactions which commit." That is enough to be very > useful to many. As I explain, I think that if you make that promise, you will have to do all the work necessary to make the stronger promise anyway. So you may as well make the stronger (and more comprehensible!) promise. > > Of course I understand that these pr
[HACKERS] [PATCH] Rename pg_switch_xlog to pg_switch_wal
Based on discussion in https://www.postgresql.org/message-id/CAE1wr-w%3DLE1cK5uG_rmAh-VBxc4_Bnw-gAE3qSqL-%3DtWwvLvjQ%40mail.gmail.com . Tested via regression tests. To be applied in master only and to be included in 10.0. -- Vladimir Rusinov Storage SRE, Google Ireland Google Ireland Ltd.,Gordon House, Barrow Street, Dublin 4, Ireland Registered in Dublin, Ireland Registration Number: 368047 From 2502b9f36f08965a66f4125952f23e1f6ede4d83 Mon Sep 17 00:00:00 2001 From: Vladimir Rusinov Date: Mon, 12 Dec 2016 13:23:32 + Subject: [PATCH 1/1] Rename pg_switch_xlog to pg_switch_wal. Based on discussion in https://www.postgresql.org/message-id/CAE1wr-w%3DLE1cK5uG_rmAh-VBxc4_Bnw-gAE3qSqL-%3DtWwvLvjQ%40mail.gmail.com. Tested via regression tests. To be applied in master only and to be included in 10.0. Signed-off-by: Vladimir Rusinov --- doc/src/sgml/backup.sgml | 2 +- doc/src/sgml/func.sgml | 8 doc/src/sgml/high-availability.sgml| 2 +- src/backend/access/transam/xlogfuncs.c | 6 -- src/backend/catalog/system_views.sql | 2 +- src/include/access/xlog_fn.h | 2 +- src/include/catalog/pg_proc.h | 2 +- src/test/recovery/t/002_archiving.pl | 2 +- src/test/recovery/t/003_recovery_targets.pl| 2 +- src/test/regress/expected/hs_standby_functions.out | 2 +- src/test/regress/sql/hs_primary_extremes.sql | 2 +- src/test/regress/sql/hs_primary_setup.sql | 2 +- src/test/regress/sql/hs_standby_functions.sql | 2 +- 13 files changed, 19 insertions(+), 17 deletions(-) diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml index 6eaed1e..dc00a00 100644 --- a/doc/src/sgml/backup.sgml +++ b/doc/src/sgml/backup.sgml @@ -726,7 +726,7 @@ test ! -f /mnt/server/archivedir/000100A90065 && cp pg_wal/0 Also, you can force a segment switch manually with -pg_switch_xlog if you want to ensure that a +pg_switch_wal if you want to ensure that a just-finished transaction is archived as soon as possible. Other utility functions related to WAL management are listed in . 
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index eca98df..e11aec9 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -17949,7 +17949,7 @@ SELECT set_config('log_statement_stats', 'off', false); pg_backup_start_time -pg_switch_xlog +pg_switch_wal pg_xlogfile_name @@ -18043,7 +18043,7 @@ SELECT set_config('log_statement_stats', 'off', false); -pg_switch_xlog() +pg_switch_wal() pg_lsn Force switch to a new transaction log file (restricted to superusers by default, but other users can be granted EXECUTE to run the function) @@ -18122,11 +18122,11 @@ postgres=# select pg_start_backup('label_goes_here'); -pg_switch_xlog moves to the next transaction log file, allowing the +pg_switch_wal moves to the next transaction log file, allowing the current file to be archived (assuming you are using continuous archiving). The return value is the ending transaction log location + 1 within the just-completed transaction log file. If there has been no transaction log activity since the last transaction log switch, -pg_switch_xlog does nothing and returns the start location +pg_switch_wal does nothing and returns the start location of the transaction log file currently in use. diff --git a/doc/src/sgml/high-availability.sgml b/doc/src/sgml/high-availability.sgml index 6b89507..a1dd93b 100644 --- a/doc/src/sgml/high-availability.sgml +++ b/doc/src/sgml/high-availability.sgml @@ -2201,7 +2201,7 @@ LOG: database system is ready to accept read only connections WAL file control commands will not work during recovery, -e.g. pg_start_backup, pg_switch_xlog etc. +e.g. pg_start_backup, pg_switch_wal etc. 
diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c index 01cbd90..6547b8f 100644 --- a/src/backend/access/transam/xlogfuncs.c +++ b/src/backend/access/transam/xlogfuncs.c @@ -277,13 +277,15 @@ pg_stop_backup_v2(PG_FUNCTION_ARGS) } /* - * pg_switch_xlog: switch to next xlog file + * pg_switch_wal: switch to next wal (aka xlog) file + * + * Renamed from pg_switch_xlog in 10.0 * * Permission checking for this function is managed through the normal * GRANT system. */ Datum -pg_switch_xlog(PG_FUNCTION_ARGS) +pg_switch_wal(PG_FUNCTION_ARGS) { XLogRecPtr switchpoint; diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql index df59d18..ffd1fad 100644 --- a/src/backend/catalog/system_views.sql +++ b/src/backend/catalog/system_views.sql @@ -1039,7 +1039,7 @@ REVOKE EXECUTE ON FUNCTION pg_start_backup(text, boolean, boolean) FROM public;
Re: [HACKERS] Declarative partitioning - another take
Hi hackers, On 08.12.2016 19:44, Dmitry Ivanov wrote: That would be fantastic. I and my colleagues at EnterpriseDB can surely help review; of course, maybe you and some of your colleagues would like to help review our patches, too. Certainly, I'll start reviewing as soon as I get familiar with the code. Do you think this is likely to be something where you can get something done quickly, with the hope of getting it into v10? Yes, I've just cleared my schedule in order to make this possible. I'll bring in the patches ASAP. We've noticed that PartitionDispatch object is built on every INSERT query and that it could create unnecessary overhead. Wouldn't it be better to keep it in relcache? Thanks! -- Ildar Musin i.mu...@postgrespro.ru -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] exposing wait events for non-backends (was: Tracking wait event for latches)
On Mon, Dec 12, 2016 at 6:45 PM, Kevin Grittner wrote: > On Mon, Dec 12, 2016 at 10:33 AM, Andres Freund > wrote: > > On 2016-12-12 13:26:32 -0300, Alvaro Herrera wrote: > >> Robert Haas wrote: > > >>> 1. Show all processes that have a PGPROC in pg_stat_activity, > > >>> 2. Add a second view, say pg_stat_system_activity, > > >> I vote 1. > > > > +1 > > +1 > +1 more, if we're still counting. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] jsonb problematic operators
On Tue, Dec 13, 2016 at 10:26:24AM +0900, Michael Paquier wrote: > On Mon, Dec 12, 2016 at 10:22 PM, Greg Stark wrote: > > One option might be for Postgres to define duplicate operator names > > using ¿ or something else. I think ¿ is a good choice because it's a > > common punctuation mark in Spanish so it's probably not hard to find > > on a lot of keyboards or hard to find instructions on how to type one. > > Are you sure that using a non-ASCII character is a good idea for an > in-core operator? I would think no. Eventually language designers will cross that Rubicon in mainstream languages. And why not? It sure would be convenient... from the designer's p.o.v. Of course, _users_ would be annoyed, as most users in the English-speaking world will have no idea how to type such characters, most others also will not know how to, and there will be users still using non-Unicode locales who will be unable to type such characters at all. Cut-n-paste will save the day, no doubt, though mostly/only for users using Unicode locales. But it is tempting. Using non-ASCII Unicode characters for _alternatives_ seems like a possible starting point though, since that leaves users with a universally-available ASCII alternative. Still, now users would then have to recognize multiple equivalent forms... ugh. Nico -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb problematic operators
On Mon, Dec 12, 2016 at 8:26 PM, Michael Paquier wrote: > On Mon, Dec 12, 2016 at 10:22 PM, Greg Stark wrote: >> On 12 December 2016 at 04:59, Craig Ringer wrote: >>> I didn't realise Pg's use of ? was that old, so thanks. That makes >>> offering alternatives much less appealing. >> >> One option might be for Postgres to define duplicate operator names >> using ¿ or something else. I think ¿ is a good choice because it's a >> common punctuation mark in spanish so it's probably not hard to find >> on a lot of keyboards or hard to find instructions on how to type one. >> >> There is always a risk in allowing redundant syntaxes though. For >> example people running grep to find all uses of an operator will miss >> the alternate spelling. There may even be security implications for >> that though to be honest that seems unlikely in this case. > > Are you sure that using a non-ASCII character is a good idea for an > in-core operator? I would think no. I would agree with your thought. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
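[For context on the collision being discussed: the jsonb `?` operators clash with the `?` placeholder syntax of several client drivers. A commonly suggested workaround — not part of any proposal in this thread — is to call the functions that back these operators (`jsonb_exists`, `jsonb_exists_any`, `jsonb_exists_all`) directly, which avoids the ambiguity entirely:]

```sql
-- Instead of:  SELECT '{"a": 1}'::jsonb ? 'a';
-- call the operator's underlying function, which no driver can
-- mistake for a parameter placeholder:
SELECT jsonb_exists('{"a": 1}'::jsonb, 'a');                   -- same as ?
SELECT jsonb_exists_any('{"a": 1}'::jsonb, ARRAY['a', 'b']);   -- same as ?|
SELECT jsonb_exists_all('{"a": 1}'::jsonb, ARRAY['a', 'b']);   -- same as ?&
```

Note that, unlike the operator forms, these function calls are not matched by the jsonb GIN operator classes, so they will not use an index.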
Re: [HACKERS] [OSSTEST PATCH 0/1] PostgreSQL db: Retry on constraint violation [and 2 more messages]
On Tue, Dec 13, 2016 at 9:50 AM, Ian Jackson wrote: > Kevin Grittner writes ("Re: [HACKERS] [OSSTEST PATCH 0/1] PostgreSQL db: > Retry on constraint violation [and 2 more messages]"): >> On Tue, Dec 13, 2016 at 5:30 AM, Ian Jackson >> wrote: >>> Are all of these cases fixed by fcff8a57519847 "Detect SSI conflicts >>> before reporting constraint violations" ? >> >> No. It was specifically meant to address duplicate keys, and there >> was one particular set of steps which it was not able to address. >> See post by Thomas Munro. Hopefully, he, I, or someone else will >> have a chance to work on the one known remaining issue and look for >> others. Your efforts have been helpful; it would be great if you >> can find and document any other test cases which show a >> less-than-ideal SQLSTATE or other outright serialization anomalies. > > Well, my own usage of postgresql is not really that advanced and we do > not have very many complicated constraints. So for projects I'm > responsible for, what has been done in 9.6 is going to be good enough > in practice (when it finally bubbles through via my distro etc.); and > I have a workaround for now. I worked in a shop with over 30,000 types of transactions, and *none* of them involved using a subtransaction to catch and ignore an attempted constraint violation. I know of several larger shops which have not experienced such problems, either. I agree you found a bug and it needs to be fixed, but it doesn't seem to be a bug that occurs in common usage patterns; otherwise, it would probably have taken less than five years for it to be found. > But I am trying to save others (who may have more complicated > situations) from being misled. A motive I certainly appreciate. >>> All statements in such transactions, even aborted transactions, need >>> to see results, and have behaviour, which are completely consistent >>> with some serialisation of all involved transactions. 
This must apply >>> up to (but not including) any serialisation failure error. >> >> If I understand what you are saying, I disagree. > > But I have just demonstrated a completely general technique which can > be used to convert any "spurious" constraint failure into a nonsense > transaction actually committed to the database. Really? I see a lot of daylight between "you must discard any results from a transaction which later throws a serialization failure" and "if you catch the exception from a constraint violation within a subtransaction it is possible to create a serialization anomaly". I don't actually see how one relates to the other. What am I missing? > I think there are only two possible coherent positions: > > 1. Declare that all spurious failures, in SERIALIZABLE transactions, > are bugs. It's not clear to me what you mean here. > 2. State that the SERIALIZABLE guarantee in Postgresql only applies to > transactions which (a) complete successfully and (b) contain only > very simple pgsql constructs. About 24 hours ago you reported a bug which was hard enough to hit that it took 5 years for anyone to find it. I think you might consider allowing a bit more time to analyze the situation before declaring that serializable transactions only work "in very simple constructs". Some of the transactions with which I have seen it reliably work involved in excess of 20,000 lines of procedural code. What it seems to me we should do is try to identify the conditions under which a bug can occur (which seems likely to be limited to a subset of cases where errors thrown by declarative constraints are caught within a subtransaction, and the subtransaction work is discarded without throwing another error), and see whether the bugs can be fixed. > I think (2) would be rather a bad concession! (It is probably also > contrary to some standards document somewhere, if that matters.) > Furthermore if you adopt (2) you would have to make a list of the > "safe" pgsql constructs. 
I think the unsafe constructs are likely to be a *much* shorter list. > That would definitely exclude the exception trapping facility; but > what other facilities or statements might have similar effects? > ISTM that very likely INSERT ... ON CONFLICT can be used the same way. > Surely you do not want to say "PostgreSQL does not give a > transactional guarantee when INSERT ... ON CONFLICT is used". I think we've chased down the bugs in this area. Counterexamples welcome. >> To prevent incorrect results from being returned even when a >> transaction later fails with a serialization failure would require >> blocking > > I'm afraid I don't understand enough about database implementation to > answer this with confidence. But this does not seem likely to be > true. Blocking can surely always be avoided, by simply declaring a > serialisation failure instead of blocking. I have no idea whether > that is a practical approach in the general case, or whether it brings > its own problems. The "false positive" rate (throw
Re: [HACKERS] exposing wait events for non-backends (was: Tracking wait event for latches)
On Mon, Dec 12, 2016 at 8:13 PM, Michael Paquier wrote: > On Tue, Dec 13, 2016 at 10:05 AM, Craig Ringer wrote: >> We should probably expose a proc_type or something, with types: >> >> * client_backend >> * bgworker >> * walsender >> * autovacuum >> * checkpointer >> * bgwriter > > A text field is adapted then, more than a single character. Sure. >> for simpler filtering. >> >> I don't think existing user code is likely to get upset by more >> processes appearing in pg_stat_activity, and it'll be very handy. > > Indeed, for WAL senders now abusing of the query field is definitely > not consistent. Even if having this information is useful, adding such > a column would make sense. Still, one thing that is important to keep > with pg_stat_activity is the ability to count the number of > connections that are part of max_connections for monitoring purposes. > The docs definitely would need an example of such a query counting > only client_backend and WAL senders and tell users that this can be > used to count how many active connections there are. Let's confine ourselves to fixing one problem at a time. I think we can get where we want to be in this case by adding one new column and some new rows to pg_stat_activity. Michael, is that something you're going to do? If not, one of my esteemed colleagues here at EnterpriseDB will have a try. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
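[The counting query Michael asks the docs to include might look like the sketch below. The column name `proc_type` and the type labels are the ones Craig proposed upthread; whatever names are eventually committed may differ.]

```sql
-- Count only the processes that consume max_connections slots,
-- assuming the proposed type column (here called proc_type):
SELECT count(*)
FROM pg_stat_activity
WHERE proc_type IN ('client_backend', 'walsender');

-- Or break the full process list down by type for monitoring:
SELECT proc_type, count(*)
FROM pg_stat_activity
GROUP BY proc_type;
```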
Re: [HACKERS] Declarative partitioning - another take
On Tue, Dec 13, 2016 at 1:58 AM, Amit Langote wrote: > Attaching the above patch, along with some other patches posted earlier, > and one more patch fixing another bug I found. Patch descriptions follow: > > 0001-Miscallaneous-code-and-comment-improvements.patch > > Fixes some obsolete comments while improving others. Also, implements > some of Tomas Vondra's review comments. Committed with some pgindenting. > 0002-Miscallaneous-documentation-improvements.patch > > Fixes inconsistencies and improves some examples in the documentation. > Also, mentions the limitation regarding row movement. Ignored because I committed what I think is the same or a similar patch earlier. Please resubmit any remaining changes. > 0003-Invalidate-the-parent-s-relcache-after-partition-cre.patch > > Fixes a bug reported by Tomas, whereby a parent's relcache was not > invalidated after creation of a new partition using CREATE TABLE PARTITION > OF. This resulted in tuple-routing code not being able to find a > partition that was created by the last command in a given transaction. Shouldn't StorePartitionBound() be responsible for issuing its own invalidations, as StorePartitionKey() already is? Maybe you'd need to pass "parent" as another argument, but that way you know you don't have the same bug at some other place where the function is called. > 0004-Fix-a-bug-of-insertion-into-an-internal-partition.patch > > Fixes a bug I found this morning, whereby an internal partition's > constraint would not be enforced if it is targeted directly. 
See example > below: > > create table p (a int, b char) partition by range (a); > create table p1 partition of p for values from (1) to (10) partition by > list (b); > create table p1a partition of p1 for values in ('a'); > insert into p1 values (0, 'a'); -- should fail, but doesn't I expect I'm missing something here, but why do we need to hoist RelationGetPartitionQual() out of InitResultRelInfo() instead of just having BeginCopy() pass true instead of false? (Also needs a rebase due to the pgindent cleanup.) > 0005-Fix-a-tuple-routing-bug-in-multi-level-partitioned-t.patch > > Fixes a bug discovered by Dmitry Ivanov, whereby wrong indexes were > assigned to the partitions of lower levels (level > 1), causing spurious > "partition not found" error as demonstrated in his email [1]. Committed. It might have been good to include a test case. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hang in pldebugger after git commit : 98a64d0
On 2016-12-12 16:46:38 +0900, Michael Paquier wrote: > OK, I think that I have spotted an issue after additional read of the > code. When a WSA event is used for read/write activity on a socket, > the same WSA event gets reused again and again. That's fine for > performance reasons It's actually also required to not lose events, i.e. correctness. Windows events are edge-triggered, not level-triggered. The previous implementation wasn't correct. So just resetting things ain't ok. - Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hang in pldebugger after git commit : 98a64d0
Hi All, I have been working on this issue for the last few days and it seems like I have almost found the reason for this failure. My findings to date are as follows. Just to confirm whether the problem is happening due to reuse of the WaitEventSet structure, I replaced the calls to ModifyWaitEvent() and WaitEventSetWait(), which make use of an already existing WaitEventSet to store event handles, with WaitLatchOrSocket(), and it worked. WaitLatchOrSocket() creates a new WaitEventSet structure every time it has to wait for an event, and it is also what the old code used. This clearly shows that we do have some problem simply because we are reusing the same set of object handles throughout a backend session. I am investigating this further and will post the final analysis along with the fix very soon. Attached is the patch that has the changes described above. Any suggestions or inputs would be appreciated. On Tue, Dec 13, 2016 at 9:34 PM, Ashutosh Sharma wrote: > Hi Michael, > >> >> Ashutosh, could you try it and see if it improves things? >> - > > Thanks for your patch. I would like to inform you that I didn't find any > improvement with your patch. > > With Regards, > Ashutosh Sharma > EnterpriseDB: http://www.enterprisedb.com fix_wait_events.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hang in pldebugger after git commit : 98a64d0
Hi Michael, > Ashutosh, could you try it and see if it improves things? > - > Thanks for your patch. I would like to inform you that I didn't find any improvement with your patch. With Regards, Ashutosh Sharma EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Patch to implement pg_current_logfile() function
Hi Gilles, I don't plan to be able to get back to this patch until late this week or early next week. My plan is to then go through the whole thing and fix everything I can find. If we're both working at the same time this could lead to wasted effort, so I will write as soon as I start work, and if you are working at the same time I'll send smaller hunks. By the by, my last email contained some stupidity in its code suggestion, because it is structured like:

    while
        if (...)
            do something;
        else
            break;
        do more...;

It is stupid to have an if/else construct there.

    while
        if (!...)
            break;
        do something;
        do more...;

is cleaner. Apologies if my coding out loud is confusing things. I'll fix this in the next go-round if you don't get to it first. Regards, Karl Free Software: "You don't pay back, you pay forward." -- Robert A. Heinlein -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [OSSTEST PATCH 0/1] PostgreSQL db: Retry on constraint violation [and 2 more messages]
Kevin Grittner writes ("Re: [HACKERS] [OSSTEST PATCH 0/1] PostgreSQL db: Retry on constraint violation [and 2 more messages]"): > On Tue, Dec 13, 2016 at 5:30 AM, Ian Jackson > wrote: > > Are all of these cases fixed by fcff8a57519847 "Detect SSI conflicts > > before reporting constraint violations" ? > > No. It was specifically meant to address duplicate keys, and there > was one particular set of steps which it was not able to address. > See post by Thomas Munro. Hopefully, he, I, or someone else will > have a chance to work on the one known remaining issue and look for > others. Your efforts have been helpful; it would be great if you > can find and document any other test cases which show a > less-than-ideal SQLSTATE or other outright serialization anomalies. Well, my own usage of postgresql is not really that advanced and we do not have very many complicated constraints. So for projects I'm responsible for, what has been done in 9.6 is going to be good enough in practice (when it finally bubbles through via my distro etc.); and I have a workaround for now. But I am trying to save others (who may have more complicated situations) from being misled. > > All statements in such transactions, even aborted transactions, need > > to see results, and have behaviour, which are completely consistent > > with some serialisation of all involved transactions. This must apply > > up to (but not including) any serialisation failure error. > > If I understand what you are saying, I disagree. But I have just demonstrated a completely general technique which can be used to convert any "spurious" constraint failure into a nonsense transaction actually committed to the database. Of course my transactions are contrived, but I don't see a way to rule out the possibility that a real application might do something similar. I think there are only two possible coherent positions: 1. Declare that all spurious failures, in SERIALIZABLE transactions, are bugs. 2. 
State that the SERIALIZABLE guarantee in Postgresql only applies to transactions which (a) complete successfully and (b) contain only very simple pgsql constructs. I think (2) would be rather a bad concession! (It is probably also contrary to some standards document somewhere, if that matters.) Furthermore if you adopt (2) you would have to make a list of the "safe" pgsql constructs. That would definitely exclude the exception trapping facility; but what other facilities or statements might have similar effects? ISTM that very likely INSERT ... ON CONFLICT can be used the same way. Surely you do not want to say "PostgreSQL does not give a transactional guarantee when INSERT ... ON CONFLICT is used". > To prevent incorrect results from being returned even when a > transaction later fails with a serialization failure would require > blocking I'm afraid I don't understand enough about database implementation to answer this with confidence. But this does not seem likely to be true. Blocking can surely always be avoided, by simply declaring a serialisation failure instead of blocking. I have no idea whether that is a practical approach in the general case, or whether it brings its own problems. But whether or not it is true, I think that "it's hard" is not really a good answer to "PostgreSQL should implement behaviour for SERIALIZABLE which can be coherently described and which covers practically useful use cases". > > I am concerned that there are other possible bugs of this form. > > In earlier messages on this topic, it has been suggested that the > > "impossible" unique constraint violation is only one example of a > > possible "leakage". > > As I see it, the main point of serializable transactions is to > prevent serialization anomalies from being persisted in the > database or seen by a serializable transaction which successfully > commits. As I have demonstrated, "spurious error" conditions can result in "serialisation anomaly persisted in the database". 
So there is not a real distinction here. There is another very important use case for serialisable transactions which is read-only report transactions. An application doing such a transaction needs to see a consistent snapshot. Such a transaction is readonly so will never commit. Any reasonable definition of what SERIALIZABLE means needs to give a sensible semantics for readonly transactions. > Well, the test includes commits and teardown, but this gets you to > the problem. Connection2 gets this: > > ERROR: duplicate key value violates unique constraint "invoice_pkey" > DETAIL: Key (year, invoice_number)=(2016, 3) already exists. I think my error trapping technique can be used to convert this into a scenario which commits "impossible" information to the database. Do you agree ? > If connection1 had explicitly read the "gap" into which it inserted > its row (i.e., with a SELECT statement) there would be a > serialization failure instead. Getting the RI index maintenance
Re: [HACKERS] [PATCH] Fix for documentation of timestamp type
On Tue, Dec 13, 2016 at 10:41 AM, Tom Lane wrote: > Robert Haas writes: >> I find this a bit unclear, because the revised text kind of jumps back >> and forth between the floating-point and integer formats. Perhaps >> something like this: > > Your wording seems OK to me, although I'd drop the "instead". Good idea. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Declarative partitioning - another take
On Fri, Dec 9, 2016 at 5:46 AM, Maksim Milyutin wrote: > I would like to work on two tasks: > - insert (and eventually update) tuple routing for foreign partition. > - the ability to create an index on the parent and have all of the children > inherit it; > > The first one has been implemented in pg_pathman somehow, but the code > relies on dirty hacks, so the FDW API has to be improved. As for the > extended index support, it doesn't look like a super-hard task. Great! I think that the second one will be fairly tricky. You will need some way of linking the child indexes back to the parent index. And then you will need to prohibit them from being dropped or modified independent of the parent index. And you will need to cascade ALTER commands on the parent index (which will have no real storage, I hope) down to the children. Unfortunately, index names have to be unique on a schema-wide basis, so we'll somehow need to generate names for the "child" indexes. But those names might collide with existing objects, and will need to be preserved across a dump-and-restore. The concept is simple but getting all of the details right is hard. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
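[To make the naming problem concrete: today every index must be created on each partition individually, and each one needs its own schema-unique name. A minimal sketch, with table and index names invented for illustration:]

```sql
-- Today, indexing a partitioned table means one index per child,
-- each with a distinct schema-wide name:
CREATE INDEX measurement_y2016_logdate_idx ON measurement_y2016 (logdate);
CREATE INDEX measurement_y2017_logdate_idx ON measurement_y2017 (logdate);

-- The proposed parent-level form would have to generate and track a
-- name like the above for every existing and future child:
-- CREATE INDEX measurement_logdate_idx ON measurement (logdate);
```

Any generated child-index name can collide with existing objects, and the parent-child linkage would have to survive pg_dump and restore, which is where most of the difficulty lies.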
Re: [HACKERS] [PATCH] Fix for documentation of timestamp type
Robert Haas writes: > I find this a bit unclear, because the revised text kind of jumps back > and forth between the floating-point and integer formats. Perhaps > something like this: Your wording seems OK to me, although I'd drop the "instead". regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Declarative partitioning - another take
On Thu, Dec 8, 2016 at 11:58 PM, Amit Langote wrote: > Hierarchical lock manager stuff is interesting. Are you perhaps alluding > to a new *intention* lock mode as described in the literature on multiple > granularity locking [1]? Yes. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Fix for documentation of timestamp type
On Mon, Dec 12, 2016 at 8:50 AM, Aleksander Alekseev wrote: > I suggest to rewrite the documentation a bit to make it more clear that > by default timestamp is stored in microseconds. Corresponding patch is > attached. I find this a bit unclear, because the revised text kind of jumps back and forth between the floating-point and integer formats. Perhaps something like this: When timestamp values are stored as eight-byte integers (currently the default), microsecond precision is available over the full range of values. In this case, the internal representation is the number of microseconds before or after midnight 2000-01-01. When timestamp values are stored as double precision floating-point numbers instead (a deprecated compile-time option), the internal representation is the number of seconds before or after midnight 2000-01-01. With this representation, the effective limit of precision might be less than 6; in practice, microsecond precision is achieved for dates within a few years of 2000-01-01, but the precision degrades for dates further away. Note that using floating-point datetimes allows a larger range of timestamp values to be represented than shown above: from 4713 BC up to 5874897 AD. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
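[The proposed wording can be checked against behavior directly. With the default integer format, all six fractional digits survive at any point in the range; the arithmetic below for the deprecated float format is a back-of-the-envelope sketch, not a quote from the docs:]

```sql
-- Integer timestamps count microseconds from midnight 2000-01-01,
-- so full microsecond precision holds even far from that epoch:
SELECT timestamp '1999-12-31 23:59:59.999999';   -- one microsecond before the epoch
SELECT timestamp '294000-01-01 00:00:00.000001';

-- With float timestamps the second value could not keep its microsecond:
-- ~292000 years is ~9.2e12 seconds, i.e. 13 integer digits, and a double
-- carries only ~15-16 significant decimal digits, leaving under 3 for
-- the fractional second.
```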
Re: [HACKERS] pg_catversion builtin function
Jesper Pedersen writes: > Attached is a new builtin function that exposes the CATALOG_VERSION_NO > constant under the pg_catversion() function, e.g. I'm pretty sure that we intentionally didn't expose that, reasoning that users should only care about the user-visible version number. What exactly is the argument for exposing this? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch to implement pg_current_logfile() function
On Thu, Dec 8, 2016 at 4:34 PM, Karl O. Pinc wrote: >> I do think that the blizzard of small patches that you've >> submitted in some of your emails may possibly be a bit overwhelming. >> We shouldn't really need a stack of a dozen or more patches to >> implement one small feature. Declarative partitioning only had 7. >> Why does this need more than twice that number? > > I'm trying to break up changes into patches which are as small > as possible to make them more easily understood by providing > a guide for the reader/reviewer. So rather than > a single patch I'd make, say, 3. One for documentation to describe > the change. Another which does whatever refactoring is necessary > to the existing code, but which otherwise does not introduce any > functional changes. And another which adds the new code which makes > use of the refactoring. At each stage the code should continue > to work without bugs. The other party can then decide to incorporate > the patches into the main patch, which is itself another attached > patch. Sure, I don't think the intention is bad. It didn't really work out here, perhaps because the individual changes were too small. I think for small changes it's often more helpful to say "change X to be like Y" than to provide a patch, or else it's worth combining several small patches just to keep the number from getting too big. I don't know; I can't point to anything you really did wrong here; the process just didn't seem to be converging. > It is worth noting that the PG pg_current_logfile() function > is dependent upon the content of the current_logfiles file. So > if the file is wrong the function will also give wrong answers. > But I don't care if you don't. Well, I want the current_logfiles file to be correct, but that doesn't mean that it's reasonable to expect people to enumerate all the logfiles that have ever existed by running it. 
That would be fragile for tons of reasons, like whether or not the server hits the connection limit at the wrong time, whether network connectivity is unimpaired, whether the cron job that's supposed to run it periodically hiccups for some stupid reason, and many others. > A related thought is this. There are 3 abstraction levels of > concern. The level beneath PG, the PG level, and the level > of the user's application. When PG detects an error from either > "above" or "below" its job is to describe the problem from > its own perspective. HINTs are attempts to cross the boundary > into the application's perspective. Yes, I agree. EnterpriseDB occasionally gets complaints from our customers saying that PG is buggy because it reported an operating system error. No, really, if the operating system can't read the file, that's not our fault! Call your sysadmin! I also agree with your perspective on HINTs. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
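[For readers following this thread: the interface under review resolves the current_logfiles file from SQL. Usage would look roughly like the sketch below; the optional argument accepts the 'stderr' and 'csvlog' format names mentioned in the review downthread, and the returned paths are of course installation-specific:]

```sql
-- Path of the log file currently written to by the syslogger,
-- for the first format listed in log_destination:
SELECT pg_current_logfile();

-- Or request the current file for one specific log format;
-- returns NULL if that format is not in use:
SELECT pg_current_logfile('csvlog');
```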
Re: [HACKERS] Patch to implement pg_current_logfile() function
On Mon, Dec 12, 2016 at 4:31 AM, Gilles Darold wrote: > Applied in last version of the patch v18 as well as removing of an > unused variable. OK, this looks a lot closer to being committable. But: @@ -570,11 +583,13 @@ SysLogger_Start(void) * a time-based rotation. */ first_syslogger_file_time = time(NULL); -filename = logfile_getname(first_syslogger_file_time, NULL); +last_file_name = logfile_getname(first_syslogger_file_time, NULL); + +syslogFile = logfile_open(last_file_name, "a", false); -syslogFile = logfile_open(filename, "a", false); +update_metainfo_datafile(); -pfree(filename); +pfree(last_file_name); #ifdef EXEC_BACKEND switch ((sysloggerPid = syslogger_forkexec())) This leaves last_file_name pointing to free'd storage, which seems like a recipe for bugs, because the next call to update_metainfo_datafile() might try to follow that pointer. I think you need to make a clear decision about what the contract is for last_file_name and last_csv_file_name. Perhaps the idea is to always keep them valid, but then you need to free the old value when reassigning them and not free the current value. The changes to logfile_rotate() appear to be mostly unrelated refactoring which Karl was proposing for separate commit, not to be incorporated into this patch. Please remove whatever deltas are not essential to the new feature being implemented. misc.c no longer needs to add an include of . I hope, anyway. + errmsg("log format not supported, possible values are stderr or csvlog"))); This doesn't follow our message style guidelines. https://www.postgresql.org/docs/devel/static/error-style-guide.html Basically, a comma splice is not an acceptable way of joining together two independent thoughts. Obviously, people speak and write that way informally all the time, but we try to be a bit rigorous about grammar in our documentation and error messages. 
I think the way to do this would be something like errmsg("log format \"%s\" is not supported"), errhint("The supported log formats are \"stderr\" and \"csvlog\"."). +FreeFile(fd); +ereport(ERROR, +(errcode_for_file_access(), + errmsg("could not read file \"%s\": %m", +LOG_METAINFO_DATAFILE))); You don't need to FreeFile(fd) before ereport(ERROR). See header comments for AllocateFile(). +/* report the entry corresponding to the requested format */ +if (strcmp(logfmt, log_format) == 0) +break; +} +/* Close the current log filename file. */ +if (FreeFile(fd)) +ereport(ERROR, +(errcode_for_file_access(), + errmsg("could not close file \"%s\": %m", +LOG_METAINFO_DATAFILE))); + +if (lbuffer[0] == '\0') +PG_RETURN_NULL(); + +/* Recheck requested log format against the one extracted from the file */ +if (logfmt != NULL && (log_format == NULL || +strcmp(logfmt, log_format) != 0)) +PG_RETURN_NULL(); Your email upthread claims that you fixed this per my suggestions, but it doesn't look fixed to me. That check to see whether lbuffer[0] == '\0' is either protecting against nothing at all (in which case it could be removed) or it's protecting against some weirdness that can only happen here because of the strange way this logic is written. It's hard to understand all the cases here because we can exit the loop either having found the entry we want or not, and there's no clear indication of which one it is. Why not change the if-statement at the end of the loop like this: if (logfmt == NULL || strcmp(logfmt, log_format) == 0) { FreeFile(fd); PG_RETURN_TEXT_P(cstring_to_text(log_filepath)); } Then after the loop, just: FreeFile(fd); PG_RETURN_NULL(); + + A file recording the log file(s) currently written to by the syslogger + and the file's log formats, stderr + or csvlog. Each line of the file is a space separated list of + two elements: the log format and the full path to the log file including the + value of . 
The log format must be present + in to be listed in + current_logfiles. + + + + + pg_ctl + and current_logfiles + + + stderr + and current_logfiles + + + log_destination configuration parameter + and current_logfiles + + + +Although logs directed to stderr may be written +to the filesystem, when the writing of stderr is +managed outside of the PostgreSQL database +server the location of such files in the filesystem is not reflected in +the content of current_logfiles. One such case is +when the pg_ctl command is used to start +the postgres database server, capture +the stderr output of the server, and direct it to a +file. + + + +There are other notable situations related +to stderr logging. +Non-Po
Re: [HACKERS] Fixing matching of boolean index columns to sort ordering
On Mon, Dec 12, 2016 at 10:08 PM, Tom Lane wrote: > Every WHERE clause is a > boolean expression, so there's no principled reason why such a rule > wouldn't result in canonicalizing e.g. "i = 42" into "(i = 42) = true", > wreaking havoc on every other optimization we have. Restricting it > to only apply to simple boolean columns is no answer because (a) indexes > can be on boolean expressions not just simple columns, and (b) part > of the point of the canonicalization is to make logically-equivalent > expressions look alike, so declining to apply it in some cases would > ruin that. > Given our reliance on operators in indexing, a canonicalization rule that says all boolean-yielding expressions must contain an operator would yield the desired result. "i = 42" already has an operator, so no further normalization is necessary to make it conform to such a rule. The indexing concern may still come into play here; I'm not familiar enough with indexes on column lists versus indexes on expressions to know off the top of my head. The definition of "looking alike" seems like it would be met, since all such expressions would look alike in having an operator. That said, not adding "= true" is more visually appealing - though given all of the other things we do in ruleutils this seems like a minor addition. David J.
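The rule David proposes, that every boolean-yielding expression must contain a top-level operator, can be sketched as a toy normalizer. The tuple representation below is invented purely for illustration; it is not PostgreSQL's expression-node format.

```python
# Toy sketch of the proposed canonicalization rule: a boolean-yielding
# expression with no top-level operator (a bare boolean column) gets
# "= true" attached, while anything already operator-shaped, like
# "i = 42", is left alone.
def canonicalize(expr):
    """expr is ('op', opname, left, right) or ('var', colname)."""
    if expr[0] == "op":
        return expr                               # "i = 42" stays "i = 42"
    return ("op", "=", expr, ("const", True))     # "b" becomes "b = true"

unchanged = canonicalize(("op", "=", ("var", "i"), ("const", 42)))
wrapped = canonicalize(("var", "active"))         # bare boolean column
```

Under this rule both results contain a top-level operator, which is the "looking alike" property discussed above.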
Re: [HACKERS] Proposal : Parallel Merge Join
On Tue, Dec 13, 2016 at 6:42 AM, Dilip Kumar wrote: >> This patch is hard to read because it is reorganizing a bunch of code >> as well as adding new functionality. Perhaps you could separate it >> into two patches, one with the refactoring and then the other with the >> new functionality. > > Okay, I can do that. I have created two patches as per the suggestion. 1. mergejoin_refactoring_v2.patch --> Move functionality of considering various merge join path into new function. 2. parallel_mergejoin_v2.patch -> This adds the functionality of supporting partial mergejoin paths. This will apply on top of mergejoin_refactoring_v2.patch. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com mergejoin_refactoring_v2.patch Description: Binary data parallel_mergejoin_v2.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] postgres_fdw bug in 9.6
On Mon, Dec 12, 2016 at 5:45 PM, Tom Lane wrote: > One way that we could make things better is to rely on the knowledge > that EPQ isn't asked to evaluate joins for more than one row per input > relation, and therefore using merge or hash join technology is really > overkill. We could make a tree of simple nestloop joins, which aren't > going to care about sort order, if we could identify the correct join > clauses to apply. At least some of that could be laid off on the FDW, > which if it's gotten this far at all, ought to know what join clauses > need to be enforced by the foreign join. So I'm thinking a little bit > in terms of "just collect the foreign scans for all the base rels > included in the join and then build a cross-product nestloop join tree, > applying all the join clauses at the top". This would have the signal > value that it's guaranteed to succeed and so can be left for later, > rather than having to expensively redo it at each level of join planning. > > (Hm, that does sound a lot like "throw it away and start again", doesn't > it. But what we've got here is busted enough that I'm not sure there > are good alternatives. Maybe for 9.6 we could just rewrite > GetExistingLocalJoinPath, and limp along doing a lot of redundant > computation during planning.) I think there was an earlier version of the patch that actually built up NestLoop joins as it went, but at some point (possibly at my suggestion) it got rewritten to try to fish out existing paths instead. Apparently, the other idea was better. > BTW, what's "existing" about the result of GetExistingLocalJoinPath? It's an existing path - i.e. already added to the RelOptInfo - to perform the join locally. > And for that matter, what's "outer" about the content of fdw_outerpath? It's got the same meaning here that "outer path" does in general. 
Right now, a Foreign Scan only has an outer subpath if it needs one for EPQ rechecks, but in theory I suppose it could be used for something else; every Plan has a lefttree, whether or not we choose to populate it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
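Tom's idea quoted above, collecting the scans for all base rels and building a cross-product nestloop tree with every join clause applied at the top, can be illustrated outside the planner with a toy sketch. This covers inner joins only (as noted later in the thread, outer joins with nullable sides complicate it), and all table and clause names here are invented for illustration.

```python
# Toy illustration of "build a cross-product nestloop tree and apply
# all join clauses at the top": rows are dicts, and a "join clause" is
# a predicate over the combined row.  For EPQ-style rechecks there is
# only one candidate row per base relation, so sort order and join
# strategy (merge/hash) do not matter; a plain nested loop suffices.
from itertools import product

def cross_product_join(relations, join_clauses):
    """relations: list of row lists; join_clauses: predicates on merged rows."""
    result = []
    for combo in product(*relations):
        merged = {}
        for row in combo:
            merged.update(row)
        if all(clause(merged) for clause in join_clauses):
            result.append(merged)
    return result

a = [{"a_id": 1}]                 # one candidate row per base rel, as in EPQ
b = [{"b_id": 1, "a_id": 1}]
c = [{"c_id": 7, "b_id": 1}]
rows = cross_product_join([a, b, c],
                          [lambda r: r["a_id"] == r["b_id"],   # stand-in clauses
                           lambda r: r["b_id"] == r["c_id"] - 6])
```

The point of the sketch is the one Tom makes: this construction is guaranteed to succeed regardless of which paths survived at each join level, because it never depends on input sort order.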
Re: [HACKERS] [OSSTEST PATCH 0/1] PostgreSQL db: Retry on constraint violation [and 2 more messages]
On Tue, Dec 13, 2016 at 5:30 AM, Ian Jackson wrote: > I am concerned that there are other possible bugs of this form. > In earlier messages on this topic, it has been suggested that the > "impossible" unique constraint violation is only one example of a > possible "leakage". As I see it, the main point of serializable transactions is to prevent serialization anomalies from being persisted in the database or seen by a serializable transaction which successfully commits. It is certainly very nice from a programming perspective if the SQLSTATE permits easy identification of which failures it can be expected to probably yield a different result on retry, but it doesn't seem to me that the standard requires that, and other researchers and developers in this area have taken advantage of the fact that constraints prevent certain types of serialization anomalies from reaching the database. In the initial implementation of serializable transactions we noted papers that described this, and counted on it for correctness. Note that with a one year development cycle for major releases, a feature often initially gets implemented in a "bare bones" format that is useful but not ideal, and later releases build on it. Unfortunately there has not been anyone putting the resources into building on the initial implementation (in 9.1) as the current implementation has worked well enough for people to be focused on other areas. > Earlier you wrote: > > If I recall correctly, the constraints for which there can be > errors appearing due to concurrent transactions are primary key, > unique, and foreign key constraints. I don't remember seeing it > happen, but it would not surprise me if an exclusion constraint can > also cause an error due to a concurrent transaction's interaction > with the transaction receiving the error. > > Are all of these cases fixed by fcff8a57519847 "Detect SSI conflicts > before reporting constraint violations" ? No. 
It was specifically meant to address duplicate keys, and there was one particular set of steps which it was not able to address. See post by Thomas Munro. Hopefully, he, I, or someone else will have a chance to work on the one known remaining issue and look for others. Your efforts have been helpful; it would be great if you can find and document any other test cases which show a less-than-ideal SQLSTATE or other outright serialization anomalies. > I can try playing around with other kinds of constraints, to try to > discover different aspects or versions of this bug, but my knowledge > of the innards of databases is very limited and I may not be > particularly effective. Certainly if I try and fail, I wouldn't have > confidence that no such bug existed. Right. Proving the absence of any bug when dealing with race conditions is notoriously hard. > All statements in such transactions, even aborted transactions, need > to see results, and have behaviour, which are completely consistent > with some serialisation of all involved transactions. This must apply > up to (but not including) any serialisation failure error. If I understand what you are saying, I disagree. To prevent incorrect results from being returned even when a transaction later fails with a serialization failure would require blocking, and would have a major detrimental effect on both concurrency and throughput compared to the techniques PostgreSQL is using. As far as I'm aware, the guarantee you seek can only be provided by strict two-phase locking (S2PL). Benchmarks by Dan R. K. Ports of MIT CSAIL showed S2PL to be consistently slower than the SSI techniques used by PostgreSQL -- with throughput up to almost 5x worse in some workloads, even with SSI's requirement that results from a transaction which later gets a serialization failure must be ignored. 
Please see Section 8, and particularly the performance of S2PL in Figure 4 of this paper, which Dan and I presented to the VLDB conference in Istanbul: http://vldb.org/pvldb/vol5/p1850_danrkports_vldb2012.pdf > It would be nice if the documentation stated the error codes that > might be generated. AFAICT that's just 40P01 and 40001 ? Those are the only ones I know of. > (I'm not sure what 40002 is.) That doesn't appear to me to be related to transaction serialization issues. >> For the record, read-write-unique-4.spec's permutation r2 w1 w2 c1 c2 >> remains an open question for further work. > > Is this another possible bug of this form ? Yes. See the last specified permutation in this file: https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/test/isolation/specs/read-write-unique-4.spec;h=ec447823484cff99a61f2c2e67ed3aef2210d680;hb=refs/heads/master If it's not clear how to read that to construct the problem case, the README file might help: https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/test/isolation/README;h=bea278a856fffc911a7819fe8055f7e328fcac5f;hb=refs/heads/master I guess it's short enough to jus
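On the application side, the practical consequence of the discussion above is a retry loop keyed on SQLSTATE 40001/40P01 that discards anything computed by a failed attempt. A minimal generic sketch follows; the DatabaseError stub stands in for a real driver exception (psycopg2, for example, exposes the SQLSTATE on its exceptions as pgcode), so the class and attribute names here are assumptions for illustration.

```python
# Generic retry loop for serialization failures: rerun the transaction
# function from scratch on each attempt and ignore results from attempts
# that fail with a retryable SQLSTATE, as the discussion above requires.
RETRYABLE_SQLSTATES = {"40001", "40P01"}   # serialization_failure, deadlock_detected

class DatabaseError(Exception):
    """Stub for a driver exception carrying an SQLSTATE."""
    def __init__(self, sqlstate):
        super().__init__(sqlstate)
        self.sqlstate = sqlstate

def run_serializable(txn, max_retries=5):
    for _ in range(max_retries):
        try:
            return txn()
        except DatabaseError as e:
            if e.sqlstate not in RETRYABLE_SQLSTATES:
                raise          # not a serialization failure; do not retry
    raise RuntimeError("transaction failed after %d retries" % max_retries)

# Simulate a transaction that hits a serialization failure twice, then succeeds.
attempts = {"n": 0}
def txn():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise DatabaseError("40001")
    return "committed"

result = run_serializable(txn)
```

Note that each retry must start the transaction over; per Kevin's point, results read by an attempt that later receives a serialization failure carry no correctness guarantee.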
Re: [HACKERS] postgres_fdw bug in 9.6
On Tue, Dec 13, 2016 at 4:15 AM, Tom Lane wrote: > Jeff Janes writes: >> I have a test case where I made the fdw connect back to itself, and >> stripped out all the objects that I could and still reproduce the case. It >> is large, 21MB compressed, 163MB uncompressed, so I am linking it here: >> https://drive.google.com/open?id=0Bzqrh1SO9FcEZkpPM0JwUEMzcmc > > Thanks for the test case. I believe I understand the fundamental problem, > or one of the fundamental problems --- I'm not exactly convinced there > aren't several here. > > The key issue is that GetExistingLocalJoinPath believes it can return > any random join path as what to use for the fdw_outerpath of a foreign > join. You can get away with that, perhaps, as long as you only consider > two-way joins. But in a nest of foreign joins, this fails to consider > the interrelationships of join paths and their children. In particular, > what we have happening in this example is that GetExistingLocalJoinPath > seizes randomly on a MergePath for an upper join relation, clones it, > sees that the outer child is a join ForeignPath, and replaces that outer > child with its fdw_outerpath ... which was chosen equally cavalierly by > some previous execution of GetExistingLocalJoinPath, and does not have > the sort ordering expected by the MergePath. So we'd generate an invalid > EPQ plan, except that the debug cross-check in create_mergejoin_plan > notices the discrepancy. Thanks for the detailed explanation. > > I believe there are probably more problems here, or at least if there > aren't, it's not clear why not. Because of GetExistingLocalJoinPath's > lack of curiosity about what's underneath the join pathnode it picks, > it seems to me that it's possible for it to return a path tree that > *isn't* all local joins. 
If we're looking at, say, a hash path for > a 4-way join, whose left input is a hash path for a 3-way join, whose > left input is a 2-way foreign join, what's stopping that from being > returned as a "local" path tree? Right, the so-called "local join tree" can contain ForeignPaths representing joins between foreign relations. AFAIU what protects us from getting into trouble is that postgresRecheckForeignScan calls ExecProcNode() on the outer plan. So, even if there are ForeignPaths in the fdw_outerpath tree, the corresponding plans would use a local join plan to apply the EPQ recheck. The code in GetExistingLocalJoinPath() avoids the redirection at least for the inner or outer ForeignPaths. Any comment claiming that the path tree under fdw_outerpath is entirely "local" should be removed, if it survives the fix. > > Likewise, it seems like the code is trying to reject any custom-path join > types, or at least this barely-intelligible comment seems to imply that: > > * Just skip anything else. We don't know if corresponding > * plan would build the output row from whole-row references > * of base relations and execute the EPQ checks. > > But this coding fails to notice any such join type that's below the > level of the immediate two join inputs. Similarly, here we assume that any custom plan would tell us whether the given row survives the EPQ recheck. As long as a custom plan does that correctly, it shouldn't be a problem. > > We've probably managed to not notice this so far because foreign joins > generally ought to dominate any local join method, so that there wouldn't > often be cases where the surviving paths use local joins for input > sub-joins. But Jeff's test case proves it can happen. > > I kind of wonder why this infrastructure exists at all; it's not the way > I'd have foreseen handling EPQ for remote joins. However, while "throw > it away and start again" might be good advice going forward, I suppose > it won't be very popular for applying to 9.6.
> > One way that we could make things better is to rely on the knowledge > that EPQ isn't asked to evaluate joins for more than one row per input > relation, and therefore using merge or hash join technology is really > overkill. We could make a tree of simple nestloop joins, which aren't > going to care about sort order, if we could identify the correct join > clauses to apply. At least some of that could be laid off on the FDW, > which if it's gotten this far at all, ought to know what join clauses > need to be enforced by the foreign join. So I'm thinking a little bit > in terms of "just collect the foreign scans for all the base rels > included in the join and then build a cross-product nestloop join tree, > applying all the join clauses at the top". This would have the signal > value that it's guaranteed to succeed and so can be left for later, > rather than having to expensively redo it at each level of join planning. I am not able to understand how this strategy of applying all join clauses on top of cross product would work if there are OUTER joins involved. Wouldn't nullable sides change the result? > > (Hm, that does sound a lot
Re: [HACKERS] Hash Indexes
On 12/11/2016 11:37 PM, Amit Kapila wrote: On Sun, Dec 11, 2016 at 11:54 AM, Amit Kapila wrote: On Wed, Dec 7, 2016 at 2:02 AM, Jeff Janes wrote: With above fixes, the test ran successfully for more than a day. There was a small typo in the previous patch which is fixed in attached. I don't think it will impact the test results if you have already started the test with the previous patch, but if not, then it is better to test with attached. A mixed workload (INSERT, DELETE and VACUUM primarily) is successful here too using _v2. Thanks ! Best regards, Jesper -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [sqlsmith] FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)", File: "xlog.c", Line: 10200)
On Tue, Dec 13, 2016 at 7:15 PM, Kyotaro HORIGUCHI wrote: > Thank you for the comment. Er, it is good to see more people interested in this problem... As the former author, still working actively on this patch, perhaps it would be better if I continue to code this patch, no? It is a waste to step on each other's toes. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] pg_catversion builtin function
Hi Hackers, Attached is a new builtin function that exposes the CATALOG_VERSION_NO constant under the pg_catversion() function, e.g. test=# SELECT pg_catversion(); pg_catversion --- 201612121 (1 row) Although it is mostly useful during the development cycle to verify if a dump/restore is needed, it could have other use-cases. I'm unsure of the OID assignment rules - feel free to point me towards information regarding this. I'll register this patch with the next CF. Best regards, Jesper >From 39d52f5389bf3ef1814c1f201df6531feb2a5c7f Mon Sep 17 00:00:00 2001 From: jesperpedersen Date: Tue, 13 Dec 2016 08:27:18 -0500 Subject: [PATCH] pg_catversion builtin function --- doc/src/sgml/func.sgml | 6 ++ src/backend/utils/adt/version.c | 10 ++ src/include/catalog/pg_proc.h | 3 +++ src/include/utils/builtins.h| 1 + 4 files changed, 20 insertions(+) diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 0f9c9bf..6fc78ab 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -15497,6 +15497,12 @@ SELECT * FROM pg_ls_dir('.') WITH ORDINALITY AS t(ls,n); text PostgreSQL version information. See also for a machine-readable version. 
+ + + pg_catversion() + int + returns the catalog version number + diff --git a/src/backend/utils/adt/version.c b/src/backend/utils/adt/version.c index f247992..55f97fa 100644 --- a/src/backend/utils/adt/version.c +++ b/src/backend/utils/adt/version.c @@ -14,6 +14,7 @@ #include "postgres.h" +#include "catalog/catversion.h" #include "utils/builtins.h" @@ -22,3 +23,12 @@ pgsql_version(PG_FUNCTION_ARGS) { PG_RETURN_TEXT_P(cstring_to_text(PG_VERSION_STR)); } + +/* + * Return the catalog version number + */ +Datum +pgsql_catversion(PG_FUNCTION_ARGS) +{ + PG_RETURN_INT32(CATALOG_VERSION_NO); +} diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h index cd7b909..b23f54a 100644 --- a/src/include/catalog/pg_proc.h +++ b/src/include/catalog/pg_proc.h @@ -235,6 +235,9 @@ DATA(insert OID = 1258 ( textcat PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 DATA(insert OID = 84 ( boolne PGNSP PGUID 12 1 0 0 0 f f f t t f i s 2 0 16 "16 16" _null_ _null_ _null_ _null_ _null_ boolne _null_ _null_ _null_ )); DATA(insert OID = 89 ( version PGNSP PGUID 12 1 0 0 0 f f f f t f s s 0 0 25 "" _null_ _null_ _null_ _null_ _null_ pgsql_version _null_ _null_ _null_ )); DESCR("PostgreSQL version string"); +DATA(insert OID = 7000 ( pg_catversionPGNSP PGUID 12 1 0 0 0 f f f f t f s s 0 0 23 "" _null_ _null_ _null_ _null_ _null_ pgsql_catversion _null_ _null_ _null_ )); +DESCR("PostgreSQL catalog version number"); + DATA(insert OID = 86 ( pg_ddl_command_in PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 32 "2275" _null_ _null_ _null_ _null_ _null_ pg_ddl_command_in _null_ _null_ _null_ )); DESCR("I/O"); diff --git a/src/include/utils/builtins.h b/src/include/utils/builtins.h index 7ed1623..83b2846 100644 --- a/src/include/utils/builtins.h +++ b/src/include/utils/builtins.h @@ -893,6 +893,7 @@ extern Datum text_format_nv(PG_FUNCTION_ARGS); /* version.c */ extern Datum pgsql_version(PG_FUNCTION_ARGS); +extern Datum pgsql_catversion(PG_FUNCTION_ARGS); /* xid.c */ extern Datum 
xidin(PG_FUNCTION_ARGS); -- 2.7.4 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [sqlsmith] FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)", File: "xlog.c", Line: 10200)
On Tue, Dec 13, 2016 at 12:49 PM, Fujii Masao wrote: Thanks for the review. > Isn't it better to mention "an exclusive backup" here? What about > > EXCLUSIVE_BACKUP_STARTING means that pg_start_backup() is starting an > exclusive > backup. > EXCLUSIVE_BACKUP_STOPPING means that pg_stop_backup() is stopping an exclusive > backup. OK, changed this way. > I think that it's worth explaining why ExclusiveBackupState is necessary, > in the comment. Changed like that at the top of the declaration of ExclusiveBackupState: * State of an exclusive backup, necessary to control concurrent activities * across sessions when working on exclusive backups. > - * exclusiveBackup is true if a backup started with pg_start_backup() is > - * in progress, and nonExclusiveBackups is a counter indicating the > number > + * exclusiveBackupState is changed to EXCLUSIVE_BACKUP_STARTING for the > + * duration of pg_start_backup(), then to EXCLUSIVE_BACKUP_IN_PROGRESS > once > + * it is done, and nonExclusiveBackups is a counter indicating the number > > Why didn't you mention EXCLUSIVE_BACKUP_STOPPING and _NONE? > Basically it's better to explain fully how the state changes. Let's simplify things instead and just say that the meaning of the values is described where ExclusiveBackupState is declared. > +(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > + errmsg("a backup is already starting"))); > +} > +if (XLogCtl->Insert.exclusiveBackupState == > EXCLUSIVE_BACKUP_STOPPING) > +{ > +WALInsertLockRelease(); > +ereport(ERROR, > +(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > + errmsg("a backup is currently stopping"))); > > Isn't it better to use "an exclusive backup" explicitly rather than > "a backup" here? Yes. It would be better to not touch the original in-progress messages though. > You changed the code so that pg_stop_backup() removes backup_label before > it marks the exclusive-backup-state as not-in-progress. Could you tell me > why you did this change? 
It's better to explain it in the comment. That's actually mentioned in the comments :) This is to ensure that there cannot be any other pg_stop_backup() running in parallel and trying to remove the backup label file. The key point here is that the backup_label file is removed during EXCLUSIVE_BACKUP_STOPPING, that the removal of the backup_label file is kept last on purpose (that's what HEAD does anyway), and that we can roll back to an in-progress state in case of failure. I have rewritten this block like that. Does that sound better? +* Remove backup_label for an exclusive backup. Before doing anything +* the status of the exclusive backup is switched to +* EXCLUSIVE_BACKUP_STOPPING to ensure that there cannot be concurrent +* operations. In case of failure, the status of the +* exclusive backup is switched back to in-progress. The removal of the +* backup_label file is kept last as it is the critical piece proving +* if an exclusive backup is running or not in the event of a system +* crash. Thanks, -- Michael diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 084401d..8fa9cf4 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -463,6 +463,29 @@ typedef union WALInsertLockPadded } WALInsertLockPadded; /* + * State of an exclusive backup, necessary to control concurrent activities + * across sessions when working on exclusive backups. + * + * EXCLUSIVE_BACKUP_NONE means that there is no exclusive backup actually + * running, to be more precise pg_start_backup() is not being executed for + * an exclusive backup and there is no exclusive backup in progress. + * EXCLUSIVE_BACKUP_STARTING means that pg_start_backup() is starting an + * exclusive backup. + * EXCLUSIVE_BACKUP_STOPPING means that pg_stop_backup() is stopping an + * exclusive backup. + * EXCLUSIVE_BACKUP_IN_PROGRESS means that pg_start_backup() has finished + * running and an exclusive backup is in progress. 
pg_stop_backup() is + * needed to finish it. + */ +typedef enum ExclusiveBackupState +{ + EXCLUSIVE_BACKUP_NONE = 0, + EXCLUSIVE_BACKUP_STARTING, + EXCLUSIVE_BACKUP_STOPPING, + EXCLUSIVE_BACKUP_IN_PROGRESS +} ExclusiveBackupState; + +/* * Shared state data for WAL insertion. */ typedef struct XLogCtlInsert @@ -503,13 +526,14 @@ typedef struct XLogCtlInsert bool fullPageWrites; /* - * exclusiveBackup is true if a backup started with pg_start_backup() is - * in progress, and nonExclusiveBackups is a counter indicating the number - * of streaming base backups currently in progress. forcePageWrites is set - * to true when either of these is non-zero. lastBackupStart is the latest - * checkpoint redo location used as a starting point for an online backup. + * The possible values of exclusiveBackupStat
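The transition rules spelled out in the patch comments above can be modeled as a tiny state machine. This is only an illustration of the protocol (pg_start_backup() moves NONE to STARTING to IN_PROGRESS, pg_stop_backup() moves IN_PROGRESS to STOPPING to NONE, and a failure while stopping rolls back to IN_PROGRESS); it is not the shared-memory code in xlog.c.

```python
# Toy model of the ExclusiveBackupState protocol described in the patch.
from enum import Enum, auto

class ExclusiveBackupState(Enum):
    NONE = auto()
    STARTING = auto()
    STOPPING = auto()
    IN_PROGRESS = auto()

VALID_TRANSITIONS = {
    (ExclusiveBackupState.NONE, ExclusiveBackupState.STARTING),
    (ExclusiveBackupState.STARTING, ExclusiveBackupState.IN_PROGRESS),
    (ExclusiveBackupState.STARTING, ExclusiveBackupState.NONE),         # start failed
    (ExclusiveBackupState.IN_PROGRESS, ExclusiveBackupState.STOPPING),
    (ExclusiveBackupState.STOPPING, ExclusiveBackupState.NONE),
    (ExclusiveBackupState.STOPPING, ExclusiveBackupState.IN_PROGRESS),  # stop failed
}

class BackupStateMachine:
    def __init__(self):
        self.state = ExclusiveBackupState.NONE

    def transition(self, new_state):
        if (self.state, new_state) not in VALID_TRANSITIONS:
            raise RuntimeError(f"invalid transition {self.state} -> {new_state}")
        self.state = new_state

m = BackupStateMachine()
m.transition(ExclusiveBackupState.STARTING)
m.transition(ExclusiveBackupState.IN_PROGRESS)
m.transition(ExclusiveBackupState.STOPPING)
m.transition(ExclusiveBackupState.IN_PROGRESS)  # e.g. backup_label removal failed
```

The point the model makes explicit is the one Michael argues: STARTING and STOPPING exist so that a concurrent pg_start_backup()/pg_stop_backup() cannot slip in mid-operation, and a failed stop lands back in IN_PROGRESS rather than in an inconsistent state.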
Re: [HACKERS] Broken SSL tests in master
On Mon, Dec 12, 2016 at 1:34 AM, Heikki Linnakangas wrote: > On 12/01/2016 09:45 PM, Andres Freund wrote: >> >> And nobody has added a buildfarm module to run it manually on their >> servers either :( > > I just added a module to run "make -C src/test/ssl check" in chipmunk. So at > least that's covered now. > > http://pgbuildfarm.org/cgi-bin/show_stage_log.pl?nm=chipmunk&dt=2016-12-10%2012%3A08%3A31&stg=test-ssl-check Thanks. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_background contrib module proposal
On Tue, Dec 13, 2016 at 2:05 PM, Andrew Borodin wrote: > 2016-12-13 12:55 GMT+05:00 amul sul : >> I think background-session code is not that much deviated from >> pg_background code, > It is not derived, though it is not much deviated. The background sessions > code does not have detailed exceptions and code comments, but it is > doing somewhat similar things. >> IIUC the background-session patch we can manage to >> reuse it, as suggested by Robert. This will allow us to maintain a >> session in the long run; we could reuse this session to run subsequent >> queries instead of maintaining a separate worker pool. Thoughts? > > One API to deal with both features would be much better, sure. > An "object" like a session pool is much easier to implement on top of a > session "object" than on top of worker processes, PIDs etc. > >>> 4. Query as a future (actually it is implemented already by >>> pg_background_result) >>> 5. Promised result. Declare that you are waiting for data of a specific >>> format, execute a query, someone (from another backend) will >>> eventually place data into the promised result and your query continues >>> execution. >> >> Could you please elaborate more? >> Do you mean two-way communication between foreground & background processes? > > It is from C++11 threading: future, promise and async - these are > related but different kinds of acquiring a result between threads. > Future - is when caller Cl starts thread T (or dequeues a thread from a > pool) and Cl can wait until T finishes and provides a result. > Here waiting for the result is just like selecting from pg_background_result(). > Promise - is when you declare a variable and the caller does not know which > thread will put the data into this variable. But any thread reading the > promise will wait until another thread puts data into the promise. > There are more parallelism patterns there, like async, deferred, lazy, > but futures and promises are, from my point of view, the most used. > Nice, thanks for the detailed explanation. 
We can use the shm_mq infrastructure to share any kind of message between two processes, but perhaps we might end up overestimating what pg_background was originally meant to be used for - the user backend will launch workers and give them an initial set of instructions, and then results will stream back from the workers to the user backend. >>> 6. Cancelation: a way to signal to a background query that it's time to >>> quit gracefully. >> Let me check this too. > I think Craig is right: any background query must be ready to be shut > down. That's what databases are about, you can safely pull the plug at > any moment. SIGTERM is handled in the current pg_background patch; the user can terminate backend execution using pg_cancel_backend() or pg_terminate_backend() as shown below: postgres=> select pg_background_launch('insert into foo values(generate_series(1,1))'); pg_background_launch -- 67069 (1 row) postgres=> select pg_terminate_backend(67069); pg_terminate_backend -- t (1 row) postgres=> select * from pg_background_result(67069) as (x text); ERROR: terminating connection due to administrator command CONTEXT: background worker, pid 67069 postgres=> Thanks & Regards, Amul -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
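Andrew's future/promise distinction maps onto Python's concurrent.futures much as onto C++11. The sketch below is purely an illustration of the two patterns, not pg_background code: a "future" wraps a task whose result the caller later blocks on, while a bare Future object acts as a "promise" that some other thread fulfills.

```python
# "Future": the caller submits work and later blocks on the result,
# analogous to selecting from pg_background_result().
from concurrent.futures import Future, ThreadPoolExecutor

with ThreadPoolExecutor(max_workers=1) as pool:
    fut = pool.submit(lambda: 6 * 7)   # worker computes in the background
    answer = fut.result()              # caller blocks until it is done

# "Promise": the variable is declared first, and the reader does not know
# which thread will fulfill it; result() blocks until someone set_result()s.
promise = Future()

def producer():
    promise.set_result("data from another backend")

with ThreadPoolExecutor(max_workers=1) as pool:
    pool.submit(producer)
    value = promise.result()           # waits until the producer fills it
```

In pg_background terms, the first pattern is what pg_background_launch()/pg_background_result() already provide; the second is the "promised result" idea, where any backend could fill in the declared slot.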
Re: [HACKERS] ToDo: no blocking (waiting) DDL
2016-12-13 13:03 GMT+01:00 Alvaro Herrera : > Pavel Stehule wrote: > > Hi > > > > I don't remember well, but there was maybe a similar ToDo. > > > > Yesterday I had an incident on a high-load system when I executed a DROP INDEX > > cmd. > > > > This statement waited for a long transaction to finish, but it blocked any > > other statements. > > > > It would be nice if a statement waiting on a lock didn't block other > > statements. > > https://postgr.es/m/CAKddOFDz0+F2uspVN5BZgtN7Wp+ > vqbeluqpwfvgwap5jzvx...@mail.gmail.com Yes, that's it. It can be applied to some DDL statements too. Thank you regards Pavel > > > -- > Álvaro Herrerahttps://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services >
Re: [HACKERS] ToDo: no blocking (waiting) DDL
Pavel Stehule wrote: > Hi > > I don't remember well, but there was maybe a similar ToDo. > > Yesterday I had an incident on a high-load system when I executed a DROP INDEX > cmd. > > This statement waited for a long transaction to finish, but it blocked any > other statements. > > It would be nice if a statement waiting on a lock didn't block other > statements. https://postgr.es/m/cakddofdz0+f2uspvn5bzgtn7wp+vqbeluqpwfvgwap5jzvx...@mail.gmail.com -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] ToDo: no blocking (waiting) DDL
Hi

I don't remember well - there was maybe a similar ToDo already.

Yesterday I got an incident on a high-load system when I executed a DROP
INDEX command.

This statement waited for a long transaction to finish, but meanwhile it
blocked all other statements.

It would be nice if a statement waiting on a lock didn't block other
statements.

Regards

Pavel
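Not a fix for the ToDo itself, but for completeness: two existing ways to limit the collateral blocking described above (a sketch only; `some_index` is a placeholder name, `lock_timeout` requires 9.3+ and DROP INDEX CONCURRENTLY requires 9.2+):

```sql
-- Bound how long the DDL may sit in the lock queue (and thus block others):
SET lock_timeout = '1s';
DROP INDEX some_index;  -- gives up with an error after 1s instead of queuing

-- For indexes specifically, avoid the blocking ACCESS EXCLUSIVE lock entirely:
RESET lock_timeout;
DROP INDEX CONCURRENTLY some_index;
```

Neither makes lock waits non-blocking in general, which is what the ToDo asks for; they only mitigate the DROP INDEX case.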
Re: [HACKERS] [OSSTEST PATCH 0/1] PostgreSQL db: Retry on constraint violation [and 2 more messages]
Thanks to everyone for your attention. Kevin Grittner writes ("Re: [HACKERS] [OSSTEST PATCH 0/1] PostgreSQL db: Retry on constraint violation"): > On Mon, Dec 12, 2016 at 8:45 AM, Ian Jackson > wrote: > > AIUI the documented behavour is that "every set of successful > > transactions is serialisable". > > Well, in context that is referring to serializable transactions. > No such guarantee is provided for other isolation levels. Indeed. > I didn't [get the same results]. First, I got this when I tried to > start the concurrent transactions using the example as provided: I'm sorry, I didn't actually try my paraphrased repro recipe, so it contained an error: > test=# SELECT count(*) FROM t WHERE k=1; -- save value My Perl code said "k=?" and of course ? got (effectively) substituted with "'1'" rather than "1". I introduced the error when transcribing the statements from my Perl script to my prose. Sorry about that. Next time I will test my alleged recipe with psql. I should also have told you that I was running this on 9.1. Kevin Grittner writes ("Re: [HACKERS] [OSSTEST PATCH 0/1] PostgreSQL db: Retry on constraint violation"): > On Mon, Dec 12, 2016 at 12:32 PM, Kevin Grittner wrote: > > As you can see, this generated a serialization failure. > > That was on 9.6. On earlier versions it does indeed allow the > transaction on connection 2 to commit, yielding a non-serializable > result. This makes a pretty strong case for back-patching this > commit: Right. That's part of my point. Thanks. [back to the first message] > If you have some way to cause a set of concurrent serializable > transactions to generate results from those transactions which > commit which is not consistent with some one-at-a-time order of > execution, I would be very interested in seeing the test case. > The above, however, is not it. I am concerned that there are other possible bugs of this form. 
In earlier messages on this topic, it has been suggested that the
"impossible" unique constraint violation is only one example of a possible
"leakage". Earlier you wrote:

  If I recall correctly, the constraints for which there can be errors
  appearing due to concurrent transactions are primary key, unique, and
  foreign key constraints. I don't remember seeing it happen, but it would
  not surprise me if an exclusion constraint can also cause an error due
  to a concurrent transaction's interaction with the transaction receiving
  the error.

Are all of these cases fixed by fcff8a57519847 "Detect SSI conflicts
before reporting constraint violations" ?

I can try playing around with other kinds of constraints, to try to
discover different aspects or versions of this bug, but my knowledge of
the innards of databases is very limited and I may not be particularly
effective. Certainly if I try and fail, I wouldn't have confidence that no
such bug existed.

And, Thomas Munro writes ("Re: [HACKERS] [OSSTEST PATCH 0/1] PostgreSQL
db: Retry on constraint violation"):
> Ian's test case uses an exception handler to convert a difference in
> error code into a difference in committed effect, thereby converting a
> matter of programmer convenience into a bug.

Precisely. I think this is a fully general technique, which means that any
situation where a transaction can "spuriously" fail is a similar bug.

So I think that ISOLATION LEVEL SERIALIZABLE needs to do what a naive
programmer would expect: all statements in such transactions, even aborted
transactions, need to see results, and have behaviour, which are
completely consistent with some serialisation of all involved
transactions. This must apply up to (but not including) any serialisation
failure error.

If that is the behaviour of 9.6 then I would like to submit a
documentation patch which says so. If the patch is to be backported, then
this ought to apply to all (patched) 9.x versions ?
It would be nice if the documentation stated the error codes that might be
generated. AFAICT that's just 40P01 and 40001 ? (I'm not sure what 40002
is.)

> For the record, read-write-unique-4.spec's permutation r2 w1 w2 c1 c2
> remains an open question for further work.

Is this another possible bug of this form ?

Ian.
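For readers following along, the kind of anomaly under discussion can be reproduced with a classic write-skew schedule (a sketch; it assumes a table `t(k int)` seeded so each session's SELECT sees the other's key, and under SSI one of the two commits is expected to fail with SQLSTATE 40001):

```sql
-- session 1                             -- session 2
BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT count(*) FROM t WHERE k = 2;
                                         BEGIN ISOLATION LEVEL SERIALIZABLE;
                                         SELECT count(*) FROM t WHERE k = 1;
INSERT INTO t VALUES (1);
                                         INSERT INTO t VALUES (2);
COMMIT;
                                         COMMIT;  -- serialization failure (40001)
```

The point of the thread is that an application catching the failing commit must see only error codes and states consistent with some serial order, so it can safely retry.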
Re: [HACKERS] Re: [sqlsmith] FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)", File: "xlog.c", Line: 10200)
Thank you for the comment. At Tue, 13 Dec 2016 12:49:12 +0900, Fujii Masao wrote in > On Tue, Nov 8, 2016 at 11:18 AM, Kyotaro HORIGUCHI > wrote: > >> By the way, as long as I look at that, the patch applies cleanly on > >> master and 9.6 but not further down. If a committer shows up to push > >> this patch, I can prepare versions for back branches as needed. > > > > Ok, the attached is the version just rebased to the current > > master. It is clieanly appliable without whitespace errors on > > master and 9.6 with some displacements but fails on 9.5. > > > > I will mark this as "Ready for Committer". > > Thanks for updating the patch! Here are the review comments; > > + * EXCLUSIVE_BACKUP_NONE means that there is no exclusive backup actually > + * running, to be more precise pg_start_backup() is not being executed for > + * an exclusive backup and there is exclusive backup in progress. > > "there is exclusive" should be "there is no exclusive"? > ISTM that the description following "to be more" doesn't seem to be necessary. I agree. One can know that by reading the description on EXCLUSIVE_BACKUP_STARTING (and STOPPING, which is not mentioned here) > + * EXCLUSIVE_BACKUP_STARTING means that pg_start_backup() is running and > + * that is is not done yet. > + * EXCLUSIVE_BACKUP_STOPPING means that pg_stop_backup() is running and > + * that is is not done yet. > > Typo: "is is" Oops. Sorry for overlooking such a silly typo. > Isn't it better to mention "an exclusive backup" here? What about > > EXCLUSIVE_BACKUP_STARTING means that pg_start_backup() is starting an > exclusive > backup. > EXCLUSIVE_BACKUP_STOPPING means that pg_stop_backup() is stopping an exclusive > backup. It seems fine. Done in the attached version. > I think that it's worth explaining why ExclusiveBackupState is necessary, > in the comment. The attached version contains the following comment. | * State of an exclusive backup. 
| *
| * EXCLUSIVE_BACKUP_STARTING and EXCLUSIVE_BACKUP_STOPPING block
| * pg_start_backup and pg_stop_backup to prevent a race condition. Both
| * functions return an error in these states.

Does this make sense?

> - * exclusiveBackup is true if a backup started with pg_start_backup() is
> - * in progress, and nonExclusiveBackups is a counter indicating the number
> + * exclusiveBackupState is changed to EXCLUSIVE_BACKUP_STARTING for the
> + * duration of pg_start_backup(), then to EXCLUSIVE_BACKUP_IN_PROGRESS once
> + * it is done, and nonExclusiveBackups is a counter indicating the number
>
> Why didn't you mention EXCLUSIVE_BACKUP_STOPPING and _NONE?
> Basically it's better to explain fully how the state changes.

Hmm, it is modified as the following in the attached.

| * exclusiveBackupState indicates the state of an exclusive backup, see
| * ExclusiveBackupState for details.

and the comment for ExclusiveBackupState explains the state transition.
Does this work for you?

> +(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> + errmsg("a backup is already starting")));
> +}
> +if (XLogCtl->Insert.exclusiveBackupState == EXCLUSIVE_BACKUP_STOPPING)
> +{
> +WALInsertLockRelease();
> +ereport(ERROR,
> +(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> + errmsg("a backup is currently stopping")));
>
> Isn't it better to use "an exclusive backup" explicitly rather than
> "a backup" here? pg_stop_backup already says like that.

To unify with them, the messages of pg_start_backup are modified as
follows.

  exclusive backup is already starting
  exclusive backup is currently stopping
  exclusive backup is currently in progress

> You changed the code so that pg_stop_backup() removes backup_label before
> it marks the exclusive-backup-state as not-in-progress. Could you tell me
> why you did this change? It's better to explain it in the comment.

Previously, backup_label was the traffic signal for the backup state, and
that was the cause of this problem.
Now the exclusiveBackupState takes the place of it, so it naturally should
be _NONE after all steps have been finished. It seems so natural to me
that no additional explanation is needed.

If they are done in the opposite order, the existence check of
backup_label still left in pg_start_backup may fire.

| * Check for existing backup label --- implies a backup is already
| * running. (XXX given that we checked exclusiveBackupState above,
| * maybe it would be OK to just unlink any such label file?)
| */
| if (stat(BACKUP_LABEL_FILE, &stat_buf) != 0)
| {
|     if (errno != ENOENT)
|         ereport(ERROR,

--
Kyotaro Horiguchi
NTT Open Source Software Center

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 084401d..fd2d809 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/trans
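As a user-visible illustration of what the state machine serializes (a sketch; the second call is expected to fail even before this patch, while the patch additionally closes the race window while pg_start_backup()/pg_stop_backup() are themselves still executing):

```sql
SELECT pg_start_backup('base1');
-- A concurrent exclusive backup attempt is rejected while one is in progress:
SELECT pg_start_backup('base2');  -- expected to fail: a backup is already in progress
SELECT pg_stop_backup();
```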
Re: [HACKERS] Typmod associated with multi-row VALUES constructs
Hello, it seems to me to work as expected.

At Thu, 08 Dec 2016 15:58:10 -0500, Tom Lane wrote in
<7083.1481230...@sss.pgh.pa.us>
> I've pushed the previous patch to HEAD. Attached is a proposed patch
> (against 9.6) that we could use for the back branches; it takes the
> brute force approach of just computing the correct value on-demand
> in the two functions at issue. The key question of course is whether
> this is acceptable from a performance standpoint. I did a simple test
> using a 1000-entry VALUES list:
>
> select count(a) from (
> values
> ('0'::varchar(3), '0'::varchar(4)),
> ('1'::varchar(3), '1'::varchar(4)),
> ('2'::varchar(3), '2'::varchar(4)),
> ('3'::varchar(3), '3'::varchar(4)),
> ('4'::varchar(3), '4'::varchar(4)),
> ...
> ('996'::varchar(3), '996'::varchar(4)),
> ('997'::varchar(3), '997'::varchar(4)),
> ('998'::varchar(3), '998'::varchar(4)),
> ('999'::varchar(3), '999'::varchar(4))
> ) v(a,b);
>
> Since all the rows do have the same typmod, this represents the worst
> case where we have to scan all the way to the end to confirm the typmod,
> and it has about as little overhead otherwise as I could think of doing.
> I ran it like this:
>
> pgbench -U postgres -n -c 1 -T 1000 -f bigvalues.sql regression
>
> and could not see any above-the-noise-level difference --- in fact,
> it seemed like it was faster *with* the patch, which is obviously
> impossible; I blame that on chance realignments of loops vs. cache
> line boundaries.
>
> So I think this is an okay candidate for back-patching. If anyone
> wants to do their own performance tests, please do.

Despite the additional computation, I saw the same result on 9.6 - any
difference with the patch is within the noise.
9.6 without the patch   103.2 tps
9.6 with the patch      108.3 tps

For the master branch,

master      102.9 tps
0b78106c^   103.4 tps

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center
Re: [HACKERS] Declarative partitioning - another take
On 2016/12/13 0:17, Tomas Vondra wrote: > On 12/12/2016 07:37 AM, Amit Langote wrote: >> >> Hi Tomas, >> >> On 2016/12/12 10:02, Tomas Vondra wrote: >>> >>> 2) I'm wondering whether having 'table' in the catalog name (and also in >>> the new relkind) is too limiting. I assume we'll have partitioned indexes >>> one day, for example - do we expect to use the same catalogs? >> >> I am not sure I understand your idea of partitioned indexes, but I doubt >> it would require entries in the catalog under consideration. Could you >> perhaps elaborate more? >> > > OK, let me elaborate. Let's say we have a partitioned table, and I want to > create an index. The index may be either "global" i.e. creating a single > relation for data from all the partitions, or "local" (i.e. partitioned > the same way as the table). > > Local indexes are easier to implement (it's essentially what we have now, > except that we need to create the indexes manually for each partition), > and don't work particularly well for some use cases (e.g. unique > constraints). This is what I mean by "partitioned indexes". > > If the index is partitioned just like the table, we probably won't need to > copy the partition key info (so, nothing in pg_partitioned_table). > I'm not sure it makes sense to partition the index differently than the > table - I don't see a case where that would be useful. > > The global indexes would work better for the unique constraint use case, > but it clearly contradicts our idea of TID (no information about which > partition that references). > > So maybe the catalog really only needs to track info about tables? Not > sure. I'm just saying it'd be unfortunate to have _table in the name, and > end up using it for indexes too. Hmm, I didn't quite think of the case where the index is partitioned differently from the table, but perhaps that's possible with some other databases. 
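For reference, the "local index" status quo being discussed looks roughly like this with the partitioning syntax from the patch (a sketch; table, column, and bound values are made up, and CREATE INDEX has to be issued per partition since issuing it on the parent does not yet cascade):

```sql
CREATE TABLE measurement (logdate date, value int) PARTITION BY RANGE (logdate);
CREATE TABLE measurement_2016 PARTITION OF measurement
    FOR VALUES FROM ('2016-01-01') TO ('2017-01-01');
CREATE TABLE measurement_2017 PARTITION OF measurement
    FOR VALUES FROM ('2017-01-01') TO ('2018-01-01');

-- "Local" indexes today: one CREATE INDEX per partition, by hand.
CREATE INDEX ON measurement_2016 (logdate);
CREATE INDEX ON measurement_2017 (logdate);
```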
What you describe as "local indexes" or "locally partitioned indexes" is
something I would like to see being pursued in the near term. In that
case, we would allow defining indexes on the parent that are recursively
defined on the partitions and marked as inherited indexes, just like we
have inherited check constraints and NOT NULL constraints. I have not
studied whether we could implement (globally) *unique* indexes with this
scheme though, wherein the index key is a superset of the partition key.

>>> 5) Half the error messages use 'child table' while the other half uses
>>> 'partition'. I think we should be using 'partition' as child tables really
>>> have little meaning outside inheritance (which is kinda hidden behind the
>>> new partitioning stuff).
>>
>> One way to go about it may be to check all sites that can possibly report
>> an error involving child tables (aka "partitions") whether they know from
>> the context which name to use. I think it's possible, because we have
>> access to the parent relation in all such sites and looking at the relkind
>> can tell whether to call child tables "partitions".
>>
> Clearly, this is a consequence of building the partitioning on top of
> inheritance (not objecting to that approach, merely stating a fact).
>
> I'm fine with whatever makes the error messages more consistent, if it
> does not make the code significantly more complex. It's a bit confusing
> when some use 'child tables' and others 'partitions'. I suspect even a
> single DML command may return a mix of those, depending on where exactly
> it fails (old vs. new code).

So, we have mostly some old DDL (CREATE/ALTER TABLE) and maintenance
commands that understand inheritance. All of their error messages apply to
partitions as well, wherein they will be referred to as "child tables"
using the old terms. We now have some cases where the commands cause
additional error messages for only partitions because of additional
restrictions that apply to them.
We use "partitions" for them because they are essentially new error messages. There won't be a case where single DML command would mix the two terms, because we do not allow mixing partitioning and regular inheritance. Maybe I misunderstood you though. >>> So if I understand the plan correctly, we first do a parallel scan of the >>> parent, then partition_1, partition_2 etc. But AFAIK we scan the tables in >>> Append sequentially, and each partition only has 1000 rows each, making >>> the parallel execution rather inefficient. Unless we scan the partitions >>> concurrently. >>> >>> In any case, as this happens even with plain inheritance, it's probably >>> more about the parallel query than about the new partitioning patch. >> >> Yes, I have seen some discussion [2] about a Parallel Append, which would >> result in plans you're probably thinking of. >> > > Actually, I think that thread is about allowing partial paths even if only > some appendrel members support it. My point is that in this case it's a > bit silly to even build the partial paths, when the partitions only have > 100
Re: [HACKERS] pg_background contrib module proposal
2016-12-13 12:55 GMT+05:00 amul sul :
> I think background-session code is not that much deviated from
> pg_background code,

It is not derived, though it is not much deviated: the background sessions
code does not have detailed exceptions and code comments, but it is doing
somewhat similar things.

> IIUC background-session patch we can manage to
> reuse it, as suggested by Robert. This will allow us to maintain
> session in long run, we could reuse this session to run subsequent
> queries instead of maintaining separate worker pool. Thoughts?

One API to deal with both features would be much better, sure. "Objects"
like session pools are much easier to implement on top of a session
"object" than on top of worker processes, PIDs etc.

>> 4. Query as a future (actually it is implemented already by
>> pg_background_result)
>> 5. Promised result. Declare that you are waiting for data of specific
>> format, execute a query, someone (from another backend) will
>> eventually place a data to promised result and your query continues
>> execution.
>
> Could you please elaborate more?
> Do you mean two way communication between foreground & background process?

These are from C++11 threading: future, promise and async are related but
different kinds of acquiring a result between threads.

A future is when caller Cl starts thread T (or dequeues a thread from a
pool) and Cl can wait until T finishes and provides a result. Here,
waiting for the result is just like selecting from pg_background_result().

A promise is when you declare a variable and the caller does not know
which thread will put the data into this variable. But any thread reading
the promise will wait until another thread puts data into it.

There are more parallelism patterns, like async, deferred and lazy, but
futures and promises are, from my point of view, the most used.

>> 6. Cancelation: a way to signal to background query that it's time to
>> quit gracefully.
>
> Let me check this too.
I think Craig is right: any background query must be ready to be shut
down. That's what databases are about, you can safely pull the plug at any
moment.

I've remembered one more parallelism pattern: critical region. It's when
you ask the system "plz no TERM now, and, if you can, postpone the
vacuums, OOMKs and that kind of stuff" from user code. But I do not think
it's a usable pattern here.

Thank you for your work.

Best regards, Andrey Borodin.
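The "future" analogy above maps directly onto the pg_background API in a psql session (a sketch using the function names from the patch under discussion; the query and result type are made up):

```sql
-- Launch returns immediately with the worker PID - this is the "future":
SELECT pg_background_launch('SELECT count(*) FROM some_big_table') AS pid \gset

-- ... the caller is free to do other work here ...

-- Reading the result resolves the future, blocking until the worker is done:
SELECT * FROM pg_background_result(:pid) AS (count bigint);
```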
Re: [HACKERS] New design for FK-based join selectivity estimation
Hi hackers,

The commit 100340e2dcd05d6505082a8fe343fb2ef2fa5b2a introduces an
estimation error:

create table t3 as select j from generate_series(1,1) i,generate_series(1,100) j ;
create table t4 as select j from generate_series(1,100) j ;
create unique index ON t4(j);
alter table t3 add constraint fk foreign key (j) references t4(j);
analyze;

On 9.5.5:

explain analyze select * from t3 where j in (select * from t4 where j<10);
                               QUERY PLAN
 Hash Semi Join  (cost=2.36..18053.61 rows=9 width=4) (actual time=0.217..282.325 rows=9 loops=1)
   Hash Cond: (t3.j = t4.j)
   ->  Seq Scan on t3  (cost=0.00..14425.00 rows=100 width=4) (actual time=0.112..116.063 rows=100 loops=1)
   ->  Hash  (cost=2.25..2.25 rows=9 width=4) (actual time=0.083..0.083 rows=9 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 9kB
         ->  Seq Scan on t4  (cost=0.00..2.25 rows=9 width=4) (actual time=0.019..0.074 rows=9 loops=1)
               Filter: (j < 10)
               Rows Removed by Filter: 91
 Planning time: 0.674 ms
 Execution time: 286.043 ms

On 9.6 HEAD:

explain analyze select * from t3 where j in (select * from t4 where j<10);
                               QUERY PLAN
 Hash Semi Join  (cost=2.36..18053.61 rows=100 width=4) (actual time=0.089..232.327 rows=9 loops=1)
   Hash Cond: (t3.j = t4.j)
   ->  Seq Scan on t3  (cost=0.00..14425.00 rows=100 width=4) (actual time=0.047..97.926 rows=100 loops=1)
   ->  Hash  (cost=2.25..2.25 rows=9 width=4) (actual time=0.032..0.032 rows=9 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 9kB
         ->  Seq Scan on t4  (cost=0.00..2.25 rows=9 width=4) (actual time=0.008..0.030 rows=9 loops=1)
               Filter: (j < 10)
               Rows Removed by Filter: 91
 Planning time: 0.247 ms
 Execution time: 235.295 ms
(10 rows)

The estimated row count of the join is 10x larger since 100340e2d.

Regards,

--
Adrien NAYRAT
http://dalibo.com - http://dalibo.org
Re: [HACKERS] Quorum commit for multiple synchronous replication.
At Tue, 13 Dec 2016 08:46:06 +0530, Amit Kapila wrote in > On Mon, Dec 12, 2016 at 9:54 PM, Masahiko Sawada > wrote: > > On Mon, Dec 12, 2016 at 9:52 PM, Fujii Masao wrote: > >> On Mon, Dec 12, 2016 at 9:31 PM, Masahiko Sawada > >> wrote: > >>> On Sat, Dec 10, 2016 at 5:17 PM, Amit Kapila > >>> wrote: > > Few comments: > >>> > >>> Thank you for reviewing. > >>> > + * In 10.0 we support two synchronization methods, priority and > + * quorum. The number of synchronous standbys that transactions > + * must wait for replies from and synchronization method are specified > + * in synchronous_standby_names. This parameter also specifies a list > + * of standby names, which determines the priority of each standby for > + * being chosen as a synchronous standby. In priority method, the > standbys > + * whose names appear earlier in the list are given higher priority > + * and will be considered as synchronous. Other standby servers > appearing > + * later in this list represent potential synchronous standbys. If any > of > + * the current synchronous standbys disconnects for whatever reason, > + * it will be replaced immediately with the next-highest-priority > standby. > + * In quorum method, the all standbys appearing in the list are > + * considered as a candidate for quorum commit. > > In the above description, is priority method represented by FIRST and > quorum method by ANY in the synchronous_standby_names syntax? If so, > it might be better to write about it explicitly. > >>> > >>> Added description. > >>> > > + * specified in synchronous_standby_names. The priority method is > + * represented by FIRST, and the quorum method is represented by ANY > > Full stop is missing after "ANY". > > >>> > 6. > + int sync_method; /* synchronization method */ > /* member_names contains nmembers consecutive nul-terminated C strings > */ > char member_names[FLEXIBLE_ARRAY_MEMBER]; > } SyncRepConfigData; > > Can't we use 1 or 2 bytes to store sync_method information? 
> >>> > >>> I changed it to uint8. > >>> > > + int8 sync_method; /* synchronization method */ > > > I changed it to uint8. > > In mail, you have mentioned uint8, but in the code it is int8. I > think you want to update code to use uint8. > > > > > >> +standby_list{ $$ = > >> create_syncrep_config("1", $1, SYNC_REP_PRIORITY); } > >> +| NUM '(' standby_list ')'{ $$ = > >> create_syncrep_config($1, $3, SYNC_REP_QUORUM); } > >> +| ANY NUM '(' standby_list ')'{ $$ = > >> create_syncrep_config($2, $4, SYNC_REP_QUORUM); } > >> +| FIRST NUM '(' standby_list ')'{ $$ = > >> create_syncrep_config($2, $4, SYNC_REP_PRIORITY); } > >> > >> Isn't this "partial" backward-compatibility (i.e., "NUM (list)" works > >> differently from curent version while "list" works in the same way as > >> current one) very confusing? > >> > >> I prefer to either of > >> > >> 1. break the backward-compatibility, i.e., treat the first syntax of > >> "standby_list" as quorum commit > >> 2. keep the backward-compatibility, i.e., treat the second syntax of > >> "NUM (standby_list)" as sync rep with the priority > > > > +1. > > > There were some comments when I proposed the quorum commit. If we do > > #1 it breaks the backward-compatibility with 9.5 or before as well. I > > don't think it's a good idea. On the other hand, if we do #2 then the > > behaviour of s_s_name is 'NUM (standby_list)' == 'FIRST NUM > > (standby_list)''. But it would not what most of user will want and > > would confuse the users of future version who will want to use the > > quorum commit. Since many hackers thought that the sensible default > > behaviour is 'NUM (standby_list)' == 'ANY NUM (standby_list)' the > > current patch chose to changes the behaviour of s_s_names and document > > that changes thoroughly. > > > > Your arguments are sensible, but I think we should address the point > of confusion raised by Fujii-san. 
> As a developer, I feel breaking
> backward compatibility (go with Option-1 mentioned above) here is a
> good move as it can avoid confusions in future. However, I know many
> a time users are so accustomed to the current usage that they feel
> irritated with the change in behavior even it is for their betterment,
> so it is advisable to do so only if it is necessary or we have
> feedback from a couple of users. So in this case, if we don't want to
> go with Option-1, then I think we should go with Option-2. If we go
> with Option-2, then we can anyway comeback to change the behavior
> which is more sensible for future after feedback from users.

This implicitly puts an assumption that the replication configuration is
defined by s_s_names. But in the past discussion, some people suggested
that quorum commit should be configured by another