Re: hot_standby_feedback vs excludeVacuum and snapshots
On 9 June 2018 at 15:41, Amit Kapila wrote:
> On Fri, Jun 8, 2018 at 5:27 PM, Simon Riggs wrote:
>> On 8 June 2018 at 11:33, Amit Kapila wrote:
>>> On Fri, Jun 8, 2018 at 1:53 PM, Simon Riggs wrote:
>>>> So the attached patch fixes both the bug in the recent commit and the
>>>> one I just found by observation of 49bff5300d527, since they are
>>>> related.
>>>>
>>>> StandbyReleaseOldLocks() can sweep in the same way as
>>>> ExpireOldKnownAssignedTransactionIds().
>>>
>>> How will this avoid the bug introduced by your recent commit
>>> (32ac7a11)? After that commit, we no longer consider vacuum's xid in
>>> RunningTransactions->oldestRunningXid calculation, so it is possible
>>> that we still remove/release its lock on standby earlier than
>>> required.
>>
>> In the past, the presence of an xid was required to prevent removal by
>> StandbyReleaseOldLocks().
>>
>> The new patch removes that requirement and removes the lock when the xid is
>> older than oldestxid.
>
> The case I have in mind is: "Say vacuum got xid 600 while truncating,
> and then some other random transactions 601, 602 start modifying the
> database. At this point, I think the computed value of
> RunningTransactions->oldestRunningXid will be 601. Now, on standby,
> when StandbyReleaseOldLocks sees the lock from transaction 600, it
> will remove it, which doesn't appear correct to me."

OK, thanks. Patch attached.

>> The normal path of removal is at commit or abort;
>> StandbyReleaseOldLocks is used almost never (as originally intended).
>
> Okay, but that doesn't prevent us from ensuring that, whenever it is used, it
> does the right thing.

What do you mean? Has anyone argued in favour of doing the wrong thing?

I'm playing around with finding a test case to prove this area works, rather than just manual testing. Suggestions welcome.

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

execution_ordering_in_running_xact.v1.patch
Description: Binary data
Re: hot_standby_feedback vs excludeVacuum and snapshots
On 8 June 2018 at 19:03, Andres Freund wrote:
> Hi,
>
> On 2018-06-08 09:23:02 +0100, Simon Riggs wrote:
>> I have also found another bug which affects what we do next.
>>
>> For context, AEL locks are normally removed by COMMIT or ABORT.
>> StandbyReleaseOldLocks() is just a sweeper to catch anything that
>> didn't send an abort before it died, so it hardly ever activates. The
>> coding of StandbyReleaseOldLocks() is backwards... if it ain't in the
>> running list, then we remove it.
>>
>> But that wasn't working correctly either, since as of 49bff5300d527 we
>> assigned AELs to subxids. Subxids weren't in the running list and so
>> AELs held by them would have been removed at the wrong time, an extant
>> bug in PG10. It looks to me like they would have been removed either
>> early or late, up to the next runningxact info record. They would be
>> removed, so no leakage, but the late timing wouldn't be noticeable for
>> tests or most usage, since it would look just like lock contention.
>> Early release might give the same issue of block access to a non-existent
>> block/file.
>
> Yikes, that's kinda bad. It can also just cause plain crashes, no? The
> on-disk / catalog state isn't necessarily consistent during DDL, which
> is why we hold AE locks. At the very least it can cause corruption of
> in-use relcache entries and such. In practice this probably
> hits only a limited number of people because it requires DDL to happen
> in subtransactions, which probably isn't *that* common.

Yep, kinda bad, hence the fix.

>> So the attached patch fixes both the bug in the recent commit and the
>> one I just found by observation of 49bff5300d527, since they are
>> related.
>
> Can we please keep them separate?

The attached patch is separate. It fixes 49bff5300d527 and also the committed code, but should work fine even if we revert. Will double-check.

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: hot_standby_feedback vs excludeVacuum and snapshots
On 8 June 2018 at 19:03, Andres Freund wrote:
>> It seems to have affected Greg.
>
> As far as I can tell Greg was just theorizing?

I'll wait for Greg to say whether this was an actual case that needs to be fixed or not. If not, happy to revert.

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: hot_standby_feedback vs excludeVacuum and snapshots
On 8 June 2018 at 11:33, Amit Kapila wrote:
> On Fri, Jun 8, 2018 at 1:53 PM, Simon Riggs wrote:
>> So the attached patch fixes both the bug in the recent commit and the
>> one I just found by observation of 49bff5300d527, since they are
>> related.
>>
>> StandbyReleaseOldLocks() can sweep in the same way as
>> ExpireOldKnownAssignedTransactionIds().
>
> How will this avoid the bug introduced by your recent commit
> (32ac7a11)? After that commit, we no longer consider vacuum's xid in
> RunningTransactions->oldestRunningXid calculation, so it is possible
> that we still remove/release its lock on standby earlier than
> required.

In the past, the presence of an xid was required to prevent removal by StandbyReleaseOldLocks().

The new patch removes that requirement and removes the lock when the xid is older than oldestxid.

The normal path of removal is at commit or abort; StandbyReleaseOldLocks is used almost never (as originally intended).

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: hot_standby_feedback vs excludeVacuum and snapshots
On 7 June 2018 at 22:25, Andres Freund wrote:
> On 2018-06-07 14:19:18 -0700, Andres Freund wrote:
> Look at:
>
> void
> ProcArrayApplyRecoveryInfo(RunningTransactions running)
> ...
>     /*
>      * Remove stale locks, if any.
>      *
>      * Locks are always assigned to the toplevel xid so we don't need to care
>      * about subxcnt/subxids (and by extension not about ->suboverflowed).
>      */
>     StandbyReleaseOldLocks(running->xcnt, running->xids);
>
> by excluding running transactions you have, as far as I can tell,
> effectively removed the vacuum truncation AEL from the standby.

I agree, that is correct. There is a bug in my recent commit that causes a small race window that could potentially lead to someone reading the size of a relation just before it is truncated and then falling off the end of the scan, resulting in a block access ERROR, potentially much later than the truncate.

I have also found another bug which affects what we do next.

For context, AEL locks are normally removed by COMMIT or ABORT. StandbyReleaseOldLocks() is just a sweeper to catch anything that didn't send an abort before it died, so it hardly ever activates. The coding of StandbyReleaseOldLocks() is backwards... if it ain't in the running list, then we remove it.

But that wasn't working correctly either, since as of 49bff5300d527 we assigned AELs to subxids. Subxids weren't in the running list and so AELs held by them would have been removed at the wrong time, an extant bug in PG10. It looks to me like they would have been removed either early or late, up to the next runningxact info record. They would be removed, so no leakage, but the late timing wouldn't be noticeable for tests or most usage, since it would look just like lock contention. Early release might give the same issue of block access to a non-existent block/file.

So the attached patch fixes both the bug in the recent commit and the one I just found by observation of 49bff5300d527, since they are related.

StandbyReleaseOldLocks() can sweep in the same way as ExpireOldKnownAssignedTransactionIds().

> I also don't understand why this change would be backpatched in the
> first place. It's a relatively minor efficiency thing, no?

As for everything, that is open to discussion.

Yes, it seems minor to me until it affects you; then it's not. It seems to have affected Greg.

The attached patch, or a later revision, needs to be backpatched to PG10 independently of the recently committed patch.

I have yet to test this manually, but will do so tomorrow morning.

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

remove_standby_subxid_locks.v1.patch
Description: Binary data
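For illustration, here is a minimal sketch of the sweep-by-age approach described above, written against the PG10-era recovery lock code (a RecoveryLockList of xl_standby_lock entries). This is an illustrative sketch of the technique, not the attached patch:

    /*
     * Sketch: release every standby AccessExclusiveLock whose holding
     * (sub)xid precedes the oldest running xid, in the same style as
     * ExpireOldKnownAssignedTransactionIds(), instead of testing
     * membership in the running-xids list.
     */
    static void
    StandbyReleaseOldLocksSketch(TransactionId oldxid)
    {
        ListCell   *cell;
        ListCell   *prev = NULL;
        ListCell   *next;

        for (cell = list_head(RecoveryLockList); cell != NULL; cell = next)
        {
            xl_standby_lock *lock = (xl_standby_lock *) lfirst(cell);
            LOCKTAG     locktag;

            next = lnext(cell);

            /* Keep the lock if its xid could still be running */
            if (TransactionIdIsValid(oldxid) &&
                !TransactionIdPrecedes(lock->xid, oldxid))
            {
                prev = cell;
                continue;
            }

            /* The owning (sub)transaction must have ended: sweep it */
            SET_LOCKTAG_RELATION(locktag, lock->dbOid, lock->relOid);
            (void) LockRelease(&locktag, AccessExclusiveLock, true);

            RecoveryLockList = list_delete_cell(RecoveryLockList, cell, prev);
            pfree(lock);
        }
    }

Because a lock is kept whenever its xid is not strictly older than oldxid, a vacuum xid excluded from the running-xids list is no longer released early, and subxid-held AELs are swept at the right time.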
Re: hot_standby_feedback vs excludeVacuum and snapshots
On 7 June 2018 at 22:19, Andres Freund wrote:
> Wonder if the right thing here wouldn't be to instead transiently
> acquire an AEL lock during replay when truncating a relation?

The way AELs are replayed is generic; all AEL requests are handled that way. So yes, you could invent a special case for this specific situation.

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: computing completion tag is expensive for pgbench -S -M prepared
On 7 June 2018 at 20:27, Tom Lane wrote:
> Simon Riggs writes:
>> If we're going to compress the protocol, it seems sensible to remove
>> extraneous information first.
>
> Breaking the wire protocol was nowhere in this thread.

No, it wasn't. But there is another thread on the subject of compressing libpq, which is what I was referring to.

Andres' patch is clearly very efficient at adding the SELECT tag. I am questioning if we can remove that need altogether.

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: computing completion tag is expensive for pgbench -S -M prepared
On 7 June 2018 at 19:20, Andres Freund wrote:
> On 2018-06-07 11:40:48 +0100, Simon Riggs wrote:
>> On 7 June 2018 at 11:29, Pavel Stehule wrote:
>>>> Do we actually need the completion tag at all? In most cases?
>>>
>>> affected rows is taken from this value on protocol level
>>
>> I didn't mean we should remove the number of rows. Many things rely on that.
>
> How do you mean it then? We can't really easily change how we return
> that value on the protocol level, and the command tag is basically just
> returned as a string in the protocol. If we were to design the protocol
> today I'm sure we would just declare the rowcount and affected rowcounts
> separate fields or something, but ...

I meant removing the pointless noise word at the start of the tag, which few clients care about. I was thinking of just returning "SQL" instead, but that wasn't after much thought.

But now that I think about it more, returning any fixed string, "SQL" or "SELECT", in the protocol seems like a waste of bandwidth and a potential route in for decrypting the stream.

If we're going to compress the protocol, it seems sensible to remove extraneous information first.

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: computing completion tag is expensive for pgbench -S -M prepared
On 7 June 2018 at 11:29, Pavel Stehule wrote:
>> Do we actually need the completion tag at all? In most cases?
>
> affected rows is taken from this value on protocol level

I didn't mean we should remove the number of rows. Many things rely on that.

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: computing completion tag is expensive for pgbench -S -M prepared
On 7 June 2018 at 06:01, David Rowley wrote:
> On 7 June 2018 at 16:13, Andres Freund wrote:
>> in PortalRun(). That's actually fairly trivial to optimize - we don't
>> need the full blown snprintf machinery here. A quick benchmark
>> replacing it with:
>>
>>     memcpy(completionTag, "SELECT ", sizeof("SELECT "));
>>     pg_lltoa(nprocessed, completionTag + 7);
>
> I'd also noticed something similar with some recent benchmarks I was
> doing for INSERTs into partitioned tables. In my case I saw as high as
> 0.7% of the time spent building the INSERT tag. So I think it's worth
> fixing this.
>
> I think it would be better to invent a function that accepts a
> CmdType, int64 and Oid that copies the tag into the supplied buffer,
> then make a more generic change that also replaces the code in
> ProcessQuery() which builds the tag. I'm sure there must be some way
> to get the CmdType down to the place you've patched so we can get rid
> of the if (strcmp(portal->commandTag, "SELECT") == 0) line too.

Sounds better.

Do we actually need the completion tag at all? In most cases?

Perhaps we should add a parameter to make it optional and turn it off by default, except for psql.

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
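Building on David's suggestion, here is a rough sketch of such a helper. The function name and exact shape are hypothetical; it relies only on memcpy() and pg_lltoa(), as in Andres' snippet above:

    /*
     * Hypothetical helper: build a completion tag for a given command
     * type without going through the snprintf machinery. pg_lltoa()
     * writes the decimal digits plus a terminating NUL, overwriting
     * the NUL that memcpy() copied along with the prefix.
     */
    static void
    BuildCompletionTag(char *completionTag, CmdType cmdtype, uint64 nprocessed)
    {
        switch (cmdtype)
        {
            case CMD_SELECT:
                memcpy(completionTag, "SELECT ", sizeof("SELECT "));
                pg_lltoa((int64) nprocessed, completionTag + 7);
                break;
            case CMD_UPDATE:
                memcpy(completionTag, "UPDATE ", sizeof("UPDATE "));
                pg_lltoa((int64) nprocessed, completionTag + 7);
                break;
            case CMD_DELETE:
                memcpy(completionTag, "DELETE ", sizeof("DELETE "));
                pg_lltoa((int64) nprocessed, completionTag + 7);
                break;
            default:
                /* INSERT also embeds an Oid; omitted from this sketch */
                break;
        }
    }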
Re: Possible bug in logical replication.
On 6 June 2018 at 17:22, Alvaro Herrera wrote:
> This thread seems to have died down without any fix being proposed.
> Simon, you own this open item.

Thanks, will look.

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: I'd like to discuss scaleout at PGCon
On 5 June 2018 at 17:14, MauMau wrote:
> Furthermore, an extra hop and double parsing/planning could matter for
> analytic queries, too. For example, SAP HANA boasts of scanning 1
> billion rows in one second. In HANA's scaleout architecture, an
> application can connect to any worker node and the node communicates
> with other nodes only when necessary (there's one special node called
> "master", but it manages the catalog and transactions; it's not an
> extra hop like the coordinator in XL). Vertica is an MPP analytics
> database, but it doesn't have a node like the coordinator, either. To
> achieve maximum performance for real-time queries, the scaleout
> architecture should avoid an extra hop when possible.

Agreed. When possible.

When is it possible? Two ways I know of are:

1. have pre-knowledge in the client of where data is located (difficult to scale)
2. you must put data in all places the client can connect to (i.e. multimaster)

Perhaps there are more?

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: plans for PostgreSQL 12
On 4 June 2018 at 06:08, Pavel Stehule wrote:
> 4. optimization of expressions without the necessity to create snapshots -
> experiments
>
> @4 There are lots of non-database expressions in PLpgSQL - like var1 := var1
> + var2 or var1 := var1 + konst. The calculation itself needs about 1% of the
> total expression evaluation time. Almost all the time goes on preparing the
> plan cache, preparing the snapshot, etc. For this case, when no database
> object is used, we don't need to use this infrastructure. I would like to
> measure the performance impact, and test whether these optimizations are
> interesting or not.

Sounds good.

I think this would need to be restricted by operator and datatype, since in general you won't know if the datatype functions need a snapshot or not. Immutable functions for the operators ought to do it, but I think that might not be enough.

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
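To make the restriction concrete, here is a sketch of the kind of per-operator test Simon suggests, using catalog-lookup helpers that do exist (get_opcode() and func_volatile()); whether immutability alone is a sufficient condition is exactly the open question:

    /*
     * Sketch: does this operator's underlying function plausibly need
     * no snapshot? Immutable means no database access and the same
     * result for the same inputs.
     */
    static bool
    operator_is_snapshot_free(Oid opno)
    {
        Oid     funcid = get_opcode(opno);  /* pg_operator.oprcode */

        if (!OidIsValid(funcid))
            return false;

        return func_volatile(funcid) == PROVOLATILE_IMMUTABLE;
    }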
Re: I'd like to discuss scaleout at PGCon
On 2 June 2018 at 22:46, Ashutosh Bapat wrote:
>>>> And that is why both XL and the "FDW approach" rely on a central coordinator.
>>>
>>> I don't think we ever specified that the "FDW approach" "relies" on a
>>> central coordinator. One could configure and set up a cluster with
>>> multiple coordinators using FDWs.
>>
>> Yes, of course. You're just misunderstanding me. I'm talking about a
>> query coordinator "role". There can be many coordinator components and
>> they can be spread out in various ways, but for any one SQL query
>> there needs to be one coordinator node. Not a SPOF.
>
> In your earlier mail, which is included above, you mentioned a central
> coordinator for multi-node transaction control and global deadlock
> detection. That doesn't sound like a "query coordinator role". It
> sounds more like GTM, which is an SPOF.

In XL, GTM is a single component managing transaction ids. That has a standby, so it is not a SPOF.

But that is not what I mean. I don't believe that a GTM-style component is necessary in a future in-core scalability solution.

Each incoming query needs to be planned and executed from one coordinator component, with the work then performed across many workers on different nodes (or just one).

We could have coordinator components on each worker node, or we could have a set of coordinator nodes and a set of worker nodes.

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: I'd like to discuss scaleout at PGCon
On 1 June 2018 at 16:56, Ashutosh Bapat wrote:
> On Fri, Jun 1, 2018 at 11:10 AM, Simon Riggs wrote:
>> Using a central coordinator also allows multi-node transaction
>> control, global deadlock detection etc..
>
> But that becomes an SPOF and then we have to configure a standby for
> that. I am not saying that that's a bad design but it's not very good
> for many work-loads. But it would be good if we could avoid any
> "central server" in this configuration.
>
>> And that is why both XL and the "FDW approach" rely on a central coordinator.
>
> I don't think we ever specified that the "FDW approach" "relies" on a
> central coordinator. One could configure and set up a cluster with
> multiple coordinators using FDWs.

Yes, of course. You're just misunderstanding me. I'm talking about a query coordinator "role". There can be many coordinator components and they can be spread out in various ways, but for any one SQL query there needs to be one coordinator node. Not a SPOF.

>> FDWs alone are not enough. It is clear that some more tight coupling
>> is required to get things to work well. For example, supporting SQL
>> query plans that allow for redistribution of data for joins.
>
> I think partitioning + FDW provide the basic infrastructure for
> distributing data and planning queries working with such data. We need
> more glue to support node management and cluster configuration. So I
> agree with your statement. But I think it was clear from the beginning
> that we need more than FDW and partitioning.

No, it wasn't clear. But I'm glad to hear it. It might actually work then.

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: I'd like to discuss scaleout at PGCon
On 1 June 2018 at 04:00, MauMau wrote:
> The SQL processor should be one layer, not two layers.

For OLTP, that would be best. But it would be restricted to single-node requests, leaving you the problem of how you know ahead of time whether an SQL statement is single-node or not.

Using a central coordinator node allows us to hide the decision of single-node/multi-node from the user, which seems essential for general SQL. If you are able to restrict the types of requests users make then we can do direct access to partitions - so there is scope for a single-node API, as Mongo provides.

Using a central coordinator also allows multi-node transaction control, global deadlock detection etc..

And that is why both XL and the "FDW approach" rely on a central coordinator.

FDWs alone are not enough. It is clear that some more tight coupling is required to get things to work well. For example, supporting SQL query plans that allow for redistribution of data for joins.

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: I'd like to discuss scaleout at PGCon
On 1 June 2018 at 15:44, Ashutosh Bapat wrote:
> On Thu, May 31, 2018 at 11:00 PM, MauMau wrote:
>> 2018-05-31 22:44 GMT+09:00, Robert Haas:
>>> On Thu, May 31, 2018 at 8:12 AM, MauMau wrote:
>>>> Oh, I didn't know you support the FDW approach mainly for analytics. I
>>>> guessed the first target was OLTP read-write scalability.
>>>
>>> That seems like a harder target to me, because you will have an extra
>>> hop involved -- SQL from the client to the first server, then via SQL
>>> to a second server. The work of parsing and planning also has to be
>>> done twice, once for the foreign table and again for the table. For
>>> longer-running queries this overhead doesn't matter as much, but for
>>> short-running queries it is significant.
>>
>> Yes, that extra hop and double parsing/planning were the killer for
>> our performance goal when we tried to meet our customer's scaleout
>> needs with XL. The application executes 82 DML statements in one
>> transaction. Those DMLs consist of INSERT, UPDATE and SELECT that
>> only access one row with a primary key. The target tables are only
>> a few, so the application PREPAREs a few statements and EXECUTEs them
>> repeatedly. We placed the coordinator node of XL on the same host as
>> the application, and data nodes and GTM on other individual nodes.
>
> I agree that there's double parsing happening, but I am hesitant to
> agree with the double planning claim. We do plan, let's say a join
> between two foreign tables, on the local server, but that's only to
> decide whether it's efficient to join locally or on the foreign
> server. That means we create foreign paths for scans on the foreign
> tables, maybe as many parameterized plans as the number of join
> conditions, and one path for the join pushdown; that's it. We then
> create local join paths, but we need those to decide whether it's
> efficient to join locally and, if yes, which way. But we don't create
> paths for how the foreign server would plan that join. That's not
> double planning, since we do not create the same paths locally and on the
> foreign server.
>
> In order to avoid double parsing, we might want to find a way to pass
> a "normalized" parse tree down to the foreign server. We need to
> normalize the OIDs in the parse tree since those may be different
> across the nodes.

Passing detailed info between servers is exactly what XL does.

It requires us to define a cluster, exactly as XL does.

And yes, it's a good idea to replicate some tables to all nodes, as XL does.

So it seems we have at last some agreement that some of the things XL does are the correct approaches.

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: SHOW ALL does not honor pg_read_all_settings membership
On 17 April 2018 at 04:28, Michael Paquier wrote:
> On Mon, Apr 16, 2018 at 02:32:10PM +0200, Laurenz Albe wrote:
>> Now that the dust from the last commitfest is settling, I'll make a second
>> attempt to attract attention for this small bug fix.
>>
>> The original commit was Simon's.
>
> Thanks for the ping.
>
> This was new as of v10, so this cannot be listed as an open item; still, I
> have added it under the section for older bugs, because you are right
> as far as I can see.

OK, agreed, it's a bug.

Any objections to backpatching to v10?

> GetConfigOption is wrong by the way, as restrict_superuser means that
> all members of the group pg_read_all_settings can read
> GUC_SUPERUSER_ONLY params, and not only superusers, so the comment at
> least needs a fix; the variable ought to be renamed as well.

OK

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
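For context, the membership test the fix needs to apply wherever GUC_SUPERUSER_ONLY is enforced looks something like the sketch below. is_member_of_role() and the pg_read_all_settings default role both exist; the exact placement and message are illustrative. Superusers are implicitly members of every role, so they pass this check too:

    /* Sketch: allow pg_read_all_settings members, not only superusers */
    if ((record->flags & GUC_SUPERUSER_ONLY) &&
        !is_member_of_role(GetUserId(), DEFAULT_ROLE_READ_ALL_SETTINGS))
        ereport(ERROR,
                (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
                 errmsg("must be superuser or a member of pg_read_all_settings to examine \"%s\"",
                        record->name)));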
Re: pg_replication_slot_advance to return NULL instead of 0/0 if slot not advanced
On 28 May 2018 at 09:57, Michael Paquier wrote:
> On Fri, May 25, 2018 at 02:12:32PM +0200, Magnus Hagander wrote:
>> I agree that returning 0/0 on this is wrong.
>>
>> However, can this actually occur for any case other than exactly the case
>> of "moving the position to where the position already is"? If I look at the
>> physical slot path at least that seems to be the only case, and in that
>> case I think the correct thing to return would be the new position, and not
>> NULL. If we actually *fail* to move the position, we give an error.
>
> Yes, this only returns InvalidXLogRecPtr if the location could not be
> moved forward. Still, there is more going on here. For a physical
> slot, confirmed_lsn is always 0/0, hence the backward check is never
> applied for it. What I think should be used for the value assigned to
> startlsn is restart_lsn instead. Then what do we do if the position
> cannot be moved: should we raise an error, as my attached patch
> does, or just report the existing position the slot is at?

I don't see why an ERROR would be appropriate.

> A second error that I can see is in pg_logical_replication_slot_advance,
> which does not take the mutex lock of MyReplicationSlot, so concurrent
> callers like those of ReplicationSlotsComputeRequiredLSN (applies to
> physical slots actually) and pg_get_replication_slots() may read false
> data.
>
> On top of that, it seems to me that StartLogicalReplication is reading a
> couple of fields from a slot without taking a lock, so at least
> pg_get_replication_slots() may read incorrect data.
> ReplicationSlotReserveWal is also missing a lock. Those are older than
> v11 though.
>
>> Actually, isn't there also a race there? That is, if we try to move it, we
>> check that we're not trying to move it backwards, and throw an error, but
>> that's checked outside the lock. Then later we actually move it, and check
>> *again* if we try to move it backwards, but if that one fails we return
>> InvalidXLogRecPtr (which can happen in the case of concurrent activity on
>> the slot, I think)? In this case, maybe we should just re-check that and
>> raise an error appropriately?
>
> Er, isn't the take here that ReplicationSlotAcquire is used to lock out any
> replication slot update from other backends? It seems to me
> that what counts at the end is whether a backend PID is associated to a slot
> in memory. If you look at the code paths updating a logical or physical
> slot then those imply that the slot is owned; still, a spinlock needs to
> be taken for concurrent readers.
>
>> (I haven't looked at the logical slot path, but I assume it would have
>> something similar in it)
>
> Found one. All the things I have spotted are in the patch attached.

I think the problem here is that there are no comments explaining how to access the various fields in the structure, so there was no way to check whether the code was good or not. If we add corrective code we should clarify that in comments in the .h file also, as is done for XLogCtl.

Your points look correct to me, well spotted.

I'd like to separate the correction of these issues from the change-of-behavior patch. The earlier issues can be backpatched, but the change of behavior only affects PG11.

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
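To make the locking rule concrete, here is a sketch of the pattern a reader of another backend's slot needs to follow. The mutex and data fields are real members of ReplicationSlot; the surrounding variables are illustrative:

    /*
     * Sketch: reading a slot's in-memory state that another backend
     * may update concurrently requires the per-slot spinlock.
     */
    XLogRecPtr  restart_lsn;
    XLogRecPtr  confirmed_flush;

    SpinLockAcquire(&slot->mutex);
    restart_lsn = slot->data.restart_lsn;
    confirmed_flush = slot->data.confirmed_flush;
    SpinLockRelease(&slot->mutex);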
Re: Undo logs
On 24 May 2018 at 23:22, Thomas Munro wrote:
> As announced elsewhere[1][2][3], at EnterpriseDB we are working on a
> proposal to add in-place updates with undo logs to PostgreSQL. The
> goal is to improve performance and resource usage by recycling space
> better.

Cool

> The lowest level piece of this work is a physical undo log manager,
> 1. Efficient appending of new undo data from many concurrent
> backends. Like logs.
> 2. Efficient discarding of old undo data that isn't needed anymore.
> Like queues.
> 3. Efficient buffered random reading of undo data. Like relations.

Like an SLRU?

> [4] https://github.com/EnterpriseDB/zheap/tree/undo-log-storage/src/backend/access/undo
> [5] https://github.com/EnterpriseDB/zheap/tree/undo-log-storage/src/backend/storage/smgr

I think there are quite a few design decisions there that need to be discussed, so let's crack on and discuss them please.

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: pg_replication_slot_advance to return NULL instead of 0/0 if slot not advanced
On 25 May 2018 at 13:12, Magnus Hagander wrote:
> On Fri, May 25, 2018 at 7:28 AM, Michael Paquier wrote:
>> Hi all,
>>
>> When attempting to use pg_replication_slot_advance multiple times on a
>> slot, the caller gets back InvalidXLogRecPtr directly as the result,
>> for example:
>>
>> =# select * from pg_replication_slot_advance('popo', 'FF/0');
>>  slot_name |  end_lsn
>> -----------+-----------
>>  popo      | 0/60021E0
>> (1 row)
>> =# select * from pg_replication_slot_advance('popo', 'FF/0');
>>  slot_name | end_lsn
>> -----------+---------
>>  popo      | 0/0
>> (1 row)
>>
>> Wouldn't it be simpler to return NULL to mean that the slot could
>> not be moved forward? That would be easier to parse for clients.
>> Please see the attached.
>
> I agree that returning 0/0 on this is wrong.

Agreed

> However, can this actually occur for any case other than exactly the case
> of "moving the position to where the position already is"? If I look at the
> physical slot path at least that seems to be the only case, and in that case
> I think the correct thing to return would be the new position, and not NULL.

Docs say "Returns name of the slot and real position to which it was advanced to.", so agreed.

> If we actually *fail* to move the position, we give an error.
>
> Actually, isn't there also a race there? That is, if we try to move it, we
> check that we're not trying to move it backwards, and throw an error, but
> that's checked outside the lock. Then later we actually move it, and check
> *again* if we try to move it backwards, but if that one fails we return
> InvalidXLogRecPtr (which can happen in the case of concurrent activity on
> the slot, I think)? In this case, maybe we should just re-check that and
> raise an error appropriately?

Agreed

> (I haven't looked at the logical slot path, but I assume it would have
> something similar in it)

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: PG 11 feature count
On 17 May 2018 at 18:29, Bruce Momjian wrote:
> I regularly track the number of items documented in each major release.
> I use the attached script. You might be surprised to learn that PG 11
> has the lowest feature count of any release back through 7.4:
>
>   7.4  280
>   8.0  238
>   8.1  187
>   8.2  230
>   8.3  237
>   8.4  330
>   9.0  252
>   9.1  213
>   9.2  250
>   9.3  187
>   9.4  217
>   9.5  200
>   9.6  220
>   10   194
>   11   167

It would be useful to combine that with the CF app data showing the number of patches submitted and the number of features rejected. Not available for all time, but certainly goes back a few years now.

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Removing unneeded self joins
On 16 May 2018 at 15:10, Tom Lane wrote:
> Simon Riggs writes:
>> What I would add is that I've seen cases where the extra joins do NOT
>> hurt performance, so the extra CPU used to remove the join hurts more
>> than the benefit of removing it. Yes, we tried it.
>
> Interesting. The concern I had was more about the cost imposed on every
> query to detect self-joins and try to prove them useless, even in queries
> where no benefit ensues. It's possible that we can get that down to the
> point where it's negligible; but this says that even the successful-proof
> case has to be very cheap.

What I was advocating was an approach that varies according to the query cost, so we don't waste time trying to tune the heck out of OLTP queries, but for larger queries we might take a more considered approach.

For advanced optimizations that are costly to check for, skip the check if we are already below a cost threshold. The threshold would be a heuristic that varies according to the cost of the check.

I realise that in this case we wouldn't know the full query cost until we've done join planning, so we would need some lower-bound estimate to check whether it's worth trying to remove joins, as sketched below.

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
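The gating idea might look like the sketch below. Every name here is hypothetical; the point is only that an expensive proof is attempted when a cheap lower bound on the query's cost clears a threshold scaled to the cost of the check itself:

    /*
     * Hypothetical heuristic: only attempt self-join removal when a
     * cheap lower-bound cost estimate says the query is expensive
     * enough for the check to pay for itself.
     */
    if (estimate_query_cost_lower_bound(root) >
        self_join_check_cost * SELF_JOIN_CHECK_THRESHOLD_FACTOR)
        remove_useless_self_joins(root);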
Re: Removing unneeded self joins
On 16 May 2018 at 11:26, Robert Haas wrote:
> On Wed, May 16, 2018 at 12:08 PM, Tom Lane wrote:
>> Alexander Kuzmenkov writes:
>>> There is a join optimization we don't do -- removing an inner join of a
>>> table with itself on a unique column. Such joins are generated by
>>> various ORMs, so from time to time our customers ask us to look into
>>> this. Most recently, it was discussed on the list in relation to an
>>> article comparing the optimizations that some DBMSs make [1].
>>
>> This is the sort of thing that I always wonder why the customers don't
>> ask the ORM to stop generating such damfool queries. It's *expensive*
>> for us to clean up after their stupidity; almost certainly, it would
>> take far fewer cycles, net, for them to be a bit smarter in the first
>> place.
>
> The trouble, of course, is that the customer didn't write the ORM,
> likely has no idea how it works, and doesn't want to run a modified
> version of it even if they do. If the queries run faster on other
> systems than they do on PostgreSQL, we get dinged -- not unjustly.
>
> Also, I'm not sure that I believe that it's always easy to avoid
> generating such queries. I mean, this case is trivial, so it's easy to
> say, well, just rewrite the query. But suppose that I have a fact
> table over which I've created two views, each of which performs
> various joins between the fact table and various lookup tables. My
> queries are such that I normally need the joins in just one of these
> two views and not the other to fetch the information I care about.
> But every once in a while I need to run a report that involves pulling
> every column possible. The obvious solution is to join the views on
> the underlying table's primary key, but then you get this problem. Of
> course there's a workaround: define a third view that does both sets
> of joins-to-lookup-tables. But that starts to feel like you're
> handholding the database; surely it's the database's job to optimize
> queries, not the user's.
>
> It's been about 10 years since I worked as a web developer, but I do
> remember hitting this kind of problem from time to time and I'd really
> like to see us do something about it. I wish we could optimize away
> inner joins, too, for similar reasons.

I agree with everything you say.

What I would add is that I've seen cases where the extra joins do NOT hurt performance, so the extra CPU used to remove the join hurts more than the benefit of removing it. Yes, we tried it.

More advanced optimizations should only be applied when we've assessed that the likely run time is high enough to make it worth investing in further optimization.

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Surjective functional indexes
On 11 May 2018 at 16:37, Andres Freund wrote:
> On 2018-05-11 14:56:12 +0200, Simon Riggs wrote:
>> On 11 May 2018 at 05:32, Andres Freund wrote:
>>> No. Simon just claimed it's not actually a concern:
>>> https://www.postgresql.org/message-id/canp8+j+vtskphep_gmqmeqdwakst2kbotee0yz-my+agh0a...@mail.gmail.com
>>>
>>> And yes, it got committed without doing squat to address the
>>> architectural concerns.
>>
>> "Squat" means "zero, nothing" to me. So that comment would be inaccurate.
>
> Yes, I know it means that. And I don't see how it is inaccurate.
>
>> I have no problem if you want to replace this with an even better
>> design in a later release.
>
> Meh. The author / committer should get a patch into the right shape

They have done, at length. Claiming otherwise is just trash talk.

As you pointed out, the design of the patch avoids layering violations that could have led to architectural objections. Are you saying I should have ignored your words and rewritten the patch to introduce a layering violation?

What other objection do you think has not been addressed?

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Surjective functional indexes
On 11 May 2018 at 05:32, Andres Freund wrote:
> On 2018-05-10 23:25:58 -0400, Robert Haas wrote:
>> On Thu, Mar 1, 2018 at 2:48 PM, Andres Freund wrote:
>>> I still don't think, as commented upon by Tom and me upthread, that we
>>> want this feature in the current form.
>>
>> Was this concern ever addressed, or did the patch just get committed anyway?
>
> No. Simon just claimed it's not actually a concern:
> https://www.postgresql.org/message-id/canp8+j+vtskphep_gmqmeqdwakst2kbotee0yz-my+agh0a...@mail.gmail.com
>
> And yes, it got committed without doing squat to address the
> architectural concerns.

"Squat" means "zero, nothing" to me. So that comment would be inaccurate.

I've spent a fair amount of time reviewing and grooming the patch, and I am happy that Konstantin has changed his patch significantly in response to those reviews, as well as explaining why the patch is fine as it is.

It's a useful patch that PostgreSQL needs, so all good. I have no problem if you want to replace this with an even better design in a later release.

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Indexes on partitioned tables and foreign partitions
On 9 May 2018 at 17:33, Robert Haas wrote:
> On Wed, May 9, 2018 at 11:20 AM, Simon Riggs wrote:
>> On 9 May 2018 at 16:15, Robert Haas wrote:
>>> On Wed, May 9, 2018 at 11:14 AM, Simon Riggs wrote:
>>>> On 9 May 2018 at 16:10, Tom Lane wrote:
>>>>> Robert Haas writes:
>>>>>> On Wed, May 9, 2018 at 9:08 AM, Simon Riggs wrote:
>>>>>>> Shouldn't the fix be to allow creation of indexes on foreign tables?
>>>>>>> (Maybe they would be virtual or foreign indexes??)
>>>>>
>>>>>> It might be useful to invent the concept of a foreign index, but not
>>>>>> for v11 a month after feature freeze.
>>>>>
>>>>> Yeah. That's a can of worms we can *not* open at this stage.
>>>>
>>>> Lucky nobody suggested that then, eh? Robert's just making a joke.
>>>
>>> Someone did suggest that. It was you.
>>
>> Oh, you weren't joking. I think we're having serious problems with
>> people putting words in my mouth again then.
>>
>> Please show me where I suggested doing anything for v11?
>
> Come on, Simon. It's in the quoted text.

No, it's not.

> I realize you didn't say
> v11 specifically, but this is the context of a patch that is proposed
> as a bug-fix for v11. If you meant that we should apply the patch as
> proposed now, or some other one, and do the other thing later, you
> could have said so.

I think it's reasonable to expect you to interpret my words sensibly, rather than in some more dramatic form where I seem to break rules with every phrase.

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Needless additional partition check in INSERT?
On 10 May 2018 at 05:33, David Rowley wrote:
> On 10 May 2018 at 16:13, Amit Langote wrote:
>> The patch to ExecInsert looks good, but I think we also need to do the
>> same thing in CopyFrom.
>
> I think so too.
>
> Updated patch attached.

Patch is good. The cause of this oversight is the lack of comments to explain the original coding, so we need to correct that in this patch, please.

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Indexes on partitioned tables and foreign partitions
On 9 May 2018 at 16:15, Robert Haas wrote:
> On Wed, May 9, 2018 at 11:14 AM, Simon Riggs wrote:
>> On 9 May 2018 at 16:10, Tom Lane wrote:
>>> Robert Haas writes:
>>>> On Wed, May 9, 2018 at 9:08 AM, Simon Riggs wrote:
>>>>> Shouldn't the fix be to allow creation of indexes on foreign tables?
>>>>> (Maybe they would be virtual or foreign indexes??)
>>>
>>>> It might be useful to invent the concept of a foreign index, but not
>>>> for v11 a month after feature freeze.
>>>
>>> Yeah. That's a can of worms we can *not* open at this stage.
>>
>> Lucky nobody suggested that then, eh? Robert's just making a joke.
>
> Someone did suggest that. It was you.

Oh, you weren't joking. I think we're having serious problems with people putting words in my mouth again then.

Please show me where I suggested doing anything for v11?

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Indexes on partitioned tables and foreign partitions
On 9 May 2018 at 16:10, Tom Lane wrote:
> Robert Haas writes:
>> On Wed, May 9, 2018 at 9:08 AM, Simon Riggs wrote:
>>> Shouldn't the fix be to allow creation of indexes on foreign tables?
>>> (Maybe they would be virtual or foreign indexes??)
>
>> It might be useful to invent the concept of a foreign index, but not
>> for v11 a month after feature freeze.
>
> Yeah. That's a can of worms we can *not* open at this stage.

Lucky nobody suggested that then, eh? Robert's just making a joke.

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Indexes on partitioned tables and foreign partitions
On 9 May 2018 at 15:57, Robert Haas wrote:
> For right now, I think the options are (1) throw an ERROR if we
> encounter a foreign table or (2) silently skip the foreign table. I
> think (2) is defensible for non-UNIQUE indexes, because the index is
> just a performance optimization. However, for UNIQUE indexes, at
> least, it seems like we'd better do (1), because a major point of such
> an index is to enforce a constraint; we can't allege that we have such
> a constraint if foreign tables are just silently skipped.

If we can assume an index exists on a foreign table, why can we not just assume a unique index exists?? Why the difference?

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Indexes on partitioned tables and foreign partitions
On 9 May 2018 at 15:26, Arseny Sher wrote:
> Simon Riggs writes:
>
>> How much sense does it make to have a partitioned table with a mix of local
>> and foreign tables?
>
> Well, as much sense as FDW-based sharding has, for instance. It is
> arguable, but it exists.
>
>> Shouldn't the fix be to allow creation of indexes on foreign tables?
>> (Maybe they would be virtual or foreign indexes??)
>
> Similar ideas were discussed at [1]. There was no wide consensus on even
> what problems such a feature would solve. Since currently indexes on
> foreign tables are just forbidden, it seems to me that the best the
> partitioning code can do today is just not create them.
>
> [1] https://www.postgresql.org/message-id/flat/4F62FD69.2060007%40lab.ntt.co.jp#4f62fd69.2060...@lab.ntt.co.jp

Indexes on foreign tables cause an ERROR, so yes, we already just don't create them.

You're suggesting silently skipping the ERROR. I can't see a reason for that.

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Indexes on partitioned tables and foreign partitions
On 9 May 2018 at 12:50, Arseny Sher wrote:
> Hi,
>
> 8b08f7d4 added propagation of indexes on partitioned tables to
> partitions, which is very cool. However, index creation also recurses
> down to foreign tables. I doubt this is intentional, as such indexes are
> forbidden as not making much sense; an attempt to create an index on a
> partitioned table with a foreign partition leads to an error
> now. Attached lines fix this.

"Fix"? How much sense does it make to have a partitioned table with a mix of local and foreign tables?

Shouldn't the fix be to allow creation of indexes on foreign tables? (Maybe they would be virtual or foreign indexes??)

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Postgres, fsync, and OSs (specifically linux)
On 28 April 2018 at 09:15, Andres Freund wrote:
> Hi,
>
> On 2018-04-28 08:25:53 -0700, Simon Riggs wrote:
>>> - Use direct IO. Due to architectural performance issues in PG and the
>>> fact that it'd not be applicable for all installations I don't think
>>> this is a reasonable fix for the issue presented here. Although it's
>>> independently something we should work on. It might be worthwhile to
>>> provide a configuration that allows to force DIO to be enabled for WAL
>>> even if replication is turned on.
>>
>> "Use DirectIO" is roughly the same suggestion as "don't trust Linux filesystems".
>
> I want to emphasize that this is NOT a Linux-only issue. It's a problem
> across a number of operating systems, including Linux.

Yes, of course.

>> It would be a major admission of defeat for us to take that as our
>> main route to a solution.
>
> Well, I think we were wrong not to engineer towards DIO. There's just
> too many issues with buffered IO not to have a supported path for
> DIO. But given that it's unrealistic to do so without major work, and
> it wouldn't be applicable for all installations (shared_buffers size becomes
> critical), I don't think it matters that much for the issue discussed
> here.

>> The people I've spoken to so far have encouraged us to continue
>> working with the filesystem layer, offering encouragement of our
>> decision to use filesystems.
>
> There's a lot of people disagreeing with it too.

Specific recent verbal feedback from OpenLDAP was that the project adopted DIO and found no benefit in doing so, with regret the other way from having tried.

The care we need to use for any technique is the same.

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Postgres, fsync, and OSs (specifically linux)
On 28 April 2018 at 08:25, Simon Riggs wrote:
> On 27 April 2018 at 15:28, Andres Freund wrote:
>
>> - Add a pre-checkpoint hook that checks for filesystem errors *after*
>> fsyncing all the files, but *before* logging the checkpoint completion
>> record. Operating systems, filesystems, etc. all log the error format
>> differently, but for larger installations it'd not be too hard to
>> write code that checks their specific configuration.
>>
>> While I'm a bit concerned adding user-code before a checkpoint, if
>> we'd do it as a shell command it seems pretty reasonable. And useful
>> even without concern for the fsync issue itself. Checking for IO
>> errors could e.g. also include checking for read errors - it'd not be
>> unreasonable to not want to complete a checkpoint if there'd been any
>> media errors.
>
> It seems clear that we need to evaluate our compatibility not just
> with an OS, as we do now, but with an OS/filesystem combination.
>
> Although people have suggested some approaches, I'm more interested in
> discovering how we can be certain we got it right.
>
> And the end result seems to be that PostgreSQL will be forced, in the
> short term, to declare certain combinations of OS/filesystem
> unsupported, with a clear warning sent out to users.
>
> Adding a pre-checkpoint hook encourages people to fix this themselves
> without reporting issues, so I initially oppose this until we have a
> clearer argument as to why we need it. The answer is not to make this
> issue more obscure, but to make it more public.

Thinking some more, I think I understand, but please explain if not.

We need behavior that varies according to OS and filesystem, which varies per tablespace. We could have that variable behavior using:

* a hook
* a set of GUC parameters that can be set at tablespace level
* a separate config file for each tablespace

My preference would be to avoid a hook.

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Postgres, fsync, and OSs (specifically linux)
On 27 April 2018 at 15:28, Andres Freund wrote:
> - Add a pre-checkpoint hook that checks for filesystem errors *after*
> fsyncing all the files, but *before* logging the checkpoint completion
> record. Operating systems, filesystems, etc. all log the error format
> differently, but for larger installations it'd not be too hard to
> write code that checks their specific configuration.
>
> While I'm a bit concerned adding user-code before a checkpoint, if
> we'd do it as a shell command it seems pretty reasonable. And useful
> even without concern for the fsync issue itself. Checking for IO
> errors could e.g. also include checking for read errors - it'd not be
> unreasonable to not want to complete a checkpoint if there'd been any
> media errors.

It seems clear that we need to evaluate our compatibility not just with an OS, as we do now, but with an OS/filesystem combination.

Although people have suggested some approaches, I'm more interested in discovering how we can be certain we got it right.

And the end result seems to be that PostgreSQL will be forced, in the short term, to declare certain combinations of OS/filesystem unsupported, with a clear warning sent out to users.

Adding a pre-checkpoint hook encourages people to fix this themselves without reporting issues, so I initially oppose this until we have a clearer argument as to why we need it. The answer is not to make this issue more obscure, but to make it more public.

> - Use direct IO. Due to architectural performance issues in PG and the
> fact that it'd not be applicable for all installations I don't think
> this is a reasonable fix for the issue presented here. Although it's
> independently something we should work on. It might be worthwhile to
> provide a configuration that allows to force DIO to be enabled for WAL
> even if replication is turned on.

"Use DirectIO" is roughly the same suggestion as "don't trust Linux filesystems".

It would be a major admission of defeat for us to take that as our main route to a solution.

The people I've spoken to so far have encouraged us to continue working with the filesystem layer, offering encouragement of our decision to use filesystems.

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Query is over 2x slower with jit=on
On 18 April 2018 at 16:50, Andres Freund wrote:
> On 2018-04-18 17:35:31 +0200, Andreas Joseph Krogh wrote:
>> With jit=on:
>> https://explain.depesz.com/s/vYB
>>  Planning Time: 0.336 ms
>>  JIT:
>>    Functions: 716
>>    Generation Time: 78.404 ms
>>    Inlining: false
>>    Inlining Time: 0.000 ms
>>    Optimization: false
>>    Optimization Time: 43.916 ms
>>    Emission Time: 600.031 ms
>
> Any chance this is a debug LLVM build?
>
>> What's the deal with jit making it slower?
>
> JIT has a cost, and sometimes it's not beneficial. Here our heuristics
> for when to JIT appear to be a bit off. In the parallel world it's worse
> because the JITing is duplicated for parallel workers atm.

Please change the name of the "JIT" parameter to something meaningful to humans before this gets too far into the wild.

SSL is somewhat understandable because it's not a Postgres-private term. geqo is regrettable and we really don't want any more too-short/abbreviated parameter names. Think of our ease of use if every GUC were a TLA.

Thanks

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: reloption to prevent VACUUM from truncating empty pages at the end of relation
On 17 April 2018 at 20:09, Tom Lane wrote:
> Alvaro Herrera writes:
>> Andres was working on a radix tree structure to fix this problem, but
>> that seems to be abandoned now, and it seems a major undertaking. While
>> I agree that the proposed solution is a wart, it seems much better than
>> no solution at all. Can we consider Fujii's proposal as a temporary
>> measure until we fix shared buffers? I'm +1 on it myself.
>
> Once we've introduced a user-visible reloption it's going to be
> practically impossible to get rid of it, so I'm -1. I'd much rather
> see somebody put some effort into the radix-tree idea than introduce
> a kluge that we'll be stuck with, and that doesn't even provide a
> good user experience. Disabling vacuum truncation is *not* something
> that I think we should recommend.

The truncation at the end of VACUUM takes an AccessExclusiveLock, which is already user-visible. Using a radix tree won't alter that.

ISTM the user might be interested in having the *lock* NOT happen, so I am +1 for the suggestion regardless of whether the radix tree ever happens.

The lock itself can be cancelled, so the user would also be interested in explicitly requesting a retry with a separate command/function.

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
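For reference, a boolean reloption of the kind Fujii proposes would normally be declared by adding an entry to the static boolRelOpts[] table in src/backend/access/common/reloptions.c. The option name below is hypothetical, but the entry shape matches existing options such as autovacuum_enabled:

    /* Hypothetical entry in boolRelOpts[] */
    {
        {
            "vacuum_truncate",      /* hypothetical option name */
            "Enables vacuum to truncate empty pages at the end of this table",
            RELOPT_KIND_HEAP | RELOPT_KIND_TOAST,
            ShareUpdateExclusiveLock
        },
        true                        /* default: keep truncating, as today */
    },

It would then be settable per table with ALTER TABLE ... SET (vacuum_truncate = off).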
Re: Bugs in TOAST handling, OID assignment and redo recovery
On 11 April 2018 at 19:57, Tom Lane wrote:
> Pavan Deolasee writes:
>> Ok. I propose attached patches, more polished this time.
>
> I'll take these, unless some other committer is hot to do so?

Please go ahead.

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: pgsql: New files for MERGE
On 7 April 2018 at 18:45, Tom Lane wrote:
> Simon Riggs writes:
>> On 6 April 2018 at 17:22, Bruce Momjian wrote:
>>> My point was that people didn't ask you to work harder on fixing the
>>> patch, but in reverting it. You can work harder on fixing things in the
>>> hope they change their minds, but again, that isn't addressing their
>>> request.
>
>> If Tom or Andres still feel that their concerns have not been
>> addressed over the last few days, I am happy to revert the patch with
>> no further discussion from me in this cycle.
>
> FWIW, I still vote to revert. Even if the patch were now perfect,
> there is not time for people to satisfy themselves of that, and
> we've got lots of other things on our plates.
>
> I'd be glad to participate in a proper review of this when v12
> opens. But right now it just seems too rushed, and I have little
> confidence in it being right.
>
> regards, tom lane
>
> PS: If you do revert, please wrap it up as a single revert commit,
> not a series of half a dozen. You've already put several
> non-buildable states into the commit history as a result of this
> patch, each one of which is a land mine for git bisect testing.
> We don't need more of those. Also, looking at the reverse of the
> reversion commit will provide a handy way of seeing the starting
> point for future discussion of this patch.

Will do. "Commence primary ignition."

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: pgsql: New files for MERGE
On 6 April 2018 at 17:22, Bruce Momjian wrote:
> On Fri, Apr 6, 2018 at 09:21:54AM +0100, Simon Riggs wrote:
>> On 5 April 2018 at 21:02, Bruce Momjian wrote:
>>> Simon, you have three committers in this thread suggesting this patch be
>>> reverted. Are you just going to barrel ahead with the fixes without
>>> addressing their emails?
>>
>> PeterG confirms that the patch works and has the agreed concurrency
>> semantics. Separating out the code allows us to see clearly that we
>> have almost complete test coverage of the code and its features.
>
> My point was that people didn't ask you to work harder on fixing the
> patch, but in reverting it. You can work harder on fixing things in the
> hope they change their minds, but again, that isn't addressing their
> request.

I had understood Tom's post to be raising a discussion about whether to revert.

In my understanding, Tom's complaints were addressed quickly, contrary to what he expected - I was surprised myself at how quickly Pavan was able to address them. That left Andres' complaints, which as I explained hadn't been given in any detail. Once given, Pavan investigated Andres' complaints, explained in detail the results of his work, and said that he doesn't see how the proposed changes would improve anything.

If there had been anything unresolvable before commit I wouldn't have committed it, and if anything were unresolvable after commit I would already have reverted it. And as I explained, this commit was not rushed, and various other negative points made are unfortunately not correct.

Now I can see some people are annoyed, so I'm happy to apologize if I've done things that weren't understood or caused annoyance. Time is short at the end of a CF and tempers fray for us all.

If Tom or Andres still feel that their concerns have not been addressed over the last few days, I am happy to revert the patch with no further discussion from me in this cycle.

If a request to revert occurs, I propose to do this on Wed 11 pm, to avoid getting in the way of other commits and because it will take some time to prepare the revert.

Thanks to everyone for review comments, and well done to Pavan, whatever we decide.

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Add support for printing/reading MergeAction nodes
On 4 April 2018 at 18:08, Tom Lane wrote:
> Simon Riggs writes:
>> On 4 April 2018 at 17:19, Tom Lane wrote:
>>> BTW, poking around in the grammar, I notice that MergeStmt did not
>>> get added to RuleActionStmt. That seems like a rather serious
>>> omission.
>
>> MERGE isn't a privilege, a trigger action or a policy action. Why
>> would it be in RuleActionStmt?
>
> Because it seems likely that somebody would want to write a rule along
> the lines of "ON UPDATE TO mytable DO INSTEAD MERGE ...".
>
> Looking a little further ahead, one can easily envision somebody
> wanting to do "ON MERGE TO mytable DO INSTEAD something". I'd be
> prepared to give a pass on that for the present, partly because
> it's not very clear what stuff from the original MERGE needs to be
> available to the rule. But the other case seems pretty fundamental.
> MERGE is not supposed to be a utility command IMO, it's supposed to
> be DML, and that means it needs to work anywhere that you could
> write e.g. UPDATE.

MERGE is important because it is implemented by other databases, making it an application portability issue and an important feature for PostgreSQL adoption with real users. Enhancing rules to allow interoperation with MERGE doesn't fall into that same category, so I don't see that it needs to work anywhere you can write UPDATE - that certainly isn't the case with triggers, row-level security policies or privileges.

With that said, we can still discuss it to see if it's possible.

... ON UPDATE TO foo DO INSTEAD MERGE ... would look like this:

    MERGE INTO foo
    USING (what?)
    ON (join condition)
    WHEN MATCHED THEN
        UPDATE

which, if we managed to achieve it, is simply a much poorer version of UPDATE, since MERGE with a single WHEN clause is semantically similar to, but higher overhead than, a single DML operation. So if we implemented it you wouldn't ever use it.

Achieving the marriage between rules and MERGE is made complex by the source table and the join condition. MERGE is different from other DML in that it always has two tables, so it is hard to see how that would work with rules on just one table.

So that leaves the thought of whether we could do a more complex version with some kind of exploitation of multi-DML features. Well, possibly, but I think it's months of work. Adding this requested feature doesn't enhance the original goal of application portability and standards compliance, so there's no way to know when you've got it right. Without a scope or a spec it would be difficult to see where that would go.

I would be most glad to be proved wrong, to allow us to implement something here, and I have no objection to others adding that, though it seems clear that it can't happen in this release.

For this release, in discussion with Stephen and others at the Developer Meeting in Brussels, the addition of features for RLS and partitioning were decided as the priorities for MERGE ahead of other items. This item wasn't mentioned by anyone before, and not at all by any users I've spoken to.

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] MERGE SQL Statement for PG11
On 5 April 2018 at 21:00, Andres Freund wrote:
> And if somebody cleans it up Simon will
> complain that things are needlessly being destabilized (hello xlog.c
> with its 1000+ LOC functions).

Taking this comment as a special point... with other points addressed separately.

...If anybody wants to know what I think they can just ask...

There are various parts of the code that don't have full test coverage. Changing things there is more dangerous, and if it's being done for no reason then that is risk for little reward. That doesn't block it, but it does make me think that people could spend their time better - an issue that concerns me too.

But that certainly doesn't apply to parts of the code like this where we have full test coverage. It may not even apply to recovery, now that we have the ability to check the results of recovery and replication in real time.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: pgsql: New files for MERGE
On 5 April 2018 at 21:02, Bruce Momjian wrote:
> On Thu, Apr 5, 2018 at 11:15:20AM +0100, Simon Riggs wrote:
>> On 4 April 2018 at 21:28, Simon Riggs wrote:
>> > On 4 April 2018 at 21:14, Andres Freund wrote:
>> >
>> >>> The normal way is to make review comments that allow change. Your
>> >>> request for change of the parser data structures is fine and can be
>> >>> done, possibly by Saturday
>> >>
>> >> I did request changes, and you've so far ignored those requests.
>> >
>> > Pavan tells me he has replied to you and is working on specific changes.
>>
>> Specific changes requested have now been implemented by Pavan and
>> committed by me.
>>
>> My understanding is that he is working on a patch for Tom's requested
>> parser changes, will post on other thread.
>
> Simon, you have three committers in this thread suggesting this patch be
> reverted. Are you just going to barrel ahead with the fixes without
> addressing their emails?

PeterG confirms that the patch works and has the agreed concurrency semantics. Separating out the code allows us to see clearly that we have almost complete test coverage of the code and its features.

The fixes being applied are ones proposed by Andres, Tom, Andrew and Jesper. So their emails are being addressed in full, where there is anything to discuss and change. Yes, that work is ongoing, since there is a CF deadline. So Pavan and I are very clearly and publicly addressing their emails and nobody is being ignored.

The architecture of the executor has been described as wrong, but at the time that claim was made there was zero detail behind it, so there was nothing to comment upon. I'm surprised, and somewhat suspicious, that multiple people found anything in that suggestion to agree or disagree with; I'm also surprised that nobody else has pointed out the lack of substance there. Given that the executor manifestly works, has been re-engineered according to PeterG's requests, and that many performance concerns had already been addressed prior to commit, Pavan and I were happy with it.

My proposal to commit the patch was given 5 days ahead of time and no comments were received from anyone, not even PeterG. There was no rush and I personally performed extensive reviews before final commit. And as Robert has often reminded me, committers do not possess a veto that they can simply use to remove patches; there have been no reasonable grounds to revoke anything.

I do have sympathy with the need for good architecture and executor design. I expressed exactly the same last year with the missing executor elements in the partitioning patch. Notably, nobody asked for that to be revoked, not even me, knowing that the executor would require major changes in PG11. The executor for MERGE is in radically better shape.

I see that Andres has now posted something giving some details about his concerns (posted last night, so Pavan and I are only now reading it). AFAICS these are not blocking design issues, simply requests for some moving around of the code. We will of course answer in detail on that thread and discuss further. Given that we have full test coverage, it is reasonable to imagine that can be done relatively quickly.

Given the extremely late notice of those review requests, the relative separation of the MERGE code, and the fact that this is a new feature not related to robustness, it seems reasonable to be granted an extension to complete the changes. We're currently assessing how long that might be, but an additional week seems likely.
The project is blessed with many great developers; I do not claim to be one myself, but I am a diligent committer. It's been said that neither Tom nor Andres thought the code would be committed, so neither looked at it, even though they both knew of my proposed commit of the feature in January, with some caveats at that time. Obviously, when great people see a new thing they will always see ways of doing it better, but that doesn't mean things are below project standards.

It would be useful to have a way for all committers to signal ahead of time what they think is likely to happen, to avoid this confusion, rather than a few private IMs between friends. I've been surprised before by people saying "that's not going in" when they clearly haven't even looked at the code, and yet others think it is all good.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] MERGE SQL Statement for PG11
On 5 April 2018 at 17:07, Alvaro Herrera wrote: > Simon Riggs wrote: >> On 5 April 2018 at 16:09, Alvaro Herrera wrote: >> > Quick item: parse_clause.h fails cpluspluscheck because it has a C++ >> > keyword as a function argument name: >> > >> > ./src/include/parser/parse_clause.h:26:14: error: expected ‘,’ or ‘...’ >> > before ‘namespace’ >> >List **namespace); >> > ^ >> >> How's this as a fix? > > WFM Pushed -- Simon Riggshttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] MERGE SQL Statement for PG11
On 5 April 2018 at 16:09, Alvaro Herrera wrote: > Quick item: parse_clause.h fails cpluspluscheck because it has a C++ > keyword as a function argument name: > > ./src/include/parser/parse_clause.h:26:14: error: expected ‘,’ or ‘...’ > before ‘namespace’ >List **namespace); > ^ How's this as a fix? -- Simon Riggshttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services fnamespace.v1.patch Description: Binary data
Re: Excessive PostmasterIsAlive calls slow down WAL redo
On 5 April 2018 at 08:23, Heikki Linnakangas wrote: > That seems like an utter waste of time. I'm almost inclined to call that a > performance bug. As a straightforward fix, I'd suggest that we call > HandleStartupProcInterrupts() in the WAL redo loop, not on every record, but > only e.g. every 32 records. That would make the main redo loop less > responsive to shutdown, SIGHUP, or postmaster death, but that seems OK. > There are also calls to HandleStartupProcInterrupts() in the various other > loops, that wait for new WAL to arrive or recovery delay, so this would only > affect the case where we're actively replaying records. +1 -- Simon Riggshttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
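To make the suggestion concrete, a minimal sketch of the pattern - gating the expensive check to every 32nd record - with hypothetical stand-ins for the real routines:

    /* Sketch only, not PostgreSQL source. Both helpers are invented
     * stand-ins: one replays a record, the other performs the
     * expensive interrupt/postmaster-death check. */
    #include <stdbool.h>

    extern bool apply_next_record(void);            /* hypothetical */
    extern bool interrupts_and_postmaster_ok(void); /* hypothetical */

    void
    redo_loop(void)
    {
        unsigned long n = 0;

        while (apply_next_record())
        {
            /* cheap bitmask test; the expensive call runs 1 time in 32 */
            if ((++n & 31) == 0 && !interrupts_and_postmaster_ok())
                break;      /* shutdown request or postmaster death */
        }
    }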
Re: [HACKERS] MERGE SQL Statement for PG11
On 5 April 2018 at 13:18, Teodor Sigaev wrote:
>> The variable would become unused in non-assert builds. I see that. But
>> simply removing it is not a solution and I don't think the code will compile
>> that way. We should either rewrite that assertion or put it inside a #ifdef
>> ASSERT_CHECKING block or simply remove that assertion because we already
>> check for relkind in parse_merge.c. Will check.
>
> That is noted by Kyotaro HORIGUCHI
> https://www.postgresql.org/message-id/20180405.181730.125855581.horiguchi.kyotaro%40lab.ntt.co.jp
>
> and his suggestion to use a special macro looks better to me:
> -    char        relkind;
> +    char        relkind PG_USED_FOR_ASSERTS_ONLY;

Thanks both, I already fixed that.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
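For anyone following along, this is the pattern in question; the surrounding context here is invented, but PG_USED_FOR_ASSERTS_ONLY is the real macro, which suppresses unused-variable warnings when assertions are compiled out, so the same code builds cleanly either way:

    /* sketch: a variable read only by the assertion below */
    char        relkind PG_USED_FOR_ASSERTS_ONLY;

    relkind = rel->rd_rel->relkind;     /* "rel" is assumed context */
    Assert(relkind == RELKIND_RELATION ||
           relkind == RELKIND_PARTITIONED_TABLE);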
Re: [HACKERS] MERGE SQL Statement for PG11
On 5 April 2018 at 13:19, Jesper Pedersen wrote:
> Hi,
>
> On 04/05/2018 08:04 AM, Simon Riggs wrote:
>>
>> On 5 April 2018 at 12:56, Jesper Pedersen
>> wrote:
>>>
>>> Updated for non-assert build.
>>
>> Thanks, pushed. Sorry to have you wait til v3
>
> That patch was a bit rushed, and cut off too much.

Yes, noted, already fixed. Thanks

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] MERGE SQL Statement for PG11
On 5 April 2018 at 12:56, Jesper Pedersen wrote: > Hi, > > On 04/05/2018 07:48 AM, Simon Riggs wrote: >>> >>> Updated version due to latest refactoring. >> >> >> Thanks for your input. Removing that seems to prevent compilation. >> >> Did something change in between? >> > > Updated for non-assert build. Thanks, pushed. Sorry to have you wait til v3 -- Simon Riggshttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] MERGE SQL Statement for PG11
On 5 April 2018 at 12:38, Jesper Pedersen wrote:
> Hi Simon and Pavan,
>
> On 04/04/2018 08:46 AM, Jesper Pedersen wrote:
>>
>> On 03/30/2018 07:10 AM, Simon Riggs wrote:
>>>
>>> No problems found, but moving proposed commit to 2 April pm
>>
>> There is a warning for this, as attached.
>
> Updated version due to latest refactoring.

Thanks for your input. Removing that seems to prevent compilation.

Did something change in between?

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Add support for printing/reading MergeAction nodes
On 5 April 2018 at 11:31, Pavan Deolasee wrote:
> Attached patch refactors the grammar/parser side per your comments. We no
> longer use InsertStmt/UpdateStmt/DeleteStmt/SelectStmt as part of
> MergeAction. Instead we only collect the necessary information for running
> the INSERT/UPDATE/DELETE actions. Speaking of MergeAction itself, I decided
> to use a new parser-only node named MergeWhenClause and removed unnecessary
> members from the MergeAction node which now gets to planner/executor.

That looks good to me. Simple separation of duty.

> Regarding the original report by Marina I suspect she may have turned
> debug_print_parse=on while running regression. I could reproduce the
> failures in the isolation tests by doing the same. The attached patch however
> passes all tests with the following additional GUCs. So I am more confident
> that we should have got the outfuncs.c support ok.
>
> debug_print_parse=on
> debug_print_rewritten=on
> debug_print_plan=on
>
> Also, I am now running tests with -DCOPY_PARSE_PLAN_TREES
> -DRAW_EXPRESSION_COVERAGE_TEST since the buildfarm had earlier uncovered
> some issues with those flags. No problem there too.

OK, so $OP fixed.

> This now also enforces single VALUES clause in the grammar itself instead of
> doing that check at parse-analyse time. So that's a net improvement too.

OK, that's good. I've updated the docs to show this restriction correctly.

I'll commit this tomorrow morning unless there are further comments or edits.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: pgsql: New files for MERGE
On 4 April 2018 at 21:28, Simon Riggs wrote: > On 4 April 2018 at 21:14, Andres Freund wrote: > >>> The normal way is to make review comments that allow change. Your >>> request for change of the parser data structures is fine and can be >>> done, possibly by Saturday >> >> I did request changes, and you've so far ignored those requests. > > Pavan tells me he has replied to you and is working on specific changes. Specific changes requested have now been implemented by Pavan and committed by me. My understanding is that he is working on a patch for Tom's requested parser changes, will post on other thread. -- Simon Riggshttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] MERGE SQL Statement for PG11
On 5 April 2018 at 07:01, Pavan Deolasee wrote:
>> +/*
>> + * Given OID of the partition leaf, return the index of the leaf in the
>> + * partition hierarchy.
>> + */
>> +int
>> +ExecFindPartitionByOid(PartitionTupleRouting *proute, Oid partoid)
>> +{
>> +    int     i;
>> +
>> +    for (i = 0; i < proute->num_partitions; i++)
>> +    {
>> +        if (proute->partition_oids[i] == partoid)
>> +            break;
>> +    }
>> +
>> +    Assert(i < proute->num_partitions);
>> +    return i;
>> +}
>>
>> Shouldn't we at least warn in a comment that this is O(N)? And document
>> that it does weird stuff if the OID isn't found?
>
> Yeah, added a comment. Also added a ereport(ERROR) if we do not find the
> partition. There was already an Assert, but maybe ERROR is better.
>
>> Perhaps just introduce a PARTOID syscache?
>
> Probably as a separate patch. Anything more than a handful of partitions is
> anyway known to be too slow and I doubt this code will have any material
> impact on that.

There are a few others trying to change that now, so I think we should consider working on this. A PARTOID syscache sounds like a good approach.

>> diff --git a/src/backend/executor/nodeMerge.c
>> b/src/backend/executor/nodeMerge.c
>> new file mode 100644
>> index 000..0e0d0795d4d
>> --- /dev/null
>> +++ b/src/backend/executor/nodeMerge.c
>> @@ -0,0 +1,575 @@
>> +/*-
>> + *
>> + * nodeMerge.c
>> + *   routines to handle Merge nodes relating to the MERGE command
>>
>> Isn't this file misnamed and it should be execMerge.c? The rest of the
>> node*.c files are for nodes that are invoked via execProcnode /
>> ExecProcNode(). This isn't an actual executor node.
>
> Makes sense. Done. (Now that the patch is committed, I don't know if there
> would be a rethink about changing file names. Maybe not, just raising that
> concern)

My review notes suggested a file called execMerge.c; I didn't spot the filename change. I think it's important to make that change because there is no executor node called Merge. That is especially confusing because there *is* an executor node called MergeAppend, and we want some cognitive distance between those things.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
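For comparison, a hedged sketch of what the syscache approach might look like; SearchSysCache1() and ReleaseSysCache() are real routines, but the PARTOID cache ID does not exist yet and is purely illustrative:

    /* Hypothetical O(1) replacement for the O(N) partition_oids[] scan,
     * if a PARTOID syscache existed. */
    HeapTuple   tuple = SearchSysCache1(PARTOID, ObjectIdGetDatum(partoid));

    if (!HeapTupleIsValid(tuple))
        ereport(ERROR,
                (errmsg("no partition found for OID %u", partoid)));
    /* ... read what is needed from the tuple, then ... */
    ReleaseSysCache(tuple);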
Re: pgsql: New files for MERGE
On 4 April 2018 at 21:14, Andres Freund wrote: >> The normal way is to make review comments that allow change. Your >> request for change of the parser data structures is fine and can be >> done, possibly by Saturday > > I did request changes, and you've so far ignored those requests. Pavan tells me he has replied to you and is working on specific changes. -- Simon Riggshttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: pgsql: New files for MERGE
On 4 April 2018 at 20:09, Tom Lane wrote:
> [ removing -committers from cc ]
>
> Pavan Deolasee writes:
>> On Thu, Apr 5, 2018 at 12:16 AM, Andres Freund wrote:
>>> How's that an explanation for just going ahead and committing? Without
>>> even commenting on why one thinks the pointed out issues are something
>>> that can be resolved later or somesuch? This has an incredibly rushed
>>> feel to it.
>
>> Anyway, I think your review comments are useful and I've incorporated
>> most of those. Obviously certain things like creating a complete new
>> executor machinery are not practical given where we are in the release cycle
>> and I am not sure if that has any significant advantages over what we have
>> today.
>
> Well, what's on the table is reverting this patch and asking you to try
> again in the v12 cycle. Given Andres' concerns about the executor design,
> and mine about the way the parsing end is built, there's certainly no way
> that that's all getting fixed by Saturday. Given pretty much everybody's
> unhappiness with the way this patch was forced through at the last minute,
> I do not think you should expect that we'll say, "okay, we'll let you ship
> a bad version of MERGE because there's no more time in this cycle".

This version works, with agreed semantics, all fully tested and documented. It's also neat and tight. Look how easy it was for Peter to add WITH semantics on top of it. And it's isolated, so it's not a threat to anybody that doesn't choose to use it. Users want it and will use this; if I didn't know that for certain I wouldn't spend time on it.

> Personally, I didn't think we had consensus on whether the semantics
> are right, let alone on whether this is a satisfactory implementation
> code-wise. I know I've never looked at the patch before today; I did not
> think it was close enough to being committed that I would need to.

I have rejected patches in the past for clearly stated reasons that affect users. I regret that those people didn't discuss their designs before they posted patches and serious holes were found. And in response I provided clear design outlines of what was needed. That is not what is happening here.

The normal way is to make review comments that allow change. Your request for change of the parser data structures is fine and can be done, possibly by Saturday.

If saying "I'm unhappy with something" is sufficient grounds for rejecting a patch, I'm surprised to hear it. There has been no discussion of what exactly would be better, only that what we have is somehow wrong, a point which both Pavan and I dispute, not least because the executor has already been rewritten once at Peter's request.

I was under no pressure at all to commit this. In my opinion this is a good version of MERGE and that is why I committed it. If it were not, I would have no hesitation in pushing back a year or more, if I knew of technical reasons to do so.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Add support for printing/reading MergeAction nodes
On 4 April 2018 at 18:51, Tom Lane wrote: > Simon Riggs writes: >> On 4 April 2018 at 17:19, Tom Lane wrote: >>> If the MERGE patch has broken this, I'm going to push back on that >>> and push back on it hard, because it probably means there are a >>> whole bunch of other raw-grammar-output-only node types that can >>> now get past the parser stage as well, as children of these nodes. >>> The answer to that is not to add a ton of new infrastructure, it's >>> "you did MERGE wrong". > >> MERGE contains multiple actions of Insert, Update and Delete and these >> could be output in various debug modes. I'm not clear what meaning we >> might attach to them if we looked since that differs from normal >> INSERTs, UPDATEs, DELETEs, but lets see. > > What I'm complaining about is that that's a very poorly designed parsetree > representation. It may not be the worst one I've ever seen, but it's > got claims in that direction. You're repurposing InsertStmt et al to > do something that's *not* an INSERT statement, but by chance happens > to share some (not all) of the same fields. This is bad because it > invites confusion, and then bugs of commission or omission (eg, assuming > that some particular processing has happened or not happened to subtrees > of that parse node). The most egregious way in which it's a bad idea > is that, as I said, InsertStmt doesn't even exist in post-parse-analysis > trees so far as the normal type of INSERT is concerned. This just opens > a whole batch of ways to screw up. We have some types of raw parse nodes > that are replaced entirely during parse analysis, and we have some others > where it's convenient to use the same node type before and after parse > analysis, but we don't have any that are one way in one context and the > other way in other contexts. And this is not the place to start. > > I think you'd have been better off to fold all of those fields into > MergeAction, or perhaps make several variants of MergeAction. OK, that can be changed, will check and report back tomorrow. -- Simon Riggshttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Add support for printing/reading MergeAction nodes
On 4 April 2018 at 17:19, Tom Lane wrote:
> Marina Polyakova writes:
>> When debugging is enabled for server logging, isolation tests fail
>> because there're no corresponding output functions for InsertStmt /
>> DeleteStmt / UpdateStmt that are used in the output of the MergeAction
>> nodes (see the attached regressions diffs and output). I also attached an
>> attempt that makes the tests pass. Sorry if I missed that it was already
>> discussed somewhere.
>
> Uh ... what?
>
> Those node types are supposed to appear in raw grammar output only;
> they should never survive past parse analysis.

So if I understand this correctly, it has nothing to do with the isolation tester; that's just where the report came from. Which debug mode are we talking about, please?

> If the MERGE patch has broken this, I'm going to push back on that
> and push back on it hard, because it probably means there are a
> whole bunch of other raw-grammar-output-only node types that can
> now get past the parser stage as well, as children of these nodes.
> The answer to that is not to add a ton of new infrastructure, it's
> "you did MERGE wrong".

MERGE hasn't broken anything. Marina is saying that the debug output for MERGE isn't generated correctly. I accept that it shouldn't give spurious messages and I'm sure we can fix that.

MERGE contains multiple actions of Insert, Update and Delete, and these could be output in various debug modes. I'm not clear what meaning we might attach to them if we looked, since that differs from normal INSERTs, UPDATEs and DELETEs, but let's see.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Add support for printing/reading MergeAction nodes
On 4 April 2018 at 17:19, Tom Lane wrote: > BTW, poking around in the grammar, I notice that MergeStmt did not > get added to RuleActionStmt. That seems like a rather serious > omission. MERGE isn't a privilege, a trigger action or a policy action. Why would it be in RuleActionStmt? Could you explain what command you think should be supported? -- Simon Riggshttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PATCH] Logical decoding of TRUNCATE
On 4 April 2018 at 03:31, Peter Eisentraut wrote:
> I wonder why this approach was chosen.

I looked at coding it that way and it seemed worse than the way chosen.

> I'm going to try to hack up an alternative approach and see how it works
> out.

If you add a new filter for TRUNCATE it will mean compatibility problems and the need for pg_dump support. We would need a note in the release notes to explain that people will need to add a TRUNCATE filter to their existing publications. I was hoping to have TRUNCATE picked up automatically, which can't happen if we have an explicit filter for it.

>> I know this has been discussed in the thread already, but it really
>> strikes me as wrong to basically do some mini DDL replication feature
>> via per-command WAL records.

I don't really understand that comment.

> I think TRUNCATE is special in some ways and it's reasonable to treat it
> specially. Real DDL is being worked on in the 2PC decoding thread.
> What we are discussing here isn't going to be applicable there and vice
> versa, I think. In fact, one of the reasons for this effort is that in
> BDR TRUNCATE is currently handled like a DDL command, which doesn't work
> well.

Agreed

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PATCH] Logical decoding of TRUNCATE
On 2 April 2018 at 19:30, Peter Eisentraut wrote: >>> *** >>> *** 111,116 CREATE PUBLICATION >> class="parameter">name >>> --- 111,121 >>> and so the default value for this option is >>> 'insert, update, delete'. >>> >>> + >>> +TRUNCATE is treated as a form of >>> +DELETE for the purpose of deciding whether >>> +to publish, or not. >>> + >>> >>> >>> >> >> Why is this a good idea? > > I think it seemed like a good idea at the time, so to speak, but several > people have argued against it, so we should probably change this in the > final version. Who has argued against it? I see that Andres has asked twice about it and been answered twice, but expressed no opinion. Petr has said he thinks that's right. Nobody else has commented. -- Simon Riggshttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] logical decoding of two-phase transactions
On 29 March 2018 at 23:30, Petr Jelinek wrote:
> On 29/03/18 23:58, Andres Freund wrote:
>> On 2018-03-29 23:52:18 +0200, Tomas Vondra wrote:
>>>> I have added details about this in src/backend/storage/lmgr/README as
>>>> suggested by you.
>>>
>>> Thanks. I think the README is a good start, but I think we also need to
>>> improve the comments, which is usually more detailed than the README.
>>> For example, it's not quite acceptable that LogicalLockTransaction and
>>> LogicalUnlockTransaction have about no comments, especially when it's
>>> meant to be public API for decoding plugins.
>>
>> FWIW, for me that's ground to not accept the feature. Burdening output
>> plugins with this will make their development painful (because they'll
>> have to adapt regularly) and correctness doubtful (there's nothing
>> checking for the lock being skipped). Another way needs to be found.
>
> I have to agree with Andres here. It's also visible in the latter
> patches. The pgoutput patch forgets to call these new APIs completely.
> The test_decoding calls them, but it does so even when it's processing
> changes for a committed transaction. I think that should be avoided as it
> means potentially doing an SLRU lookup for every change. So doing it right
> is indeed not easy.

Yet you spotted these problems easily enough. Similar to finding missing LWLocks.

> I was wondering how to hide this. Best idea I had so far would be to put
> it in heap_beginscan (and index_beginscan given that catalog scans use
> it as well) behind some condition. That would also improve performance
> because locking would not need to happen for syscache hits. The problem
> is however how to inform the heap_beginscan about the fact that we are
> in 2PC decoding. We definitely don't want to change all the scan APIs
> for this. I wonder if we could add some kind of property to Snapshot
> which would indicate this fact - logical decoding is using its own
> snapshots, so it could inject the information about being inside the 2PC
> decoding.

Perhaps, but how do we know we've covered all the right places? We don't know what every plugin will require, do we? The plugin needs to take responsibility for its own correctness, whether we make it easier or not.

It seems clear that we would need a generalized API (the proposed locking approach) to cover all requirements.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] logical decoding of two-phase transactions
On 29 March 2018 at 23:24, Andres Freund wrote:
>> I agree with the former, of course - docs are a must. I disagree with
>> the latter, though - there have been about no proposals how to do it
>> without the locking. If there are, I'd like to hear about it.
>
> I don't care. Either another solution needs to be found, or the locking
> needs to be automatically performed when necessary.

That seems unreasonable. It's certainly a nice future goal to have it all happen automatically, but we don't know what the plugin will do. How can we ever make an unknown task happen automatically? We can't.

We have a reasonable approach here. Locking shared resources before using them is not a radical new approach, it's just standard development.

If we find a better way in the future, we can use that, but requiring a better solution when there isn't one is unreasonable.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
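For concreteness, a sketch of the discipline being debated, as an output plugin might have to apply it; the exact signatures of LogicalLockTransaction()/LogicalUnlockTransaction() are assumptions based on the names used in this thread, not a committed API:

    /* Hypothetical plugin usage of the proposed 2PC-decoding locks. */
    static void
    my_change_cb(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
                 Relation relation, ReorderBufferChange *change)
    {
        /* Pin the in-progress transaction so a concurrent ROLLBACK
         * PREPARED cannot remove catalog state from under us. */
        if (!LogicalLockTransaction(txn))
            return;         /* transaction aborted; skip this change */

        /* ... inspect catalogs and emit the change ... */

        LogicalUnlockTransaction(txn);
    }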
Re: [PATCH] Logical decoding of TRUNCATE
On 1 April 2018 at 21:01, Andres Freund wrote:
>> ***
>> *** 111,116 CREATE PUBLICATION name
>> --- 111,121
>>     and so the default value for this option is
>>     'insert, update, delete'.
>>
>> +
>> +    TRUNCATE is treated as a form of
>> +    DELETE for the purpose of deciding whether
>> +    to publish, or not.
>> +
>
> Why is this a good idea?

TRUNCATE removes rows, just as DELETE does, so anybody that wants to publish the removal of rows will be interested in this. This avoids unnecessary overcomplication of the existing interface.

>> + * Write a WAL record to allow this set of actions to be logically
>> decoded.
>> + * We could optimize this away when !RelationIsLogicallyLogged(rel)
>> + * but that doesn't save much space or time.
>
> What you're saying isn't that you're not logging anything, but that
> you're allocating the header regardless? Because this certainly sounds
> like you unconditionally log a WAL record.

It says that, yes - logging unconditionally was my idea, as explained.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
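A hedged illustration of the proposed behaviour; the table name is invented, and the semantics were still under discussion at this point:

    -- Under the patch as posted: a publication that publishes deletes
    -- would publish TRUNCATE too, since both remove rows.
    CREATE PUBLICATION removals_only FOR TABLE accounts
        WITH (publish = 'delete');

    TRUNCATE accounts;  -- replicated as a form of DELETE per this proposal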
Re: hot_standby_feedback vs excludeVacuum and snapshots
On 31 March 2018 at 14:21, Amit Kapila wrote: > On Thu, Mar 29, 2018 at 4:47 PM, Greg Stark wrote: >> I'm poking around to see debug a vacuuming problem and wondering if >> I've found something more serious. >> >> As far as I can tell the snapshots on HOT standby are built using a >> list of running xids that the primary builds and puts in the WAL and >> that seems to include all xids from transactions running in all >> databases. The HOT standby would then build a snapshot and eventually >> send the xmin of that snapshot back to the primary in the hot standby >> feedback and that would block vacuuming tuples that might be visible >> to the standby. >> >> Many ages ago Alvaro sweated blood to ensure vacuums could run for >> long periods of time without holding back the xmin horizon and >> blocking other vacuums from cleaning up tuples. That's the purpose of >> the excludeVacuum flag in GetCurrentVirtualXIDs(). That's possible >> because we know vacuums won't insert any tuples that queries might try >> to view and also vacuums won't try to perform any sql queries on other >> tables. >> >> I can't find anywhere that the standby snapshot building mechanism >> gets this same information about which xids are actually vacuums that >> can be ignored when building a snapshot. >> > > I think the vacuum assigns xids only if it needs to truncate some of > the pages in the relation which happens towards the end of vacuum. > So, it shouldn't hold back the xmin horizon for long. Yes, that's the reason. I recall VACUUMs giving lots of problems during development of Hot Standby. VACUUM FULL was the thing that needed to be excluded in the past because it needed an xid to move rows. Greg's concern is a good one and his noticing that we hadn't specifically excluded VACUUMs is valid, so we should exclude them. Well spotted, Greg. So although this doesn't have the dramatic effect it might have had, there is still the possibility of some effect and I think we should treat it as a bug. -- Simon Riggshttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services ignore_lazy_vacuums_in_RunningTransactionData.v1.patch Description: Binary data
Re: [HACKERS] MERGE SQL Statement for PG11
On 29 March 2018 at 10:50, Simon Riggs wrote: > On 28 March 2018 at 12:00, Pavan Deolasee wrote: > >> v27 attached, though review changes are in >> the add-on 0005 patch. > > This all looks good now, thanks for making all of those changes. > > I propose [v27 patch1+patch3+patch5] as the initial commit candidate > for MERGE, with other patches following later before end CF. > > I propose to commit this tomorrow, 30 March, about 26 hours from now. > That will allow some time for buildfarm fixing/reversion before the > Easter weekend, then other patches to follow starting 2 April. That > then gives reasonable time to follow up on other issues that we will > no doubt discover fairly soon after commit, such as additional runs by > SQLsmith and more eyeballs. No problems found, but moving proposed commit to 2 April pm -- Simon Riggshttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Changing WAL Header to reduce contention during ReserveXLogInsertLocation()
On 29 March 2018 at 23:16, Tomas Vondra wrote:
>
> On 03/29/2018 11:18 PM, Tom Lane wrote:
>> Tomas Vondra writes:
>>> If each WAL record has xl_curr, then we know to which position the
>>> record belongs (after verifying the checksum). And we do know the size
>>> of each WAL record, so we should be able to deduce if two records are
>>> immediately after each other.
>>
>> Per my point earlier, XLOG_SWITCH is sufficient to defeat that argument.
>
> But the SWITCH record will be the last record in the WAL segment (and if
> there happens to be a WAL record after it, it will have an invalid xl_curr
> pointer). And the next valid record will be the first one in the next
> WAL segment. So why wouldn't that be enough information?
>
>> Also consider a timeline fork. It's really hard to be sure which record
>> in the old timeline is the direct ancestor of the first one in the new
>> if you lack xl_prev:
>>
>>     A1 -> B1 -> C1 -> D1
>>             \
>>              B2 -> C2 -> D2
>>
>> If you happened to get confused and think that C2 is the first in its
>> timeline, diverging off the old line after B1 not A1, there would be
>> nothing about C2 to disabuse you of your error.
>
> Doesn't that mean that B1 and B2 have to be exactly the same size?
> Otherwise there would be a gap between B1/C2 or B2/C1, and the xl_curr
> would be enough to detect this.
>
> And how could xl_prev detect it? AFAIK XLogRecPtr does not include
> TimeLineID, so xl_prev would be the same for both B1 and B2.
>
> I admit WAL internals are not an area I'm particularly familiar with,
> though, so I may be missing something utterly obvious.

Timeline history files know the LSN at which they fork. I can imagine losing the timeline branch point metadata because of corruption or other disaster, but getting the LSN slightly wrong sounds strange. If it was slightly wrong and yet still a valid LSN, xl_prev provides no protection against that.

The timeline branch point is already marked by specific records in WAL, so if you forgot the LSN completely you would search for those and then you'd know. xl_prev wouldn't assist you in that search.

I agree with the points about XLOG_SWITCH, adding again the observation that switch records are no longer needed for replication, and that if they are used alongside replication they have a bad effect on performance. I think it would be easily possible to add some more detail to the WAL stream if needed.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Changing WAL Header to reduce contention during ReserveXLogInsertLocation()
On 29 March 2018 at 21:03, Tomas Vondra wrote:
>
> On 03/29/2018 08:02 PM, Robert Haas wrote:
>> On Thu, Mar 29, 2018 at 1:13 PM, Tomas Vondra
>> wrote:
>>> On 03/29/2018 06:42 PM, Tom Lane wrote:
>>>> The long and the short of it is that this is a very dangerous-looking
>>>> proposal, we are at the tail end of a development cycle, and there are
>>>> ~100 other patches remaining in the commitfest that also have claims
>>>> on our attention in the short time that's left. If you're expecting
>>>> people to spend more time thinking about this now, I feel you're being
>>>> unreasonable.
>>>
>>> I agree.
>>
>> +1.
>>
>>>> Also, I will say it once more: this change DOES decrease robustness.
>>>> It's like blockchain without the chain aspect, or git commits without
>>>> a parent pointer. We are not only interested in whether individual
>>>> WAL records are valid, but whether they form a consistent series.
>>>> Cross-checking xl_prev provides some measure of confidence about that;
>>>> xl_curr offers none.
>>>
>>> Not sure.
>>>
>>> If each WAL record has xl_curr, then we know to which position the
>>> record belongs (after verifying the checksum). And we do know the size
>>> of each WAL record, so we should be able to deduce if two records are
>>> immediately after each other. Which I think is enough to rebuild the
>>> chain of WAL records.
>>>
>>> To defeat this, this would need to happen:
>>>
>>> a) the WAL record gets written to a different location
>>> b) the xl_curr gets corrupted in sync with (a)
>>> c) the WAL checksum gets corrupted in sync with (b)
>>> d) the record overwrites existing record (same size/boundaries)
>>>
>>> That seems very much like xl_prev.
>>
>> I don't think so, because this ignores, for example, timeline
>> switches, or multiple clusters accidentally sharing an archive
>> directory.
>
> I'm curious: how does xl_prev prevent either of those issues (in a way
> that xl_curr would not)?
>
>> TBH, I'm not entirely sure how likely corruption would be under this
>> proposal in practice, but I think Tom's got the right idea on a
>> theoretical level. As he says, there's a reason why things like
>> block chains and git commits include something about the previous
>> record in what gets hashed to create the identifier for the new
>> record. It ties those things together in a way that doesn't happen
>> otherwise. If somebody proposed changing git so that the SHA
>> identifier for a commit didn't hash the commit ID for the previous
>> commit, that would break the security of the system in all kinds of
>> ways: for example, an adversary could maliciously edit an old commit
>> by just changing its SHA identifier and that of its immediate
>> descendant. No other commit IDs would change and integrity checks
>> would not fail -- there would be no easy way to notice that something
>> bad had happened.
>
> That seems like a rather poor analogy, because we're not protected
> against such adversarial modifications with xl_prev either. I can damage
> the previous WAL record, update its checksum and you won't notice anything.
>
> My understanding is that xl_curr and xl_prev give us pretty much the
> same amount of information, except that xl_prev provides it explicitly
> while with xl_curr it's implicit and we need to deduce it.
>
>> Now, in our case, the chances of problems are clearly a lot more
>> remote because of the way that WAL records are packed into files.
>> Every WAL record has a place where it's supposed to appear in the WAL >> stream, and positions in the WAL stream are never intentionally >> recycled (except in PITR scenarios, I guess). Because of that, the >> proposed xl_curr check provides a pretty strong cross-check on the >> validity of a record even without xl_prev. I think there is an >> argument to be made - as Simon is doing - that this check is in fact >> strong enough that it's good enough for government work. He might be > > Is he? I think the claims in this thread were pretty much that xl_curr > and xl_prev provide the same level of protection. Yes, the blockchain analogy breaks down because we don't include previous CRC, only xl_prev. The existing WAL format is easy enough to repair, falsify etc. You've still got to keep your WAL data secure. So the image of a CSI-style chain-of-evidence thing happening isn't technical reality. You could change the existing format to include the previous CRC in the calculation of the current record's CRC. I'd be happy with that as an option, just as much as I'd support an option for high performance as I've proposed here. But bear in mind that including the CRC would mean that you couldn't repair data damaged by data corruption because it would mean all WAL after the corrupt page would be effectively scrambled. -- Simon Riggshttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Changing WAL Header to reduce contention during ReserveXLogInsertLocation()
On 29 March 2018 at 19:02, Robert Haas wrote: > >>> Also, I will say it once more: this change DOES decrease robustness. >>> It's like blockchain without the chain aspect, or git commits without >>> a parent pointer. We are not only interested in whether individual >>> WAL records are valid, but whether they form a consistent series. >>> Cross-checking xl_prev provides some measure of confidence about that; >>> xl_curr offers none. >> >> Not sure. >> >> If each WAL record has xl_curr, then we know to which position the >> record belongs (after verifying the checksum). And we do know the size >> of each WAL record, so we should be able to deduce if two records are >> immediately after each other. Which I think is enough to rebuild the >> chain of WAL records. >> >> To defeat this, this would need to happen: >> >> a) the WAL record gets written to a different location >> b) the xl_curr gets corrupted in sync with (a) >> c) the WAL checksum gets corrupted in sync with (b) >> d) the record overwrites existing record (same size/boundaries) >> >> That seems very much like xl_prev. > > I don't think so, because this ignores, for example, timeline > switches, or multiple clusters accidentally sharing an archive > directory. I'm not hearing any actual technical problems. > Given where we are > in the release cycle, I'd prefer to defer that question to next cycle. Accepted, on the basis of risk. -- Simon Riggshttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Changing WAL Header to reduce contention during ReserveXLogInsertLocation()
On 29 March 2018 at 18:13, Tomas Vondra wrote:
> On 03/29/2018 06:42 PM, Tom Lane wrote:
>> Simon Riggs writes:
>>> I know the approach is new and surprising but I thought about it a lot
>>> before proposing it and I couldn't see any holes; still can't. Please
>>> give this some thought so we can get comfortable with this idea and
>>> increase performance as a result. Thanks.
>>
>> The long and the short of it is that this is a very dangerous-looking
>> proposal, we are at the tail end of a development cycle, and there are
>> ~100 other patches remaining in the commitfest that also have claims
>> on our attention in the short time that's left. If you're expecting
>> people to spend more time thinking about this now, I feel you're being
>> unreasonable.
>
> I agree.

We require consensus on such things for good reason. Risk is an issue and everyone must be comfortable. If it's a good idea now it will still be a good idea in the next cycle.

>> Also, I will say it once more: this change DOES decrease robustness.
>> It's like blockchain without the chain aspect, or git commits without
>> a parent pointer. We are not only interested in whether individual
>> WAL records are valid, but whether they form a consistent series.
>> Cross-checking xl_prev provides some measure of confidence about that;
>> xl_curr offers none.
>
> Not sure.
>
> If each WAL record has xl_curr, then we know to which position the
> record belongs (after verifying the checksum). And we do know the size
> of each WAL record, so we should be able to deduce if two records are
> immediately after each other. Which I think is enough to rebuild the
> chain of WAL records.
>
> To defeat this, this would need to happen:
>
> a) the WAL record gets written to a different location
> b) the xl_curr gets corrupted in sync with (a)
> c) the WAL checksum gets corrupted in sync with (b)
> d) the record overwrites existing record (same size/boundaries)
>
> That seems very much like xl_prev.

That's how I see it too. But I also see that this feels different from what we had before, and it takes time either to come up with a concrete, quantifiable reason not to do it, or to accept that it is OK.

So, given my observation that no technical objection remains, I'm happy to move this to the next CF to give us more time and less risk.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Changing WAL Header to reduce contention during ReserveXLogInsertLocation()
On 29 March 2018 at 01:50, Robert Haas wrote:
> On Tue, Mar 27, 2018 at 11:53 PM, Pavan Deolasee
> wrote:
>> Yeah, we should not do that. The patch surely does not intend to replay any
>> more WAL than what we do today. The idea is to just use a different
>> mechanism to find the prior checkpoint. But we should surely find the latest
>> prior checkpoint. In the rare scenario that Tom showed, we should just throw
>> an error and fix the patch if it's not doing that already.
>
> It's not clear to me that there is any reasonable fix short of giving
> up on this approach altogether. But I might be missing something.

(Sorry to come back to this late)

Yes, a few things have been missed in this discussion.

Michael's point is invalid. The earlier discussion said that *replay* of WAL earlier than the last checkpoint could lead to corruption, but this doesn't mean the records themselves are in some way damaged structurally, only that their content can lead to inconsistency in the database. The xl_curr proposal does not replay WAL records; it only searches through them to locate the last checkpoint.

Tom also said "Well, the point of checkpoints is that WAL data before the last one should no longer matter anymore, isn't it?". WAL files prior to the one containing the last checkpoint can be discarded, but checkpoints themselves do nothing to invalidate the WAL records before the checkpoint.

Tom's point that a corruption in the WAL file could prevent pg_rewind from locating a record is valid, but misses something. If the WAL file contained corruption after the last checkpoint, that would already, in the current code, prevent pg_rewind from moving backwards to find the last checkpoint. So whether you go forwards or backwards it is possible to be blocked by corrupt WAL records.

Most checkpoints are far enough apart that there is only one checkpoint record in any WAL file, and it could be anywhere in the file, so the expected number of records to be searched by forwards or backwards scans is equal. And the probability that corruption occurs before or after the record is also equal.

Given the above points, I don't see any reason to believe that the forwards search mechanism proposed is faulty, decreases robustness or is even slower. Overall, the loss of links between WAL records is not a loss of robustness. XLOG_SWITCH is a costly operation and mostly unusable these days, so it shouldn't be a reason to shy away from this patch. We don't jump around in WAL in any other way, so sequential records are just fine.

So IMHO the proposed approach is still very much viable. I know the approach is new and surprising but I thought about it a lot before proposing it and I couldn't see any holes; still can't. Please give this some thought so we can get comfortable with this idea and increase performance as a result. Thanks.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
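To make the forwards search concrete, a sketch of the idea (not pg_rewind source; all three helpers are hypothetical stand-ins for the real WAL-reading routines):

    /* Scan forwards from a known-good starting point, remembering the
     * most recent checkpoint record seen. */
    #include <stdbool.h>
    #include <stdint.h>

    extern bool     read_record_at(uint64_t lsn);   /* load + CRC check */
    extern bool     is_checkpoint_record(uint64_t lsn);
    extern uint64_t next_record_lsn(uint64_t lsn);  /* lsn + record size */

    uint64_t
    find_last_checkpoint(uint64_t start_lsn)
    {
        uint64_t    lsn = start_lsn;
        uint64_t    last_checkpoint = 0;    /* 0 means "none found" */

        while (read_record_at(lsn))
        {
            if (is_checkpoint_record(lsn))
                last_checkpoint = lsn;      /* keep only the latest */
            lsn = next_record_lsn(lsn);
        }
        return last_checkpoint;
    }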
Re: [HACKERS] MERGE SQL Statement for PG11
On 28 March 2018 at 12:00, Pavan Deolasee wrote: > v27 attached, though review changes are in > the add-on 0005 patch. This all looks good now, thanks for making all of those changes. I propose [v27 patch1+patch3+patch5] as the initial commit candidate for MERGE, with other patches following later before end CF. I propose to commit this tomorrow, 30 March, about 26 hours from now. That will allow some time for buildfarm fixing/reversion before the Easter weekend, then other patches to follow starting 2 April. That then gives reasonable time to follow up on other issues that we will no doubt discover fairly soon after commit, such as additional runs by SQLsmith and more eyeballs. -- Simon Riggshttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] A design for amcheck heapam verification
On 29 March 2018 at 01:49, Peter Geoghegan wrote:
>>> IndexBuildHeapRangeScan() doesn't mention anything about CIC's heap
>>> ShareUpdateExclusiveLock (it just says SharedLock), because that lock
>>> strength doesn't have anything to do with IndexBuildHeapRangeScan()
>>> when it operates with an MVCC snapshot. I think that this means that
>>> this patch doesn't need to update comments within
>>> IndexBuildHeapRangeScan(). Perhaps that's a good idea, but it seems
>>> independent.
>>
>> Ok, I agree. But note that we are now invoking that code with
>> AccessShareLock() whereas the existing callers either use ShareLock or
>> ShareUpdateExclusiveLock. That probably does not matter, but it's a change
>> worth noting.
>
> Fair point, even though the ShareUpdateExclusiveLock case isn't
> actually acknowledged by IndexBuildHeapRangeScan(). Can we leave this
> one up to the committer, too? I find it very hard to argue either for
> or against this, and I want to avoid "analysis paralysis" at this
> important time.

The above discussion doesn't make sense to me; I hope someone will explain.

I understand we are adding a check to verify the heap against the index, and that this will take longer than before. When it runs, does it report progress via pg_stat_activity, so we can monitor how long it will take?

Locking is also an important concern. If we need a ShareLock to run the extended check and the check runs for a long time, when would we decide to run it? This sounds like it could cause a production outage, so what are the pre-conditions that would lead us to say "we'd better run this"?

For example, if running this is known to be significantly faster than running CREATE INDEX, that might be an argument for someone to run this first if index corruption is suspected.

If it detects an issue, can it fix the issue for the index by injecting correct entries? If not, then we will have to run CREATE INDEX afterwards anyway, which makes it more likely that people would just run CREATE INDEX and not bother with the check.

So my initial questions are about when we would run this, and making sure that is documented.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
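For reference, the verification under discussion is invoked roughly like this per the posted patch series (the heapallindexed argument could still change before commit; the index name is just an example):

    CREATE EXTENSION IF NOT EXISTS amcheck;

    -- AccessShareLock variant: check the index structure, then verify
    -- that every heap tuple has a matching index entry.
    SELECT bt_index_check('pgbench_accounts_pkey'::regclass,
                          heapallindexed => true);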
Re: Commit fest 2017-11
On 29 March 2018 at 09:19, Magnus Hagander wrote:
>
> On Thu, Mar 29, 2018 at 10:06 AM, Alexander Korotkov
> wrote:
>>
>> Hi, Fabien!
>>
>> On Fri, Dec 1, 2017 at 9:10 AM, Fabien COELHO wrote:
>>>>
>>>> And the last 21 patches have been classified as well. Here is the
>>>> final score for this time:
>>>> Committed: 55.
>>>> Moved to next CF: 103.
>>>> Rejected: 1.
>>>> Returned with Feedback: 47.
>>>> Total: 206.
>>>>
>>>> Thanks to all the contributors for this session! The CF is now closed.
>>>
>>> Thanks for the CF management.
>>>
>>> Attached a small graph of the end status of patches at the end of each CF.
>>
>> Thank you for the graph!
>> It would be interesting to see statistics not by patch counts, but by
>> their complexity. For a rough measure of complexity we can use the number
>> of affected lines. I expect the statistics would be even more distressing:
>> small patches can be committed faster, while large patches traverse from
>> one CF to another for a long time. It would be interesting to check
>> whether that's true...
>
> I think that's very hard to do given that we simply don't have the data
> today. It's not that simple to analyze the patches in the archives, because
> some are a single file, some are spread across multiple files, etc. I fear
> that anything trying to work off that would actually make the stats even
> more inaccurate. This is the pattern I've seen whenever I've tried that
> before.
>
> I wonder if we should consider adding a field to the CF app *specifically*
> to track things like this. What I'm thinking is a field that's set (or at
> least verified) by the person who flags a patch as committed with choices
> like Trivial/Simple/Medium/Complex (trivial being things like typo fixes
> etc, which today can hugely skew the stats).
>
> Would people who update the CF app be willing to put in that effort (however
> small it is, it has to be done consistently to be of any value) in order to
> be able to track such statistics?
>
> It would only help for the future of course, unless somebody wants to go
> back and backfill existing patches with such information (which they might
> be).

The focus of this is on the committers, which seems wrong. I suggest someone do another analysis showing how many patch reviews have been conducted by patch authors, so we can highlight people who are causing the problem yet not helping to solve it.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] MERGE SQL Statement for PG11
On 29 March 2018 at 07:37, Pavan Deolasee wrote:
>
> On Tue, Mar 27, 2018 at 5:00 PM, Simon Riggs wrote:
>>
>> In terms of further performance optimization, if there is just one
>> WHEN AND condition and no unconditional WHEN clauses then we can add
>> the WHEN AND easily to the join query.
>>
>> That seems like an easy thing to do for PG11
>
> I think we need to be careful in terms of what can be pushed down to the
> join, in the presence of WHEN NOT MATCHED actions. If we push the WHEN AND
> qual to the join then I am worried that some rows which should have been
> reported "matched" and later filtered out as part of the WHEN quals, will
> get reported as "not-matched", thus triggering the WHEN NOT MATCHED action.
>
> postgres=# EXPLAIN ANALYZE MERGE INTO target t USING source s ON t.a = s.a
> WHEN MATCHED AND t.a < 2 THEN UPDATE SET b = s.b WHEN NOT MATCHED THEN
> INSERT VALUES (s.a, -1);

That has an unconditional WHEN clause, so it would block the push-down under the rule I stated above.

With something like this

    MERGE INTO target t USING source s ON t.a = s.a
    WHEN MATCHED AND t.a < 2 THEN UPDATE SET b = s.b;

or this

    MERGE INTO target t USING source s ON t.a = s.a
    WHEN MATCHED AND t.a < 2 THEN UPDATE SET b = s.b
    WHEN NOT MATCHED THEN DO NOTHING;

or this

    MERGE INTO target t USING source s ON t.a = s.a
    WHEN MATCHED AND t.a < 2 THEN UPDATE SET b = s.b
    WHEN MATCHED THEN DO NOTHING
    WHEN NOT MATCHED THEN DO NOTHING;

then we can push down "t.a < 2" into the WHERE clause of the join query.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
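Under that rule, the qual could be folded into the join along these lines - a sketch of the internal rewrite, not user-visible syntax:

    -- With only conditional WHEN MATCHED clauses (and no action for
    -- everything else), rows failing "t.a < 2" require no action at
    -- all, so the qual can safely become part of the join:
    SELECT t.ctid, s.*
    FROM   target t
    JOIN   source s ON t.a = s.a
    WHERE  t.a < 2;     -- the pushed-down WHEN MATCHED AND qual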
Re: [HACKERS] logical decoding of two-phase transactions
On 28 March 2018 at 16:28, Nikhil Sontakke wrote: > Simon, 0003-Add-GID-and-replica-origin-to-two-phase-commit-abort.patch > is the exact patch that you had posted for an earlier commit. 0003 Pushed -- Simon Riggshttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Surjective functional indexes
On 23 March 2018 at 15:54, Simon Riggs wrote: > So please could you make the change? Committed, but I still think that change would be good. -- Simon Riggshttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Changing WAL Header to reduce contention during ReserveXLogInsertLocation()
On 27 March 2018 at 14:58, Tom Lane wrote:
> The point of the xl_prev links is, essentially, to allow verification
> that we are reading a coherent series of WAL records, ie that record B
> which appears to follow record A actually is supposed to follow it.
> This throws away that cross-check just as effectively as the other patch
> did, leaving us reliant solely on the per-record CRCs. CRCs are good,
> but they do have a false-positive rate.

You aren't reliant solely on the CRC. If you have a false-positive CRC then xl_curr allows you to validate the record, just as you would with xl_prev. The xl_curr value points to itself using just as many bits as the xl_prev field, so the probability of false validation is the same as now.

xl_prev does allow you to verify that the WAL records form an unbroken chain. The question is whether that has value, because it certainly has a very clear cost in terms of write performance.

Is there some danger that we would skip records? Where does that risk come from? If someone *wanted* to skip records for whatever reason, they could achieve that with either xl_prev or xl_curr, so there is no difference there.

> An example of a case where xl_prev makes one feel a lot more comfortable
> is the WAL-segment-switch option. The fact that the first record of the
> next WAL file links back to the XLOG_SWITCH record is evidence that
> ignoring multiple megabytes of WAL was indeed the intended thing to do.
> An xl_curr field cannot provide any evidence as to whether WAL records
> were missed.

That's a good point, though XLOG_SWITCH is very rarely used now that we have streaming replication. We can handle it as a special case. Perhaps we can make each file header point back to the previous WAL record.

>> 2. Does the new logic in pg_rewind to search backward for a checkpoint
>> work reliably, and will it be slow?
>
> If you have to search backwards, this breaks it. Full stop.

You don't have to search backwards. We only need to locate the last checkpoint record.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
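As a sketch of the difference (struct and field names invented for illustration):

    /* Both checks spend the same header bits: xl_prev ties a record to
     * its predecessor, xl_curr ties a record to its own address. */
    #include <stdbool.h>
    #include <stdint.h>

    typedef struct WalRecordHdrSketch
    {
        uint64_t    xl_prev;    /* LSN of previous record (current format) */
        uint64_t    xl_curr;    /* LSN of this record (proposed format) */
    } WalRecordHdrSketch;

    /* current format: requires knowing where the previous record was */
    static inline bool
    position_ok_prev(const WalRecordHdrSketch *r, uint64_t prev_lsn)
    {
        return r->xl_prev == prev_lsn;
    }

    /* proposed format: the record validates its own location */
    static inline bool
    position_ok_curr(const WalRecordHdrSketch *r, uint64_t this_lsn)
    {
        return r->xl_curr == this_lsn;
    }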
Re: [HACKERS] MERGE SQL Statement for PG11
On 27 March 2018 at 11:46, Simon Riggs wrote: > On 27 March 2018 at 10:31, Pavan Deolasee wrote: > >> Fixed in v26. > > More comments on v26 In terms of further performance optimization, if there is just one WHEN AND condition and no unconditional WHEN clauses then we can add the WHEN AND easily to the join query. That seems like an easy thing to do for PG11 -- Simon Riggshttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] MERGE SQL Statement for PG11
On 27 March 2018 at 10:31, Pavan Deolasee wrote:
> Fixed in v26.

More comments on v26

* Change errmsg "Ensure that not more than one source rows match any one
  target row" to "Ensure that not more than one source row matches any
  one target row"

* I think we need an example in the docs showing a constant source query,
  so people can understand how to use MERGE for OLTP as well as large ELT

* Long comment in ExecMerge() needs rewording, formatting and a spell
  check. I suggest not referring to an "order", since that concept
  doesn't exist anywhere else

* Need tests for coverage of these ERROR messages
    Named security policy violation
    SELECT not allowed in MERGE INSERT...
    Multiple VALUES clauses not...
    MERGE is not supported for this...
    MERGE is not supported for relations with inheritance
    MERGE is not supported for relations with rules

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] MERGE SQL Statement for PG11
On 27 March 2018 at 10:28, Pavan Deolasee wrote:
>> 1. In ExecMergeMatched() we have a line of code that does this...
>>
>>   if (TransactionIdIsCurrentTransactionId(hufd.xmax))
>>   then errcode(ERRCODE_CARDINALITY_VIOLATION)
>>
>> I notice this is correct, but too strong. It should be possible to run
>> a sequence of MERGE commands inside a transaction, repeatedly updating
>> the same set of rows, as is possible with UPDATE.
>>
>> We need to check whether the xid is the current subxid and the cid is
>> the current commandid, rather than using
>> TransactionIdIsCurrentTransactionId()
>
> AFAICS this is fine because we invoke that code only when
> HeapTupleSatisfiesUpdate returns HeapTupleSelfUpdated, i.e. for the case
> when the tuple is updated by our transaction after the scan is started.
> HeapTupleSatisfiesUpdate already checks for command id before returning
> HeapTupleSelfUpdated.

Cool.

>> 5. I take it the special code for TransformMerge target relations is
>> replaced by "right_rte"? Seems fragile to leave it like that. Can we
>> add an Assert()? Do we care?
>
> I didn't get this point. Can you please explain?

The code confused me at that point. More docs pls.

>> 6. I didn't understand "Assume that the top-level join RTE is at the
>> end. The source relation is just before that."
>> What if there is no source relation?
>
> Can that happen? I mean shouldn't there always be a source relation? It
> could be a subquery or a function scan or just a plain relation, but
> something, right?

Yes, OK. So the ordering is target, source, join.

>> 10. Comment needs updating for changes in code below it - "In MERGE
>> when and condition, no system column is allowed"
>
> Yeah, that's kinda half-true since the code below supports TABLEOID and
> OID system columns. I am thinking about this in a larger context though.
> Peter has expressed a desire to support system columns in the WHEN
> targetlist and quals. I gave it a try and it seems that if we remove that
> error block, all system columns are supported readily. But only from the
> target side. There is a problem if we try to refer to a system column
> from the source side, since the mergeSourceTargetList only includes user
> columns and so set_plan_refs() complains about a system column.
>
> I am not sure what's the best way to handle this. Maybe we can add
> system columns to the mergeSourceTargetList. I haven't yet found a neat
> way to do that.

I was saying the comment needs changing, not the code.

Cool, thanks

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
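To make Pavan's point concrete, the check under discussion has roughly
this shape. This is a sketch, not the patch's exact code, and the error
message text is illustrative.

    case HeapTupleSelfUpdated:

        /*
         * HeapTupleSatisfiesUpdate reports SelfUpdated only when the
         * tuple was changed after the current scan started (cmax >=
         * current command id), so rows updated by an earlier MERGE in
         * the same transaction never reach this error.
         */
        if (TransactionIdIsCurrentTransactionId(hufd.xmax))
            ereport(ERROR,
                    (errcode(ERRCODE_CARDINALITY_VIOLATION),
                     errmsg("MERGE command cannot affect row a second time")));
        break;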
Re: [HACKERS] logical decoding of two-phase transactions
On 23 March 2018 at 15:26, Simon Riggs wrote: > Reviewing 0003-Add-support-for-logging-GID-in-commit-abort-WAL-reco > > Looks fine, reworked patch attached > * added changes to xact.h from patch 4 so that this is a whole, > committable patch > * added comments to make abort and commit structs look same > > Attached patch is proposed for a separate, early commit as part of this Looking to commit "logging GID" patch today, if no further objections. -- Simon Riggshttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] MERGE SQL Statement for PG11
On 26 March 2018 at 17:06, Simon Riggs wrote:
> On 26 March 2018 at 15:39, Pavan Deolasee wrote:
>
> That's all I can see so far.

* change comment "once to" to "once" in src/include/nodes/execnodes.h
* change comment "and to run" to "and once to run"
* change "result relation" to "target relation"
* XXX we probably need to check plan output for CMD_MERGE also
* Spurious line feed in src/backend/optimizer/prep/preptlist.c
* No need to remove whitespace in src/backend/optimizer/util/relnode.c
* README should note that the TABLEOID junk column is not strictly
  needed when joining to a non-partitioned table but we don't try to
  optimize that away. Is that an XXX to fix in future or do we just
  think the extra 4 bytes won't make much difference so we leave it?
* Comment in rewriteTargetListMerge() should mention TABLEOID exists to
  allow us to find the correct relation, not the correct row; the
  comment was just copied from the CTID comment above it.

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] MERGE SQL Statement for PG11
On 26 March 2018 at 23:10, Peter Geoghegan wrote:
> On Mon, Mar 26, 2018 at 12:17 PM, Simon Riggs wrote:
>>> As far as I
>>> know, the proposed MERGE patch has that issue and existing DML commands
>>> don't; but someone else may have better information.
>>
>> I will look deeper and report back.
>
> It's quite clear that the problem exists with the MERGE patch

Accepted; the only question is whether it affects UPDATE as well,
because it looks like it should.

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] MERGE SQL Statement for PG11
On 26 March 2018 at 17:52, Robert Haas wrote:
> On Mon, Mar 26, 2018 at 12:16 PM, Simon Riggs wrote:
>> On 26 March 2018 at 16:09, Robert Haas wrote:
>>> On Mon, Mar 26, 2018 at 5:53 AM, Simon Riggs wrote:
>>>> Since we now have MVCC catalog scans, all the name lookups are
>>>> performed using the same snapshot so in the above scenario the newly
>>>> created object would be invisible to the second name lookup.
>>>
>>> That's not true, because each lookup would be performed using a new
>>> snapshot -- not all under one snapshot.
>>
>> You're saying we take a separate snapshot for each table we look up?
>> Sounds weird to me.
>
> I'm saying we take a separate snapshot for each and every catalog
> lookup, except when we know that no catalog changes can have occurred.
> See the commit message for 568d4138c646cd7cd8a837ac244ef2caf27c6bb8.
> If you do a lookup in pg_class and 3 lookups in pg_attribute each of
> the 4 can be done under a different snapshot, in the worst case.
> You're not the first person to believe that the MVCC catalog scan
> patch fixes that problem, but as the guy who wrote it, it definitely
> doesn't. What that patch fixed was, prior to that patch, a catalog
> scan might find the WRONG NUMBER OF ROWS, like you might do a lookup
> against a unique index for an object that existed and, if the row was
> concurrently updated, you might find 0 rows or 2 rows instead of 1
> row. IOW, it guaranteed that we used a consistent snapshot for each
> individual lookup, not a consistent snapshot for the whole course of a
> command.

That all makes sense, thanks for explaining. I spent a few more minutes
going "but", "but", though I can now see good reasons for everything to
work this way.

>> So this error could happen in SELECT, UPDATE, DELETE or INSERT as well.
>>
>> Or you see this as something related specifically to MERGE, if so how?
>> Please explain what you see.
>
> As I said before, the problem occurs if the same command looks up the
> same table name in more than one place. There is absolutely nothing
> to guarantee that we get the same answer every time.
> As far as I know, the proposed MERGE patch has that issue and existing
> DML commands don't; but someone else may have better information.

I will look deeper and report back.

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
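For readers following along, the per-lookup snapshot behaviour Robert
describes can be seen in the catalog scan API. A sketch, with the
surrounding variables (rel, key, scan) assumed context:

    /*
     * Passing a NULL snapshot makes systable_beginscan() take a fresh
     * catalog snapshot internally (via GetCatalogSnapshot), so a lookup
     * in pg_class and three lookups in pg_attribute can each run under
     * a different snapshot if invalidation messages arrive in between.
     */
    scan = systable_beginscan(rel, ClassOidIndexId, true,
                              NULL,     /* snapshot taken per scan */
                              1, &key);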
Re: [HACKERS] MERGE SQL Statement for PG11
On 26 March 2018 at 16:09, Robert Haas wrote:
> On Mon, Mar 26, 2018 at 5:53 AM, Simon Riggs wrote:
>> Since we now have MVCC catalog scans, all the name lookups are
>> performed using the same snapshot so in the above scenario the newly
>> created object would be invisible to the second name lookup.
>
> That's not true, because each lookup would be performed using a new
> snapshot -- not all under one snapshot.

You're saying we take a separate snapshot for each table we look up?
Sounds weird to me.

So this error could happen in SELECT, UPDATE, DELETE or INSERT as well.

Or do you see this as something related specifically to MERGE? If so,
how? Please explain what you see.

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] MERGE SQL Statement for PG11
On 26 March 2018 at 15:39, Pavan Deolasee wrote:
> reviewer

1. In ExecMergeMatched() we have a line of code that does this...

   if (TransactionIdIsCurrentTransactionId(hufd.xmax))
   then errcode(ERRCODE_CARDINALITY_VIOLATION)

I notice this is correct, but too strong. It should be possible to run
a sequence of MERGE commands inside a transaction, repeatedly updating
the same set of rows, as is possible with UPDATE.

We need to check whether the xid is the current subxid and the cid is
the current commandid, rather than using
TransactionIdIsCurrentTransactionId()

On further analysis, I note that ON CONFLICT suffers from this problem
as well; looks like I just refactored it from there.

2. EXPLAIN ANALYZE looks unchanged from some time back. The math is only
correct as long as there are zero rows that do not cause an INS/UPD/DEL.
We don't test for that. I think this is a regression from an earlier
report of the same bug, or perhaps I didn't fix it at the time.

3. sp. "depedning" in trigger.sgml

4. trigger.sgml: replace "specific actions specified" with "events
specified in actions" to avoid the double use of "specific"

5. I take it the special code for TransformMerge target relations is
replaced by "right_rte"? Seems fragile to leave it like that. Can we
add an Assert()? Do we care?

6. I didn't understand "Assume that the top-level join RTE is at the
end. The source relation is just before that."
What if there is no source relation?

7. The formatting of the SQL statement in transformMergeStmt that begins
"Construct a query of the form" is borked, so the SQL layout is unclear;
it just needs a pretty print.

8. I didn't document why I thought this was true: "XXX if we have a
constant subquery, we can also skip join", but one of the explain
analyze outputs shows this is already true, where we provide a constant
query and it skips the join. So perhaps we can remove the comment.
(Search for "Seq Scan on target t_1")

9. I think we need to mention that MERGE won't work with rules or
inheritance (only partitioning) in the main doc page. The current text
says that rules are ignored, which would be true if we didn't
specifically throw ERROR feature not supported.

10. Comment needs updating for changes in code below it - "In MERGE
when and condition, no system column is allowed"

11. In the comment "Since the plan re-evaluated by EvalPlanQual uses the
second RTE", suggest using "join RTE" to make it more explicit which RTE
we are discussing

12. Missed out merge.sgml from the v25 patch.

13. For triggers we say "No separate triggers are defined for MERGE";
we should also state the same caveat for POLICY events

That's all I can see so far.

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] MERGE SQL Statement for PG11
On 26 March 2018 at 15:39, Pavan Deolasee wrote:
> Now that the ON CONFLICT patch is in, here are rebased patches. The
> second patch is to add support for CTE (thanks Peter).
>
> Apart from the rebase, the following things are fixed/improved:
>
> - Added test cases for column level privileges as suggested by Peter.
> One problem got discovered during the process. Since we expand and
> track the source relation targetlist, the existing code was demanding
> SELECT privileges on all attributes, even though MERGE is only
> referencing a few attributes on which the user has privilege. Fixed
> that by disassociating expansion from the actual referencing.

Good catch Peter.

> - Added a test case for RLS where the SELECT policy actually hides some
> rows, as suggested by Stephen in the past
>
> - Added a check to compare the result relation's and merge target
> relation's OIDs, as suggested by Robert. Simon thinks it's not
> necessary given that we now scan the catalog using an MVCC snapshot.
> So will leave it to his discretion when he takes it up for commit

No problem with adding the test, it's quick to compare two Oids.

> - Improved the explanation regarding why we need a second RTE for the
> merge target relation, and general cleanup/improvements in that area
>
> I think it will be a good idea to send any further patches as add-on
> patches for the reviewer/committer's sake. I will do that unless
> someone disagrees.

+1

So v25 is the "commit candidate" and we can add other patches to it.

Given the recent bugfix/changes I don't plan to commit this tomorrow
anymore.

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] MERGE SQL Statement for PG11
On 24 March 2018 at 12:27, Robert Haas wrote:
> On Sat, Mar 24, 2018 at 8:16 AM, Robert Haas wrote:
>> On Thu, Mar 22, 2018 at 7:13 PM, Peter Geoghegan wrote:
>>> While I think this particular HINT buglet is pretty harmless, I
>>> continue to be concerned about the unintended consequences of having
>>> multiple RTEs for MERGE's target table. Each RTE comes from a
>>> different lookup path -- the first one goes through setTargetTable()'s
>>> parserOpenTable() + addRangeTableEntryForRelation() calls. The second
>>> one goes through transformFromClauseItem(), for the join, which
>>> ultimately ends up calling transformTableEntry()/addRangeTableEntry().
>>> Consider commit 5f173040, which fixed a privilege escalation security
>>> bug around multiple name lookup. Could the approach taken by MERGE
>>> here introduce a similar security issue?
>>
>> Yeah, that seems really bad. I don't think there is a huge problem
>> with having multiple RTEs; for example, we very commonly end up with
>> both rte->inh and !rte->inh RTEs for the same table, and that is OK.
>> However, generating those RTEs by doing multiple name lookups for the
>> same table is a big problem. Imagine, for example, that a user has a
>> search_path of a, b and that there is a table b.foo. The user does a
>> merge on foo. Between the first name lookup and the second, somebody
>> creates a.foo. Now, potentially, half of the MERGE statement is going
>> to be running against b.foo and the other half against a.foo. I don't
>> know whether that will crash or bomb out with a strange error or just
>> make some unexpected modification to one of those tables, but the
>> behavior, even if not insecure, will certainly be wrong.
>
> If it's possible to identify the two OIDs that are supposed to match
> and cross-check that the OIDs are the same, then we could just bomb
> out with an error if they aren't.

Since we now have MVCC catalog scans, all the name lookups are
performed using the same snapshot, so in the above scenario the newly
created object would be invisible to the second name lookup.

So I don't see any way for the ERROR to occur and hence no need for a
cross-check, for UPDATE or MERGE.

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Changing WAL Header to reduce contention during ReserveXLogInsertLocation()
On 24 March 2018 at 12:19, Robert Haas wrote: > On Sat, Mar 24, 2018 at 8:11 AM, Simon Riggs wrote: >> On 24 March 2018 at 11:58, Robert Haas wrote: >>> On Sat, Mar 24, 2018 at 7:42 AM, Simon Riggs wrote: >>>> I suggest we focus on the engineering. I've not discussed this patch >>>> with Pavan offline. >>> >>> Well then proposing to commit moments after it's been posted is >>> ridiculous. That's exactly the opposite of "focusing on the >>> engineering". >>> >>> This is a patch about which multiple people have expressed concerns. >>> You're trying to jam a heavily redesigned version in at the last >>> minute without adequate review. Please don't do that. >> >> All presumption on your part. Don't tell me what I think and then diss >> me for that. > > It is of course true that I can't know your internal mental state with > certainty, but I don't think it is presumptuous to suppose that a > person who says they're going to commit a patch that hasn't been > reviewed intends to commit a patch that hasn't been reviewed. It seems you really are more intent on talking about my actions than on discussing the patch. Probably would have been better to read what I've actually said beforehand. If you have anything further to say, make it a formal complaint offlist so we can discuss engineering here. -- Simon Riggshttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] MERGE SQL Statement for PG11
On 24 March 2018 at 12:16, Robert Haas wrote:
> On Thu, Mar 22, 2018 at 7:13 PM, Peter Geoghegan wrote:
>> While I think this particular HINT buglet is pretty harmless, I
>> continue to be concerned about the unintended consequences of having
>> multiple RTEs for MERGE's target table. Each RTE comes from a
>> different lookup path -- the first one goes through setTargetTable()'s
>> parserOpenTable() + addRangeTableEntryForRelation() calls. The second
>> one goes through transformFromClauseItem(), for the join, which
>> ultimately ends up calling transformTableEntry()/addRangeTableEntry().
>> Consider commit 5f173040, which fixed a privilege escalation security
>> bug around multiple name lookup. Could the approach taken by MERGE
>> here introduce a similar security issue?
>
> Yeah, that seems really bad. I don't think there is a huge problem
> with having multiple RTEs; for example, we very commonly end up with
> both rte->inh and !rte->inh RTEs for the same table, and that is OK.
> However, generating those RTEs by doing multiple name lookups for the
> same table is a big problem. Imagine, for example, that a user has a
> search_path of a, b and that there is a table b.foo. The user does a
> merge on foo. Between the first name lookup and the second, somebody
> creates a.foo. Now, potentially, half of the MERGE statement is going
> to be running against b.foo and the other half against a.foo. I don't
> know whether that will crash or bomb out with a strange error or just
> make some unexpected modification to one of those tables, but the
> behavior, even if not insecure, will certainly be wrong.

MERGE uses multiple RTEs in exactly the same way UPDATE does. I don't
see a reason for specific concern wrt MERGE.

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Changing WAL Header to reduce contention during ReserveXLogInsertLocation()
On 24 March 2018 at 11:58, Robert Haas wrote: > On Sat, Mar 24, 2018 at 7:42 AM, Simon Riggs wrote: >> I suggest we focus on the engineering. I've not discussed this patch >> with Pavan offline. > > Well then proposing to commit moments after it's been posted is > ridiculous. That's exactly the opposite of "focusing on the > engineering". > > This is a patch about which multiple people have expressed concerns. > You're trying to jam a heavily redesigned version in at the last > minute without adequate review. Please don't do that. All presumption on your part. Don't tell me what I think and then diss me for that. If you have engineering comments, please make them. -- Simon Riggshttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Changing WAL Header to reduce contention during ReserveXLogInsertLocation()
I suggest we focus on the engineering. I've not discussed this patch with Pavan offline. On 23 March 2018 at 23:32, Michael Paquier wrote: > On Fri, Mar 23, 2018 at 11:06:48AM +0000, Simon Riggs wrote: >> Your assumption that I would commit a new patch that was 29 mins old >> is frankly pretty ridiculous, so yes, lets keep calm. > > When a committer says that a patch is "ready for commit" and that he > calls for "last objections", I am understanding that you would be ready > to commit the patch from the moment such an email has been sent. Am I > the only one to think so? Now let's look at the numbers: > - The last patch sent is a v2, which implements a completely new > approach compared to v1. This is a non-trivial patch which touches > sensitive parts of the code. > - v2 has been sent exactly two weeks after the last email exchanged on > this thread. > - Per the data publicly available, it took less than 30 minutes to > review the patch, and there are zero comments about its contents. > I do patch review on a more-or-less daily basis, and look at threads on > hackers on a daily basis, but I really rarely see such an "efficient" > review pattern. You and Pavan have likely discussed the patch offline, > but nobody can guess what has been discussed and what have been the > arguments exchanged. > >> Enjoy your weekend and I'll be happy to read your review on Monday. > > Er. So this basically means that I need to do a commitment to look at > this patch in such a short time frame? If you are asking for reviews, > doing such requests by asking a proper question rather than by implying > it in an affirmation would seem more adapted to me, so this email bit is > making me uncomfortable. My apologies if I am not able to catch the > nuance in those words. > -- > Michael -- Simon Riggshttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Surjective functional indexes
On 23 March 2018 at 15:39, Konstantin Knizhnik wrote:
> On 22.03.2018 23:37, Alvaro Herrera wrote:
>>
>> The rd_projidx (list of each nth element in the index list that is a
>> projection index) thing looks odd. Wouldn't it make more sense to have
>> a list of index OIDs that are projection indexes?
>
> rd_projidx is not a list, it is a Bitmapset. It is just one of many
> bitmap sets in RelationData:
>
>     Bitmapset  *rd_indexattr;  /* identifies columns used in indexes */
>     Bitmapset  *rd_keyattr;    /* cols that can be ref'd by foreign keys */
>     Bitmapset  *rd_pkattr;     /* cols included in primary key */
>     Bitmapset  *rd_idattr;     /* included in replica identity index */
>
> Yes, it is possible to construct an Oid list of projectional indexes
> instead of having a bitmap which marks some indexes as projectional,
> but I am not sure that it will be more efficient from either a CPU or a
> memory footprint point of view (in most cases the bitmap will consist
> of just one integer). In any case, I do not think it can have a
> measurable impact on performance or memory size: the number of indexes,
> and especially projectional indexes, will very rarely exceed 10.

Alvaro's comment seems reasonable to me.

The patch adds both rd_projindexattr and rd_projidx. rd_projindexattr
should be a Bitmapset, but rd_projidx looks strange now he mentions it.
Above that in RelationData we have other structures that are Lists of
OIDs, so Alvaro's proposal makes sense. That would simplify the code in
ProjectionIsNotChanged() by just looping over the list of projection
indexes rather than the list of all indexes.

So please could you make the change?

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
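For illustration, the loop Simon describes could look like this. The
field name rd_projidxlist is hypothetical; the committed design may
differ.

    ListCell   *lc;

    /*
     * Walk a List of projection-index OIDs directly, instead of testing
     * each index's position in the index list against a Bitmapset.
     */
    foreach(lc, relation->rd_projidxlist)
    {
        Oid         indexOid = lfirst_oid(lc);

        /* re-evaluate this projection index's expression here ... */
    }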
Re: [HACKERS] logical decoding of two-phase transactions
On 5 March 2018 at 16:37, Nikhil Sontakke wrote:
>> I will re-submit with "git format-patch" soon.
>
> PFA, patches in "format-patch" format.
>
> This patch set also includes changes in the test_decoding plugin along
> with an additional savepoint related test case that was pointed out on
> this thread, upstream.

Reviewing 0001-Cleaning-up-and-addition-of-new-flags-in-ReorderBuff.patch

The change from is_known_as_subxact to rbtxn_is_subxact loses some
meaning, since rbtxn entries with this flag set to false might still be
subxacts; we just don't know yet.

rbtxn_is_serialized refers to RBTXN_SERIALIZED, so the flag name should
be RBTXN_IS_SERIALIZED so that it matches.

Otherwise looks OK to commit.

Reviewing 0003-Add-support-for-logging-GID-in-commit-abort-WAL-reco

Looks fine, reworked patch attached
* added changes to xact.h from patch 4 so that this is a whole,
  committable patch
* added comments to make the abort and commit structs look the same

Attached patch is proposed for a separate, early commit as part of this

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

logging-GID-in-commit-abort-WAL.v2.patch
Description: Binary data
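The naming point is easiest to see side by side. A sketch of the
suggested rename, keeping the accessor and the flag it tests in step
(the bit value here is illustrative):

    #define RBTXN_IS_SERIALIZED     0x0008

    /* Has this transaction been spilled to disk? */
    #define rbtxn_is_serialized(txn) \
        (((txn)->txn_flags & RBTXN_IS_SERIALIZED) != 0)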
Re: [HACKERS] MERGE SQL Statement for PG11
On 23 March 2018 at 11:26, Pavan Deolasee wrote:
> On Fri, Mar 23, 2018 at 6:45 AM, Peter Geoghegan wrote:
>>
>> On Thu, Mar 22, 2018 at 6:02 PM, Alvaro Herrera wrote:
>> > Incremental development is a good thing. Trying to do everything in a
>> > single commit is great when time is infinite or even merely very long,
>> > but if you run out of it, which I'm sure is common, leaving some things
>> > out that can be reasonably implemented in a separate patch is perfectly
>> > acceptable.
>>
>> We're talking about something that took me less than an hour to get
>> working. AFAICT, it's just a matter of tweaking the grammar, and
>> adding a bit of transformWithClause() boilerplate to the start of
>> transformMergeStmt().
>>
>> I quickly implemented CTE support myself (not wCTE support, since
>> MERGE doesn't use RETURNING), and it wasn't tricky. It seems to work
>> when I mechanically duplicate the approach taken with other types of
>> DML statement in the parser. I have written a few tests, and so far it
>> holds up.

Peter, if you have the code and you consider it important that this
subfeature is in PostgreSQL, why not post the code so we can commit it?

Why would we repeat what has already been done?

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
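The boilerplate Peter mentions mirrors what transformUpdateStmt() does
today. A sketch, with placement at the start of transformMergeStmt()
assumed:

    if (stmt->withClause)
    {
        /* attach the WITH clause's CTEs to the transformed query */
        qry->hasRecursive = stmt->withClause->recursive;
        qry->cteList = transformWithClause(pstate, stmt->withClause);
        qry->hasModifyingCTE = pstate->p_hasModifyingCTE;
    }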
Re: Changing WAL Header to reduce contention during ReserveXLogInsertLocation()
On 23 March 2018 at 09:22, Michael Paquier wrote: > On Fri, Mar 23, 2018 at 09:04:55AM +0000, Simon Riggs wrote: >> So it shows clear benefit for both bulk actions and OLTP, with no >> regressions. >> >> No objection exists to the approach used in the patch, so I'm now >> ready to commit this. >> >> Last call for objections? > > Please hold on. It is Friday evening here. First I cannot take the > time to articulate an answer you are requesting on the other thread > part. Second, the latest version of the patch has been sent from Pavan > a couple of hours ago, and you are proposing to commit it in a message > sent 29 minutes after the last version has been sent. > > Let's cool down a bit and take some time for reviews, please. You still > have one week until the end of the commit fest anyway. Your assumption that I would commit a new patch that was 29 mins old is frankly pretty ridiculous, so yes, lets keep calm. Enjoy your weekend and I'll be happy to read your review on Monday. -- Simon Riggshttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Changing WAL Header to reduce contention during ReserveXLogInsertLocation()
On 23 March 2018 at 08:35, Pavan Deolasee wrote:
> On Fri, Mar 9, 2018 at 8:49 PM, Peter Eisentraut wrote:
>>
>> On 2/1/18 19:21, Simon Riggs wrote:
>> > If we really can't persuade you of that, it doesn't sink the patch. We
>> > can have the WAL pointer itself - it wouldn't save space but it would
>> > at least alleviate the spinlock.
>>
>> Do you want to send in an alternative patch that preserves the WAL
>> pointer and only changes the locking?
>
> Sorry for the delay. Here is an updated patch which now replaces xl_prev
> with xl_curr, thus providing similar safeguards against corrupted or
> torn WAL pages, but still providing the benefits of atomic operations.
>
> I repeated the same set of tests and the results are almost similar.
> These tests are done on a different AWS instance though and hence are
> not comparable to the previous tests. What we do in these tests is
> essentially call ReserveXLogInsertLocation() 1M times to reserve 256
> bytes each time, in a single select statement (the function is exported
> and a SQL-callable routine is added for the purpose of the tests).
>
> Patched:
> ========
> #clients    #tps
>  1          34.195311
>  2          29.001584
>  4          31.712009
>  8          35.489272
> 16          41.950044
>
> Master:
> ======
> #clients    #tps
>  1          24.128004
>  2          12.326135
>  4           8.334143
>  8          16.035699
> 16           8.502794
>
> So that's a pretty good improvement across the spectrum.

So it shows clear benefit for both bulk actions and OLTP, with no
regressions.

No objection exists to the approach used in the patch, so I'm now
ready to commit this.

Last call for objections?

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
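The reason the patched numbers scale with client count is the shape of
the reservation fast path the change enables. A minimal sketch, with
illustrative names (the real code must still convert byte positions to
LSNs, skipping page headers, before stamping each record's xl_curr):

    /*
     * With no xl_prev to chain, reserving WAL insert space is a single
     * atomic fetch-add on the insert position instead of a
     * spinlock-protected read-modify-write.
     */
    static uint64
    reserve_insert_bytes(pg_atomic_uint64 *insertpos, uint32 size)
    {
        /* returns the start of our range; our range's end is start + size */
        return pg_atomic_fetch_add_u64(insertpos, size);
    }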
Re: Changing WAL Header to reduce contention during ReserveXLogInsertLocation()
On 2 February 2018 at 02:17, Michael Paquier wrote: > On Fri, Feb 02, 2018 at 12:21:49AM +0000, Simon Riggs wrote: >> Yes, it would be about 99% of the time. > > When it comes to recovery, I don't think that 99% is a guarantee > sufficient. (Wondering about the maths behind such a number as well.) > >> But you have it backwards - we are not assuming that case. That is the >> only case that has risk - the one where an old WAL record starts at >> exactly the place the latest one stops. Otherwise the rest of the WAL >> record will certainly fail the CRC check, since it will effectively >> have random data in it, as you say. > > Your patch assumes that a single WAL segment recycling is fine to > escape based on the file name, but you need to think beyond that. It > seems to me that your assumption is wrong if the tail of a segment gets > reused after more cycles than a single one, which could happen when > doing recovery from an archive, where segments used could have junk in > them. So you actually *increase* the odds of problems if a segment is > forcibly switched and archived, then reused in recovery after being > fetched from an archive. This seems to be a pivotal point in your argument, yet it is just an assertion. Please explain for the archive why you think the odds increase in the way you describe. Thanks -- Simon Riggshttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Surjective functional indexes
Please excuse my absence from this thread.

On 2 March 2018 at 12:34, Konstantin Knizhnik wrote:
> On 01.03.2018 22:48, Andres Freund wrote:
>>
>> Hi,
>>
>> I still don't think, as commented upon by Tom and me upthread, that we
>> want this feature in the current form.

The performance benefit of this patch is compelling. I don't see a
comment from Tom along those lines, just a question as to whether the
code is in the right place.

>> Your arguments about layering violations seem to be counteracted by the
>> fact that ProjectionIsNotChanged() basically appears to do purely
>> executor work inside heapam.c.
>
> ProjectionIsNotChanged doesn't perform evaluation itself, it calls
> FormIndexDatum. FormIndexDatum is also called in 13 other places in
> Postgres: analyze.c, constraint.c, index.c, tuplesort.c, execIndexing.c
>
> I still think that trying to store the result of index expression
> evaluation somewhere, to be able to reuse it in the index update, is
> not such a good idea: it complicates the code and adds extra
> dependencies between different modules. And the benefits of such an
> optimization are not obvious: if index expression evaluation is so
> expensive, then recheck_on_update should be prohibited for such an
> index altogether.

Agreed. What we are doing is trying to avoid losing HOT updates. We make
the check for HOT update dynamically when it could be made statically in
some cases. Simplicity says just test it dynamically.

We could also do work to check for HOT update for functional indexes
earlier, but for the same reason I don't think it's worth adding.

The patch itself is about as simple as it can be, so the code is in the
right place. My problems with it have been around the actual user
interface to request it. Index option handling has changed (and this
needs a rebase!), but other than that I think we want this and I am
planning to commit something early next week.

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
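For readers following along, the dynamic recheck being discussed works
roughly like this. The FormIndexDatum() and datumIsEqual() calls are
real, but the surrounding variables (indexInfo, oldSlot, newSlot,
estate, att) are assumed context, and only the first index column is
checked for brevity:

    Datum       old_values[INDEX_MAX_KEYS];
    bool        old_isnull[INDEX_MAX_KEYS];
    Datum       new_values[INDEX_MAX_KEYS];
    bool        new_isnull[INDEX_MAX_KEYS];
    bool        changed = false;

    /* Evaluate the index expression for the old and new tuple versions. */
    FormIndexDatum(indexInfo, oldSlot, estate, old_values, old_isnull);
    FormIndexDatum(indexInfo, newSlot, estate, new_values, new_isnull);

    if (old_isnull[0] != new_isnull[0])
        changed = true;
    else if (!old_isnull[0] &&
             !datumIsEqual(old_values[0], new_values[0],
                           att->attbyval, att->attlen))
        changed = true;

    /* If no projection index's value changed, the update can stay HOT. */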