Re: Usage of epoch in txid_current
On Sun, Jun 30, 2019 at 9:07 AM Robert Haas wrote: > On Mon, Jun 24, 2019 at 1:43 PM Alvaro Herrera > wrote: > > I think enlarging multixacts to 64 bits is a terrible idea. I would > > rather get rid of multixacts completely; zheap is proposing not to use > > multixacts at all, for example. > > But zedstore, at least as of the Saturday after PGCon, is proposing to > keep using them, after first widening them to 64 bits: > > https://www.postgresql.org/message-id/CA+TgmoYeTeQSmALox0PmSm5Gh03oe=unjhml+k+btofy_u2...@mail.gmail.com > > I think we all have a visceral reaction against MultiXacts at this > point; they've just caused us too much pain, and the SLRU > infrastructure is a bunch of crap.[1] However, the case where N > sessions all do "SELECT * FROM sometab FOR SHARE" is pretty wretched > without multixacts. You'll just have to keep reducing the tuple > density per page to fit all the locks, whereas the current heap is > totally fine and neither the heap nor the multixact space bloats all > that terribly much. > > I currently think it's likely that zheap will use something > multixact-like rather than actually using multixacts per se. I am > having trouble seeing how we can have some sort of fixed-bit-width ID > number that represents as set of XIDs for purposes of lock-tracking > without creating some nasty degenerate cases that don't exist > today.[2] The new thing described over here is intended to support something a bit like multixacts: https://www.postgresql.org/message-id/CA%2BhUKGKni7EEU4FT71vZCCwPeaGb2PQOeKOFjQJavKnD577UMQ%40mail.gmail.com For example, here is some throw-away demo code that puts arbitrary data into an undo log, in this case a list of xids given with SELECT test_multixact(ARRAY[1234, 2345, 3456]), and provides a callback to say when the data can be discarded, in this case when all of those xids are no longer running. You can see the record with SELECT * FROM undoinspect('shared'), until it gets eaten by a background worker. The reason it has to be an in-core demo is just because we don't have a way to do extensions that have an RMGR ID and callback functions yet. Although this demo throws the return value away, the function PrepareUndoInsert() returns a 64 bit UndoRecPtr which could be stored in any page and can be used to retrieve the records (via the buffer pool) including the binary payload which can be whatever struct you like (though there is a size limit so you might need more than one record for a huge list of multi-lockers). When you try to load the record, you might be told that it's gone, which means that the lockers are gone, whcih means that your callback must have said that was OK. This probably makes most sense for a system that is already planning to use UndoRecPtr for other things, like MVCC, since it probably already has per page or per tuple space to store UndoRecPtr, and updates and explicit locks are so closely related. https://github.com/EnterpriseDB/zheap/commit/c92fdfd1f1178cbb44557a7c630b366d1539c332 -- Thomas Munro https://enterprisedb.com
Re: Usage of epoch in txid_current
On Sat, Jun 22, 2019 at 11:01 AM Alexander Korotkov wrote: > 3. Change SLRU on-disk format to handle 64-bit xids and multixacts. > In particular make SLRU page numbers 64-bit too. Add SLRU upgrade > procedure to pg_upgrade. +1 for doing this for the xid-indexed SLRUs so the segment file names are never recycled. Having yet another level of wraparound code to feed and water and debug is not nice: https://www.postgresql.org/message-id/flat/20190202083822.GC32531%40gust.leadboat.com -- Thomas Munro https://enterprisedb.com
Re: Usage of epoch in txid_current
On Mon, Jun 24, 2019 at 1:43 PM Alvaro Herrera wrote: > I think enlarging multixacts to 64 bits is a terrible idea. I would > rather get rid of multixacts completely; zheap is proposing not to use > multixacts at all, for example. But zedstore, at least as of the Saturday after PGCon, is proposing to keep using them, after first widening them to 64 bits: https://www.postgresql.org/message-id/CA+TgmoYeTeQSmALox0PmSm5Gh03oe=unjhml+k+btofy_u2...@mail.gmail.com I think we all have a visceral reaction against MultiXacts at this point; they've just caused us too much pain, and the SLRU infrastructure is a bunch of crap.[1] However, the case where N sessions all do "SELECT * FROM sometab FOR SHARE" is pretty wretched without multixacts. You'll just have to keep reducing the tuple density per page to fit all the locks, whereas the current heap is totally fine and neither the heap nor the multixact space bloats all that terribly much. I currently think it's likely that zheap will use something multixact-like rather than actually using multixacts per se. I am having trouble seeing how we can have some sort of fixed-bit-width ID number that represents as set of XIDs for purposes of lock-tracking without creating some nasty degenerate cases that don't exist today.[2] -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company [1] At least in comparison to other parts of the PostgreSQL infrastructure which are more awesome. [2] I'd like to credit Andres Freund for helping me understand this issue better.
Re: Usage of epoch in txid_current
On Mon, Jun 24, 2019 at 8:43 PM Alvaro Herrera wrote: > > On 2019-Jun-22, Alexander Korotkov wrote: > > > 2. Also introduce FullMultixactId, and apply to MultixactId the > > similar change as #1. > > 3. Change SLRU on-disk format to handle 64-bit xids and multixacts. > > In particular make SLRU page numbers 64-bit too. Add SLRU upgrade > > procedure to pg_upgrade. > > I think enlarging multixacts to 64 bits is a terrible idea. I would > rather get rid of multixacts completely; zheap is proposing not to use > multixacts at all, for example. The amount of bloat caused by > pg_multixact data is already pretty bad ... because of which requiring > pg_multixact to be rewritten by pg_upgrade would cause a severe slowdown > for upgrades. (It worked for FSM because the volume is tiny, but that's > not the case for multixact.) > > I think the pg_upgrade problem can be worked around by creating a new > dir pg_multixact64 (an example) which is populated from the upgrade > point onwards; so you keep the old data unchanged, and new multixacts > use the new location, but the system knows to read the old one when > reading old tuples. But, as I said above, I would rather not have > multixacts at all. > > Another idea: create a new table AM that mimics heapam (I think ß-heap > "eszett-heap" is a good name), except that it reuses zheap's idea of > keeping "transaction locks" separately for tuple locks rather than > multixacts; heapam continues to use 32bits multixact. Tables can be > migrated from heapam to ß-heap (alter table ... set access method) to > incrementally reduce reliance on multixacts going forward. No need for > pg_upgrade-time disruption. We need multixacts to store row-level locks information. I remember they weren't crash-safe some time ago; because we basically don't need lock information about previous server run: all that locks are for sure released. Due to some difficulties we finally made them crash-safe (I didn't follow that in much details). But what about discarding mulixacts on pg_upgrade? Is it feasible goal? -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: Usage of epoch in txid_current
Hi! On Mon, Jun 24, 2019 at 6:27 PM Andres Freund wrote: > On June 24, 2019 8:19:13 AM PDT, Robert Haas wrote: > >On Fri, Jun 21, 2019 at 7:01 PM Alexander Korotkov > > wrote: > >> On Thu, Mar 28, 2019 at 8:30 AM Thomas Munro > >wrote: > >> > Thanks for the reviews! Pushed. > >> > >> Any ideas we should move towards 64-bit xids in more places? That > >has > >> been discussed several times already. I think last time it was > >> discussed in person during FOSDEM PGDay 2018 Developer Meeting [1]. > >> There we've discussed that it probably doesn't worth it to change > >> 32-bit on-disk xids in heap. It's better to leave existing heap "as > >> is", but allow other pluggable table access methods to support 64-bit > >> xids. Given now we have pluggable table access methods, we may build > >> a plan on full support of 64-bit xids in core. > >> > >> In my vision sketchy plan may look like this. > >> > >> 1. Change all non-heap types of xids from TransactionId to > >> FullTransactionId. > > > >I think it's fine to replace TransactionId with FullTransactionId > >without stressing about it too much in code that's not that heavily > >trafficked. However, I'm not sure we can do that across the board. For > >example, converting snapshots to use 64-bit XIDs would mean that in > >the worst case a snapshot will use twice as many cache lines, and that > >might have performance implications on some workloads. > > We could probably expand the transaction IDs on access or when computing them > for most of these, as usually they'll largely be about currently running > transactions. It e.g. seems sensible to keep the procarray at 32 bit xids, > expand xmin/xmax to 64 when computing snapshots, and leave the snapshot's > transaction ID array at 32xids. That ought to be an negligible overhead. I see, replace TransactionId with FullTransactionId just everywhere doesn't look like good idea. Given now we have pluggable table AMs, new table AMs may not have data wraparound problem. For instance, table AM could store xids 64-bit internally, and convert them to 32-bit on-the-fly. If xid is too old for conversion, just replace it with FrozenTransactionId. So, the restriction we really have now is that running xacts and active snapshots should fit 2^31 range. Turning Snapshot xmin/xmas into 64-bit will soften this restriction, then just running xacts should fit 2^31 range while snapshots could be older. -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: Usage of epoch in txid_current
On 2019-Jun-22, Alexander Korotkov wrote: > 2. Also introduce FullMultixactId, and apply to MultixactId the > similar change as #1. > 3. Change SLRU on-disk format to handle 64-bit xids and multixacts. > In particular make SLRU page numbers 64-bit too. Add SLRU upgrade > procedure to pg_upgrade. I think enlarging multixacts to 64 bits is a terrible idea. I would rather get rid of multixacts completely; zheap is proposing not to use multixacts at all, for example. The amount of bloat caused by pg_multixact data is already pretty bad ... because of which requiring pg_multixact to be rewritten by pg_upgrade would cause a severe slowdown for upgrades. (It worked for FSM because the volume is tiny, but that's not the case for multixact.) I think the pg_upgrade problem can be worked around by creating a new dir pg_multixact64 (an example) which is populated from the upgrade point onwards; so you keep the old data unchanged, and new multixacts use the new location, but the system knows to read the old one when reading old tuples. But, as I said above, I would rather not have multixacts at all. Another idea: create a new table AM that mimics heapam (I think ß-heap "eszett-heap" is a good name), except that it reuses zheap's idea of keeping "transaction locks" separately for tuple locks rather than multixacts; heapam continues to use 32bits multixact. Tables can be migrated from heapam to ß-heap (alter table ... set access method) to incrementally reduce reliance on multixacts going forward. No need for pg_upgrade-time disruption. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Usage of epoch in txid_current
Hi, On June 24, 2019 8:19:13 AM PDT, Robert Haas wrote: >On Fri, Jun 21, 2019 at 7:01 PM Alexander Korotkov > wrote: >> On Thu, Mar 28, 2019 at 8:30 AM Thomas Munro >wrote: >> > Thanks for the reviews! Pushed. >> >> Any ideas we should move towards 64-bit xids in more places? That >has >> been discussed several times already. I think last time it was >> discussed in person during FOSDEM PGDay 2018 Developer Meeting [1]. >> There we've discussed that it probably doesn't worth it to change >> 32-bit on-disk xids in heap. It's better to leave existing heap "as >> is", but allow other pluggable table access methods to support 64-bit >> xids. Given now we have pluggable table access methods, we may build >> a plan on full support of 64-bit xids in core. >> >> In my vision sketchy plan may look like this. >> >> 1. Change all non-heap types of xids from TransactionId to >> FullTransactionId. > >I think it's fine to replace TransactionId with FullTransactionId >without stressing about it too much in code that's not that heavily >trafficked. However, I'm not sure we can do that across the board. For >example, converting snapshots to use 64-bit XIDs would mean that in >the worst case a snapshot will use twice as many cache lines, and that >might have performance implications on some workloads. We could probably expand the transaction IDs on access or when computing them for most of these, as usually they'll largely be about currently running transactions. It e.g. seems sensible to keep the procarray at 32 bit xids, expand xmin/xmax to 64 when computing snapshots, and leave the snapshot's transaction ID array at 32xids. That ought to be an negligible overhead. Andres -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
Re: Usage of epoch in txid_current
On Fri, Jun 21, 2019 at 7:01 PM Alexander Korotkov wrote: > On Thu, Mar 28, 2019 at 8:30 AM Thomas Munro wrote: > > Thanks for the reviews! Pushed. > > Any ideas we should move towards 64-bit xids in more places? That has > been discussed several times already. I think last time it was > discussed in person during FOSDEM PGDay 2018 Developer Meeting [1]. > There we've discussed that it probably doesn't worth it to change > 32-bit on-disk xids in heap. It's better to leave existing heap "as > is", but allow other pluggable table access methods to support 64-bit > xids. Given now we have pluggable table access methods, we may build > a plan on full support of 64-bit xids in core. > > In my vision sketchy plan may look like this. > > 1. Change all non-heap types of xids from TransactionId to > FullTransactionId. I think it's fine to replace TransactionId with FullTransactionId without stressing about it too much in code that's not that heavily trafficked. However, I'm not sure we can do that across the board. For example, converting snapshots to use 64-bit XIDs would mean that in the worst case a snapshot will use twice as many cache lines, and that might have performance implications on some workloads. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Usage of epoch in txid_current
On Thu, Mar 28, 2019 at 8:30 AM Thomas Munro wrote: > Thanks for the reviews! Pushed. Any ideas we should move towards 64-bit xids in more places? That has been discussed several times already. I think last time it was discussed in person during FOSDEM PGDay 2018 Developer Meeting [1]. There we've discussed that it probably doesn't worth it to change 32-bit on-disk xids in heap. It's better to leave existing heap "as is", but allow other pluggable table access methods to support 64-bit xids. Given now we have pluggable table access methods, we may build a plan on full support of 64-bit xids in core. In my vision sketchy plan may look like this. 1. Change all non-heap types of xids from TransactionId to FullTransactionId. But in shared memory change TransactionId to pg_atomic_uint64 in order to guarantee atomicity of access, which we require in multiple places. 2. Also introduce FullMultixactId, and apply to MultixactId the similar change as #1. 3. Change SLRU on-disk format to handle 64-bit xids and multixacts. In particular make SLRU page numbers 64-bit too. Add SLRU upgrade procedure to pg_upgrade. 4. Change relfrozenxid and relminmxid to something like rellimitxid and rellimitmxid. So, we don't imply there is restriction of 32-bit xids. Instead, we let table AM place (or don't place) a limit to advancing nextXid, nextMultixact. 5. Table AM API would be switched to 64-bit xids/multixacts, while heap will remain using 32-bit. So, heap should convert 32-bit xid value to 64-bit basing on nextXid/nextMultixact. Assuming we set rellimitxid and rellimitmxid for relation as oldestxid + 2^32 and oldestmxid + 2^32, we may safely assume the 32-bit values of xids/multixacts are in 2^32 range before nextXid/nextMultixact. Thanks to this even in heap we would be able to operate 2^32 xids/multixacts simultaneously instead of 2^31 we have now. So, this is how I see the possible plan. I would be glad to see a feedback. Unfortunately, I don't have enough time to implement all of this. But I think there are many interested parties in finally having 64-bit xids in core. Especially assuming we now have pluggable table AMs, and it would be ridiculous to spear limitation of 32-bit xids to new table AMs. Links 1. https://wiki.postgresql.org/wiki/FOSDEM/PGDay_2018_Developer_Meeting -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: Usage of epoch in txid_current
Round pegs fit into square holes when the peg is smaller.,.. Hey why did the chicken follow Simon off the ledge? Just because he told him to?!! Was he really trying to get home, did he find out the hard way things no one else knows but what he had learnt on the other side, just to let you know you don't ever really die. Plan to stay. I know enough to now The grass is not greener. There are not ten virgins for each man. The women are equals and love to suck Dick just as much as ya mother. Don't laugh just yet,,.. at this 13th generation level we are all pretty much related. And i am the original original, along with my brothers and your God, a princess to a King, stolen long long ago, Now doing a shit tonne better thank you, for asking. Thanks for the flowers, I never got. Thanks for the love though... I did feel that. I'm one month sober. It was only when I died again did I get all the secrets that only the dead get and forget before living again just to tell you.,, im alive and i have insider knowledge, you'd pay zillions for! I jumped into that portal and landed right back where you put me. Now not so glitchy,,, though money alludes me, I see all the numbers in a matrix like Neo, only on my computer screen,,, this world is real, and so am I. Flowers smell beautiful, and look precious crystals are purposeful and magic... the sky is an ever changing scenescape that words can only ever scrape the surface of in trying to describe it's magnificence, whatever the weather. I'm solo. In need of some close up cuddling with Jonathan aka Andrew or whatever you chose, you know who you are. Hurry up, cos I miss you like crazy in this secured medicated snow globe I'm safe in right now. All I need is you. Love your guts, Em XO Aka: Surety On Wed, 27 Mar. 2019, 22:45 Thomas Munro, wrote: > On Tue, Mar 26, 2019 at 12:58 PM Thomas Munro > wrote: > > On Tue, Mar 26, 2019 at 3:23 AM Heikki Linnakangas > wrote: > > > Looks good. > > I did some testing and proof-reading and made a few minor changes: > > * I tidied up the code that serialises transaction state. It was > already hammering round pegs into square holes, and the previous patch > made that even worse, so I added a new struct > SerializedTransactionState to do this properly. > > * I open-coded Get{Current,Top}TransactionId[IfAny](), rather than > having them call the "Full" variants, so that nobody could accuse me > of adding an extra function call that might not be inlined. It's just > a couple of lines anyway. > > * I kept the name GetNewTransactionId(), since it's referred to in > many places in comments etc. Previously I had called it > GetNewFullTransactionId() and had GetNewTransactionId() just call that > and truncate to 32 bits, but there wasn't much point without an > in-tree caller for the narrow version. If there is any out-of-tree > code calling this, it will now fail to compile thanks to our > non-convertible return type. > > These are the patches I'm planning to push tomorrow. > > I still need to look into Andres's suggestion about getting rid of > epoch from various user interfaces and showing 64 bit numbers. I > should probably also find a place in the relevant README to explain > this new scheme. I will post follow-up patches for those. > > -- > Thomas Munro > https://enterprisedb.com >
Re: Usage of epoch in txid_current
On Sat, May 4, 2019 at 1:34 PM Jeff Janes wrote: > On Thu, Mar 28, 2019 at 1:30 AM Thomas Munro > wrote: > >> On Thu, Mar 28, 2019 at 1:48 AM Heikki Linnakangas >> wrote: >> > Once we have the FullTransactionId type and basic macros in place, I'm >> > sure we could tidy up a bunch of code by using them. > > > Thanks for the reviews! Pushed. >> > > I think that this might be broken. > > We have this change: > > @@ -73,7 +75,8 @@ GetNewTransactionId(bool isSubXact) > > LWLockAcquire(XidGenLock, LW_EXCLUSIVE); > > - xid = XidFromFullTransactionId(ShmemVariableCache->nextFullXid); > + full_xid = ShmemVariableCache->nextFullXid; > + xid = XidFromFullTransactionId(full_xid); > > But then later on in an little-used code path around line 164: > > /* Re-acquire lock and start over */ > LWLockAcquire(XidGenLock, LW_EXCLUSIVE); > xid = XidFromFullTransactionId(ShmemVariableCache->nextFullXid); > > full_xid does not get updated, but then later on full_xid gets returned in > lieu of xid. > > Is there a reason that this is OK? > > Ah, sorry for the noise. I see that this has been fixed already. I wasn't looking at HEAD, or at the other email thread, when I "discovered" this. Sorry for the noise. Cheers, Jeff
Re: Usage of epoch in txid_current
On Thu, Mar 28, 2019 at 1:30 AM Thomas Munro wrote: > On Thu, Mar 28, 2019 at 1:48 AM Heikki Linnakangas > wrote: > > Once we have the FullTransactionId type and basic macros in place, I'm > > sure we could tidy up a bunch of code by using them. Thanks for the reviews! Pushed. > I think that this might be broken. We have this change: @@ -73,7 +75,8 @@ GetNewTransactionId(bool isSubXact) LWLockAcquire(XidGenLock, LW_EXCLUSIVE); - xid = XidFromFullTransactionId(ShmemVariableCache->nextFullXid); + full_xid = ShmemVariableCache->nextFullXid; + xid = XidFromFullTransactionId(full_xid); But then later on in an little-used code path around line 164: /* Re-acquire lock and start over */ LWLockAcquire(XidGenLock, LW_EXCLUSIVE); xid = XidFromFullTransactionId(ShmemVariableCache->nextFullXid); full_xid does not get updated, but then later on full_xid gets returned in lieu of xid. Is there a reason that this is OK? Cheers, Jeff
Re: Usage of epoch in txid_current
On Thu, Mar 28, 2019 at 1:48 AM Heikki Linnakangas wrote: > Once we have the FullTransactionId type and basic macros in place, I'm > sure we could tidy up a bunch of code by using them. For example, > TransactionIdInRecentPast() in walsender.c would be simpler, if the > caller dealt with FullTransactionIds rather than xid+epoch. But it makes > sense to do that separately. +1 > > + /* > > + * It is safe to read nextFullXid without a lock, because this is only > > + * called from the startup process, meaning that no other process can > > + * modify it. > > + */ > > + Assert(AmStartupProcess()); > > + > > This assertion fails on WAL replay in single-user mode: Fixed. (Embarrassingly I had that working in v7 but broke it in v8). I decided to do some testing on a 32 bit system, and ran into weird new problem in heap_compute_xid_horizon_for_tuples() which I assumed to be somehow my fault due to the mention of xid horizons, but I eventually realised that master was broken on that machine and followed that up elsewhere. Phew. Thanks for the reviews! Pushed. -- Thomas Munro https://enterprisedb.com
Re: Usage of epoch in txid_current
On 27/03/2019 13:44, Thomas Munro wrote: * I tidied up the code that serialises transaction state. It was already hammering round pegs into square holes, and the previous patch made that even worse, so I added a new struct SerializedTransactionState to do this properly. Ah, good, it used to be confusing. I still need to look into Andres's suggestion about getting rid of epoch from various user interfaces and showing 64 bit numbers. I should probably also find a place in the relevant README to explain this new scheme. I will post follow-up patches for those. Once we have the FullTransactionId type and basic macros in place, I'm sure we could tidy up a bunch of code by using them. For example, TransactionIdInRecentPast() in walsender.c would be simpler, if the caller dealt with FullTransactionIds rather than xid+epoch. But it makes sense to do that separately. +/* + * Advance nextFullXid to the value after a given xid. The epoch is inferred. + * This must only be called during recovery or from two-phase start-up code. + */ +void +AdvanceNextFullTransactionIdPastXid(TransactionId xid) +{ + FullTransactionId newNextFullXid; + TransactionId next_xid; + uint32 epoch; + + /* +* It is safe to read nextFullXid without a lock, because this is only +* called from the startup process, meaning that no other process can +* modify it. +*/ + Assert(AmStartupProcess()); + This assertion fails on WAL replay in single-user mode: $ bin/postgres --single -D data postgres 2019-03-27 14:32:35.058 EET [32359] LOG: database system was interrupted; last known up at 2019-03-27 14:32:18 EET 2019-03-27 14:32:35.144 EET [32359] LOG: database system was not properly shut down; automatic recovery in progress 2019-03-27 14:32:35.148 EET [32359] LOG: redo starts at 0/15BB7B0 TRAP: FailedAssertion("!((MyAuxProcType == StartupProcess))", File: "varsup.c", Line: 269) Aborted - Heikki
Re: Usage of epoch in txid_current
On Tue, Mar 26, 2019 at 12:58 PM Thomas Munro wrote: > On Tue, Mar 26, 2019 at 3:23 AM Heikki Linnakangas wrote: > > Looks good. I did some testing and proof-reading and made a few minor changes: * I tidied up the code that serialises transaction state. It was already hammering round pegs into square holes, and the previous patch made that even worse, so I added a new struct SerializedTransactionState to do this properly. * I open-coded Get{Current,Top}TransactionId[IfAny](), rather than having them call the "Full" variants, so that nobody could accuse me of adding an extra function call that might not be inlined. It's just a couple of lines anyway. * I kept the name GetNewTransactionId(), since it's referred to in many places in comments etc. Previously I had called it GetNewFullTransactionId() and had GetNewTransactionId() just call that and truncate to 32 bits, but there wasn't much point without an in-tree caller for the narrow version. If there is any out-of-tree code calling this, it will now fail to compile thanks to our non-convertible return type. These are the patches I'm planning to push tomorrow. I still need to look into Andres's suggestion about getting rid of epoch from various user interfaces and showing 64 bit numbers. I should probably also find a place in the relevant README to explain this new scheme. I will post follow-up patches for those. -- Thomas Munro https://enterprisedb.com 0001-Add-basic-infrastructure-for-64-bit-transaction-I-v8.patch Description: Binary data 0002-Use-FullTransactionId-for-the-transaction-stack-v8.patch Description: Binary data
Re: Usage of epoch in txid_current
On Tue, Mar 26, 2019 at 12:58 PM Thomas Munro wrote: > ... I think you could probably reclaim that space by > using a more compact representation of vacuumFlags, overflowed, > delayChkpt, nxids (it's funny, the comment says "as tightly as > possible", which clearly isn't the case). Woops, I take that back. I was thinking of sizeof(bool) == 4, but it's usually 1, so you could probably only squeeze a couple of bytes out by moving those flags. allPgXact elements are currently 12 bytes apart on this system. It's possible that expanding it to 16 bytes would be OK, not sure, and I see there was a whole thread investigating that a couple of years back: https://www.postgresql.org/message-id/flat/CAPpHfdtJY4zOEDsjad6J5AyZMqZcv6gSY9AkKpA7qN3jyQ2%2B1Q%40mail.gmail.com -- Thomas Munro https://enterprisedb.com
Re: Usage of epoch in txid_current
On Tue, Mar 26, 2019 at 3:23 AM Heikki Linnakangas wrote: > Looks good. > > I started to write a patch to use XID & epoch in dealing with GiST page > deletions [1], and I really could've used an epoch to go with > RecentGlobalXmin. I presume that would be pretty straightforward to have > with this, too. Yeah. A simple way to compute that would be to use FullTransactionId in PGXACT, so that GetSnapshotData() could find the minimum value and put that into GlobalRecentFullXmin, and set GlobalRecentXmin with the truncated version (or perhaps #define it as (XidFromFullTransactionId(GlobalRecentFullXmin))). Increasing the number of cachelines that GetSnapshotData() touches may not be popular, though. I think you could probably reclaim that space by using a more compact representation of vacuumFlags, overflowed, delayChkpt, nxids (it's funny, the comment says "as tightly as possible", which clearly isn't the case). You'd probably also need atomic 64 bit reads for the FullTransactionIds, which would be ugly on ancient systems but otherwise no problem. Instead of all that you might want to just infer the epoch instead. I'm not sure of the exact logic required for that off-hand. You'd probably want nextFullXid as a reference to compute the epoch, but you'd either need to acquire a lock to read it while you already hold ProcArrayLock (!), or read it atomically after releasing ProcArrayLock ... but then a Deathstation 9000 might allocate 8 billion more xids with all the necessary auto-vacuuming to allow that before scheduling you back onto the CPU. Admittedly, I haven't though about this very deeply at all, there may be a simple and correct way to do it. -- Thomas Munro https://enterprisedb.com
Re: Usage of epoch in txid_current
On 25/03/2019 12:49, Thomas Munro wrote: On Mon, Mar 25, 2019 at 5:01 PM Thomas Munro wrote: New version attached. I'd like to commit this for PG12. Here is a follow-up sketch patch that shows FullTransactionId being used in the transaction stack, so you can call eg GetCurrentFullTransactionId(). A table access method could use this to avoid the need to freeze stuff later (eg zheap). I suppose it's not strictly necessary, since you could use GetCurrentTransactionId() and infer the epoch by comparing with ReadNextFullTransactionId() (now that the epoch counting is reliable, due to patch 0001 which I'm repeating again here just for cfbot). But I suppose we want to get away from that sort of thing. Thoughts? Looks good. I started to write a patch to use XID & epoch in dealing with GiST page deletions [1], and I really could've used an epoch to go with RecentGlobalXmin. I presume that would be pretty straightforward to have with this, too. [1] https://www.postgresql.org/message-id/5f7ed675-d1fc-66ef-f316-645080ff9...@iki.fi - Heikki
Re: Usage of epoch in txid_current
On Mon, Mar 25, 2019 at 5:01 PM Thomas Munro wrote: > New version attached. I'd like to commit this for PG12. Here is a follow-up sketch patch that shows FullTransactionId being used in the transaction stack, so you can call eg GetCurrentFullTransactionId(). A table access method could use this to avoid the need to freeze stuff later (eg zheap). I suppose it's not strictly necessary, since you could use GetCurrentTransactionId() and infer the epoch by comparing with ReadNextFullTransactionId() (now that the epoch counting is reliable, due to patch 0001 which I'm repeating again here just for cfbot). But I suppose we want to get away from that sort of thing. Thoughts? -- Thomas Munro https://enterprisedb.com From 1f513f48a603c663753ddb5fa1d8e62abf500db9 Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Mon, 25 Mar 2019 12:29:23 +1300 Subject: [PATCH 1/2] Track the next transaction ID using 64 bits. Instead of inferring epoch advancement from xids and checkpoints, introduce a 64 bit FullTransactionId type and use it to track xid generation. This fixes an unlikely bug where epoch information is corrupted if you somehow manage to make TransactionId wrap around more than once between checkpoints. This commit creates some very basic infrastructure allowing for later patches to adopt 64 bit transaction IDs in more places, potentially including table access methods that don't need to freeze tables. Author: Thomas Munro Reported-by: Amit Kapila Reviewed-by: Andres Freund, Tom Lane Discussion: https://postgr.es/m/CAA4eK1%2BMv%2Bmb0HFfWM9Srtc6MVe160WFurXV68iAFMcagRZ0dQ%40mail.gmail.com --- src/backend/access/rmgrdesc/xlogdesc.c | 4 +- src/backend/access/transam/clog.c | 8 +- src/backend/access/transam/commit_ts.c | 4 +- src/backend/access/transam/multixact.c | 20 + src/backend/access/transam/subtrans.c | 8 +- src/backend/access/transam/twophase.c | 40 +++-- src/backend/access/transam/varsup.c | 76 src/backend/access/transam/xact.c | 35 +--- src/backend/access/transam/xlog.c | 113 ++-- src/backend/replication/walreceiver.c | 5 +- src/backend/replication/walsender.c | 5 +- src/backend/storage/ipc/procarray.c | 34 ++- src/backend/storage/ipc/standby.c | 2 +- src/backend/storage/lmgr/predicate.c| 2 +- src/backend/utils/adt/txid.c| 13 ++- src/backend/utils/misc/pg_controldata.c | 5 +- src/bin/pg_controldata/pg_controldata.c | 5 +- src/bin/pg_resetwal/pg_resetwal.c | 20 +++-- src/include/access/transam.h| 52 ++- src/include/access/xlog.h | 1 - src/include/catalog/pg_control.h| 6 +- src/include/storage/standby.h | 2 +- src/include/storage/standbydefs.h | 2 +- src/tools/pgindent/typedefs.list| 1 + 24 files changed, 225 insertions(+), 238 deletions(-) diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c index bfad284be08..33060f30429 100644 --- a/src/backend/access/rmgrdesc/xlogdesc.c +++ b/src/backend/access/rmgrdesc/xlogdesc.c @@ -14,6 +14,7 @@ */ #include "postgres.h" +#include "access/transam.h" #include "access/xlog.h" #include "access/xlog_internal.h" #include "catalog/pg_control.h" @@ -52,7 +53,8 @@ xlog_desc(StringInfo buf, XLogReaderState *record) checkpoint->ThisTimeLineID, checkpoint->PrevTimeLineID, checkpoint->fullPageWrites ? "true" : "false", - checkpoint->nextXidEpoch, checkpoint->nextXid, + EpochFromFullTransactionId(checkpoint->nextFullXid), + XidFromFullTransactionId(checkpoint->nextFullXid), checkpoint->nextOid, checkpoint->nextMulti, checkpoint->nextMultiOffset, diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c index aa089d83fa8..3bd55fbdd33 100644 --- a/src/backend/access/transam/clog.c +++ b/src/backend/access/transam/clog.c @@ -749,12 +749,12 @@ ZeroCLOGPage(int pageno, bool writeXlog) /* * This must be called ONCE during postmaster or standalone-backend startup, - * after StartupXLOG has initialized ShmemVariableCache->nextXid. + * after StartupXLOG has initialized ShmemVariableCache->nextFullXid. */ void StartupCLOG(void) { - TransactionId xid = ShmemVariableCache->nextXid; + TransactionId xid = XidFromFullTransactionId(ShmemVariableCache->nextFullXid); int pageno = TransactionIdToPage(xid); LWLockAcquire(CLogControlLock, LW_EXCLUSIVE); @@ -773,7 +773,7 @@ StartupCLOG(void) void TrimCLOG(void) { - TransactionId xid = ShmemVariableCache->nextXid; + TransactionId xid = XidFromFullTransactionId(ShmemVariableCache->nextFullXid); int pageno = TransactionIdToPage(xid); LWLockAcquire(CLogControlLock, LW_EXCLUSIVE); @@ -792,7 +792,7 @@ TrimCLOG(void) * but makes no WAL entry). Let's just be safe. (We need not worry about * pages beyond the current one, since those will
Re: Usage of epoch in txid_current
On Mon, Feb 4, 2019 at 8:41 PM Andres Freund wrote: > On 2018-09-19 13:58:36 +1200, Thomas Munro wrote: > > > +/* > > + * Advance nextFullXid to the value after a given xid. The epoch is > > inferred. > > + * If lock_free_check is true, then the caller must be sure that it's safe > > to > > + * read nextFullXid without holding XidGenLock (ie during recovery). > > + */ > > +void > > +AdvanceNextFullTransactionIdPastXid(TransactionId xid, bool > > lock_free_check) > > +{ > > + TransactionId current_xid; > > + uint32 epoch; > > + > > + if (lock_free_check && > > + !TransactionIdFollowsOrEquals(xid, > > + > > XidFromFullTransactionId(ShmemVariableCache->nextFullXid))) > > + return; > > + > > + LWLockAcquire(XidGenLock, LW_EXCLUSIVE); > > + current_xid = > > XidFromFullTransactionId(ShmemVariableCache->nextFullXid); > > + if (TransactionIdFollowsOrEquals(xid, current_xid)) > > + { > > + epoch = > > EpochFromFullTransactionId(ShmemVariableCache->nextFullXid); > > + if (xid < current_xid) > > + ++epoch; /* epoch wrapped */ > > + ShmemVariableCache->nextFullXid = > > MakeFullTransactionId(epoch, xid); > > + FullTransactionIdAdvance(&ShmemVariableCache->nextFullXid); > > + } > > + LWLockRelease(XidGenLock); > > } > > Is this really a good idea? Seems like it's going to perpetuate the > problematic epoch logic we had before? We could get rid of this in future, if certain WAL records and 2PC state start carrying full xids. That would be much bigger surgery than I wanted to do in this patch. This is the only place that converts 32 bit -> 64 bit with an inferred epoch component. I have explained why I think that's correct in new comments along with a new assertion. The theory is that the xids encountered in recovery and 2PC startup can never be too far out of phase with the current nextFullXid. In contrast, the original epoch tracking from commit 35af5422 wasn't bounded in the same sort of way. Certainly no other code should be allowed to do this sort of thing without very careful consideration of how the epoch is bounded. The patch deliberately provides no general purpose make-me-a-FullTransactionId-by-guessing-the-epoch facility. While reviewing this, I also realised that the "lock_free_check" function argument was unnecessary. The only place that was setting that to false, namely RecordKnownAssignedTransactionIds(), might as well just use the same behaviour. I've now rewritten AdvanceNextFullTransactionIdPastXid() completely in light of all that; please review. > > printf(_("Latest checkpoint's full_page_writes: %s\n"), > > ControlFile.checkPointCopy.fullPageWrites ? _("on") : > > _("off")); > > printf(_("Latest checkpoint's NextXID: %u:%u\n"), > > -ControlFile.checkPointCopy.nextXidEpoch, > > -ControlFile.checkPointCopy.nextXid); > > + > > EpochFromFullTransactionId(ControlFile.checkPointCopy.nextFullXid), > > + > > XidFromFullTransactionId(ControlFile.checkPointCopy.nextFullXid)); > I still think it's a mistake to not display the full xid here, and > rather perpetuate the epoch stuff. I'm ok with splitting that into a > separate commit, but this ought to be fixed in the same release imo. Ok. > > +/* > > + * A 64 bit value that contains an epoch and a TransactionId. This is > > + * wrapped in a struct to prevent implicit conversion to/from > > TransactionId. > > + * Allowing such conversions seems likely to be error-prone. > > + */ > > +typedef struct FullTransactionId > > +{ > > + uint64 value; > > +} FullTransactionId; > > Probably good to note here that not all values are valid normal xids. Done. > > +static inline FullTransactionId > > +MakeFullTransactionId(uint32 epoch, TransactionId xid) > > +{ > > + FullTransactionId result; > > + > > + result.value = ((uint64) epoch) << 32 | xid; > > + > > + return result; > > +} > > Make sounds a bit like it's allocating... Changed to FullTransactionIdFromEpochAndXid(). > > + dest->value++; > > + if (XidFromFullTransactionId(*dest) < FirstNormalTransactionId) > > + *dest = > > MakeFullTransactionId(EpochFromFullTransactionId(*dest), > Hm, this seems pretty odd coding to me. Why not do something like > > dest->value++; > while (XidFromFullTransactionId(*dest) < FirstNormalTransactionId) > dest->value++; > > That'd a) be a bit more readable, b) possible to do in a lockfree way at > a later point. Done. > > - TransactionId nextXid; /* copy of > > ShmemVariableCache->nextXid */ > > + TransactionId nextXid; /* xid from ShmemVariableCache->nextFullXid */ > > Probably worthwhile to pgindent partially. Done. I also added FullTransactionId to typedefs.list, bumped PG_CONTROL_VERSION and did
Re: Usage of epoch in txid_current
Hi, On 2018-09-19 13:58:36 +1200, Thomas Munro wrote: > +/* > + * Advance nextFullXid to the value after a given xid. The epoch is > inferred. > + * If lock_free_check is true, then the caller must be sure that it's safe to > + * read nextFullXid without holding XidGenLock (ie during recovery). > + */ > +void > +AdvanceNextFullTransactionIdPastXid(TransactionId xid, bool lock_free_check) > +{ > + TransactionId current_xid; > + uint32 epoch; > + > + if (lock_free_check && > + !TransactionIdFollowsOrEquals(xid, > + > XidFromFullTransactionId(ShmemVariableCache->nextFullXid))) > + return; > + > + LWLockAcquire(XidGenLock, LW_EXCLUSIVE); > + current_xid = XidFromFullTransactionId(ShmemVariableCache->nextFullXid); > + if (TransactionIdFollowsOrEquals(xid, current_xid)) > + { > + epoch = > EpochFromFullTransactionId(ShmemVariableCache->nextFullXid); > + if (xid < current_xid) > + ++epoch; /* epoch wrapped */ > + ShmemVariableCache->nextFullXid = MakeFullTransactionId(epoch, > xid); > + FullTransactionIdAdvance(&ShmemVariableCache->nextFullXid); > + } > + LWLockRelease(XidGenLock); > } Is this really a good idea? Seems like it's going to perpetuate the problematic epoch logic we had before? > --- a/src/bin/pg_resetwal/pg_resetwal.c > +++ b/src/bin/pg_resetwal/pg_resetwal.c > @@ -431,11 +431,15 @@ main(int argc, char *argv[]) >* if any, includes these values.) >*/ > if (set_xid_epoch != -1) > - ControlFile.checkPointCopy.nextXidEpoch = set_xid_epoch; > + ControlFile.checkPointCopy.nextFullXid = > + MakeFullTransactionId(set_xid_epoch, > + > XidFromFullTransactionId(ControlFile.checkPointCopy.nextFullXid)); > > if (set_xid != 0) > { > - ControlFile.checkPointCopy.nextXid = set_xid; > + ControlFile.checkPointCopy.nextFullXid = > + > MakeFullTransactionId(EpochFromFullTransactionId(ControlFile.checkPointCopy.nextFullXid), > + set_xid); > > /* >* For the moment, just set oldestXid to a value that will force > @@ -705,8 +709,7 @@ GuessControlValues(void) > ControlFile.checkPointCopy.ThisTimeLineID = 1; > ControlFile.checkPointCopy.PrevTimeLineID = 1; > ControlFile.checkPointCopy.fullPageWrites = false; > - ControlFile.checkPointCopy.nextXidEpoch = 0; > - ControlFile.checkPointCopy.nextXid = FirstNormalTransactionId; > + ControlFile.checkPointCopy.nextFullXid = MakeFullTransactionId(0, > FirstNormalTransactionId); > ControlFile.checkPointCopy.nextOid = FirstBootstrapObjectId; > ControlFile.checkPointCopy.nextMulti = FirstMultiXactId; > ControlFile.checkPointCopy.nextMultiOffset = 0; > @@ -786,8 +789,8 @@ PrintControlValues(bool guessed) > printf(_("Latest checkpoint's full_page_writes: %s\n"), > ControlFile.checkPointCopy.fullPageWrites ? _("on") : > _("off")); > printf(_("Latest checkpoint's NextXID: %u:%u\n"), > -ControlFile.checkPointCopy.nextXidEpoch, > -ControlFile.checkPointCopy.nextXid); > + > EpochFromFullTransactionId(ControlFile.checkPointCopy.nextFullXid), > + > XidFromFullTransactionId(ControlFile.checkPointCopy.nextFullXid)); > printf(_("Latest checkpoint's NextOID: %u\n"), > ControlFile.checkPointCopy.nextOid); > printf(_("Latest checkpoint's NextMultiXactId: %u\n"), > @@ -879,7 +882,7 @@ PrintNewControlValues(void) > if (set_xid != 0) > { > printf(_("NextXID: %u\n"), > -ControlFile.checkPointCopy.nextXid); > + > XidFromFullTransactionId(ControlFile.checkPointCopy.nextFullXid)); > printf(_("OldestXID:%u\n"), > ControlFile.checkPointCopy.oldestXid); > printf(_("OldestXID's DB: %u\n"), > @@ -889,7 +892,7 @@ PrintNewControlValues(void) > if (set_xid_epoch != -1) > { > printf(_("NextXID epoch:%u\n"), > -ControlFile.checkPointCopy.nextXidEpoch); > + > EpochFromFullTransactionId(ControlFile.checkPointCopy.nextFullXid)); > } I still think it's a mistake to not display the full xid here, and rather perpetuate the epoch stuff. I'm ok with splitting that into a separate commit, but this ought to be fixed in the same release imo. > diff --git a/src/include/access/transam.h b/src/include/access/transam.h > index 83ec3f19797..814becf96d7 10064
Re: Usage of epoch in txid_current
On February 4, 2019 6:43:44 AM GMT+01:00, Michael Paquier wrote: >On Sun, Feb 03, 2019 at 10:58:02PM +1100, Thomas Munro wrote: >> If there are no objections, I'm planning to do a round of testing and >> commit this shortly. > >Hm. That looks sane to me at quick glance. I am a bit on the edge >regaring the naming "FullTransactionId", which is actually a 64-bit >value with a 32-bit XID and a 32-bit epoch. Something like >TransactionIdWithEpoch or EpochTransactionId sounds a bit better to >me. My point is that "Full" is too generic for that. I'm not a fan of names with epoch in it - these are the real transaction IDs now. Conflating them with the until-now inferred epochs sounds like a bad idea to me. We IMO should just treat the new type as a 64bit uint, and the 32bit as a truncated version. Like, we could just add 64 as a prefix. Andres -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
Re: Usage of epoch in txid_current
Michael Paquier writes: > Hm. That looks sane to me at quick glance. I am a bit on the edge > regaring the naming "FullTransactionId", which is actually a 64-bit > value with a 32-bit XID and a 32-bit epoch. Something like > TransactionIdWithEpoch or EpochTransactionId sounds a bit better to > me. My point is that "Full" is too generic for that. WideTransactionId, maybe? I agree that "Full" seems like a poor adjective here. regards, tom lane
Re: Usage of epoch in txid_current
On Sun, Feb 03, 2019 at 10:58:02PM +1100, Thomas Munro wrote: > If there are no objections, I'm planning to do a round of testing and > commit this shortly. Hm. That looks sane to me at quick glance. I am a bit on the edge regaring the naming "FullTransactionId", which is actually a 64-bit value with a 32-bit XID and a 32-bit epoch. Something like TransactionIdWithEpoch or EpochTransactionId sounds a bit better to me. My point is that "Full" is too generic for that. Moved to next CF for now. -- Michael signature.asc Description: PGP signature
Re: Usage of epoch in txid_current
On Fri, Nov 30, 2018 at 12:12 AM Dmitry Dolgov <9erthali...@gmail.com> wrote: > > On Tue, Jul 24, 2018 at 7:25 AM Thomas Munro > > wrote: > > Here's a new version. I did some cosmetic clean-up, and I dropped the > > changes to pg_controldata output, replication epoch/xid processing > > code and various similar non-essential changes. For this patch I want > > just the new type + next xid generator + appropriate conversions. > > > > I propose that we get this committed early in the cycle. That'd > > maximise testing time in the tree, fix the bug identified by Amit, and > > leave plenty of time for later patches to use FullTransactionId in > > more places as appropriate. > > Then probably it's the good time to do so. Any opinions or more reviews here? > I'll move it to the next CF for now. If there are no objections, I'm planning to do a round of testing and commit this shortly. -- Thomas Munro http://www.enterprisedb.com
Re: Usage of epoch in txid_current
> On Tue, Jul 24, 2018 at 7:25 AM Thomas Munro > wrote: > > Here's a new version. I did some cosmetic clean-up, and I dropped the > changes to pg_controldata output, replication epoch/xid processing > code and various similar non-essential changes. For this patch I want > just the new type + next xid generator + appropriate conversions. > > I propose that we get this committed early in the cycle. That'd > maximise testing time in the tree, fix the bug identified by Amit, and > leave plenty of time for later patches to use FullTransactionId in > more places as appropriate. Then probably it's the good time to do so. Any opinions or more reviews here? I'll move it to the next CF for now.
Re: Usage of epoch in txid_current
On Tue, Jul 24, 2018 at 5:24 PM Thomas Munro wrote: > Here's a new version. Bitrot, rebased. -- Thomas Munro http://www.enterprisedb.com 0001-Track-the-next-xid-using-64-bits-v6.patch Description: Binary data
Re: Usage of epoch in txid_current
On Tue, Jul 17, 2018 at 1:55 AM, Tom Lane wrote: > Andres Freund writes: >> On 2018-07-15 16:41:35 -0400, Tom Lane wrote: >>> Andres Freund writes: On 2018-07-09 19:56:25 -0400, Tom Lane wrote: > Or, perhaps, use a struct in assert builds and int64 otherwise? > You could hide the ensuing notational differences in macros. > >>> [ bunch of test results ] >>> Offhand it would seem that we can get away with struct wrappers >>> on any platform where performance is really of concern today. > >> Cool, thanks for checking! +1 Here's a new version. I did some cosmetic clean-up, and I dropped the changes to pg_controldata output, replication epoch/xid processing code and various similar non-essential changes. For this patch I want just the new type + next xid generator + appropriate conversions. I propose that we get this committed early in the cycle. That'd maximise testing time in the tree, fix the bug identified by Amit, and leave plenty of time for later patches to use FullTransactionId in more places as appropriate. Does anyone have specific kinds of validation or testing they'd like to see? -- Thomas Munro http://www.enterprisedb.com 0001-Track-the-next-xid-using-64-bits-v5.patch Description: Binary data
Re: Usage of epoch in txid_current
Andres Freund writes: > On 2018-07-15 16:41:35 -0400, Tom Lane wrote: >> Andres Freund writes: >>> On 2018-07-09 19:56:25 -0400, Tom Lane wrote: Or, perhaps, use a struct in assert builds and int64 otherwise? You could hide the ensuing notational differences in macros. >> [ bunch of test results ] >> Offhand it would seem that we can get away with struct wrappers >> on any platform where performance is really of concern today. > Cool, thanks for checking! BTW, independently of any performance questions, these results show that my idea above was untenable anyway. On those platforms where there is a codegen difference, doing it like that would have resulted in an ABI difference between regular and assert builds. That's something to avoid, at least in any API that's visible to extension modules --- we've had project policy for some time that it should be possible to use non-assert extensions with assert-enabled core and vice versa. Conceivably we could have used the struct API only under a special devel flag that few people use except a buildfarm animal or two. But that's just a pain in the rear. regards, tom lane
Re: Usage of epoch in txid_current
Hi, On 2018-07-15 16:41:35 -0400, Tom Lane wrote: > Andres Freund writes: > > On 2018-07-09 19:56:25 -0400, Tom Lane wrote: > >> Or, perhaps, use a struct in assert builds and int64 otherwise? > >> You could hide the ensuing notational differences in macros. > > > That should be doable. But I'd like to check if it's necessary > > first. Optimizing passing an 8 byte struct shouldn't be hard for > > compilers these days - especially when most things dealing with them are > > inline functions. If we find that it's not a problem on contemporary > > compilers, it might be worthwhile to use a bit more type safety in other > > places too. > > [ bunch of test results ] > Offhand it would seem that we can get away with struct wrappers > on any platform where performance is really of concern today. Cool, thanks for checking! Greetings, Andres Freund
Re: Usage of epoch in txid_current
Andres Freund writes: > On 2018-07-09 19:56:25 -0400, Tom Lane wrote: >> Or, perhaps, use a struct in assert builds and int64 otherwise? >> You could hide the ensuing notational differences in macros. > That should be doable. But I'd like to check if it's necessary > first. Optimizing passing an 8 byte struct shouldn't be hard for > compilers these days - especially when most things dealing with them are > inline functions. If we find that it's not a problem on contemporary > compilers, it might be worthwhile to use a bit more type safety in other > places too. I checked your example program on hardware I have laying around: x86_64, gcc 4.4.7 (RHEL6): identical code, confirms your result x86_64, LLVM 9.1.0 (macOS High Sierra): also identical x86, gcc 4.2.1 (old macOS --- dromedary's host): also identical code; this surprised me a bit. It looks like the ABI convention is that 64-bit values must be passed on the stack but can be returned in a register pair, and it doesn't matter whether scalar or struct. PPC, gcc 4.0.1 (ancient macOS --- prairiedog's host): *not* identical. It looks like 64-bit arguments are passed in registers either way, but a struct function result is returned in memory not a register. ARM64, gcc 8.1.1 (Fedora 28): identical code ARM64, clang 6.0.0 (FreeBSD 12): identical code ARMv7, gcc 6.3.0 (Raspbian): *not* identical. Looks like both pass and return conventions are memory-based for structs. Offhand it would seem that we can get away with struct wrappers on any platform where performance is really of concern today. regards, tom lane
Re: Usage of epoch in txid_current
On Tue, Jul 10, 2018 at 2:15 PM, Thomas Munro wrote: > On Tue, Jul 10, 2018 at 1:30 PM, Andres Freund wrote:, >>> (errmsg_internal("next transaction ID: %u:%u; next >>> OID: %u", >>> - >>> checkPoint.nextXidEpoch, checkPoint.nextXid, >>> + >>> EpochFromFullTransactionId(checkPoint.nextFullXid), >>> + >>> XidFromFullTransactionId(checkPoint.nextFullXid), >>> >>> checkPoint.nextOid))); >> >> I don't think we should continue to expose epochs, but rather go to full >> xids where appropriate. > > OK, done. Ugh. When I changed pg_resetwal.c to print out the 64 bit value, I broke controldata.c's code that reads those strings back in, as revealed by a failing pg_upgrade test in check-world that I should have waited for. Perhaps we shouldn't change that output format... though I think it does make sense to move forwards here (and probably eventually for the other values too). So here is a version that fixes the parsing problem. Unfortunately pg_upgrade is not allowed to use pg_strtouint64() so for now I jammed the same macrology into controldata.c, but I assume that could be sorted out. -- Thomas Munro http://www.enterprisedb.com 0001-Track-the-next-xid-using-64-bits-v4.patch Description: Binary data
Re: Usage of epoch in txid_current
On 10 July 2018 at 10:40, Andres Freund wrote: > On 2018-07-10 10:32:44 +0800, Craig Ringer wrote: > > On 10 July 2018 at 07:35, Thomas Munro > > wrote: > > > > > > > > I played around with this idea yesterday. Experiment-grade patch > > > attached. Approach: > > > > > > 1. Introduce a new type BigTransactionId (better names welcome). > > > > > > > txid_current() should be changed to BigTransactionId too. It's currently > > bigint. > > That's quite a bit more work though, no? We'd have to introduce a new > SQL level type (including operators). As I said nearby, I think that's > a good idea, but I don't see any sort of benefit of forcing those > patches to be done at once. > Yeah. You're right. One step at a time. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: Usage of epoch in txid_current
On 2018-07-10 10:32:44 +0800, Craig Ringer wrote: > On 10 July 2018 at 07:35, Thomas Munro > wrote: > > > > > I played around with this idea yesterday. Experiment-grade patch > > attached. Approach: > > > > 1. Introduce a new type BigTransactionId (better names welcome). > > > > txid_current() should be changed to BigTransactionId too. It's currently > bigint. That's quite a bit more work though, no? We'd have to introduce a new SQL level type (including operators). As I said nearby, I think that's a good idea, but I don't see any sort of benefit of forcing those patches to be done at once. > Users are currently left in a real mess when it comes to pg_stat_activity > xids, etc, which are epoch-unqualified. txid_current() reports an > epoch-qualified xid, but there's no sensible and safe conversion from > txid_current()'s bigint to an epoch-qualified ID. I am confused what you mean by "to an epoch-qualified ID". You want to split txid_current()'s return value into epoch and xid? Isn't that fairly trivial? SELECT bigxid & x'ff'::int8 AS xid, bigxid >> 32 epoch FROM txid_current() bigxid; Now I'm not saying that's pretty and couldn't be made easier. Greetings, Andres Freund
Re: Usage of epoch in txid_current
On 10 July 2018 at 10:32, Craig Ringer wrote: > > >> I think it's probably a good idea to make it very explicit when moving >> between big and small transaction IDs, hence the including of the word >> 'big' in variable and function names and the use of a function-like >> macro (rather than implicit conversion, which C doesn't give me a good >> way to prevent). > > > The only way I know of to prevent it is to use a wrapper by-value struct. > > ... and that's what I get for not finishing the thread before replying. Anyway, please consider the other point re txid_current() and the user facing xid type. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: Usage of epoch in txid_current
On 10 July 2018 at 07:35, Thomas Munro wrote: > > I played around with this idea yesterday. Experiment-grade patch > attached. Approach: > > 1. Introduce a new type BigTransactionId (better names welcome). > txid_current() should be changed to BigTransactionId too. It's currently bigint. Users are currently left in a real mess when it comes to pg_stat_activity xids, etc, which are epoch-unqualified. txid_current() reports an epoch-qualified xid, but there's no sensible and safe conversion from txid_current()'s bigint to an epoch-qualified ID. age() takes 'xid', everything takes 'xid', but is completely oblivious to epoch. IMO all user-facing xids should be moved to BigTransactionId, with the 'xid' type ceasing to appear in any user facing role. > I think it's probably a good idea to make it very explicit when moving > between big and small transaction IDs, hence the including of the word > 'big' in variable and function names and the use of a function-like > macro (rather than implicit conversion, which C doesn't give me a good > way to prevent). The only way I know of to prevent it is to use a wrapper by-value struct. I've used this technique in the past where it's critical that implicit conversions don't occurr, but it sure isn't pretty. typedef struct BigTransactionId { uint64 value; } BigTransactionId; instead of typedef uint64 BigTransactionId; It's annoying having to get the value member, but it's definitely highly protective against errors. Pass-by-value isn't a problem, neither is return-by-value. BigTransactionId add_one(BigTransactionId xid) { BigTransactionId ret; ret.value = xid.value + 1; return ret; } Personally I think it's potentially worth the required gymnastics if older compilers do a sane job of code generation with this. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: Usage of epoch in txid_current
On Tue, Jul 10, 2018 at 1:30 PM, Andres Freund wrote: > On 2018-07-10 13:20:52 +1200, Thomas Munro wrote: >> I don't know what to think about the encoding or meaning of non-normal >> xids in this thing. > > I'm not following what you mean by this? While defining FullTransactionIdPrecedes() in the image of TransactionIdPrecedes(), I wondered whether the xid component of a FullTransactionId would ever be one of the special values below FirstNormalTransactionId that require special treatment. The question doesn't actually arise in this code, however, so I ignored that thought and simply used x < y. >> diff --git a/src/backend/access/transam/subtrans.c >> b/src/backend/access/transam/subtrans.c >> index 4faa21f5aef..fa0847afc81 100644 >> --- a/src/backend/access/transam/subtrans.c >> +++ b/src/backend/access/transam/subtrans.c >> @@ -261,7 +261,7 @@ StartupSUBTRANS(TransactionId oldestActiveXID) >> LWLockAcquire(SubtransControlLock, LW_EXCLUSIVE); >> >> startPage = TransactionIdToPage(oldestActiveXID); >> - endPage = TransactionIdToPage(ShmemVariableCache->nextXid); >> + endPage = >> TransactionIdToPage(XidFromFullTransactionId(ShmemVariableCache->nextFullXid)); > > These probably need an intermediate variable. Ok, did that in a couple of places. >> diff --git a/src/backend/access/transam/varsup.c >> b/src/backend/access/transam/varsup.c >> index 394843f7e91..13020f54d98 100644 >> --- a/src/backend/access/transam/varsup.c >> +++ b/src/backend/access/transam/varsup.c >> @@ -73,7 +73,7 @@ GetNewTransactionId(bool isSubXact) >> >> LWLockAcquire(XidGenLock, LW_EXCLUSIVE); >> >> - xid = ShmemVariableCache->nextXid; >> + xid = XidFromFullTransactionId(ShmemVariableCache->nextFullXid); > > It's a bit over the top, I know, but I'd move the conversion to outside > the lock. Less because it makes a meaningful efficiency difference, and > more because I find it clearer, and easier to reason about if we ever go > to atomically incrementing nextFullXid. Erm -- I can't read nextFullXid until I have the lock, and then I need it in a 32 bit format for the code that follows immediately (I don't currently have xidVacLimit in FullTransactionId format). I'm not sure what you mean. >> @@ -6700,7 +6698,8 @@ StartupXLOG(void) >>wasShutdown ? "true" >> : "false"))); >> ereport(DEBUG1, >> (errmsg_internal("next transaction ID: %u:%u; next >> OID: %u", >> - >> checkPoint.nextXidEpoch, checkPoint.nextXid, >> + >> EpochFromFullTransactionId(checkPoint.nextFullXid), >> + >> XidFromFullTransactionId(checkPoint.nextFullXid), >>checkPoint.nextOid))); > > I don't think we should continue to expose epochs, but rather go to full > xids where appropriate. OK, done. Hmm, that's going to hurt some eyeballs, because other fields are shown in 32 bit xid format. Should we also go to hex everywhere so that we feeble humans can see the epoch component and compare the xid component by eye? Here's what I see on my test cluster: Latest checkpoint's NextXID: 184683724519 Latest checkpoint's oldestXID:4294901760 In hexadecimal money that'd be (with extra spaces, to line up with a monospace font): Latest checkpoint's NextXID: 002b0001fee7 Latest checkpoint's oldestXID: I left pg_resetwal's arguments -x and -e alone, but I suppose someone might want a new switch that does both together... >> +/* advance a FullTransactionId lvalue, handling wraparound correctly */ >> +#define FullTransactionIdAdvance(dest) \ >> + do { \ >> + (dest).value++; \ >> + if (XidFromFullTransactionId(dest) < FirstNormalTransactionId) >> \ >> + (dest) = >> MakeFullTransactionId(EpochFromFullTransactionId(dest), \ >> + >> FirstNormalTransactionId); \ >> + } while (0) > > Yikes. Why isn't this an inline function? Because it assigns to dest? Following existing malpractice, and yeah because it requires an lvalue so can't be changed without a different convention at the call site. Fixed. -- Thomas Munro http://www.enterprisedb.com 0001-Track-the-next-xid-using-64-bits-v3.patch Description: Binary data
Re: Usage of epoch in txid_current
On 2018-07-10 13:20:52 +1200, Thomas Munro wrote: > defined in transam.h because c.h didn't feel right. Yea, that looks better. > Client code lost the ability to use operator < directly. I needed to > use a static inline function as a constructor. I lost the > interchangeability with the wide xids in txid.c, so I provided > U64FromFullTransactionId() (I think that'll be useful for > serialisation over the write too). Should probably, at a later point, introduce a SQL type for it too. > I don't know what to think about the encoding or meaning of non-normal > xids in this thing. I'm not following what you mean by this? > diff --git a/src/backend/access/transam/subtrans.c > b/src/backend/access/transam/subtrans.c > index 4faa21f5aef..fa0847afc81 100644 > --- a/src/backend/access/transam/subtrans.c > +++ b/src/backend/access/transam/subtrans.c > @@ -261,7 +261,7 @@ StartupSUBTRANS(TransactionId oldestActiveXID) > LWLockAcquire(SubtransControlLock, LW_EXCLUSIVE); > > startPage = TransactionIdToPage(oldestActiveXID); > - endPage = TransactionIdToPage(ShmemVariableCache->nextXid); > + endPage = > TransactionIdToPage(XidFromFullTransactionId(ShmemVariableCache->nextFullXid)); These probably need an intermediate variable. > diff --git a/src/backend/access/transam/varsup.c > b/src/backend/access/transam/varsup.c > index 394843f7e91..13020f54d98 100644 > --- a/src/backend/access/transam/varsup.c > +++ b/src/backend/access/transam/varsup.c > @@ -73,7 +73,7 @@ GetNewTransactionId(bool isSubXact) > > LWLockAcquire(XidGenLock, LW_EXCLUSIVE); > > - xid = ShmemVariableCache->nextXid; > + xid = XidFromFullTransactionId(ShmemVariableCache->nextFullXid); It's a bit over the top, I know, but I'd move the conversion to outside the lock. Less because it makes a meaningful efficiency difference, and more because I find it clearer, and easier to reason about if we ever go to atomically incrementing nextFullXid. > @@ -6700,7 +6698,8 @@ StartupXLOG(void) >wasShutdown ? "true" : > "false"))); > ereport(DEBUG1, > (errmsg_internal("next transaction ID: %u:%u; next OID: > %u", > - > checkPoint.nextXidEpoch, checkPoint.nextXid, > + > EpochFromFullTransactionId(checkPoint.nextFullXid), > + > XidFromFullTransactionId(checkPoint.nextFullXid), >checkPoint.nextOid))); I don't think we should continue to expose epochs, but rather go to full xids where appropriate. > diff --git a/src/bin/pg_controldata/pg_controldata.c > b/src/bin/pg_controldata/pg_controldata.c > index 895a51f89d5..f6731bfd28d 100644 > --- a/src/bin/pg_controldata/pg_controldata.c > +++ b/src/bin/pg_controldata/pg_controldata.c > @@ -20,6 +20,7 @@ > > #include > > +#include "access/transam.h" > #include "access/xlog.h" > #include "access/xlog_internal.h" > #include "catalog/pg_control.h" > @@ -256,8 +257,8 @@ main(int argc, char *argv[]) > printf(_("Latest checkpoint's full_page_writes: %s\n"), > ControlFile->checkPointCopy.fullPageWrites ? _("on") : > _("off")); > printf(_("Latest checkpoint's NextXID: %u:%u\n"), > -ControlFile->checkPointCopy.nextXidEpoch, > -ControlFile->checkPointCopy.nextXid); > + > EpochFromFullTransactionId(ControlFile->checkPointCopy.nextFullXid), > + > XidFromFullTransactionId(ControlFile->checkPointCopy.nextFullXid)); > printf(_("Latest checkpoint's NextOID: %u\n"), > ControlFile->checkPointCopy.nextOid); > printf(_("Latest checkpoint's NextMultiXactId: %u\n"), See above re exposing epoch. > --- a/src/include/access/transam.h > +++ b/src/include/access/transam.h > @@ -44,6 +44,32 @@ > #define TransactionIdStore(xid, dest)(*(dest) = (xid)) > #define StoreInvalidTransactionId(dest) (*(dest) = InvalidTransactionId) > > +#define EpochFromFullTransactionId(x)((uint32) ((x).value >> 32)) > +#define XidFromFullTransactionId(x) ((uint32) (x).value) > +#define U64FromFullTransactionId(x) ((x).value) > +#define FullTransactionIdPrecedes(a, b) ((a).value < (b).value) > +#define FullTransactionIdPrecedesOrEquals(a, b) ((a).value <= (b).value) > + > +/* > + * A 64 bit value that contains an epoch and a TransactionId. This is > + * wrapped in a struct to prevent implicit conversion to/from TransactionId. > + * Allowing such conversions seems likely to be error-prone. > + */ > +typedef struct FullTransactionId > +{ > +uint64 value; > +} FullTransactionId; > + > +static inline FullTransactionId > +MakeFullTransactionId(uint32 epoch, TransactionId xid) > +{ > + FullTransactionId result; > + > + result.va
Re: Usage of epoch in txid_current
On Tue, Jul 10, 2018 at 12:08 PM, Andres Freund wrote: > On 2018-07-09 19:56:25 -0400, Tom Lane wrote: >> Andres Freund writes: >> > On 2018-07-10 11:35:59 +1200, Thomas Munro wrote: >> >> I think it's probably a good idea to make it very explicit when moving >> >> between big and small transaction IDs, hence the including of the word >> >> 'big' in variable and function names and the use of a function-like >> >> macro (rather than implicit conversion, which C doesn't give me a good >> >> way to prevent). Otherwise there is a class of bug that is hidden for >> >> the first 2^32 transactions. >> >> > You could have BigTransactionId (maybe renamed to FullTransactionId?) be >> > a struct type. That'd prevent such issues. Most compilers these days >> > should be more than good enough to optimize passing around an 8byte >> > struct by value... >> >> Or, perhaps, use a struct in assert builds and int64 otherwise? >> You could hide the ensuing notational differences in macros. > > That should be doable. But I'd like to check if it's necessary > first. Optimizing passing an 8 byte struct shouldn't be hard for > compilers these days - especially when most things dealing with them are > inline functions. If we find that it's not a problem on contemporary > compilers, it might be worthwhile to use a bit more type safety in other > places too. > > ... > > IOW, exactly the same code generated. Note that the compiler does *not* > see the callsites in this case, i.e. this is platform ABI conformant. I like it. Here's a version that uses a struct named FullTransactionId (yeah, that's a better name, thanks), defined in transam.h because c.h didn't feel right. Client code lost the ability to use operator < directly. I needed to use a static inline function as a constructor. I lost the interchangeability with the wide xids in txid.c, so I provided U64FromFullTransactionId() (I think that'll be useful for serialisation over the write too). I don't know what to think about the encoding or meaning of non-normal xids in this thing. -- Thomas Munro http://www.enterprisedb.com 0001-Track-the-next-xid-using-64-bits-v2.patch Description: Binary data
Re: Usage of epoch in txid_current
On 2018-07-09 17:08:34 -0700, Andres Freund wrote: > Hi, > > On 2018-07-09 19:56:25 -0400, Tom Lane wrote: > > Andres Freund writes: > > > On 2018-07-10 11:35:59 +1200, Thomas Munro wrote: > > >> I think it's probably a good idea to make it very explicit when moving > > >> between big and small transaction IDs, hence the including of the word > > >> 'big' in variable and function names and the use of a function-like > > >> macro (rather than implicit conversion, which C doesn't give me a good > > >> way to prevent). Otherwise there is a class of bug that is hidden for > > >> the first 2^32 transactions. > > > > > You could have BigTransactionId (maybe renamed to FullTransactionId?) be > > > a struct type. That'd prevent such issues. Most compilers these days > > > should be more than good enough to optimize passing around an 8byte > > > struct by value... > > > > Or, perhaps, use a struct in assert builds and int64 otherwise? > > You could hide the ensuing notational differences in macros. > > That should be doable. But I'd like to check if it's necessary > first. Optimizing passing an 8 byte struct shouldn't be hard for > compilers these days - especially when most things dealing with them are > inline functions. If we find that it's not a problem on contemporary > compilers, it might be worthwhile to use a bit more type safety in other > places too. > > Here's what gcc does on O1: > > #include > > typedef struct foo > { > uint64_t id; > } foo; > > extern foo take_foo_struct(foo f, int i); > extern uint64_t take_foo_int(uint64_t id, int i); > > foo take_foo_struct(foo f, int i) > { > f.id += i; > return f; > } > > uint64_t take_foo_int(uint64_t id, int i) > { > id += i; > return id; > } > > results in: > > .file "test.c" > .text > .globl take_foo_struct > .type take_foo_struct, @function > take_foo_struct: > .LFB0: > .cfi_startproc > movslq %esi, %rax > addq%rdi, %rax > ret > .cfi_endproc > .LFE0: > .size take_foo_struct, .-take_foo_struct > .globl take_foo_int > .type take_foo_int, @function > take_foo_int: > .LFB1: > .cfi_startproc > movslq %esi, %rax > addq%rdi, %rax > ret > .cfi_endproc > .LFE1: > .size take_foo_int, .-take_foo_int > .ident "GCC: (Debian 7.3.0-24) 7.3.0" > .section.note.GNU-stack,"",@progbits > > IOW, exactly the same code generated. Note that the compiler does *not* > see the callsites in this case, i.e. this is platform ABI conformant. FWIW, this is required by the x86-64 SYSV ABI. See https://software.intel.com/sites/default/files/article/402129/mpx-linux64-abi.pdf 3.2.3 Parameter Passing. Aggregates with scalar types up to "two eightbytes" are passed via registers. It's also the case for MSVC / windows https://docs.microsoft.com/en-us/cpp/cpp/argument-passing-and-naming-conventions https://docs.microsoft.com/en-us/cpp/build/parameter-passing Small aggregates (8, 16, 32, or 64 bits) are passed in registers. Greetings, Andres Freund
Re: Usage of epoch in txid_current
Hi, On 2018-07-09 19:56:25 -0400, Tom Lane wrote: > Andres Freund writes: > > On 2018-07-10 11:35:59 +1200, Thomas Munro wrote: > >> I think it's probably a good idea to make it very explicit when moving > >> between big and small transaction IDs, hence the including of the word > >> 'big' in variable and function names and the use of a function-like > >> macro (rather than implicit conversion, which C doesn't give me a good > >> way to prevent). Otherwise there is a class of bug that is hidden for > >> the first 2^32 transactions. > > > You could have BigTransactionId (maybe renamed to FullTransactionId?) be > > a struct type. That'd prevent such issues. Most compilers these days > > should be more than good enough to optimize passing around an 8byte > > struct by value... > > Or, perhaps, use a struct in assert builds and int64 otherwise? > You could hide the ensuing notational differences in macros. That should be doable. But I'd like to check if it's necessary first. Optimizing passing an 8 byte struct shouldn't be hard for compilers these days - especially when most things dealing with them are inline functions. If we find that it's not a problem on contemporary compilers, it might be worthwhile to use a bit more type safety in other places too. Here's what gcc does on O1: #include typedef struct foo { uint64_t id; } foo; extern foo take_foo_struct(foo f, int i); extern uint64_t take_foo_int(uint64_t id, int i); foo take_foo_struct(foo f, int i) { f.id += i; return f; } uint64_t take_foo_int(uint64_t id, int i) { id += i; return id; } results in: .file "test.c" .text .globl take_foo_struct .type take_foo_struct, @function take_foo_struct: .LFB0: .cfi_startproc movslq %esi, %rax addq%rdi, %rax ret .cfi_endproc .LFE0: .size take_foo_struct, .-take_foo_struct .globl take_foo_int .type take_foo_int, @function take_foo_int: .LFB1: .cfi_startproc movslq %esi, %rax addq%rdi, %rax ret .cfi_endproc .LFE1: .size take_foo_int, .-take_foo_int .ident "GCC: (Debian 7.3.0-24) 7.3.0" .section.note.GNU-stack,"",@progbits IOW, exactly the same code generated. Note that the compiler does *not* see the callsites in this case, i.e. this is platform ABI conformant. Greetings, Andres Freund
Re: Usage of epoch in txid_current
Andres Freund writes: > On 2018-07-10 11:35:59 +1200, Thomas Munro wrote: >> I think it's probably a good idea to make it very explicit when moving >> between big and small transaction IDs, hence the including of the word >> 'big' in variable and function names and the use of a function-like >> macro (rather than implicit conversion, which C doesn't give me a good >> way to prevent). Otherwise there is a class of bug that is hidden for >> the first 2^32 transactions. > You could have BigTransactionId (maybe renamed to FullTransactionId?) be > a struct type. That'd prevent such issues. Most compilers these days > should be more than good enough to optimize passing around an 8byte > struct by value... Or, perhaps, use a struct in assert builds and int64 otherwise? You could hide the ensuing notational differences in macros. regards, tom lane
Re: Usage of epoch in txid_current
On 2018-07-10 11:35:59 +1200, Thomas Munro wrote: > I played around with this idea yesterday. Experiment-grade patch > attached. Cool! > I think it's probably a good idea to make it very explicit when moving > between big and small transaction IDs, hence the including of the word > 'big' in variable and function names and the use of a function-like > macro (rather than implicit conversion, which C doesn't give me a good > way to prevent). Otherwise there is a class of bug that is hidden for > the first 2^32 transactions. You could have BigTransactionId (maybe renamed to FullTransactionId?) be a struct type. That'd prevent such issues. Most compilers these days should be more than good enough to optimize passing around an 8byte struct by value... Greetings, Andres Freund
Re: Usage of epoch in txid_current
On Fri, Dec 8, 2017 at 3:36 PM, Amit Kapila wrote: > On Wed, Dec 6, 2017 at 11:26 PM, Andres Freund wrote: >>> Either way, it is not clear to me how we will keep it >>> updated after recovery. Right now, the mechanism is quite simple, at >>> the beginning of a recovery we take the value of nextXid from >>> checkpoint record and then if any xlog record indicates xid that >>> follows nextXid, we advance it. Here, the point to note is that we >>> take the xid from the WAL record (which means that it assumes xids are >>> non-contiguous or some xids are consumed without being logged) and >>> increment it. Unless we plan to change something in that logic (like >>> storing 64-bit xids in WAL records), it is not clear to me how to make >>> it work. OTOH, recovering value of epoch which increments only at >>> wraparound seems fairly straightforward as described in my previous >>> email. >> >> I think it should be fairly simple if simply add the 64bit xid to the >> existing clog extension WAL records. > > IIUC, you mean to say that we should log the 64bit xid value in > CLOG_ZEROPAGE record while extending clog and that too we can do only > at wraparound. Now, maybe doing it every time also doesn't hurt, but > I think doing it at wraparound should be sufficient. Can you please elaborate on why clog redo in particular would need to use 64 bit xids? Is the 64 bit xid not derivable from the 32 bit xid in the WAL + the current value of a new 64 bit next xid? > Just to be clear, I am not planning to pursue writing a patch for this > at the moment. So, if anybody else is interested or if Andres wants > to write it, I can help in the review. I played around with this idea yesterday. Experiment-grade patch attached. Approach: 1. Introduce a new type BigTransactionId (better names welcome). 2. Change ShmemVariableCache->nextXid to ShmemVariableCache->nextBigXid. 3. Change checkpoints to use nextBigXid too. 4. Change ReadNewTransactionId() to ReadNextBigTransactionId(). 5. Remove GetNextXidAndEpoch() as it's now redundant. 6. Everywhere that was reading ShmemVariableCache->nextXid or calling ReadNewTransactionId() but actually needs an xid now uses an explicit conversion macro XidFromBigTransactionId(). 7. Everywhere that was writing ShmemVariableCache->nextXid but only has an xid now goes through a new function AdvanceNextBigTransactionIdPast(xid), to handle recovery (since WAL records have xids in them as mentioned by Amit). I think it's probably a good idea to make it very explicit when moving between big and small transaction IDs, hence the including of the word 'big' in variable and function names and the use of a function-like macro (rather than implicit conversion, which C doesn't give me a good way to prevent). Otherwise there is a class of bug that is hidden for the first 2^32 transactions. The logic in CreateCheckPoint() that assumed that there could be only one wraparound per checkpoint is gone (the problem reported in this thread). Note that AdvanceNextBigTransactionIdPast() contains logic that infers an epoch, which is in some ways similar to what the removed code was doing, but in this case I think all callers must have an xid from the same or next epoch. I'm probably missing a few details... this is only a first swing at the problem. It does pass check-world and various xid wraparound tests I came up with. Clearly big xids could spread to more places than I show here. Do you see problems with this approach or have better ideas? -- Thomas Munro http://www.enterprisedb.com 0001-Track-the-next-xid-using-64-bits.patch Description: Binary data
Re: Usage of epoch in txid_current
On Wed, Dec 6, 2017 at 11:26 PM, Andres Freund wrote: > >> Either way, it is not clear to me how we will keep it >> updated after recovery. Right now, the mechanism is quite simple, at >> the beginning of a recovery we take the value of nextXid from >> checkpoint record and then if any xlog record indicates xid that >> follows nextXid, we advance it. Here, the point to note is that we >> take the xid from the WAL record (which means that it assumes xids are >> non-contiguous or some xids are consumed without being logged) and >> increment it. Unless we plan to change something in that logic (like >> storing 64-bit xids in WAL records), it is not clear to me how to make >> it work. OTOH, recovering value of epoch which increments only at >> wraparound seems fairly straightforward as described in my previous >> email. > > I think it should be fairly simple if simply add the 64bit xid to the > existing clog extension WAL records. > IIUC, you mean to say that we should log the 64bit xid value in CLOG_ZEROPAGE record while extending clog and that too we can do only at wraparound. Now, maybe doing it every time also doesn't hurt, but I think doing it at wraparound should be sufficient. Just to be clear, I am not planning to pursue writing a patch for this at the moment. So, if anybody else is interested or if Andres wants to write it, I can help in the review. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: Usage of epoch in txid_current
Hi, On 2017-12-06 17:39:09 +0530, Amit Kapila wrote: > We are using ShmemVariableCache->nextXid at many places so always > converting/truncating it to 32-bit number before using seems slightly > awkward, so we can think of using a separate nextBigXid 64bit number > as well. -many It's not actually that many places. And a lot of them would should be updated anyway, to be epoch aware. Let's not add this kind of crummyness to avoid a few truncating casts here and there. > Either way, it is not clear to me how we will keep it > updated after recovery. Right now, the mechanism is quite simple, at > the beginning of a recovery we take the value of nextXid from > checkpoint record and then if any xlog record indicates xid that > follows nextXid, we advance it. Here, the point to note is that we > take the xid from the WAL record (which means that it assumes xids are > non-contiguous or some xids are consumed without being logged) and > increment it. Unless we plan to change something in that logic (like > storing 64-bit xids in WAL records), it is not clear to me how to make > it work. OTOH, recovering value of epoch which increments only at > wraparound seems fairly straightforward as described in my previous > email. I think it should be fairly simple if simply add the 64bit xid to the existing clog extension WAL records. Greetings, Andres Freund
Re: Usage of epoch in txid_current
On Tue, Dec 5, 2017 at 11:15 PM, Andres Freund wrote: > On 2017-12-05 16:21:27 +0530, Amit Kapila wrote: > >> Okay, it is quite strange that we haven't discovered this problem till >> now. I think we should do something to fix it. One idea is that we >> track epoch change in shared memory (probably in the same data >> structure (VariableCacheData) where we track nextXid). We need to >> increment it when the xid wraparound during xid allocation (in >> GetNewTransactionId). Also, we need to make it persistent as which >> means we need to log it in checkpoint xlog record and we need to write >> a separate xlog record for the epoch change. > > I think it makes a fair bit of sense to not do the current crufty > tracking of xid epochs. I don't really how we got there, but it doesn't > make terribly much sense. Don't think we need additional WAL logging > though - we should be able to piggyback this onto the already existing > clog logging. > > I kinda wonder if we shouldn't just track nextXid as a 64bit integer > internally, instead of bothering with tracking the epoch > separately. Then we can "just" truncate it in the cases where it's > stored in space constrained places etc. > We are using ShmemVariableCache->nextXid at many places so always converting/truncating it to 32-bit number before using seems slightly awkward, so we can think of using a separate nextBigXid 64bit number as well. Either way, it is not clear to me how we will keep it updated after recovery. Right now, the mechanism is quite simple, at the beginning of a recovery we take the value of nextXid from checkpoint record and then if any xlog record indicates xid that follows nextXid, we advance it. Here, the point to note is that we take the xid from the WAL record (which means that it assumes xids are non-contiguous or some xids are consumed without being logged) and increment it. Unless we plan to change something in that logic (like storing 64-bit xids in WAL records), it is not clear to me how to make it work. OTOH, recovering value of epoch which increments only at wraparound seems fairly straightforward as described in my previous email. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: Usage of epoch in txid_current
Stephen Frost writes: > * Andres Freund (and...@anarazel.de) wrote: >> I kinda wonder if we shouldn't just track nextXid as a 64bit integer >> internally, instead of bothering with tracking the epoch >> separately. Then we can "just" truncate it in the cases where it's >> stored in space constrained places etc. > This sounds reasonable to me, at least, but I've not been in these > depths much. +1 ... I think the reason it's like that is simply that nobody's revisited the XID generator since we decided to require 64-bit integer support. We'd need this for support of true 64-bit XIDs, too, though I'm unsure whether that project is going anywhere anytime soon. In any case it seems like a separable subset of that work. regards, tom lane
Re: Usage of epoch in txid_current
On December 5, 2017 10:01:43 AM PST, Stephen Frost wrote: >Andres, > >* Andres Freund (and...@anarazel.de) wrote: >> I think it makes a fair bit of sense to not do the current crufty >> tracking of xid epochs. I don't really how we got there, but it >doesn't >> make terribly much sense. Don't think we need additional WAL logging >> though - we should be able to piggyback this onto the already >existing >> clog logging. > >Don't you mean xact logging? ;) No. We log a WAL record at clog boundaries. Wraparounds have to be at one. We could just include the 64 bit xid there and would have reliable tracking. Andres -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
Re: Usage of epoch in txid_current
Andres, * Andres Freund (and...@anarazel.de) wrote: > On 2017-12-05 16:21:27 +0530, Amit Kapila wrote: > > On Tue, Dec 5, 2017 at 2:49 PM, Alexander Korotkov > > wrote: > > > On Tue, Dec 5, 2017 at 6:19 AM, Amit Kapila > > > wrote: > > >> > > >> Currently, txid_current and friends export a 64-bit format of > > >> transaction id that is extended with an “epoch” counter so that it > > >> will not wrap around during the life of an installation. The epoch > > >> value it uses is based on the epoch that is maintained by checkpoint > > >> (aka only checkpoint increments it). > > >> > > >> Now if epoch changes multiple times between two checkpoints > > >> (practically the chances of this are bleak, but there is a theoretical > > >> possibility), then won't the computation of xids will go wrong? > > >> Basically, it can give the same value of txid after wraparound if the > > >> checkpoint doesn't occur between the two calls to txid_current. > > > > > > > > > AFAICS, yes, if epoch changes multiple times between two checkpoints, then > > > computation will go wrong. And it doesn't look like purely theoretical > > > possibility for me, because I think I know couple of instances of the edge > > > of this... > > I think it's not terribly likely principle, due to the required WAL > size. You need at least a commit record for each of 4 billion > transactions. Each commit record is at least 24bytes long, and in a > non-artificial scenario you additionally would have a few hundred bytes > of actual content of WAL. So we're talking about a distance of at least > 0.5-2TB within a single checkpoint here. Not impossible, but not likely > either. At the bottom end, with a 30-minute checkpoint, that's about 300MB/s. Certainly quite a bit and we might have trouble getting there for other reasons, but definitely something that can be accomplished with even a single SSD these days. > > Okay, it is quite strange that we haven't discovered this problem till > > now. I think we should do something to fix it. One idea is that we > > track epoch change in shared memory (probably in the same data > > structure (VariableCacheData) where we track nextXid). We need to > > increment it when the xid wraparound during xid allocation (in > > GetNewTransactionId). Also, we need to make it persistent as which > > means we need to log it in checkpoint xlog record and we need to write > > a separate xlog record for the epoch change. > > I think it makes a fair bit of sense to not do the current crufty > tracking of xid epochs. I don't really how we got there, but it doesn't > make terribly much sense. Don't think we need additional WAL logging > though - we should be able to piggyback this onto the already existing > clog logging. Don't you mean xact logging? ;) > I kinda wonder if we shouldn't just track nextXid as a 64bit integer > internally, instead of bothering with tracking the epoch > separately. Then we can "just" truncate it in the cases where it's > stored in space constrained places etc. This sounds reasonable to me, at least, but I've not been in these depths much. Thanks! Stephen signature.asc Description: Digital signature
Re: Usage of epoch in txid_current
On 2017-12-05 16:21:27 +0530, Amit Kapila wrote: > On Tue, Dec 5, 2017 at 2:49 PM, Alexander Korotkov > wrote: > > On Tue, Dec 5, 2017 at 6:19 AM, Amit Kapila wrote: > >> > >> Currently, txid_current and friends export a 64-bit format of > >> transaction id that is extended with an “epoch” counter so that it > >> will not wrap around during the life of an installation. The epoch > >> value it uses is based on the epoch that is maintained by checkpoint > >> (aka only checkpoint increments it). > >> > >> Now if epoch changes multiple times between two checkpoints > >> (practically the chances of this are bleak, but there is a theoretical > >> possibility), then won't the computation of xids will go wrong? > >> Basically, it can give the same value of txid after wraparound if the > >> checkpoint doesn't occur between the two calls to txid_current. > > > > > > AFAICS, yes, if epoch changes multiple times between two checkpoints, then > > computation will go wrong. And it doesn't look like purely theoretical > > possibility for me, because I think I know couple of instances of the edge > > of this... I think it's not terribly likely principle, due to the required WAL size. You need at least a commit record for each of 4 billion transactions. Each commit record is at least 24bytes long, and in a non-artificial scenario you additionally would have a few hundred bytes of actual content of WAL. So we're talking about a distance of at least 0.5-2TB within a single checkpoint here. Not impossible, but not likely either. > Okay, it is quite strange that we haven't discovered this problem till > now. I think we should do something to fix it. One idea is that we > track epoch change in shared memory (probably in the same data > structure (VariableCacheData) where we track nextXid). We need to > increment it when the xid wraparound during xid allocation (in > GetNewTransactionId). Also, we need to make it persistent as which > means we need to log it in checkpoint xlog record and we need to write > a separate xlog record for the epoch change. I think it makes a fair bit of sense to not do the current crufty tracking of xid epochs. I don't really how we got there, but it doesn't make terribly much sense. Don't think we need additional WAL logging though - we should be able to piggyback this onto the already existing clog logging. I kinda wonder if we shouldn't just track nextXid as a 64bit integer internally, instead of bothering with tracking the epoch separately. Then we can "just" truncate it in the cases where it's stored in space constrained places etc. Greetings, Andres Freund
Re: Usage of epoch in txid_current
On Tue, Dec 5, 2017 at 2:49 PM, Alexander Korotkov wrote: > On Tue, Dec 5, 2017 at 6:19 AM, Amit Kapila wrote: >> >> Currently, txid_current and friends export a 64-bit format of >> transaction id that is extended with an “epoch” counter so that it >> will not wrap around during the life of an installation. The epoch >> value it uses is based on the epoch that is maintained by checkpoint >> (aka only checkpoint increments it). >> >> Now if epoch changes multiple times between two checkpoints >> (practically the chances of this are bleak, but there is a theoretical >> possibility), then won't the computation of xids will go wrong? >> Basically, it can give the same value of txid after wraparound if the >> checkpoint doesn't occur between the two calls to txid_current. > > > AFAICS, yes, if epoch changes multiple times between two checkpoints, then > computation will go wrong. And it doesn't look like purely theoretical > possibility for me, because I think I know couple of instances of the edge > of this... > Okay, it is quite strange that we haven't discovered this problem till now. I think we should do something to fix it. One idea is that we track epoch change in shared memory (probably in the same data structure (VariableCacheData) where we track nextXid). We need to increment it when the xid wraparound during xid allocation (in GetNewTransactionId). Also, we need to make it persistent as which means we need to log it in checkpoint xlog record and we need to write a separate xlog record for the epoch change. Thoughts? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: Usage of epoch in txid_current
Hi! On Tue, Dec 5, 2017 at 6:19 AM, Amit Kapila wrote: > Currently, txid_current and friends export a 64-bit format of > transaction id that is extended with an “epoch” counter so that it > will not wrap around during the life of an installation. The epoch > value it uses is based on the epoch that is maintained by checkpoint > (aka only checkpoint increments it). > > Now if epoch changes multiple times between two checkpoints > (practically the chances of this are bleak, but there is a theoretical > possibility), then won't the computation of xids will go wrong? > Basically, it can give the same value of txid after wraparound if the > checkpoint doesn't occur between the two calls to txid_current. > AFAICS, yes, if epoch changes multiple times between two checkpoints, then computation will go wrong. And it doesn't look like purely theoretical possibility for me, because I think I know couple of instances of the edge of this... Am I missing something which ensures that epoch gets incremented at or > after wraparound? > I've checked the code, and it doesn't look for me that there is something like this. -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Usage of epoch in txid_current
Hi, Currently, txid_current and friends export a 64-bit format of transaction id that is extended with an “epoch” counter so that it will not wrap around during the life of an installation. The epoch value it uses is based on the epoch that is maintained by checkpoint (aka only checkpoint increments it). Now if epoch changes multiple times between two checkpoints (practically the chances of this are bleak, but there is a theoretical possibility), then won't the computation of xids will go wrong? Basically, it can give the same value of txid after wraparound if the checkpoint doesn't occur between the two calls to txid_current. Am I missing something which ensures that epoch gets incremented at or after wraparound? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com