On Sun, 8 Feb 2026 at 20:21, Andres Freund <[email protected]> wrote:
>
> The patch also adds a pointer to the open relation to IO handles, to be
> able to know what relkind a buffer being read into is. But that's
> nonsensical, the

The previous rebase was not done correctly. I have a fix, but I did not
post it since I wanted to rework the patch set.

Let me clear up the situation with the TransactionId and
FullTransactionId types. The patch is designed to keep the diff as small
as possible, as it is already huge. Of course, once TransactionId becomes
64-bit, the FullTransactionId type loses its meaning. We only need a
32-bit type for "on-disk" storage; everything else should be 64-bit.

AFAICS, the FullTransactionId type was added to minimize problems with
transaction counter wraparound in places where wraparound is not
acceptable. However, if we had had 64-bit xids from the start, we
wouldn't have needed it. So it is unclear to me why we should keep
investing in this type once it is no longer needed. Maybe it's time to
cut our losses and leave it behind.

While I am not against using the FullTransactionId type everywhere, doing
so would significantly complicate the patch, because every xid would have
to be converted to FullTransactionId.

On Sun, 8 Feb 2026 at 04:30, Robert Haas <[email protected]> wrote:
>
> I don't think the page header is the right thing, because that applies
> to every AM, including both table AMs and index AMs. I'd say that some
> of the things we already have in the page header don't really make
> sense there -- in particular, pd_prune_xid, which is heap-specific. We
> can't change that at this point, but we shouldn't make it worse.
>

Correct. This is why we use the existing mechanism of allocating a
special area for heap pages.

There have been many comments over the past few days. Thank you very
much. Let's structure things a bit, shall we?

The must-have items:

1) The cluster must be upgradable.
We can't add 64-bit XIDs and force users to do a dump/restore.
2) Thus, lazy conversion is the way to go.

Now I would like to settle a fundamental question: how do we store
64-bit transaction identifiers on disk?

=== OPT #1 - "real 64-bit XIDs" ===

After many years of working with this patch, I have come to the
conclusion that the best implementation option is to use full 64-bit
transaction identifiers in tuples. I do understand that this is an
unpopular opinion. But the truth is, this option was always rejected
without further investigation, because it automatically leads to
increased disk space consumption. Yes, the header of each tuple will
become larger, but if the tuples have a meaningful size, the relative
increase shouldn't be significant.

Pros:
+ More understandable logic compared to the other options.
+ No need to re-calculate the "real" XID after acquiring the page lock.
+ There is no limit on the XID epoch range on a single page.
+ It avoids a conversion problem. After the addition of AIO, the page
  conversion code must run in a critical section. With a base or epoch
  (whatever you call it), we need to find all the XIDs of the tuples on
  a given page and calculate an appropriate value. But some update
  transactions may be hidden inside multitransactions, so we have to
  dig into the mxids, and that cannot be done in a critical section.
  With full 64-bit XIDs there is no per-page base to compute.

Cons:
- Increased disk usage.
- Theoretically, there may be some performance degradation, since the
  total amount of IO would be bigger compared to opt #2.

Overall, I'm willing to try writing this patch. I'd be interested to see
how much of a real performance improvement it would make. What's holding
me back is that I wouldn't want to do this kind of work just for fun,
without any chance of getting it committed.

=== OPT #2 - "64-bit XIDs with base/epoch" ===

Here we somehow split the 64-bit transaction ID into parts so that it is
stored in an optimal (in terms of disk usage) way. I think that the
existing mechanism with the page special area is well suited here.

Pros:
+ Reduced disk space usage.

Cons:
- Significantly increases the code complexity. We have to sync the tuple
  XID with the page base on every lock acquisition.
- Where there is complexity, there are bugs. I assure you, I have dealt
  with a lot of them.
- The page is epoch-limited.
- When converting, you need to go through all the transaction IDs on the
  page, access multitransactions if necessary, and do this in the
  critical section.

Finally, option 2 splits into two variants:

1) Use a 64-bit base.
2) Use a 32-bit epoch for every page. We can do this relatively easily
   if we limit the growth of XIDs to 2^63.

To sum it up:

1) Does anyone else think that the full-fledged 64-bit XIDs approach is
   possible?
2) If not, what type of base should we use: a 64-bit base or a 32-bit
   epoch?

--
Best regards,
Maxim Orlov.
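
For readers less familiar with option 2, the 64-bit base variant can be
sketched roughly as below. This is a minimal illustration and not code
from the patch set: every name in it (Xid64, ShortXid, PageXidBase,
xid_from_disk, xid_to_disk, choose_page_base) is hypothetical, and it
assumes that the on-disk value is a plain 32-bit offset from a per-page
base, with the three lowest values reserved for the special XIDs
(invalid, bootstrap, frozen).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types for illustration only. */
typedef uint64_t Xid64;     /* full 64-bit XID used in memory */
typedef uint32_t ShortXid;  /* 32-bit value stored in the tuple header */

/* Values below this are special (invalid, bootstrap, frozen). */
#define FIRST_NORMAL_XID 3u

/* Assumed to live in the page special area. */
typedef struct PageXidBase
{
    Xid64 xid_base;
} PageXidBase;

/* True if a normal xid can be stored on this page as a 32-bit offset. */
static inline bool
xid_fits_page(const PageXidBase *page, Xid64 xid)
{
    return xid >= page->xid_base &&
           xid - page->xid_base >= FIRST_NORMAL_XID &&
           xid - page->xid_base <= UINT32_MAX;
}

/* On-disk value -> full XID; special values pass through unchanged. */
static inline Xid64
xid_from_disk(const PageXidBase *page, ShortXid s)
{
    if (s < FIRST_NORMAL_XID)
        return (Xid64) s;
    return page->xid_base + s;
}

/* Full XID -> on-disk value; the caller must ensure the xid fits. */
static inline ShortXid
xid_to_disk(const PageXidBase *page, Xid64 xid)
{
    if (xid < FIRST_NORMAL_XID)
        return (ShortXid) xid;
    assert(xid_fits_page(page, xid));
    return (ShortXid) (xid - page->xid_base);
}

/*
 * Pick a base that covers every normal xid in the array.
 * Returns false if the xids span more than a 32-bit offset can hold,
 * i.e. no single base works for this page.
 */
static bool
choose_page_base(const Xid64 *xids, size_t n, PageXidBase *out)
{
    Xid64 min = UINT64_MAX;
    Xid64 max = 0;
    bool  any = false;

    for (size_t i = 0; i < n; i++)
    {
        if (xids[i] < FIRST_NORMAL_XID)
            continue;           /* special xids need no base */
        any = true;
        if (xids[i] < min)
            min = xids[i];
        if (xids[i] > max)
            max = xids[i];
    }
    if (!any)
    {
        out->xid_base = 0;
        return true;
    }
    if (max - min > UINT32_MAX - FIRST_NORMAL_XID)
        return false;
    /* The smallest xid maps to the first normal on-disk value. */
    out->xid_base = min - FIRST_NORMAL_XID;
    return true;
}
```

The failing case of choose_page_base is the "epoch-limited page" con in
this sketch: once the XIDs on one page span more than 2^32, something
has to be frozen or moved before the page can be written. With option 1
none of these helpers would be needed, at the cost of wider tuple
headers.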
