On Wed, Jul 18, 2018 at 10:08 AM, Tomas Vondra <tomas.von...@2ndquadrant.com> wrote:

> The problem is you don't know if a transaction does DDL sometime later, in
> the part that you might not have decoded yet (or perhaps concurrently with
> the decoding). So I don't see how you could easily exclude such transactions
> from the decoding ...
One idea is that maybe the running transaction could communicate with the decoding process through shared memory. For example, suppose that before you begin decoding an ongoing transaction, you have to send some kind of notification to the process saying "hey, I'm going to start decoding you" and wait for that process to acknowledge receipt of that message (say, at the next CFI). Once it acknowledges receipt, you can begin decoding. Then we're guaranteed that the foreground process knows that it must be careful about catalog changes. If it's going to make one, it sends a note to the decoding process saying, hey, sorry, I'm about to do catalog changes, please pause decoding. Once it gets an acknowledgement that decoding has paused, it continues its work. Decoding resumes after commit (or maybe earlier, if that's provably safe).

> But isn't this (delaying the catalog cleanup etc.) pretty much the original
> approach, implemented by the original patch? Which you also claimed to be
> unworkable, IIRC? Or how is this addressing the problems with broken HOT
> chains, for example? Those issues were pretty much the reason why we started
> looking at alternative approaches, like delaying the abort ...

I don't think so. The original approach, IIRC, was to decode after the abort had already happened, and my objection was that you can't rely on the state of anything at that point. The approach here is to wait until the abort is in progress and then basically pause it while we try to read stuff, but that seems similarly riddled with problems. The newer approach could be considered an improvement in that you've tried to get your hands around the problem at an earlier point, but it's not early enough. To take a very rough analogy, the original approach was like trying to install a sprinkler system after the building had already burned down, while the new approach is like trying to install a sprinkler system when you notice that the building is on fire.
But we need to install the sprinkler system in advance. That is, we need to make all of the necessary preparations for a possible abort before the abort occurs. That could perhaps be done by arranging things so that decoding after an abort is actually still safe (e.g. by making it look to certain parts of the system as though the aborted transaction is still in progress until decoding no longer cares about it) or by making sure that we are never decoding at the point where a problematic abort happens (e.g. as proposed above, pause decoding before doing dangerous things).

> I wonder if disabling HOT on catalogs with wal_level=logical would be an
> option here. I'm not sure how important HOT on catalogs is, in practice (it
> surely does not help with the typical catalog bloat issue, which is
> temporary tables, because that's mostly insert+delete). I suppose we could
> disable it only when there's a replication slot indicating support for
> decoding of in-progress transactions, so that you still get HOT with plain
> logical decoding.

Are you talking about HOT updates, or HOT pruning? Disabling the former wouldn't help, and disabling the latter would break VACUUM, which assumes that any tuple not removed by HOT pruning is not a dead tuple (cf. 1224383e85eee580a838ff1abf1fdb03ced973dc, which was caused by a case where that wasn't true).

> I'm sure there will be other obstacles, not just the HOT chain stuff, but it
> would mean one step closer to a solution.

Right. Here's a crazy idea. Instead of disabling HOT pruning or anything like that, have the decoding process advertise the XID of the transaction being decoded as its own XID in its PGPROC. Also, using magic, acquire a lock on that XID even though the foreground transaction already holds that lock in exclusive mode. Fix the code (and I'm pretty sure there is some) that relies on an XID appearing in the procarray only once to no longer make that assumption.
Then, if the foreground process aborts, it will appear to the rest of the system that it's still running, so HOT pruning won't remove the XID, CLOG won't get truncated, people who are waiting to update a tuple updated by the aborted transaction will keep waiting, etc. We know that we do the right thing for running transactions, so if we make this aborted transaction look like it is running, and are sufficiently convincing about the way we do that, then it should also work. That seems more likely to be made robust than addressing specific problems (e.g. a tuple might get removed!) one by one.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company