On Tuesday, November 17, 2015 12:43 AM, konstantin knizhnik <k.knizh...@postgrespro.ru> wrote:

> On Nov 16, 2015, at 11:21 PM, Kevin Grittner wrote:
>> If you are saying that DTM tries to roll back a transaction after
>> any participating server has entered the RecordTransactionCommit()
>> critical section, then IMO it is broken.  Full stop.  That can't
>> work with any reasonable semantics as far as I can see.
>
> DTM is not trying to rollback a committed transaction.
> What it tries to do is to hide this commit.
> As I already wrote, the idea was to implement "lightweight" 2PC
> because the prepared transactions mechanism in PostgreSQL adds too
> much overhead and causes some problems with recovery.

The point remains that there must be *some* "point of no return"
beyond which rollback (or "hiding") is not possible.  Until this
point, all heavyweight locks held by the transaction must be
maintained without interruption, data modifications of the
transaction must not be visible, and any attempt to update or
delete data updated or deleted by the transaction must block or
throw an error.  It sounds like you are attempting to move this
"point of no return" later, but where it now falls isn't as clear
as I would like.  It seems that all participating nodes are
responsible for notifying the arbiter that they have completed, and
until then the arbiter gets involved in every visibility check,
overriding the "normal" value?

> The transaction is normally committed in xlog, so that it can
> always be recovered in case of node fault.
> But before setting the corresponding bit(s) in CLOG and releasing
> locks we first contact the arbiter to get the global status of the
> transaction.
> If it is successfully locally committed by all nodes, then the
> arbiter approves the commit and the commit of the transaction
> completes normally.
> Otherwise the arbiter rejects the commit.  In this case DTM marks
> the transaction as aborted in CLOG and returns an error to the
> client.
> XLOG is not changed and in case of failure PostgreSQL will try to
> replay this transaction.
> But during recovery it also tries to restore the transaction
> status in CLOG.
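If I follow the flow you describe, it is something like the following
sketch.  This is purely illustrative (the `Xlog`, `Clog`, and
`Arbiter` names are stand-ins I invented, not PostgreSQL's actual
internal APIs): the commit record always reaches the xlog, but the
CLOG status, which controls visibility, is set only after the arbiter
answers.

```python
class Xlog:
    """Stand-in for the write-ahead log; records survive a crash."""
    def __init__(self):
        self.records = []
    def write_commit_record(self, xid):
        self.records.append(("commit", xid))

class Clog:
    """Stand-in for the commit-status log consulted by visibility checks."""
    def __init__(self):
        self.status = {}
    def set_status(self, xid, status):
        self.status[xid] = status

class Arbiter:
    """Approves a commit only if every participating node reported a
    successful local commit."""
    def __init__(self, votes):
        self.votes = votes  # xid -> list of per-node booleans
    def global_commit_approved(self, xid):
        votes = self.votes.get(xid)
        return votes is not None and all(votes)

def commit_transaction(xid, xlog, clog, arbiter):
    xlog.write_commit_record(xid)          # always logged; replayed on recovery
    if arbiter.global_commit_approved(xid):
        clog.set_status(xid, "committed")  # now visible to other transactions
        return True
    clog.set_status(xid, "aborted")        # commit is "hidden": xlog unchanged,
    return False                           # but visibility rules see an abort
```

The questionable part, as noted above, is exactly where in this
sequence the "point of no return" sits and what holds the locks in
the meantime.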
> And at this place DTM contacts the arbiter to learn the status of
> the transaction.
> If it is marked as aborted in the arbiter's CLOG, then it will
> also be marked as aborted in the local CLOG.
> And according to PostgreSQL visibility rules no other transaction
> will see changes made by this transaction.

If a node goes through crash and recovery after it has written its
commit information to xlog, how are its heavyweight locks, etc.,
maintained throughout?  For example, does each arbiter node have
the complete set of heavyweight locks?  (Basically, all the
information which can be written to files in pg_twophase must be
held somewhere by all arbiter nodes, and used where appropriate.)
If a participating node is lost after some other nodes have told
the arbiter that they have committed, and the lost node will never
be able to indicate that it is committed or rolled back, what is
the mechanism for resolving that?

>>> We can not just call elog(ERROR,...) in the SetTransactionStatus
>>> implementation because inside a critical section it causes a
>>> Postgres crash with a panic message.  So we have to remember that
>>> the transaction is rejected and report the error later, after
>>> exit from the critical section:
>>
>> I don't believe that is a good plan.  You should not enter the
>> critical section for recording that a commit is complete until
>> all the work for the commit is done except for telling all the
>> servers that all servers are ready.
>
> It is a good point.
> May be it is the reason of the performance scalability problems we
> have noticed with DTM.

Well, certainly the first phase of two-phase commit can take place
in parallel, and once that is complete the second phase (commit or
rollback of all the participating prepared transactions) can take
place in parallel.  There is no need to serialize that.

> Sorry, some clarification.
> We get 10x slowdown of performance caused by 2PC on very heavy
> load on the IBM system with 256 cores.
> At "normal" servers the slowdown of 2PC is smaller - about 2x.
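To make the parallelism point concrete, here is a minimal sketch of a
coordinator that issues phase 1 to all participants concurrently and,
only once every prepare has succeeded, issues phase 2 concurrently as
well.  The `Node` class and its methods are hypothetical stand-ins
for whatever client interface the coordinator uses, not a real
library.

```python
from concurrent.futures import ThreadPoolExecutor

class Node:
    """Toy participant; `ok` controls whether its PREPARE succeeds."""
    def __init__(self, ok=True):
        self.ok = ok
        self.state = "active"
    def prepare(self, gid):
        self.state = "prepared" if self.ok else "failed"
        return self.ok
    def commit_prepared(self, gid):
        self.state = "committed"
    def rollback_prepared(self, gid):
        self.state = "aborted"

def two_phase_commit(nodes, gid):
    with ThreadPoolExecutor() as pool:
        # Phase 1: PREPARE on all nodes in parallel.
        prepared = list(pool.map(lambda n: n.prepare(gid), nodes))
        if all(prepared):
            # Phase 2: COMMIT PREPARED on all nodes in parallel.
            list(pool.map(lambda n: n.commit_prepared(gid), nodes))
            return True
        # Any failed prepare: roll everyone back, also in parallel.
        list(pool.map(lambda n: n.rollback_prepared(gid), nodes))
        return False
```

Nothing here needs to be serialized except the barrier between the
two phases, which is the whole of the protocol's ordering
requirement.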
That suggests some contention point, probably on spinlocks.  Were
you able to identify the particular hot spot(s)?

On Tuesday, November 17, 2015 3:09 AM, konstantin knizhnik <k.knizh...@postgrespro.ru> wrote:

> On Nov 17, 2015, at 10:44 AM, Amit Kapila wrote:
>
>> I think the general idea is that if Commit is WAL logged, then the
>> operation is considered to be committed on the local node, and
>> commit should happen on any node only once prepare from all nodes
>> is successful.  And after that the transaction is not supposed to
>> abort.  But I think you are trying to optimize the DTM in some way
>> to not follow that kind of protocol.
>
> DTM is still following the 2PC protocol:
> First the transaction is saved in WAL at all nodes and only after
> that is the commit completed at all nodes.

So, essentially you are treating the traditional commit point as
phase 1 in a new approach to two-phase commit, and adding another
layer to override normal visibility checking and record locks
(etc.) past that point?

> We try to avoid maintaining separate log files for 2PC (as now
> for prepared transactions) and do not want to change the logic of
> work with WAL.
>
> The DTM approach is based on the assumption that PostgreSQL CLOG
> and visibility rules allow us to "hide" a transaction even if it
> is committed in WAL.

I see where you could get a performance benefit from not recording
(and cleaning up) persistent state for a transaction in the
pg_twophase directory between the time the transaction is prepared
and when it is committed (which should normally be a very short
period of time, but must survive crashes and communication
failures).  Essentially you are trying to keep that in RAM instead,
counting on multiple processes at different locations redundantly
(and synchronously) storing this data to ensure persistence, rather
than writing the data to disk files which are deleted as soon as
the prepared transaction is committed or rolled back.
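In other words, something along these lines, where the per-prepare
state that would otherwise live in a pg_twophase file is instead
pushed synchronously to every arbiter replica and dropped again on
resolution.  All of the names here (`ArbiterReplica`,
`persist_prepare`, `release_prepare`) are my own illustration, not
anything in the DTM code.

```python
class ArbiterReplica:
    """One of several redundant in-RAM stores standing in for the
    on-disk pg_twophase files."""
    def __init__(self):
        self.twophase_state = {}  # gid -> prepare-time state (locks, etc.)
    def store(self, gid, state):
        self.twophase_state[gid] = state
    def release(self, gid):
        self.twophase_state.pop(gid, None)

def persist_prepare(replicas, gid, state):
    # The prepare is considered durable only once every replica has
    # acknowledged it -- redundancy substitutes for the disk write.
    for r in replicas:
        r.store(gid, state)

def release_prepare(replicas, gid):
    # Normally a fraction of a second later, when the prepared
    # transaction is committed or rolled back.
    for r in replicas:
        r.release(gid)
```

The durability argument rests entirely on how many replicas must
acknowledge, which is exactly the information that would need to be
available to every arbiter node after a crash.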
I wonder whether it might not be safer to just do that -- rather
than trying to develop a whole new way of implementing two-phase
commit, just come up with a new way to persist the information
which must survive between the prepare and the later commit or
rollback of the prepared transaction.  Essentially, provide hooks
for persisting the data when preparing a transaction, and the
arbiter would set the hooks to a function that sends the data
there.  Likewise with the release of the information (normally a
very small fraction of a second later).  The rest of the arbiter
code becomes a distributed transaction manager.  It's not a trivial
job to get that right, but at least it is a very well-understood
problem, and is not likely to take as long to develop and shake out
tricky data-eating bugs.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company