On Mon, Nov 9, 2015 at 2:08 PM, Konstantin Knizhnik <k.knizh...@postgrespro.ru> wrote:
>
> On 09.11.2015 07:46, Amit Kapila wrote:
>
> I think so.  Basically DLM should be responsible for maintaining
> all the lock information, which in turn means that any backend process
> that needs to acquire/release a lock needs to interact with DLM; without
> that I don't think even global deadlock detection can work (to detect
> deadlocks among all the nodes, it needs to know the lock info of all
> nodes).
>
>
> I hope that it will not be needed, otherwise it will add a significant
> performance penalty.
> Unless I missed something, locks can still be managed locally, but we
> need DLM to detect global deadlocks.
>

How will you check lock conflicts then?  For example, Process A on node-1
tries to acquire a lock on object-1, but Process B from node-2 already
holds a conflicting lock on object-1.  If we instead keep an entry for the
lock request from node-2 on node-1, then I think it will be difficult to
manage and release locks.
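Just to make the problem concrete, here is a toy sketch (purely my own
illustration, not any existing dtmd/DLM code) of why the detector needs the
wait-for information from all nodes: node-1 alone sees only "A waits for B"
and node-2 alone sees only "B waits for A", so neither local graph has a
cycle, but the union of the two does.

/*
 * Toy global deadlock check: union the per-node wait-for edges and
 * run a DFS looking for a cycle.  Illustrative only, not dtmd code.
 */
#include <stdbool.h>
#include <stdio.h>

#define MAX_TX 16

static bool edge[MAX_TX][MAX_TX];   /* edge[a][b]: tx a waits for tx b */

/* Each node can only report the edges it sees locally. */
static void
report_edge(int waiter, int holder)
{
    edge[waiter][holder] = true;
}

/* DFS: reaching a tx already on the stack means a cycle, i.e. deadlock. */
static bool
has_cycle_from(int tx, bool on_stack[], bool visited[])
{
    if (on_stack[tx])
        return true;
    if (visited[tx])
        return false;
    visited[tx] = true;
    on_stack[tx] = true;
    for (int next = 0; next < MAX_TX; next++)
    {
        if (edge[tx][next] && has_cycle_from(next, on_stack, visited))
            return true;
    }
    on_stack[tx] = false;
    return false;
}

int
main(void)
{
    enum {TX_A, TX_B};
    bool on_stack[MAX_TX] = {false};
    bool visited[MAX_TX] = {false};

    report_edge(TX_A, TX_B);    /* known only to node-1 */
    report_edge(TX_B, TX_A);    /* known only to node-2 */

    printf("global deadlock: %s\n",
           has_cycle_from(TX_A, on_stack, visited) ? "yes" : "no");
    return 0;
}

So either each node ships its local edges to some central place, or the
nodes exchange them among themselves; in both cases something has to see
the combined graph, which is why I think DLM cannot be purely local.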
>> 3. Should DLM be implemented by separate process or should it be part of
>> arbiter (dtmd).
>
> That's an important decision.  I think it will depend on which kind of
> design we choose for the distributed transaction manager (an
> arbiter-based solution or a non-arbiter-based solution, something like
> tsDTM).  I think DLM should be separate, else the arbiter will become a
> hot-spot with respect to contention.
>
>
> There are pros and cons.
> Pros for integrating DLM in DTM:
> 1. The DTM arbiter has information about the local-to-global transaction
> ID mapping, which may be needed by DLM.
> 2. If my assumptions about DLM are correct, then it will be accessed
> relatively rarely and should not have a significant impact on
> performance.
>

Yeah, if DLM is accessed relatively rarely, then it can make sense to club
them together, but otherwise it doesn't make much sense.

> Cons:
> 1. tsDTM doesn't need a centralized arbiter but still needs DLM.
> 2. Logically, DLM and DTM are independent components.
>
>
> Can you please explain more about the tsDTM approach: how are timestamps
> used, what exactly is CSN (is it Commit Sequence Number), and how is it
> used in the prepare phase?  Is CSN used as a timestamp?
> Is the coordinator also one of the PostgreSQL instances?
>
>
> In the tsDTM approach, system time (in microseconds) is used as the CSN
> (commit sequence number).
> We also enforce that assigned CSNs are unique and monotonic within a
> PostgreSQL instance.
> CSNs are assigned locally and do not require interaction with other
> cluster nodes.
> This is why, in theory, the tsDTM approach should provide good
> scalability.
>

Okay, but won't checking the visibility of tuples need a
transaction-to-CSN mapping?
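To spell out what I mean, I imagine each instance doing something like the
below locally (a rough sketch under my own assumptions, not your actual
tsDTM code; all the names here, including the xid-to-CSN lookup, are
hypothetical):

/*
 * Sketch of locally-assigned, monotonic timestamp CSNs, as I
 * understand the tsDTM description.  Hypothetical code, not tsDTM.
 */
#include <stdbool.h>
#include <stdint.h>
#include <sys/time.h>

typedef uint64_t CSN;

static CSN last_csn;            /* would live in shared memory, under a lock */

/* Current system time in microseconds. */
static CSN
current_time_usec(void)
{
    struct timeval tv;

    gettimeofday(&tv, NULL);
    return (CSN) tv.tv_sec * 1000000 + (CSN) tv.tv_usec;
}

/*
 * Assign the next CSN: take the system clock, but bump it if the clock
 * has not advanced (or has gone backwards), so that CSNs stay unique
 * and monotonic within this instance.  Note there is no interaction
 * with any other node here.
 */
static CSN
assign_csn(void)
{
    CSN now = current_time_usec();

    if (now <= last_csn)
        now = last_csn + 1;
    last_csn = now;
    return now;
}

/*
 * Visibility would then presumably compare commit CSNs: a tuple whose
 * inserting transaction committed at xmin_commit_csn is visible to a
 * snapshot taken at snapshot_csn iff it committed strictly before the
 * snapshot.  Obtaining xmin_commit_csn from the tuple's xmin is exactly
 * the xid -> CSN mapping I am asking about.
 */
static bool
tuple_is_visible(CSN xmin_commit_csn, CSN snapshot_csn)
{
    return xmin_commit_csn != 0 && xmin_commit_csn < snapshot_csn;
}

If that is roughly right, then a snapshot is essentially just a CSN taken
at snapshot time, and every visibility check has to look up the commit CSN
of the tuple's xmin somewhere.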
> From my point of view there are two different scenarios:
> 1. When most transactions are local and only a few of them are global
> (for example, most operations in a branch of the bank are performed with
> accounts of clients of this branch, but there are a few transactions
> involving accounts from different branches).
> 2. When most or all transactions are global.
>
> It seems to me that the first scenario is more popular in real life, and
> actually good performance of a distributed system can be achieved only
> when most transactions are local (involve only one node).  There are
> several approaches allowing to optimize local transactions, for example
> the one used in SAP HANA
> (http://pi3.informatik.uni-mannheim.de/~norman/dsi_jour_2014.pdf).
> We also have a DTM implementation based on this approach, but it is not
> yet working.
>
> If most of the transactions are global, they affect random subsets of
> the cluster nodes (so it is not possible to logically split the cluster
> into groups of tightly coupled nodes), and the number of nodes is not
> very large (<10), then I do not think that there can be a better
> alternative (from a performance point of view) than a centralized
> arbiter.
>

I am slightly confused by the above statement; it seems that for both
scenarios you are suggesting a centralized arbiter.  I think when most
transactions are global, a centralized arbiter might not be a good
solution, especially if the number of nodes in the cluster is large.

> But these are only my speculations, and it would be really very
> interesting for me to know the access patterns of real customers using
> distributed systems.
>

I think it is better if the solution is optimized for all kinds of
scenarios, because once a solution is adopted by PostgreSQL, it will be
very difficult to change it.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com