[ https://issues.apache.org/jira/browse/IGNITE-2854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15220558#comment-15220558 ]
Andrey Gura commented on IGNITE-2854:
-------------------------------------

The algorithm has been changed to limit the amount of requested information on a per-key basis:
# When a {{GridDhtLockFuture}} times out, we run deadlock detection. The input is the near transaction ID and the pending keys that this transaction failed to lock.
# The deadlock detector maps the pending keys to their primary nodes (on the first step this is always the current node). The result is a set of candidates, represented by pairs {{UUID -> List<IgniteTxKey>}}.
# For each candidate (if any exists), the deadlock detector sends a request to the node identified by its {{UUID}}. The request contains the keys from the candidate pair. If there is no candidate, the process finishes.
# The selected candidate is removed from the candidate set, and the node and all of its keys are marked as handled.
# The node processes the request and returns all MVCC candidates that hold or are waiting for the *passed keys*, along with all other keys involved in the transactions associated with the found MVCC candidates.
# The deadlock detector builds the wait-for graph (or updates it) and tries to find a cycle in it, using the input transaction ID as the first vertex of the graph.
# If a cycle is found, deadlock detection stops (a deadlock is found).
# If no cycle is found, the deadlock detector maps the obtained keys to primary nodes and near nodes. The candidate set is updated.
# The process continues from step 3.

Properties of this implementation:
* It always finds at most one deadlock for a given timed-out transaction.
* It always detects the deadlock that causes a user transaction timeout (if one exists). See step 6.
* Detection finishes as soon as possible, because a deadlock can be found after each update of the wait-for graph.
* Detection minimizes network utilisation. See step 5.
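
To illustrate the cycle search on the wait-for graph (step 6), here is a minimal sketch. It is not the Ignite implementation. It assumes transactions are identified by plain strings and that the graph is a map from a transaction to the set of transactions it waits for. The class name {{WaitForGraphSketch}} and the method {{findCycle}} are hypothetical. The search is a depth-first traversal starting from the timed-out transaction, and it reports the first cycle reachable from that vertex.

```java
import java.util.*;

// Hypothetical sketch: not part of Ignite. Transaction IDs are plain strings here.
class WaitForGraphSketch {
    /**
     * Searches for a cycle reachable from startTx.
     * Returns the cycle as a list whose first and last elements are equal,
     * or an empty list if no cycle is reachable.
     */
    static List<String> findCycle(Map<String, Set<String>> waitFor, String startTx) {
        return dfs(waitFor, startTx, new ArrayList<>(), new HashSet<>());
    }

    private static List<String> dfs(Map<String, Set<String>> waitFor, String tx,
                                    List<String> path, Set<String> done) {
        path.add(tx);

        for (String next : waitFor.getOrDefault(tx, Collections.emptySet())) {
            int idx = path.indexOf(next);

            if (idx >= 0) {
                // 'next' is already on the current path: the tail of the path is a cycle.
                List<String> cycle = new ArrayList<>(path.subList(idx, path.size()));
                cycle.add(next);
                return cycle;
            }

            if (!done.contains(next)) {
                List<String> res = dfs(waitFor, next, path, done);
                if (!res.isEmpty())
                    return res;
            }
        }

        // No cycle through this vertex: backtrack and mark it fully explored.
        path.remove(path.size() - 1);
        done.add(tx);
        return Collections.emptyList();
    }

    public static void main(String[] args) {
        Map<String, Set<String>> waitFor = new HashMap<>();
        waitFor.put("tx1", Collections.singleton("tx2"));
        waitFor.put("tx2", Collections.singleton("tx1"));

        // Each update of the graph can be followed by a call like this one.
        System.out.println(findCycle(waitFor, "tx1"));
    }
}
```

Because the search can be rerun after every graph update, detection can stop as soon as the first cycle appears, without waiting for responses from the remaining candidates.
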

The implementation requires test coverage for the following cases:
* Different nodes that start the deadlocked transactions (all from one node (client/server), all from different nodes (client/server), a mix).
* Different nodes that start the transaction with a timeout (server/client near node, server/client non-near node).
* More than one cycle (waiting for each other or independent).
* Transitive transactions waiting for each other and eventually waiting for a deadlocked transaction.

Problems to be solved:
* Deadlock detector behaviour in case of topology changes and transaction remapping.
* Deadlock detector behaviour when a remote request fails.

> Need to implement deadlock detection
> ------------------------------------
>
>                 Key: IGNITE-2854
>                 URL: https://issues.apache.org/jira/browse/IGNITE-2854
>             Project: Ignite
>          Issue Type: New Feature
>          Components: cache
>    Affects Versions: 1.5.0.final
>            Reporter: Valentin Kulichenko
>            Assignee: Andrey Gura
>             Fix For: 1.6
>
>
> Currently, if a transactional deadlock occurs, there is no easy way to find
> out which locks were reordered.
> We need to add a mechanism that will collect information about awaiting
> candidates, analyze it and show the guilty keys. Most likely this should be
> implemented with the help of a custom discovery message.
> In addition, we should automatically execute this mechanism if a transaction
> times out and add the information to the timeout exception.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)