[ https://issues.apache.org/jira/browse/CLOUDSTACK-9595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15693970#comment-15693970 ]
ASF GitHub Bot commented on CLOUDSTACK-9595: -------------------------------------------- Github user serg38 commented on the issue: https://github.com/apache/cloudstack/pull/1762 Here it is few samples of deadlocks we observe in high transaction volume environment with multiple management servers. As you can see most of them are concurrent operations from different management servers and either select or select for update statements. The following 4 types account for the majority of deadlock s we saw so far ( 80-90% of all deadlocks). Deadlock 1-3 happens much more often than deadlock 4. It is next to impossible to reproduce since they occur one in few days with 4 management servers and average VM deployment volume of 3000 a day. Deadlock type 1: InnoDB: transactions deadlock detected, dumping detailed information. 151217 3:08:20 *** (1) TRANSACTION: TRANSACTION BB4D4C91D, ACTIVE 0 sec fetching rows mysql tables in use 1, locked 1 LOCK WAIT 11 lock struct(s), heap size 3112, 5 row lock(s) MySQL thread id 47654, OS thread handle 0x7f0475bdd700, query id 3821358107 ussclpdcsmgt012.autodesk.com 10.41.13.14 cloud Sending data SELECT host.id, host.disconnected, host.name, host.status, host.type, host.private_ip_address, host.private_mac_address, host.private_netmask, host.public_netmask, host.public_ip_address, host.public_mac_address, host.storage_ip_address, host.cluster_id, host.storage_netmask, host.storage_mac_address, host.storage_ip_address_2, host.storage_netmask_2, host.storage_mac_address_2, host.hypervisor_type, host.proxy_port, host.resource, host.fs_type, host.available, host.setup, host.resource_state, host.hypervisor_version, host.update_count, host.uuid, host.data_center_id, host.pod_id, host.cpu_sockets, host.cpus, host.url, host.speed, host.ram, host.parent, host.guid, host.capabilities, host.total_size, host.last_ping, host.mgmt_server_id, host.dom0_memory, host.version, host.created, host.removed FROM host WHERE host.resource IS NOT NULL AND host.mgmt_server_id = 345048964870 *** (1) WAITING FOR THIS LOCK TO BE GRANTED: *** (2) TRANSACTION: TRANSACTION BB4D4C915, ACTIVE 1 sec fetching rows, thread declared inside InnoDB 449 mysql tables in use 3, locked 3 29 lock struct(s), heap size 6960, 15 row lock(s), undo log entries 1 MySQL thread id 47623, OS thread handle 0x7f0a47074700, query id 3821724056 ussclpdcsmgt013.autodesk.com 10.41.13.15 cloud Copying to tmp table SELECT host.id, host.disconnected, host.name, host.status, host.type, host.private_ip_address, host.private_mac_address, host.private_netmask, host.public_netmask, host.public_ip_address, host.public_mac_address, host.storage_ip_address, host.cluster_id, host.storage_netmask, host.storage_mac_address, host.storage_ip_address_2, host.storage_netmask_2, host.storage_mac_address_2, host.hypervisor_type, host.proxy_port, host.resource, host.fs_type, host.available, host.setup, host.resource_state, host.hypervisor_version, host.update_count, host.uuid, host.data_center_id, host.pod_id, host.cpu_sockets, host.cpus, host.url, host.speed, host.ram, host.parent, host.guid, host.capabilities, host.total_size, host.last_ping, host.mgmt_server_id, host.dom0_memory, host.version, host.created, host.removed FROM host LEFT OUTER JOIN op_host_transfer ON host.id=op_host_transfer.id *** (2) HOLDS THE LOCK(S): RECORD LOCKS space id 0 page no 147488 n bits 840 index `i_host__removed` of table `cloud`.`host` trx id BB4D4C915 lock_mode X locks rec but not gap Deadlock 2: InnoDB: transactions deadlock detected, dumping detailed information. 151218 11:03:00 *** (1) TRANSACTION: TRANSACTION BBB232C81, ACTIVE 51 sec starting index read mysql tables in use 1, locked 1 LOCK WAIT 3 lock struct(s), heap size 1248, 2 row lock(s) MySQL thread id 57308, OS thread handle 0x7f0a45c24700, query id 5217973695 ussclpdcsmgt013.autodesk.com 10.41.13.15 cloud Sending data SELECT resource_count.id, resource_count.type, resource_count.account_id, resource_count.domain_id, resource_count.count FROM resource_count WHERE resource_count.id IN (5083,4867,5079,33652,5077) FOR UPDATE *** (1) WAITING FOR THIS LOCK TO BE GRANTED: *** (2) TRANSACTION: TRANSACTION BBB2254AC, ACTIVE 116 sec starting index read, thread declared inside InnoDB 500 mysql tables in use 1, locked 1 207 lock struct(s), heap size 31160, 1650 row lock(s), undo log entries 2 MySQL thread id 56926, OS thread handle 0x7f04756c9700, query id 5218549710 ussclpdcsmgt014.autodesk.com 10.41.13.16 cloud Sending data SELECT resource_count.id, resource_count.type, resource_count.account_id, resource_count.domain_id, resource_count.count FROM resource_count WHERE resource_count.id IN (5083,4867,5079,33652,5077) FOR UPDATE Deadlock 3: ** (1) TRANSACTION: TRANSACTION BBB232C81, ACTIVE 51 sec starting index read mysql tables in use 1, locked 1 LOCK WAIT 3 lock struct(s), heap size 1248, 2 row lock(s) MySQL thread id 57308, OS thread handle 0x7f0a45c24700, query id 5217973695 ussclpdcsmgt013.autodesk.com 10.41.13.15 cloud Sending data SELECT resource_count.id, resource_count.type, resource_count.account_id, resource_count.domain_id, resource_count.count FROM resource_count WHERE resource_count.id IN (5083,4867,5079,33652,5077) FOR UPDATE *** (1) WAITING FOR THIS LOCK TO BE GRANTED: *** (2) TRANSACTION: TRANSACTION BBB2254AC, ACTIVE 116 sec starting index read, thread declared inside InnoDB 500 mysql tables in use 1, locked 1 207 lock struct(s), heap size 31160, 1650 row lock(s), undo log entries 2 MySQL thread id 56926, OS thread handle 0x7f04756c9700, query id 5218549710 ussclpdcsmgt014.autodesk.com 10.41.13.16 cloud Sending data SELECT resource_count.id, resource_count.type, resource_count.account_id, resource_count.domain_id, resource_count.count FROM resource_count WHERE resource_count.id IN (5083,4867,5079,33652,5077) FOR UPDATE *** (2) HOLDS THE LOCK(S): Deadlock 4: *** (1) TRANSACTION: TRANSACTION C3BDD81EF, ACTIVE 0 sec fetching rows mysql tables in use 1, locked 1 LOCK WAIT 55 lock struct(s), heap size 6960, 3 row lock(s), undo log entries 1 MySQL thread id 250487, OS thread handle 0x7f0a460b6700, query id 32833273614 ussclpdcsmgt013.autodesk.com 10.41.13.15 cloud updating DELETE FROM vm_reservation WHERE vm_reservation.vm_id = 869089 *** (1) WAITING FOR THIS LOCK TO BE GRANTED: *** (2) TRANSACTION: TRANSACTION C3BDD81FC, ACTIVE 0 sec inserting, thread declared inside InnoDB 500 mysql tables in use 1, locked 1 4 lock struct(s), heap size 1248, 2 row lock(s), undo log entries 2 MySQL thread id 250553, OS thread handle 0x7f0a3720c700, query id 32833273762 ussclpdcsmgt013.autodesk.com 10.41.13.15 cloud update INSERT INTO volume_reservation (volume_reservation.vm_reservation_id, volume_reservation.vm_id, volume_reservation.volume_id, volume_reservation.pool_id) VALUES (997419, 1009484, 918449, 316) *** (2) HOLDS THE LOCK(S): > Transactions are not getting retried in case of database deadlock errors > ------------------------------------------------------------------------ > > Key: CLOUDSTACK-9595 > URL: https://issues.apache.org/jira/browse/CLOUDSTACK-9595 > Project: CloudStack > Issue Type: Bug > Security Level: Public(Anyone can view this level - this is the > default.) > Affects Versions: 4.8.0 > Reporter: subhash yedugundla > Fix For: 4.8.1 > > > Customer is seeing occasional error 'Deadlock found when trying to get lock; > try restarting transaction' messages in their management server logs. It > happens regularly at least once a day. The following is the error seen > 2015-12-09 19:23:19,450 ERROR [cloud.api.ApiServer] > (catalina-exec-3:ctx-f05c58fc ctx-39c17156 ctx-7becdf6e) unhandled exception > executing api command: [Ljava.lang.String;@230a6e7f > com.cloud.utils.exception.CloudRuntimeException: DB Exception on: > com.mysql.jdbc.JDBC4PreparedStatement@74f134e3: DELETE FROM > instance_group_vm_map WHERE instance_group_vm_map.instance_id = 941374 > at com.cloud.utils.db.GenericDaoBase.expunge(GenericDaoBase.java:1209) > at sun.reflect.GeneratedMethodAccessor360.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:317) > at > org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:183) > at > org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150) > at > com.cloud.utils.db.TransactionContextInterceptor.invoke(TransactionContextInterceptor.java:34) > at > org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:161) > at > org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:91) > at > org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172) > at > org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204) > at com.sun.proxy.$Proxy237.expunge(Unknown Source) > at > com.cloud.vm.UserVmManagerImpl$2.doInTransactionWithoutResult(UserVmManagerImpl.java:2593) > at > com.cloud.utils.db.TransactionCallbackNoReturn.doInTransaction(TransactionCallbackNoReturn.java:25) > at com.cloud.utils.db.Transaction$2.doInTransaction(Transaction.java:57) > at com.cloud.utils.db.Transaction.execute(Transaction.java:45) > at com.cloud.utils.db.Transaction.execute(Transaction.java:54) > at > com.cloud.vm.UserVmManagerImpl.addInstanceToGroup(UserVmManagerImpl.java:2575) > at > com.cloud.vm.UserVmManagerImpl.updateVirtualMachine(UserVmManagerImpl.java:2332) -- This message was sent by Atlassian JIRA (v6.3.4#6332)