[ https://issues.apache.org/jira/browse/CLOUDSTACK-6674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rayees Namathponnan closed CLOUDSTACK-6674. ------------------------------------------- > [Automation] [DB lock] When KVM agent is alert state, agent never trying to > connect back > ----------------------------------------------------------------------------------------- > > Key: CLOUDSTACK-6674 > URL: https://issues.apache.org/jira/browse/CLOUDSTACK-6674 > Project: CloudStack > Issue Type: Bug > Security Level: Public(Anyone can view this level - this is the > default.) > Components: KVM > Affects Versions: 4.4.0 > Environment: KVM (RHEL 6.3) > 4.4-forward > Reporter: Rayees Namathponnan > Assignee: edison su > Priority: Critical > Fix For: 4.4.0 > > > This issue observed in automation run, when KVM agent is alert state, it > never tries to connect back, also observed SQL DB lock in MS log for > AgentConnectTaskPool > You need to restart agent manually to connect it back > Here is agent log > 2014-05-13 23:59:38,877 INFO [cloud.agent.Agent] (Agent-Handler-2:null) Lost > connection to the server. Dealing with the remaining > commands... > 2014-05-13 23:59:38,877 INFO [cloud.agent.Agent] (Agent-Handler-2:null) > Cannot connect because we still have 5 commands in progre > ss. > 2014-05-13 23:59:43,877 INFO [cloud.agent.Agent] (Agent-Handler-2:null) Lost > connection to the server. Dealing with the remaining > commands... > 2014-05-13 23:59:43,877 INFO [cloud.agent.Agent] (Agent-Handler-2:null) > Cannot connect because we still have 5 commands in progre > ss. > 2014-05-13 23:59:48,877 INFO [cloud.agent.Agent] (Agent-Handler-2:null) Lost > connection to the server. Dealing with the remaining > commands... > 2014-05-13 23:59:48,877 INFO [cloud.agent.Agent] (Agent-Handler-2:null) > Cannot connect because we still have 5 commands in progre > ss. > 2014-05-13 23:59:53,877 INFO [cloud.agent.Agent] (Agent-Handler-2:null) Lost > connection to the server. Dealing with the remaining > commands... > 2014-05-13 23:59:53,878 INFO [cloud.agent.Agent] (Agent-Handler-2:null) > Cannot connect because we still have 5 commands in progre > ss. > 2014-05-13 23:59:58,878 INFO [cloud.agent.Agent] (Agent-Handler-2:null) Lost > connection to the server. Dealing with the remaining > commands... > 2014-05-13 23:59:58,878 INFO [cloud.agent.Agent] (Agent-Handler-2:null) > Cannot connect because we still have 5 commands in progress. > ~ > MS log, please see the attached log for more info > 2014-05-13 20:07:50,170 DEBUG [c.c.v.VirtualMachineManagerImpl] > (API-Job-Executor-93:job-2240 ctx-f6efb16a) Destroying vm > VM[User|i-137-304-QA] > 2014-05-13 20:07:50,171 DEBUG [c.c.u.d.T.Transaction] > (API-Job-Executor-65:job-2223 ctx-a60be5ae) Rolling back the transaction: > Time = 291 Name = API-Job-Ex > ecutor-65; called by > -TransactionLegacy.rollback:903-TransactionLegacy.removeUpTo:846-TransactionLegacy.close:670-TransactionContextInterceptor.invoke:36-Ref > lectiveMethodInvocation.proceed:161-ExposeInvocationInterceptor.invoke:91-ReflectiveMethodInvocation.proceed:172-JdkDynamicAopProxy.invoke:204-$Proxy97.markA > sUnavailable:-1-IpAddressManagerImpl$10.doInTransaction:1639-IpAddressManagerImpl$10.doInTransaction:1620-Transaction$2.doInTransaction:49 > 2014-05-13 20:07:50,173 WARN [c.c.u.AccountManagerImpl] > (API-Job-Executor-65:job-2223 ctx-a60be5ae) Failed to cleanup account > Acct[20c063c7-9913-496d-983f-e > fe18a92a9dc-test-account-TestScaleVmDynamicServiceOffering-test_change_so_running_vm_dynamic_to_dynamic_1_ADMIN] > due to > com.cloud.utils.exception.CloudRuntimeException: DB Exception on: > com.mysql.jdbc.PreparedStatement@76384607: UPDATE user_ip_address SET > user_ip_address.state > ='Releasing' WHERE user_ip_address.id = 15 > at com.cloud.utils.db.GenericDaoBase.update(GenericDaoBase.java:851) > at com.cloud.utils.db.GenericDaoBase.update(GenericDaoBase.java:1327) > at > com.cloud.network.dao.IPAddressDaoImpl.markAsUnavailable(IPAddressDaoImpl.java:323) > at sun.reflect.GeneratedMethodAccessor536.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:317) > at > org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:183) > at > org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150) > at > com.cloud.utils.db.TransactionContextInterceptor.invoke(TransactionContextInterceptor.java:34) > at > org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:161) > at > org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:91) > at > org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172) > at > org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204) > at com.sun.proxy.$Proxy97.markAsUnavailable(Unknown Source) > at > com.cloud.network.IpAddressManagerImpl$10.doInTransaction(IpAddressManagerImpl.java:1639) > at > com.cloud.network.IpAddressManagerImpl$10.doInTransaction(IpAddressManagerImpl.java:1620) > at > com.cloud.utils.db.Transaction$2.doInTransaction(Transaction.java:49) > at com.cloud.utils.db.Transaction.execute(Transaction.java:37) > at com.cloud.utils.db.Transaction.execute(Transaction.java:46) > at > com.cloud.network.IpAddressManagerImpl.markIpAsUnavailable(IpAddressManagerImpl.java:1620) > at > org.apache.cloudstack.engine.orchestration.NetworkOrchestrator.cleanupNetworkResources(NetworkOrchestrator.java:2616) > at > org.apache.cloudstack.engine.orchestration.NetworkOrchestrator.destroyNetwork(NetworkOrchestrator.java:2177) > at > com.cloud.user.AccountManagerImpl.cleanupAccount(AccountManagerImpl.java:815) > at > com.cloud.user.AccountManagerImpl.deleteAccount(AccountManagerImpl.java:689) > at > com.cloud.user.AccountManagerImpl.deleteUserAccount(AccountManagerImpl.java:1464) > at sun.reflect.GeneratedMethodAccessor555.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > com.cloud.user.AccountManagerImpl.deleteAccount(AccountManagerImpl.java:689) > at > com.cloud.user.AccountManagerImpl.deleteUserAccount(AccountManagerImpl.java:1464) > at sun.reflect.GeneratedMethodAccessor555.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:317) > at > org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:183) > at > org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150) > at > org.apache.cloudstack.network.contrail.management.EventUtils$EventInterceptor.invoke(EventUtils.java:106) > at > org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:161) > at > com.cloud.event.ActionEventInterceptor.invoke(ActionEventInterceptor.java:51) > at > org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:161) > at > org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:91) > at > org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172) > at > org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204) > at com.sun.proxy.$Proxy100.deleteUserAccount(Unknown Source) > at > org.apache.cloudstack.region.RegionManagerImpl.deleteUserAccount(RegionManagerImpl.java:187) > at > org.apache.cloudstack.region.RegionServiceImpl.deleteUserAccount(RegionServiceImpl.java:121) > at > org.apache.cloudstack.api.command.admin.account.DeleteAccountCmd.execute(DeleteAccountCmd.java:104) > at com.cloud.api.ApiDispatcher.dispatch(ApiDispatcher.java:119) > at > com.cloud.api.ApiAsyncJobDispatcher.runJob(ApiAsyncJobDispatcher.java:108) > at > org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.runInContext(AsyncJobManagerImpl.java:496) > at > org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49) > at > org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56) > at > org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103) > at > org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53) > at > org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46) > at > org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.run(AsyncJobManagerImpl.java:453) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:744) > Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLTransactionRollbackException: > Deadlock found when trying to get lock; try restarting transaction > ===================================== > 140513 20:07:49 INNODB MONITOR OUTPUT > ===================================== > Per second averages calculated from the last 15 seconds > ---------- > SEMAPHORES > ---------- > OS WAIT ARRAY INFO: reservation count 86171, signal count 75053 > Mutex spin waits 0, rounds 1468094, OS waits 30100 > RW-shared spins 106570, OS waits 49606; RW-excl spins 8147, OS waits 6463 > ------------------------ > ------------------------ > LATEST FOREIGN KEY ERROR > ------------------------ > 140513 18:36:59 Transaction: > TRANSACTION 0 1892217502, ACTIVE 0 sec, process no 1449, OS thread id > 140580633499392 inserting, thread declared inside InnoDB 500 > mysql tables in use 1, locked 1 > 5 lock struct(s), heap size 1216, 2 row lock(s), undo log entries 1 > MySQL thread id 3304, query id 1818700312 localhost 127.0.0.1 cloud update > INSERT INTO op_ha_work (op_ha_work.instance_id, op_ha_work.mgmt_server_id, > op_ha_work.created, op_ha_work.state, op_ha_work.host_id, op_ha_work.taken, > op_ha_work.time_to_try, op_ha_work.type, op_ha_work.updated, op_ha_work.step, > op_ha_work.vm_type, op_ha_work.tried) VALUES (152, null, '2014-05-14 > 01:36:59', 'Stopped', 0, null, 1367218183, 'HA', 3, 'Investigating', 'User', > 1) > Foreign key constraint fails for table `cloud`.`op_ha_work`: > , > CONSTRAINT `fk_op_ha_work__host_id` FOREIGN KEY (`host_id`) REFERENCES > `host` (`id`) > Trying to add in child table, in index `i_op_ha_work__host_id` tuple: > DATA TUPLE: 2 fields; > 0: len 8; hex 0000000000000000; asc ;; 1: len 8; hex > 000000000000001e; asc ;; > But in parent table `cloud`.`host`, in index `PRIMARY`, > the closest match we can find is record: > PHYSICAL RECORD: n_fields 50; compact format; info bits 0 > 0: len 8; hex 0000000000000001; asc ;; 1: len 6; hex 000070c8e312; > asc p ;; 2: len 7; hex 000000296610fd; asc )f ;; 3: len 25; hex > 5261636b32486f737431312e6c61622e766d6f70732e636f6d; asc > Rack2Host11.lab.vmops.com;; 4: len 30; hex > 64333337373432652d653934642d343033302d626132392d336132653463; asc > d337742e-e94d-4030-ba29-3a2e4c;...(truncated); 5: len 5; hex 416c657274; asc > Alert;; 6: len 7; hex 526f7574696e67; asc Routing;; 7: len 30; hex > 31302e3232332e35302e3636202020202020202020202020202020202020; asc > 10.223.50.66 ;...(truncated); 8: len 15; hex > 3235352e3235352e3235352e313932; asc 255.255.255.192;; 9: len 17; hex > 62633a33303a35623a64343a35333a3233; asc bc:30:5b:d4:53:23;; 10: len 30; hex > 31302e3232332e35302e3636202020202020202020202020202020202020; asc > 10.223.50.66 ;...(truncated); 11: len 15; hex > 3235352e3235352e3235352e313932; asc 255.255.255.192;; 12: len 17; hex > 62633a33303a35623a64343a35333a3233; asc bc:30:5b:d4:53:23;; 13: SQL NULL; 14: > SQL NULL; 15: SQL NULL; 16: len 8; hex 0000000000000001; asc ;; 17: > len 30; hex 31302e3232332e35302e3636202020202020202020202020202020202020; asc > 10.223.50.66 ;...(truncated); 18: len 15; hex > 3235352e3235352e3235352e313932; asc 255.255.255.192;; 19: len 17; hex > 62633a33303a35623a64343a35333a3233; asc bc:30:5b:d4:53:23;; 20: SQL NULL; 21: > len 8; hex 0000000000000001; asc ;; 22: len 8; hex 0000000000000001; > asc ;; 23: len 4; hex 00000001; asc ;; 24: len 4; hex 00000004; > asc ;; 25: len 4; hex 000008d5; asc ;; 26: SQL NULL; 27: SQL NULL; > 28: len 3; hex 4b564d; asc KVM;; 29: SQL NULL; 30: len 8; hex > 00000003e430c000; asc 0 ;; 31: SQL NULL; 32: len 14; hex > 342e342e302d534e415053484f54; asc 4.4.0-SNAPSHOT;; 33: SQL NULL; 34: SQL > NULL; 35: len 12; hex 68766d2c736e617073686f74; asc hvm,snapshot;; 36: len > 30; hex 38313731636466622d306434372d333239632d393835382d383739633434; asc > 8171cdfb-0d47-329c-9858-879c44;...(truncated); 37: len 4; hex 00000001; asc > ;; 38: len 4; hex 00000000; asc ;; 39: len 8; hex 0000000000000000; asc > ;; 40: len 4; hex 517dfc67; asc Q} g;; 41: SQL NULL; 42: len 8; hex > 8000125154230a85; asc QT# ;; 43: len 8; hex 80001251542304fb; asc QT# > ;; 44: SQL NULL; 45: len 8; hex 0000000000000006; asc ;; 46: len 7; > hex 456e61626c6564; asc Enabled;; 47: SQL NULL; 48: SQL NULL; 49: len 8; hex > 44697361626c6564; asc Disabled;; > ------------------------ > LATEST DETECTED DEADLOCK > ------------------------ > 140513 20:07:49 > *** (1) TRANSACTION: > TRANSACTION 0 1899231769, ACTIVE 0 sec, process no 1449, OS thread id > 140580624189184 starting index read > mysql tables in use 1, locked 1 -- This message was sent by Atlassian JIRA (v6.3.4#6332)