[https://issues.apache.org/jira/browse/CLOUDSTACK-9595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15700328#comment-15700328]

ASF GitHub Bot commented on CLOUDSTACK-9595:
--------------------------------------------

Github user rafaelweingartner commented on the issue:

    https://github.com/apache/cloudstack/pull/1762
  
    @serg38, it is great that you found one of the methods that cause the deadlock problem: “com.cloud.host.dao.HostDaoImpl.findAndUpdateDirectAgentToLoad(long, Long, long)”.
    
    This method is certainly problematic. I would start by asking: (i) Does it need to open a transaction manually (at line 512)? Isn’t that the purpose of the “@DB” annotation? (ii) What is the objective of the method (“findAndUpdateDirectAgentToLoad”)? It looks too complicated, with too many accesses to the DB.
    
    The method “resetHosts” at line 517 looks for hosts that are “managed” by the current MS and are “Disconnected”, and marks them as not managed by any MS. That is, it sets “managementServerId = null” on the hosts whose status is “Disconnected”.
    
    Wouldn’t it be better to have a specific method/transaction just for this process? If we extract that chunk of code into an isolated method, couldn’t we access the DB atomically without locking, with a single statement such as “update hosts set managementServerId = null where ……”? If the method is isolated, I see no reason for locks here.
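    To make the proposal concrete, here is a minimal sketch of such an isolated method, written against plain JDBC. It is hypothetical: the table and column names (`host`, `mgmt_server_id`, `status`) and the class name are assumptions, not taken from HostDaoImpl. One caveat on “without locking”: InnoDB still takes row locks for the UPDATE, but with auto-commit on the statement is its own transaction, so those locks last only as long as the statement and nothing is read and locked in advance.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Hypothetical sketch: table/column names ("host", "mgmt_server_id", "status")
// are assumptions about the schema, not taken from HostDaoImpl.
final class HostResetSketch {
    // One conditional UPDATE replaces "SELECT ... FOR UPDATE, then update each row".
    static final String RESET_DISCONNECTED_SQL =
            "UPDATE host SET mgmt_server_id = NULL "
            + "WHERE mgmt_server_id = ? AND status = 'Disconnected'";

    private HostResetSketch() {}

    /**
     * Releases this management server's Disconnected hosts in a single statement.
     * With auto-commit on, the statement is its own transaction, so InnoDB holds
     * its row locks only until the statement finishes.
     *
     * @return number of hosts released
     */
    static int resetDisconnectedHosts(Connection conn, long managementServerId) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(RESET_DISCONNECTED_SQL)) {
            ps.setLong(1, managementServerId);
            return ps.executeUpdate();
        }
    }
}
```

    The returned row count also tells the caller how many hosts were released, without a separate SELECT.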
    
    A little further on, lines 527–546 contain another block that could be isolated. This block looks for clusters managed by the current MS. Then it searches those clusters for hosts that are not managed by the current MS (or not managed at all?). I did not understand that, because I have seen elsewhere in the code that we have a balancing approach; that is, we try to balance the number of hosts managed by each MS. This piece of code seems to work against that balancing process.
    
    Then, from line 551 onward (if the number of hosts found is still below the limit), it looks for hosts in clusters not managed by any MS. This block could also be isolated, and again, we might be able to do it without using locks.
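    One common way to claim unmanaged hosts without an explicit lock is a compare-and-set style conditional UPDATE, e.g. “UPDATE host SET mgmt_server_id = ? WHERE id = ? AND mgmt_server_id IS NULL”, where “1 row affected” means this MS won the host. The sketch below models that idea in memory (a ConcurrentHashMap stands in for the host table) so it can run without a database; the class, method, and column names are all hypothetical, and this is not how HostDaoImpl currently works.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory model of the conditional UPDATE
//   UPDATE host SET mgmt_server_id = ? WHERE id = ? AND mgmt_server_id IS NULL
// where "1 row affected" means this management server won the host.
final class HostClaimSketch {
    static final long UNMANAGED = -1L; // stands in for SQL NULL

    // hostId -> id of the management server that owns it
    private final Map<Long, Long> owners = new ConcurrentHashMap<>();

    void addUnmanagedHost(long hostId) {
        owners.put(hostId, UNMANAGED);
    }

    Long ownerOf(long hostId) {
        return owners.get(hostId);
    }

    /** Claims up to {@code limit} unmanaged hosts for {@code msId} without an explicit lock. */
    List<Long> claimUnmanaged(long msId, int limit) {
        List<Long> claimed = new ArrayList<>();
        for (Long hostId : owners.keySet()) {
            if (claimed.size() >= limit) {
                break;
            }
            // Atomic compare-and-set: succeeds only if the host is still unmanaged,
            // so two servers racing for the same host cannot both win it.
            if (owners.replace(hostId, UNMANAGED, msId)) {
                claimed.add(hostId);
            }
        }
        return claimed;
    }
}
```

    A server that loses the race simply sees the replace (or the UPDATE's row count) fail and moves on to the next candidate, so no rows need to be locked while the candidate list is being read.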
    
    My final comment: even if we choose not to refactor and improve this piece of code, there is one thing that seems very strange to me. The method “findAndUpdateDirectAgentToLoad” is annotated with “@DB” and also opens and tries to manage a transaction manually. On top of that, every block of code I mentioned calls other methods that are also annotated with “@DB”. Can this cause a problem?
    
    For instance, when I use Spring and methods of a service layer (where I configure my transaction policy) call one another, they all share the transaction opened when the first service-layer method was called, unless configured otherwise (Spring’s default REQUIRED propagation). How does this work here in ACS?
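    To make the Spring comparison concrete, here is a simplified, hypothetical model of REQUIRED-style propagation: a per-thread depth counter where only the outermost call begins and commits, and nested calls join the open transaction. It is not CloudStack’s @DB implementation (how ACS behaves is exactly the open question); the class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of REQUIRED-style propagation (Spring's default):
// the outermost call opens the transaction, nested calls join it, and only the
// outermost call commits or rolls back. This is NOT CloudStack's @DB implementation.
final class NestedTxSketch {
    interface Work {
        void run();
    }

    // Per-thread nesting depth; 0 means no transaction is active on this thread.
    private static final ThreadLocal<int[]> DEPTH = ThreadLocal.withInitial(() -> new int[1]);

    // Records what would happen at the database level, for illustration.
    static final List<String> log = new ArrayList<>();

    private NestedTxSketch() {}

    static void inTransaction(String caller, Work work) {
        int[] depth = DEPTH.get();
        boolean outermost = depth[0] == 0;
        log.add(outermost ? "BEGIN by " + caller : caller + " joins the existing transaction");
        depth[0]++;
        boolean succeeded = false;
        try {
            work.run();
            succeeded = true;
        } finally {
            depth[0]--;
            if (outermost) {
                log.add(succeeded ? "COMMIT" : "ROLLBACK");
            }
        }
    }
}
```

    Wrapping a nested “resetHosts” call inside an outer “findAndUpdateDirectAgentToLoad” call produces a single BEGIN and a single COMMIT. The model is deliberately simplified: if the outer call catches an inner exception it still commits, whereas Spring marks the shared transaction rollback-only and fails the outer commit.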



> Transactions are not getting retried in case of database deadlock errors
> ------------------------------------------------------------------------
>
>                 Key: CLOUDSTACK-9595
>                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-9595
>             Project: CloudStack
>          Issue Type: Bug
>      Security Level: Public (Anyone can view this level - this is the default.)
>    Affects Versions: 4.8.0
>            Reporter: subhash yedugundla
>             Fix For: 4.8.1
>
>
> The customer is seeing recurring 'Deadlock found when trying to get lock; try restarting transaction' errors in their management server logs, at least once a day. The following is the error seen:
> 2015-12-09 19:23:19,450 ERROR [cloud.api.ApiServer] (catalina-exec-3:ctx-f05c58fc ctx-39c17156 ctx-7becdf6e) unhandled exception executing api command: [Ljava.lang.String;@230a6e7f
> com.cloud.utils.exception.CloudRuntimeException: DB Exception on: com.mysql.jdbc.JDBC4PreparedStatement@74f134e3: DELETE FROM instance_group_vm_map WHERE instance_group_vm_map.instance_id = 941374
>       at com.cloud.utils.db.GenericDaoBase.expunge(GenericDaoBase.java:1209)
>       at sun.reflect.GeneratedMethodAccessor360.invoke(Unknown Source)
>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:606)
>       at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:317)
>       at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:183)
>       at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150)
>       at com.cloud.utils.db.TransactionContextInterceptor.invoke(TransactionContextInterceptor.java:34)
>       at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:161)
>       at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:91)
>       at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
>       at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204)
>       at com.sun.proxy.$Proxy237.expunge(Unknown Source)
>       at com.cloud.vm.UserVmManagerImpl$2.doInTransactionWithoutResult(UserVmManagerImpl.java:2593)
>       at com.cloud.utils.db.TransactionCallbackNoReturn.doInTransaction(TransactionCallbackNoReturn.java:25)
>       at com.cloud.utils.db.Transaction$2.doInTransaction(Transaction.java:57)
>       at com.cloud.utils.db.Transaction.execute(Transaction.java:45)
>       at com.cloud.utils.db.Transaction.execute(Transaction.java:54)
>       at com.cloud.vm.UserVmManagerImpl.addInstanceToGroup(UserVmManagerImpl.java:2575)
>       at com.cloud.vm.UserVmManagerImpl.updateVirtualMachine(UserVmManagerImpl.java:2332)
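
The error text itself ("try restarting transaction") and the issue title point at the usual remedy: re-run the failed transaction. Below is a minimal Java sketch of that idea, assuming the MySQL driver's SQLException is still reachable through the cause chain of the CloudRuntimeException (not verified against CloudStack's code). `DeadlockRetrySketch` and `withDeadlockRetry` are hypothetical names, not CloudStack APIs. MySQL reports a deadlock as error 1213 (ER_LOCK_DEADLOCK) with SQLSTATE 40001.

```java
import java.sql.SQLException;
import java.util.concurrent.Callable;

// Hypothetical helper: re-runs a whole transaction when MySQL reports a deadlock.
final class DeadlockRetrySketch {
    static final int MYSQL_ER_LOCK_DEADLOCK = 1213; // "Deadlock found when trying to get lock"
    static final String SQLSTATE_SERIALIZATION_FAILURE = "40001";

    private DeadlockRetrySketch() {}

    /** Walks the cause chain, since the driver's SQLException may be wrapped in a runtime exception. */
    static boolean isDeadlock(Throwable t) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            if (c instanceof SQLException) {
                SQLException se = (SQLException) c;
                if (se.getErrorCode() == MYSQL_ER_LOCK_DEADLOCK
                        || SQLSTATE_SERIALIZATION_FAILURE.equals(se.getSQLState())) {
                    return true;
                }
            }
        }
        return false;
    }

    /**
     * Runs {@code transaction} up to {@code maxAttempts} times. Each attempt must open and
     * commit its own transaction, because InnoDB rolls back the whole victim transaction.
     */
    static <T> T withDeadlockRetry(Callable<T> transaction, int maxAttempts) throws Exception {
        for (int attempt = 1; ; attempt++) {
            try {
                return transaction.call();
            } catch (Exception e) {
                if (attempt >= maxAttempts || !isDeadlock(e)) {
                    throw e;
                }
                Thread.sleep(50L * attempt); // short, growing pause before retrying
            }
        }
    }
}
```

In the trace above, the natural unit to wrap would be the whole Transaction.execute(...) callback in UserVmManagerImpl.addInstanceToGroup rather than the single DELETE, since after a deadlock InnoDB has already rolled back everything that transaction did.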



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)