Github user rafaelweingartner commented on the issue:

    https://github.com/apache/cloudstack/pull/1762
  
    @serg38, it is great that you found one of the methods that cause the 
deadlock problem 
“com.cloud.host.dao.HostDaoImpl.findAndUpdateDirectAgentToLoad(long, Long, 
long)”.
    
    This method surely is problematic. I would first start asking, (i) does it 
need to manually open a transaction (at line 512)? Isn’t that the goal of 
“@DB” annotation? (ii) what is the objective of the method 
(“findAndUpdateDirectAgentToLoad”)? It is looking too complicated, with too 
many accesses to the DB.
    
    The method “resetHosts” at line 517 looks for hosts that are 
“managed” by the current MS and are “Disconnected” to mark them as 
unmanaged by any MS. That means, it updates the “managementServerId = null” 
of hosts marked as “Disconnect”.
    
    Would not it be better to have a specific method/transaction only for the 
aforementioned process?  If we extract that chunk of code to an isolated 
method, could not we have an atomic access to the DB without locking? “update 
set managementServerId = null from hosts where ……”; If the method is 
isolated I do not see reasons for locks here.
    
    A little further, there is another method which could be isolated, lines 
527 – 546. This block of code looks for clusters being managed by the current 
MS. Then, it searches for hosts of clusters that are managed by the current MS, 
which are not being managed by the current MS (or not managed at all?)? I did 
not understand that because I have seen in some other piece of code that we 
have a balancing approach; meaning that, we try to balance the number of hosts 
managed by an MS.  This piece of code seems to remove the balancing process.
    
    Then, at line 551 and forward (if the number of hosts is less than the 
limit), it tries to look for hosts of clusters not being managed by any MS. 
This block could also be an isolated one. And again, we might be able to do 
this process without using locks.
    
    My final comment, even if we choose not to refactor and improve this piece 
of code, there is one thing that is very strange for me. The method 
“findAndUpdateDirectAgentToLoad”  is annotated with “@DB”, and also 
opens and tries to manage a transaction manually. Then, we have all of the 
pieces of code I mentioned, all of them call other methods that also are 
annotated with “@DB”. Can this cause a problem?
    
    For instance, when I use Spring, methods from a service layer (the place 
where I configure my pattern of transactions) call one another, they will all 
use/share the same transaction opened when the first method of the service 
layer was called, unless specified otherwise. How will it work here in ACS?



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

Reply via email to