-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9133/#review15828
-----------------------------------------------------------



server/src/com/cloud/cluster/ClusterManagerImpl.java
<https://reviews.apache.org/r/9133/#comment34114>

    Will make it a separate method.



server/src/com/cloud/cluster/ClusterManagerImpl.java
<https://reviews.apache.org/r/9133/#comment34113>

    Scheduling the host scan task on the peer MSs is a best-effort operation 
during host add. The regular host scan runs at fixed intervals anyway.
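
    A minimal sketch of what "best effort" means here, assuming hypothetical 
names (Peer, scheduleHostScan, notifyPeers) that are not taken from the patch: 
a peer that cannot be notified is skipped instead of failing the add-host call, 
since the periodic scan picks the host up later.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: these types and names are assumptions, not CloudStack code.
final class BestEffortPeerNotify {

    /** Stand-in for a peer management server that can be asked to start a host scan. */
    interface Peer {
        void scheduleHostScan() throws Exception;
    }

    /**
     * Asks every peer to schedule a host scan. A failure on one peer is logged
     * and ignored so that it never fails the caller. Returns how many peers
     * accepted the request.
     */
    static int notifyPeers(List<Peer> peers) {
        int notified = 0;
        for (Peer peer : peers) {
            try {
                peer.scheduleHostScan();
                notified++;
            } catch (Exception e) {
                // Best effort: the regular fixed-interval scan covers this peer later.
                System.err.println("Peer notification failed: " + e.getMessage());
            }
        }
        return notified;
    }

    public static void main(String[] args) {
        List<Peer> peers = new ArrayList<>();
        peers.add(() -> { });                                         // reachable peer
        peers.add(() -> { throw new Exception("peer unreachable"); }); // failing peer
        peers.add(() -> { });                                         // reachable peer
        System.out.println("Peers notified: " + notifyPeers(peers));
    }
}
```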



server/src/com/cloud/resource/ResourceManagerImpl.java
<https://reviews.apache.org/r/9133/#comment34112>

    Seconds. The value is passed to the lock method, which expects a timeout 
in seconds. Added a comment.
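
    One way to keep the unit visible at the call site, shown as a sketch 
only. The constant name, its value and the lock helper below are assumptions 
for illustration and use java.util.concurrent rather than the lock class in 
the patch.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Illustrative only: names and the timeout value are assumptions, not CloudStack code.
final class LockTimeoutUnits {

    // Putting the unit in the name documents it wherever the constant is used.
    static final int HOST_SCAN_LOCK_TIMEOUT_SECONDS = 5;

    /**
     * Stand-in for a lock method that expects its timeout in seconds.
     * Returns true if the lock was acquired within the timeout.
     */
    static boolean lock(ReentrantLock lock, int timeoutSeconds) {
        try {
            return lock.tryLock(timeoutSeconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            // Preserve the interrupt for the caller and report failure.
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        ReentrantLock scanLock = new ReentrantLock();
        if (lock(scanLock, HOST_SCAN_LOCK_TIMEOUT_SECONDS)) {
            try {
                System.out.println("Lock acquired, running host scan");
            } finally {
                scanLock.unlock();
            }
        }
    }
}
```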



server/src/com/cloud/resource/ResourceManagerImpl.java
<https://reviews.apache.org/r/9133/#comment34111>

    fixed



server/src/com/cloud/resource/ResourceManagerImpl.java
<https://reviews.apache.org/r/9133/#comment34115>

    Created a new method createHostAndAgentDeferred. There is some common code 
in createHostAndAgent and createHostAndAgentDeferred, but I feel it is better 
to clean it up as part of a separate commit.



server/src/com/cloud/resource/ResourceManagerImpl.java
<https://reviews.apache.org/r/9133/#comment34116>

    Created a new method createHostAndAgentDeferred.
    
    There is some code duplication between createHostAndAgent and 
createHostAndAgentDeferred, but I feel it is better to do the cleanup as part 
of a separate commit.
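
    For context, the decision described in the review request (create the 
agent immediately only for the first host in a cluster, otherwise create just 
the database entry and leave the connection to the host scan) can be sketched 
roughly as below. The class, enum and method names are hypothetical and do not 
reflect the actual body of createHostAndAgentDeferred.

```java
// Illustrative only: a sketch of the add-host decision, not the patch itself.
final class DeferredHostAdd {

    enum AddAction {
        CREATE_AGENT_NOW,        // first host in the cluster: connect right away
        DB_ENTRY_ONLY_THEN_SCAN  // later hosts: DB entry only, host scan connects them
    }

    /** Chooses the action from the number of hosts already present in the cluster. */
    static AddAction decide(int hostsAlreadyInCluster) {
        if (hostsAlreadyInCluster < 0) {
            throw new IllegalArgumentException("host count cannot be negative");
        }
        return hostsAlreadyInCluster == 0
                ? AddAction.CREATE_AGENT_NOW
                : AddAction.DB_ENTRY_ONLY_THEN_SCAN;
    }

    public static void main(String[] args) {
        System.out.println(decide(0)); // first host in the cluster
        System.out.println(decide(1)); // cluster already has a host
    }
}
```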



server/src/com/cloud/resource/ResourceManagerImpl.java
<https://reviews.apache.org/r/9133/#comment34110>

    fixed


- Koushik Das


On Jan. 29, 2013, 1:38 p.m., Koushik Das wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/9133/
> -----------------------------------------------------------
> 
> (Updated Jan. 29, 2013, 1:38 p.m.)
> 
> 
> Review request for cloudstack, Abhinandan Prateek and Alex Huang.
> 
> 
> Description
> -------
> 
> The issue happens randomly when hosts in a cluster get distributed across 
> multiple MSs. Hosts can get split in the following scenarios:
>     a. Add host – the MS on which add host is executed takes ownership of the 
> host. So if 2 hosts belonging to the same cluster are added from 2 different 
> MSs, the cluster gets split.
>     b. scanDirectAgentToLoad – this runs every 90 secs and checks whether 
> there are any hosts that need to be reconnected. The current host scan logic 
> can also lead to a split.
>     
>     The idea is to fix (b) to ensure that hosts in a cluster are managed by 
> the same MS. For (a), only the database entry is created, unless the host 
> being added is the first in the cluster (in which case agent creation happens 
> at the same time); (b) then takes care of the connection and agent creation. 
> Since addHost now only creates an entry in the db, there is a small window in 
> which the host state is shown as 'Alert' until (b) is scheduled and picks up 
> the host to make a connection. The MS doing the add host immediately schedules 
> a scan task and also sends a notification to its peers to start the scan 
> task.
> 
> 
> This addresses bug CLOUDSTACK-606.
> 
> 
> Diffs
> -----
> 
>   api/src/com/cloud/agent/api/ScheduleHostScanTaskCommand.java PRE-CREATION 
>   server/src/com/cloud/agent/manager/ClusteredAgentManagerImpl.java ca0bf5c 
>   server/src/com/cloud/cluster/ClusterManagerImpl.java e341b88 
>   server/src/com/cloud/host/dao/HostDaoImpl.java 0881675 
>   server/src/com/cloud/resource/ResourceManagerImpl.java f82424a 
> 
> Diff: https://reviews.apache.org/r/9133/diff/
> 
> 
> Testing
> -------
> 
> Manually tested the following scenarios:
> 
> - Added hostA in cluster1 from MS1; it gets owned by MS1 as the first host in 
> the cluster. Added hostB in the same cluster1 from MS2. Once both hosts are in 
> 'Up' state, ensured that they are owned by the same MS (i.e. MS1).
> - Error scenarios where a host goes to disconnected, alert or down state 
> (host disconnected from the network) and is then reconnected (connected to 
> the network). Ensured that once reconnected, the host is owned by the same MS 
> as the other hosts in the cluster.
> - Set up a scenario where hosts are already in a distributed state (before 
> the fix, added hosts to the same cluster from different MSs) and ensured that 
> after applying the patch and restarting the MSs, distribution happens properly.
> - Did basic validation in a single MS setup, added multiple hosts in a 
> cluster and created VMs on them.
> 
> 
> Thanks,
> 
> Koushik Das
> 
>
