-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9133/#review15828
-----------------------------------------------------------


server/src/com/cloud/cluster/ClusterManagerImpl.java
<https://reviews.apache.org/r/9133/#comment34114>

    Will make it a separate method.


server/src/com/cloud/cluster/ClusterManagerImpl.java
<https://reviews.apache.org/r/9133/#comment34113>

    Scheduling the host scan task on a peer MS is a best-effort operation during host add. The regular host scan happens at fixed intervals anyway.


server/src/com/cloud/resource/ResourceManagerImpl.java
<https://reviews.apache.org/r/9133/#comment34112>

    Seconds. The value is passed to the lock method, which expects a timeout in seconds. Added a comment.


server/src/com/cloud/resource/ResourceManagerImpl.java
<https://reviews.apache.org/r/9133/#comment34111>

    Fixed.


server/src/com/cloud/resource/ResourceManagerImpl.java
<https://reviews.apache.org/r/9133/#comment34115>

    Created a new method, createHostAndAgentDeferred. There is some common code in createHostAndAgent and createHostAndAgentDeferred, but I feel it is better to clean it up as part of a separate commit.


server/src/com/cloud/resource/ResourceManagerImpl.java
<https://reviews.apache.org/r/9133/#comment34116>

    Created a new method, createHostAndAgentDeferred. There is some code duplication between createHostAndAgent and createHostAndAgentDeferred, but I feel it is better to do the cleanup as part of a separate commit.


server/src/com/cloud/resource/ResourceManagerImpl.java
<https://reviews.apache.org/r/9133/#comment34110>

    Fixed.


- Koushik Das


On Jan. 29, 2013, 1:38 p.m., Koushik Das wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/9133/
> -----------------------------------------------------------
>
> (Updated Jan. 29, 2013, 1:38 p.m.)
>
>
> Review request for cloudstack, Abhinandan Prateek and Alex Huang.
>
>
> Description
> -------
>
> The issue happens randomly when the hosts in a cluster get distributed
> across multiple MSs. Hosts can get split in the following scenarios:
> a. Add host: the MS on which add host is executed takes ownership of the
> host. So if 2 hosts belonging to the same cluster are added from 2
> different MSs, the cluster gets split.
> b. scanDirectAgentToLoad: this runs every 90 secs and checks whether any
> hosts need to be reconnected. The current host scan logic can also lead
> to a split.
>
> The idea is to fix (b) to ensure that the hosts in a cluster are managed
> by the same MS. For (a), only the entry in the database is created, except
> when the host being added is the first in the cluster (in that case agent
> creation happens at the same time); (b) then takes care of the connection
> and agent creation part. Since addHost currently only creates an entry in
> the db, there is a small window in which the host state is shown as
> 'Alert', until (b) is scheduled and picks up the host to make a
> connection. The MS doing the add host immediately schedules a scan task
> and also sends a notification to its peers to start the scan task.
>
>
> This addresses bug CLOUDSTACK-606.
>
>
> Diffs
> -----
>
>   api/src/com/cloud/agent/api/ScheduleHostScanTaskCommand.java PRE-CREATION
>   server/src/com/cloud/agent/manager/ClusteredAgentManagerImpl.java ca0bf5c
>   server/src/com/cloud/cluster/ClusterManagerImpl.java e341b88
>   server/src/com/cloud/host/dao/HostDaoImpl.java 0881675
>   server/src/com/cloud/resource/ResourceManagerImpl.java f82424a
>
> Diff: https://reviews.apache.org/r/9133/diff/
>
>
> Testing
> -------
>
> Manually tested the following scenarios:
>
> - Added hostA to cluster1 from MS1; it gets owned by MS1 as the first host
> in the cluster. Added hostB to the same cluster1 from MS2. Once both hosts
> are in the 'Up' state, ensured that they are owned by the same MS (i.e.
> MS1).
> - Error scenarios where a host goes to the disconnected, alert or down
> state (host disconnected from the network) and is then reconnected
> (connected back to the network). Ensured that, once reconnected, the host
> is owned by the same MS as the other hosts in the cluster.
> - A scenario where hosts are already in a distributed state (before the
> fix, hosts were added to the same cluster from different MSs). Ensured
> that after applying the patch and restarting the MSs, distribution
> happens properly.
> - Did basic validation in a single-MS setup: added multiple hosts to a
> cluster and created VMs on them.
>
>
> Thanks,
>
> Koushik Das
>
>
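
For readers following the thread, the ownership rule described in the quoted request (all hosts in a cluster end up managed by the same MS) could be sketched roughly as below. This is only an illustration under stated assumptions, not the code in the patch: ClusterOwnerSketch, Host, pickOwner and the numeric MS ids are hypothetical names, and the rule "reuse the owner of any already-owned host in the same cluster, otherwise let the scanning MS take it" is my reading of the description rather than something the diff confirms.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: these names are hypothetical and are not taken from
// ResourceManagerImpl or ClusteredAgentManagerImpl.
final class ClusterOwnerSketch {

    /** Minimal stand-in for a host row; ownerMsId is null when no MS owns it. */
    static final class Host {
        final long id;
        final long clusterId;
        final Long ownerMsId;

        Host(long id, long clusterId, Long ownerMsId) {
            this.id = id;
            this.clusterId = clusterId;
            this.ownerMsId = ownerMsId;
        }
    }

    /**
     * Returns the MS that should connect to the candidate host: the owner of
     * any already-owned host in the same cluster, otherwise the scanning MS.
     */
    static long pickOwner(Host candidate, List<Host> hosts, long scanningMsId) {
        for (Host h : hosts) {
            boolean sameCluster = h.clusterId == candidate.clusterId;
            if (sameCluster && h.id != candidate.id && h.ownerMsId != null) {
                return h.ownerMsId;
            }
        }
        return scanningMsId;
    }

    public static void main(String[] args) {
        Host hostA = new Host(1, 10, 1L);   // first host in cluster 10, owned by MS 1
        Host hostB = new Host(2, 10, null); // added from MS 2, db entry only so far
        List<Host> hosts = Arrays.asList(hostA, hostB);
        System.out.println(pickOwner(hostB, hosts, 2L)); // prints 1
    }
}
```

This mirrors the first test scenario above: hostB is added from MS2 but follows hostA to MS1. A real implementation would also have to deal with locking and with owners that are themselves down, which the sketch ignores.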