> On Jan. 31, 2013, 6:32 p.m., Chiradeep Vittal wrote:
> > server/src/com/cloud/cluster/ClusterManagerImpl.java, line 371
> > <https://reviews.apache.org/r/9133/diff/3/?file=253825#file253825line371>
> >
> > If the cloud operator sees this WARNING, what is he supposed to do?
> > Should it be INFO? Should you tell him that it is safe to ignore?

What is the logging guideline when an exception is suppressed? I see that elsewhere in the code a warning is logged in similar situations. As long as we are consistent, I feel a warning is fine. I would interpret a warning as: some operation failed, but the system can recover from it.

- Koushik


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9133/#review15951
-----------------------------------------------------------


On Jan. 31, 2013, 9:10 a.m., Koushik Das wrote:
>
> (Updated Jan. 31, 2013, 9:10 a.m.)
>
>
> Review request for cloudstack, Abhinandan Prateek and Alex Huang.
>
>
> Description
> -------
>
> The issue happens randomly when hosts in a cluster get distributed across
> multiple MSs. Hosts can get split in the following scenarios:
> a. Add host: the MS on which add host is executed takes ownership of the
>    host. So if 2 hosts belonging to the same cluster are added from 2
>    different MSs, the cluster gets split.
> b. scanDirectAgentToLoad: this runs every 90 secs and checks whether there
>    are any hosts that need to be reconnected. The current host-scan logic
>    can also lead to a split.
>
> The idea is to fix (b) to ensure that hosts in a cluster are managed by the
> same MS. For (a), only the database entry is created, except when the host
> being added is the first in the cluster (in that case the agent is created
> at the same time); (b) then takes care of the connection and agent creation.
> Since addHost now only creates an entry in the db, there is a small window
> where the host state is shown as 'Alert' until (b) is scheduled and picks up
> the host to make a connection.
> The MS doing add host will immediately schedule a scan task and also send
> a notification to its peers to start the scan task.
>
>
> This addresses bug CLOUDSTACK-606.
>
>
> Diffs
> -----
>
>   api/src/com/cloud/agent/api/ScheduleHostScanTaskCommand.java PRE-CREATION
>   server/src/com/cloud/agent/manager/ClusteredAgentManagerImpl.java ca0bf5c
>   server/src/com/cloud/cluster/ClusterManagerImpl.java e341b88
>   server/src/com/cloud/host/dao/HostDaoImpl.java 0881675
>   server/src/com/cloud/resource/ResourceManagerImpl.java f82424a
>
> Diff: https://reviews.apache.org/r/9133/diff/
>
>
> Testing
> -------
>
> Manually tested the following scenarios:
>
> - Added hostA in cluster1 from MS1; it gets owned by MS1 as the first host
>   in the cluster. Added hostB in the same cluster1 from MS2. Once both hosts
>   were in 'Up' state, ensured that they are owned by the same MS (i.e. MS1).
> - Error scenarios where a host goes to disconnected, alert or down state
>   (host disconnected from the network) and is then reconnected (connected
>   back to the network). Ensured that once reconnected, the host is owned by
>   the same MS as the other hosts in the cluster.
> - A scenario where hosts are already distributed (hosts added to the same
>   cluster from different MSs before the fix); ensured that after applying
>   the patch and restarting the MSs, the hosts are redistributed properly.
> - Basic validation in a single-MS setup: added multiple hosts in a cluster
>   and created VMs on them.
>
>
> Thanks,
>
> Koushik Das
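The ownership rule behind fix (b) can be illustrated with a small in-memory sketch. It is not the actual patch: the Host class, selectHostsToLoad and hostIds are hypothetical names, and the real scan works against the host table in the database rather than a Java list. The idea shown: when an MS scans for hosts to connect, it skips any unowned host whose cluster already has a host owned by another MS, so a cluster is never split.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HostScanSketch {

    // Hypothetical, simplified view of a host row: its id, its cluster, and the
    // management server (MS) that currently owns it (null means unowned).
    public static final class Host {
        public final long id;
        public final long clusterId;
        public final Long ownerMsId;

        public Host(long id, long clusterId, Long ownerMsId) {
            this.id = id;
            this.clusterId = clusterId;
            this.ownerMsId = ownerMsId;
        }
    }

    // Picks the unowned hosts that MS `myMsId` should connect to. An unowned host
    // is taken only if its cluster has no owner yet or is already owned by this MS,
    // so one cluster never ends up spread across several MSs.
    public static List<Host> selectHostsToLoad(List<Host> hosts, long myMsId) {
        // Current owner of each cluster (first owned host seen wins).
        Map<Long, Long> clusterOwner = new HashMap<>();
        for (Host h : hosts) {
            if (h.ownerMsId != null) {
                clusterOwner.putIfAbsent(h.clusterId, h.ownerMsId);
            }
        }

        List<Host> toLoad = new ArrayList<>();
        for (Host h : hosts) {
            if (h.ownerMsId != null) {
                continue; // already managed by some MS
            }
            Long owner = clusterOwner.get(h.clusterId);
            if (owner == null || owner.longValue() == myMsId) {
                toLoad.add(h);
                // Claim the cluster so the remaining hosts in it follow this MS too.
                clusterOwner.put(h.clusterId, myMsId);
            }
            // Otherwise another MS owns this cluster; leave the host to that MS.
        }
        return toLoad;
    }

    public static List<Long> hostIds(List<Host> hosts) {
        List<Long> ids = new ArrayList<>();
        for (Host h : hosts) {
            ids.add(h.id);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Host> hosts = Arrays.asList(
                new Host(1, 1, 1L),    // cluster 1 is already owned by MS 1
                new Host(2, 1, null),  // unowned host in cluster 1
                new Host(3, 2, null),  // cluster 2 has no owner yet
                new Host(4, 3, 2L),    // cluster 3 is already owned by MS 2
                new Host(5, 3, null)); // unowned host in cluster 3
        System.out.println("MS 1 loads hosts " + hostIds(selectHostsToLoad(hosts, 1)));
        System.out.println("MS 2 loads hosts " + hostIds(selectHostsToLoad(hosts, 2)));
    }
}
```

Run as-is, MS 1 picks hosts 2 and 3 and MS 2 picks hosts 3 and 5. Both claim host 3 because cluster 2 has no owner in the shared snapshot, which is why a real implementation would need the claim to be atomic in the database (for example a conditional update) rather than an in-memory check. The sketch also does not rebalance clusters that are already split, as in the third testing scenario.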
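On the WARN-versus-INFO question in the review comment, one way to make a suppressed exception actionable is to keep the warning level but say in the message that the failure is recoverable and when the operator should care. This sketch is hypothetical: it assumes the exception suppressed at line 371 comes from notifying a peer MS to schedule a host scan (the review does not show that code), PeerNotifier and notifyPeerQuietly are made-up names, and it uses java.util.logging to stay self-contained rather than the logger CloudStack itself uses.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class SuppressedExceptionLogging {

    static final Logger LOG = Logger.getLogger(SuppressedExceptionLogging.class.getName());

    // Hypothetical stand-in for sending a "schedule host scan" request to a peer MS.
    interface PeerNotifier {
        void notifyPeer(String peerMsId) throws Exception;
    }

    // Tries to notify a peer. On failure the exception is suppressed (not rethrown)
    // and logged at WARNING with guidance for the operator. Returns true on success.
    static boolean notifyPeerQuietly(PeerNotifier notifier, String peerMsId) {
        try {
            notifier.notifyPeer(peerMsId);
            return true;
        } catch (Exception e) {
            LOG.log(Level.WARNING,
                    "Could not ask peer " + peerMsId + " to schedule a host scan. "
                    + "This is recoverable: the peer's periodic scan will still pick up the host. "
                    + "No action is needed unless this message keeps repeating.", e);
            return false;
        }
    }

    public static void main(String[] args) {
        boolean ok = notifyPeerQuietly(peer -> { }, "ms-2");
        boolean failed = notifyPeerQuietly(peer -> { throw new Exception("connection refused"); }, "ms-3");
        System.out.println("ms-2 notified: " + ok + ", ms-3 notified: " + failed);
    }
}
```

The design choice is to keep a warning, consistent with the rest of the code as noted in the reply, while answering the operator's "what do I do?" directly in the message text.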