Hi,

Currently there are a lot of Thread.sleep calls in the mock IaaS component, which make it slow and cause unexpected behavior due to concurrency issues. They also add significant overhead when running integration tests, since the mock IaaS is used for the test cases. I've been working on improving this component by making the following changes:
- Remove *all* Thread.sleep calls in the mock IaaS.
- Introduce a method named 'initialize' to start the event receivers and publishers. This is a synchronous call, which guarantees that the receiver and publisher objects have been created successfully when it returns. If not, it throws an exception and the startInstance() method call in CC fails. Earlier this task was delegated to an executor service, which made it difficult to check whether the mock instance had been created successfully.
- Create a topology receiver in the mock instance and listen for member initialized and member started events. The mock instance now publishes instance started and instance activated events based on the topology events received, rather than sleeping for some time interval before publishing.

After making those changes I faced multiple integration test failures. This was mainly because the integration tests relied heavily on Thread.sleep calls to assert various conditions. With these changes, the time taken for a mock instance/app to become active came down to milliseconds, so the test cases could not detect member status or app status correctly. Therefore I had to introduce a new non-blocking mechanism that checks app/member status using thread synchronization.

The average time taken for the complete integration test run is now around 16 mins (earlier it was more than 30 mins), which is close to a 50% reduction. Created JIRA at [1]. Following is a summary of changes.
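For reference, the event-driven status check mentioned above works roughly along these lines. This is only a simplified sketch and not the code in the commit: EventBus, ActivationListener and awaitActivation are made-up names standing in for the event receiver, the listener and the assert helper, and a plain CountDownLatch is assumed as the synchronization primitive.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class ActivationWaitSketch {

    /** Hypothetical listener interface; stands in for an application activated listener. */
    interface ActivationListener {
        void onActivated(String applicationId);
    }

    /** Hypothetical event source that allows listeners to be added and removed on demand. */
    static class EventBus {
        private final List<ActivationListener> listeners = new CopyOnWriteArrayList<>();

        void addEventListener(ActivationListener listener) {
            listeners.add(listener);
        }

        void removeEventListener(ActivationListener listener) {
            listeners.remove(listener);
        }

        int listenerCount() {
            return listeners.size();
        }

        void publishActivated(String applicationId) {
            for (ActivationListener listener : listeners) {
                listener.onActivated(applicationId);
            }
        }
    }

    /**
     * Waits until the given application is reported as activated, or until the
     * timeout expires. Returns true if the event arrived in time.
     */
    static boolean awaitActivation(EventBus bus, String applicationId, long timeoutMs,
                                   Runnable trigger) throws InterruptedException {
        CountDownLatch latch = new CountDownLatch(1);
        ActivationListener listener = id -> {
            if (applicationId.equals(id)) {
                latch.countDown();
            }
        };
        // Register before triggering so that a very fast activation is not missed.
        bus.addEventListener(listener);
        try {
            trigger.run();
            // Returns as soon as the event arrives instead of sleeping a fixed interval.
            return latch.await(timeoutMs, TimeUnit.MILLISECONDS);
        } finally {
            // Listeners registered on demand are removed at the end of the check.
            bus.removeEventListener(listener);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        EventBus bus = new EventBus();
        boolean active = awaitActivation(bus, "app-1", 5000,
                () -> new Thread(() -> bus.publishActivated("app-1")).start());
        System.out.println("activated=" + active);
    }
}
```

The listener is registered before the action is triggered and removed in a finally block, which is also why a removeEventListener method is needed on the message processor chains.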

AutoscalerTopologyEventReceiver
- Fix formatting and log messages
- ClusterInstanceTerminatedEventListener: check whether appMonitor is null before calling destroy() on the monitor

AutoscalerServiceImpl
- Fix formatting and log messages

ClusterStatusActiveProcessor
- Fix formatting and log messages (log cluster-instance-id)

GroupStatusActiveProcessor
- Fix formatting and log messages (log group-instance-id)
- Print groups map entries and cluster data holder map entries if debug is enabled

GroupStatusProcessor
- Fix formatting and log messages (log group-instance-id)

GroupStatusTerminatedProcessor
- Fix formatting and log messages (log group-instance-id)

CloudControllerServiceComponent
- Increase THREAD_POOL_SIZE from 10 to 20

TopologyBuilder
- Fix formatting and log messages
- Move acquire topology lock call outside of the try block

RestClient
- Add logs for every method to help troubleshoot integration failures

StratosTestServerManager
- Call waitForPort method with a restart timeout of 600000 ms. This is to avoid test failures due to slow builder machines.

RestConstants
- Add entity name: REPO_NOTIFY_NAME = "GitHook"

TopologyHandler
- Update timeout values:
  APPLICATION_ACTIVATION_TIMEOUT = 300000;
  APPLICATION_INACTIVATION_TIMEOUT = 120000;
  APPLICATION_UNDEPLOYMENT_TIMEOUT = 30000;
  MEMBER_TERMINATION_TIMEOUT = 120000;
  APPLICATION_TOPOLOGY_INIT_TIMEOUT = 20000;
- Increase executorService pool size from 10 to 30 to compensate for additional event receivers
- Event receivers with logs for events
  - healthStatEventReceiver: MemberFaultEvent
  - applicationsEventReceiver: ApplicationInstanceActivatedEventListener, ApplicationInstanceInactivatedEventListener
  - topologyEventReceiver: MemberActivatedEventListener, MemberTerminatedEventListener, ClusterInstanceActivatedEventListener, ClusterInstanceInactivateEventListener
- Added logs for every method to help troubleshoot integration failures
- Asynchronous mechanism to assertApplicationActiveStatus
- Asynchronous mechanism to assertApplicationInactiveStatus

Application
- Added log in getStatus() method to print the status of all application instances

ApplicationCreatedMessageProcessor
- Fix formatting and log messages

ApplicationDeletedMessageProcessor
- Fix formatting and log messages

ApplicationInstanceActivatedMessageProcessor
- Fix formatting and log messages (log app-instance-id)

ApplicationInstanceCreatedMessageProcessor
- Fix formatting and log messages (log app-instance-id)

ApplicationInstanceInactivatedMessageProcessor
- Fix formatting and log messages (log app-instance-id)

ApplicationInstanceTerminatedMessageProcessor
- Fix formatting and log messages (log app-instance-id)

ApplicationInstanceTerminatingMessageProcessor
- Fix formatting and log messages (log app-instance-id)

ClusterStatusClusterActivatedMessageProcessor
- Fix formatting and log messages (log cluster-instance-id)

ClusterStatusClusterInactivateMessageProcessor
- Fix formatting and log messages (log cluster-instance-id)

ClusterStatusClusterInstanceCreatedMessageProcessor
- Fix formatting and log messages (log cluster-instance-id)

ClusterStatusClusterResetMessageProcessor
- Fix formatting and log messages (log cluster-instance-id)

ClusterStatusClusterTerminatedMessageProcessor
- Fix formatting and log messages (log cluster-instance-id)

ClusterStatusClusterTerminatingMessageProcessor
- Fix formatting and log messages (log cluster-instance-id)

Introduce a removeEventListener method for the message processor chain to remove a registered event listener object. This is needed since integration tests will be registering event listeners on demand. Those listeners need to be removed at the end of the test case.
- ApplicationSignUpMessageProcessorChain
- ApplicationsMessageProcessorChain
- ClusterStatusMessageProcessorChain
- DomainMappingMessageProcessorChain
- HealthStatMessageProcessorChain
- InitializerMessageProcessorChain
- InstanceNotifierMessageProcessorChain
- InstanceStatusMessageProcessorChain
- TenantMessageProcessorChain
- TopologyMessageProcessorChain
- MessageProcessorChain
- ApplicationsEventMessageDelegator
- ApplicationsEventReceiver

InstanceNotifierEventReceiver
- synchronized blocks for execute() and terminate()
- eventSubscriber object creation moved from the execute() method to the constructor. This is to avoid a possible NPE when calling the terminate() method

MetadataApi
- Catch generic Exception instead of RegistryException

DataStore
- throws MetadataException in addition to RegistryException

MetadataApiRegistry
- throw new MetadataException instead of RegistryException

MockIaasServiceComponent
- Move mockIaasServiceUtil.startInstancesPersisted() to MockIaasServiceImpl()

MockIaasServiceImpl
- Start persisted mock instances in the constructor
- Remove all Thread.sleep calls

MockIaasServiceUtil
- Move startInstancesPersisted() method to MockIaasServiceImpl()

MockInstance
- Create TopologyEventReceiver object and register: MemberInitializedEventListener, MemberStartedEventListener, MemberMaintenanceListener
- Publish InstanceStartedEvent upon receiving MemberInitializedEvent
- Publish InstanceActivatedEvent upon receiving MemberStartedEvent
- Publish InstanceReadyToShutdownEvent upon receiving MemberMaintenanceModeEvent
- synchronized terminate() and initialize() methods
- Introduce an initialize() method to start event receivers and health stat publisher
- Introduce field 'memberStatus' of type MemberStatus to track the life-cycle of the mock instance

MockIaasServiceTest
- Start embedded MB in the unit test case, since the mockIaasService.startInstance call will start the event receivers as well

GitHookTestCase
- Make artifactUpdateEventCount an AtomicInteger
- Replace restClient.doPost call with restClient.addEntity call
- Terminate instanceNotifierEventReceiver at the end of the test case
- Shut down eventListenerExecutorService at the end of the test case

[1] https://issues.apache.org/jira/browse/STRATOS-1633

Thanks.

--
Akila Ravihansa Perera
WSO2 Inc.; http://wso2.com/
Blog: http://ravihansa3000.blogspot.com