[ 
https://issues.apache.org/jira/browse/STRATOS-706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14073185#comment-14073185
 ] 

Martin Eppel commented on STRATOS-706:
--------------------------------------

I checked the code and found a few instances of VM termination:
1. scale-down 
   - logging (at info level) is provided (see scale.drl, "[scale-down] Trying 
to terminating an instace to scale down!")
2. termination of obsolete member 
  - logging is provided in the prelude to the obsolete member check, but one 
could argue that the execution part of the rule lacks logging like that in 
scale.drl (see mincheck.drl, rule "Terminate Obsoleted Instances")
  - more info is available at debug level (RuleLog.debug)
3. Unregistering of subscription
  - logging seems to be missing when VMs (members) are terminated 
(CloudControllerServiceImpl.unregisterService)
4. Failed dependency check (only applies to the local POC grouping branch)
  - logging was missing when instances were terminated because of a failed 
dependency check; it has been added since
Are there other instances of VM termination that I might have missed?
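
To make the gaps listed above concrete, here is a minimal sketch of what a 
uniform, grep-friendly termination log line could look like. Everything in it 
is an assumption for illustration: the class TerminationLogSketch, the 
TerminationReason enum and its values, the "[member-terminate]" tag, and the 
field layout are hypothetical and are not existing Stratos code. It uses 
java.util.logging only to stay self-contained; Stratos itself uses a 
different logging setup.

```java
import java.util.logging.Logger;

public class TerminationLogSketch {

    /** Hypothetical reason codes, one per termination path listed above. */
    enum TerminationReason {
        SCALE_DOWN,
        OBSOLETE_MEMBER,
        SERVICE_UNREGISTERED,
        DEPENDENCY_CHECK_FAILED
    }

    private static final Logger LOG =
            Logger.getLogger(TerminationLogSketch.class.getName());

    /**
     * Builds a single line with a fixed event tag followed by key=value
     * fields, so that it can be filtered by event type, reason, or member.
     */
    static String formatTermination(String memberId, String clusterId,
                                    TerminationReason reason, String detail) {
        return String.format(
                "[member-terminate] reason=%s memberId=%s clusterId=%s detail=\"%s\"",
                reason, memberId, clusterId, detail);
    }

    /** Logs the termination at info level using the shared format. */
    static void logTermination(String memberId, String clusterId,
                               TerminationReason reason, String detail) {
        LOG.info(formatTermination(memberId, clusterId, reason, detail));
    }

    public static void main(String[] args) {
        // Placeholder identifiers, not taken from a real deployment.
        logTermination("example-member-1", "example-cluster",
                TerminationReason.OBSOLETE_MEMBER, "flagged by obsolete member check");
    }
}
```

If every termination path went through one helper like this, something such 
as grep "\[member-terminate\] reason=SCALE_DOWN" would isolate one category, 
and grepping on memberId would show why a given member went away.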


> member terminate event should log reason
> ----------------------------------------
>
>                 Key: STRATOS-706
>                 URL: https://issues.apache.org/jira/browse/STRATOS-706
>             Project: Stratos
>          Issue Type: Bug
>          Components: Autoscaler
>    Affects Versions: 4.0.0
>            Reporter: Martin Eppel
>             Fix For: 4.0.1
>
>
> When Stratos terminates a member it must log the reason for it. Ideally the 
> logging should be systematic enough so that one can grep for different 
> severity, or by member, or by event type or some other useful categorization.
> The justification for this defect is that it will greatly improve debugging 
> and troubleshooting capabilities. Without logging it is very difficult to 
> debug terminations of members.
>  
> For example, consider this sequence in the stratos log file:
>  
> ===================
> TID: [0] [STRATOS] [2014-07-15 09:58:48,654] DEBUG 
> {org.apache.stratos.cloud.controller.impl.CloudControllerServiceImpl} -  
> Received an instance spawn request : MemberContext [memberId=null, 
> nodeId=null, clusterId=cisco-gilan-appmgr-1.cisco-gil, cartridgeType=null, 
> privateIpAddress=null, publicIpAddress=null, allocatedIpAddress=null, 
> initTime=1405418328649, lbClusterId=null, networkPartitionId=OAM1] 
> {org.apache.stratos.cloud.controller.impl.CloudControllerServiceImpl}
> TID: [0] [STRATOS] [2014-07-15 09:58:48,654] DEBUG 
> {org.apache.stratos.cloud.controller.impl.CloudControllerServiceImpl} -  
> Payload: 
> SERVICE_NAME=cisco-gilan-appmgr,HOST_NAME=cisco-gilan-appmgr-1.qmog.cisco.com,MULTITENANT=false,TENANT_ID=-1234,TENANT_RANGE=-1234,CARTRIDGE_ALIAS=cisco-gilan-appmgr-1,CLUSTER_ID=cisco-gilan-appmgr-1.cisco-gil,CARTRIDGE_KEY=o1jbiPPmPWBgyNVM,DEPLOYMENT=default,REPO_URL=null,PORTS=9482,PUPPET_IP=PUPPET_IP,PUPPET_HOSTNAME=PUPPET_HOSTNAME,PUPPET_ENV=PUPPET_ENV,HEARTBEAT_AUTHKEY=20c9629a87f53ecdb5278d2ddb5a9d42,TRUSTSTORE_PASSWORD=wso2carbon,CEP_PORT=7611,MONITORING_SERVER_SECURE_PORT=0,MB_PORT=61616,OPENSTACK_COMPUTE_DNS=10.58.10.82,MB_IP=octl-01.qmog.cisco.com,QSB_PUPPET_ENVIR=,CEP_IP=octl-01.qmog.cisco.com,VSM_USER=admin,VEM_IP=192.168.66.43,ENABLE_DATA_PUBLISHER=false,MONITORING_SERVER_ADMIN_PASSWORD=xxxx,MONITORING_SERVER_IP=octl-01.qmog.cisco.com,VEM_USER=ubuntu,VEM_PWD=ubuntu,COMMIT_ENABLED=false,MONITORING_SERVER_ADMIN_USERNAME=xxxx,CERT_TRUSTSTORE=/opt/apache-stratos-cartridge-agent/security/client-truststore.jks,VSM_PWD=Starent123!,VSM_IP=192.168.66.2,MONITORING_SERVER_PORT=0,APPMGR_GITREPO=ssh://jenapper@10.58.10.189/home/jenapper/code/eccentrica.git,MEMBER_ID=cisco-gilan-appmgr-1.cisco-gil7ef7327f-2bb2-4768-820f-d064de29aa59,LB_CLUSTER_ID=null,NETWORK_PARTITION_ID=OAM1,PARTITION_ID=RegionOne-AZ-1
>  {org.apache.stratos.cloud.controller.impl.CloudControllerServiceImpl}
> TID: [0] [STRATOS] [2014-07-15 09:58:55,888]  INFO 
> {org.apache.stratos.cloud.controller.impl.CloudControllerServiceImpl} -  
> Member is terminated: MemberContext 
> [memberId=cisco-gilan-appmgr-1.cisco-gil407f5bdc-aad2-4234-80fc-6cdf17be6192, 
> nodeId=RegionOne/89433818-21ed-48d4-bd8f-c396ab30f6d2, 
> clusterId=cisco-gilan-appmgr-1.cisco-gil, cartridgeType=cisco-gilan-appmgr, 
> privateIpAddress=192.168.66.1, publicIpAddress=null, allocatedIpAddress=null, 
> initTime=1405417410736, lbClusterId=null, networkPartitionId=OAM1] 
> {org.apache.stratos.cloud.controller.impl.CloudControllerServiceImpl}
> ===================
>  
> The problem is that Stratos gives no indication of why it is doing this [1]. 
> Stratos should be enhanced so that the above message gives some indication of 
> *why* the member is being terminated (loss of heartbeats, timeout on port 
> knocking, etc.). This is needed as Apache Stratos expands its user base.
> This issue has high priority as it affects the efficiency of troubleshooting 
> and system stability.
>  



--
This message was sent by Atlassian JIRA
(v6.2#6252)
