Martin Eppel created STRATOS-706:
------------------------------------

             Summary: member terminate event should log reason
                 Key: STRATOS-706
                 URL: https://issues.apache.org/jira/browse/STRATOS-706
             Project: Stratos
          Issue Type: Bug
          Components: Autoscaler
    Affects Versions: 4.0.0
            Reporter: Martin Eppel
             Fix For: 4.0.1


When Stratos terminates a member, it must log the reason. Ideally the logging 
should be systematic enough that one can grep by severity, by member, by event 
type, or by some other useful category.
The justification for this defect is that it will greatly improve debugging and 
troubleshooting. Without such logging it is very difficult to debug member 
terminations.
 
For example, consider this sequence in the stratos log file:
 
===================
TID: [0] [STRATOS] [2014-07-15 09:58:48,654] DEBUG 
{org.apache.stratos.cloud.controller.impl.CloudControllerServiceImpl} -  
Received an instance spawn request : MemberContext [memberId=null, nodeId=null, 
clusterId=cisco-gilan-appmgr-1.cisco-gil, cartridgeType=null, 
privateIpAddress=null, publicIpAddress=null, allocatedIpAddress=null, 
initTime=1405418328649, lbClusterId=null, networkPartitionId=OAM1] 
{org.apache.stratos.cloud.controller.impl.CloudControllerServiceImpl}
TID: [0] [STRATOS] [2014-07-15 09:58:48,654] DEBUG 
{org.apache.stratos.cloud.controller.impl.CloudControllerServiceImpl} -  
Payload: 
SERVICE_NAME=cisco-gilan-appmgr,HOST_NAME=cisco-gilan-appmgr-1.qmog.cisco.com,MULTITENANT=false,TENANT_ID=-1234,TENANT_RANGE=-1234,CARTRIDGE_ALIAS=cisco-gilan-appmgr-1,CLUSTER_ID=cisco-gilan-appmgr-1.cisco-gil,CARTRIDGE_KEY=o1jbiPPmPWBgyNVM,DEPLOYMENT=default,REPO_URL=null,PORTS=9482,PUPPET_IP=PUPPET_IP,PUPPET_HOSTNAME=PUPPET_HOSTNAME,PUPPET_ENV=PUPPET_ENV,HEARTBEAT_AUTHKEY=20c9629a87f53ecdb5278d2ddb5a9d42,TRUSTSTORE_PASSWORD=wso2carbon,CEP_PORT=7611,MONITORING_SERVER_SECURE_PORT=0,MB_PORT=61616,OPENSTACK_COMPUTE_DNS=10.58.10.82,MB_IP=octl-01.qmog.cisco.com,QSB_PUPPET_ENVIR=,CEP_IP=octl-01.qmog.cisco.com,VSM_USER=admin,VEM_IP=192.168.66.43,ENABLE_DATA_PUBLISHER=false,MONITORING_SERVER_ADMIN_PASSWORD=xxxx,MONITORING_SERVER_IP=octl-01.qmog.cisco.com,VEM_USER=ubuntu,VEM_PWD=ubuntu,COMMIT_ENABLED=false,MONITORING_SERVER_ADMIN_USERNAME=xxxx,CERT_TRUSTSTORE=/opt/apache-stratos-cartridge-agent/security/client-truststore.jks,VSM_PWD=Starent123!,VSM_IP=192.168.66.2,MONITORING_SERVER_PORT=0,APPMGR_GITREPO=ssh://jenapper@10.58.10.189/home/jenapper/code/eccentrica.git,MEMBER_ID=cisco-gilan-appmgr-1.cisco-gil7ef7327f-2bb2-4768-820f-d064de29aa59,LB_CLUSTER_ID=null,NETWORK_PARTITION_ID=OAM1,PARTITION_ID=RegionOne-AZ-1
 {org.apache.stratos.cloud.controller.impl.CloudControllerServiceImpl}
TID: [0] [STRATOS] [2014-07-15 09:58:55,888]  INFO 
{org.apache.stratos.cloud.controller.impl.CloudControllerServiceImpl} -  Member 
is terminated: MemberContext 
[memberId=cisco-gilan-appmgr-1.cisco-gil407f5bdc-aad2-4234-80fc-6cdf17be6192, 
nodeId=RegionOne/89433818-21ed-48d4-bd8f-c396ab30f6d2, 
clusterId=cisco-gilan-appmgr-1.cisco-gil, cartridgeType=cisco-gilan-appmgr, 
privateIpAddress=192.168.66.1, publicIpAddress=null, allocatedIpAddress=null, 
initTime=1405417410736, lbClusterId=null, networkPartitionId=OAM1] 
{org.apache.stratos.cloud.controller.impl.CloudControllerServiceImpl}
===================
 
The problem is that Stratos gives no indication of why it is doing this [1]. 
Stratos should be enhanced so that the above message gives some indication of 
*why* the member is being terminated (loss of heartbeats, timeout on port 
knocking, etc.). This is needed as Apache Stratos expands its user base.
This issue has high priority because it affects the efficiency of 
troubleshooting and system stability.
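
As a rough illustration of the kind of message being asked for, here is a 
minimal sketch in Java. It is not taken from the Stratos code base: the class 
name TerminationLogSketch, the TerminationReason enum, the formatTermination 
helper and the key=value layout are all hypothetical. The first two reason 
codes mirror the examples above (loss of heartbeats, port knocking timeout); 
the other two are placeholders.

```java
// Hypothetical sketch, not Stratos code: one way to attach a reason to a
// member-terminated log line so that it can be grepped by field.
final class TerminationLogSketch {

    // Assumed reason codes. HEARTBEAT_LOST and PORT_KNOCK_TIMEOUT come from
    // the examples in this issue; SCALE_DOWN and UNKNOWN are illustrative.
    enum TerminationReason {
        HEARTBEAT_LOST,
        PORT_KNOCK_TIMEOUT,
        SCALE_DOWN,
        UNKNOWN
    }

    // Builds a single-line key=value message. A fixed field order and fixed
    // key names keep patterns such as "reason=HEARTBEAT_LOST" greppable.
    static String formatTermination(String memberId, String clusterId,
                                    TerminationReason reason, String detail) {
        return String.format(
                "event=MEMBER_TERMINATED memberId=%s clusterId=%s reason=%s detail=\"%s\"",
                memberId, clusterId, reason, detail);
    }

    public static void main(String[] args) {
        // In a real service this string would go to the logger at INFO level.
        System.out.println(formatTermination(
                "member-1", "cluster-1",
                TerminationReason.HEARTBEAT_LOST,
                "no heartbeat received within the configured timeout"));
    }
}
```

With a layout along these lines, something like 
grep 'event=MEMBER_TERMINATED' stratos.log | grep 'reason=HEARTBEAT_LOST' 
(the log file name is assumed) would list every termination caused by lost 
heartbeats, and the same approach works per member or per cluster.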
 




--
This message was sent by Atlassian JIRA
(v6.2#6252)
