[jira] [Commented] (STRATOS-706) member terminate event should log reason

Martin Eppel (meppel) (JIRA) Thu, 17 Jul 2014 00:06:08 -0700

    [ 
https://issues.apache.org/jira/browse/STRATOS-706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14064650#comment-14064650
 ]


Martin Eppel (meppel) commented on STRATOS-706:
-----------------------------------------------

Ok,

Let us verify whether the autoscaler logs satisfy the request for sufficient
logging. If they do, I’ll close the issue; otherwise, I’ll update the JIRA.
Thanks

Martin


From: Udara Liyanage [mailto:[email protected]]
Sent: Wednesday, July 16, 2014 9:24 PM
To: dev
Cc: [email protected]
Subject: Re: [jira] [Commented] (STRATOS-706) member terminate event should log reason

Hi Martin,

The job of the Cloud Controller (CC) is to spawn and terminate instances. The
Autoscaler (AS) is the one that decides what to start, when to start it, and
when to terminate it. So, as Nirmal said, have a look at the AS logs to find
the reason for a termination or spawn.
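To illustrate the division of labour described above, here is a minimal Java sketch. All names in it (AutoscalerTerminationSketch, CloudControllerClient, terminateInstance, TerminationReason) are hypothetical and are not the actual Stratos API. It only shows the idea that the component that decides (AS) records the reason before delegating the mechanics to the component that acts (CC).

```java
import java.util.logging.Logger;

// Hypothetical sketch: none of these names come from the Stratos code base.
public class AutoscalerTerminationSketch {

    private static final Logger LOG =
            Logger.getLogger(AutoscalerTerminationSketch.class.getName());

    // Hypothetical reasons, taken from the examples in the issue description.
    enum TerminationReason { HEARTBEAT_LOST, PORT_KNOCKING_TIMEOUT }

    // Stand-in for the CC API: it performs the termination but does not decide on it.
    interface CloudControllerClient {
        void terminateInstance(String memberId);
    }

    private final CloudControllerClient cc;

    AutoscalerTerminationSketch(CloudControllerClient cc) {
        this.cc = cc;
    }

    // Logs the decision (the "why") first, then asks CC to act (the "how").
    // Returns the logged message so callers can inspect it.
    String terminate(String memberId, TerminationReason reason) {
        String msg = "Terminating member: [member-id] " + memberId
                + " [reason] " + reason;
        LOG.info(msg);
        cc.terminateInstance(memberId);
        return msg;
    }

    public static void main(String[] args) {
        AutoscalerTerminationSketch as = new AutoscalerTerminationSketch(
                id -> System.out.println("CC: terminating " + id));
        as.terminate("member-1", TerminationReason.HEARTBEAT_LOST);
    }
}
```

With the reason logged at the decision point, the CC line "Member is terminated: ..." can stay reason-agnostic; one would correlate the two logs by member ID.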

On Thu, Jul 17, 2014 at 5:09 AM, Nirmal Fernando (JIRA) <[email protected]> wrote:

    [ 
https://issues.apache.org/jira/browse/STRATOS-706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14064337#comment-14064337
 ]

Nirmal Fernando commented on STRATOS-706:
-----------------------------------------

On Thu, Jul 17, 2014 at 1:11 AM, Martin Eppel (JIRA) <[email protected]>


All the log lines you quoted are from the Cloud Controller (CC). What CC does
is provide an API to terminate instances. The caller of this API, i.e. the
autoscaler, is the one that logs the reason for calling CC to terminate
instances. Did you check the autoscaler logs?




--
Best Regards,
Nirmal

Nirmal Fernando.
PPMC Member & Committer of Apache Stratos,
Senior Software Engineer, WSO2 Inc.

Blog: http://nirmalfdo.blogspot.com/





--
This message was sent by Atlassian JIRA
(v6.2#6252)



--

Udara Liyanage
Software Engineer
WSO2, Inc.: http://wso2.com
lean. enterprise. middleware
web: http://udaraliyanage.wordpress.com
phone: +94 71 443 6897


> member terminate event should log reason
> ----------------------------------------
>
>                 Key: STRATOS-706
>                 URL: https://issues.apache.org/jira/browse/STRATOS-706
>             Project: Stratos
>          Issue Type: Bug
>          Components: Autoscaler
>    Affects Versions: 4.0.0
>            Reporter: Martin Eppel
>             Fix For: 4.0.1
>
>
> When Stratos terminates a member, it must log the reason. Ideally, the
> logging should be systematic enough that one can grep by severity, by
> member, by event type, or by some other useful categorization.
> The justification for this defect is that it will greatly improve debugging
> and troubleshooting capabilities. Without such logging, it is very difficult
> to debug member terminations.
>  
> For example, consider this sequence in the stratos log file:
>  
> ===================
> TID: [0] [STRATOS] [2014-07-15 09:58:48,654] DEBUG 
> {org.apache.stratos.cloud.controller.impl.CloudControllerServiceImpl} -  
> Received an instance spawn request : MemberContext [memberId=null, 
> nodeId=null, clusterId=cisco-gilan-appmgr-1.cisco-gil, cartridgeType=null, 
> privateIpAddress=null, publicIpAddress=null, allocatedIpAddress=null, 
> initTime=1405418328649, lbClusterId=null, networkPartitionId=OAM1] 
> {org.apache.stratos.cloud.controller.impl.CloudControllerServiceImpl}
> TID: [0] [STRATOS] [2014-07-15 09:58:48,654] DEBUG 
> {org.apache.stratos.cloud.controller.impl.CloudControllerServiceImpl} -  
> Payload: 
> SERVICE_NAME=cisco-gilan-appmgr,HOST_NAME=cisco-gilan-appmgr-1.qmog.cisco.com,MULTITENANT=false,TENANT_ID=-1234,TENANT_RANGE=-1234,CARTRIDGE_ALIAS=cisco-gilan-appmgr-1,CLUSTER_ID=cisco-gilan-appmgr-1.cisco-gil,CARTRIDGE_KEY=o1jbiPPmPWBgyNVM,DEPLOYMENT=default,REPO_URL=null,PORTS=9482,PUPPET_IP=PUPPET_IP,PUPPET_HOSTNAME=PUPPET_HOSTNAME,PUPPET_ENV=PUPPET_ENV,HEARTBEAT_AUTHKEY=20c9629a87f53ecdb5278d2ddb5a9d42,TRUSTSTORE_PASSWORD=wso2carbon,CEP_PORT=7611,MONITORING_SERVER_SECURE_PORT=0,MB_PORT=61616,OPENSTACK_COMPUTE_DNS=10.58.10.82,MB_IP=octl-01.qmog.cisco.com,QSB_PUPPET_ENVIR=,CEP_IP=octl-01.qmog.cisco.com,VSM_USER=admin,VEM_IP=192.168.66.43,ENABLE_DATA_PUBLISHER=false,MONITORING_SERVER_ADMIN_PASSWORD=xxxx,MONITORING_SERVER_IP=octl-01.qmog.cisco.com,VEM_USER=ubuntu,VEM_PWD=ubuntu,COMMIT_ENABLED=false,MONITORING_SERVER_ADMIN_USERNAME=xxxx,CERT_TRUSTSTORE=/opt/apache-stratos-cartridge-agent/security/client-truststore.jks,VSM_PWD=Starent123!,VSM_IP=192.168.66.2,MONITORING_SERVER_PORT=0,APPMGR_GITREPO=ssh://[email protected]/home/jenapper/code/eccentrica.git,MEMBER_ID=cisco-gilan-appmgr-1.cisco-gil7ef7327f-2bb2-4768-820f-d064de29aa59,LB_CLUSTER_ID=null,NETWORK_PARTITION_ID=OAM1,PARTITION_ID=RegionOne-AZ-1
>  {org.apache.stratos.cloud.controller.impl.CloudControllerServiceImpl}
> TID: [0] [STRATOS] [2014-07-15 09:58:55,888]  INFO 
> {org.apache.stratos.cloud.controller.impl.CloudControllerServiceImpl} -  
> Member is terminated: MemberContext 
> [memberId=cisco-gilan-appmgr-1.cisco-gil407f5bdc-aad2-4234-80fc-6cdf17be6192, 
> nodeId=RegionOne/89433818-21ed-48d4-bd8f-c396ab30f6d2, 
> clusterId=cisco-gilan-appmgr-1.cisco-gil, cartridgeType=cisco-gilan-appmgr, 
> privateIpAddress=192.168.66.1, publicIpAddress=null, allocatedIpAddress=null, 
> initTime=1405417410736, lbClusterId=null, networkPartitionId=OAM1] 
> {org.apache.stratos.cloud.controller.impl.CloudControllerServiceImpl}
> ===================
>  
> The problem is that Stratos gives no indication of why it is doing this [1].
> Stratos should be enhanced so that the above message indicates *why* the
> member is being terminated (loss of heartbeats, a port-knocking timeout,
> etc.). This is needed as Apache Stratos expands its user base.
> This issue has high priority because it affects both the efficiency of
> troubleshooting and system stability.
>  
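A minimal Java sketch of the kind of "systematic" logging the description asks for, assuming a hypothetical key=value format. The field names (severity, event, member, cluster, reason) and the event and reason values are illustrative only, not an existing Stratos log format. Fixed keys are what make the greps by severity, member or event type requested above straightforward.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical log-line format; not part of Stratos.
public class MemberEventLogFormat {

    enum Severity { DEBUG, INFO, WARN, ERROR }

    // Builds a single line of space-separated key=value pairs in a fixed order.
    // Assumes values contain no whitespace (no quoting or escaping is done).
    static String format(Severity severity, String event, String memberId,
                         String clusterId, String reason) {
        Map<String, String> fields = new LinkedHashMap<>(); // keeps insertion order
        fields.put("severity", severity.name());
        fields.put("event", event);
        fields.put("member", memberId);
        fields.put("cluster", clusterId);
        fields.put("reason", reason);

        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(format(Severity.WARN, "MEMBER_TERMINATED",
                "m-1", "c-1", "HEARTBEAT_LOST"));
    }
}
```

With lines in this shape, grep 'event=MEMBER_TERMINATED' or grep 'member=m-1' against the log file would isolate the relevant entries. A production format would also need quoting or escaping for values that contain spaces.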



