Hi,

I found the cause of this. Let me summarize.

   - AS asks CC to create a container
   - CC schedules a new task and returns the member contexts to AS
   - AS adds the members to its pending list
   - CC also adds the member contexts to its data holder
   - But CC will not add them to the topology until the pods reach the
   "Running" state
   - Meanwhile the members exceed their timeouts and are moved to the
   obsolete list
   - AS asks CC to terminate the obsolete pods
   - CC kills them, but doesn't send the member terminated event, because
   the members are not in the topology
   - AS doesn't get any member terminated event, so it doesn't remove the
   members from its member lists
   - AS asks CC again to terminate the pods in the next monitor interval
   - CC now complains "Failed to terminate member. Member id not found",
   because it has already terminated those pods and removed the member
   contexts from its data holder
   - So this "Failed to terminate member. Member id not found" error occurs
   continuously
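
To make the sequence above concrete, here is a minimal sketch of it in
Java. It is only a toy model: the class, field, and method names are
hypothetical and are not the actual Stratos classes. It assumes one member
whose pod never reached "Running", so it is in CC's data holder and in AS's
obsolete list, but not in the topology.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical toy model of the loop; not Stratos code.
class TerminationLoopSketch {
    final Set<String> ccDataHolder = new HashSet<>(); // CC: member contexts
    final Set<String> ccTopology = new HashSet<>();   // CC: members in the topology
    final Set<String> asObsolete = new HashSet<>();   // AS: obsolete member list

    // CC: terminate a container as described in the list above.
    void terminateContainer(String memberId) {
        if (!ccDataHolder.remove(memberId)) {
            throw new IllegalArgumentException(
                    "Failed to terminate member. Member id not found. [Member id] " + memberId);
        }
        // The pod is killed at this point. The member terminated event is
        // only sent when the member was in the topology.
        if (ccTopology.remove(memberId)) {
            onMemberTerminated(memberId);
        }
    }

    // AS: handler for the member terminated event.
    void onMemberTerminated(String memberId) {
        asObsolete.remove(memberId);
    }

    // AS: one monitor interval. Returns true if CC rejected any request.
    boolean runMonitorInterval() {
        boolean failed = false;
        // Iterate over a copy because the event handler may modify asObsolete.
        for (String memberId : new HashSet<>(asObsolete)) {
            try {
                terminateContainer(memberId);
            } catch (IllegalArgumentException e) {
                failed = true;
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        TerminationLoopSketch s = new TerminationLoopSketch();
        s.ccDataHolder.add("m1"); // context created, pod never reached "Running"
        s.asObsolete.add("m1");   // pending timeout expired in AS
        System.out.println(s.runMonitorInterval()); // false: pod killed, no event sent
        System.out.println(s.runMonitorInterval()); // true: "Member id not found"
        System.out.println(s.asObsolete);           // [m1]: AS never drops the member
    }
}
```

In this model every later monitor interval fails in the same way, because
nothing ever removes the member from the obsolete list.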

We can fix it as follows:

After terminating an instance successfully, CC will check whether the
member is in the topology. If it is there, CC will follow the normal path
(that is, remove the member from the topology and send the member
terminated event).
If it is not there, CC will add it to the topology, send the member created
event, and send the member terminated event right after that. AS will then
get the terminated event and remove the member from its lists. We can't
send member terminated events for members which are not in the topology; if
we do, the message processors will reject them. That is why I am proposing
to add the member to the topology first and send the member terminated
event immediately afterwards: the message processors will then accept the
event because the member is in the topology.
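
A rough sketch of the proposed path, again as a toy model in Java. All
names here are hypothetical, and the message processor is reduced to the
single rule stated above (a member terminated event for an unknown member
is rejected); the real processors and event classes will look different.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical toy model of the proposed fix; not Stratos code.
class TerminateWithEventSketch {
    final Set<String> ccTopology = new HashSet<>();       // CC: topology
    final Set<String> receiverTopology = new HashSet<>(); // receiver's topology copy
    final Set<String> asObsolete = new HashSet<>();       // AS: obsolete member list
    final List<String> acceptedEvents = new ArrayList<>();

    // Receiver-side message processor. Returns false when the event is rejected.
    boolean process(String type, String memberId) {
        if (type.equals("created")) {
            receiverTopology.add(memberId);
        } else if (type.equals("terminated")) {
            if (!receiverTopology.remove(memberId)) {
                return false; // member unknown: event rejected
            }
            asObsolete.remove(memberId);
        } else {
            return false;
        }
        acceptedEvents.add(type + ":" + memberId);
        return true;
    }

    // CC: called after the instance has been terminated successfully.
    void afterTermination(String memberId) {
        if (!ccTopology.contains(memberId)) {
            // Proposed extra step: add the member and announce it first.
            ccTopology.add(memberId);
            process("created", memberId);
        }
        // Normal path: remove from the topology and send member terminated.
        ccTopology.remove(memberId);
        process("terminated", memberId);
    }

    public static void main(String[] args) {
        TerminateWithEventSketch s = new TerminateWithEventSketch();
        s.asObsolete.add("m1"); // obsolete in AS, never added to the topology
        s.afterTermination("m1");
        System.out.println(s.acceptedEvents); // [created:m1, terminated:m1]
        System.out.println(s.asObsolete);     // []
    }
}
```

With this ordering AS drops the member after the first terminate request,
so it has no reason to ask CC to terminate the same pod again.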

wdyt?

Thanks.

On Tue, Oct 28, 2014 at 8:00 AM, Rajkumar Rajaratnam <rajkum...@wso2.com>
wrote:

> Hi Chamila,
>
> This happens when the container is actually killed by CC, but CC doesn't
> send the member terminated event. Hence AS doesn't remove the member from
> the obsolete list. In the next cycle it asks CC again to terminate the
> instance. Since CC has terminated it already, it reports this error. I
> guess you didn't enable DEBUG logging for AS and CC. That's why you
> couldn't see this happening.
>
> I observed that this happens when it takes a bit long to terminate the
> pods. That's why we increased the waiting time in CC, but it seems to be
> happening again.
>
> We will look into it.
>
> Thanks
>
> On Tue, Oct 28, 2014 at 1:15 AM, Chamila De Alwis <chami...@wso2.com>
> wrote:
>
>> Hi,
>>
>> I'm doing a test run with the Python cartridge agent in Kubernetes. I
>> have three pods running, with the two machine setup (discovery +
>> master/minion). Stratos is from master, built about 8 hours ago.
>>
>> After a while the following error started appearing.
>>
>>
>> [2014-10-28 01:03:07,947]  INFO
>> {org.apache.stratos.autoscaler.KubernetesClusterContext} -  Pending state
>> of member: *5c05dd5f-5e0f-11e4-be71-080027c9178c* is expired. Adding as
>> an obsoleted member.
>> [2014-10-28 01:03:07,947]  INFO
>> {org.apache.stratos.autoscaler.KubernetesClusterContext} -  Pending state
>> of member: 5c065407-5e0f-11e4-be71-080027c9178c is expired. Adding as an
>> obsoleted member.
>> [2014-10-28 01:03:07,947]  INFO
>> {org.apache.stratos.autoscaler.KubernetesClusterContext} -  Pending state
>> of member: 5c048055-5e0f-11e4-be71-080027c9178c is expired. Adding as an
>> obsoleted member.
>> [2014-10-28 01:03:07,948]  INFO
>> {org.apache.stratos.autoscaler.KubernetesClusterContext} -  Pending state
>> of member: 5c053dae-5e0f-11e4-be71-080027c9178c is expired. Adding as an
>> obsoleted member.
>> [2014-10-28 01:03:07,948]  INFO
>> {org.apache.stratos.autoscaler.KubernetesClusterContext} -  Pending state
>> of member: 5c06ba55-5e0f-11e4-be71-080027c9178c is expired. Adding as an
>> obsoleted member.
>> [2014-10-28 01:03:07,948]  INFO
>> {org.apache.stratos.autoscaler.KubernetesClusterContext} -  Pending state
>> of member: 5c061e70-5e0f-11e4-be71-080027c9178c is expired. Adding as an
>> obsoleted member.
>> [2014-10-28 01:03:07,948]  INFO
>> {org.apache.stratos.autoscaler.KubernetesClusterContext} -  Pending state
>> of member: 5c05718b-5e0f-11e4-be71-080027c9178c is expired. Adding as an
>> obsoleted member.
>> [2014-10-28 01:03:54,198]  INFO
>> {org.apache.stratos.autoscaler.rule.RuleLog} -  Running minimum rule:
>> [kub-cluster] KubGrp1 [cluster] php.php.domain
>> [2014-10-28 01:03:54,199]  INFO
>> {org.apache.stratos.autoscaler.rule.RuleLog} -  [min-check]  [cluster] :
>> php.php.domain [Replicas] nonTerminated : 3
>> [2014-10-28 01:03:54,199]  INFO
>> {org.apache.stratos.autoscaler.rule.RuleLog} -  [min-check]  [cluster] :
>> php.php.domain [Replicas] minReplicas : 3
>> [2014-10-28 01:03:54,199]  INFO
>> {org.apache.stratos.autoscaler.rule.RuleLog} -  Running obsolete containers
>> rule [kub-cluster] : KubGrp1 [cluster] : php.php.domain
>> [2014-10-28 01:03:54,200]  INFO
>> {org.apache.stratos.autoscaler.rule.RuleLog} -  [obsolete-check] [cluster]
>> : php.php.domain [Replicas] obsoleteReplicas : 7
>> [2014-10-28 01:03:54,201]  INFO
>> {org.apache.stratos.autoscaler.rule.RuleLog} -  [obsolete-check]
>> Terminating the obsolete member with id :
>> 5c05dd5f-5e0f-11e4-be71-080027c9178c in the cluster : php.php.domain
>> [2014-10-28 01:03:54,218] ERROR
>> {org.apache.stratos.cloud.controller.impl.CloudControllerServiceImpl} -
>> Failed to terminate member. Member id not found. [Member id]
>> *5c05dd5f-5e0f-11e4-be71-080027c9178c*
>> [2014-10-28 01:03:54,244] ERROR
>> {org.apache.axis2.rpc.receivers.RPCMessageReceiver} -  Failed to terminate
>> member. Member id not found. [Member id]
>> 5c05dd5f-5e0f-11e4-be71-080027c9178c
>> java.lang.reflect.InvocationTargetException
>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>     at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>     at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>     at java.lang.reflect.Method.invoke(Method.java:601)
>>     at
>> org.apache.axis2.rpc.receivers.RPCUtil.invokeServiceClass(RPCUtil.java:212)
>>     at
>> org.apache.axis2.rpc.receivers.RPCMessageReceiver.invokeBusinessLogic(RPCMessageReceiver.java:117)
>>     at
>> org.apache.axis2.receivers.AbstractInOutMessageReceiver.invokeBusinessLogic(AbstractInOutMessageReceiver.java:40)
>>     at
>> org.apache.axis2.receivers.AbstractMessageReceiver.receive(AbstractMessageReceiver.java:110)
>>     at org.apache.axis2.engine.AxisEngine.receive(AxisEngine.java:180)
>>     at
>> org.apache.axis2.transport.http.HTTPTransportUtils.processHTTPPostRequest(HTTPTransportUtils.java:172)
>>     at
>> org.apache.axis2.transport.http.AxisServlet.doPost(AxisServlet.java:146)
>>     at
>> org.wso2.carbon.core.transports.CarbonServlet.doPost(CarbonServlet.java:231)
>>     at javax.servlet.http.HttpServlet.service(HttpServlet.java:755)
>>     at javax.servlet.http.HttpServlet.service(HttpServlet.java:848)
>>     at
>> org.eclipse.equinox.http.servlet.internal.ServletRegistration.service(ServletRegistration.java:61)
>>     at
>> org.eclipse.equinox.http.servlet.internal.ProxyServlet.processAlias(ProxyServlet.java:128)
>>     at
>> org.eclipse.equinox.http.servlet.internal.ProxyServlet.service(ProxyServlet.java:68)
>>     at javax.servlet.http.HttpServlet.service(HttpServlet.java:848)
>>     at
>> org.wso2.carbon.tomcat.ext.servlet.DelegationServlet.service(DelegationServlet.java:68)
>>     at
>> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:305)
>>     at
>> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
>>     at
>> org.wso2.carbon.tomcat.ext.filter.CharacterSetFilter.doFilter(CharacterSetFilter.java:61)
>>     at
>> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
>>     at
>> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
>>     at
>> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
>>     at
>> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
>>     at
>> org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:472)
>>     at
>> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
>>     at
>> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
>>     at
>> org.wso2.carbon.tomcat.ext.valves.CompositeValve.continueInvocation(CompositeValve.java:178)
>>     at
>> org.wso2.carbon.tomcat.ext.valves.CarbonTomcatValve$1.invoke(CarbonTomcatValve.java:47)
>>     at
>> org.wso2.carbon.webapp.mgt.TenantLazyLoaderValve.invoke(TenantLazyLoaderValve.java:56)
>>     at
>> org.wso2.carbon.tomcat.ext.valves.TomcatValveContainer.invokeValves(TomcatValveContainer.java:47)
>>     at
>> org.wso2.carbon.tomcat.ext.valves.CompositeValve.invoke(CompositeValve.java:141)
>>     at
>> org.wso2.carbon.tomcat.ext.valves.CarbonStuckThreadDetectionValve.invoke(CarbonStuckThreadDetectionValve.java:156)
>>     at
>> org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:936)
>>     at
>> org.wso2.carbon.tomcat.ext.valves.CarbonContextCreatorValve.invoke(CarbonContextCreatorValve.java:52)
>>     at
>> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
>>     at
>> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
>>     at
>> org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1004)
>>     at
>> org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)
>>     at
>> org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1653)
>>     at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>>     at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>>     at java.lang.Thread.run(Thread.java:722)
>> Caused by: java.lang.IllegalArgumentException: Failed to terminate
>> member. Member id not found. [Member id]
>> 5c05dd5f-5e0f-11e4-be71-080027c9178c
>>     at
>> org.apache.stratos.cloud.controller.impl.CloudControllerServiceImpl.handleNullObject(CloudControllerServiceImpl.java:1758)
>>     at
>> org.apache.stratos.cloud.controller.impl.CloudControllerServiceImpl.terminateContainer(CloudControllerServiceImpl.java:1715)
>>     ... 45 more
>>
>> Kubernetes tried to spawn these containers based on an abnormally high
>> value from the CEP, which is being discussed in another thread.
>>
>> [2014-10-28 00:57:54,204]  INFO
>> {org.apache.stratos.autoscaler.rule.RuleLog} -  [scaling]  [cluster] :
>> php.php.domain [Replicas] minReplicas : 3
>> [2014-10-28 00:57:54,205]  INFO
>> {org.apache.stratos.autoscaler.rule.RuleLog} -  [scaling]  [cluster] :
>> php.php.domain [Replicas] maxReplicas : 10
>> [2014-10-28 00:57:54,205]  INFO
>> {org.apache.stratos.autoscaler.rule.RuleLog} -  [scaling]  [cluster] :
>> php.php.domain [Replicas] nonTerminated : 3
>> [2014-10-28 00:57:54,205]  INFO
>> {org.apache.stratos.autoscaler.rule.RuleLog} -  [scaling]  [cluster] :
>> php.php.domain [Replicas] activeReplicas : 3
>> [2014-10-28 00:57:54,205]  INFO
>> {org.apache.stratos.autoscaler.rule.RuleLog} -  [scaling]  [cluster] :
>> php.php.domain [RequestInFlight] predicted value : 0.0
>> [2014-10-28 00:57:54,206]  INFO
>> {org.apache.stratos.autoscaler.rule.RuleLog} -  [scaling]  [cluster] :
>> php.php.domain [RequestInFlight] upper limit : 80.0
>> [2014-10-28 00:57:54,206]  INFO
>> {org.apache.stratos.autoscaler.rule.RuleLog} -  [scaling]  [cluster] :
>> php.php.domain [RequestInFlight] lower limit : 5.0
>> [2014-10-28 00:57:54,206]  INFO
>> {org.apache.stratos.autoscaler.rule.RuleLog} -  [scaling]  [cluster] :
>> php.php.domain [MemoryConsumption] predicted value : *15418.244003295898*
>> [2014-10-28 00:57:54,206]  INFO
>> {org.apache.stratos.autoscaler.rule.RuleLog} -  [scaling]  [cluster] :
>> php.php.domain [MemoryConsumption] upper limit : 80.0
>> [2014-10-28 00:57:54,207]  INFO
>> {org.apache.stratos.autoscaler.rule.RuleLog} -  [scaling]  [cluster] :
>> php.php.domain [MemoryConsumption] lower limit : 15.0
>> [2014-10-28 00:57:54,207]  INFO
>> {org.apache.stratos.autoscaler.rule.RuleLog} -  [scaling]  [cluster] :
>> php.php.domain [LoadAverage] predicted value : *456648.05810546875*
>> [2014-10-28 00:57:54,207]  INFO
>> {org.apache.stratos.autoscaler.rule.RuleLog} -  [scaling]  [cluster] :
>> php.php.domain [LoadAverage] upper limit : 180.0
>> [2014-10-28 00:57:54,207]  INFO
>> {org.apache.stratos.autoscaler.rule.RuleLog} -  [scaling]  [cluster] :
>> php.php.domain [LoadAverage] lower limit : 20.0
>> [2014-10-28 00:57:54,207]  INFO
>> {org.apache.stratos.autoscaler.rule.RuleLog} -  [scaling]  [cluster] :
>> php.php.domain *scale-up action : true*
>> [2014-10-28 00:57:54,207]  INFO
>> {org.apache.stratos.autoscaler.rule.RuleLog} -  [scaling]  [cluster] :
>> php.php.domain scale-down action : false
>> [2014-10-28 00:57:54,209]  INFO
>> {org.apache.stratos.autoscaler.rule.RuleLog} -  [scaling]  [cluster] :
>> php.php.domain [MemoryConsumption] predicted replicas : 579
>> [2014-10-28 00:57:54,242]  INFO
>> {org.apache.stratos.autoscaler.rule.RuleLog} -  [scaling]  [cluster] :
>> php.php.domain [LoadAverage] predicted replicas : 7611
>> [2014-10-28 00:57:54,242]  INFO
>> {org.apache.stratos.autoscaler.rule.RuleLog} -  [scaling]  [cluster] :
>> php.php.domain predicted replicas > max replicas :
>> [2014-10-28 00:57:54,242]  INFO
>> {org.apache.stratos.autoscaler.rule.RuleLog} -  [scaling] *Decided to
>> scale-up : [cluster] : php.php.domain*
>> [2014-10-28 00:57:54,243]  INFO
>> {org.apache.stratos.autoscaler.rule.RuleLog} -  [scaling-up]  [cluster] :
>> php.php.domain valid number of replicas to expand : 10
>> [2014-10-28 00:57:54,243]  INFO
>> {org.apache.stratos.autoscaler.client.cloud.controller.CloudControllerClient}
>> -  Updating kubernetes replication controller via cloud controller:
>> [cluster] php.php.domain [replicas] 10
>> [2014-10-28 00:58:01,001] DEBUG
>> {org.apache.stratos.manager.publisher.TenantSynzhronizerTask} -  Publishing
>> complete tenant event
>> [2014-10-28 00:58:04,502]  INFO
>> {org.apache.stratos.cloud.controller.impl.CloudControllerServiceImpl} -
>> Kubernetes entities are successfully starting up. [MemberContext [memberId=
>> *5c05dd5f-5e0f-11e4-be71-080027c9178c*, nodeId=null, instanceId=null,
>> clusterId=php.php.domain, partition=null, cartridgeType=php,
>> privateIpAddress=null, publicIpAddress=null, allocatedIpAddress=null,
>> initTime=1414438084398, lbClusterId=null, networkPartitionId=null,
>> properties=Properties [properties=[Property
>> [name=ALLOCATED_SERVICE_HOST_PORT, value=4500]]]], MemberContext
>> [memberId=5c065407-5e0f-11e4-be71-080027c9178c, nodeId=null,
>> instanceId=null, clusterId=php.php.domain, partition=null,
>> cartridgeType=php, privateIpAddress=null, publicIpAddress=null,
>> allocatedIpAddress=null, initTime=1414438084398, lbClusterId=null,
>> networkPartitionId=null, properties=Properties [properties=[Property
>> [name=ALLOCATED_SERVICE_HOST_PORT, value=4500]]]], MemberContext
>> [memberId=5c048055-5e0f-11e4-be71-080027c9178c, nodeId=null,
>> instanceId=null, clusterId=php.php.domain, partition=null,
>> cartridgeType=php, privateIpAddress=null, publicIpAddress=null,
>> allocatedIpAddress=null, initTime=1414438084398, lbClusterId=null,
>> networkPartitionId=null, properties=Properties [properties=[Property
>> [name=ALLOCATED_SERVICE_HOST_PORT, value=4500]]]], MemberContext
>> [memberId=5c053dae-5e0f-11e4-be71-080027c9178c, nodeId=null,
>> instanceId=null, clusterId=php.php.domain, partition=null,
>> cartridgeType=php, privateIpAddress=null, publicIpAddress=null,
>> allocatedIpAddress=null, initTime=1414438084398, lbClusterId=null,
>> networkPartitionId=null, properties=Properties [properties=[Property
>> [name=ALLOCATED_SERVICE_HOST_PORT, value=4500]]]], MemberContext
>> [memberId=5c06ba55-5e0f-11e4-be71-080027c9178c, nodeId=null,
>> instanceId=null, clusterId=php.php.domain, partition=null,
>> cartridgeType=php, privateIpAddress=null, publicIpAddress=null,
>> allocatedIpAddress=null, initTime=1414438084398, lbClusterId=null,
>> networkPartitionId=null, properties=Properties [properties=[Property
>> [name=ALLOCATED_SERVICE_HOST_PORT, value=4500]]]], MemberContext
>> [memberId=5c061e70-5e0f-11e4-be71-080027c9178c, nodeId=null,
>> instanceId=null, clusterId=php.php.domain, partition=null,
>> cartridgeType=php, privateIpAddress=null, publicIpAddress=null,
>> allocatedIpAddress=null, initTime=1414438084398, lbClusterId=null,
>> networkPartitionId=null, properties=Properties [properties=[Property
>> [name=ALLOCATED_SERVICE_HOST_PORT, value=4500]]]], MemberContext
>> [memberId=5c05718b-5e0f-11e4-be71-080027c9178c, nodeId=null,
>> instanceId=null, clusterId=php.php.domain, partition=null,
>> cartridgeType=php, privateIpAddress=null, publicIpAddress=null,
>> allocatedIpAddress=null, initTime=1414438084398, lbClusterId=null,
>> networkPartitionId=null, properties=Properties [properties=[Property
>> [name=ALLOCATED_SERVICE_HOST_PORT, value=4500]]]]]
>>
>>
>>
>>
>>
>> Regards,
>> Chamila de Alwis
>> Software Engineer | WSO2 | +94772207163
>> Blog: code.chamiladealwis.com
>>
>>
>>
>
>
> --
> Rajkumar Rajaratnam
> Software Engineer | WSO2, Inc.
> Mobile +94777568639 | +94783498120
>



-- 
Rajkumar Rajaratnam
Software Engineer | WSO2, Inc.
Mobile +94777568639 | +94783498120
