Hi Imesh,

> Ideally cartridge agent should only wait for Complete Topology event once in
> its lifecycle. If it is waiting more than once then there is an issue.

That’s not the problem: it only waits once for the complete topology. To me it looks like the topology is never updated, or if it is, it’s not clear to me how that happens. The Python cartridge agent, for example, does call an ‘update’ method:

https://github.com/apache/stratos/blob/22fdf78be8a62312a65b23e017f0de20cfad82b2/components/org.apache.stratos.python.cartridge.agent/src/main/python/cartridge.agent/cartridge.agent/agent.py#L250

I don’t see anything similar in the Java cartridge agent. Please could someone confirm whether this is the case, and perhaps explain how updating the topology is supposed to work in the Java cartridge agent?

Best regards,

Michiel

On 20 Apr 2015, at 19:05, Imesh Gunaratne <im...@apache.org> wrote:

> Hi Michiel,
>
> It's a pleasure! My guess is that either cartridge agent has been restarted
> or there is a bug in its logic.
>
> Ideally cartridge agent should only wait for Complete Topology event once in
> its lifecycle. If it is waiting more than once then there is an issue.
>
> Thanks
>
> On Mon, Apr 20, 2015 at 11:06 PM, Michiel Blokzijl (mblokzij)
> <mblok...@cisco.com> wrote:
> Hi Imesh,
>
> Thanks for replying,
>
>> This issue might occur if the cartridge agent start processing member events
>> before consuming Complete Topology event.
>
> The issue happened way after that, I had Stratos running for a day or so, and
> in the logs I saw some “waiting for complete topology event ..” but they went
> away pretty quickly (way before this happened).
>
> Is this the code that’s supposed to do the updates?
> https://github.com/apache/stratos/blob/4.0.0/components/org.apache.stratos.cartridge.agent/src/main/java/org/apache/stratos/cartridge/agent/extensions/DefaultExtensionHandler.java#L328
>
> Because I don’t see anything that actually updates anything (beyond
> function-local variables like ‘env’)..
>
> Michiel
>
> On 20 Apr 2015, at 18:13, Imesh Gunaratne <im...@apache.org> wrote:
>
>> Hi Michiel,
>>
>> This issue might occur if the cartridge agent start processing member events
>> before consuming Complete Topology event.
>>
>> This is how the topology get initialized in any component that listen to
>> topology topic in message broker; First of all when the component starts up
>> it waits for the Complete Topology event to receive. This event is
>> periodically published by Cloud Controller with the entire topology of a
>> given moment of time.
>>
>> Once it is received the component would initialize the local topology and
>> start listening to other events. Since Complete Topology event has given the
>> latest state of the topology now the component can consume any other event
>> published afterwards.
>>
>> Thanks
>>
>> On Mon, Apr 20, 2015 at 7:44 PM, Michiel Blokzijl (mblokzij)
>> <mblok...@cisco.com> wrote:
>> Hi,
>> I’m looking at an issue with Stratos 4.0.0 code, and I’m having an issue
>> with the cartridge agent. It complains about the topology being
>> inconsistent, triggered by this code [1].
>>
>> This causes the extension handler not to fire for cartridges going down.
>>
>> [2015-04-19 07:19:22,486] INFO - [MemberTerminatedMessageProcessor] Member terminated: [service] XXX [cluster] XXX [member] XXX-0.dom2a4618d5-edd9-4a99-9d9c-918715c761bd
>> [2015-04-19 07:19:22,486] INFO - [DefaultExtensionHandler] Member terminated event received: [service] XXX [cluster] XX [member] XXX-0.dom2a4618d5-edd9-4a99-9d9c-918715c761bd
>> [2015-04-19 07:19:22,486] ERROR - [ExtensionUtils] Member id not found in topology [member] XXXX.dom2a4618d5-edd9-4a99-9d9c-918715c761bd
>> [2015-04-19 07:19:22,486] ERROR - [DefaultExtensionHandler] Topology is inconsistent...failed to execute member terminated event
>>
>> Any idea what’s going wrong here?
>>
>> I assume the topology isn’t being maintained correctly for some reason, but
>> I haven’t quite figured out how/if the topology is being maintained at all.
>> Looking at the complete topology event handler [2] for example, it doesn’t
>> actually update the internally stored topology.. There’s nothing in the
>> cartridge agent that calls the topology manager’s acquireWriteLock function..
>>
>> Best regards,
>>
>> Michiel
>>
>> [1] https://github.com/apache/stratos/blob/4.0.0/components/org.apache.stratos.cartridge.agent/src/main/java/org/apache/stratos/cartridge/agent/extensions/DefaultExtensionHandler.java#L374
>>
>> [2] https://github.com/apache/stratos/blob/4.0.0/components/org.apache.stratos.cartridge.agent/src/main/java/org/apache/stratos/cartridge/agent/extensions/DefaultExtensionHandler.java#L328
>>
>> --
>> Imesh Gunaratne
>>
>> Technical Lead, WSO2
>> Committer & PMC Member, Apache Stratos
>
> --
> Imesh Gunaratne
>
> Technical Lead, WSO2
> Committer & PMC Member, Apache Stratos
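
For reference, the initialization order Imesh describes (wait for the Complete Topology event, seed the local topology, then apply later events) could be sketched roughly as below. This is only an illustration under stated assumptions, not Stratos code: `LocalTopology` and its method names are hypothetical, and members are reduced to plain id strings. It also shows why a "member id not found" error would be expected if member events are never applied to the local copy.

```python
from dataclasses import dataclass, field


@dataclass
class LocalTopology:
    """Hypothetical local topology copy (NOT the Stratos TopologyManager)."""

    initialized: bool = False
    members: set = field(default_factory=set)

    def on_complete_topology(self, member_ids):
        # Assumption: only the first Complete Topology event seeds the copy;
        # afterwards, incremental events are expected to keep it current.
        if not self.initialized:
            self.members = set(member_ids)
            self.initialized = True

    def on_member_activated(self, member_id):
        # Events that arrive before initialization are dropped in this sketch.
        if not self.initialized:
            return False
        self.members.add(member_id)
        return True

    def on_member_terminated(self, member_id):
        # Returns False when the member is unknown, which corresponds to the
        # "Member id not found in topology" situation in the logs above.
        if not self.initialized or member_id not in self.members:
            return False
        self.members.remove(member_id)
        return True
```

If `on_member_activated` never updated `members`, any member created after the first Complete Topology event would be unknown when its terminated event arrives, which matches the symptom reported in this thread.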