Hi,

It looks like this isn’t an issue in the latest 4.1 code, so that’s good.

However, in the 4.1 code we’ve lost the ability to pass data about topology events to the extensions run by the cartridge agent. I’ve attached a diff which shows how I would add this functionality back. Basically I’d extend the (Topology)Event interface to include a toEnv() method, which can be overridden by the subclasses to populate a HashMap with event data. This HashMap will then be passed into the subprocess as extra environment info.
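
To make the idea concrete, here is a minimal sketch of the shape I have in mind. It is not the attached diff: the class names, the fields and the STRATOS_* variable names below are all hypothetical placeholders, and the real event classes in the messaging module look different.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the (Topology)Event base type, not the real Stratos class.
abstract class Event {
    // Subclasses override this to expose their data as environment variables.
    Map<String, String> toEnv() {
        return new HashMap<>();
    }
}

// Hypothetical event subclass; the field and variable names are illustrative only.
class MemberTerminatedEvent extends Event {
    private final String serviceName;
    private final String clusterId;
    private final String memberId;

    MemberTerminatedEvent(String serviceName, String clusterId, String memberId) {
        this.serviceName = serviceName;
        this.clusterId = clusterId;
        this.memberId = memberId;
    }

    @Override
    Map<String, String> toEnv() {
        Map<String, String> env = super.toEnv();
        env.put("STRATOS_SERVICE_NAME", serviceName);
        env.put("STRATOS_CLUSTER_ID", clusterId);
        env.put("STRATOS_MEMBER_ID", memberId);
        return env;
    }
}

// Hypothetical helper showing how the map would reach the extension subprocess.
class ExtensionRunner {
    // Prepares the extension command with the event data added on top of
    // the environment inherited from the agent process.
    static ProcessBuilder prepare(Event event, String... command) {
        ProcessBuilder builder = new ProcessBuilder(command);
        builder.environment().putAll(event.toEnv());
        return builder;
    }
}
```

The calling code would then only need something like ExtensionRunner.prepare(event, scriptPath).start(), without knowing which fields each event type carries.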

How do people feel about the approach? I think it’s cleaner than the approach that existed in the 4.0 code, where the calling code put the event info into environment variables.

If people think this is a good idea I’m happy to expand it to cover the other TopologyEvents, and go through the process of getting it into the codebase. Feedback is welcome!

Best regards,

Michiel


Attachment: event_env.diff
Description: Binary data


On 22 Apr 2015, at 19:05, Imesh Gunaratne <im...@apache.org> wrote:

Hi Michiel,

In the JCA, the topology is handled by the messaging module. You can see how the topology is updated on the Complete Topology event here:

In the PCA, the topology is still not properly updated after it has been initialized with the Complete Topology event.


On Wed, Apr 22, 2015 at 7:28 PM, Michiel Blokzijl (mblokzij) <mblok...@cisco.com> wrote:
Hi Imesh,

Ideally, the cartridge agent should only wait for the Complete Topology event once in its lifecycle. If it waits more than once, then there is an issue.

That’s not the problem; it only waits once for the complete topology.

To me it looks like the topology is never updated, or if it is, it’s not clear to me how that’s happening. The Python cartridge agent, for example, does call an ‘update’ method:

I don’t see anything similar in the Java cartridge agent.

Please could someone confirm whether this is the case, and perhaps explain how updating the topology is supposed to work in the Java cartridge agent?

Best regards,

Michiel

On 20 Apr 2015, at 19:05, Imesh Gunaratne <im...@apache.org> wrote:

Hi Michiel,

It's a pleasure! My guess is that either the cartridge agent has been restarted or there is a bug in its logic.

Ideally, the cartridge agent should only wait for the Complete Topology event once in its lifecycle. If it waits more than once, then there is an issue.

Thanks

On Mon, Apr 20, 2015 at 11:06 PM, Michiel Blokzijl (mblokzij) <mblok...@cisco.com> wrote:
Hi Imesh,

Thanks for replying.

This issue might occur if the cartridge agent starts processing member events before consuming the Complete Topology event.

The issue happened well after that. I had Stratos running for a day or so, and in the logs I saw some “waiting for complete topology event ..” messages, but they went away pretty quickly (well before this happened).


Because I don’t see anything that actually updates anything (beyond function-local variables like ‘env’).

Michiel

On 20 Apr 2015, at 18:13, Imesh Gunaratne <im...@apache.org> wrote:

Hi Michiel,

This issue might occur if the cartridge agent starts processing member events before consuming the Complete Topology event.

This is how the topology gets initialized in any component that listens to the topology topic in the message broker: when the component starts up, it first waits to receive the Complete Topology event. This event is periodically published by the Cloud Controller and contains the entire topology at a given moment in time.

Once it is received, the component initializes the local topology and starts listening to other events. Since the Complete Topology event has provided the latest state of the topology, the component can now consume any other event published afterwards.
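
As a rough illustration of that sequence (a simplified sketch, not the actual messaging module code): the listener below drops member events until a Complete Topology snapshot has been applied. The class and method names are hypothetical, members are reduced to plain id strings, and ignoring later snapshots once initialized is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified listener; members are represented by their ids only.
class TopologyListener {
    private boolean initialized = false;
    private final List<String> members = new ArrayList<>();

    // Complete Topology event: take the snapshot as the local topology.
    // Assumption: snapshots received after initialization are ignored.
    synchronized void onCompleteTopology(List<String> snapshot) {
        if (initialized) {
            return;
        }
        members.clear();
        members.addAll(snapshot);
        initialized = true;
    }

    // Returns false if the event was dropped because no snapshot has arrived yet.
    synchronized boolean onMemberActivated(String memberId) {
        if (!initialized) {
            return false;
        }
        if (!members.contains(memberId)) {
            members.add(memberId);
        }
        return true;
    }

    // Returns false if the event was dropped because no snapshot has arrived yet.
    synchronized boolean onMemberTerminated(String memberId) {
        if (!initialized) {
            return false;
        }
        members.remove(memberId);
        return true;
    }

    // Returns a copy so callers cannot modify the local topology directly.
    synchronized List<String> members() {
        return new ArrayList<>(members);
    }
}
```

The point of the gate is that an event applied before the snapshot would be lost or would act on an empty topology, which is the kind of inconsistency described above.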

Thanks



On Mon, Apr 20, 2015 at 7:44 PM, Michiel Blokzijl (mblokzij) <mblok...@cisco.com> wrote:
Hi,
I’m looking at an issue with the cartridge agent in the Stratos 4.0.0 code. It complains about the topology being inconsistent, triggered by this code [1].

This causes the extension handler not to fire for cartridges going down.

[2015-04-19 07:19:22,486]  INFO - [MemberTerminatedMessageProcessor] Member terminated: [service] XXX [cluster] XXX [member] XXX-0.dom2a4618d5-edd9-4a99-9d9c-918715c761bd
[2015-04-19 07:19:22,486]  INFO - [DefaultExtensionHandler] Member terminated event received: [service] XXX [cluster] XX [member] XXX-0.dom2a4618d5-edd9-4a99-9d9c-918715c761bd
[2015-04-19 07:19:22,486] ERROR - [ExtensionUtils] Member id not found in topology [member] XXXX.dom2a4618d5-edd9-4a99-9d9c-918715c761bd
[2015-04-19 07:19:22,486] ERROR - [DefaultExtensionHandler] Topology is inconsistent...failed to execute member terminated event

Any idea what’s going wrong here?

I assume the topology isn’t being maintained correctly for some reason, but I haven’t quite figured out how, or whether, the topology is being maintained at all. The complete topology event handler [2], for example, doesn’t actually update the internally stored topology. There’s nothing in the cartridge agent that calls the topology manager’s acquireWriteLock function.
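
For reference, this is roughly the kind of write-locked update I was expecting to find somewhere in the agent. It is only a sketch with hypothetical names, built on java.util.concurrent's ReentrantReadWriteLock, and is not the actual topology manager code.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical local topology holder; members are represented by their ids only.
class LocalTopology {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final Set<String> memberIds = new HashSet<>();

    // Updates take the write lock so readers never see a half-applied change.
    void addMember(String memberId) {
        lock.writeLock().lock();
        try {
            memberIds.add(memberId);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Returns true if the member was present and has been removed.
    boolean removeMember(String memberId) {
        lock.writeLock().lock();
        try {
            return memberIds.remove(memberId);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Lookups only need the read lock, so they can run concurrently.
    boolean memberExists(String memberId) {
        lock.readLock().lock();
        try {
            return memberIds.contains(memberId);
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

If nothing ever calls the equivalent of addMember or removeMember, a lookup like memberExists can only reflect whatever the topology contained when it was first populated.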

Best regards,

Michiel





--
Imesh Gunaratne

Technical Lead, WSO2
Committer & PMC Member, Apache Stratos




