Thomas Draier created KARAF-5562:
------------------------------------
Summary: Improve cellar groups configuration from hazelcast
Key: KARAF-5562
URL: https://issues.apache.org/jira/browse/KARAF-5562
Project: Karaf
Issue Type: Improvement
Components: cellar-hazelcast
Affects Versions: 4.1.4, 4.0.10
Reporter: Thomas Draier
We encountered several issues caused by HazelcastGroupManager. I'm grouping them
here because they are all linked and we fixed them in a single refactoring of the
class. Overall, this results in better synchronization of the cellar groups
configuration.
- Hazelcast network splits can result in very bad behaviour on the “groups”
shared map. This map contains the list of groups and their members, and the
system fully relies on it to know which groups a node belongs to. If multiple
nodes update the map while they are not connected to each other (easy to
reproduce by starting both nodes at the same time) and then join afterwards,
the default merge algorithm is applied and simply overwrites the full map. This
results in groups losing members, even if the configuration file claims that
the nodes are still members.
- When handling the groups configuration, HazelcastGroupManager replicates the
felix.fileinstall.filename property, which contains the configuration file
path, on each node. This is acceptable on a cluster where every node is
installed on the exact same path. However, with 2 nodes on the same machine
installed on different paths, one node will at some point write to the config
file of the other node and never update its own config, which can be quite
confusing.
- The HazelcastGroupManager can start before fileinstall has detected a
configuration. It then creates a new config based on the hazelcast shared
config, which will override the config file when fileinstall detects it. It
does not have a huge impact, but it shuffles the properties file and makes it
unreadable.
- Updates from hazelcast to the local config trigger an update back to
hazelcast, which goes back to the local config and sometimes reverts the
changes, resulting in no change in the config. When adding a group, many
properties are updated, and for each of them we trigger a configuration update.
Each configuration update triggers an event which sends the whole config back
to hazelcast, including properties that are not updated yet, setting them back
to their old values. All events (hazelcast updates and osgi config) are handled
asynchronously, so depending on the order of events, some properties can be
reverted or never added (the groups property is usually reverted after a
group add).
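
To illustrate two of the points above (the map overwrite after a network split,
and whole-config events reverting properties), here is a minimal sketch in
plain Java. It is not the refactoring that was applied to
HazelcastGroupManager and it uses no Cellar or Hazelcast API: the class
GroupConfigSketch and both methods are hypothetical names. mergeGroups assumes
that a per-group union of members is an acceptable merge (it would bring back a
member that was deliberately removed during the split), and changedProperties
assumes the node keeps a copy of the last state it published.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helpers for illustration; not part of Cellar or Hazelcast. */
final class GroupConfigSketch {

    private GroupConfigSketch() {
    }

    /**
     * Merges two views of the "groups" map (group name -> member node ids)
     * by taking the union of members per group, instead of letting one
     * side overwrite the other.
     */
    static Map<String, Set<String>> mergeGroups(Map<String, Set<String>> left,
                                                Map<String, Set<String>> right) {
        Map<String, Set<String>> merged = new HashMap<>();
        for (Map.Entry<String, Set<String>> e : left.entrySet()) {
            // Copy so the inputs are never modified.
            merged.put(e.getKey(), new TreeSet<>(e.getValue()));
        }
        for (Map.Entry<String, Set<String>> e : right.entrySet()) {
            merged.computeIfAbsent(e.getKey(), k -> new TreeSet<>())
                  .addAll(e.getValue());
        }
        return merged;
    }

    /**
     * Returns only the properties that differ between the last published
     * state and the current local config. Keys that were removed locally
     * are mapped to null. Values are assumed to be non-null strings.
     */
    static Map<String, String> changedProperties(Map<String, String> published,
                                                 Map<String, String> local) {
        Map<String, String> delta = new HashMap<>();
        for (Map.Entry<String, String> e : local.entrySet()) {
            if (!Objects.equals(published.get(e.getKey()), e.getValue())) {
                delta.put(e.getKey(), e.getValue());
            }
        }
        for (String key : published.keySet()) {
            if (!local.containsKey(key)) {
                delta.put(key, null);
            }
        }
        return delta;
    }
}
```

Sending only the delta means an event raised for one property cannot carry
stale values of the other properties back to the shared config. This narrows
the revert problem described above, but does not by itself order the
asynchronous events.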
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)