Hi all,

We are looking at implementing a "cleaner" that can remove orphaned locations from persisted state.

_*Problem statement*_
In older versions of Brooklyn (e.g. prior to [1]), we sometimes did not unmanage locations when the associated entity was deleted. This means that the persisted state for some customers contains many "orphaned locations" that are no longer referenced.

We want a way to safely delete these. We only want to delete locations that are not referenced.

These orphaned locations can also cause "dangling references" to be reported on rebind, because the orphaned locations hold references to things that have since been deleted.

References to locations can take several forms:

1. Location is directly referenced from an entity's getLocations().
2. Location is indirectly referenced from an entity (e.g. the location
   is the parent of another location that is referenced).
3. Location is referenced by an entity in some other way (rather than
   getLocations()) - e.g. in a sensor or config key, such as [2].
4. Location is referenced by a policy or enricher.

For (4), I can't think of any such use-case off-hand, but it's possible that a customer might write a bespoke policy/enricher that does this.

For (2), we need to consider reachability. Note that there might be groups of locations that are unreachable even though they refer to each other (e.g. location X and its parent refer to each other, but nothing else refers to either of them).
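To make the reachability point concrete, here is a minimal mark-and-sweep sketch in Java. It assumes we have already gathered the root references (the directly referenced location IDs) and, for each location, the IDs of its parent and children; `LocationReachability` and `findOrphans` are hypothetical names, and `Set.of`/`Map.of` need Java 9+. Because each ID is marked at most once, a cycle such as X and its parent terminates, and a group that only refers to itself comes out as orphaned.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class LocationReachability {

    /**
     * Returns the IDs of locations that cannot be reached from any root reference.
     *
     * @param allLocationIds every location ID present in the persisted state
     * @param roots          location IDs referenced directly (hypothetical input)
     * @param links          for each location ID, the IDs of its parent and children
     */
    public static Set<String> findOrphans(Set<String> allLocationIds,
                                          Set<String> roots,
                                          Map<String, Set<String>> links) {
        Set<String> reachable = new HashSet<>();
        Deque<String> toVisit = new ArrayDeque<>(roots);
        while (!toVisit.isEmpty()) {
            String id = toVisit.pop();
            // add() returns false if the ID was already marked, so cycles terminate
            if (!reachable.add(id)) {
                continue;
            }
            for (String linked : links.getOrDefault(id, Set.of())) {
                toVisit.push(linked);
            }
        }
        // Sweep: anything never marked is an orphan (sorted for stable output)
        Set<String> orphans = new TreeSet<>(allLocationIds);
        orphans.removeAll(reachable);
        return orphans;
    }

    public static void main(String[] args) {
        // "a" is referenced by an entity and is the child of "b".
        // "x" and "y" refer to each other but nothing else refers to them.
        Set<String> all = Set.of("a", "b", "x", "y");
        Set<String> roots = Set.of("a");
        Map<String, Set<String>> links = Map.of(
                "a", Set.of("b"),
                "b", Set.of("a"),
                "x", Set.of("y"),
                "y", Set.of("x"));
        System.out.println(findOrphans(all, roots, links)); // prints [x, y]
    }
}
```

Following both parent and child links is deliberately conservative: it may keep a few extra locations, but it should not delete one that a still-referenced location points at.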

_*Location deletion: proposed solution*_
We propose an offline tool, similar in use to copy-state [3], which will clean up the persisted state and save the cleaned-up copy to a given destination.

It is important that the tool is run offline, so that it never reads the state while a Brooklyn server is part-way through writing multiple new files.

Ideally the tool will not deserialize the persisted state (so it does not require class-loading, etc.). We'll therefore work with BrooklynMementoRawData [4], which also means we'd be able to run it outside of the Karaf container.

We can identify location references in the XML using a combination of the following techniques:

1. The marker <locationProxy>...</locationProxy> for references inside
   config keys, sensors, etc.
2. Inside an entity, the <locations>...</locations> section.
3. Inside a location, the <parent>...</parent> and
   <children>...</children> section.
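A minimal sketch of how these three techniques might look using the JDK's built-in DOM parser, so no Brooklyn classes are loaded. It assumes that each `<locationProxy>` and `<parent>` element contains a location ID as its text, and that `<locations>` and `<children>` wrap one child element (e.g. `<string>`) per ID; these XML shapes are assumptions to check against real persisted state, and `LocationRefExtractor` and its methods are hypothetical names.

```java
import java.io.StringReader;
import java.util.LinkedHashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class LocationRefExtractor {

    private static Document parse(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse persisted XML", e);
        }
    }

    /** Adds the trimmed text of every element named tag, e.g. <locationProxy> or <parent>. */
    private static void addTextOf(Document doc, String tag, Set<String> out) {
        NodeList nodes = doc.getElementsByTagName(tag);
        for (int i = 0; i < nodes.getLength(); i++) {
            String text = nodes.item(i).getTextContent().trim();
            if (!text.isEmpty()) out.add(text);
        }
    }

    /** Adds the text of each child element of every element named tag, e.g. <locations>. */
    private static void addChildTextOf(Document doc, String tag, Set<String> out) {
        NodeList containers = doc.getElementsByTagName(tag);
        for (int i = 0; i < containers.getLength(); i++) {
            for (Node c = containers.item(i).getFirstChild(); c != null; c = c.getNextSibling()) {
                // Skip whitespace text nodes between the child elements
                if (c.getNodeType() == Node.ELEMENT_NODE) {
                    String text = c.getTextContent().trim();
                    if (!text.isEmpty()) out.add(text);
                }
            }
        }
    }

    /** Techniques (1) and (2): location IDs referenced from an entity's XML. */
    public static Set<String> refsFromEntity(String entityXml) {
        Document doc = parse(entityXml);
        Set<String> ids = new LinkedHashSet<>();
        addTextOf(doc, "locationProxy", ids);
        addChildTextOf(doc, "locations", ids);
        return ids;
    }

    /** Technique (3), plus any <locationProxy>: IDs linked from a location's XML. */
    public static Set<String> linksFromLocation(String locationXml) {
        Document doc = parse(locationXml);
        Set<String> ids = new LinkedHashSet<>();
        addTextOf(doc, "parent", ids);
        addChildTextOf(doc, "children", ids);
        addTextOf(doc, "locationProxy", ids);
        return ids;
    }

    public static void main(String[] args) {
        String entity = "<entity><id>e1</id>"
                + "<locations><string>loc1</string></locations>"
                + "<config><host><locationProxy>loc2</locationProxy></host></config>"
                + "</entity>";
        String location = "<location><id>loc1</id><parent>loc0</parent>"
                + "<children><string>loc3</string></children></location>";
        System.out.println(refsFromEntity(entity));      // prints [loc2, loc1]
        System.out.println(linksFromLocation(location)); // prints [loc0, loc3]
    }
}
```

Matching element names anywhere in the document errs on the side of keeping locations: a stray `<parent>` inside some config value would only cause an extra location to be retained, not deleted.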

From (1) and (2), we'll identify all locations that are directly referenced. From (3), we'll repeatedly follow parent/child links to find the locations that are indirectly referenced. Any location not reached this way can be deleted.

_Optional second part: validating location deletions_
We could validate that we were right to delete those locations. When we next start Brooklyn, we could look at the set of dangling references [5]. If anything we deleted is now reported as a dangling reference, then we'd report it as an error.
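The check itself is a set intersection. A sketch, assuming we have recorded the IDs the cleaner deleted and have separately collected the dangling location IDs reported on rebind (how those are captured via [5] is left out here); `DeletionValidator` and `wronglyDeleted` are hypothetical names.

```java
import java.util.Set;
import java.util.TreeSet;

public class DeletionValidator {

    /**
     * Returns the deleted IDs that Brooklyn now reports as dangling references.
     * An empty result means no deleted location turned out to be needed.
     */
    public static Set<String> wronglyDeleted(Set<String> deletedIds, Set<String> danglingRefIds) {
        Set<String> result = new TreeSet<>(deletedIds);
        result.retainAll(danglingRefIds);
        return result;
    }

    public static void main(String[] args) {
        Set<String> deleted = Set.of("x", "y");
        // "z" was not deleted by the cleaner, so it is not the cleaner's fault
        Set<String> dangling = Set.of("y", "z");
        Set<String> bad = wronglyDeleted(deleted, dangling);
        if (!bad.isEmpty()) {
            // prints "Deleted locations still referenced: [y]" to stderr
            System.err.println("Deleted locations still referenced: " + bad);
        }
    }
}
```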

Is this worth doing? Should it be optional, given that it requires being able to class-load everything?


_*Policy/Enricher deletion: proposed solution*_
We can apply the same logic to deleting policies/enrichers that have become orphaned.

It is a lot easier to identify the policies/enrichers that are in use: they are all referenced directly by an entity, in its <enrichers>...</enrichers> or <policies>...</policies> section.

Anything not referenced, we can delete.
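A sketch of the same approach for policies and enrichers, again using only the JDK DOM parser. It assumes (as for locations) that each `<policies>` or `<enrichers>` section wraps one child element per ID, and that we already have the set of persisted policy or enricher IDs from the raw state; `AdjunctCleaner` and its methods are hypothetical names.

```java
import java.io.StringReader;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class AdjunctCleaner {

    /** Collects the text of each child element under every element named containerTag. */
    public static Set<String> childIds(String xml, String containerTag) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse entity XML", e);
        }
        Set<String> ids = new HashSet<>();
        NodeList containers = doc.getElementsByTagName(containerTag);
        for (int i = 0; i < containers.getLength(); i++) {
            for (Node c = containers.item(i).getFirstChild(); c != null; c = c.getNextSibling()) {
                if (c.getNodeType() == Node.ELEMENT_NODE && !c.getTextContent().trim().isEmpty()) {
                    ids.add(c.getTextContent().trim());
                }
            }
        }
        return ids;
    }

    /**
     * Returns the IDs in persistedIds that no entity lists under containerTag
     * ("policies" or "enrichers"); these are the orphans that could be deleted.
     */
    public static Set<String> orphans(Collection<String> entityXmls, Set<String> persistedIds,
                                      String containerTag) {
        Set<String> referenced = new HashSet<>();
        for (String xml : entityXmls) {
            referenced.addAll(childIds(xml, containerTag));
        }
        Set<String> result = new TreeSet<>(persistedIds);
        result.removeAll(referenced);
        return result;
    }

    public static void main(String[] args) {
        String entity = "<entity><id>e1</id>"
                + "<policies><string>p1</string></policies>"
                + "<enrichers><string>en1</string></enrichers></entity>";
        System.out.println(orphans(Set.of(entity), Set.of("p1", "p2"), "policies"));    // prints [p2]
        System.out.println(orphans(Set.of(entity), Set.of("en1", "en2"), "enrichers")); // prints [en2]
    }
}
```

Unlike locations, no reachability pass is needed here, since nothing refers to a policy or enricher except its owning entity.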

Aled

[1] https://github.com/apache/brooklyn-server/pull/148
[2] https://github.com/apache/brooklyn-server/blob/0.9.0/core/src/main/java/org/apache/brooklyn/core/location/dynamic/LocationOwner.java#L64
[3] http://brooklyn.apache.org/v/0.9.0/ops/persistence/index.html#cli-commands-for-copying-state
[4] https://github.com/apache/brooklyn-server/blob/0.9.0/api/src/main/java/org/apache/brooklyn/api/mgmt/rebind/mementos/BrooklynMementoRawData.java
[5] https://github.com/apache/brooklyn-server/blob/0.9.0/api/src/main/java/org/apache/brooklyn/api/mgmt/rebind/RebindExceptionHandler.java#L55-L88
