David,

The current tag propagation code in review doesn't account for inherited
tags (from super types). This was done intentionally, as inherited tags are
handled by the entity notification listener. When a tag is
added, deleted or updated on an entity, we send notification messages that
include the tag's super type information and a list of impacted entities
(by tag propagation). The inherited tag information is computed from
typeRegistry (a cache of the Atlas type system).

I feel this will be more efficient than traversing the graph to get the
inherited information. Atlas relies on the typeRegistry cache to resolve all
super type and sub type information, and on each type addition/deletion the
cache is refreshed to contain the latest type information. The effort you
mentioned in *step 4* might duplicate what we currently have in the type
registry.
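
To make the idea concrete, here is a rough sketch in Python (not the actual
Atlas code; the class, function and field names below are hypothetical). It
assumes a small in-memory registry that recomputes the transitive super types
whenever a type is added or removed, and a notification payload that carries
the tag's super types alongside the impacted entities.

```python
class TypeRegistry:
    """Hypothetical in-memory cache: type name -> super type information."""

    def __init__(self):
        self.direct_supers = {}  # type name -> tuple of direct super types
        self._all_supers = {}    # type name -> set of all (transitive) super types

    def add_type(self, name, super_types=()):
        self.direct_supers[name] = tuple(super_types)
        self._refresh()

    def remove_type(self, name):
        self.direct_supers.pop(name, None)
        self._refresh()

    def _refresh(self):
        # Rebuild the cache so lookups never need to walk the hierarchy.
        self._all_supers = {
            name: self._resolve(name, set()) for name in self.direct_supers
        }

    def _resolve(self, name, seen):
        # Collect every ancestor of `name` into `seen`.
        for parent in self.direct_supers.get(name, ()):
            if parent not in seen:
                seen.add(parent)
                self._resolve(parent, seen)
        return seen

    def all_super_types(self, name):
        return sorted(self._all_supers.get(name, set()))


def build_tag_notification(registry, tag_name, operation, impacted_guids):
    """Assumed payload shape; the field names are illustrative only."""
    return {
        "operation": operation,
        "tag": tag_name,
        "superTypes": registry.all_super_types(tag_name),
        "impactedEntities": list(impacted_guids),
    }
```

With types Sensitive, PII (super type Sensitive) and PII_Email (super type
PII) registered, a notification for PII_Email would list both PII and
Sensitive as super types without any graph traversal.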

Regarding entity-classification restrictions on propagated tags, I think
this should not be part of the propagation query. Instead, we should restrict
propagation using a relationship property on the edge:
1. To allow only certain tags for propagation
2. To allow only tags with certain parent tags
3. Or any other tag propagation overrides/restrictions on entities

This offers more flexibility to add constraints on tag propagation than
including them in the graph query. Once we move to JanusGraph we can tweak
the query using TP3 syntax.
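
A minimal sketch of what an edge-level check could look like, in Python. The
property names (propagateTags, allowedTags, allowedParentTags) are
assumptions for illustration, not existing Atlas relationship attributes, and
the tag's super types are assumed to be supplied by the caller. In this
sketch every restriction configured on the edge must pass; a missing
property means no restriction.

```python
def is_propagation_allowed(edge_props, tag_name, tag_super_types):
    """Decide whether a tag may propagate across one relationship edge.

    edge_props: dict of hypothetical properties stored on the edge.
    tag_super_types: all super types of the tag, supplied by the caller.
    """
    # Override: propagation can be switched off for the whole edge.
    if not edge_props.get("propagateTags", True):
        return False

    # Restriction 1: only certain tags may propagate.
    allowed_tags = edge_props.get("allowedTags")
    if allowed_tags is not None and tag_name not in allowed_tags:
        return False

    # Restriction 2: only tags with certain parent tags may propagate.
    allowed_parents = edge_props.get("allowedParentTags")
    if allowed_parents is not None:
        if not set(allowed_parents) & set(tag_super_types):
            return False

    return True
```

For example, an edge with allowedParentTags set to ["PII"] would let
PII_Email (super types PII, Sensitive) through, but would stop a tag whose
only super type is Retention.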


Thanks,
Sarath Subramanian

On Fri, Sep 15, 2017 at 1:25 AM, David Radley <david_rad...@uk.ibm.com>
wrote:

> Hi Madhan and Sarath,
> It occurs to me that we are introducing 2 new definitions around
> classifications that require the code to traverse around the graph.
> - classificationDefs now have entityTypes to restrict the entities that
> they can be applied to. This requires us to check entity and
> classification hierarchies to ensure that inherited entities and
> classifications abide by these restrictions.
> This is currently done in code in the AtlasClassificationType. One set of
> checks at classification add / update time and another when we try to add
> a classification to an entity.
> - tag propagation implementation is currently in review and looks to work
> out where tags should be propagated to using Gremlin TP2 queries. The
> current proposed query is neat, around 10 lines long, but does not account
> for inheritance or entityType restrictions.
>
> If we carry on with the current approach, we potentially need to
> implement checking down the graph in the type code and also in the Gremlin
> query. I wonder if we can have a consistent approach so we use gremlin
> queries in both scenarios or use code in both scenarios. I see a few
> options
>
> 1) Carry on as is: code for Classification entityTypes, TP2 query for
> tag propagation. The TP2 query may become much more complex as it will
> need to recurse around the classification types in the graph and the
> entity types in the graph as well as the instance graph. The entityTypes
> gremlin logic will need to match the entityTypes checking code logic.
> 2) Move all the logic to code. This should mean we work at TP3, and may
> give us more flexibility to handle tag propagation overrides we will need
> at a later date.
> 3) Move all navigation logic to gremlin queries, this is appealing as the
> graph engine then can optimize the queries.
> 4) Extend 3) to store (cache) some of the inherited states in the instance
> graph so a simpler query can be made. We could also extend this approach
> to store when a user overrides the default propagation. I know we have
> concerns with duplicating metadata. I wonder if we could split the
> properties in the vertices so there is a defined section and a derived /
> cached section, so it is obvious which properties might need
> re-calculating.
>
> Thoughts?
>    all the best, David.
>
>
> Unless stated otherwise above:
> IBM United Kingdom Limited - Registered in England and Wales with number
> 741598.
> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
>
