Hello Madhan, David,

I would not wish to remove the option to have tag propagation flow in both directions. Most metadata relationships are not hierarchical. They are two-way, and different situations will call for different classifications to flow in each direction. I do not remember the discussion on removing the BOTH option - but if I missed it I apologise. What is the justification?
The enforcement of the classification's entity types should not prevent a tag from propagating through an entity just because that entity does not support the tag. Downstream entities may support the tag and need it to be propagated to them. We need to work through more scenarios because we also need a way to bound tag propagation :)

As an FYI, the OMRS API for classifications includes an origin attribute that lets us return classifications with an entity that are either explicitly assigned or propagated to the entity. Most callers will not care but some might.

All the best
Mandy

___________________________________________
Mandy Chessell CBE FREng CEng FBCS
IBM Distinguished Engineer
Master Inventor
Member of the IBM Academy of Technology
Visiting Professor, Department of Computer Science, University of Sheffield

Email: mandy_chess...@uk.ibm.com
LinkedIn: http://www.linkedin.com/pub/mandy-chessell/22/897/a49

A: Janet Brooks - jsbrook...@uk.ibm.com

From: Madhan Neethiraj <mad...@apache.org>
To: David Radley <david_rad...@uk.ibm.com>, Sarath Subramanian <sar...@apache.org>
Cc: atlas <d...@atlas.incubator.apache.org>
Date: 13/01/2018 02:14
Subject: Re: Tag propagation

David,

Sarath was working on tag-propagation, but had to take up tasks related to JanusGraph and others. He will be resuming tag-propagation work next week; this feature would be part of Atlas-1.0.0 release.

- lose BOTH - this is still in the code - I think we agreed we wanted to get rid of this.

Agree.

- should honour the classification entitytypes - so that we do not get classifications applied to inappropriate entityTypes

Perhaps we should stop the propagation at the entity where the classification is not applicable? I think it wouldn’t be correct to block a classification association to an entity if the classification is not applicable for a down-stream entity.
- There is the question about how the propagated classifications would look in the get entity REST API - I suggest that they appear in the entity's classifications with a field indicating that they are derived (and hence not able to be removed by an entity update).

I was thinking about a separate attribute, AtlasEntity.propagatedClassifications, for this. However, I think your suggestion of adding a field to AtlasClassification is a better one; with this approach no changes would be needed in applications that process classifications on an entity. How about we capture the guid of the source entity on which the classification is associated, AtlasClassification.sourceEntityGuid? If this value is null, then the classification is associated with the current entity directly.

- I would hope that Ranger would pick up these new propagated tags using the existing tag sync.

Yes. With the approach detailed above, no changes would be needed in Ranger.

- I think you wanted the derived classifications to be picked up at query time. I also remember suggesting that we store the derived classifications in a derivedClassification property in the entity which would contain the list of derived classifications. Or we could store them as a new type of edge, "propagated classification" edges to the real classification. I like the edge idea.

To enable queries like ‘get list of entities that are classified as PII’, it will be performant if each entity vertex has data about the propagated classifications as well, similar to entities having data on classifications directly associated with the entity currently. However, all the entities should directly reference a single instance of a classification, so that it will be easier to manage changes to classification attribute values. Sarath will send an update on the design choices later next week.

If we had the above, we could classify a Term as PSI, and use the semantic mapping to propagate the classifications to the hive column.
The hive column would not pick up classifications defined in the area 3 model like "SpineObject", which is defined as only applying to "GlossaryTerm".

Yes. This usecase should be covered by the design discussed above.

Thanks,
Madhan

From: David Radley <david_rad...@uk.ibm.com>
Date: Thursday, January 11, 2018 at 8:52 AM
To: Madhan Neethiraj <mneethi...@hortonworks.com>
Cc: atlas <d...@atlas.incubator.apache.org>
Subject: Tag propagation

Hi Madhan,

I had a look in the code - I was surprised that the tag propagation was not in. Is this something you are looking at in the near future? If not I may need to look into it.

I suggest the tag propagation implementation in phase 1 should:

- lose BOTH - this is still in the code - I think we agreed we wanted to get rid of this.
- should honour the classification entitytypes - so that we do not get classifications applied to inappropriate entityTypes
- There is the question about how the propagated classifications would look in the get entity REST API - I suggest that they appear in the entity's classifications with a field indicating that they are derived (and hence not able to be removed by an entity update).
- I would hope that Ranger would pick up these new propagated tags using the existing tag sync.
- I think you wanted the derived classifications to be picked up at query time. I also remember suggesting that we store the derived classifications in a derivedClassification property in the entity which would contain the list of derived classifications. Or we could store them as a new type of edge, "propagated classification" edges to the real classification. I like the edge idea.

If we had the above, we could classify a Term as PSI, and use the semantic mapping to propagate the classifications to the hive column. The hive column would not pick up classifications defined in the area 3 model like "SpineObject", which is defined as only applying to "GlossaryTerm".

What do you think?

all the best, David.
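
To make the options in this thread concrete, here is a minimal sketch of one possible reading, not the Atlas implementation. It assumes the pass-through behaviour Mandy describes (an entity whose type does not support the tag is skipped, but propagation continues beyond it) and the sourceEntityGuid field Madhan proposes (null means directly assigned). The class and method names, the guids, the one-directional walk and the rule that an empty type set means "applies to any type" are all hypothetical and chosen only for illustration.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only: these types are hypothetical, not the Atlas classes.
final class TagPropagationSketch {

    /** One classification as it would appear on an entity. */
    static final class AppliedClassification {
        final String name;
        final String sourceEntityGuid; // null means directly assigned to this entity

        AppliedClassification(String name, String sourceEntityGuid) {
            this.name = name;
            this.sourceEntityGuid = sourceEntityGuid;
        }

        boolean isPropagated() {
            return sourceEntityGuid != null;
        }
    }

    /**
     * Walks the downstream edges breadth-first from sourceGuid and returns,
     * per entity guid, the classification that entity would carry.
     * Assumption: an empty applicableTypes set means the tag applies to any type.
     */
    static Map<String, AppliedClassification> propagate(
            String tagName,
            Set<String> applicableTypes,
            String sourceGuid,
            Map<String, String> typeByGuid,
            Map<String, List<String>> downstream) {
        Map<String, AppliedClassification> applied = new LinkedHashMap<>();
        Set<String> visited = new HashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        visited.add(sourceGuid);
        queue.add(sourceGuid);

        while (!queue.isEmpty()) {
            String guid = queue.poll();
            String typeName = typeByGuid.get(guid);
            boolean applicable = applicableTypes.isEmpty() || applicableTypes.contains(typeName);
            if (applicable) {
                String origin = guid.equals(sourceGuid) ? null : sourceGuid;
                applied.put(guid, new AppliedClassification(tagName, origin));
            }
            // Keep walking even when the tag was not recorded on this entity,
            // so that downstream entities that do support it still receive it.
            for (String next : downstream.getOrDefault(guid, Collections.<String>emptyList())) {
                if (visited.add(next)) { // the visited set guards against cycles
                    queue.add(next);
                }
            }
        }
        return applied;
    }

    public static void main(String[] args) {
        Map<String, String> typeByGuid = new HashMap<>();
        typeByGuid.put("term-1", "GlossaryTerm");
        typeByGuid.put("proc-1", "Process");
        typeByGuid.put("col-1", "hive_column");

        Map<String, List<String>> downstream = new HashMap<>();
        downstream.put("term-1", Arrays.asList("proc-1"));
        downstream.put("proc-1", Arrays.asList("col-1"));

        Set<String> piiTypes = new HashSet<>(Arrays.asList("GlossaryTerm", "hive_column"));
        propagate("PII", piiTypes, "term-1", typeByGuid, downstream)
            .forEach((guid, c) ->
                System.out.println(guid + " " + c.name + " source=" + c.sourceEntityGuid));
    }
}
```

In this sketch a tag restricted to "GlossaryTerm", such as "SpineObject", stays on the term only, while a tag that also lists "hive_column" reaches the column with sourceEntityGuid pointing back at the term. Stopping at the first non-applicable entity, as Madhan asks about, would amount to moving the downstream loop inside the applicable branch.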