[ 
https://issues.apache.org/jira/browse/ATLAS-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15905259#comment-15905259
 ] 

David Radley commented on ATLAS-1410:
-------------------------------------

Responses to comments 

Page numbers would help to tie these comments to the document.  <<David Agreed 
>>
Page 2 - Asset type - defined in terms of itself. How are they used? or is this 
not relevant to this paper?  <<David >>
Page 2 - Why do we need to know about V1 and V2? I think it is because the 
current interface works with V1 and the new one will work with V2 - it would 
be helpful to state this explicitly. <<David Yes, that is right; I will add 
this. >>
Page 4 - bullets 4-5 - has-a and is-a relationships are semantic relationships. 
<<David Yes that is right I have changed this. >>
Page 4 - missing from list - ability to associate a semantic meaning to a 
classification (v2), trait (v1)?  <<David Yes I have changed this. >>
Page 4 - Missing from the list - "typed-by" relationship to associate terms 
that include meaning in context with terms that describe more pure objects. For 
example Home Address is typed by Address.<<David Yes I have changed this. >>
Page 5 - Figure 1 - I am not comfortable with terms being owned by categories. 
I think each term should be owned by a glossary and linked into 0, 1 or more 
categories as appropriate. This creates a much simpler deletion rule for the 
API/End user - particularly when you look at Figure 2 where terms are owned by 
multiple categories. I.e., delete the term from its glossary and it is deleted. 
In the proposed design, it raises such questions as "Is the term deleted when 
unlinked from all categories - or the first category it is linked to?" <<David 
I agree this is the way we want it; I have changed the words to be explicit 
about this. >>
Page 6 - Figure 3 - I need more detail to understand the "classifies" 
relationship and how it relates to a classification. It seems redundant. Would 
you not relate a term to a classification which is in itself semantically 
classified by its definition term?
Page 6 - Bullet 6) - What is the alternative to using Gremlin queries? <<David 
there is a  DSL API that can be used as well>> 
Page 6 - Bullet 7) - is this an incomplete sentence - or is the paragraph 
that follows supposed to be a nested bullet list? Assuming it is a follow-on 
point, my confusion is that I do not understand why the term/category hierarchy 
is relevant to the enhancement of classifications. The Classification object is 
defining the type of classification and its meaning is coming from the term? 
<<David Yes>> Is this suggesting that the relationships between 
classifications are coming from the term relationships in the same way we do 
this in IGC today? <<David I am using classification to mean the Atlas 
Classification, which is linked to zero or more classifying terms; the terms 
are linked together in many varied ways. >> If so, it may help to show an 
example. <<David I have added a more sophisticated glossaries example later in 
the document. >> 
Page 7 - Figure 4 and 5 - what is the difference between "Classification" and 
"Classification Relationship"? <<David I have removed the classifying 
relationship. At the moment I have the Classification instance having the same 
name as its associated classifying term. >> 
Page 7 - Maybe strange examples - the Glossaries would be for different subject 
areas - for example, there may be a marketing glossary, a customer care 
glossary, a banking glossary. These may be used for associating meaning to data 
assets (ie data assets). There may also be glossaries for different 
regulations, or standard governance approaches, and these may include terms 
that can be used to describe classification for data that drive operational 
governance? <<David I am trying to use simpler glossary scenarios to show 
simple interactions around the glossary metadata; maybe this is misleading - I 
could remove these simpler examples. As you say, these are not sophisticated 
enough to drive more real-life scenarios. I have included a more realistic 
model later in the document. >>  
Page 8 - I am not sure what the proposed enhancements are - it just seems to 
list the problems with the current model. All relationships in metadata are 
bi-directional. It should be the default. This mechanism seems complicated. 
Really need to define relationships independent of entities so we can define 
attributes on these relationships. The Classification is actually an example of 
an independently defined relationship that includes the GUID of the 2 entities 
it connects. This should be the common style of relationship. <<David As I have 
been prototyping and investigating, base Atlas has gained capabilities that 
can be used by the glossary. The main enhancements are: classifications gain 
terms and GUIDs; enhance / confirm that relationships work with many-to-many; 
introduce a base type hierarchy that everything inherits from so we can use 
collections; introduce the glossary model, POJO and REST APIs. >> 
Page 9 - on discussion point - a Taxonomy is a hierarchy of categories that the 
terms are placed in - I thought this was included in the proposal and we do 
need this for organising terms so that people can find them - and the category 
hierarchies (taxonomies) help to provide context to terms too. Also, the 
semantic relationships discussed would mean we could support a simple ontology. 
<<David The new proposal does not include a  GlossaryTaxonomy; we may want to 
use the word taxonomy to describe the hierarchy of categories - though it will 
not be something that the code uses. >> 
Page 9 - Fully-qualified name - What is a grandparent or parent term? What does 
a fully qualified name mean and when is it used? The unique name is its GUID. 
Its path name (there may be many) is the navigation to the term through the 
category hierarchies. <<David I have updated the text to clarify. >>  
Page 9 - why do Atlas terms need to follow the schema defined at this link - 
https://www.ibm.com/support/knowledgecenter/en/SSN364_8.8.0/com.ibm.ima.using/comp/vocab/terms_prop.html?
 It seems to imply a lifecycle which is not included in this proposal and a very 
specific modelling of the IBM industry models that have mandatory fields that 
are not always applicable to all glossaries. I think this doc should describe 
the schema of the glossary term explicitly and explain the fields. <<David OK, 
I have removed this. >>
Page 10 - Figure 7 shows the navigation relationships as 1 way. We need to be 
able to navigate from the hive table to its classification to support the GAF. 
<<David All relationships should be navigable both ways - many relationships 
also have a direction. All the main glossary relationships can be modeled with 
a directional relationship. For the short term: synonyms can be modeled with a 
main term that others are synonyms of; then only one of the terms would be 
assigned to assets. Longer term we could be more flexible as we get true 
bidirectional relationship support. >>
Page 11 - Figure 8 - In the Atlas entities box it is hard to see which are 
terms and which are assets (ie hive columns).
Page 12 - Fully qualified name - Does this require all categories to be in a 3 
level hierarchy - or is this an example of a path name that happens to be 3 
levels deep? <<David Changed to make this clearer>>
Page 12 - What does the Taxonomy refer to in this table? <<David Changed to 
Glossary>>
Page 13 - The Glossary API is an OMAS API. <<David OK I will introduce this 
terminology. >>
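
To make the Page 5 ownership point and the Page 9 / Page 12 path-name point 
concrete, here is a minimal sketch in plain Java. It is not Atlas code and not 
the proposed API: GlossarySketch and every method on it are hypothetical names 
invented for illustration, and it assumes the category hierarchy has no cycles. 
The glossary owns each term, a term is linked into zero or more categories, 
unlinking never deletes the term, and a term has one path name per category it 
is linked into.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only; none of these names exist in Atlas.
class GlossarySketch {
    private final String name;
    // category -> parent category (null for a top-level category)
    private final Map<String, String> categoryParent = new HashMap<>();
    // term -> categories the term is linked into (the glossary owns the term)
    private final Map<String, Set<String>> termCategories = new HashMap<>();

    GlossarySketch(String name) { this.name = name; }

    void addCategory(String category, String parent) {
        categoryParent.put(category, parent);
    }

    // Creating a term in the glossary is what gives it an owner.
    void addTerm(String term) {
        termCategories.putIfAbsent(term, new HashSet<>());
    }

    void link(String term, String category) {
        if (!termCategories.containsKey(term) || !categoryParent.containsKey(category)) {
            throw new IllegalArgumentException("unknown term or category");
        }
        termCategories.get(term).add(category);
    }

    // Unlinking only removes the category link; the term stays in the glossary.
    void unlink(String term, String category) {
        Set<String> categories = termCategories.get(term);
        if (categories != null) {
            categories.remove(category);
        }
    }

    // The single deletion rule: remove the term from its owning glossary.
    void deleteTerm(String term) { termCategories.remove(term); }

    boolean hasTerm(String term) { return termCategories.containsKey(term); }

    // One path name per category link, sorted by category for stable output.
    List<String> pathNames(String term) {
        Set<String> linked = termCategories.getOrDefault(term, Collections.emptySet());
        List<String> paths = new ArrayList<>();
        for (String category : new TreeSet<>(linked)) {
            StringBuilder path = new StringBuilder(term);
            // Walk up the hierarchy; depth is whatever the hierarchy happens to be.
            for (String c = category; c != null; c = categoryParent.get(c)) {
                path.insert(0, c + "/");
            }
            paths.add(name + "/" + path);
        }
        return paths;
    }
}
```

With this shape, the question "is the term deleted when unlinked from all 
categories?" does not arise: only deleteTerm removes a term, and the path names 
are derived from the category links rather than stored on the term.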

> V2 Glossary API
> ---------------
>
>                 Key: ATLAS-1410
>                 URL: https://issues.apache.org/jira/browse/ATLAS-1410
>             Project: Atlas
>          Issue Type: Improvement
>            Reporter: David Radley
>            Assignee: David Radley
>         Attachments: Atlas Glossary V2 proposal v1.0.pdf, Atlas Glossary V2 
> proposal v1.1.pdf
>
>
> The BaseResourceDefinition uses the AttributeDefinition class from typesystem. 
> There are newer, more functional versions of this capability in the atlas-intg 
> project. This Jira is changing over the glossary implementation to the newer 
> entity / type classes.  
> Instead of the instanceProperties and collectionProperties in the 
> BaseResourceDefinition we should use something in this sort of style:  
> "
>  AtlasEntityDef deptTypeDef =
>          AtlasTypeUtil.createClassTypeDef(DEPARTMENT_TYPE, "Department" + _description, ImmutableSet.<String>of(),
>                  AtlasTypeUtil.createRequiredAttrDef("name", "string"),
>                  new AtlasAttributeDef("employees", String.format("array<%s>", "Person"), true,
>                          AtlasAttributeDef.Cardinality.SINGLE, 0, 1, false, false,
>                          Collections.<AtlasStructDef.AtlasConstraintDef>emptyList()));
>  AtlasEntityDef personTypeDef =
>          AtlasTypeUtil.createClassTypeDef("Person", "Person" + _description, ImmutableSet.<String>of(),
>                  AtlasTypeUtil.createRequiredAttrDef("name", "string"),
>                  AtlasTypeUtil.createOptionalAttrDef("address", "Address"),
>                  AtlasTypeUtil.createOptionalAttrDef("birthday", "date"),
>                  AtlasTypeUtil.createOptionalAttrDef("hasPets", "boolean"),
>                  AtlasTypeUtil.createOptionalAttrDef("numberOfCars", "byte"),
>                  AtlasTypeUtil.createOptionalAttrDef("houseNumber", "short"),
>                  AtlasTypeUtil.createOptionalAttrDef("carMileage", "int"),
>                  AtlasTypeUtil.createOptionalAttrDef("age", "float"),
> "
> For the parent child relationships with glossary categories and terms we 
> should be able to have the type system manage edge deletion. As part of this, 
> we will need to investigate whether we could get rid of the disconnect and 
> connect methods added in ATLAS-1186 
>  
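
As a rough illustration of what "have the type system manage edge deletion" 
could mean for the parent-child relationships described above, the sketch below 
keeps relationships as independent records holding the GUIDs of both ends, 
makes them navigable in both directions, and removes every edge touching an 
entity when that entity is deleted, so callers need no explicit disconnect 
step. This is an assumption about the intended behaviour, not the Atlas 
implementation: EdgeStoreSketch, Edge and all method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; none of these names exist in Atlas.
class EdgeStoreSketch {
    // A relationship defined independently of the entities it connects.
    static final class Edge {
        final String type;
        final String fromGuid;
        final String toGuid;

        Edge(String type, String fromGuid, String toGuid) {
            this.type = type;
            this.fromGuid = fromGuid;
            this.toGuid = toGuid;
        }

        boolean touches(String guid) {
            return fromGuid.equals(guid) || toGuid.equals(guid);
        }
    }

    private final Set<String> entities = new HashSet<>();
    private final List<Edge> edges = new ArrayList<>();

    void addEntity(String guid) { entities.add(guid); }

    void relate(String type, String fromGuid, String toGuid) {
        if (!entities.contains(fromGuid) || !entities.contains(toGuid)) {
            throw new IllegalArgumentException("both ends must exist");
        }
        edges.add(new Edge(type, fromGuid, toGuid));
    }

    // Navigate in the direction of the relationship (e.g. parent to children).
    List<String> targets(String type, String fromGuid) {
        List<String> result = new ArrayList<>();
        for (Edge e : edges) {
            if (e.type.equals(type) && e.fromGuid.equals(fromGuid)) {
                result.add(e.toGuid);
            }
        }
        return result;
    }

    // Navigate against the direction (e.g. child to parents).
    List<String> sources(String type, String toGuid) {
        List<String> result = new ArrayList<>();
        for (Edge e : edges) {
            if (e.type.equals(type) && e.toGuid.equals(toGuid)) {
                result.add(e.fromGuid);
            }
        }
        return result;
    }

    // The store, not the caller, cleans up edges when an entity goes away.
    void deleteEntity(String guid) {
        entities.remove(guid);
        edges.removeIf(e -> e.touches(guid));
    }

    int edgeCount() { return edges.size(); }
}
```

In this sketch, deleting a category that sits between a parent category and a 
term removes both of its edges in one call, which is the kind of behaviour that 
might make separate connect and disconnect methods unnecessary.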


