[ https://issues.apache.org/jira/browse/ATLAS-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15928409#comment-15928409 ]
David Radley commented on ATLAS-1410:
-------------------------------------

Thank you [~zimnymc] for your feedback. It is great. Here are my responses.

ad. Use Case

use case 1
It shouldn't be possible to define two terms with exactly the same name.
<<David Yes, this might be desirable in a single glossary, but we should not restrict another glossary (a name space with a new context) from using the same name. There are cases where we want 2 term names to be the same in a glossary, for example when we use replaces. >>
It can be possible to do it only through synonyms if the definition stays the same.
<<David I agree that synonyms are a useful way of equating differently named terms with the same meaning>>
If we have a different definition then we must also have a different name for each term. If we allow the same naming we will probably put enormous stress on glossary integrity.
<<David. Maybe we should consider a constraint of the form: Active Terms need to have unique names in a glossary, unless they are connected with replaces relationships. [~mandy_chessell] Is this too restrictive? >>

use cases 2 and 3
I agree that Categories are needed to give more control over term organization, but I think I need a bit more thinking on whether categories should help in creating hierarchies. It might be the case, but then we should allow terms to only be leaves, and every kind of grouping should be done via category. This would mean that categories should also have classifiers.
<<David I am not sure what the benefit would be of not allowing non-leaf terms. I like your idea of classifying categories; do you have a use case in mind that would require this? >>

use case 7
Do you mean collections?
<<David. Yes, I'll make this clearer>>

use case 11
This sounds a bit too high level and it would probably be nice to describe it in more detail.
<<David OK I will add more detail>>

I'm explicitly missing two things:
1. ability to inherit classifiers
<<David I assume you mean that a term has a classification and there are downstream terms that should get (inherit) the classifier. This is important: we want the OMAS layers to do this based on context, as we only want the classification stored once in the metadata store. Or are you thinking of an is-a relationship? >>
2. are there any models between terms and assets, or is it only about term to asset? We might want to include a couple of levels of models (like LDM and/or PDM for a particular technology). At least one is already there - by connecting terms to other terms we are creating concepts, which should be visualized in some way for easier navigation.
<<David great idea - I suggest raising a Jira for this. This proposal only talks of associating assets and terms. An asset could have many terms, and a term could be assigned to many assets. This proposal does not introduce any models to control the assignments. >>

ad. discussion point on p. 5
yes, that's how I also see it - Taxonomy is the name of the hierarchy of Glossary Categories, but does this mean that Taxonomy is the name of a Glossary instance?
<<David. From discussions it seems that taxonomies would naturally fit as named category hierarchies. The terms would not be in a taxonomy. As the existing Atlas v1 taxonomy has terms, we felt it clearer to remove taxonomy and add a glossary to hold terms and categories. >>

ad. Glossary Terms and Glossary Categories
discussion point - can there be a term without a Category? If not, will there always be at least one prime category for each Glossary? If yes, what is the difference then between a Glossary and a prime category? Is there any at all?
<<David. We are suggesting a term does not need a category; it is owned by its glossary. We are suggesting not having a prime category. >>
point for discussion - should it be allowed that a term from one Glossary is inside a Category from another Glossary?
I think we should not allow this kind of situation, as it increases the risk of losing integrity for a particular Glossary.
<<David I am interested in how you define glossary integrity; is there a series of rules you have in mind?>>
I'd say that a copy of that term should be made in the other Glossary with some kind of "inspired by" marker. Otherwise we will create a tight connection between two Glossaries and their maintenance will be more difficult (e.g. upgrades).
<<David I am curious what sort of upgrade scenarios you feel might be problematic, and how you think the "inspired-by" term would solve this. The risk with the inspired-by approach is that the "inspired-by" term gets out of sync with the original, leading to duplication of slightly different metadata. I think your concerns would be addressed when terms and other metadata are versioned (not in scope of this document). >>

ad. Glossary Term identification and names
Glossary Term names might not be unique in a Glossary. For example, there could be 2 definitions of customer. - just NO :-)
<<David let's discuss off line :-) >>
"we do not allow 2 Glossary Terms of the same name inheriting from a parent Glossary Term" - so do we allow it or not? Or have I missed something? I need an example for this one to properly understand it.

ad. Glossary Term context
I'd like to create a clear distinction between what is meant here by context and the term business context (being the term-to-term relations that create business context) - I just don't like using the word "context" for both.
<<David So we are not introducing anything called context. This was an English word to try to make sense of it. I can see it might confuse because of this. I will take a look and see if I can rephrase to make it clearer.>>

ad. page 9, example
In general I do agree with the line of thinking but I have a question: both customer and attributes are terms, right? If so, is the "has-a" relationship the best one for term-to-term assignments?
<<David I assume you are suggesting we do not need attribute terms as we can work this out from the "has-a" relationship. >>

ad. Owning relationships
I have some doubts about this: "Concept Glossary Terms own Attribute Glossary Terms." (see the above remark for page 9, example). I'm not saying not to go there, I just want to explore it more to understand it better.
<<David: I was inspired by the IBM industry models. I am interested in whether this is too prescriptive at this level. [~mandy_chessell] what do you think? >>

ad. Discussion point – maybe we should consider defining the Glossary Term attributes using the type system rather than relationships
- yes we should

ad. Discussion point: we could add homophones as well – if there was a need.
I don't think there is a need to do that now.
<<David agreed>>

ad. Discussion point preferred-term attribute could be stored in the entity, AtlasObjectId or classification. I suggest storing it in the entity.
I agree.

ad. Discussion point: We will enable collection types to be created. Additionally, we may want to consider including a Collection type that has one attribute called contents with multiple values of the top-level type.
Do you mean nesting Collections?
<<David I was not thinking of that, but if we do allow that, we would need to check for recursion. I was thinking of the implementation of one collection>>

ad. Discussion point Introduction of bidirectional relationships, could be done separately from the Glossary enhancement.
We may take a step-by-step approach, but I'd say we need this from nearly the very beginning.
<<David I agree it is important. Pragmatically I suggest this is a definable addition, so we could have a pretty functional glossary without it and could then add it as the next enhancement. >>

ad. Discussion point: We may wish to take a more revolutionary approach and allow relationships to be defined as top level artifacts, of which classifications are a type.
Can we explore it more?
Sounds pretty ambitious and worth doing, but let's list the consequences.
<<David let's discuss>>

> V2 Glossary API
> ---------------
>
>                 Key: ATLAS-1410
>                 URL: https://issues.apache.org/jira/browse/ATLAS-1410
>             Project: Atlas
>          Issue Type: Improvement
>            Reporter: David Radley
>            Assignee: David Radley
>         Attachments: Atlas Glossary V2 proposal v1.0.pdf, Atlas Glossary V2 proposal v1.1.pdf
>
>
> The BaseResourceDefinition uses the AttributeDefinition class from typesystem.
> There are newer, more functional versions of this capability in the atlas-intg
> project. This Jira is changing over the glossary implementation to the newer
> entity / type classes.
> Instead of the instanceProperties and collectionProperties in the
> BaseResourceDefinition we should use something in this sort of style:
> "
>  AtlasEntityDef deptTypeDef =
>      AtlasTypeUtil.createClassTypeDef(DEPARTMENT_TYPE, "Department"+_description, ImmutableSet.<String>of(),
>          AtlasTypeUtil.createRequiredAttrDef("name", "string"),
>          new AtlasAttributeDef("employees", String.format("array<%s>", "Person"), true,
>              AtlasAttributeDef.Cardinality.SINGLE, 0, 1, false, false,
>              Collections.<AtlasStructDef.AtlasConstraintDef>emptyList()));
>  AtlasEntityDef personTypeDef =
>      AtlasTypeUtil.createClassTypeDef("Person", "Person"+_description, ImmutableSet.<String>of(),
>          AtlasTypeUtil.createRequiredAttrDef("name", "string"),
>          AtlasTypeUtil.createOptionalAttrDef("address", "Address"),
>          AtlasTypeUtil.createOptionalAttrDef("birthday", "date"),
>          AtlasTypeUtil.createOptionalAttrDef("hasPets", "boolean"),
>          AtlasTypeUtil.createOptionalAttrDef("numberOfCars", "byte"),
>          AtlasTypeUtil.createOptionalAttrDef("houseNumber", "short"),
>          AtlasTypeUtil.createOptionalAttrDef("carMileage", "int"),
>          AtlasTypeUtil.createOptionalAttrDef("age", "float"),
> "
> For the parent child relationships with glossary categories and terms we
> should be able to have the type system manage edge deletion.
> As part of this, we will need to investigate whether we could get rid of the
> disconnect and connect methods added in ATLAS-1186
>

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)