[ https://issues.apache.org/jira/browse/ATLAS-1955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16129558#comment-16129558 ]
Richard Ding commented on ATLAS-1955:
-------------------------------------

I had an offline discussion with [~davidrad] on how to add attribute validations to the type system. Option 2 seems cleaner because it separates validation rules from the attributes they validate, but it does add another top-level type to the system. Please let me know what you think.

*Option 1: Embed the validation rules inside the attribute definition*

Add a nested class AtlasValidationDef to the AtlasStructDef class, and make AtlasValidationDef an element of AtlasAttributeDef. For example, an email attribute can contain a validation definition:

{code}
{
  "name": "email",
  "typeName": "string",
  "cardinality": "SINGLE",
  "validations": [
    {
      "type": "regex",
      "validator": "[0-9a-z]@[0-9a-z].[0-9a-z]+"
    }
  ],
  "isIndexable": false,
  "isOptional": true,
  "isUnique": false
}
{code}

Notes:
# The validationDefs will be serialized / deserialized as part of the attributeDef JSON string.
# The validationDefs associated with attributeDefs will be retrieved and invoked when validateValue or validateValueForUpdate (of the AtlasStructType class) is called.
# Initially, we’ll support three validation types: regex, lookup and class
#* regex: the validator value is a regex string
#* lookup: the validator value is the name of an existing AtlasEnumDef
#* class: the validator value is the name of a validator class (e.g. org.apache.atlas.model.validation.CreditCardValidator). All validator classes implement the AttributeValidator interface. These classes can be built in (part of Atlas) or loaded dynamically via the Java provider framework.

*Option 2: Validation rules as a top-level type definition*

Here we define _AtlasAttributeValidationType_ (and _AtlasAttributeValidationDef_) as a top-level Atlas type, similar to _AtlasEnumType_ (and _AtlasEnumDef_), and add an optional validation field to _AtlasAttributeDef_.
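To make this option more concrete, here is a minimal Java sketch of how a named validation definition might be looked up and applied to a value. Every name in it (ValidationSketch, ValidationDef, the in-memory maps, NonBlankValidator) is hypothetical and none of it is existing Atlas API; only the AttributeValidator interface name and the three types regex, lookup and class come from this proposal. The enum lookup is simulated with a plain map rather than a real AtlasEnumDef, and treating the regex as a full match is an assumption.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical sketch only: these classes do not exist in Atlas.
public class ValidationSketch {

    /** Mirrors one named validation definition (name, type, validator). */
    public static final class ValidationDef {
        public final String name;       // e.g. "email_validation"
        public final String type;       // "regex", "lookup" or "class"
        public final String validator;  // regex, enum name, or class name

        public ValidationDef(String name, String type, String validator) {
            this.name = name;
            this.type = type;
            this.validator = validator;
        }
    }

    /** Interface name taken from the proposal; the method is an assumption. */
    public interface AttributeValidator {
        boolean isValid(Object value);
    }

    /** Tiny example of a "class" validator, for illustration only. */
    public static final class NonBlankValidator implements AttributeValidator {
        @Override
        public boolean isValid(Object value) {
            return value != null && !value.toString().trim().isEmpty();
        }
    }

    // Stand-ins for the type registry: validation defs and enum values by name.
    private final Map<String, ValidationDef> defsByName = new HashMap<>();
    private final Map<String, Set<String>> enumsByName = new HashMap<>();

    public void registerValidation(ValidationDef def) {
        defsByName.put(def.name, def);
    }

    public void registerEnum(String enumName, String... values) {
        enumsByName.put(enumName, new HashSet<>(Arrays.asList(values)));
    }

    /** Resolves the validation by name and dispatches on its type. */
    public boolean validate(String validationName, Object value) {
        ValidationDef def = defsByName.get(validationName);
        if (def == null) {
            throw new IllegalArgumentException("unknown validation: " + validationName);
        }
        switch (def.type) {
            case "regex":
                // Full-match semantics (assumption, not stated in the proposal).
                return Pattern.matches(def.validator, String.valueOf(value));
            case "lookup": {
                Set<String> allowed = enumsByName.get(def.validator);
                return allowed != null && allowed.contains(String.valueOf(value));
            }
            case "class":
                try {
                    Object v = Class.forName(def.validator)
                            .getDeclaredConstructor().newInstance();
                    return ((AttributeValidator) v).isValid(value);
                } catch (ReflectiveOperationException | ClassCastException e) {
                    throw new IllegalStateException(
                            "cannot load validator " + def.validator, e);
                }
            default:
                throw new IllegalArgumentException(
                        "unsupported validation type: " + def.type);
        }
    }
}
```

In this sketch an attribute would carry only the validation name, so a call such as validate("email_validation", value) resolves the rule at validation time and the same rule can be shared by several attributes.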

For example, we first define _AtlasAttributeValidationDefs_:

{code}
"validationDefs": [
  {
    "name": "email_validation",
    "typeVersion": "1.0",
    "type": "regex",
    "validator": "[0-9a-z]@[0-9a-z].[0-9a-z]+"
  },
  {
    "name": "country_code_validation",
    "typeVersion": "1.0",
    "type": "lookup",
    "validator": "country_code_enum_type"
  },
  {
    "name": "credit_card_validation",
    "typeVersion": "1.0",
    "type": "class",
    "validator": "org.apache.atlas.model.validation.CreditCardValidator"
  }
]
{code}

Then we set the validation field in the email attributeDef:

{code}
{
  "name": "email",
  "typeName": "string",
  "cardinality": "SINGLE",
  "validation": "email_validation",
  "isIndexable": false,
  "isOptional": true,
  "isUnique": false
}
{code}

Notes:
# As a top-level Atlas type, an AtlasAttributeValidationType instance will be stored as a vertex in the backing graph db.
# If the validation field is set in an AtlasAttributeDef object, the attribute value will be validated against the referenced validationDef in the validateValue or validateValueForUpdate method (of the AtlasStructType class).
# Initially, we’ll support three validation types: regex, lookup and class

> Validation for Attributes
> -------------------------
>
>                 Key: ATLAS-1955
>                 URL: https://issues.apache.org/jira/browse/ATLAS-1955
>             Project: Atlas
>          Issue Type: New Feature
>          Components: atlas-core
>    Affects Versions: 0.9-incubating
>            Reporter: Israel Varea
>            Assignee: Richard Ding
>             Fix For: 0.9-incubating
>
>
> It would be very nice if the Atlas model could contain a way to represent
> attribute validation.
> A simple example is that we would like to model a Person, with attributes
> Name, Email and Country. Now we would like to specify that Email has to
> follow a specific regular expression, so it would be nice if we could set
> Email -> hasValidation -> EmailRegex, with EmailRegex having:
> Name: Email Regular Expression
> Expression: /[0-9a-z]+@[0-9a-z]+.[0-9a-z]+/
> For more complex types of validation, e.g. checking card number validity,
> an external validator function/service could be added.
> Name: Credit Card Number Validator
> Validator: org.apache.atlas.validators.creditcard or
> https://host:port/creditCardValidator
> For validations from a reference table, for example a country name, it could
> be:
> Name: Country Name Ref Validator
> Reference Column: <country_name_column>
> where <country_name_column> would be an instance of type Hive_Column or
> HBase_Column.
> Since this is a kind of Standardization, it could be placed in
> [Area 5|https://cwiki.apache.org/confluence/display/ATLAS/Area+5+-+Standards].
> A similar approach is followed in the software
> [Kylo|https://github.com/Teradata/kylo/tree/master/integrations/spark/spark-validate-cleanse]

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)