Repository: incubator-atlas
Updated Branches:
  refs/heads/master 0cf250099 -> 8e8b51b8e


ATLAS-1022 Update typesystem wiki with details (yhemanth via shwethags)


Project: http://git-wip-us.apache.org/repos/asf/incubator-atlas/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-atlas/commit/f672aaef
Tree: http://git-wip-us.apache.org/repos/asf/incubator-atlas/tree/f672aaef
Diff: http://git-wip-us.apache.org/repos/asf/incubator-atlas/diff/f672aaef

Branch: refs/heads/master
Commit: f672aaeffdad9539b52d0122cc2593e7352d6329
Parents: 0cf2500
Author: Shwetha GS <[email protected]>
Authored: Wed Jul 20 12:10:10 2016 +0530
Committer: Shwetha GS <[email protected]>
Committed: Wed Jul 20 12:10:10 2016 +0530

----------------------------------------------------------------------
 .../site/resources/images/twiki/data-types.png  | Bin 413738 -> 0 bytes
 .../resources/images/twiki/types-instance.png   | Bin 445893 -> 0 bytes
 docs/src/site/twiki/TypeSystem.twiki            | 266 ++++++++++++++-----
 release-log.txt                                 |   1 +
 4 files changed, 197 insertions(+), 70 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-atlas/blob/f672aaef/docs/src/site/resources/images/twiki/data-types.png
----------------------------------------------------------------------
diff --git a/docs/src/site/resources/images/twiki/data-types.png 
b/docs/src/site/resources/images/twiki/data-types.png
deleted file mode 100755
index 3aa1904..0000000
Binary files a/docs/src/site/resources/images/twiki/data-types.png and 
/dev/null differ

http://git-wip-us.apache.org/repos/asf/incubator-atlas/blob/f672aaef/docs/src/site/resources/images/twiki/types-instance.png
----------------------------------------------------------------------
diff --git a/docs/src/site/resources/images/twiki/types-instance.png 
b/docs/src/site/resources/images/twiki/types-instance.png
deleted file mode 100755
index 6afca21..0000000
Binary files a/docs/src/site/resources/images/twiki/types-instance.png and 
/dev/null differ

http://git-wip-us.apache.org/repos/asf/incubator-atlas/blob/f672aaef/docs/src/site/twiki/TypeSystem.twiki
----------------------------------------------------------------------
diff --git a/docs/src/site/twiki/TypeSystem.twiki 
b/docs/src/site/twiki/TypeSystem.twiki
index 64cf5a2..b658cfa 100755
--- a/docs/src/site/twiki/TypeSystem.twiki
+++ b/docs/src/site/twiki/TypeSystem.twiki
@@ -1,74 +1,200 @@
 ---+ Type System
 
----++ Introduction
-
 ---++ Overview
 
----+++ Data Types Overview
-<img src="images/twiki/data-types.png" height="400" width="600" />
-
----+++ Types Instances Overview
-<img src="images/twiki/types-instance.png" height="400" width="600" />
-
----++ Details
-
-### Structs are like C structs - they don't have an identity
-- no independent lifecycle
-- like a bag of properties
-- like in hive, also
-
-### Classes are classes
-- like any OO class
-- have identity
-- can have inheritence
-- can contain structs
-- don't necessarily need to use a struct inside the class to define props
-- can also define props using !AttributeDefinition using the basic data types
-- classes are immutable once created
-
-### On search interface:
-- can search for all instances of a class
-- classes could become tables in a relational system, for instance
-       - also databases, columns, etc.
-
-### Traits is similar to scala - traits more like decorators (?)
-- traits get applied to instances - not classes
-       - this satisfies the classification mechanism (ish)
-- can have a class instance have any number of traits
-- e.g. security clearance - any Person class could have it; so we add it as a 
mixin to the Person class
-       - security clearance trait has a level attribute
-       - traits are labels
-       - each label can have its own attribute
-- reason for doing this is:
-       - modeled security clearance trait
-       - want to prescribe it to other things, too
-       - can now search for anything that has security clearance level = 1, 
for instance
-
-### On Instances:
-- class, trait, struct all have bags of attributes
-- can get name of type associated with attribute
-- can get or set the attribute in that bag for each instance
-
-### On Classification:
-- create column as a class
-- create a trait to classify as "PHI"
-- would create the instance of the column with the PHI trait
-- apply traits to instances
-- CAN'T apply traits to class
-
-### Other useful information
-
-!HierarchicalClassType - base type for !ClassType and !TraitType
-Instances created from Definitions
-
-Every instance is referenceable - i.e. something can point to it in the graph 
db
-!MetadataService may not be used longterm - it is currently used for 
bootstrapping the repo & type system
-
-Id class - represents the Id of an instance
-
-When the web service receives an object graph, the !ObjectGraphWalker is used 
to update things
-       - !DiscoverInstances is used to discover the instances in the object 
graph received by the web service
-
-!MapIds assigns new ids to the discovered instances in the object graph
-
-Anything under the storage package is not part of the public interface
\ No newline at end of file
+Atlas allows users to define a model for the metadata objects they want to 
manage. The model is composed of definitions
+called ‘types’. Instances of ‘types’ called ‘entities’ represent 
the actual metadata objects that are managed. The Type
+System is a component that allows users to define and manage the types and 
entities. All metadata objects managed by
+Atlas out of the box (like Hive tables, for e.g.) are modelled using types and 
represented as entities. To store new
+types of metadata in Atlas, one needs to understand the concepts of the type 
system component.
+
+---++ Types
+
+A ‘Type’ in Atlas is a definition of how a particular type of metadata 
object is stored and accessed. A type
+represents one or a collection of attributes that define the properties for 
the metadata object. Users with a
+development background will recognize the similarity of a type to a 
‘Class’ definition of object oriented programming
+languages, or a ‘table schema’ of relational databases.
+
+An example of a type that comes natively defined with Atlas is a Hive table. A 
Hive table is defined with these
+attributes:
+
+<verbatim>
+Name: hive_table
+MetaType: Class
+SuperTypes: DataSet
+Attributes:
+    name: String (name of the table)
+    db: Database object of type hive_db
+    owner: String
+    createTime: Date
+    lastAccessTime: Date
+    comment: String
+    retention: int
+    sd: Storage Description object of type hive_storagedesc
+    partitionKeys: Array of objects of type hive_column
+    aliases: Array of strings
+    columns: Array of objects of type hive_column
+    parameters: Map of String keys to String values
+    viewOriginalText: String
+    viewExpandedText: String
+    tableType: String
+    temporary: Boolean
+</verbatim>
+
+The following points can be noted from the above example:
+
+   * A type in Atlas is identified uniquely by a ‘name’
+   * A type has a metatype. A metatype represents the type of this model in 
Atlas. Atlas has the following metatypes:
+      * Basic metatypes: E.g. Int, String, Boolean etc.
+      * Enum metatypes
+      * Collection metatypes: E.g. Array, Map
+      * Composite metatypes: E.g. Class, Struct, Trait
+   * A type can ‘extend’ from a parent type called ‘supertype’ - by 
virtue of this, it will get to include the attributes that are defined in the 
supertype as well. This allows modellers to define common attributes across a 
set of related types etc. This is again similar to the concept of how Object 
Oriented languages define super classes for a class. It is also possible for a 
type in Atlas to extend from multiple super types.
+      * In this example, every hive table extends from a pre-defined supertype 
called a ‘DataSet’. More details about these pre-defined types will be 
provided later.
+   * Types which have a metatype of ‘Class’, ‘Struct’ or ‘Trait’ 
can have a collection of attributes. Each attribute has a name (e.g.  
‘name’) and some other associated properties. A property can be referred to 
using an expression type_name.attribute_name. It is also good to note that 
attributes themselves are defined using Atlas metatypes.
+      * In this example, hive_table.name is a String, hive_table.aliases is an 
array of Strings, hive_table.db refers to an instance of a type called hive_db 
and so on.
+   * Type references in attributes (like hive_table.db) are particularly 
interesting. Note that using such an attribute, we can define arbitrary 
relationships between two types defined in Atlas and thus build rich models. 
Note that one can also collect a list of references as an attribute type (e.g. 
hive_table.columns which represents a list of references from hive_table to the 
hive_column type).
+
+---++ Entities
+
+An ‘entity’ in Atlas is a specific value or instance of a Class ‘type’ 
and thus represents a specific metadata object
+in the real world. Referring back to our analogy of Object Oriented 
Programming languages, an ‘instance’ is an
+‘Object’ of a certain ‘Class’.
+
+An example of an entity will be a specific Hive Table. Say Hive has a table 
called ‘customers’ in the ‘default’
+database. This table will be an ‘entity’ in Atlas of type hive_table. By 
virtue of being an instance of a class
+type, it will have values for every attribute that is a part of the Hive 
table ‘type’, such as:
+
+<verbatim>
+id: "9ba387dd-fa76-429c-b791-ffc338d3c91f"
+typeName: “hive_table”
+values:
+    name: “customers”
+    db: "b42c6cfc-c1e7-42fd-a9e6-890e0adf33bc"
+    owner: “admin”
+    createTime: "2016-06-20T06:13:28.000Z"
+    lastAccessTime: "2016-06-20T06:13:28.000Z"
+    comment: null
+    retention: 0
+    sd: "ff58025f-6854-4195-9f75-3a3058dd8dcf"
+    partitionKeys: null
+    aliases: null
+    columns: ["65e2204f-6a23-4130-934a-9679af6a211f", 
"d726de70-faca-46fb-9c99-cf04f6b579a6", ...]
+    parameters: {"transient_lastDdlTime": "1466403208"}
+    viewOriginalText: null
+    viewExpandedText: null
+    tableType: “MANAGED_TABLE”
+    temporary: false
+</verbatim>
+
+The following points can be noted from the example above:
+
+   * Every entity that is an instance of a Class type is identified by a 
unique identifier, a GUID. This GUID is generated by the Atlas server when the 
object is defined, and remains constant for the entire lifetime of the entity. 
At any point in time, this particular entity can be accessed using its GUID.
+      * In this example, the ‘customers’ table in the default database is 
uniquely identified by the GUID "9ba387dd-fa76-429c-b791-ffc338d3c91f"
+   * An entity is of a given type, and the name of the type is provided with 
the entity definition.
+      * In this example, the ‘customers’ table is a ‘hive_table’.
+   * The values of this entity are a map of all the attribute names and their 
values for attributes that are defined in the hive_table type definition.
+   * Attribute values will be according to the metatype of the attribute.
+      * Basic metatypes: integer, String, boolean values. E.g. ‘name’ = 
‘customers’, ‘temporary’ = ‘false’
+      * Collection metatypes: An array or map of values of the contained 
metatype. E.g. parameters = { “transient_lastDdlTime”: “1466403208”}
+      * Composite metatypes: For classes, the value will be an entity with 
which this particular entity will have a relationship. E.g. The hive table 
“customers” is present in a database called “default”. The relationship 
between the table and database is captured via the “db” attribute. Hence, 
the value of the “db” attribute will be a GUID that uniquely identifies the 
hive_db entity called “default”.
+
+With this idea on entities, we can now see the difference between Class and 
Struct metatypes. Classes and Structs
+both compose attributes of other types. However, entities of Class types have 
the Id attribute (with a GUID value)
+and can be referenced from other entities (like a hive_db entity is referenced 
from a hive_table entity). Instances of
+Struct types do not have an identity of their own. The value of a Struct type 
is a collection of attributes that are
+‘embedded’ inside the entity itself.
+
+---++ Attributes
+
+We already saw that attributes are defined inside composite metatypes like 
Class and Struct. But we simplistically
+referred to attributes as having a name and a metatype value. However, 
attributes in Atlas have some more properties
+that define more concepts related to the type system.
+
+An attribute has the following properties:
+<verbatim>
+    name: string,
+    dataTypeName: string,
+    isComposite: boolean,
+    isIndexable: boolean,
+    isUnique: boolean,
+    multiplicity: enum,
+    reverseAttributeName: string
+</verbatim>
+
+The properties above have the following meanings:
+
+   * name - the name of the attribute
+   * dataTypeName - the metatype name of the attribute (native, collection or 
composite)
+   * isComposite -
+      * This flag indicates an aspect of modelling. If an attribute is defined 
as composite, it means that it cannot have a lifecycle independent of the 
entity it is contained in. A good example of this concept is the set of columns 
that make a part of a hive table. Since the columns do not have meaning outside 
of the hive table, they are defined as composite attributes.
+      * A composite attribute must be created in Atlas along with the entity 
it is contained in. i.e. A hive column must be created along with the hive 
table.
+   * isIndexable -
+      * This flag indicates whether this attribute should be indexed, so 
that lookups using the attribute value as a predicate can be performed 
efficiently.
+   * isUnique -
+      * This flag is again related to indexing. If specified to be unique, it 
means that a special index is created for this attribute in Titan that allows 
for equality based look ups.
+      * Any attribute with a true value for this flag is treated like a 
primary key to distinguish this entity from other entities. Hence care should 
be taken to ensure that this attribute does model a unique property in the real 
world.
+         * For e.g. consider the name attribute of a hive_table. In isolation, 
a name is not a unique attribute for a hive_table, because tables with the same 
name can exist in multiple databases. Even a pair of (database name, table 
name) is not unique if Atlas is storing metadata of hive tables amongst 
multiple clusters. Only a cluster location, database name and table name can be 
deemed unique in the physical world.
+   * multiplicity - indicates whether this attribute is required, optional, or 
could be multi-valued. If an entity’s definition of the attribute value does 
not match the multiplicity declaration in the type definition, this would be a 
constraint violation and the entity addition will fail. This field can 
therefore be used to define some constraints on the metadata information.
+
+Using the above, let us expand on the attribute definition of one of the 
attributes of the hive table below.
+Let us look at the attribute called ‘db’ which represents the database to 
which the hive table belongs:
+
+<verbatim>
+db:
+    "dataTypeName": "hive_db",
+    "isComposite": false,
+    "isIndexable": true,
+    "isUnique": false,
+    "multiplicity": "required",
+    "name": "db",
+    "reverseAttributeName": null
+</verbatim>
+
+Note the “required” constraint on multiplicity. A table entity cannot be 
sent without a db reference.
+
+<verbatim>
+columns:
+    "dataTypeName": "array<hive_column>",
+    "isComposite": true,
+    "isIndexable": true,
+    "isUnique": false,
+    "multiplicity": "optional",
+    "name": "columns",
+    "reverseAttributeName": null
+</verbatim>
+
+Note the “isComposite” true value for columns. By doing this, we are 
indicating that the defined column entities should
+always be bound to the table entity they are defined with.
+
+From this description and examples, you will be able to realize that attribute 
definitions can be used to influence
+specific modelling behavior (constraints, indexing, etc) to be enforced by the 
Atlas system.
+
+---++ System specific types and their significance
+
+Atlas comes with a few pre-defined system types. We saw one example (DataSet) 
in the preceding sections. In this
+section we will see all these types and understand their significance.
+
+*Referenceable*: This type represents all entities that can be searched for 
using a unique attribute called
+qualifiedName.
+
+*Asset*: This type contains attributes like name, description and owner. Name 
is a required attribute
+(multiplicity = required), the others are optional. The purpose of 
Referenceable and Asset is to provide modellers
+with a way to enforce consistency when defining and querying entities of their 
own types. Having this fixed set of
+attributes allows applications and user interfaces to make convention-based 
assumptions about what attributes they can
+expect of types by default.
+
+*Infrastructure*: This type extends Referenceable and Asset and typically can 
be used to be a common super type for
+infrastructural metadata objects like clusters, hosts etc.
+
+*!DataSet*: This type extends Referenceable and Asset. Conceptually, it can be 
used to represent a type that stores
+data. In Atlas, hive tables, Sqoop RDBMS tables etc are all types that extend 
from !DataSet. Types that extend !DataSet
+can be expected to have a Schema in the sense that they would have an 
attribute that defines attributes of that dataset.
+For e.g. the columns attribute in a hive_table. Also entities of types that 
extend !DataSet participate in data
+transformation and this transformation can be captured by Atlas via lineage 
(or provenance) graphs.
+
+*Process*: This type extends Referenceable and Asset. Conceptually, it can be 
used to represent any data transformation
+operation. For example, an ETL process that transforms a hive table with raw 
data to another hive table that stores
+some aggregate can be a specific type that extends the Process type. A Process 
type has two specific attributes,
+inputs and outputs. Both inputs and outputs are arrays of !DataSet entities. 
Thus an instance of a Process type can
+use these inputs and outputs to capture how the lineage of a !DataSet evolves.
\ No newline at end of file
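
Reviewer note (not part of the commit): the attribute properties and the
"required" multiplicity constraint described in the new wiki text can be
illustrated with a minimal Python sketch. Everything below is hypothetical:
AttributeDef, multiplicity_violations and HIVE_TABLE_ATTRS are illustrative
names, not Atlas APIs, and the check only approximates the constraint
violation the wiki describes for a table entity sent without a db reference.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class AttributeDef:
    """Hypothetical mirror of the attribute properties listed in the wiki."""
    name: str
    data_type_name: str
    is_composite: bool = False
    is_indexable: bool = False
    is_unique: bool = False
    multiplicity: str = "optional"  # the wiki examples use "required" and "optional"
    reverse_attribute_name: Optional[str] = None


def multiplicity_violations(attrs: List[AttributeDef],
                            values: Dict[str, Any]) -> List[str]:
    """Return names of required attributes that are missing or null in `values`."""
    return [a.name for a in attrs
            if a.multiplicity == "required" and values.get(a.name) is None]


# The two attribute definitions shown in the wiki for hive_table.
HIVE_TABLE_ATTRS = [
    AttributeDef("db", "hive_db", is_indexable=True, multiplicity="required"),
    AttributeDef("columns", "array<hive_column>", is_composite=True,
                 is_indexable=True, multiplicity="optional"),
]

# An entity's values that include the db reference (GUID from the wiki example).
with_db = {"db": "b42c6cfc-c1e7-42fd-a9e6-890e0adf33bc", "columns": None}
# An entity's values that omit the required db reference.
without_db = {"columns": ["65e2204f-6a23-4130-934a-9679af6a211f"]}

print(multiplicity_violations(HIVE_TABLE_ATTRS, with_db))
print(multiplicity_violations(HIVE_TABLE_ATTRS, without_db))
```

In this sketch the second call reports db as a violation, which corresponds to
the entity addition failing in the wiki's description; how Atlas itself
performs and reports the check is not shown in this commit.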

http://git-wip-us.apache.org/repos/asf/incubator-atlas/blob/f672aaef/release-log.txt
----------------------------------------------------------------------
diff --git a/release-log.txt b/release-log.txt
index 824778e..40de319 100644
--- a/release-log.txt
+++ b/release-log.txt
@@ -6,6 +6,7 @@ INCOMPATIBLE CHANGES:
 
 
 ALL CHANGES:
+ATLAS-1022 Update typesystem wiki with details (yhemanth via shwethags)
 ATLAS-1021 Update Atlas architecture wiki (yhemanth via sumasai)
 ATLAS-957 Atlas is not capturing topologies that have $ in the data payload 
(shwethags)
 ATLAS-1032 Atlas hook package should not include libraries already present in 
host component - like log4j (mneethiraj via sumasai)
