Repository: incubator-atlas
Updated Branches:
  refs/heads/master eb6e656be -> b6acff6d5


ATLAS-1182 Hive Column level lineage docs (svimal2106 via shwethags)


Project: http://git-wip-us.apache.org/repos/asf/incubator-atlas/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-atlas/commit/3cc1bd5b
Tree: http://git-wip-us.apache.org/repos/asf/incubator-atlas/tree/3cc1bd5b
Diff: http://git-wip-us.apache.org/repos/asf/incubator-atlas/diff/3cc1bd5b

Branch: refs/heads/master
Commit: 3cc1bd5b837c944c8af1d50451ef89a1d3f15ee6
Parents: eb6e656
Author: Shwetha GS <[email protected]>
Authored: Wed Oct 19 15:21:51 2016 +0530
Committer: Shwetha GS <[email protected]>
Committed: Wed Oct 19 15:21:51 2016 +0530

----------------------------------------------------------------------
 .../resources/images/column_lineage_ex1.png     | Bin 0 -> 34057 bytes
 docs/src/site/twiki/Bridge-Hive.twiki           |  37 +++++++++++++++++++
 release-log.txt                                 |   1 +
 3 files changed, 38 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-atlas/blob/3cc1bd5b/docs/src/site/resources/images/column_lineage_ex1.png
----------------------------------------------------------------------
diff --git a/docs/src/site/resources/images/column_lineage_ex1.png b/docs/src/site/resources/images/column_lineage_ex1.png
new file mode 100644
index 0000000..a41c5fb
Binary files /dev/null and b/docs/src/site/resources/images/column_lineage_ex1.png differ

http://git-wip-us.apache.org/repos/asf/incubator-atlas/blob/3cc1bd5b/docs/src/site/twiki/Bridge-Hive.twiki
----------------------------------------------------------------------
diff --git a/docs/src/site/twiki/Bridge-Hive.twiki b/docs/src/site/twiki/Bridge-Hive.twiki
index 653ed4e..dd22b5c 100644
--- a/docs/src/site/twiki/Bridge-Hive.twiki
+++ b/docs/src/site/twiki/Bridge-Hive.twiki
@@ -71,6 +71,43 @@ The following properties in <atlas-conf>/atlas-application.properties control th
 
 Refer [[Configuration][Configuration]] for notification related configurations
 
+---++ Column Level Lineage
+
+Starting from 0.8-incubating version of Atlas, Column level lineage is captured in Atlas. Below are the details
+
+---+++ Model
+   * !ColumnLineageProcess type is a subclass of Process
+
+   * This relates an output Column to a set of input Columns or the Input Table
+
+   * The Lineage also captures the kind of Dependency: currently the values are SIMPLE, EXPRESSION, SCRIPT
+      * A SIMPLE dependency means the output column has the same value as the input
+      * An EXPRESSION dependency means the output column is transformed by some expression in the runtime(for e.g. a Hive SQL expression) on the Input Columns.
+      * SCRIPT means that the output column is transformed by a user provided script.
+
+   * In case of EXPRESSION dependency the expression attribute contains the expression in string form
+
+   * Since Process links input and output !DataSets, we make Column a subclass of !DataSet
+
+---+++ Examples
+For a simple CTAS below:
+<verbatim>
+create table t2 as select id, name from T1
+</verbatim>
+
+The lineage is captured as
+
+<img src="images/column_lineage_ex1.png" height="200" width="400" />
+
+
+
+---+++ Extracting Lineage from Hive commands
+  * The !HiveHook maps the !LineageInfo in the !HookContext to Column lineage instances
+
+  * The !LineageInfo in Hive provides column-level lineage for the final !FileSinkOperator, linking them to the input columns in the Hive Query
+
+---+++ NOTE
+Column level lineage works with Hive version 1.2.1 after the patch for <a href="https://issues.apache.org/jira/browse/HIVE-13112">HIVE-13112</a> is applied to Hive source
 
 ---++ Limitations
    * Since database name, table name and column names are case insensitive in hive, the corresponding names in entities are lowercase. So, any search APIs should use lowercase while querying on the entity names

http://git-wip-us.apache.org/repos/asf/incubator-atlas/blob/3cc1bd5b/release-log.txt
----------------------------------------------------------------------
diff --git a/release-log.txt b/release-log.txt
index d2b848b..3f24063 100644
--- a/release-log.txt
+++ b/release-log.txt
@@ -9,6 +9,7 @@ ATLAS-1060 Add composite indexes for exact match performance improvements for al
 ATLAS-1127 Modify creation and modification timestamps to Date instead of Long(sumasai)
 
 ALL CHANGES:
+ATLAS-1182 Hive Column level lineage docs (svimal2106 via shwethags)
 ATLAS-1230 updated AtlasTypeRegistry to support batch, atomic type updates (mneethiraj)
 ATLAS-1229 Add TypeCategory and methods to access attribute definitiions in AtlasTypes (sumasai)
 ATLAS-1227 Added support for attribute constraints in the API (mneethiraj)
