[ 
https://issues.apache.org/jira/browse/ATLAS-184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shwetha G S resolved ATLAS-184.
-------------------------------
    Resolution: Duplicate

Duplicate of ATLAS-379

> Integrate Sqoop metadata into Atlas
> -----------------------------------
>
>                 Key: ATLAS-184
>                 URL: https://issues.apache.org/jira/browse/ATLAS-184
>             Project: Atlas
>          Issue Type: Improvement
>    Affects Versions: 0.6-incubating
>            Reporter: Venkatesh Seetharam
>             Fix For: 0.6-incubating
>
>
> Apache Sqoop Integration with Apache Atlas (incubating)
> Introduction
> Apache Sqoop is a tool designed for efficiently transferring bulk data 
> between Apache Hadoop and structured data stores such as relational databases.
> Apache Atlas is a metadata repository that enables end-to-end data lineage, 
> search, and association of business classifications.
> Overview
> At a minimum, the goal of this integration is to push the Sqoop-generated 
> query metadata, along with the source provenance, target(s), and any 
> available business context, so Atlas can capture the lineage for this topology.
> There are 2 parts to this process, detailed below:
> 1.    Data model to represent the concepts in Sqoop
> 2.    Sqoop Bridge/Hook to update metadata in Atlas
> Data Model
> A data model is represented as a Type in Atlas. It can reuse, or be closely 
> modeled after, the Hive data types that already exist. At the least, we need 
> to create types for:
> •     Sqoop processes containing the SQL query text, start/end times, user, 
> etc. 
> •     Source provenance, fine-grained at the DB, table, column, etc. level, 
> so we have a 1-1 mapping between source and target assets
> •     Target (typically Hive, HBase, HDFS, etc.)
> You can take a look at the data model code for Hive. Sqoop should reuse the 
> data model from Hive or be closely modeled after it.
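
The three kinds of types listed in the Data Model section can be sketched in plain Python. This is only an illustration of the shape of the model, not Atlas type-system code: every class and field name here (SqoopProcess, SourceColumn, TargetAsset, source_tables) is hypothetical and does not come from the issue or from the Hive model.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class SourceColumn:
    """Hypothetical fine-grained source provenance: DB, table, column."""
    database: str
    table: str
    column: str


@dataclass(frozen=True)
class TargetAsset:
    """Hypothetical target of an import (Hive, HBase, HDFS, ...)."""
    kind: str       # e.g. "hive", "hbase", "hdfs"
    location: str   # e.g. a Hive table name or an HDFS path


@dataclass
class SqoopProcess:
    """Hypothetical process type: query text, user, start/end times."""
    name: str
    query_text: str
    user: str
    start_time: int  # epoch milliseconds
    end_time: int    # epoch milliseconds
    inputs: List[SourceColumn] = field(default_factory=list)
    outputs: List[TargetAsset] = field(default_factory=list)

    def source_tables(self) -> List[Tuple[str, str]]:
        """Distinct (database, table) pairs read by this process."""
        return sorted({(c.database, c.table) for c in self.inputs})


# Example instance with made-up values.
process = SqoopProcess(
    name="import_orders",
    query_text="SELECT id, total FROM orders WHERE $CONDITIONS",
    user="etl",
    start_time=1000,
    end_time=2000,
    inputs=[
        SourceColumn("sales", "orders", "id"),
        SourceColumn("sales", "orders", "total"),
    ],
    outputs=[TargetAsset("hive", "default.orders")],
)
```

Keeping the source side at column granularity is what would later allow a 1-1 mapping between source and target assets.
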
> Pushing Metadata into Atlas
> There are 3 parts to the bridge:
> 1.    Sqoop Bridge 
> This does not apply to the Sqoop tool. However, it will apply if and when we 
> migrate to Sqoop 2.
> 2.    Post-execution Hook
> Atlas needs to be notified when a new Sqoop Ingest is executed successfully 
> or when someone changes the definition of an existing Sqoop Job.
> You can refer to the hook code for Hive.
> 3.    Column-level lineage
> It would be good to have column-level lineage for data flowing from the 
> source database/warehouse into Hive.
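
A post-execution hook with column-level lineage, as described in the last two items above, might look roughly like the sketch below. It is an assumption-laden illustration: the function names (column_lineage, on_import_success), the shape of the job dictionary, and the publish callback are all hypothetical, and it assumes each source column lands in a same-named target column. It does not use the Atlas or Sqoop APIs.

```python
import json
from typing import Callable, Dict, List


def column_lineage(source_table: str, target_table: str,
                   columns: List[str]) -> List[Dict[str, str]]:
    """Map each source column to a same-named target column (assumption)."""
    return [
        {"source": f"{source_table}.{c}", "target": f"{target_table}.{c}"}
        for c in columns
    ]


def on_import_success(job: Dict, publish: Callable[[str], None]) -> None:
    """Hypothetical hook: called once after a successful import.

    Builds one JSON message describing the process and its column-level
    lineage, then hands it to the caller-supplied publish function.
    """
    event = {
        "type": "sqoop_process",
        "user": job["user"],
        "query": job["query"],
        "lineage": column_lineage(
            job["source_table"], job["target_table"], job["columns"]
        ),
    }
    publish(json.dumps(event, sort_keys=True))


# Example: collect published messages in a list instead of sending them.
sent: List[str] = []
on_import_success(
    {
        "user": "etl",
        "query": "SELECT id, total FROM orders WHERE $CONDITIONS",
        "source_table": "sales.orders",
        "target_table": "default.orders",
        "columns": ["id", "total"],
    },
    sent.append,
)
```

Passing the publish function in as a parameter keeps the sketch independent of however the metadata would actually be delivered to Atlas.
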



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
