[ https://issues.apache.org/jira/browse/ATLAS-184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Shwetha G S resolved ATLAS-184.
-------------------------------
    Resolution: Duplicate

ATLAS-379

> Integrate Sqoop metadata into Atlas
> -----------------------------------
>
>                 Key: ATLAS-184
>                 URL: https://issues.apache.org/jira/browse/ATLAS-184
>             Project: Atlas
>          Issue Type: Improvement
>    Affects Versions: 0.6-incubating
>            Reporter: Venkatesh Seetharam
>             Fix For: 0.6-incubating
>
>
> Apache Sqoop Integration with Apache Atlas (incubating)
>
> Introduction
> Apache Sqoop is a tool designed for efficiently transferring bulk data
> between Apache Hadoop and structured data stores such as relational databases.
> Apache Atlas is a metadata repository that enables end-to-end data lineage,
> search, and association of business classifications.
>
> Overview
> The goal of this integration is to push, at a minimum, the Sqoop-generated
> query metadata along with the source provenance, target(s), and any available
> business context so Atlas can capture the lineage for this topology.
> There are 2 parts in this process, detailed below:
> 1. Data model to represent the concepts in Sqoop
> 2. Sqoop Bridge/Hook to update metadata in Atlas
>
> Data Model
> A data model is represented as a Type in Atlas. This can reuse, or be closely
> modeled after, the Hive data types that already exist. At the least, we need
> to create types for:
> • Sqoop processes containing the SQL query text, start/end times, user,
> etc.
> • Source Provenance, fine-grained at DB, Table, Column, etc. so we have a
> 1-1 mapping between source and target assets
> • Target (typically Hive, HBase, HDFS, etc.)
> You can take a look at the data model code for Hive. Sqoop should reuse the
> data model from Hive or be closely modeled after it.
>
> Pushing Metadata into Atlas
> The bridge has the following parts:
> 1. Sqoop Bridge
> This does not apply to the Sqoop tool. However, it will apply if and when we
> migrate to Sqoop 2.
> 2. Post-execution Hook
> Atlas needs to be notified when a new Sqoop Ingest is executed successfully
> or when someone changes the definition of an existing Sqoop Job.
> You can refer to the hook code for Hive.
> 3. Column-level lineage
> It would be good to have column-level lineage for data flowing from the
> source database/WH into Hive.
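
The data model and post-execution hook described in the issue can be sketched roughly as follows. This is only an illustration under stated assumptions: every class, field, and function name here (Column, DataAsset, SqoopProcess, build_hook_message) is hypothetical and is not taken from the Atlas or Sqoop code bases, the JSON layout is invented for the example, and column lineage is assumed to be derivable by matching column names case-insensitively.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Tuple


@dataclass
class Column:
    """One column of a source or target asset (hypothetical type)."""
    name: str
    data_type: str


@dataclass
class DataAsset:
    """A source or target, fine-grained at DB / table / column level."""
    store_type: str  # e.g. "mysql", "hive", "hbase", "hdfs"
    database: str
    table: str
    columns: List[Column] = field(default_factory=list)

    def qualified(self, column: Column) -> str:
        """Return an illustrative fully qualified column name."""
        return f"{self.store_type}:{self.database}.{self.table}.{column.name}"


@dataclass
class SqoopProcess:
    """One Sqoop run: query text, user, start/end times, and endpoints."""
    query_text: str
    user: str
    start_time_ms: int  # epoch milliseconds (assumed unit)
    end_time_ms: int
    source: DataAsset
    targets: List[DataAsset]

    def column_lineage(self) -> List[Tuple[str, str]]:
        """Pair each source column with the same-named column in each target.

        Assumption: names match case-insensitively, giving a 1-1 mapping.
        Source columns with no counterpart in a target are skipped.
        """
        pairs = []
        for target in self.targets:
            by_name = {c.name.lower(): c for c in target.columns}
            for src_col in self.source.columns:
                tgt_col = by_name.get(src_col.name.lower())
                if tgt_col is not None:
                    pairs.append(
                        (self.source.qualified(src_col), target.qualified(tgt_col))
                    )
        return pairs


def build_hook_message(process: SqoopProcess) -> str:
    """Serialize a finished run into a JSON message a hook might send.

    The payload layout is made up for illustration only.
    """
    payload = {
        "process": asdict(process),
        "columnLineage": [
            {"source": s, "target": t} for s, t in process.column_lineage()
        ],
    }
    return json.dumps(payload, sort_keys=True)
```

A post-execution hook would build a SqoopProcess after a successful import and hand the resulting message to whatever notification mechanism Atlas exposes; that transport is deliberately left out of the sketch.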