[ 
https://issues.apache.org/jira/browse/ATLAS-183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shwetha G S updated ATLAS-183:
------------------------------
    Attachment:     (was: ATLAS-433-v2.patch)

> Add a Hook in Storm to post the topology metadata
> -------------------------------------------------
>
>                 Key: ATLAS-183
>                 URL: https://issues.apache.org/jira/browse/ATLAS-183
>             Project: Atlas
>          Issue Type: Sub-task
>    Affects Versions: 0.6-incubating
>            Reporter: Venkatesh Seetharam
>            Assignee: Hemanth Yamijala
>             Fix For: trunk
>
>         Attachments: ATLAS-183-1.patch, ATLAS-183-4.patch, ATLAS-183.patch
>
>
> Apache Storm Integration with Apache Atlas (incubating)
> Introduction
> Apache Storm is a distributed real-time computation system. Storm makes it 
> easy to reliably process unbounded streams of data, doing for real-time 
> processing what Hadoop did for batch processing.  The computation is 
> essentially a DAG of nodes, which is called a topology.
> Apache Atlas is a metadata repository that enables end-to-end data lineage, 
> search, and the association of business classifications. 
> Overview
> The goal of this integration is, at a minimum, to push the operational 
> topology metadata along with the underlying data source(s), target(s), 
> derivation processes and any available business context so that Atlas can 
> capture the lineage for this topology.
> It would also help to support custom user annotations per node in the 
> topology.
> There are 2 parts to this process, detailed below:
> * Data model to represent the concepts in Storm
> * Storm Bridge to update metadata in Atlas
> Data Model
> A data model is represented as a Type in Atlas. It contains the descriptions 
> of the various nodes in the DAG, such as spouts and bolts, and the 
> corresponding source and target types.  These need to be expressed as Types 
> in the Atlas type system. At the least, we need to create types for:
> * Storm topology containing spouts, bolts, etc. with associations between them
> * Source (typically Kafka, etc.)
> * Target (typically Hive, HBase, HDFS, etc.)
> You can take a look at the data model code for Hive. The Storm model should 
> be simpler than the Hive one from a data modeling perspective.
> Pushing Metadata into Atlas
> There are 2 parts to the bridge:
> Storm Bridge 
> This is a one-time import that lists all the active topologies in Storm and 
> pushes their metadata into Atlas, to address cases where Storm deployments 
> existed before Atlas.
> You can refer to the bridge code for Hive.
> Post-execution Hook
> Atlas needs to be notified when a new topology is registered successfully in 
> Storm or when someone changes the definition of an existing topology.
> You can refer to the hook code for Hive.
>  
> Example use case:
> Custom annotations associated with each node in the topology, for example 
> Data Quality Rules, Error Handling, etc. One such set of annotations could 
> enumerate the rules for handling nulls (all nulls for a column get filtered, 
> etc.).
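
To make the "Data Model" part of the description concrete, here is a rough sketch of the three kinds of types it asks for (topology with spouts and bolts, source, target), plus a per-node annotation as in the example use case. Everything here is hypothetical: the class names, fields, and the `clickstream` example are illustrative assumptions and are not the Atlas type system API or anything from the attached patches.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    """One node of the topology DAG (hypothetical type, not an Atlas class)."""
    name: str
    kind: str  # "spout" or "bolt"
    annotations: Dict[str, str] = field(default_factory=dict)


@dataclass
class DataSet:
    """An external source or target, e.g. a Kafka topic or a Hive table."""
    name: str
    system: str  # e.g. "kafka", "hive", "hbase", "hdfs"


@dataclass
class Topology:
    name: str
    nodes: Dict[str, Node] = field(default_factory=dict)
    edges: List[Tuple[str, str]] = field(default_factory=list)  # (upstream, downstream)
    sources: List[DataSet] = field(default_factory=list)
    targets: List[DataSet] = field(default_factory=list)

    def add_node(self, node: Node) -> None:
        self.nodes[node.name] = node

    def connect(self, upstream: str, downstream: str) -> None:
        # Both ends must already be registered nodes.
        if upstream not in self.nodes or downstream not in self.nodes:
            raise KeyError("unknown node in edge")
        self.edges.append((upstream, downstream))

    def lineage(self) -> List[Tuple[str, str]]:
        """Pair every source with every target: coarse, topology-level lineage."""
        return [(s.name, t.name) for s in self.sources for t in self.targets]


def build_example() -> Topology:
    """Assumed example: Kafka topic -> filter bolt -> Hive table."""
    topo = Topology(name="clickstream")
    topo.add_node(Node("kafka_spout", "spout"))
    topo.add_node(Node("filter_bolt", "bolt",
                       annotations={"null_handling": "filter rows with null user_id"}))
    topo.add_node(Node("hive_bolt", "bolt"))
    topo.connect("kafka_spout", "filter_bolt")
    topo.connect("filter_bolt", "hive_bolt")
    topo.sources.append(DataSet("clicks_topic", "kafka"))
    topo.targets.append(DataSet("clicks_table", "hive"))
    return topo
```

Calling `build_example().lineage()` pairs the Kafka topic with the Hive table, which is roughly the source-to-target lineage the description wants Atlas to capture. A real model would presumably track lineage per node rather than per topology.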


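The two ways of pushing metadata described above (a one-time bridge import and a post-execution hook) could share a single write path, as in the sketch below. This is only an illustration under stated assumptions: `InMemoryCatalog` is a stand-in for the repository and is not the Atlas client, `list_topologies` is a hypothetical callable standing in for whatever Storm exposes to enumerate active topologies, and keying entities by topology name is an assumption made so that a redefinition overwrites the earlier entry instead of duplicating it.

```python
from typing import Callable, Dict, Iterable


class InMemoryCatalog:
    """Stand-in for the metadata repository; NOT the Atlas client API."""

    def __init__(self) -> None:
        self.entities: Dict[str, dict] = {}

    def upsert(self, qualified_name: str, attributes: dict) -> None:
        # Keyed by name, so re-sending a topology overwrites rather than duplicates.
        self.entities[qualified_name] = dict(attributes)


def import_active_topologies(
    list_topologies: Callable[[], Iterable[dict]],
    catalog: InMemoryCatalog,
) -> int:
    """Bridge: one-time import of every topology that is already running."""
    count = 0
    for topo in list_topologies():
        catalog.upsert(topo["name"], topo)
        count += 1
    return count


def on_topology_submitted(topo: dict, catalog: InMemoryCatalog) -> None:
    """Hook: called after a topology is registered or its definition changes."""
    catalog.upsert(topo["name"], topo)
```

With this shape, running the bridge once and then receiving a hook notification for a changed topology leaves one entry per topology, holding the latest definition.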

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
