[ https://issues.apache.org/jira/browse/ATLAS-183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Ahn updated ATLAS-183:
-----------------------------
    Description: 
Apache Storm Integration with Apache Atlas (incubating)
Introduction
Apache Storm is a distributed real-time computation system. Storm makes it easy 
to reliably process unbounded streams of data, doing for real-time processing 
what Hadoop did for batch processing. A Storm process is essentially a DAG of 
nodes, called a topology.

Apache Atlas is a metadata repository that enables end-to-end data lineage, 
search, and the association of business classifications.
Overview
The goal of this integration is, at a minimum, to push the operational topology 
metadata, along with the underlying data source(s), target(s), derivation 
processes, and any available business context, so that Atlas can capture the 
lineage for the topology.

It would also help to support custom user annotations per node in the topology.

There are two parts to this process, detailed below:
- A data model to represent the concepts in Storm
- A Storm Bridge to update metadata in Atlas
Data Model
A data model is represented as a Type in Atlas. It contains descriptions of the 
various nodes in the DAG, such as spouts and bolts, and the corresponding source 
and target types. These need to be expressed as Types in the Atlas type system. 
At a minimum, we need to create types for:
- A Storm topology containing spouts, bolts, etc., with the associations between them
- Sources (typically Kafka, etc.)
- Targets (typically Hive, HBase, HDFS, etc.)

You can take a look at the data model code for Hive; Storm should be simpler 
than Hive from a data modeling perspective.
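To make the types above concrete, here is a minimal sketch of what the class-type definitions could look like, expressed as simplified JSON-style payloads. The type names (storm_topology, storm_spout, storm_bolt), super types, and attributes are illustrative assumptions, not the final model or the exact Atlas wire format.

```python
import json

def class_type(name, super_types, attributes):
    """Build a class-type definition in a shape loosely resembling
    Atlas's JSON type representation (simplified for illustration)."""
    return {
        "typeName": name,
        "superTypes": super_types,
        "attributeDefinitions": [
            {"name": attr, "dataTypeName": dtype, "multiplicity": "optional"}
            for attr, dtype in attributes
        ],
    }

types_def = {
    "classTypes": [
        # The topology itself is a process connecting sources to targets
        class_type("storm_topology", ["Process"], [
            ("id", "string"),
            ("owner", "string"),
            ("startTime", "long"),
            ("nodes", "array<string>"),   # spouts and bolts in the DAG
        ]),
        # Spouts read from sources (e.g. Kafka), bolts write to targets
        class_type("storm_spout", ["DataSet"], [
            ("driverClass", "string"),
        ]),
        class_type("storm_bolt", ["DataSet"], [
            ("driverClass", "string"),
            ("inputs", "array<string>"),  # upstream node names
        ]),
    ]
}

print(json.dumps(types_def, indent=2))
```

The associations between nodes are carried here as name references in attributes; the real model could instead use Atlas references or structs.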
Pushing Metadata into Atlas
There are 2 parts to the bridge:
Storm Bridge 
This is a one-time import that lists all active topologies in Storm and pushes 
their metadata into Atlas, covering cases where Storm deployments exist before 
Atlas is installed.

You can refer to the bridge code for Hive.
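The bridge's control flow could be sketched as follows. The topology summaries are stubbed here (a real bridge would query Storm's Nimbus), and the entity shape, type name, and Atlas endpoint mentioned in the comment are all assumptions.

```python
import json

def topology_to_entity(topo):
    """Map a Storm topology summary to an assumed Atlas entity payload."""
    return {
        "typeName": "storm_topology",   # assumed type from the data model
        "values": {
            "id": topo["id"],
            "name": topo["name"],
            "owner": topo.get("owner", "unknown"),
        },
    }

# Stub for what a Nimbus cluster-summary call might return
active_topologies = [
    {"id": "wordcount-1-1445123456", "name": "wordcount", "owner": "storm"},
    {"id": "clickstream-3-1445123999", "name": "clickstream"},
]

entities = [topology_to_entity(t) for t in active_topologies]

# A real bridge would POST each entity to Atlas's REST entities endpoint
# (the exact URL and payload schema are deployment-specific assumptions).
print(json.dumps(entities, indent=2))
```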

Post-execution Hook
Atlas needs to be notified when a new topology is registered successfully in 
Storm or when someone changes the definition of an existing topology.

You can refer to the hook code for Hive.
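One way the hook's behavior could be sketched: a callback invoked after successful submission builds an entity and notifies Atlas, swallowing any failure so metadata capture never breaks the submission itself. The callback name, config key, and notification mechanism are assumptions; a real hook would likely publish asynchronously (e.g. via a message queue).

```python
notifications = []

def notify_atlas(entity):
    """Stand-in for an async send to Atlas; records instead of sending."""
    notifications.append(entity)

def on_topology_submitted(topology_name, topology_config):
    """Invoked by Storm after successful submission (assumed hook point)."""
    entity = {
        "typeName": "storm_topology",
        "values": {
            "name": topology_name,
            "owner": topology_config.get("topology.submitter.user", "unknown"),
        },
    }
    try:
        notify_atlas(entity)
    except Exception:
        # Metadata capture must never fail the topology submission itself.
        pass

on_topology_submitted("wordcount", {"topology.submitter.user": "storm"})
print(notifications)
```

The same callback would fire on redefinition of an existing topology, so Atlas sees both initial registration and subsequent changes.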
 
Example use case:
Custom annotations associated with each node in the topology, for example 
Data Quality Rules, Error Handling, etc.: a set of annotations that enumerates 
null-handling rules (e.g., all nulls for a column get filtered).
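One plausible way to model such per-node annotations is as Atlas traits (tags) attached to individual node entities. The trait name, attributes, and attachment shape below are illustrative examples only, not part of any agreed model.

```python
import json

# A hypothetical trait capturing a null-handling data quality rule
null_handling_trait = {
    "typeName": "DataQualityRule_NullHandling",
    "attributes": {
        "column": "user_id",
        "action": "filter",   # e.g. all nulls for this column get filtered
    },
}

# Conceptually, the trait is attached to a node entity, e.g. a bolt:
annotated_bolt = {
    "typeName": "storm_bolt",
    "values": {"name": "dedupe-bolt"},
    "traits": [null_handling_trait],
}

print(json.dumps(annotated_bolt, indent=2))
```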



> Add a Hook in Storm to post the topology metadata
> -------------------------------------------------
>
>                 Key: ATLAS-183
>                 URL: https://issues.apache.org/jira/browse/ATLAS-183
>             Project: Atlas
>          Issue Type: Sub-task
>    Affects Versions: 0.6-incubating
>            Reporter: Venkatesh Seetharam
>             Fix For: 0.6-incubating
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)