[ https://issues.apache.org/jira/browse/ATLAS-183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Shwetha G S updated ATLAS-183:
------------------------------
    Attachment:     (was: ATLAS-433-v2.patch)

> Add a Hook in Storm to post the topology metadata
> -------------------------------------------------
>
>                 Key: ATLAS-183
>                 URL: https://issues.apache.org/jira/browse/ATLAS-183
>             Project: Atlas
>          Issue Type: Sub-task
>    Affects Versions: 0.6-incubating
>            Reporter: Venkatesh Seetharam
>            Assignee: Hemanth Yamijala
>             Fix For: trunk
>
>         Attachments: ATLAS-183-1.patch, ATLAS-183-4.patch, ATLAS-183.patch
>
>
> Apache Storm Integration with Apache Atlas (incubating)
>
> Introduction
> Apache Storm is a distributed real-time computation system. Storm makes it
> easy to reliably process unbounded streams of data, doing for real-time
> processing what Hadoop did for batch processing. The process is essentially
> a DAG of nodes, which is called a topology.
> Apache Atlas is a metadata repository that enables end-to-end data lineage,
> search, and association of business classifications.
>
> Overview
> The goal of this integration is, at a minimum, to push the operational
> topology metadata along with the underlying data source(s), target(s),
> derivation processes and any available business context so Atlas can capture
> the lineage for this topology.
> It would also help to support custom user annotations per node in the
> topology.
> There are 2 parts in this process, detailed below:
> * Data model to represent the concepts in Storm
> * Storm Bridge to update metadata in Atlas
>
> Data Model
> A data model is represented as a Type in Atlas. It contains the descriptions
> of the various nodes in the DAG, such as spouts and bolts, and the
> corresponding source and target types. These need to be expressed as Types in
> the Atlas type system. At the least, we need to create types for:
> * Storm topology containing spouts, bolts, etc. with associations between them
> * Source (typically Kafka, etc.)
> * Target (typically Hive, HBase, HDFS, etc.)
> You can take a look at the data model code for Hive. Storm should be
> simpler than Hive from a data modeling perspective.
>
> Pushing Metadata into Atlas
> There are 2 parts to the bridge:
> * Storm Bridge
> This is a one-time import for Storm to list all the active topologies and
> push the metadata into Atlas to address cases where Storm deployments exist
> before Atlas.
> You can refer to the bridge code for Hive.
> * Post-execution Hook
> Atlas needs to be notified when a new topology is registered successfully in
> Storm or when someone changes the definition of an existing topology.
> You can refer to the hook code for Hive.
>
> Example use case:
> Custom annotations associated with each node in the topology.
> For example: Data Quality Rules, Error Handling, etc. A set of annotations
> that enumerates rules for handling nulls, e.g. all nulls for a column get
> filtered, etc.
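
For discussion, the minimum set of types listed under Data Model (a topology of spouts and bolts with associations between them, plus sources and targets) could be sketched in plain Java roughly as follows. This is only an illustration of the shape of the model, not Atlas type-system code and not what any of the attached patches do. Every name here (`NodeKind`, `StormNode`, `StormTopologyModel`, `upstreamOf`) is hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of a Storm topology for lineage purposes.
// None of these names come from Atlas or Storm; they only illustrate the idea.
enum NodeKind { SOURCE, SPOUT, BOLT, TARGET }

final class StormNode {
    final String name;
    final NodeKind kind;
    // Free-form user annotations per node, e.g. data quality rules.
    final Map<String, String> annotations = new LinkedHashMap<>();

    StormNode(String name, NodeKind kind) {
        this.name = name;
        this.kind = kind;
    }
}

final class StormTopologyModel {
    final String name;
    private final Map<String, StormNode> nodes = new LinkedHashMap<>();
    // Directed edges of the DAG as [from, to] pairs of node names.
    private final List<String[]> edges = new ArrayList<>();

    StormTopologyModel(String name) {
        this.name = name;
    }

    StormNode addNode(String nodeName, NodeKind kind) {
        StormNode node = new StormNode(nodeName, kind);
        nodes.put(nodeName, node);
        return node;
    }

    // Records an association between two nodes that were added earlier.
    void connect(String from, String to) {
        if (!nodes.containsKey(from) || !nodes.containsKey(to)) {
            throw new IllegalArgumentException("unknown node: " + from + " -> " + to);
        }
        edges.add(new String[] {from, to});
    }

    // Names of all nodes that feed directly into the given node.
    List<String> upstreamOf(String nodeName) {
        List<String> result = new ArrayList<>();
        for (String[] edge : edges) {
            if (edge[1].equals(nodeName)) {
                result.add(edge[0]);
            }
        }
        return result;
    }
}
```

A bridge or hook would populate such a structure from a live topology and translate it into Atlas entities. The per-node `annotations` map is where rules such as null handling from the example use case could be attached.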