[
https://issues.apache.org/jira/browse/FALCON-288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897363#comment-13897363
]
Venkatesh Seetharam commented on FALCON-288:
--------------------------------------------
The entity dependency graph looks good, except that we can add two more things:
* colo as a vertex, with an edge from cluster to colo labeled "collocated"
* workflow as a vertex, with an edge from process labeled "executes". I'd like to
capture versioning on workflows, and it makes sense to capture that in an instance.
Also, the entity vertices have two indexed keys:
* *name* = entity name, which should be unique
* *type* = entity type
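
To make the proposal concrete, here is a rough sketch of the entity graph with the two extra vertices and the two indexed keys. It assumes a toy in-memory graph rather than any particular graph db API; the `EntityGraph` class, its methods, and the entity names are all made up for illustration. It also treats *name* as unique per entity type, which is an assumption on my part.

```python
from collections import defaultdict


class EntityGraph:
    """Toy in-memory property graph (hypothetical; stands in for the real store)."""

    def __init__(self):
        self.vertices = {}                 # (name, type) -> property map
        self.edges = []                    # (source key, label, target key)
        self.by_name = defaultdict(list)   # index on the "name" key
        self.by_type = defaultdict(list)   # index on the "type" key

    def add_vertex(self, name, entity_type):
        """Add a vertex once; adding the same (name, type) again is a no-op."""
        key = (name, entity_type)
        if key not in self.vertices:
            self.vertices[key] = {"name": name, "type": entity_type}
            self.by_name[name].append(key)
            self.by_type[entity_type].append(key)
        return key

    def add_edge(self, source, label, target):
        self.edges.append((source, label, target))

    def out(self, source, label):
        """Return the targets of all edges leaving `source` with `label`."""
        return [t for s, l, t in self.edges if s == source and l == label]


# Entity names below are illustrative only.
graph = EntityGraph()
cluster = graph.add_vertex("primary-cluster", "cluster")
colo = graph.add_vertex("west-coast", "colo")
process = graph.add_vertex("clicks-aggregator", "process")
workflow = graph.add_vertex("aggregate-wf", "workflow")

graph.add_edge(cluster, "collocated", colo)    # cluster -> colo
graph.add_edge(process, "executes", workflow)  # process -> workflow
```

With this shape, looking up every workflow is a read of the *type* index (`graph.by_type["workflow"]`), and the colo of a cluster is `graph.out(cluster, "collocated")`.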
I have a few questions on the instance lineage graph:
* Redundant edge to the cluster? It's already there for a feed, no? Or might the
feed get updated with a new cluster?
* How to version workflow instances?
* How do we treat reinstatements of instances? Delete the old one and retain the latest?
Instance keys:
* name = instance-id (timed partition)
* type = entity-type
* creation-time = timestamp
* workflow will have workflowId, subflowId, engine-url, and engine
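
As a sketch of how these keys could be checked before an instance vertex is written, assuming plain property maps: the `instance_vertex` helper and every sample value (ids, timestamps, the engine URL) are hypothetical, and I am assuming workflow instances carry the common keys plus the four workflow ones.

```python
# Keys every instance vertex carries, per the list above.
INSTANCE_KEYS = ("name", "type", "creation-time")
# Assumption: workflow instances carry the common keys plus these four.
WORKFLOW_KEYS = INSTANCE_KEYS + ("workflowId", "subflowId", "engine-url", "engine")


def instance_vertex(properties):
    """Validate a property map for an instance vertex and return a copy of it."""
    if properties.get("type") == "workflow":
        required = WORKFLOW_KEYS
    else:
        required = INSTANCE_KEYS
    missing = [key for key in required if key not in properties]
    if missing:
        raise ValueError("missing instance keys: " + ", ".join(missing))
    return dict(properties)


# All values below are made-up examples.
feed_instance = instance_vertex({
    "name": "clicks-feed/2014-02-10T00:00Z",   # instance-id (timed partition)
    "type": "feed",
    "creation-time": "2014-02-10T01:05:00Z",
})

workflow_instance = instance_vertex({
    "name": "aggregate-wf/2014-02-10T00:00Z",
    "type": "workflow",
    "creation-time": "2014-02-10T01:00:00Z",
    "workflowId": "wf-0001",
    "subflowId": "wf-0001-sub",
    "engine-url": "http://engine.example.com:11000/",
    "engine": "oozie",
})
```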
Thoughts?
> Persist lineage information into a persistent store
> ---------------------------------------------------
>
> Key: FALCON-288
> URL: https://issues.apache.org/jira/browse/FALCON-288
> Project: Falcon
> Issue Type: Sub-task
> Affects Versions: 0.5
> Reporter: Venkatesh Seetharam
> Labels: lineage
> Attachments: Dependency Graph.png, Lineage Over Dependency.png
>
>
> Need to evaluate the store - rdbms vs graph db. Leaning towards latter since
> the data is hierarchical.