[ 
https://issues.apache.org/jira/browse/FALCON-288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897363#comment-13897363
 ] 

Venkatesh Seetharam commented on FALCON-288:
--------------------------------------------

The entity dependency graph looks good, but we could add 2 more things:
* colo as a vertex, with an edge from cluster to colo labeled "collocated"
* workflow as a vertex, with an edge from process to workflow labeled 
"executes". I'd like to capture versioning on workflows, and it makes sense to 
capture that in an instance. 

Also, the entity vertices have 2 indexed keys:
* *name* = entity-name which should be unique
* *type* = entity type
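
The vertices, edge labels, and indexed keys described above can be sketched as a small in-memory property graph. This is only an illustration under stated assumptions, not Falcon code and not the API of any graph database: the `EntityGraph` class, its methods, and the entity names are all hypothetical, and treating *name* as unique across every entity type is one reading of "should be unique".

```python
from collections import defaultdict


class EntityGraph:
    """Hypothetical in-memory stand-in for the proposed entity graph."""

    def __init__(self):
        self.by_name = {}                # index on "name" (assumed globally unique)
        self.by_type = defaultdict(set)  # index on "type"
        self.edges = set()               # (source name, label, target name)

    def add_vertex(self, name, entity_type):
        # Enforce the uniqueness of the "name" key.
        if name in self.by_name:
            raise ValueError(f"duplicate entity name: {name}")
        self.by_name[name] = {"name": name, "type": entity_type}
        self.by_type[entity_type].add(name)

    def add_edge(self, source, label, target):
        # Both endpoints must already exist as vertices.
        for name in (source, target):
            if name not in self.by_name:
                raise KeyError(f"unknown vertex: {name}")
        self.edges.add((source, label, target))

    def out(self, source, label):
        # Names of vertices reachable from `source` over edges with `label`.
        return sorted(t for s, l, t in self.edges if s == source and l == label)


# Entity names below are made up for the example.
graph = EntityGraph()
graph.add_vertex("primary-cluster", "cluster")
graph.add_vertex("west-colo", "colo")
graph.add_vertex("clicks-process", "process")
graph.add_vertex("clicks-workflow", "workflow")
graph.add_edge("primary-cluster", "collocated", "west-colo")
graph.add_edge("clicks-process", "executes", "clicks-workflow")
```

With both keys indexed, a lookup by name or a scan by type does not need to walk the edges, e.g. `graph.by_type["colo"]` or `graph.out("clicks-process", "executes")`.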

I have a few questions on the instance lineage graph:
* Is the edge to the cluster redundant? It's already there for a feed, no? Or 
might the feed get updated with a new cluster?
* How to version workflow instances?
* How do we treat reinstatements of instances? Delete old and retain the latest?

Instance keys:
* name = instance-id (timed partition)
* type = entity-type
* creation-time = timestamp
* workflow will have workflowId, subflowId, engine-url, and engine
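
As a rough sketch of the instance keys listed above, the two record types below map onto the property names from the list. The class names, the `to_properties` helper, the ISO-8601 timestamp format, and every sample value (including the engine and its URL) are assumptions made for the example, not anything defined in Falcon.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class InstanceVertex:
    name: str           # instance-id (timed partition)
    type: str           # entity-type
    creation_time: str  # timestamp; ISO-8601 string is an assumption


@dataclass(frozen=True)
class WorkflowInstanceVertex(InstanceVertex):
    workflow_id: str
    subflow_id: str
    engine_url: str
    engine: str


# Map Python field names back to the key names used in the comment.
KEY_NAMES = {
    "creation_time": "creation-time",
    "workflow_id": "workflowId",
    "subflow_id": "subflowId",
    "engine_url": "engine-url",
}


def to_properties(vertex):
    """Return the vertex as a flat dict of graph properties."""
    return {KEY_NAMES.get(field, field): value
            for field, value in asdict(vertex).items()}


# All values are placeholders.
feed_instance = InstanceVertex(
    name="clicks/2014-02-10-00",
    type="feed",
    creation_time="2014-02-10T00:05:00Z",
)
workflow_instance = WorkflowInstanceVertex(
    name="clicks-workflow/2014-02-10-00",
    type="workflow",
    creation_time="2014-02-10T00:05:00Z",
    workflow_id="wf-0001",
    subflow_id="sub-0001",
    engine_url="http://localhost:11000/oozie",
    engine="oozie",
)
```

Keeping the workflow-only keys on a separate type leaves feed and process instances with just the three common keys.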

Thoughts?

> Persist lineage information into a persistent store
> ---------------------------------------------------
>
>                 Key: FALCON-288
>                 URL: https://issues.apache.org/jira/browse/FALCON-288
>             Project: Falcon
>          Issue Type: Sub-task
>    Affects Versions: 0.5
>            Reporter: Venkatesh Seetharam
>              Labels: lineage
>         Attachments: Dependency Graph.png, Lineage Over Dependency.png
>
>
> Need to evaluate the store - rdbms vs graph db. Leaning towards latter since 
> the data is hierarchical.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
