[ https://issues.apache.org/jira/browse/FALCON-288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13917297#comment-13917297 ]

Venkatesh Seetharam commented on FALCON-288:
--------------------------------------------

Thanks [~sriksun] for reviewing the patch over the weekend. Sincerely 
appreciate it.

bq. Why do we need a user node attached to the cluster vertex? That relation isn't 
very useful and is likely to be misleading as well.
The intent was to capture the user who created this cluster, not an owner per 
se.

bq. addVertex() checks for the existence of the vertex; however, a similar check 
is not done for edges.
Good catch. Will add this.
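The missing edge check is the same get-or-create pattern addVertex() already follows. A minimal sketch, assuming a simplified in-memory graph in which an edge is identified by its (out vertex, label, in vertex) triple; the class and vertex names are hypothetical and this is not Falcon's actual graph code:

```python
class LineageGraph:
    """Hypothetical in-memory graph; a stand-in, not Falcon's actual graph store or API."""

    def __init__(self):
        self.vertices = {}   # vertex name -> property dict
        self.edges = set()   # (out_vertex, label, in_vertex) triples

    def add_vertex(self, name, **props):
        """Get-or-create: return the existing vertex instead of adding a duplicate."""
        if name not in self.vertices:
            self.vertices[name] = dict(props)
        return self.vertices[name]

    def add_edge(self, out_name, label, in_name):
        """Get-or-create for edges: skip the insert if the same edge already exists.

        Returns True if a new edge was added, False if it was already present.
        """
        key = (out_name, label, in_name)
        if key in self.edges:
            return False
        self.add_vertex(out_name)
        self.add_vertex(in_name)
        self.edges.add(key)
        return True


g = LineageGraph()
print(g.add_edge("clicks-process", "output", "clicks-feed"))  # True
print(g.add_edge("clicks-process", "output", "clicks-feed"))  # False, no duplicate edge
```

With this, replaying the same lineage event twice leaves exactly one edge in the graph.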

bq. it might be useful to not assume the default edge label to be "output", but 
to actually check for it and throw an assertion error otherwise.
Ok, makes sense.
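Checking the label explicitly means matching every known case and failing on anything else, rather than letting an unknown value fall through to "output". A small sketch, assuming "input" and "output" are the only valid labels (an assumption for illustration); the function name is hypothetical:

```python
def edge_label(relation):
    """Return the edge label for a (hypothetical) relation kind.

    Every known kind is matched explicitly; anything else fails loudly
    instead of falling through to a default of "output".
    """
    if relation == "input":
        return "input"
    if relation == "output":
        return "output"
    # Explicit raise rather than `assert`, since `python -O` strips assert statements.
    raise AssertionError("unexpected edge label: %r" % (relation,))


print(edge_label("output"))  # output
```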

bq. This is going to be a little tricky. If you leave behind vertices, even 
after all incident edges are removed, the database is going to grow monotonically 
in size and cause performance issues down the line.
This will never be the case in this model. Also, I don't think we will ever 
have thousands of entities to work with, no?

bq. Is the motivation of adding classification & groups relationship for every 
instance to provide "WHAT-WAS" view of the feed instance?
Yes, but I am not religious about it if it does indeed affect performance. These 
are only edges; no new vertices are added.
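To make the cost concrete: if the classification and group vertices are shared, per-instance classification only adds one edge per tag or group for each instance, on top of the instance vertex that exists anyway. A sketch under the assumption that tags and groups are modelled as vertices created once with the entity; the class, the edge labels "classified-as" and "grouped-in", and the instance names are hypothetical:

```python
class LineageGraph:
    """Hypothetical in-memory graph used only to illustrate the edge/vertex counts."""

    def __init__(self):
        self.vertices = set()
        self.edges = []  # (from_vertex, label, to_vertex)

    def add_entity(self, tags, groups):
        # Classification and group vertices are created once, with the entity definition.
        self.vertices.update("tag:" + t for t in tags)
        self.vertices.update("group:" + g for g in groups)

    def add_instance(self, instance, tags, groups):
        """Add one instance vertex, then link it to the tag/group vertices in effect now.

        The links are edges only: every target vertex must already exist.
        """
        self.vertices.add(instance)
        targets = [("classified-as", "tag:" + t) for t in tags]
        targets += [("grouped-in", "group:" + g) for g in groups]
        for label, target in targets:
            if target not in self.vertices:
                raise KeyError(target)
            self.edges.append((instance, label, target))


g = LineageGraph()
g.add_entity(tags=["owner=analytics"], groups=["hourly"])
for hour in ("00:00", "01:00", "02:00"):
    g.add_instance("clicks@2014-03-01T" + hour, ["owner=analytics"], ["hourly"])
print(len(g.vertices), len(g.edges))  # 5 6
```

Because each instance keeps edges to the tags it had when it ran, a later change to the entity's tags does not rewrite the history of earlier instances.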

bq. Why is workflowInstance a separate node in the graph and not a set of 
properties on the process instance?
I thought this can capture changes to workflows and add more properties down 
the line as you describe with reruns.

bq. I can imagine this being useful in re-run scenarios, but I don't see that 
run relationship being captured, though.
The run id is captured as a property. This is an initial implementation and 
needs further work to enhance it.

bq. It is reasonable to leave behind graph elements after an entity is deleted 
to allow historical queries. However, there ought to be some cleanup based on a 
time limit.
Leaving elements behind is also what we discussed. The cleanup can be a 
time-based background job, which can come in a separate JIRA.
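One possible shape for that background cleanup, sketched under stated assumptions: each vertex of a deleted entity carries a hypothetical "deleted_at" timestamp property, and a periodic job drops vertices whose deletion is older than a retention window, together with all of their incident edges, so no orphaned vertices remain. The function name, property name and retention value are illustrative, not an agreed design:

```python
from datetime import datetime, timedelta


def purge_expired(vertices, edges, now, retention):
    """Remove vertices deleted before (now - retention), plus every incident edge.

    vertices: dict of vertex name -> property dict (hypothetical 'deleted_at' key)
    edges:    list of (out_vertex, label, in_vertex) triples, filtered in place
    Returns the set of purged vertex names.
    """
    cutoff = now - retention
    expired = {
        name for name, props in vertices.items()
        if props.get("deleted_at") is not None and props["deleted_at"] < cutoff
    }
    for name in expired:
        del vertices[name]
    # Drop edges touching a purged vertex so no dangling edges are left behind.
    edges[:] = [e for e in edges if e[0] not in expired and e[2] not in expired]
    return expired


vertices = {
    "old-feed": {"deleted_at": datetime(2014, 1, 1)},
    "recent-feed": {"deleted_at": datetime(2014, 3, 9)},
    "live-process": {},
}
edges = [("live-process", "input", "old-feed"), ("live-process", "output", "recent-feed")]
print(purge_expired(vertices, edges, datetime(2014, 3, 10), timedelta(days=30)))  # {'old-feed'}
```

Recently deleted elements stay queryable for historical lineage until they age out of the window.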

> Persist lineage information into a persistent store
> ---------------------------------------------------
>
>                 Key: FALCON-288
>                 URL: https://issues.apache.org/jira/browse/FALCON-288
>             Project: Falcon
>          Issue Type: Sub-task
>    Affects Versions: 0.5
>            Reporter: Venkatesh Seetharam
>            Assignee: Venkatesh Seetharam
>              Labels: lineage
>         Attachments: Dependency Graph.png, FALCON-288-Hive-Review.patch, 
> FALCON-288-review-v1.patch, FALCON-288-review.patch, FALCON-288-v1.patch, 
> Lineage Over Dependency.png
>
>
> Need to evaluate the store: RDBMS vs. graph DB. Leaning towards the latter since 
> the data is hierarchical.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
