[
https://issues.apache.org/jira/browse/FALCON-288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13917297#comment-13917297
]
Venkatesh Seetharam commented on FALCON-288:
--------------------------------------------
Thanks [~sriksun] for reviewing the patch over the weekend. I sincerely
appreciate it.
bq. Why do we need a user node attached to the cluster vertex? That relation
isn't very useful and is likely to be misleading as well.
The intent was to capture the user who created this cluster. Not an owner per
se.
bq. addVertex() checks for the existence of the vertex; however, a similar
check is not done for edges.
Good catch. Will add this.
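A minimal sketch of what the edge check could look like. It uses a hypothetical
in-memory Graph class, not the graph API used in the patch; the names
add_vertex, add_edge, "sample-process" and "sample-feed" are illustrative
assumptions.

```python
class Graph:
    """Hypothetical in-memory graph; stands in for the real graph store."""

    def __init__(self):
        self.vertices = {}  # vertex name -> property dict
        self.edges = set()  # (from_name, to_name, label) triples

    def add_vertex(self, name, **props):
        # Reuse the vertex if it already exists, mirroring addVertex().
        return self.vertices.setdefault(name, dict(props))

    def add_edge(self, from_name, to_name, label):
        # Same existence check for edges: adding the same edge twice is a no-op.
        key = (from_name, to_name, label)
        if key in self.edges:
            return False
        self.edges.add(key)
        return True


graph = Graph()
graph.add_vertex("sample-process")
graph.add_vertex("sample-feed")
first = graph.add_edge("sample-process", "sample-feed", "output")
second = graph.add_edge("sample-process", "sample-feed", "output")
```

Here the second call leaves the graph unchanged, so replaying the same
relationship does not create a duplicate edge.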
bq. it might be useful to not assume the default edge label to be "output", but
to actually check for it and throw an assertion error otherwise.
Ok, makes sense.
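Roughly, the explicit check could look like the sketch below. The set of valid
labels (in particular "input") and the helper name check_edge_label are
assumptions for illustration; only "output" is mentioned in the review.

```python
# Assumed label set, for illustration only; the review names just "output".
VALID_LABELS = {"input", "output"}


def check_edge_label(label):
    """Return the label unchanged, or fail loudly if it is not recognized."""
    if label not in VALID_LABELS:
        raise AssertionError("Unexpected edge label: %r" % (label,))
    return label
```

The point is that an unknown label fails immediately instead of silently
falling through to "output".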
bq. This is going to be a little tricky. If you leave behind vertices even
after all incident edges are removed, the database is going to grow
monotonically in size and cause performance issues down the line.
This will never be the case in this model. Also, I don't think we will ever
have thousands of entities to work with, no?
bq. Is the motivation of adding classification & groups relationship for every
instance to provide "WHAT-WAS" view of the feed instance?
Yes, but I am not religious about it if it does indeed affect performance.
These add only edges, no new vertices.
bq. Why is workflowInstance a separate node in the graph and not a set of
properties on the process instance?
I thought this could capture changes to workflows and let us add more
properties down the line, as you describe with reruns.
bq. I can imagine this being useful in re-run scenarios, but I don't see that
run-relationship being captured though.
The run id is captured as a property. This is an initial implementation and
needs further work.
bq. It is reasonable to leave behind graph elements after an entity is deleted
to allow historical queries. However, there ought to be some time-based cleanup
available.
This is also what we discussed: leave the elements behind. The cleanup can be
time-based and run in the background, which can come in a separate JIRA.
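As a rough sketch of the idea only: the data layout below (a map from element
name to deletion timestamp, with None for live elements) and the names
purge_expired and retention_seconds are hypothetical and not part of the patch.

```python
def purge_expired(elements, now, retention_seconds):
    """Keep live elements and deleted ones still inside the retention window."""
    kept = {}
    for name, deleted_at in elements.items():
        # deleted_at is None while the entity is still live.
        if deleted_at is None or now - deleted_at <= retention_seconds:
            kept[name] = deleted_at
    return kept


# feed-a is live; feed-b was deleted 900s ago; feed-c was deleted 100s ago.
elements = {"feed-a": None, "feed-b": 100.0, "feed-c": 900.0}
remaining = purge_expired(elements, now=1000.0, retention_seconds=300.0)
```

With a 300 second window only feed-b is dropped, so recently deleted elements
stay available for historical queries.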
> Persist lineage information into a persistent store
> ---------------------------------------------------
>
> Key: FALCON-288
> URL: https://issues.apache.org/jira/browse/FALCON-288
> Project: Falcon
> Issue Type: Sub-task
> Affects Versions: 0.5
> Reporter: Venkatesh Seetharam
> Assignee: Venkatesh Seetharam
> Labels: lineage
> Attachments: Dependency Graph.png, FALCON-288-Hive-Review.patch,
> FALCON-288-review-v1.patch, FALCON-288-review.patch, FALCON-288-v1.patch,
> Lineage Over Dependency.png
>
>
> Need to evaluate the store: RDBMS vs. graph DB. Leaning towards the latter
> since the data is hierarchical.