Sorry about being late to this. Love this change. Will look deep into this in the next couple of days.
Thank you, Kevin Y On Sat, Feb 23, 2019 at 10:54 PM Tao Feng <[email protected]> wrote: > I manage to test out the latest version of the branch which I think all my > previous concerns don't exist anymore. Great job! The latest screenshot for > the new graph view could be found in the pr comment. > > On Sat, Feb 23, 2019 at 3:30 PM Tao Feng <[email protected]> wrote: > > > Thanks for bringing this up. I look at the pr before and here recaps > some > > of the previous concerns: > > 1. There will have a new *giant *button next to "go" button in graph > > view(it could be found in the screenshot in the pr) based on my last read > > of the pr. It even exists after DAGRun is created. I don't think it is > > ideal. Have you removed the button or not? I understand we need to read > the > > DAG File when run doesn't exist. But IMO we need a better UI than this. > If > > the latest UI removed the button, how user found out this link( > > http://host/graph?dag_id=my_dag&read_from_file=True) ? > > 2. Previous discussion mentioned that it requires user to run `airflow > > resetdb` if there are historical DAGRun. Is it still the case or not? > > 3. During Andrew's testing, he found that the DAG shape keeps changing > > during the backfill. Have we fixed the issue? > > 4. By reading the pr, I recall the graph_id is always increment by 1 > based > > on last graph_id. The last graph_id matches the latest execution date > > DAGRun. What happens to the case when the latest execution date doesn't > > change while user backfill multiple days? Will all these backfill DAGRuns > > get the same graph id in this case? Is it intentded? > > > > Let me if the latest version made this change. And I would suggest the > > community to test out the pr in your setup(with historical DAGRuns) if > > possible as it is potentially a big(breaking) change. We would like to > > make sure nothing unexpected happens if we are going with this direction. > > > > Thanks, > > -Tao > > > > On Sat, Feb 23, 2019 at 11:41 AM Bas Harenslak < > > [email protected]> wrote: > > > >> Let's discuss AIP-12 here: > >> > https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-12+Persist+DAG+into+DB > . > >> It involves persisting the entire DAG into the metastore. For full > details, > >> please read the AIP. > >> > >> A PR was made to create “versioned graphs” given by option #3 in the > AIP: > >> https://github.com/apache/airflow/pull/4396. This led to a long > >> discussion but has been quiet for the last few days. It would be sad to > see > >> the effort put in by @ffinfo not leading to anything. To recap the > summary > >> at the end of the PR, the current status is: > >> > >> Internal changes: > >> > >> > >> * This PR persists task dependencies in a new table `dag_edge`. > >> * The term "graph" is introduced in the code, this contains the > >> structure of a DAG, so the "edges" (dependencies) and "nodes" (tasks). > >> * A DagRun is bound to one `graph_id`. > >> * Currently in Airflow only the latest version of a DAG is displayed > >> in the UI (both graph & tree view). This means if you delete a task, you > >> cannot see runs of that task in the past anymore. > >> * In the graph view you can now see different graph versions, > because > >> we store both tasks and edges. > >> * For the record: in the tree view you still only the latest version > >> because it is not possible to combine all history into a single view. > >> > >> Changes from a user perspective: > >> > >> > >> * Nothing in the tree view. > >> * In the graph view, you can now view different "graphs" if you > >> change the structure of your DAG. Note the graph view shows DAG runs. If > >> you change your DAG without running it, it does not show in the graph > view. > >> * When you have no DAG runs, there is no graph to show. So, as > >> @ffinfo described above he then reads the graph from the DAG file > instead. > >> You can see this behaviour in the graph view url: > >> * if DagRuns exist: http://host/graph?dag_id=my_dag > >> * if no DagRuns exist: > >> http://host/graph?dag_id=my_dag&read_from_file=True > >> * In the screenshots in > >> https://github.com/apache/airflow/pull/4396#issuecomment-465217731, you > >> see this case. Since this is more of an internal thing how Airflow > works, > >> and not really informative for the user, @ffinfo removed the message in > his > >> last commit. > >> > >> Judging by the PR comments, everybody likes the idea of persisting more > >> of the DAG in the DB. All issues mentioned were addressed. It would be > >> great to see this work merged in Airflow, so please discuss anything > about > >> the PR/AIP here. > >> > >> Cheers, > >> Bas > >> > > >
