-1 (binding)
Good points made by Dan. We don't need to have the future plan implemented
completely but it would be nice to see more detailed notes about how this
will play out in the future. We shouldn't walk into a system that causes
more pain in the future. (I can't say for sure that it does, but I can't
say that it doesn't either). I don't think the proposal is necessarily
wrong or bad, but I think we need some more detailed planning around future
milestones.

On Tue, Oct 15, 2019 at 12:04 PM Dan Davydov <ddavy...@twitter.com.invalid>
wrote:

> -1 (binding), this may sound a bit FUD-y but I don't feel this has been
> thought through enough...
>
> Having both a SimpleDagBag representation and the JSON representation
> doesn't make sense to me at the moment: *"**Quoting from Airflow code, it
> is “a simplified representation of a DAG that contains all attributes
> required for instantiating and scheduling its associated tasks.”. It does
> not contain enough information required by the webserver.". *Why not create
> a representation that can be used by both? This is going to be a big
> headache to both understand and work with in the codebase since it will be
> another definition that we need to keep in sync.
>
> Not sure if fileloc/fileloc_hash is the right solution, the longterm
> solution I am imagining has clients responsible for uploading DAGs rather
> than retrieving them from the filesystem so fileloc/fileloc_hash wouldn't
> even exist (dag_id would be used for indexing here).
>
> Versioning isn't really addressed either (e.g. if a DAG topology changes
> with some update you want to be able to show both the old and new ones, or
> at least have a way to deal with them), there is an argument that this is
> acceptable since it isn't currently addressed now, but I'm worried that
> large schema changes should think through the long term plan a bit more.
>
> I feel like getting this wrong is going to make it very hard to migrate
> things in the future, and make the codebase worse (another representation
> of DAGs that we need to support/understand/keep parity for). If I'm wrong
> about this then I would be more willing to +1 this change. This doc is a
> 1-2 pager and I feel like it is not thorough or deep enough and doesn't
> give me enough confidence that the work in the PR is going to make it
> easier to complete the future milestones instead of harder.
>
> On Tue, Oct 15, 2019 at 11:26 AM Kamil Breguła <kamil.breg...@polidea.com>
> wrote:
>
> > +1 (binding)
> >
> > On Tue, Oct 15, 2019 at 2:57 AM Kaxil Naik <kaxiln...@gmail.com> wrote:
> >
> > > Hello, Airflow community,
> > >
> > > This email calls for a vote to add the DAG Serialization feature at
> > > https://github.com/apache/airflow/pull/5743.
> > >
> > > *AIP*:
> > >
> >
> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-24+DAG+Persistence+in+DB+using+JSON+for+Airflow+Webserver+and+%28optional%29+Scheduler
> > >
> > > *Previous Mailing List discussion*:
> > >
> >
> https://lists.apache.org/thread.html/65d282368e0a7c19815badb8b1c6c8d72b0975ce94f601e13af44f74@%3Cdev.airflow.apache.org%3E
> > >  .
> > >
> > > *Authors*: Kaxil Naik, Zhou Fang, Ash-Berlin Taylor
> > >
> > > *Summary*:
> > >
> > >    - DAGs are serialized using JSON format and stored in a
> SerializedDag
> > >    table
> > >    - The Webserver now instead of having to parse the DAG file again,
> > >    reads the serialized DAGs in JSON, de-serializes them and creates
> the
> > >    DagBag and uses it to show in the UI.
> > >    - Instead of loading an entire DagBag when the WebServer starts we
> > >    only load each DAG on demand from the Serialized Dag table. This
> helps
> > >    reduce Webserver startup time and memory. The reduction is notable
> > when you
> > >    have a large number of DAGs.
> > >    - A JSON Schema has been defined and we validate the serialized dag
> > >    before writing it to the database
> > >
> > > [image: image.png]
> > >
> > > A PR (https://github.com/apache/airflow/pull/5743) is ready for review
> > > from the committers and community.
> > >
> > > We also have a WIP PR (https://github.com/apache/airflow/pull/5992) to
> > > backport this feature to 1.10.* branch.
> > >
> > > A big thank you to Zhou and Ash for their continuous help in improving
> > > this feature/PR.
> > >
> > > This email is formally calling for a vote to accept the AIP and PR.
> > Please
> > > note that we will update the PR / feature to fix bugs if we find any.
> > >
> > > Cheers,
> > > Kaxil
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> >
>

Reply via email to