I have been following it from the beginning as well. I understand there
would be short-term wins for some users (I don't think a huge amount of
users?), but I still feel like we are being a bit short-sighted here and
that we are creating more work for ourselves and potentially our users in
the future. I also feel like there will be side effects to users as well,
many of which don't care about the webserver scalability, such as bugs
caused by the addition of the new webserver representation. I think without
a design that is much larger in scope I wouldn't feel comfortable moving
forward with this AIP.

On Tue, Oct 15, 2019 at 3:21 PM Jarek Potiuk <jarek.pot...@polidea.com>
wrote:

> Hello Dan, Alex,
>
> I believe all the points you make are super-valid ones. But maybe you are
> missing the full context a bit.
>
> I followed the original discussion
> <
> https://lists.apache.org/thread.html/a2d426f93c0f4e5f0347627308638b59ca4072fd022a42af1163e34a@%3Cdev.airflow.apache.org%3E
> >
> from the very beginning and took part in the initial discussions when this
> topic was raised. From the discussion it is quite clear to me that this
> mostly a "tactical" approach to implement something that is backportable to
> 1.10 and rather quick to implement. This is targeted to make users more
> happy with their 1.10 version without the timing uncertainty and effort of
> migration to 2.0. It solves the major pain point of stability of the UI in
> case there are complex DAGs for which parsing crashes the webserver.
> Like in "being nice to your users".
>
> There will be a separate effort to make pretty much all of the things you
> mentioned in 2.0 in a non-backportable way as it requires far too many
> changes in the way how Airflow works internally.
>
> Maybe it needs some more explanation + long term plan that follows in the
> AIP itself to explain it to those who have not followed the initial
> discussion, but I think it's fully justified change.
>
> J.
>
> On Tue, Oct 15, 2019 at 9:10 PM Alex Guziel <alex.guz...@airbnb.com
> .invalid>
> wrote:
>
> > -1 (binding)
> > Good points made by Dan. We don't need to have the future plan
> implemented
> > completely but it would be nice to see more detailed notes about how this
> > will play out in the future. We shouldn't walk into a system that causes
> > more pain in the future. (I can't say for sure that it does, but I can't
> > say that it doesn't either). I don't think the proposal is necessarily
> > wrong or bad, but I think we need some more detailed planning around
> future
> > milestones.
> >
> > On Tue, Oct 15, 2019 at 12:04 PM Dan Davydov
> <ddavy...@twitter.com.invalid
> > >
> > wrote:
> >
> > > -1 (binding), this may sound a bit FUD-y but I don't feel this has been
> > > thought through enough...
> > >
> > > Having both a SimpleDagBag representation and the JSON representation
> > > doesn't make sense to me at the moment: *"**Quoting from Airflow code,
> it
> > > is “a simplified representation of a DAG that contains all attributes
> > > required for instantiating and scheduling its associated tasks.”. It
> does
> > > not contain enough information required by the webserver.". *Why not
> > create
> > > a representation that can be used by both? This is going to be a big
> > > headache to both understand and work with in the codebase since it will
> > be
> > > another definition that we need to keep in sync.
> > >
> > > Not sure if fileloc/fileloc_hash is the right solution, the longterm
> > > solution I am imagining has clients responsible for uploading DAGs
> rather
> > > than retrieving them from the filesystem so fileloc/fileloc_hash
> wouldn't
> > > even exist (dag_id would be used for indexing here).
> > >
> > > Versioning isn't really addressed either (e.g. if a DAG topology
> changes
> > > with some update you want to be able to show both the old and new ones,
> > or
> > > at least have a way to deal with them), there is an argument that this
> is
> > > acceptable since it isn't currently addressed now, but I'm worried that
> > > large schema changes should think through the long term plan a bit
> more.
> > >
> > > I feel like getting this wrong is going to make it very hard to migrate
> > > things in the future, and make the codebase worse (another
> representation
> > > of DAGs that we need to support/understand/keep parity for). If I'm
> wrong
> > > about this then I would be more willing to +1 this change. This doc is
> a
> > > 1-2 pager and I feel like it is not thorough or deep enough and doesn't
> > > give me enough confidence that the work in the PR is going to make it
> > > easier to complete the future milestones instead of harder.
> > >
> > > On Tue, Oct 15, 2019 at 11:26 AM Kamil Breguła <
> > kamil.breg...@polidea.com>
> > > wrote:
> > >
> > > > +1 (binding)
> > > >
> > > > On Tue, Oct 15, 2019 at 2:57 AM Kaxil Naik <kaxiln...@gmail.com>
> > wrote:
> > > >
> > > > > Hello, Airflow community,
> > > > >
> > > > > This email calls for a vote to add the DAG Serialization feature at
> > > > > https://github.com/apache/airflow/pull/5743.
> > > > >
> > > > > *AIP*:
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-24+DAG+Persistence+in+DB+using+JSON+for+Airflow+Webserver+and+%28optional%29+Scheduler
> > > > >
> > > > > *Previous Mailing List discussion*:
> > > > >
> > > >
> > >
> >
> https://lists.apache.org/thread.html/65d282368e0a7c19815badb8b1c6c8d72b0975ce94f601e13af44f74@%3Cdev.airflow.apache.org%3E
> > > > >  .
> > > > >
> > > > > *Authors*: Kaxil Naik, Zhou Fang, Ash-Berlin Taylor
> > > > >
> > > > > *Summary*:
> > > > >
> > > > >    - DAGs are serialized using JSON format and stored in a
> > > SerializedDag
> > > > >    table
> > > > >    - The Webserver now instead of having to parse the DAG file
> again,
> > > > >    reads the serialized DAGs in JSON, de-serializes them and
> creates
> > > the
> > > > >    DagBag and uses it to show in the UI.
> > > > >    - Instead of loading an entire DagBag when the WebServer starts
> we
> > > > >    only load each DAG on demand from the Serialized Dag table. This
> > > helps
> > > > >    reduce Webserver startup time and memory. The reduction is
> notable
> > > > when you
> > > > >    have a large number of DAGs.
> > > > >    - A JSON Schema has been defined and we validate the serialized
> > dag
> > > > >    before writing it to the database
> > > > >
> > > > > [image: image.png]
> > > > >
> > > > > A PR (https://github.com/apache/airflow/pull/5743) is ready for
> > review
> > > > > from the committers and community.
> > > > >
> > > > > We also have a WIP PR (https://github.com/apache/airflow/pull/5992
> )
> > to
> > > > > backport this feature to 1.10.* branch.
> > > > >
> > > > > A big thank you to Zhou and Ash for their continuous help in
> > improving
> > > > > this feature/PR.
> > > > >
> > > > > This email is formally calling for a vote to accept the AIP and PR.
> > > > Please
> > > > > note that we will update the PR / feature to fix bugs if we find
> any.
> > > > >
> > > > > Cheers,
> > > > > Kaxil
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > >
> > >
> >
>
>
> --
>
> Jarek Potiuk
> Polidea <https://www.polidea.com/> | Principal Software Engineer
>
> M: +48 660 796 129 <+48660796129>
> [image: Polidea] <https://www.polidea.com/>
>

Reply via email to