Tomasz and Ash make good points about the overhead of having separate repos.
But as we grow bigger and more mature, I would prefer to have what was
described in AIP-8. It shouldn't be extremely hard for us to come up with
good strategies to handle the overhead. AIP-8 already talked about how it
can benefit us. IMO, at a high level, having a clear separation of core vs.
hooks/operators would make the project much more scalable, and the gains
would outweigh the cost we pay.

That being said, I'm supportive of this "move towards AIP-8 while learning"
approach; it is good practice for tackling a big project. Looking forward to
reading the AIP.


Cheers,
Kevin Y

On Mon, Oct 28, 2019 at 6:21 AM Jarek Potiuk <jarek.pot...@polidea.com>
wrote:

> We are checking how we can use namespaces in a back-portable way, and we will
> have a POC soon so that we can all see what it will look like.
>
> J.
>
> On Mon, Oct 28, 2019 at 1:24 PM Ash Berlin-Taylor <a...@apache.org> wrote:
>
> > I'll have to read your proposal in detail (sorry, no time right now!), but
> > I'm broadly in favour of this approach, and I think keeping them _in_ the
> > same repo is the best plan -- that makes writing and testing cross-cutting
> > changes easier.
> >
> > -a
> >
> > > On 28 Oct 2019, at 12:14, Tomasz Urbaszek <tomasz.urbas...@polidea.com>
> > > wrote:
> > >
> > > > I think utilizing namespaces should reduce a lot of the problems raised
> > > > by using separate repos (who will manage it? how to release? where
> > > > should the repo be?).
> > >
> > > Bests,
> > > Tomek
> > >
> > > > On Sun, Oct 27, 2019 at 11:54 AM Jarek Potiuk <jarek.pot...@polidea.com>
> > > > wrote:
> > >
> > >> Thanks Bas for comments! Let me share my thoughts below.
> > >>
> > > >> On Sun, Oct 27, 2019 at 9:23 AM Bas Harenslak
> > > >> <basharens...@godatadriven.com> wrote:
> > >>
> > > >>> Hi Jarek, I definitely see a future in creating separate installable
> > > >>> packages for various operators/hooks/etc (as in AIP-8). This would IMO
> > > >>> strip the “core” Airflow to only what’s needed and result in a small
> > > >>> package without a ton of dependencies (and make it more maintainable,
> > > >>> shorter tests, etc etc etc). Not exactly sure though what you’re
> > > >>> proposing in your e-mail, is it a new AIP for an intermediate step
> > > >>> towards AIP-8?
> > >>>
> > >>
> > > >> It's a new AIP I am proposing. For now it's only for backporting the
> > > >> new 2.0 import paths to the 1.10.* series.
> > >>
> > > >> It's more of "incrementally going in the direction of AIP-8 and learning
> > > >> about the difficulties involved" than implementing AIP-8 fully. We are
> > > >> taking advantage of the changes in import paths from AIP-21, which make
> > > >> it possible to have both old and new (optional) operators available in
> > > >> the 1.10.* series of Airflow. I think there is a lot more to do for a
> > > >> full implementation of AIP-8: decisions on how to maintain and install
> > > >> those operator groups separately, a stewardship model/organisation for
> > > >> the separate groups, how to manage cross-dependencies, procedures for
> > > >> releasing the packages, etc.
> > >>
> > > >> I think about this new AIP also as a learning effort - we would learn
> > > >> more about how separate packaging works and then we can follow up with a
> > > >> full AIP-8 implementation for "modular" Airflow. Then AIP-8 could be
> > > >> implemented in Airflow 2.1 for example - or 3.0 if we start following
> > > >> semantic versioning - based on those learnings. It's a good example of
> > > >> having our cake and eating it too. We can try out modularity in 1.10.*
> > > >> while cutting the scope of 2.0 and not implementing the full
> > > >> management/release procedure for AIP-8 yet.
> > >>
> > >>
> > > >>> Thinking about this, I think there are still a few grey areas (which
> > > >>> would be good to discuss in a new AIP, or continue on AIP-8):
> > > >>>
> > > >>>  *   In your email you speak only about the 3 big cloud providers
> > > >>> (btw I made a PR for migrating all AWS components ->
> > > >>> https://github.com/apache/airflow/pull/6439). Is there a plan for
> > > >>> splitting other components than Google/AWS/Azure?
> > >>>
> > >>
> > > >> We could indeed add more groups as part of this new AIP (as an
> > > >> extension to AIP-21 and a prerequisite to AIP-8). We already see how
> > > >> moving/deprecation works for the providers package - it works rather
> > > >> nicely for GCP/Google. But there is nothing to prevent us from extending
> > > >> it to cover other groups of operators/hooks. If you look at the current
> > > >> structure of the documentation done by Kamil, we can follow the
> > > >> structure there and move the operators/hooks accordingly
> > > >> (https://airflow.readthedocs.io/en/latest/operators-and-hooks-ref.html):
> > >>
> > >>      Fundamentals, ASF: Apache Software Foundation, Azure: Microsoft
> > >> Azure, AWS: Amazon Web Services, GCP: Google Cloud Platform, Service
> > >> integrations, Software integrations, Protocol integrations.
> > >>
> > > >> I am happy to include that in the AIP - if others agree it's a good
> > > >> idea. Out of those groups, I think only Fundamentals should not be
> > > >> back-ported. Others should be rather easy to port (if we decide to). We
> > > >> already have quite a lot of those in the new GCP operators for 2.0, so
> > > >> starting with the GCP/Google group is a good idea. Following with the
> > > >> Cloud Providers first is also a good thing. For example, we now have
> > > >> support from the Google Composer team to do this separation for GCP (and
> > > >> we learn from it), and then we can claim the stewardship in our team for
> > > >> releasing the Python 3 / Airflow 1.10-compatible "airflow-google"
> > > >> packages. Possibly other Cloud Providers/teams might follow this (if
> > > >> they see the value in it) and there could be different stewards for
> > > >> those. And then we can do other groups if we decide to. I think this way
> > > >> we can learn whether AIP-8 is manageable and what real problems we are
> > > >> going to face.
> > >>
> > > >>>  *   Each “plugin” e.g. GCP would be a separate repo, should we create
> > > >>> some sort of blueprint for such packages?
> > >>>
> > >>
> > > >> I think we do not need separate repos (at all), but in this new AIP we
> > > >> can test it before we decide to go for AIP-8. IMHO, a monorepo approach
> > > >> will work here rather nicely. We could use Python 3 native namespaces
> > > >> <https://packaging.python.org/guides/packaging-namespace-packages/> for
> > > >> the sub-packages when we go full AIP-8. For now we could simply package
> > > >> the new operators in a separate pip package for the Python 3 version of
> > > >> the 1.10.* series only. We only need to test whether it works well with
> > > >> another package providing 'airflow.providers.*' after apache-airflow is
> > > >> installed (providing the 'airflow' package). But I think we can make it
> > > >> work. I don't think we really need to split the repos; namespaces will
> > > >> work just fine and make cross-repository dependencies easier to manage
> > > >> (but we may learn otherwise). For sure we will not need it for the newly
> > > >> proposed AIP of backporting groups to 1.10, and we can defer that
> > > >> decision to AIP-8 implementation time.
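
A minimal sketch of the native-namespace idea mentioned above (PEP 420), not
the project's actual packaging: two separately installed directories can both
contribute modules to one top-level package, as long as neither ships an
`__init__.py` for it. The package name `demo_ns`, the directory names and the
module names are all hypothetical. Since apache-airflow ships `airflow` as a
regular package with an `__init__.py`, the real case is not this simple, which
is presumably why the co-installation still needs to be tested.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def build_demo_distributions(root: Path) -> list:
    """Create two fake 'installed distributions' sharing the demo_ns namespace.

    Neither directory contains an __init__.py, so demo_ns is a native
    (PEP 420) namespace package. All names here are hypothetical.
    """
    dist_core = root / "dist_core"
    dist_google = root / "dist_google"
    (dist_core / "demo_ns" / "core").mkdir(parents=True)
    (dist_google / "demo_ns" / "providers").mkdir(parents=True)
    (dist_core / "demo_ns" / "core" / "base.py").write_text("NAME = 'core'\n")
    (dist_google / "demo_ns" / "providers" / "google.py").write_text(
        "NAME = 'google'\n"
    )
    return [str(dist_core), str(dist_google)]


def import_from_both(root: Path) -> tuple:
    """Import one module from each distribution through the shared namespace."""
    paths = build_demo_distributions(root)
    sys.path[:0] = paths  # simulate two separate install locations
    importlib.invalidate_caches()
    try:
        core = importlib.import_module("demo_ns.core.base")
        google = importlib.import_module("demo_ns.providers.google")
        return core.NAME, google.NAME
    finally:
        for path in paths:
            sys.path.remove(path)


with tempfile.TemporaryDirectory() as tmp:
    result = import_from_both(Path(tmp))

print(result)
```

Both imports resolve even though the two modules live under different
directories, because the import system merges every `demo_ns` directory it
finds on `sys.path` into one namespace.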
> > >>
> > >>
> > > >>>  *   In which Airflow version do we start raising deprecation warnings
> > > >>> and in which version would we remove the original?
> > >>>
> > >>
> > > >> I think we should do what we already did in the GCP case. Those old
> > > >> "imports" for operators can be marked as deprecated in Airflow 2.0 (and
> > > >> removed in 2.1, or 3.0 if we start following semantic versioning). We
> > > >> can however do it earlier, in 1.10.7 or 1.10.8 if we release those
> > > >> (without removing the old operators yet - just raise deprecation
> > > >> warnings and inform users that for Python 3 the new "airflow-google",
> > > >> "airflow-aws" etc. packages can be installed and that they can switch
> > > >> to them).
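
A minimal sketch of the "keep the old name working but warn" pattern discussed
above, using only the standard `warnings` module. The class names are
hypothetical stand-ins rather than real Airflow operators, and this is not
necessarily how the GCP migration implemented it.

```python
import warnings


class NewStorageOperator:
    """Hypothetical operator living at the new import path."""

    def __init__(self, bucket):
        self.bucket = bucket


class OldStorageOperator(NewStorageOperator):
    """Hypothetical old name, kept so that existing DAGs keep working."""

    def __init__(self, *args, **kwargs):
        # stacklevel=2 points the warning at the caller, not at this shim.
        warnings.warn(
            "OldStorageOperator is deprecated; use NewStorageOperator instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        super().__init__(*args, **kwargs)


# Record warnings so the deprecation can be inspected instead of printed.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    op = OldStorageOperator(bucket="demo")

print(op.bucket, caught[0].category.__name__)
```

The old class stays fully functional, so removal can be deferred to a later
release while users see the warning in the meantime.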
> > >>
> > >> J.
> > >>
> > >>
> > >>>
> > >>> Cheers,
> > >>> Bas
> > >>>
> > > >>> On 27 Oct 2019, at 08:33, Jarek Potiuk <jarek.pot...@polidea.com>
> > > >>> wrote:
> > >>>
> > >>> Hello - any comments on that? I am happy to make it into an AIP :)?
> > >>>
> > > >>> On Sun, Oct 13, 2019 at 5:53 PM Jarek Potiuk <jarek.pot...@polidea.com>
> > > >>> wrote:
> > >>>
> > >>> *Motivation*
> > >>>
> > > >>> I think we really should start thinking about making it easier for our
> > > >>> users to migrate to 2.0. After implementing some recent changes related
> > > >>> to AIP-21 - Changes in import paths
> > > >>> <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-21%3A+Changes+in+import+paths>
> > > >>> I think I have an idea that might help with it.
> > >>>
> > >>> *Proposal*
> > >>>
> > > >>> We could package some of the new and improved 2.0 operators (moved to
> > > >>> the "providers" package) and let them be used in a Python 3 environment
> > > >>> of Airflow 1.10.x.
> > >>>
> > > >>> This can be done case-by-case per "cloud provider". It should not be
> > > >>> obligatory, and should be largely driven by each provider. It's not yet
> > > >>> the full AIP-8 - Split Hooks/Operators into separate packages
> > > >>> <https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=100827303>.
> > > >>> It's merely backporting some operators/hooks to get them to work in
> > > >>> 1.10. But by doing it we might try out the concept of splitting, learn
> > > >>> about maintenance problems and maybe implement the full *AIP-8* approach
> > > >>> in 2.1 consistently across the board.
> > >>>
> > >>> *Context*
> > >>>
> > > >>> Part of AIP-21 was to move the import paths for Cloud providers to a
> > > >>> separate providers/<PROVIDER> package. An example of that (the first
> > > >>> provider we have already almost migrated) is the providers/google
> > > >>> package (further divided into gcp/gsuite etc).
> > >>>
> > > >>> We've done a massive migration of all the Google-related operators,
> > > >>> created a few missing ones and retrofitted some old operators to follow
> > > >>> GCP best practices, fixing a number of problems and also implementing
> > > >>> Python 3 and Pylint compatibility. Some of these operators/hooks are not
> > > >>> backwards compatible. Those that are compatible are still available via
> > > >>> the old imports with a deprecation warning.
> > >>>
> > > >>> We've added missing tests (including system tests) and missing features
> > > >>> - improving some of the Google operators - giving the users more
> > > >>> capabilities and fixing some issues. Those operators should pretty much
> > > >>> "just work" in Airflow 1.10.x (any recent version) for Python 3. We
> > > >>> should be able to release a separate pip-installable package for those
> > > >>> operators that users can install in Airflow 1.10.x.
> > >>>
> > > >>> Any user will be able to install this separate package in their Airflow
> > > >>> 1.10.x installation and start using those new "provider" operators in
> > > >>> parallel to the old 1.10.x operators. Other providers ("microsoft",
> > > >>> "amazon") might follow the same approach if they want. We could even at
> > > >>> some point decide to move some of the core operators in a similar
> > > >>> fashion (for example following the structure proposed in the latest
> > > >>> documentation: fundamentals / software / etc.
> > > >>> https://airflow.readthedocs.io/en/latest/operators-and-hooks-ref.html)
> > >>>
> > >>> *Pros and cons*
> > >>>
> > >>> There are a number of pros:
> > >>>
> > > >>>  - Users will have an easier migration path if they are deeply invested
> > > >>>  in the 1.10.* version
> > > >>>  - It's possible to migrate in stages for people who are also invested
> > > >>>  in py2: *py2 (1.10) -> py3 (1.10) -> py3 + new operators (1.10) -> py3
> > > >>>  + 2.0*
> > > >>>  - Moving to new operators in py3 + new operators can be done
> > > >>>  gradually. Old operators will continue to work while new ones can be
> > > >>>  used more and more
> > > >>>  - People will be incentivised to migrate to Python 3 before 2.0 is
> > > >>>  out (by using new operators)
> > > >>>  - Each provider "package" can have an independent release schedule -
> > > >>>  and add functionality in already released Airflow versions.
> > > >>>  - We do not take any functionality away from the users - we just add
> > > >>>  more options
> > > >>>  - The releases can be - similarly to main Airflow releases - voted on
> > > >>>  separately by the PMC after "stewards" of the package (per provider)
> > > >>>  perform a round of testing on 1.10.* versions.
> > > >>>  - Users will start migrating to new operators earlier and have a
> > > >>>  smoother switch to 2.0 later
> > >>>  - The latest improved operators will start
> > >>>
> > >>> There are three cons I could think of:
> > >>>
> > > >>>  - There will be quite a lot of duplication between old and new
> > > >>>  operators (they will co-exist in 1.10). That might confuse users and
> > > >>>  cause problems with cooperation between different operators/hooks
> > > >>>  - Having new operators in 1.10 Python 3 might keep people from
> > > >>>  migrating to 2.0
> > > >>>  - It will require some maintenance and separate release overhead.
> > >>>
> > > >>> I already spoke to the Composer team @Google and they are very positive
> > > >>> about this. I also spoke to Ash and it seems it might also be OK for the
> > > >>> Astronomer team. We have Google's backing and support, and we can
> > > >>> provide maintenance and support for those packages - being an example
> > > >>> for other providers of how they can do it.
> > >>>
> > > >>> Let me know what you think - and whether I should make it into an
> > > >>> official AIP maybe?
> > >>>
> > >>> J.
> > >>>
> > >>>
> > >>>
> > >>> --
> > >>>
> > >>> Jarek Potiuk
> > >>> Polidea <https://www.polidea.com/> | Principal Software Engineer
> > >>>
> > >>> M: +48 660 796 129 <+48660796129>
> > >>> [image: Polidea] <https://www.polidea.com/>
> > >>>
> > >>>
> > >>>
> > >>> --
> > >>>
> > >>> Jarek Potiuk
> > >>> Polidea <https://www.polidea.com/> | Principal Software Engineer
> > >>>
> > >>> M: +48 660 796 129 <+48660796129>
> > >>> [image: Polidea] <https://www.polidea.com/>
> > >>>
> > >>>
> > >>
> > >> --
> > >>
> > >> Jarek Potiuk
> > >> Polidea <https://www.polidea.com/> | Principal Software Engineer
> > >>
> > >> M: +48 660 796 129 <+48660796129>
> > >> [image: Polidea] <https://www.polidea.com/>
> > >>
> > >
> > >
> > > --
> > >
> > > Tomasz Urbaszek
> > > Polidea <https://www.polidea.com/> | Junior Software Engineer
> > >
> > > M: +48 505 628 493 <+48505628493>
> > > E: tomasz.urbas...@polidea.com <tomasz.urbasz...@polidea.com>
> > >
> > > Unique Tech
> > > Check out our projects! <https://www.polidea.com/our-work>
> >
> >
>
> --
>
> Jarek Potiuk
> Polidea <https://www.polidea.com/> | Principal Software Engineer
>
> M: +48 660 796 129 <+48660796129>
> [image: Polidea] <https://www.polidea.com/>
>
