Thanks Bas for looking at it. Much appreciated !

>
> (1) First general comment is the split in microsoft providers seems
unnecessary to me:
>
>   *
>   *   apache-airflow-backport-providers-microsoft-azure==2020.5.20rc2
>   *   apache-airflow-backport-providers-microsoft-mssql==2020.5.20rc2
>   *   apache-airflow-backport-providers-microsoft-winrm==2020.5.20rc2
> Amazon and Google have just 1 single providers package. Is this
intentional?

Initially, we had separate packages for Google cloud but we decided to join
them as they have a lot of common stuff and having them separated would
mean that they heavily depended on each other anyway,
For Amazon, there is only one "aws" sub-package so we could argue if it
should be an "amazon" or "amazon-aws" one. Both would be fine. For
Microsoft, it is a bit different.
Those three packages do not depend on each other at all. Even when you look
at their requirements you have this:

PROVIDERS_REQUIREMENTS: Dict[str, Iterable[str]] = {
    "amazon": amazon,
    ...
    "google": google,
    ...
    "microsoft.azure": azure,
    "microsoft.mssql": mssql,
    "microsoft.winrm": winrm,
    ...
}

And looking at the READMEs - "microsoft.azure" cross-depends on "oracle"
provider, "Microsoft.mssql" cross-depends on "odbc" one, and "winrm" does
not cross-depend on any other provider package. So it made perfect sense to
leave them as separate packages. One might argue, it might reflect the
"philosophical" difference between those companies. I believe Google And
Amazon - have a lot of cross-dependencies in their own products - even
different ones and often have a lot of common code/interfaces and whenever
Google or Amazon buys something they closely integrate it in their product
offering. With Microsoft it's a bit different - Azure and Office And MSSQL
and WINRM have very little to do with each other on a technology level -
they are not that well integrated (well Azure runs mostly on Linux, not
Windows :) ).

So for me, such a distinction makes sense. But I am happy to hear
other's opinions.

>
> (2) Second, the naming convention is now consistent which is great! Nit:
airflow.providers.google.suite.operators.sheets.GoogleSheetsCreateSpreadsheet
is an operator but doesn’t end with “Operator”.

Aaargh! Thanks! - That's a bummer :). I think that on its own is a reason
to cancel rc2 :D:D:D:D:D:D

>
> (3) Third, I wanted to validate at least importing all
hooks/sensors/operators/etc. works correctly. Therefore I wrote a test
script, which iterates over all providers and per backport package:
>
>   1.  Runs a Python 3.7 Docker container
>   2.  Does a fresh install of Airflow 1.10.10
>   3.  Does a fresh install of the backport package
>   4.  And tries importing each class (e.g. “from
airflow.providers.amazon.aws.operators.ecs import ECSOperator”)

Indeed that's a great idea and something that ACTUALLY can make us go for
rc3. I already do import all provider classes for 2.0. I am importing all
operators/hooks/classes/secrets in order to generate the
README's automatically (so I have the robust and tested script for that):
https://github.com/apache/airflow/blob/master/backport_packages/setup_backport_packages.py#L1070)
- the method finds and imports all
operators/hooks/sensors/secrets/protocols from providers. And with all the
CI/Breeze automation we can very easily plug this in the CI framework of
ours. I also already run automatically (in CI)  test the installation of
all packages one-by-one using 1.10.10 version of Airflow. You can see the
result here (for example):
https://github.com/apache/airflow/runs/701914153?check_suite_focus=true#step:6:9384)
but indeed we should combine the two - installation on 1.10 and importing!

> The script is not 100% fool-proof and wildly inefficient, but should give
us the bulk of the import errors. I did not test for cross-provider
installations. Also uncertain if all import errors are related to the
backports packages, or just fail in general, but we should at least check
them out.

Agreed. I will automate and review them all now.

Others? I think the last one is a valid reason to cancel rc2 and make rc3.
What do you think?

J.
-- 

Jarek Potiuk
Polidea | Principal Software Engineer

M: +48 660 796 129

Reply via email to