Ash is totally right - that's exactly the difficulty we face. Airflow is
both a library and an end product, and this makes the usual advice (pin if
you are an end product, don't pin if you are a library) not really useful.
From the very beginning of my adventures with Airflow I was for pinning
everything (and using Dependabot or similar - I use it for other projects),
but over time I realised that this is a very short-sighted approach: it
does not take the "library" point of view into account. Depending on which
kind of Airflow user you are, you have contradicting requirements. If you
just want to use Airflow as an "end product", you want pinning. If you
want to develop your own operators or extend existing ones, you use
Airflow as a "library" and you do not want pinning.

I also proposed at the beginning of this thread that we split the
requirements into core ones (and pin them) and non-core ones (and don't
pin them). But unfortunately it is not easy to separate those two sets in
a clean way.

That's why the idea of choosing at installation time (and not at build
time) whether you want to install "loose" or "frozen" dependencies is so
appealing. Possibly the best solution would be that 'pip install airflow'
gives you the pinned versions, with some other way to get the loose ones.
But I think we are somewhat at the mercy of pip here - this does not seem
to be possible.

So it looks like using extras to add a "pinning" mechanism is the next
best idea.
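
For illustration, a rough sketch of how the extras-based mechanism could
look (the extra name "pinned" and the versions here are my assumptions,
nothing decided):

    # setup.py - loose requirements by default, frozen ones behind an extra:
    from setuptools import setup

    setup(
        name="apache-airflow",
        install_requires=[
            "jinja2>=2.10",  # loose spec - the "library" default
        ],
        extras_require={
            # "pip install apache-airflow[pinned]" would narrow each loose
            # spec down to the known-good version:
            "pinned": [
                "jinja2==2.10.1",
            ],
        },
    )

That way 'pip install airflow' stays library-friendly, while the
"[pinned]" variant would give you the frozen "end product" set.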

I am not afraid of the complexity. We can fully automate generating those
pinned requirements. I already have some ideas for how we can keep those
requirements in sync while developing and how they can end up frozen in a
release. I would like to run a POC on that, but in short it is another
"by-product" of the CI image we have now. Our CI image is the perfect
source of frozen requirements - we know the requirements in the CI image
are OK, and we can use them to generate the standard "requirements.txt"
file, keep it updated via a local update script (and pre-commit hooks),
and verify in CI that it stays up to date. We can then write a custom
setup.py that takes that requirements.txt and the existing "extras" and
generates "pinned" extras automatically. That sounds fully doable, with
very limited maintenance effort.
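
For the POC, the setup.py side could be as simple as something like this
(the file location and parsing details are assumptions on my side):

    # Sketch: derive the "pinned" extra from the CI-generated
    # requirements.txt instead of maintaining it by hand.
    from pathlib import Path
    from setuptools import setup

    def pinned_requirements():
        """Return the 'pkg==x.y.z' lines frozen out of the CI image."""
        text = Path("requirements.txt").read_text()
        return [line.strip() for line in text.splitlines()
                if line.strip() and not line.lstrip().startswith("#")]

    setup(
        name="apache-airflow",
        install_requires=["jinja2>=2.10"],  # usual loose specs (illustrative)
        extras_require={"pinned": pinned_requirements()},
    )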

J.

On Thu, Aug 1, 2019 at 10:45 PM Qingping Hou <[email protected]> wrote:

> On Thu, Aug 1, 2019 at 1:33 PM Chen Tong <[email protected]> wrote:
> > It is sometimes hard to distinguish whether something is a library or
> > an application. Take an operator as an example: a non-technical person
> > may think of it as a well-built application, while an engineer may
> > consider it a library and extend its functionality.
> >
>
> Yeah, I agree. Personally, I would consider operators to be libraries,
> due to the expectation that other people will import them into their own
> projects/source trees.
>
> For things like REST endpoint handlers and perhaps the scheduler, it
> seems safe to assume all changes and improvements will happen within the
> Airflow source tree. In that case, it's safe to classify that part of
> the code as application and freeze all its dependencies. The current
> plugin system might make this slightly complicated, because people can
> still extend the core with custom code. But then again, in an ideal
> world, plugins should be self-contained and communicate with the core
> through a well-defined interface ;)
>


-- 

Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer

M: +48 660 796 129
