Update: it seems that we won't need the -pinned version after all. I
realized that we need slightly different requirements for different
Python versions.

I just opened a PR for that: https://github.com/apache/airflow/pull/7841

I also found out (while working on the production image) that we can install
Airflow predictably in a very simple way (once we release the requirements
in 1.10.10):

pip install apache-airflow[gcp]==1.10.10 --constraint \
  https://raw.githubusercontent.com/apache/airflow/1.10.10/requirements/requirements-python3.7.txt

I think this is simple enough to be used as an installation method. I added
it to the documentation, and I think I am OK with dropping the -pinned
package altogether.
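
For reference, the same command can be assembled for whichever Python
version is in use. This is only a sketch: it assumes the requirements files
for other Python versions follow the same requirements-python<major>.<minor>.txt
naming as the 3.7 file above, and constraint_url / pip_install_args are
hypothetical helper names, not part of Airflow or pip.

```python
import sys

BASE_URL = "https://raw.githubusercontent.com/apache/airflow"


def constraint_url(airflow_version, python_version=None):
    """Build the URL of the requirements file used as a pip constraint.

    Assumption: each supported Python version has a file named
    requirements-python<major>.<minor>.txt under the release tag.
    """
    if python_version is None:
        # Default to the interpreter running this script.
        python_version = "{}.{}".format(*sys.version_info[:2])
    return "{}/{}/requirements/requirements-python{}.txt".format(
        BASE_URL, airflow_version, python_version
    )


def pip_install_args(airflow_version, extras=(), python_version=None):
    """Return the argument list for a constrained 'pip install'."""
    package = "apache-airflow"
    if extras:
        package += "[{}]".format(",".join(extras))
    return [
        sys.executable, "-m", "pip", "install",
        "{}=={}".format(package, airflow_version),
        "--constraint", constraint_url(airflow_version, python_version),
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so nothing gets installed.
    print(" ".join(pip_install_args("1.10.10", extras=["gcp"], python_version="3.7")))
```

Passing the resulting list to subprocess.check_call would run the
installation; printing it first makes it easy to check the URL by hand.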

J.


On Sun, Mar 22, 2020 at 10:15 AM Jarek Potiuk <jarek.pot...@polidea.com>
wrote:

> Yesterday we had another master breakage - this time from elasticsearch
> releasing MINOR version 7.6, which broke our builds (note that it was a
> MINOR version, so it should be compatible ... it was not for us). I fixed
> it quickly yesterday by limiting it to < 7.6, but for me it is quite clear
> that trying to rely on SemVer being followed by others is a futile effort
> (at least in Python's world).
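
To make the point concrete, here is a small sketch comparing a SemVer-style
upper bound with the "< 7.6" limit. It uses a deliberately simplified
comparison of plain numeric versions (not the full PEP 440 rules that pip
applies), the helper names are hypothetical, and "7.6.0" and "< 8.0" are
illustrative values rather than the exact release or specifier involved.

```python
def parse(version):
    """Turn '7.6.0' into (7, 6, 0). Handles plain numeric versions only."""
    return tuple(int(part) for part in version.split("."))


def below(version, upper_bound):
    """Return True if version < upper_bound.

    Both tuples are padded with zeros so '7.6' and '7.6.0' compare equal.
    """
    v, u = parse(version), parse(upper_bound)
    width = max(len(v), len(u))
    v += (0,) * (width - len(v))
    u += (0,) * (width - len(u))
    return v < u


if __name__ == "__main__":
    released = "7.6.0"  # illustrative version string for the breaking release
    # A cap at the next major version still lets the breaking release in.
    print("allowed by < 8.0:", below(released, "8.0"))
    # The limit applied as the fix keeps it out.
    print("allowed by < 7.6:", below(released, "7.6"))
```

A cap at the next major version only helps when the breaking change arrives
in a major release, which is exactly what did not happen here.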
>
> The theory is nice, but it breaks in practice. And it's not really the
> fault of the library maintainers. Sometimes it's simply not easy to see how
> your APIs are used - and in Python, you cannot prevent people from using
> something that you consider an internal detail. This is what happened in
> the elasticsearch case yesterday - apparently, our plugin was unknowingly
> using an "internal" API, and some parameters of that API were dropped
> during a refactoring of the elasticsearch library.
>
> My observation (it's anecdotal though) is that the COVID-19 situation has
> given people more time, fewer distractions and fewer things to do, so OSS
> packages have been released more frequently recently, and we should
> protect ourselves a bit against more frequent breakages.
>
> I think the lessons from yesterday are:
>
> * we should merge the requirements.txt solution quickly to prevent further
> breakages (I am reading and testing it now) - I think everyone agrees it's
> good to have it
> * I think we can continue discussing whether apache-airflow-pinned package
> should be released or not. I can leave the code building the package but we
> can decide about it after some more discussion
>
> Does it sound good?
>
> J
>
>
>
> On Fri, Mar 20, 2020 at 2:47 PM Jarek Potiuk <jarek.pot...@polidea.com>
> wrote:
>
>> And I rebased it just now and fixed the automated requirements update.
>>
>> On Fri, Mar 20, 2020 at 2:28 PM Jarek Potiuk <jarek.pot...@polidea.com>
>> wrote:
>>
>>> Ah BTW. I just noticed that for some reason I pasted an old PR earlier
>>> in the thread :(.
>>> This is the one with requirements.txt I am talking about:
>>> https://github.com/apache/airflow/pull/7730
>>>
>>> On Fri, Mar 20, 2020 at 2:26 PM Jarek Potiuk <jarek.pot...@polidea.com>
>>> wrote:
>>>
>>>> Nope. Not blocking. I can work with my branch; requirements.txt alone
>>>> is enough for that :)
>>>>
>>>> I think the problem with semver is that it is loosely followed - we had
>>>> a number of breakages in the past with minor version upgrades :(.
>>>>
>>>> J.
>>>>
>>>>
>>>>
>>>> On Fri, Mar 20, 2020 at 1:27 PM Kaxil Naik <kaxiln...@gmail.com> wrote:
>>>>
>>>>> Thanks for the detailed explanation Jarek.
>>>>>
>>>>> How about we have an upper limit for all our dependencies? For
>>>>> example, instead of "google-cloud-storage>=1.16", we would have
>>>>> "google-cloud-storage>=1.16,<2.0".
>>>>>
>>>>> If a dependency breaks compatibility in minor versions, we can't do
>>>>> anything about it but if they follow SemVer, we should be safe and the
>>>>> first-time installers would have a non-breaking package. WDYT?
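
As a sketch of what this proposal would mean mechanically, the upper limit
can be derived from the major version of the existing lower bound.
add_major_cap is a hypothetical helper, not existing Airflow code, and it
only handles the simple "name>=X.Y" form shown above.

```python
import re


def add_major_cap(requirement):
    """Append '<(major+1).0' to a requirement like 'name>=1.16'.

    Only the simple 'name>=X.Y...' form is handled; anything else
    (already capped, pinned with ==, extras, markers) is returned unchanged.
    """
    match = re.fullmatch(r"[A-Za-z0-9._-]+>=(\d+)(?:\.\d+)*", requirement)
    if match is None:
        return requirement
    major = int(match.group(1))
    return "{},<{}.0".format(requirement, major + 1)


if __name__ == "__main__":
    print(add_major_cap("google-cloud-storage>=1.16"))
```

This protects only against releases that bump the major version; a breaking
change shipped in a minor release would still pass the cap.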
>>>>>
>>>>> Btw I hope this is not blocking you in building a production image as I
>>>>> think requirements.txt is solving that? Please let me know if it is
>>>>> blocking.
>>>>>
>>>>> PS: I am also just dumping my ideas to solve this issue. Love to hear
>>>>> what
>>>>> others think too.
>>>>>
>>>>> Regards,
>>>>> Kaxil
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Mar 19, 2020 at 2:43 PM Jarek Potiuk <jarek.pot...@polidea.com
>>>>> >
>>>>> wrote:
>>>>>
>>>>> > I think we have a similar understanding. But let me just clarify,
>>>>> > because I think we are trying to solve two different problems.
>>>>> > My proposal is not solving all problems with dependencies - quite the
>>>>> > contrary, I want to solve just one specific "repeatability" problem -
>>>>> > read on :).
>>>>> >
>>>>> >    1. A potential source of confusion: using "-pinned" for
>>>>> installation but
>>>>> > >    using "non-pinned" for DAG development.
>>>>> > >
>>>>> >
>>>>> > This could be confusing indeed - but they are the same in fact -
>>>>> > just deps might be different over time.
>>>>> >
>>>>> >    2. Most of the users would still try to install "apache-airflow"
>>>>> package
>>>>> > >    that might have been broken for example because of a dependency
>>>>> > release,
>>>>> > >    either way, we would still have to suggest them to use "pinned"
>>>>> > version
>>>>> > >
>>>>> >
>>>>> > True.  I thought we might describe it in the README and make it
>>>>> prominently
>>>>> > explained. Usually people look at the readme in PyPI when they are
>>>>> > installing
>>>>> > stuff and it does not work: https://pypi.org/project/apache-airflow/
>>>>> .
>>>>> >
>>>>> > Also - we could of course explain how to use requirements.txt from
>>>>> the
>>>>> > released
>>>>> > version when they are installing it. That would be an extra friction
>>>>> point
>>>>> > though
>>>>> > and maybe having "always installable" version of airflow is a better
>>>>> > choice.
>>>>> >
>>>>> >    3. If they install "pinned" version, it is no longer a library
>>>>> again,
>>>>> > >    that is users won't be able to use new NumPy release or
>>>>> matplotlib for
>>>>> > >    example. In which case we are just circling back to the same
>>>>> problem,
>>>>> > >    "either we risk broken package" while releasing or we risk
>>>>> potentially
>>>>> > >    incompatible versions.
>>>>> > >
>>>>> >
>>>>> > Yep. But maybe it's just a question of naming. Maybe even we could
>>>>> name
>>>>> > this package differently to indicate that this version is a way to
>>>>> quickly
>>>>> > install
>>>>> >  airflow but not to do any serious development with it.
>>>>> >
>>>>> > So speaking about THE problem I want to solve with the
>>>>> > requirements.txt and apache-airflow-pinned package:
>>>>> >
>>>>> > I really only want to solve "first-time-user" experience here -
>>>>> nothing
>>>>> > more. I
>>>>> > definitely do not want to replace the current installation method for
>>>>> > experienced
>>>>> > users - for them using --constraint requirements.txt is exactly what
>>>>> they
>>>>> > need.
>>>>> > The only problem I am trying to solve with that is "repeatability" of
>>>>> > installation.
>>>>> >
>>>>> > Maybe "apache-airflow-quickinstall" or something like that would be
>>>>> better
>>>>> > than "apache-airflow-pinned" or "apache-airflow-repeatable-install"
>>>>> or
>>>>> > something like that. I think about it as a "flavour" of airflow
>>>>> rather than
>>>>> > anything else. I even originally implemented it as [pinned] extra
>>>>> where I
>>>>> > pinned all requirements. Unfortunately I found that if you have
>>>>> > main requirement without limits, adding the same requirement as
>>>>> extra with
>>>>> > == does not make it pinned :(.  That was my original plan.
>>>>> >
>>>>> >
>>>>> > > Btw I have been on "we should have pinned dependency" camp as
>>>>> Airflow
>>>>> > > should definitely install without breaking since day-1 but I think
>>>>> a
>>>>> > > separate "-pinned" package won't solve that issue.
>>>>> > >
>>>>> >
>>>>> > Ah yeah we went the same route. I do not think we can solve the
>>>>> > "library vs. app" problem easily. This is a bit of
>>>>> "eat-and-have-cake"
>>>>> > at the same time. I know people have problems
>>>>> > with conflicting dependencies when they are trying to install
>>>>> libraries
>>>>> > with different requirements. And I am not even trying to solve that
>>>>> > problem now. Not even close. This requires some other solution
>>>>> > (for example separate virtualenvs with different dependencies
>>>>> > built from wheels on a per-task basis). But that's something much
>>>>> further
>>>>> > in the future (if at all).
>>>>> >
>>>>> >
>>>>> > >
>>>>> > > WDYT? Also please do let me know if I have misunderstood something
>>>>> > > (definitely possible :D).
>>>>> > >
>>>>> > > Regards,
>>>>> > > Kaxil
>>>>> >
>>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> Jarek Potiuk
>>>> Polidea <https://www.polidea.com/> | Principal Software Engineer
>>>>
>>>> M: +48 660 796 129 <+48660796129>
>>>> [image: Polidea] <https://www.polidea.com/>
>>>>
>>>>
>>>
>>>
>>>
>>
>>
>>
>
>
>

-- 

Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer

M: +48 660 796 129 <+48660796129>
[image: Polidea] <https://www.polidea.com/>
