The downside of keeping everything in one gigantic codebase is that it also becomes a monster in terms of dependencies and testing - something that Airflow is already experiencing issues with. This is exactly also why I initially had troubles contributing to Airflow, as tests are near impossible to run (and the CI also takes ages) + any hooks I tried to refactor had crappy sensor/operator classes in contrib that relied on internal workings of the hook. (The HdfsHook + Sensor classes are a clear example of this).
My experience with contrib is terrible, as the last time I used something from contrib it broke on the very first example I tried. (Not the first time this happened, mind you.) Things that do work are often poorly designed in terms of the implementation and/or API, of which I would argue the HDFS sensor classes are a good example. Based on these experiences, I tried developing a set of operators/hooks that we developed for a client as a separate package and released this work recently on PyPI: https://github.com/jrderuiter/airflow-fs. From the development experience, it was much easier to develop something like this separately than trying to get this integrated into Airflow (just look at the relevant PRs to see the discussions that resulted from trying to get it into Airflow core). With maybe a bit more thought on how to organise work such as this within the larger community, I think releasing operators/hooks separately in coherent, functionally-oriented packages would have many benefits, including: - easier development - more flexibility to use different approaches/interaces for the task at hand - the opportunity to release new operators etc separate from the main Airflow release cycle Best, Julian On 14 Apr 2019, at 21:48, Austin Bennett <[email protected]<mailto:[email protected]>> wrote: It seems naming of contrib is used in practice with what I thought it would be (as a novice), but incubator not bad either -- yes, it gets messy if anything can get contributed. If using for that purpose, maybe there is some sort of timeframe for graduation or it winds up being removed? Indeed, contrib is a place with hooks/operators/etc, that might work well enough, but not wholly 'graduated'. I have wondered what to do about the promotion as was already mentioned, in terms of changing imports of dags. But I guess that get's fixed pretty easily when imports break (and can have contrib point to the graduated ones in certain places). Things being part of the airflow repo is nice and shows that the community overall supports. Very much in favor of that approach, rather than wondering whether some other pypi or other code is sufficient. If I am using Airflow, I then trust that community and what winds up in that codebase ( and just one dev list to follow to stay up to date ). On Sun, Apr 14, 2019 at 12:21 PM Driesprong, Fokko <[email protected]<mailto:[email protected]>> wrote: I'm in favor of removing the Contrib folder. It doesn't really add value in my opinion, and moving the hooks/operators will break the import. While DAG'ing I always have to look up if the operator is in contrib or not. Also, I think we should keep the operators and hooks part of the Airflow package. Having this separately will make the testing of the hooks/operators much more complicated. That being said, I do think we need to have more people on the project that "own" certain operators. Maybe keep a list of the authors as well (or reintroduce the Mention-bot https://github.com/facebookarchive/mention-bot ? Loved that bot). Cheers, Fokko Op za 13 apr. 2019 om 12:36 schreef Jarek Potiuk <[email protected]<mailto:[email protected]> : I think there are quite a few contrib parts that are at least on-par with regards to code quality, testing and especially documentation. And yes, among those are GCP operators we developed which are not only unit but also system-tested and we put quite an effort into making documentation really useful and well structured ;). I'd rather move those "graduated" operators/hooks to core and maybe rename the "contrib" folder to "incubator" or something like that to indicate that those operators are not yet "core-quality" but aspire to become one. That would make a nice "intro" task for new contributors - to improve one of the incubating operators to become "core-ready". I am quite sceptical myself about AIP-8 and separating out the hooks and operators. There were already few discussions about that, but splitting the operators out might be quite difficult and it will only be possible if there is some way to quickly test compatibility of those split operators with various versions of Airflow and set of dependent requirements.. Otherwise it will very quickly become a mess - nobody will know which version of Airflow is needed to run which operators and there will be problems if someone will try to run different operators with different requirements in the same DAG (and different versions of airflow). Until we manage to isolate operators within the same DAG to potentially use different dependencies, this is straight road to dependency-hell. One solution to that that I have in mind for some time (but this is very long term) might be to make Airflow Docker-native and run every operator within it's own separate Docker instance with it's own dependencies. That would be quite possible to do (we would need to split operators into very light "proxy" (basically current __init__() - the part that is executed within DAG scanning) and heavy "execute" parts (where the operator's execute-related methods would be run in separate Docker on workers). J. On Sat, Apr 13, 2019 at 11:59 AM Felix Uellendall < [email protected]<mailto:[email protected]> wrote: +1 on deprecating the contrib folder. Bolk de Bruin the reason the core hooks and operators are properly tested because, for example I added some more tests to it and I am "only" a contributor. So do you really want to split up contributors work and core committers work? I personally think this is not the right way to go. It is true that the contrib hooks and operators have not the same level of code quality but we can do something about it. I am trying to improve our test coverage overall and add missing tests. I don't think an extra step is needed here where we first move properly tested ones into the core package and then moving new ones from time to time into it. Wouldn't that mean that we think the code quality of "contrib" (contributor) in general is worse than the code quality of committers? Every new contributor who comes along this project would think that, wouldn't he? -feluelle Am 13/04/2019 um 07:51 schrieb Beau Barker: A separate airflow-contrib repo, on a separate release cadence would be my preference. On 12 Apr 2019, at 11:17 pm, Julian De Ruiter < [email protected]> wrote: Isn’t this in contradiction with AIP-8, which is aimed at removing operators/hooks from the core Airflow package? Personally I would rather remove hooks/operators from Airflow than add even more to the Airflow core. This counts double for the contrib stuff, which is often poorly designed and/or tested. Best, Julian On 12 Apr 2019, at 10:23, Bolke de Bruin <[email protected]> wrote: That’s perfectly fine to me. Verstuurd vanaf mijn iPad Op 12 apr. 2019 om 10:20 heeft Kaxil Naik <[email protected]> het volgende geschreven: Ok. How about moving the properly tested and maintained hooks/ops from contrib to core? On Fri, Apr 12, 2019, 09:13 Bolke de Bruin <[email protected]> wrote: I disagree. Core signals “properly tested” and maintained. Ie. A kind of quality. I don’t think contrib has that. Verstuurd vanaf mijn iPad Op 12 apr. 2019 om 10:03 heeft Kaxil Naik <[email protected]> het volgende geschreven: Contrib folder was used when it was used at Airbnb. Currently, it doesn't make any sense and we have equal responsibility to maintain all the hooks, operators, sensors in contrib folder as we do for core. I would suggest to remove contrib folder and move all hooks, ops, and sensors to the core folder. Or reorganize the folder structure similar to what was discussed in a mailing thread few months ago. Regards, Kaxil -- Jarek Potiuk Polidea <https://www.polidea.com/> | Principal Software Engineer M: +48 660 796 129 <+48660796129> E: [email protected]
