I like the idea. I had already raised an issue proposing that we refactor all the Google Cloud operators together and make them consistent at the same time, so a separate repo would be a good fit here. It would also let you manage your own dependencies. It would be cool if the same thing happened for the AWS operators.
On Sat, Feb 4, 2017 at 7:46 PM Jeremiah Lowin <jlo...@apache.org> wrote:

> Max made some great points on my dataflow PR and I wanted to continue the
> conversation here to make sure the conversation was visible to all.
>
> While I think my dataflow implementation contains the basic requirements
> for any more complicated extension (but that conversation can wait!), I had
> to implement it by adding some very specific "dataflow-only" code to core
> Operator logic. In retrospect, that makes me pause (as, I believe, it did
> for Max).
>
> After thinking for a few days, what I really want to do is propose a very
> small change to core Airflow: change BaseOperator.post_execute(context) to
> BaseOperator.post_execute(result, context). I think the pre_execute and
> post_execute hooks have generally been an afterthought, but with that
> change (which, I think, is reasonable in and of itself) I could implement
> entirely through those hooks.
>
> So that brings me to my next point: if the hook is changed, I could happily
> drop a reworked dataflow implementation into contrib, rather than core.
> That would alleviate some of the pressure for Airflow to officially decide
> whether it's the right implementation or not (it is! :) ). I feel like that
> would be the optimal situation at the moment.
>
> And that brings me to my next point: the future of "contrib" and the
> Airflow community.
>
> Having contrib in the core Airflow repo has some advantages:
> - standardized access
> - centralized repository for PRs
> - at least a style review (if not unit tests) from the committers
> But some big disadvantages as well:
> - Very complicated dependency management [presumably, most contrib
>   operators need to add an extras_require entry for their specific
>   dependencies]
> - No sense of ownership or even an easy way to raise issues (due to
>   friction of opening JIRA tickets vs github issues)
>
> One thought is to move the contrib directory to its own repo which would
> keep the advantages but remove the disadvantages from core Airflow. Another
> is to encourage individual airflow repos (Airflow-Docker, Airflow-Dataflow,
> Airflow-YourExtensionHere) which could be installed a la carte. That would
> leave maintenance up to the original author, but could lead to some
> fracturing in the community as discovery becomes difficult.
>
--
_/ _/ Alex Van Boxel
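
For reference, the hook change proposed in the quoted message, from post_execute(context) to post_execute(result, context), could look roughly like the sketch below. This is only an illustration under stated assumptions: the BaseOperator here is a minimal stand-in and not Airflow's real class, and run_task and RecordingOperator are hypothetical names invented for the example.

```python
# Minimal stand-in for an operator base class; NOT Airflow's actual BaseOperator.
class BaseOperator:
    def pre_execute(self, context):
        """Hook called before execute(); does nothing by default."""

    def execute(self, context):
        raise NotImplementedError

    def post_execute(self, result, context):
        """Proposed signature: the hook also receives execute()'s return value."""


def run_task(operator, context):
    # Hypothetical runner showing the order in which the hooks would fire.
    operator.pre_execute(context)
    result = operator.execute(context)
    operator.post_execute(result, context)
    return result


class RecordingOperator(BaseOperator):
    # Hypothetical extension that works purely through the hooks,
    # without touching the core runner logic.
    def execute(self, context):
        return {"rows": 3}

    def post_execute(self, result, context):
        # With the result passed in, the hook can act on it directly.
        context["last_result"] = result


if __name__ == "__main__":
    ctx = {}
    run_task(RecordingOperator(), ctx)
    print(ctx["last_result"])
```

The point of the design is that an extension such as the dataflow implementation could react to a task's output from inside post_execute, so it could live outside core (in contrib or its own repo).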