I was involved in the GitHub discussion about the rename to aws_conn_id, and it prompted me to write http://mail-archives.apache.org/mod_mbox/airflow-dev/201801.mbox/%3cCABYbY7dPS8X6Z4mgbahevQwF5BnYYHXezFo=avoLBNxPzp5=b...@mail.gmail.com%3e (well, the thing Chris is quoting; for some reason I can't find my original mail in the archives).
But I never got around to writing up the mentioned proposal. :( Does anyone have any further thoughts on the discussion?

> On 30 May 2018, at 06:43, Maxime Beauchemin <maximebeauche...@gmail.com> wrote:
>
> The main reason for the conn_id prefix is to facilitate the use of
> `default_args`. Because of this you can set all your connections at the
> top of your script, and from that point on you just instantiate tasks
> without re-stating connections. It's common for people to define multiple
> "operating contexts" (production/staging/dev) where default_args and
> connections are defined conditionally based on some env vars; having
> different names for different conn_ids is key in that case.
>
> Also, as you mentioned, many operators (all data transfers) take 2
> conn_ids, which would require prefixing anyway.
>
> And yes, minor releases should never invalidate DAGs except for rare
> exceptions (say, a new feature that is still pre-release, or one that
> never worked properly in a previous release for some reason). Breaking
> changes should come with an UPDATE.md message aligned with the release.
> Pretty names don't justify breaking stuff and forcing people to grep and
> mass-replace things. If someone wants a prettier name for an argument, or
> for anything else in the "obviously public" API (DAG objects, operators,
> setting deps, ...), they should need to make the change backward
> compatible and start logging.warning() about the next major release's
> deprecation.
>
> I also think 2.0 should be as mild as possible on backward-incompatible
> changes, or come with a compatibility layer (from airflow import
> LegacyDAG as DAG). No one wants to go and mass-update tons of scripts.
>
> The same should apply to the less obviously public API (hooks, methods
> not prefixed with `_` or `__`), though I think breakage may be reasonable
> in some cases (say, a method that really should have been defined as
> private), but it should be avoided as much as possible.
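For illustration, the per-context `default_args` pattern Max describes might look like the sketch below. The env var `DEPLOY_ENV` and connection names such as `prod_mysql` are hypothetical, not taken from any real Airflow deployment:

```python
import os

# Hypothetical operating-context switch: one set of connection ids per
# environment, chosen once at the top of the DAG file.
ENV = os.environ.get("DEPLOY_ENV", "dev")  # e.g. "production", "staging", "dev"

default_args = {
    "owner": "data-eng",
    "mysql_conn_id": f"{ENV}_mysql",  # hypothetical connection names
    "aws_conn_id": f"{ENV}_aws",
}

# Every task in the DAG then inherits the right connections without
# restating them, e.g. (not executed here):
#   dag = DAG("example", default_args=default_args, ...)
#   some_transfer_task = MySqlToS3Operator(task_id="export", dag=dag)
```

Because operators only pick up `default_args` keys matching their own parameter names, distinct names like `mysql_conn_id` and `aws_conn_id` let one dict feed many operator types, which is the point Max is making.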
>
> *Committers*, let's be vigilant about this. Breaking backward
> compatibility is a big deal. An important part of code review is
> identifying backward-incompatible changes.
>
> Max
>
> On Tue, May 29, 2018 at 6:19 PM Daniel (Daniel Lamblin) [BDP - Seoul]
> <lamb...@coupang.com> wrote:
>
>> The short of this email is: can we please name the connection-id
>> parameters of all hooks and operators as similarly as possible, e.g.
>> just `conn_id`?
>>
>> When we started using Airflow, I _had_ thought that minor versions
>> would be compatible for a user's DAGs, assuming no use of anything
>> marked beta or deprecated, such that v1.7.1, 1.8.0, 1.8.1, 1.8.2,
>> 1.9.0, etc. would all run DAGs from prior versions in that lineage,
>> each with possible stability and feature improvements, and each with
>> possibly more operators, hooks, executors (even), etc.
>>
>> This is (now) obviously not the case, and it's the group's choice what
>> should and should not break on a release-by-release basis. I think a
>> clear policy would be appropriate for full Apache status, though I may
>> have missed where the policy is defined. That said, a policy that
>> doesn't give stability to users' DAGs across most version changes
>> isn't, in my opinion, going to grow confidence that Airflow is
>> something you can grow with.
>>
>> Not to be overly dramatic, but currently the tiny change that
>> `S3Hook(…)` takes `aws_conn_id='any_string'` instead of
>> `s3_conn_id='still_any_string'` means that I basically have to maintain
>> a 1.8.2 setup in perpetuity, because no one (here) wants to briefly
>> code freeze before, during, and after an update so that we can update
>> all the uses and be ready to roll back the update if something else
>> breaks (also, some people access the not-actually-private `_a_key` and
>> `_s_key` and would need to switch to the still-not-private
>> `_get_credentials()[…]`).
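The backward-compatible deprecation path Max argues for (and whose absence bit Daniel) can be sketched in plain Python. This is a simplified stand-in, not the real `airflow.hooks.S3Hook`; it only illustrates the accept-old-name-and-warn pattern:

```python
import warnings


class S3Hook:
    """Stand-in hook sketching a backward-compatible keyword rename:
    accept the old `s3_conn_id`, emit a deprecation warning, and forward
    the value to the new `aws_conn_id`."""

    def __init__(self, aws_conn_id="aws_default", **kwargs):
        if "s3_conn_id" in kwargs:
            warnings.warn(
                "s3_conn_id is deprecated; use aws_conn_id instead "
                "(the old name will be removed in the next major release)",
                DeprecationWarning,
                stacklevel=2,
            )
            aws_conn_id = kwargs.pop("s3_conn_id")
        if kwargs:  # reject anything else, as a real signature would
            raise TypeError(f"unexpected arguments: {sorted(kwargs)}")
        self.aws_conn_id = aws_conn_id


# Old 1.8.x-style call sites keep working, with a warning, instead of
# failing at DAG parse time:
hook = S3Hook(s3_conn_id="my_s3")
print(hook.aws_conn_id)  # prints "my_s3"
```

Under this scheme a DAG written against the old name runs unchanged through the deprecation window, and the grep-and-mass-replace can happen on the users' schedule rather than the release's.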
>> Instead, we'll just run a second Airflow setup at 1.9.0 (in each and
>> every staged environment) and move the 8k DAGs piecemeal whenever
>> people get the spare time and inclination. I mean, we might. It looks
>> like 1.10.0 changes some config around S3 logging one more time… so
>> maybe it's better to skip it?
>>
>> I saw the couple of PRs where the project itself had to make the
>> changes to usages of the renamed field. There was momentary and passing
>> concern that users' DAGs would need to do the same. In the PRs, of the
>> options discussed (shims, supporting the deprecated name as deprecated,
>> doing a hard rename), it wasn't brought up whether the rename to
>> aws_conn_id was the best name. [Was this discussed elsewhere?]
>>
>> And so I wondered: why is there this virtual Hungarian notation on all
>> the `conn_id`s? A hook generally takes one `conn_id`, and most
>> operators take only one. In those cases, couldn't the named parameter
>> simply have been `conn_id`? When an operator needs two conn_ids,
>> couldn't it have `src_conn_id` and `dest_conn_id` instead of locking in
>> either s3 or aws, mysql or maria, dataflow or beam, etc.? Those last
>> few are hypotheticals, I believe.
>>
>> Could I propose that we plan to break absolutely all uses of
>> `{named}_conn_id`s, before or by version 2.0, with an eye toward never
>> again having to break a named parameter for the rest of 2.x's life?
>> There are probably more named parameters that should be fixed per major
>> release, and if you agree, some work should be done to identify them
>> all; this one just happens to be the one that seems most likely to
>> change often, and has recently.
>>
>> Thanks,
>> -Daniel
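Daniel's `src_conn_id`/`dest_conn_id` idea can be made concrete with a small sketch. The class and parameter names here are hypothetical illustrations of the proposal, not actual Airflow API:

```python
class TransferOperatorSketch:
    """Hypothetical transfer operator using generic connection-id
    parameter names instead of service-specific ones (no `s3_conn_id`,
    `mysql_conn_id`, etc.)."""

    def __init__(self, task_id, src_conn_id, dest_conn_id):
        self.task_id = task_id
        self.src_conn_id = src_conn_id
        self.dest_conn_id = dest_conn_id


# Swapping the backing service (say MySQL -> MariaDB, or S3 -> GCS) then
# changes only the connection *value*, never the keyword at every call site:
op = TransferOperatorSketch(
    task_id="export_orders",
    src_conn_id="prod_mysql",   # hypothetical connection names
    dest_conn_id="prod_s3",
)
```

The trade-off against Max's point is that generic names collide in `default_args`: `src_conn_id` would mean something different to every transfer operator, whereas `mysql_conn_id` unambiguously targets one service across the whole dict.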