They are, but we _can_ make a choice to remove that feature -- it is not
widely used and is confusing to many when they stumble on it.  

It's not something we should do lightly, but it is a possibility.

I think I'm probably leaning towards the "ordinal" concept:

/connections/hdfs_default -> list of connections with that ID
/connections/hdfs_default/0 first connection of that type

Something like that.

On Apr 9 2020, at 2:31 pm, Shaw, Damian P.
<damian.sha...@credit-suisse.com> wrote:

> FYI if you look back at the thread "Re: [2.0 spring cleaning] Require
> unique conn_id" on 2019-04-14 you can see a message from Kevin Yang
> stating that this random choice of connections is a "feature" used to
> load balance connections in AirBnB. So users are relying on this behavior.
>  
>  
> -----Original Message-----
> From: Daniel (Daniel Lamblin ) [Data Infrastructure]
> <lamb...@coupang.com>  
> Sent: Wednesday, April 8, 2020 20:01
> To: dev@airflow.apache.org
> Subject: Re: API spec questions
>  
> Having been bit by accidentally having two connections by the same
> name or conn_id, I'd prefer if were made unique. In my experience
> there's little utility in having multiple connections by the same
> name. Tasks that use a connection do to fairly randomly choose one,
> rather they seem pretty consistent in using the one created earliest,
> which often has the lower id integer.
>  
> Circling back to how this is used by the API, from a user perspective,
> the following in path integer fields were ones I'd have expected to be
> strings instead:
> ConnectionID 
> https://github.com/apache/airflow/blob/11f1e0cad996d5596e3e4fb440eb4ec52c024f70/openapi.yaml#L1845
> DAGID 
> https://github.com/apache/airflow/blob/11f1e0cad996d5596e3e4fb440eb4ec52c024f70/openapi.yaml#L1853
> PoolID 
> https://github.com/apache/airflow/blob/11f1e0cad996d5596e3e4fb440eb4ec52c024f70/openapi.yaml#L1901
>  
> Though it’s a url-encoding hassle, I also expected that DAGRunID would
> be more like the Run ID E.G.
> "scheduled__2020-04-08T23:10:00+00:00",
> "manual__2020-04-08T23:00:56+00:00",  
> "manual__2020-04-08T16:24:56.692868+00:00" ,
> "backfill_2020-04-08T22:05:00+00:00" etc
>  
> Then TaskID is confusing to me; AFAIK the PK to task instances are
> task_id, dag_id, and execution_date and the api call appears to align
> with that having the pattern:
> /dags/{dag_id}/taskInstances/{task_id}/{execution_date}:
> But if task_id is a numbered id, then… execution_date isn't even
> needed… I'm thinking it should have been a string.
>  
> An aside to this, I've always wondered what happens if an externally
> triggered DAG Run has the same execution date as a pre-existing
> scheduled DAG Run. They'd have different run_ids, EG
> "scheduled__2020-04-08T23:10:00+00:00" vs
> "manual__2020-04-08T23:10:00+00:00" but then task instances for those
> runs might not be unique.
>  
> Lastly, the UI and CLI operation of clearing tasks seems analogous to
> the delete task instance API end point. But probably it's not, and
> this could become confusing to users.
> There should be a non-db-model api call for clearing tasks like you
> can from the UI and the CLI. If I read it right, clearing does the
> following: it sets the state of the task instance to None unless the
> state was Running then instead it sets it and related job ids to
> Shutdown. It deletes reschedules of the TI and it sets the dag runs
> for those task instances back to Running.
> This would be a lot to do for a user using PATCH calls to change Task
> Instances and Dag Runs together (and there's no API for Jobs).
>  
> -Daniel L.
> P.S. as far as renaming all parameters on operators and hooks with
> *_conn_id, I do not want to see that breaking change happen. But IF it
> HAS TO, I'm still of the opinion that the default_args use for
> X_conn_id is not preferable to having all operators take simpler
> source_conn_id and optional target_conn_id parameter names that are
> consistent across all operators.
>  
> On 4/8/20, 9:47 AM, "Ash Berlin-Taylor" <a...@apache.org> wrote:
>  
>    [Warning]: This email originated from an external source. Do not
> open links or attachments unless you know the content is safe.
>    [경고]: 본 이메일은 회사 외부에서 유입되었습니다. 내용이 안전한지 확인하기 전까지는 링크나 첨부파일을 열지 마십시오.
>     
>     
>    To expand on the "so I think we need to do one of":
>     
>     
>    - we need to update the name of "conn_id" somehow. I can't think of a
>    better option, and given all the operators have "x_conn_id" I don't
>    think that change should be made lightly.
>     
>    - make conn_id unique (this "poor" HA has been a source of
> confusion in
>    the past) and the ID we use in the API
>     
>    Or a third option:
>     
>    - Have the API take conn_id and either return all conns for that
>    conn_id, or conn_id plus an ordinal (?idx=0 type thing) to return a
>    single value.
>     
>    On Apr 8 2020, at 5:42 pm, Ash Berlin-Taylor <a...@apache.org> wrote:
>     
>    > Hi everyone,
>    >
>    > So as I mentioned in the AIP voting thread, I think we need to give
>    > some more thought to how we are exposing connection ids in the API.
>    >
>    > Right now as proposed (and merged without approval, not cool. The AIP
>    > we voted on did not contain a PR against apache/airflow.) it has an
>    > end point of `/connections/{connection_id} `  
>    >
>    > My issue here is as I said in the previous thread: that is going
> to be
>    > mightly confusing to our users because there is a "conn_id" concept
>    > that is exposed, so people are going to try putting "aws_default" etc
>    > in there. I don't care what the API spec says, I care what our users
>    > expect, and having a connection_id/id and a conn_id fields is
> just confusing
>    >
>    > So I think we need to do one of:
>    > - we need to update the name of "conn_id" somehow, make conn_id unique
>    > (this "poor" HA has been a source of confusion in the past)
>    >
>    > There are similar problems for the DAG run -- the spec has the
> type as
>    > an integer, but everything else about the Airflow system uses the
>    > (unique) run_id parameter, and I would expect the API to use
> that. The
>    > autoinc. id on the run column is literally never used in the code
>    > base, so exposing that via the API seems odd.
>    >
>    > A few other smaller points:
>    >
>    > EventLog: Those are "audit"/action logs, so we probably shouldn't let
>    > people delete them via the API!
>    >
>    > pool_id: still an integer. It should be the "name".
>    >
>    > How should add/delete variables and connections work in the API  with
>    > the addition of the new "Secrets Backends"?
>    >
>    > xcomValues: task_id is listed as an integer.
>    >
>    >
>    > -ash
>     
>  
>  
>  
>  
>
=============================================================================== 
 
> Please access the attached hyperlink for an important electronic
> communications disclaimer:  
> http://www.credit-suisse.com/legal/en/disclaimer_email_ib.html  
>
=============================================================================== 
 
>

Reply via email to