Hello,

This can cause big problems with idempotence. According to RFC 7231, the
DELETE method should be idempotent.
For example, to delete the items with indexes 1 to 4, you would send the
following requests:

DELETE /connections/hdfs_default/4
DELETE /connections/hdfs_default/3
DELETE /connections/hdfs_default/2
DELETE /connections/hdfs_default/1

However, if the user sends these requests in a different order, only the
first and third items will be deleted:

DELETE /connections/hdfs_default/1
DELETE /connections/hdfs_default/2
DELETE /connections/hdfs_default/3
DELETE /connections/hdfs_default/4

If you use asynchronous HTTP clients (as is popular in Node), the order of
requests will not be preserved. Simultaneous modifications would also be a
big problem.

I am also in favor of abandoning support for multiple connections with the
same ID. This can be solved at a different layer.

Best regards,
Kamil

On Thu, Apr 9, 2020 at 6:47 PM Ash Berlin-Taylor <a...@apache.org> wrote:
>
> They are, but we _can_ make a choice to remove that feature -- it is not
> widely used and is confusing to many when they stumble on it.
>
> It's not something we should do lightly, but it is a possibility.
>
> I think I'm probably leaning towards the "ordinal" concept:
>
> /connections/hdfs_default -> list of connections with that ID
> /connections/hdfs_default/0 -> first connection of that type
>
> Something like that.
>
> On Apr 9 2020, at 2:31 pm, Shaw, Damian P.
> <damian.sha...@credit-suisse.com> wrote:
>
> > FYI if you look back at the thread "Re: [2.0 spring cleaning] Require
> > unique conn_id" on 2019-04-14, you can see a message from Kevin Yang
> > stating that this random choice of connections is a "feature" used to
> > load-balance connections at Airbnb. So users are relying on this
> > behavior.
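Kamil's point about order-dependent deletion can be sketched in a few lines
of Python. This is not Airflow code -- just a toy model of an ordinal DELETE
endpoint over a list, showing that the same four requests produce different
results depending on order:

```python
# Toy model of DELETE /connections/<conn_id>/<ordinal> (1-based index).
# Deleting by position shifts the remaining items, so the endpoint is
# not idempotent and the request order changes the outcome.

def delete_by_ordinal(connections, ordinal):
    """Simulate one DELETE request against an ordinal index."""
    if 1 <= ordinal <= len(connections):
        del connections[ordinal - 1]
    # Out-of-range ordinals are a no-op here (a real API would 404).

# Four duplicate "hdfs_default" connections, labelled a-d.
descending = ["a", "b", "c", "d"]
for i in (4, 3, 2, 1):              # highest index first: all four deleted
    delete_by_ordinal(descending, i)

ascending = ["a", "b", "c", "d"]
for i in (1, 2, 3, 4):              # lowest index first: indices shift
    delete_by_ordinal(ascending, i)

print(descending)  # []
print(ascending)   # ['b', 'd'] -- only the 1st and 3rd items were deleted
```

With an async client that fires the four requests concurrently, any of the
interleavings above (and more) can occur.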
> >
> > -----Original Message-----
> > From: Daniel (Daniel Lamblin) [Data Infrastructure] <lamb...@coupang.com>
> > Sent: Wednesday, April 8, 2020 20:01
> > To: dev@airflow.apache.org
> > Subject: Re: API spec questions
> >
> > Having been bit by accidentally having two connections with the same
> > name or conn_id, I'd prefer it were made unique. In my experience
> > there's little utility in having multiple connections with the same
> > name. Tasks that use a connection don't seem to choose one at random;
> > rather, they seem pretty consistent in using the one created earliest,
> > which often has the lower integer id.
> >
> > Circling back to how this is used by the API, from a user perspective,
> > the following in-path integer fields were ones I'd have expected to be
> > strings instead:
> > ConnectionID
> > https://github.com/apache/airflow/blob/11f1e0cad996d5596e3e4fb440eb4ec52c024f70/openapi.yaml#L1845
> > DAGID
> > https://github.com/apache/airflow/blob/11f1e0cad996d5596e3e4fb440eb4ec52c024f70/openapi.yaml#L1853
> > PoolID
> > https://github.com/apache/airflow/blob/11f1e0cad996d5596e3e4fb440eb4ec52c024f70/openapi.yaml#L1901
> >
> > Though it's a url-encoding hassle, I also expected that DAGRunID would
> > be more like the Run ID, e.g.
> > "scheduled__2020-04-08T23:10:00+00:00",
> > "manual__2020-04-08T23:00:56+00:00",
> > "manual__2020-04-08T16:24:56.692868+00:00",
> > "backfill_2020-04-08T22:05:00+00:00" etc.
> >
> > Then TaskID is confusing to me; AFAIK the primary key of a task
> > instance is (task_id, dag_id, execution_date), and the API call
> > appears to align with that, having the pattern:
> > /dags/{dag_id}/taskInstances/{task_id}/{execution_date}
> > But if task_id is a numbered id, then execution_date isn't even
> > needed... I'm thinking it should have been a string.
> >
> > An aside to this, I've always wondered what happens if an externally
> > triggered DAG Run has the same execution date as a pre-existing
> > scheduled DAG Run.
> > They'd have different run_ids, e.g.
> > "scheduled__2020-04-08T23:10:00+00:00" vs
> > "manual__2020-04-08T23:10:00+00:00", but then task instances for those
> > runs might not be unique.
> >
> > Lastly, the UI and CLI operation of clearing tasks seems analogous to
> > the delete-task-instance API endpoint. But it probably isn't, and
> > this could become confusing to users.
> > There should be a non-db-model API call for clearing tasks like you
> > can from the UI and the CLI. If I read it right, clearing does the
> > following: it sets the state of the task instance to None, unless the
> > state was Running, in which case it sets it and the related job ids to
> > Shutdown. It deletes reschedules of the TI, and it sets the dag runs
> > for those task instances back to Running.
> > This would be a lot for a user to do using PATCH calls to change Task
> > Instances and Dag Runs together (and there's no API for Jobs).
> >
> > -Daniel L.
> > P.S. As far as renaming all parameters on operators and hooks with
> > *_conn_id, I do not want to see that breaking change happen. But IF it
> > HAS TO, I'm still of the opinion that the default_args use for
> > X_conn_id is not preferable to having all operators take simpler
> > source_conn_id and optional target_conn_id parameter names that are
> > consistent across all operators.
> >
> > On 4/8/20, 9:47 AM, "Ash Berlin-Taylor" <a...@apache.org> wrote:
> >
> > [Warning]: This email originated from an external source. Do not
> > open links or attachments unless you know the content is safe.
> >
> > To expand on the "so I think we need to do one of":
> >
> > - we need to update the name of "conn_id" somehow. I can't think of a
> > better option, and given all the operators have "x_conn_id" I don't
> > think that change should be made lightly.
> >
> > - make conn_id unique (this "poor" HA has been a source of confusion
> > in the past) and make it the ID we use in the API
> >
> > Or a third option:
> >
> > - Have the API take conn_id and either return all conns for that
> > conn_id, or conn_id plus an ordinal (?idx=0 type thing) to return a
> > single value.
> >
> > On Apr 8 2020, at 5:42 pm, Ash Berlin-Taylor <a...@apache.org> wrote:
> >
> > > Hi everyone,
> > >
> > > So as I mentioned in the AIP voting thread, I think we need to give
> > > some more thought to how we are exposing connection ids in the API.
> > >
> > > Right now as proposed (and merged without approval -- not cool; the
> > > AIP we voted on did not contain a PR against apache/airflow) it has
> > > an endpoint of `/connections/{connection_id}`.
> > >
> > > My issue here is, as I said in the previous thread, that this is
> > > going to be mightily confusing to our users, because there is a
> > > "conn_id" concept that is exposed, so people are going to try
> > > putting "aws_default" etc. in there. I don't care what the API spec
> > > says, I care what our users expect, and having a connection_id/id
> > > and a conn_id field is just confusing.
> > >
> > > So I think we need to do one of:
> > > - we need to update the name of "conn_id" somehow, or make conn_id
> > > unique (this "poor" HA has been a source of confusion in the past)
> > >
> > > There are similar problems for the DAG run -- the spec has the type
> > > as an integer, but everything else about the Airflow system uses the
> > > (unique) run_id parameter, and I would expect the API to use that.
> > > The auto-incrementing id on the run column is literally never used
> > > in the code base, so exposing that via the API seems odd.
> > >
> > > A few other smaller points:
> > >
> > > EventLog: Those are "audit"/action logs, so we probably shouldn't
> > > let people delete them via the API!
> > >
> > > pool_id: still an integer. It should be the "name".
> > >
> > > How should add/delete variables and connections work in the API with
> > > the addition of the new "Secrets Backends"?
> > >
> > > xcomValues: task_id is listed as an integer.
> > >
> > > -ash
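The "ordinal" lookup Ash describes earlier in the thread
(/connections/{conn_id} for the list, /connections/{conn_id}/{idx} for one
item) could look roughly like this. A minimal sketch only: the function name
and the in-memory store are hypothetical, not Airflow's actual API.

```python
# Hypothetical in-memory store standing in for the connection table.
# Duplicate conn_ids are allowed, as in pre-2.0 Airflow.
CONNECTIONS = [
    {"conn_id": "hdfs_default", "host": "hdfs-1"},
    {"conn_id": "hdfs_default", "host": "hdfs-2"},
    {"conn_id": "mysql_default", "host": "db-1"},
]

def resolve_connection(conn_id, idx=None):
    """Resolve a conn_id, optionally narrowed by a 0-based ordinal.

    idx=None models GET /connections/{conn_id}   -> list of matches
    idx=n    models GET /connections/{conn_id}/n -> single match or None
    """
    matches = [c for c in CONNECTIONS if c["conn_id"] == conn_id]
    if idx is None:
        return matches
    try:
        return matches[idx]
    except IndexError:
        return None  # a real endpoint would return 404 here
```

Because the ordinal is relative to a stable creation order rather than a
mutable list position, lookups stay deterministic; the deletion-ordering
problem from earlier in the thread still applies if ordinals are used for
DELETE.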