I'm glad the general mood is that connection_id should be unique.
FWIW when I had multiple connections in v1.8.2 with mysql I didn't seem to be 
getting any randomized loadbalancing anyway. Then again, maybe random was just 
100 selections of 1 over 2.
There are many other ways to load balance connections, each specific to the 
service type, so I don't see it Airflow's place to provide a semi-generic 
option to do it.

+1 for connection ID being unique.

Pardon outlook for changing links to the ConnectionID, DagID and PoolID being 
integers in a version of the API.
Are we past that decision already; I'd expect to use string names.

I'd also asked about DAG run ID, task ID, and finally whether there'd be an 
endpoint with which to clear tasks, because crud operations don't model the 
interplay of task instance, jobs, and dag run state involved.
-Daniel

On 4/14/20, 8:49 AM, "Xinbin Huang" <bin.huan...@gmail.com> wrote:

    [Warning]: This email originated from an external source. Do not open links 
or attachments unless you know the content is safe.
    [경고]: 본 이메일은 회사 외부에서 유입되었습니다. 내용이 안전한지 확인하기 전까지는 링크나 첨부파일을 열지 마십시오.
    
    
    +1 on making connection IDs unique. It's confusing to have Airflow handled
    load balancing here.
    
    On Tue, Apr 14, 2020, 4:59 AM Kaxil Naik <kaxiln...@gmail.com> wrote:
    
    > +1 to make connection ids unique
    >
    > On Tue, Apr 14, 2020 at 11:59 AM Jarek Potiuk <jarek.pot...@polidea.com>
    > wrote:
    >
    > > I am also supporting converting the connection to be unique.
    > >
    > > I've worked with similar approach long time ago (10s of years) and it 
was
    > > fine then where we have not yet figured out how to scale client/server
    > > architecture and we did not have all the nice infrastructure like
    > > load/balancing, cloud services etc. I believe in most of the current
    > > services/systems load-balancing should be handled by the service itself
    > or
    > > some kind of proxy between - not by the client side - in production
    > > environments.
    > >
    > > It's far more robust and might provide much better control (you can
    > control
    > > multiple independent client's connection and do not rely on random
    > > distribution of connections). There are scenarios that would not work
    > well
    > > in this case - for example physical proximity of the workers, current
    > load
    > > on different services etc. etc. It is very limiting to rely on this
    > > feature.
    > >
    > > On Tue, Apr 14, 2020 at 11:36 AM Kamil Breguła <
    > kamil.breg...@polidea.com>
    > > wrote:
    > >
    > > > Hello,
    > > >
    > > > This can cause big problems with idempotence. According to RFC-7231,
    > > > the DELETE method should be idempotent.
    > > >
    > > > For example:
    > > > If you want to delete items with index from 1 to 4, you should set the
    > > > following request
    > > > DELETE /connections/hdfs_default/4
    > > > DELETE /connections/hdfs_default/3
    > > > DELETE /connections/hdfs_default/2
    > > > DELETE /connections/hdfs_default/1
    > > >
    > > > However, if the user sends these requests in a different order, they
    > > > will only delete the first and third items.
    > > > DELETE /connections/hdfs_default/1
    > > > DELETE /connections/hdfs_default/2
    > > > DELETE /connections/hdfs_default/3
    > > > DELETE /connections/hdfs_default/4
    > > >
    > > > If you use asynchronous HTTP clients (a popular in Node), the order of
    > > > requests will not be kept. It will also be a big problem with
    > > > simultaneous modifications.
    > > >
    > > > I am also in favor of abandoning support for many connections. This
    > > > can be solved on a different layer.
    > > >
    > > > Best regards,
    > > > Kamil
    > > >
    > > >
    > > > On Thu, Apr 9, 2020 at 6:47 PM Ash Berlin-Taylor <a...@apache.org>
    > wrote:
    > > > >
    > > > > They are, but we _can_ make a choice to remove that feature -- it is
    > > not
    > > > > widely used and is confusing to many when they stumble on it.
    > > > >
    > > > > It's not something we should do lightly, but it is a possibility.
    > > > >
    > > > > I think I'm probably leaning towards the "ordinal" concept:
    > > > >
    > > > > /connections/hdfs_default -> list of connections with that ID
    > > > > /connections/hdfs_default/0 first connection of that type
    > > > >
    > > > > Something like that.
    > > > >
    > > > > On Apr 9 2020, at 2:31 pm, Shaw, Damian P.
    > > > > <damian.sha...@credit-suisse.com> wrote:
    > > > >
    > > > > > FYI if you look back at the thread "Re: [2.0 spring cleaning]
    > Require
    > > > > > unique conn_id" on 2019-04-14 you can see a message from Kevin 
Yang
    > > > > > stating that this random choice of connections is a "feature" used
    > to
    > > > > > load balance connections in AirBnB. So users are relying on this
    > > > behavior.
    > > > > >
    > > > > >
    > > > > > -----Original Message-----
    > > > > > From: Daniel (Daniel Lamblin ) [Data Infrastructure]
    > > > > > <lamb...@coupang.com>
    > > > > > Sent: Wednesday, April 8, 2020 20:01
    > > > > > To: dev@airflow.apache.org
    > > > > > Subject: Re: API spec questions
    > > > > >
    > > > > > Having been bit by accidentally having two connections by the same
    > > > > > name or conn_id, I'd prefer if were made unique. In my experience
    > > > > > there's little utility in having multiple connections by the same
    > > > > > name. Tasks that use a connection do to fairly randomly choose 
one,
    > > > > > rather they seem pretty consistent in using the one created
    > earliest,
    > > > > > which often has the lower id integer.
    > > > > >
    > > > > > Circling back to how this is used by the API, from a user
    > > perspective,
    > > > > > the following in path integer fields were ones I'd have expected 
to
    > > be
    > > > > > strings instead:
    > > > > > ConnectionID
    > > >
    > >
    > 
https://kor01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Fairflow%2Fblob%2F11f1e0cad996d5596e3e4fb440eb4ec52c024f70%2Fopenapi.yaml%23L1845&amp;data=02%7C01%7Clamblin%40coupang.com%7Cd8716c2432c54849f3d708d7e08b68f2%7Ce3098f96361b47c6a9f4ab7bafcaffe9%7C0%7C0%7C637224761681223746&amp;sdata=ur3AAggOlQ7BskKBxQxRq0i%2FS36Ol72c7H7ddwXo5C4%3D&amp;reserved=0
    > > > > > DAGID
    > > >
    > >
    > 
https://kor01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Fairflow%2Fblob%2F11f1e0cad996d5596e3e4fb440eb4ec52c024f70%2Fopenapi.yaml%23L1853&amp;data=02%7C01%7Clamblin%40coupang.com%7Cd8716c2432c54849f3d708d7e08b68f2%7Ce3098f96361b47c6a9f4ab7bafcaffe9%7C0%7C0%7C637224761681223746&amp;sdata=vyhFiV6Fd3pnVvlAoxnscSvvcy2NX%2BMASGmuHKu%2Fy84%3D&amp;reserved=0
    > > > > > PoolID
    > > >
    > >
    > 
https://kor01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Fairflow%2Fblob%2F11f1e0cad996d5596e3e4fb440eb4ec52c024f70%2Fopenapi.yaml%23L1901&amp;data=02%7C01%7Clamblin%40coupang.com%7Cd8716c2432c54849f3d708d7e08b68f2%7Ce3098f96361b47c6a9f4ab7bafcaffe9%7C0%7C0%7C637224761681223746&amp;sdata=4Fog92AZju0%2BN44FoBdU4fj2DwhgCbUfKYUbdjeWpgo%3D&amp;reserved=0
    > > > > >
    > > > > > Though it’s a url-encoding hassle, I also expected that DAGRunID
    > > would
    > > > > > be more like the Run ID E.G.
    > > > > > "scheduled__2020-04-08T23:10:00+00:00",
    > > > > > "manual__2020-04-08T23:00:56+00:00",
    > > > > > "manual__2020-04-08T16:24:56.692868+00:00" ,
    > > > > > "backfill_2020-04-08T22:05:00+00:00" etc
    > > > > >
    > > > > > Then TaskID is confusing to me; AFAIK the PK to task instances are
    > > > > > task_id, dag_id, and execution_date and the api call appears to
    > align
    > > > > > with that having the pattern:
    > > > > > /dags/{dag_id}/taskInstances/{task_id}/{execution_date}:
    > > > > > But if task_id is a numbered id, then… execution_date isn't even
    > > > > > needed… I'm thinking it should have been a string.
    > > > > >
    > > > > > An aside to this, I've always wondered what happens if an
    > externally
    > > > > > triggered DAG Run has the same execution date as a pre-existing
    > > > > > scheduled DAG Run. They'd have different run_ids, EG
    > > > > > "scheduled__2020-04-08T23:10:00+00:00" vs
    > > > > > "manual__2020-04-08T23:10:00+00:00" but then task instances for
    > those
    > > > > > runs might not be unique.
    > > > > >
    > > > > > Lastly, the UI and CLI operation of clearing tasks seems analogous
    > to
    > > > > > the delete task instance API end point. But probably it's not, and
    > > > > > this could become confusing to users.
    > > > > > There should be a non-db-model api call for clearing tasks like 
you
    > > > > > can from the UI and the CLI. If I read it right, clearing does the
    > > > > > following: it sets the state of the task instance to None unless
    > the
    > > > > > state was Running then instead it sets it and related job ids to
    > > > > > Shutdown. It deletes reschedules of the TI and it sets the dag 
runs
    > > > > > for those task instances back to Running.
    > > > > > This would be a lot to do for a user using PATCH calls to change
    > Task
    > > > > > Instances and Dag Runs together (and there's no API for Jobs).
    > > > > >
    > > > > > -Daniel L.
    > > > > > P.S. as far as renaming all parameters on operators and hooks with
    > > > > > *_conn_id, I do not want to see that breaking change happen. But 
IF
    > > it
    > > > > > HAS TO, I'm still of the opinion that the default_args use for
    > > > > > X_conn_id is not preferable to having all operators take simpler
    > > > > > source_conn_id and optional target_conn_id parameter names that 
are
    > > > > > consistent across all operators.
    > > > > >
    > > > > > On 4/8/20, 9:47 AM, "Ash Berlin-Taylor" <a...@apache.org> wrote:
    > > > > >
    > > > > >    [Warning]: This email originated from an external source. Do 
not
    > > > > > open links or attachments unless you know the content is safe.
    > > > > >    [경고]: 본 이메일은 회사 외부에서 유입되었습니다. 내용이 안전한지 확인하기 전까지는 링크나 첨부파일을 열지
    > > 마십시오.
    > > > > >
    > > > > >
    > > > > >    To expand on the "so I think we need to do one of":
    > > > > >
    > > > > >
    > > > > >    - we need to update the name of "conn_id" somehow. I can't 
think
    > > of
    > > > a
    > > > > >    better option, and given all the operators have "x_conn_id" I
    > > don't
    > > > > >    think that change should be made lightly.
    > > > > >
    > > > > >    - make conn_id unique (this "poor" HA has been a source of
    > > > > > confusion in
    > > > > >    the past) and the ID we use in the API
    > > > > >
    > > > > >    Or a third option:
    > > > > >
    > > > > >    - Have the API take conn_id and either return all conns for 
that
    > > > > >    conn_id, or conn_id plus an ordinal (?idx=0 type thing) to
    > return
    > > a
    > > > > >    single value.
    > > > > >
    > > > > >    On Apr 8 2020, at 5:42 pm, Ash Berlin-Taylor <a...@apache.org>
    > > > wrote:
    > > > > >
    > > > > >    > Hi everyone,
    > > > > >    >
    > > > > >    > So as I mentioned in the AIP voting thread, I think we need 
to
    > > > give
    > > > > >    > some more thought to how we are exposing connection ids in 
the
    > > > API.
    > > > > >    >
    > > > > >    > Right now as proposed (and merged without approval, not cool.
    > > The
    > > > AIP
    > > > > >    > we voted on did not contain a PR against apache/airflow.) it
    > has
    > > > an
    > > > > >    > end point of `/connections/{connection_id} `
    > > > > >    >
    > > > > >    > My issue here is as I said in the previous thread: that is
    > going
    > > > > > to be
    > > > > >    > mightly confusing to our users because there is a "conn_id"
    > > > concept
    > > > > >    > that is exposed, so people are going to try putting
    > > "aws_default"
    > > > etc
    > > > > >    > in there. I don't care what the API spec says, I care what 
our
    > > > users
    > > > > >    > expect, and having a connection_id/id and a conn_id fields is
    > > > > > just confusing
    > > > > >    >
    > > > > >    > So I think we need to do one of:
    > > > > >    > - we need to update the name of "conn_id" somehow, make
    > conn_id
    > > > unique
    > > > > >    > (this "poor" HA has been a source of confusion in the past)
    > > > > >    >
    > > > > >    > There are similar problems for the DAG run -- the spec has 
the
    > > > > > type as
    > > > > >    > an integer, but everything else about the Airflow system uses
    > > the
    > > > > >    > (unique) run_id parameter, and I would expect the API to use
    > > > > > that. The
    > > > > >    > autoinc. id on the run column is literally never used in the
    > > code
    > > > > >    > base, so exposing that via the API seems odd.
    > > > > >    >
    > > > > >    > A few other smaller points:
    > > > > >    >
    > > > > >    > EventLog: Those are "audit"/action logs, so we probably
    > > shouldn't
    > > > let
    > > > > >    > people delete them via the API!
    > > > > >    >
    > > > > >    > pool_id: still an integer. It should be the "name".
    > > > > >    >
    > > > > >    > How should add/delete variables and connections work in the
    > API
    > > > with
    > > > > >    > the addition of the new "Secrets Backends"?
    > > > > >    >
    > > > > >    > xcomValues: task_id is listed as an integer.
    > > > > >    >
    > > > > >    >
    > > > > >    > -ash
    > > > > >
    > > > > >
    > > > > >
    > > > > >
    > > > > >
    > > > > >
    > > > >
    > > >
    > >
    > 
===============================================================================
    > > > > > Please access the attached hyperlink for an important electronic
    > > > > > communications disclaimer:
    > > > > > 
https://kor01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.credit-suisse.com%2Flegal%2Fen%2Fdisclaimer_email_ib.html&amp;data=02%7C01%7Clamblin%40coupang.com%7Cd8716c2432c54849f3d708d7e08b68f2%7Ce3098f96361b47c6a9f4ab7bafcaffe9%7C0%7C0%7C637224761681223746&amp;sdata=6e9Ck952oFmXDkoDYnM0XgCWsplYb7Dua37BkAPt22A%3D&amp;reserved=0
    > > > > >
    > > > >
    > > >
    > >
    > 
===============================================================================
    > > > > >
    > > >
    > >
    > >
    > > --
    > >
    > > Jarek Potiuk
    > > Polidea 
<https://kor01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.polidea.com%2F&amp;data=02%7C01%7Clamblin%40coupang.com%7Cd8716c2432c54849f3d708d7e08b68f2%7Ce3098f96361b47c6a9f4ab7bafcaffe9%7C0%7C0%7C637224761681223746&amp;sdata=tMOZFKjLGoXh9eAVW5rXhmJeJw8kYhlWAfwFICN%2Fpfw%3D&amp;reserved=0>
 | Principal Software Engineer
    > >
    > > M: +48 660 796 129 <+48660796129>
    > > [image: Polidea] 
<https://kor01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.polidea.com%2F&amp;data=02%7C01%7Clamblin%40coupang.com%7Cd8716c2432c54849f3d708d7e08b68f2%7Ce3098f96361b47c6a9f4ab7bafcaffe9%7C0%7C0%7C637224761681223746&amp;sdata=tMOZFKjLGoXh9eAVW5rXhmJeJw8kYhlWAfwFICN%2Fpfw%3D&amp;reserved=0>
    > >
    >
    

Reply via email to