Re: API spec questions

Kamil Breguła Tue, 12 May 2020 08:43:03 -0700

GET /dags/DAG_A/dagRuns/2020-01-01/taskInstances
GET /dags/DAG_A/dagRuns/2020-01-01/taskInstances
On Tue, May 12, 2020 at 12:10 PM Ash Berlin-Taylor <[email protected]> wrote:
>
> Hi Kamil,
>
> Thanks for re-opening this discussion!
>
> My initial complaint about "putting multiple values in a single path
> element" is that it feels messy, and confusing, and very unlike any REST
> API I've consumed or written before.
>
> However I've just realised that I may have _wildy_ misunderstood the proposal.
>
> I (somehow!) thought you were suggesting this:
>
>   GET /dags/dag_id1-dag_id2-dag_id3/dagRuns/2020-01-01T00/taskInstances
>
> But on reading Google's https://aip.dev/159 that now makes more sense,
> and that isn't what you were suggesting, but instaed a single, litteral
> `-` to mean "any dag id". Is this correct?
>


Yes. In detail, it means that the SQL query will not have a condition.
/dag/{dag_Id}/taskInstanace
SELECT * FROM task_instance WHERE dag_id= {dag_id}
/dag/~/taskInstanace
SELECT * FROM task_instance

> So I think then my only ask is that we have a `dag_ids` parameter that
> we can use to filter on :)
>
>
>
>
> Additionally/related when selecting across multiple DAGs we would very
> quickly run in to that was fixed in github.com/apache/airflow/pull/7364
> -- where asking for all the task stats for the DAGs on a single page
> exceeded the HTTP request line limit.
>
> i.e. this could break if the DAG ids are large, or if they is a large
> number of dags to be fetched.
>
>   GET 
> /dags/-/dagRuns/2020-01-01T00/taskInstances?dag_ids=...&dag_ids=...&dag_ids=...
>
> How about this as an additional endpoint:
>
>   POST /dags/-/dagRuns/2020-01-01/taskInstances/list
>
>   dag_ids=...&dag_ids=...&dag_ids...
>
>
> (The /list suffix isn't needed here as there is no POST for
> taskInstances, but there is in other places, so making it explicit means
> we can use the same pattern for, say `POST /dags/-/dagRuns/list`)
>
>

We can add it, but I don't sure, I have the impression that this will
require a separate specification that will require the design of BATCH
API.

Currently, if you would like to download resources from several
collections, you can send many HTTP requests.
GET /dags/DAG_A/dagRuns/2020-01-01/taskInstances
GET /dags/DAG_B/dagRuns/2020-01-01/taskInstances
GET /dags/DAG_C/dagRuns/2020-01-01/taskInstances

I would like to focus on making all operations possible, but not
necessarily making the API efficient in all cases.

>
>
> If it is valid to pass just one of ExecutionDateStartRange and
> ExecutionDateEndRange could we call it ExecutionDateGTE etc. (for
> Greater-than-or-equal) so it makes more sense when a single parameter is 
> providerd.
>
>

I will change it.

>
>
> A minor style point: The filter/query string fields are all defined as
> BumpyCase, but I'd prefer us to use snake_case (`execution_date_gte`
> etc) - the only sort of applicable proposal I can find is https://aip.dev/140
>

I will change it.

> (As in I can't find anything in Google recs that talk about case for
> case of query string/filter params. I guess they suggest having it all
> as a single `?filter=` param :- https://aip.dev/160 )
>
The API guides discusses the latest recommendations for API design.
Google formerly used query parameters.
Example:
https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/list


> -ash
>
>
>
> On May 12 2020, at 12:12 am, Jarek Potiuk <[email protected]> wrote:
>
> > While I understand the benefits of the "clean"  "REST" API where resources
> > are identified by a nice hierarchy, I do not think it is necessary to
> > follow it for this kind of queries.I think the "resources" based approach
> > indeed breaks down when you want to make more complex queries and want to
> > do it efficiently, without running many API calls or returning superfluous
> > information. I think that's the reason where GraphQL shines compared to
> > REST-like APIs. I think in this case "purity" impacts performance a lot.
> >
> > I know we did not choose GraphQL (and for good reasons - it's
> > complexity is
> > not needed in our case) - but I think there are cases where we want
> > efficiency and we want to limit a number of requests or superfluous
> > information retrieved.
> >
> > But looking at the balance between purity and efficiency for asking about
> > "no matter which dag" or "no matter which executionDate"  - I think the
> > AIP-159 proposal strikes a good balance. It's pretty natural, easy to read
> > and does not break the structure.
> >
> > I would not use "*" because it can be interpreted differently when
> > URL-encoded/decoded because it is a reserved character (
> > https://tools.ietf.org/html/rfc3986#section-2.2) - and some languages
> > (Apparently
> > PHP
> > <https://stackoverflow.com/questions/6533561/urlencode-the-asterisk-star-character>
> > might have problems with encoding it accidentally)  . I'd use one of the
> > un-reserved ones (https://tools.ietf.org/html/rfc3986#section-2.3)).
> > If -
> > (dash) is out of the question due to compatibility, I'd use ~(tilde).
> >
> > So if we only want to focus on this kind of query, I think option II. (but
> > with ~) is good for me.
> >
> > I have only one doubt about it. Do we want to handle other queries to
> > taskInstances across dag_id or execution_date? For example
> > corresponding to
> > DAG_ID IN (DAG_1, DAG_2, DAG_3) or DAG_ID LIKE ("DAG_%") from typical SQL
> > queries?. I think that might be a useful type of queries. even more so for
> > executionDate. From what I see we do not have filters for those, but I
> > wonder if this is not something we could do with additional filters and
> > still keep the nice structure?
> >
> > For example:
> >
> > /dags/-/dagRuns/{execiton_date}/taskInstances?dag_id=DAG_1,DAG_2,DAG_3,
> >
> > Not sure if this is something we even want to follow?
> >
> > Ash - what are your thoughts about it? Why do you have doubts about the
> > AIP159 approach ? What problems do you foresee with it ?
> >
> > J.
> >
> >
> >
> > On Mon, May 11, 2020 at 7:37 PM QP Hou <[email protected]> wrote:
> >
> >> I would recommend option III.
> >>
> >> IMHO, hierarchy based endpoints are for CURD operations on a single
> >> entity. The flat endpoint, like /taskinstance, is for read-only
> >> multi-facet search queries.
> >>
> >> For example:
> >>
> >> * to create a resource Foo: "POST /api/parent/{parent_id}/foo"
> >> * to update a resource Foo: "PUT /api/parent/{parent_id}/foo/{foo_id}"
> >> * to get a single Foo entity: "GET /api/parent/{parent_id}/foo/{foo_id}"
> >> * to search for multiple Foo entities using multiple filters: "GET
> >> /api/foo?parent_id=id1,id2,id3&owner=user1"
> >>
> >> On Mon, May 11, 2020 at 5:08 AM Kamil Breguła <[email protected]>
> >> wrote:
> >> >
> >> > Hello,
> >> >
> >> > I apologize for my very negative behavior. I misunderstood the rules.
> >> > I updated the specification based on the discussion. I hope that
> >> now I
> >> > have not missed any suggestions that I had to make. I tried to make
> >> > every change in a separate commit, so you can review the changes one
> >> > by one.
> >> >
> >> > Here is new PR:  https://github.com/apache/airflow/pull/8721
> >> >
> >> > Short summary:
> >> > * Changed type for DAGID, DAGRunID, TaskID. Now we use string instead
> >> > of an integer.
> >> > * EventLog collection is read-only
> >> > * Used ExcutionDate in DagRuns endpoints
> >> > * Used custom action to control task instances - Removed PATCH
> >> > /dags/{dag_id}/taskInstances/{task_id}/{execution_date} endpoint and
> >> > added PATCH /dags/{dag_id}/clearTaskInstanaces.
> >> > * Added endpoint - POST /dagRuns "Trigger a DAG Run"
> >> > * Added filter parameters to GET /dags/{dag_id}/taskInstances.
> >> > * Added filter parameters to GET /dags/{dag_id}/dagRuns
> >> > * Removed DELETE /taskInstances endpoint
> >> > * The connection ID is a unique key that identifies the connection
> >> > * Moved TaskInstances resource under DagRuns resource
> >> > * Fixed many typos
> >> >
> >> > There is one more topic for discussion - reading resources across
> >> > multiple collections.
> >> > In other words - how do I retrieve task instances for multiple DAGs?
> >> > We have several solutions.
> >> > I) Currently, the endpoint that receives a list of task instances is
> >> > available at /dags/{dag_id}/dagRuns/{execiton_date}/taskInstances:
> >> > This endpoint support reading resources across multiple DAGs by
> >> > specifying a "-" as a dag_id or an execution_date.
> >> > This is based on Google recommendations - AIP159 [1]. I relied on
> >> > these recommendations because it is the most comprehensive study on
> >> > API design principles. Ash Berlin-Taylor rightly pointed out that this
> >> > would be a backward-incompatible change.
> >> > II) We can use a different character that will have the same role -
> >> > '*'. This character cannot be in Dag/Task ID, so it's safe.
> >> > III) Ash proposed to add a separate endpoint that will not include the
> >> > DAG ID in the address - /taskInstances
> >> >
> >> > If we want to choose one solution then I think it's worth looking at
> >> > what the endpoints for DAGs look like.
> >> > /dags
> >> > /dags/{dag_id}
> >> > /dags/{dag_id}/clearTaskInstanaces
> >> > /dags/{dag_id}/dagRuns
> >> > /dags/{dag_id}/dagRuns/{execiton_date}/taskInstances
> >> > /dags/{dag_id}/dagRuns/{execiton_date}/taskInstances/{task_id}/
> >> > /dags/{dag_id}/dagRuns/{execiton_date}/taskInstances/{task_id}/links
> >> >
> >> /dags/{dag_id}/dagRuns/{execiton_date}/taskInstances/{task_id}/logs/{task_try_number}
> >> >
> >> /dags/{dag_id}/dagRuns/{execiton_date}/taskInstances/{task_id}/xcomEntries
> >> >
> >> /dags/{dag_id}/dagRuns/{execiton_date}/taskInstances/{task_id}/xcomEntries/{key}
> >> > /dags/{dag_id}/dagRuns/{execution_date}
> >> > /dags/{dag_id}/structure
> >> > /dags/{dag_id}/tasks
> >> > /dags/{dag_id}/tasks/{task_id}
> >> > In my opinion, here is a clear hierarchy of resources that will
> >> > facilitate the use of API. The third solution causes that this
> >> > hierarchy is disturbed and the endpoints that are used to receive the
> >> > list of elements will be at the highest level.
> >> > We will have the following endpoints:
> >> > /dags
> >> > /dags/{dag_id}
> >> > /dags/{dag_id}/clearTaskInstanaces
> >> > /dagRuns
> >> > /dags/{dag_id}/dagRuns
> >> > /taskInstances
> >> > /dags/{dag_id}/dagRuns/{execiton_date}/taskInstances
> >> > /dags/{dag_id}/dagRuns/{execiton_date}/taskInstances/{task_id}/
> >> > /dags/{dag_id}/dagRuns/{execiton_date}/taskInstances/{task_id}/links
> >> >
> >> /dags/{dag_id}/dagRuns/{execiton_date}/taskInstances/{task_id}/logs/{task_try_number}
> >> > /xcomEntries
> >> >
> >> /dags/{dag_id}/dagRuns/{execiton_date}/taskInstances/{task_id}/xcomEntries
> >> >
> >> /dags/{dag_id}/dagRuns/{execiton_date}/taskInstances/{task_id}/xcomEntries/{key}
> >> > /dags/{dag_id}/dagRuns/{execution_date}
> >> > /dags/{dag_id}/structure
> >> > /dags/{dag_id}/tasks
> >> > /dags/{dag_id}/tasks/{task_id}
> >> >
> >> > 4) Some endpoints will have similar behavior, so we can delete them.
> >> > Then we will have the following list of endpoints.
> >> > /dags
> >> > /dags/{dag_id}
> >> > /dags/{dag_id}/clearTaskInstanaces
> >> > /dagRuns
> >> > /taskInstances
> >> > /dags/{dag_id}/dagRuns/{execiton_date}/taskInstances/{task_id}/
> >> > /dags/{dag_id}/dagRuns/{execiton_date}/taskInstances/{task_id}/links
> >> >
> >> /dags/{dag_id}/dagRuns/{execiton_date}/taskInstances/{task_id}/logs/{task_try_number}
> >> > /xcomEntries
> >> >
> >> /dags/{dag_id}/dagRuns/{execiton_date}/taskInstances/{task_id}/xcomEntries/{key}
> >> > /dags/{dag_id}/dagRuns/{execution_date}
> >> > /dags/{dag_id}/structure
> >> > /dags/{dag_id}/tasks
> >> > /dags/{dag_id}/tasks/{task_id}
> >> >
> >> > Which solution do you like the most? I, II, III, or IV?
> >> >
> >> > Best regards,
> >> > Kamil Breguła
> >> >
> >> > [1] https://aip.dev/159
> >> >
> >> > On Wed, Apr 15, 2020 at 8:32 PM Daniel (Daniel Lamblin ) [Data
> >> > Infrastructure] <[email protected]> wrote:
> >> > >
> >> > > I'm glad the general mood is that connection_id should be unique.
> >> > > FWIW when I had multiple connections in v1.8.2 with mysql I didn't
> >> seem to be getting any randomized loadbalancing anyway. Then again, maybe
> >> random was just 100 selections of 1 over 2.
> >> > > There are many other ways to load balance connections, each specific
> >> to the service type, so I don't see it Airflow's place to provide a
> >> semi-generic option to do it.
> >> > >
> >> > > +1 for connection ID being unique.
> >> > >
> >> > > Pardon outlook for changing links to the ConnectionID, DagID and
> >> PoolID being integers in a version of the API.
> >> > > Are we past that decision already; I'd expect to use string names.
> >> > >
> >> > > I'd also asked about DAG run ID, task ID, and finally whether there'd
> >> be an endpoint with which to clear tasks, because crud operations don't
> >> model the interplay of task instance, jobs, and dag run state involved.
> >> > > -Daniel
> >> > >
> >> > > On 4/14/20, 8:49 AM, "Xinbin Huang" <[email protected]> wrote:
> >> > >
> >> > >     [Warning]: This email originated from an external source. Do not
> >> open links or attachments unless you know the content is safe.
> >> > >     [경고]: 본 이메일은 회사 외부에서 유입되었습니다. 내용이 안전한지 확인하기 전까지는 링크나 첨부파일을 열지 마십시오.
> >> > >
> >> > >
> >> > >     +1 on making connection IDs unique. It's confusing to have Airflow
> >> handled
> >> > >     load balancing here.
> >> > >
> >> > >     On Tue, Apr 14, 2020, 4:59 AM Kaxil Naik <[email protected]>
> >> wrote:
> >> > >
> >> > >     > +1 to make connection ids unique
> >> > >     >
> >> > >     > On Tue, Apr 14, 2020 at 11:59 AM Jarek Potiuk <
> >> [email protected]>
> >> > >     > wrote:
> >> > >     >
> >> > >     > > I am also supporting converting the connection to be unique.
> >> > >     > >
> >> > >     > > I've worked with similar approach long time ago (10s of years)
> >> and it was
> >> > >     > > fine then where we have not yet figured out how to scale
> >> client/server
> >> > >     > > architecture and we did not have all the nice infrastructure
> >> like
> >> > >     > > load/balancing, cloud services etc. I believe in most of the
> >> current
> >> > >     > > services/systems load-balancing should be handled by the
> >> service itself
> >> > >     > or
> >> > >     > > some kind of proxy between - not by the client side - in
> >> production
> >> > >     > > environments.
> >> > >     > >
> >> > >     > > It's far more robust and might provide much better control
> >> (you can
> >> > >     > control
> >> > >     > > multiple independent client's connection and do not rely on
> >> random
> >> > >     > > distribution of connections). There are scenarios that would
> >> not work
> >> > >     > well
> >> > >     > > in this case - for example physical proximity of the workers,
> >> current
> >> > >     > load
> >> > >     > > on different services etc. etc. It is very limiting to
> >> rely on
> >> this
> >> > >     > > feature.
> >> > >     > >
> >> > >     > > On Tue, Apr 14, 2020 at 11:36 AM Kamil Breguła <
> >> > >     > [email protected]>
> >> > >     > > wrote:
> >> > >     > >
> >> > >     > > > Hello,
> >> > >     > > >
> >> > >     > > > This can cause big problems with idempotence. According to
> >> RFC-7231,
> >> > >     > > > the DELETE method should be idempotent.
> >> > >     > > >
> >> > >     > > > For example:
> >> > >     > > > If you want to delete items with index from 1 to 4, you
> >> should set the
> >> > >     > > > following request
> >> > >     > > > DELETE /connections/hdfs_default/4
> >> > >     > > > DELETE /connections/hdfs_default/3
> >> > >     > > > DELETE /connections/hdfs_default/2
> >> > >     > > > DELETE /connections/hdfs_default/1
> >> > >     > > >
> >> > >     > > > However, if the user sends these requests in a different
> >> order, they
> >> > >     > > > will only delete the first and third items.
> >> > >     > > > DELETE /connections/hdfs_default/1
> >> > >     > > > DELETE /connections/hdfs_default/2
> >> > >     > > > DELETE /connections/hdfs_default/3
> >> > >     > > > DELETE /connections/hdfs_default/4
> >> > >     > > >
> >> > >     > > > If you use asynchronous HTTP clients (a popular in Node),
> >> the order of
> >> > >     > > > requests will not be kept. It will also be a big
> >> problem with
> >> > >     > > > simultaneous modifications.
> >> > >     > > >
> >> > >     > > > I am also in favor of abandoning support for many
> >> connections. This
> >> > >     > > > can be solved on a different layer.
> >> > >     > > >
> >> > >     > > > Best regards,
> >> > >     > > > Kamil
> >> > >     > > >
> >> > >     > > >
> >> > >     > > > On Thu, Apr 9, 2020 at 6:47 PM Ash Berlin-Taylor <
> >> [email protected]>
> >> > >     > wrote:
> >> > >     > > > >
> >> > >     > > > > They are, but we _can_ make a choice to remove that
> >> feature -- it is
> >> > >     > > not
> >> > >     > > > > widely used and is confusing to many when they
> >> stumble on
> >> it.
> >> > >     > > > >
> >> > >     > > > > It's not something we should do lightly, but it is a
> >> possibility.
> >> > >     > > > >
> >> > >     > > > > I think I'm probably leaning towards the "ordinal" concept:
> >> > >     > > > >
> >> > >     > > > > /connections/hdfs_default -> list of connections with that
> >> ID
> >> > >     > > > > /connections/hdfs_default/0 first connection of that type
> >> > >     > > > >
> >> > >     > > > > Something like that.
> >> > >     > > > >
> >> > >     > > > > On Apr 9 2020, at 2:31 pm, Shaw, Damian P.
> >> > >     > > > > <[email protected]> wrote:
> >> > >     > > > >
> >> > >     > > > > > FYI if you look back at the thread "Re: [2.0 spring
> >> cleaning]
> >> > >     > Require
> >> > >     > > > > > unique conn_id" on 2019-04-14 you can see a message from
> >> Kevin Yang
> >> > >     > > > > > stating that this random choice of connections is a
> >> "feature" used
> >> > >     > to
> >> > >     > > > > > load balance connections in AirBnB. So users are relying
> >> on this
> >> > >     > > > behavior.
> >> > >     > > > > >
> >> > >     > > > > >
> >> > >     > > > > > -----Original Message-----
> >> > >     > > > > > From: Daniel (Daniel Lamblin ) [Data Infrastructure]
> >> > >     > > > > > <[email protected]>
> >> > >     > > > > > Sent: Wednesday, April 8, 2020 20:01
> >> > >     > > > > > To: [email protected]
> >> > >     > > > > > Subject: Re: API spec questions
> >> > >     > > > > >
> >> > >     > > > > > Having been bit by accidentally having two connections
> >> by the same
> >> > >     > > > > > name or conn_id, I'd prefer if were made unique. In my
> >> experience
> >> > >     > > > > > there's little utility in having multiple
> >> connections by
> >> the same
> >> > >     > > > > > name. Tasks that use a connection do to fairly randomly
> >> choose one,
> >> > >     > > > > > rather they seem pretty consistent in using the one
> >> created
> >> > >     > earliest,
> >> > >     > > > > > which often has the lower id integer.
> >> > >     > > > > >
> >> > >     > > > > > Circling back to how this is used by the API, from
> >> a user
> >> > >     > > perspective,
> >> > >     > > > > > the following in path integer fields were ones I'd have
> >> expected to
> >> > >     > > be
> >> > >     > > > > > strings instead:
> >> > >     > > > > > ConnectionID
> >> > >     > > >
> >> > >     > >
> >> > >     >
> >> https://kor01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Fairflow%2Fblob%2F11f1e0cad996d5596e3e4fb440eb4ec52c024f70%2Fopenapi.yaml%23L1845&amp;data=02%7C01%7Clamblin%40coupang.com%7Cd8716c2432c54849f3d708d7e08b68f2%7Ce3098f96361b47c6a9f4ab7bafcaffe9%7C0%7C0%7C637224761681223746&amp;sdata=ur3AAggOlQ7BskKBxQxRq0i%2FS36Ol72c7H7ddwXo5C4%3D&amp;reserved=0
> >> > >     > > > > > DAGID
> >> > >     > > >
> >> > >     > >
> >> > >     >
> >> https://kor01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Fairflow%2Fblob%2F11f1e0cad996d5596e3e4fb440eb4ec52c024f70%2Fopenapi.yaml%23L1853&amp;data=02%7C01%7Clamblin%40coupang.com%7Cd8716c2432c54849f3d708d7e08b68f2%7Ce3098f96361b47c6a9f4ab7bafcaffe9%7C0%7C0%7C637224761681223746&amp;sdata=vyhFiV6Fd3pnVvlAoxnscSvvcy2NX%2BMASGmuHKu%2Fy84%3D&amp;reserved=0
> >> > >     > > > > > PoolID
> >> > >     > > >
> >> > >     > >
> >> > >     >
> >> https://kor01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Fairflow%2Fblob%2F11f1e0cad996d5596e3e4fb440eb4ec52c024f70%2Fopenapi.yaml%23L1901&amp;data=02%7C01%7Clamblin%40coupang.com%7Cd8716c2432c54849f3d708d7e08b68f2%7Ce3098f96361b47c6a9f4ab7bafcaffe9%7C0%7C0%7C637224761681223746&amp;sdata=4Fog92AZju0%2BN44FoBdU4fj2DwhgCbUfKYUbdjeWpgo%3D&amp;reserved=0
> >> > >     > > > > >
> >> > >     > > > > > Though it’s a url-encoding hassle, I also expected that
> >> DAGRunID
> >> > >     > > would
> >> > >     > > > > > be more like the Run ID E.G.
> >> > >     > > > > > "scheduled__2020-04-08T23:10:00+00:00",
> >> > >     > > > > > "manual__2020-04-08T23:00:56+00:00",
> >> > >     > > > > > "manual__2020-04-08T16:24:56.692868+00:00" ,
> >> > >     > > > > > "backfill_2020-04-08T22:05:00+00:00" etc
> >> > >     > > > > >
> >> > >     > > > > > Then TaskID is confusing to me; AFAIK the PK to task
> >> instances are
> >> > >     > > > > > task_id, dag_id, and execution_date and the api call
> >> appears to
> >> > >     > align
> >> > >     > > > > > with that having the pattern:
> >> > >     > > > > > /dags/{dag_id}/taskInstances/{task_id}/{execution_date}:
> >> > >     > > > > > But if task_id is a numbered id, then… execution_date
> >> isn't even
> >> > >     > > > > > needed… I'm thinking it should have been a string.
> >> > >     > > > > >
> >> > >     > > > > > An aside to this, I've always wondered what happens
> >> if an
> >> > >     > externally
> >> > >     > > > > > triggered DAG Run has the same execution date as a
> >> pre-existing
> >> > >     > > > > > scheduled DAG Run. They'd have different run_ids, EG
> >> > >     > > > > > "scheduled__2020-04-08T23:10:00+00:00" vs
> >> > >     > > > > > "manual__2020-04-08T23:10:00+00:00" but then task
> >> instances for
> >> > >     > those
> >> > >     > > > > > runs might not be unique.
> >> > >     > > > > >
> >> > >     > > > > > Lastly, the UI and CLI operation of clearing tasks seems
> >> analogous
> >> > >     > to
> >> > >     > > > > > the delete task instance API end point. But probably
> >> it's not, and
> >> > >     > > > > > this could become confusing to users.
> >> > >     > > > > > There should be a non-db-model api call for clearing
> >> tasks like you
> >> > >     > > > > > can from the UI and the CLI. If I read it right,
> >> clearing does the
> >> > >     > > > > > following: it sets the state of the task instance to
> >> None unless
> >> > >     > the
> >> > >     > > > > > state was Running then instead it sets it and related
> >> job ids to
> >> > >     > > > > > Shutdown. It deletes reschedules of the TI and it sets
> >> the dag runs
> >> > >     > > > > > for those task instances back to Running.
> >> > >     > > > > > This would be a lot to do for a user using PATCH calls
> >> to change
> >> > >     > Task
> >> > >     > > > > > Instances and Dag Runs together (and there's no API for
> >> Jobs).
> >> > >     > > > > >
> >> > >     > > > > > -Daniel L.
> >> > >     > > > > > P.S. as far as renaming all parameters on operators and
> >> hooks with
> >> > >     > > > > > *_conn_id, I do not want to see that breaking change
> >> happen. But IF
> >> > >     > > it
> >> > >     > > > > > HAS TO, I'm still of the opinion that the default_args
> >> use for
> >> > >     > > > > > X_conn_id is not preferable to having all operators take
> >> simpler
> >> > >     > > > > > source_conn_id and optional target_conn_id parameter
> >> names that are
> >> > >     > > > > > consistent across all operators.
> >> > >     > > > > >
> >> > >     > > > > > On 4/8/20, 9:47 AM, "Ash Berlin-Taylor" <[email protected]>
> >> wrote:
> >> > >     > > > > >
> >> > >     > > > > >    [Warning]: This email originated from an external
> >> source. Do not
> >> > >     > > > > > open links or attachments unless you know the
> >> content is
> >> safe.
> >> > >     > > > > >    [경고]: 본 이메일은 회사 외부에서 유입되었습니다. 내용이 안전한지 확인하기 전까지는 링크나
> >> 첨부파일을 열지
> >> > >     > > 마십시오.
> >> > >     > > > > >
> >> > >     > > > > >
> >> > >     > > > > >    To expand on the "so I think we need to do one of":
> >> > >     > > > > >
> >> > >     > > > > >
> >> > >     > > > > >    - we need to update the name of "conn_id"
> >> somehow. I
> >> can't think
> >> > >     > > of
> >> > >     > > > a
> >> > >     > > > > >    better option, and given all the operators have
> >> "x_conn_id" I
> >> > >     > > don't
> >> > >     > > > > >    think that change should be made lightly.
> >> > >     > > > > >
> >> > >     > > > > >    - make conn_id unique (this "poor" HA has been a
> >> source of
> >> > >     > > > > > confusion in
> >> > >     > > > > >    the past) and the ID we use in the API
> >> > >     > > > > >
> >> > >     > > > > >    Or a third option:
> >> > >     > > > > >
> >> > >     > > > > >    - Have the API take conn_id and either return all
> >> conns for that
> >> > >     > > > > >    conn_id, or conn_id plus an ordinal (?idx=0 type
> >> thing) to
> >> > >     > return
> >> > >     > > a
> >> > >     > > > > >    single value.
> >> > >     > > > > >
> >> > >     > > > > >    On Apr 8 2020, at 5:42 pm, Ash Berlin-Taylor <
> >> [email protected]>
> >> > >     > > > wrote:
> >> > >     > > > > >
> >> > >     > > > > >    > Hi everyone,
> >> > >     > > > > >    >
> >> > >     > > > > >    > So as I mentioned in the AIP voting thread, I think
> >> we need to
> >> > >     > > > give
> >> > >     > > > > >    > some more thought to how we are exposing connection
> >> ids in the
> >> > >     > > > API.
> >> > >     > > > > >    >
> >> > >     > > > > >    > Right now as proposed (and merged without approval,
> >> not cool.
> >> > >     > > The
> >> > >     > > > AIP
> >> > >     > > > > >    > we voted on did not contain a PR against
> >> apache/airflow.) it
> >> > >     > has
> >> > >     > > > an
> >> > >     > > > > >    > end point of `/connections/{connection_id} `
> >> > >     > > > > >    >
> >> > >     > > > > >    > My issue here is as I said in the previous thread:
> >> that is
> >> > >     > going
> >> > >     > > > > > to be
> >> > >     > > > > >    > mightly confusing to our users because there
> >> is a
> >> "conn_id"
> >> > >     > > > concept
> >> > >     > > > > >    > that is exposed, so people are going to try putting
> >> > >     > > "aws_default"
> >> > >     > > > etc
> >> > >     > > > > >    > in there. I don't care what the API spec says, I
> >> care what our
> >> > >     > > > users
> >> > >     > > > > >    > expect, and having a connection_id/id and a conn_id
> >> fields is
> >> > >     > > > > > just confusing
> >> > >     > > > > >    >
> >> > >     > > > > >    > So I think we need to do one of:
> >> > >     > > > > >    > - we need to update the name of "conn_id" somehow,
> >> make
> >> > >     > conn_id
> >> > >     > > > unique
> >> > >     > > > > >    > (this "poor" HA has been a source of confusion in
> >> the past)
> >> > >     > > > > >    >
> >> > >     > > > > >    > There are similar problems for the DAG run -- the
> >> spec has the
> >> > >     > > > > > type as
> >> > >     > > > > >    > an integer, but everything else about the Airflow
> >> system uses
> >> > >     > > the
> >> > >     > > > > >    > (unique) run_id parameter, and I would expect the
> >> API to use
> >> > >     > > > > > that. The
> >> > >     > > > > >    > autoinc. id on the run column is literally never
> >> used in the
> >> > >     > > code
> >> > >     > > > > >    > base, so exposing that via the API seems odd.
> >> > >     > > > > >    >
> >> > >     > > > > >    > A few other smaller points:
> >> > >     > > > > >    >
> >> > >     > > > > >    > EventLog: Those are "audit"/action logs, so we
> >> probably
> >> > >     > > shouldn't
> >> > >     > > > let
> >> > >     > > > > >    > people delete them via the API!
> >> > >     > > > > >    >
> >> > >     > > > > >    > pool_id: still an integer. It should be the "name".
> >> > >     > > > > >    >
> >> > >     > > > > >    > How should add/delete variables and connections
> >> work in the
> >> > >     > API
> >> > >     > > > with
> >> > >     > > > > >    > the addition of the new "Secrets Backends"?
> >> > >     > > > > >    >
> >> > >     > > > > >    > xcomValues: task_id is listed as an integer.
> >> > >     > > > > >    >
> >> > >     > > > > >    >
> >> > >     > > > > >    > -ash
> >> > >     > > > > >
> >> > >     > > > > >
> >> > >     > > > > >
> >> > >     > > > > >
> >> > >     > > > > >
> >> > >     > > > > >
> >> > >     > > > >
> >> > >     > > >
> >> > >     > >
> >> > >     >
> >> ===============================================================================
> >> > >     > > > > > Please access the attached hyperlink for an important
> >> electronic
> >> > >     > > > > > communications disclaimer:
> >> > >     > > > > >
> >> https://kor01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.credit-suisse.com%2Flegal%2Fen%2Fdisclaimer_email_ib.html&amp;data=02%7C01%7Clamblin%40coupang.com%7Cd8716c2432c54849f3d708d7e08b68f2%7Ce3098f96361b47c6a9f4ab7bafcaffe9%7C0%7C0%7C637224761681223746&amp;sdata=6e9Ck952oFmXDkoDYnM0XgCWsplYb7Dua37BkAPt22A%3D&amp;reserved=0
> >> > >     > > > > >
> >> > >     > > > >
> >> > >     > > >
> >> > >     > >
> >> > >     >
> >> ===============================================================================
> >> > >     > > > > >
> >> > >     > > >
> >> > >     > >
> >> > >     > >
> >> > >     > > --
> >> > >     > >
> >> > >     > > Jarek Potiuk
> >> > >     > > Polidea <
> >> https://kor01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.polidea.com%2F&amp;data=02%7C01%7Clamblin%40coupang.com%7Cd8716c2432c54849f3d708d7e08b68f2%7Ce3098f96361b47c6a9f4ab7bafcaffe9%7C0%7C0%7C637224761681223746&amp;sdata=tMOZFKjLGoXh9eAVW5rXhmJeJw8kYhlWAfwFICN%2Fpfw%3D&amp;reserved=0>
> >> | Principal Software Engineer
> >> > >     > >
> >> > >     > > M: +48 660 796 129 <+48660796129>
> >> > >     > > [image: Polidea] <
> >> https://kor01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.polidea.com%2F&amp;data=02%7C01%7Clamblin%40coupang.com%7Cd8716c2432c54849f3d708d7e08b68f2%7Ce3098f96361b47c6a9f4ab7bafcaffe9%7C0%7C0%7C637224761681223746&amp;sdata=tMOZFKjLGoXh9eAVW5rXhmJeJw8kYhlWAfwFICN%2Fpfw%3D&amp;reserved=0
> >> >
> >> > >     > >
> >> > >     >
> >> > >
> >> > >
> >>
> >
> >
> > --
> >
> > Jarek Potiuk
> > Polidea <https://www.polidea.com/> | Principal Software Engineer
> >
> > M: +48 660 796 129 <+48660796129>
> > [image: Polidea] <https://www.polidea.com/>
> >

Re: API spec questions

Reply via email to