Such a batch endpoint is much much harder for API clients to build requests for, and consume (you can no longer just use cURL/requests/any http client), so I'm not a fan of that
On 12 May 2020 17:07:12 BST, "Kamil Breguła" <kamil.breg...@polidea.com> wrote: >On Tue, May 12, 2020 at 3:49 PM Jarek Potiuk <jarek.pot...@polidea.com> >wrote: >> >> My 3 cents: >> >> >> > But on reading Google's https://aip.dev/159 that now makes more >sense, >> > and that isn't what you were suggesting, but instaed a single, >litteral >> > `-` to mean "any dag id". Is this correct? >> > >> >> That's also my understanding. >> >> >> > So I think then my only ask is that we have a `dag_ids` parameter >that >> > we can use to filter on :) >> > >> >> Yep. This is definitely needed. >> >> Additionally/related when selecting across multiple DAGs we would >very >> > quickly run in to that was fixed in >github.com/apache/airflow/pull/7364 >> > -- where asking for all the task stats for the DAGs on a single >page >> > exceeded the HTTP request line limit. >> > >> > i.e. this could break if the DAG ids are large, or if they is a >large >> > number of dags to be fetched. >> > >> > GET >> > >/dags/-/dagRuns/2020-01-01T00/taskInstances?dag_ids=...&dag_ids=...&dag_ids=... >> > >> > How about this as an additional endpoint: >> > >> > POST /dags/-/dagRuns/2020-01-01/taskInstances/list >> > >> > dag_ids=...&dag_ids=...&dag_ids... >> > >> >> Very sensible proposal. The URL max length very much depends on the >client >> (especially when it is run from browser). where POST is virtually >> unlimited. It's common thing to use POST in this case. >> > >I'm not sure ... I think the following conventions are most commonly >used >GET /items/ - fetch list of items >POST /items/ - create item > >Google recommends using the batch API endpoint in these cases. This >involves sending a single HTTP request using the POST method, but the >request body has many HTTP requests for multiple collections. > >POST /batch/v1 HTTP/1.1 >Authorization: Bearer your_auth_token >Content-Type: multipart/mixed; boundary=batch_foobarbaz >Content-Length: total_content_length > >--batch_foobarbaz >Content-Type: application/http >Content-ID: <item1:12930...@barnyard.example.com> > >GET /dag/DAG_A/taskInstance/ > >--batch_foobarbaz-- > >GET /dag/DAG_B/taskInstance/ > >--batch_foobarbaz-- > >GET /dag/DAG_B/taskInstance/ > >--batch_foobarbaz-- > >Here is more information: >https://developers.google.com/gmail/api/guides/batch > >We can, of course, come up with our approach and not rely on the idea >of Google solutions. I refer to Google because I have experience with >API and design principles. I would like this API to be permanent and >there was no need to introduce breaking changes. It is always possible >to add filtering and endpoint later, but removing it is a breaking >change. > >Kubernetes doesn't provide batch API and the user build their >solutions without it. > >I still looked at how Microsoft implements this. >https://docs.microsoft.com/en-us/graph/json-batching > >I will only quote a fragment that can be valuable to us. This only >convinces me that if we don't have Batch API then we should carefully >add the filtering option in the query string. > >> Bypassing URL length limitations with batching >> >> An additional use case for JSON batching is to bypass URL length >limitations. In cases >> where the filter clause is complex, the URL length might surpass >limitations built into >> browsers or other HTTP clients. You can use >> JSON batching as a workaround for running these requests because the >> lengthy URL simply becomes part of the request payload. > > > > >> >> > >> > (The /list suffix isn't needed here as there is no POST for >> > taskInstances, but there is in other places, so making it explicit >means >> > we can use the same pattern for, say `POST /dags/-/dagRuns/list`) >> > >> > >> >> > >> > If it is valid to pass just one of ExecutionDateStartRange and >> > ExecutionDateEndRange could we call it ExecutionDateGTE etc. (for >> > Greater-than-or-equal) so it makes more sense when a single >parameter is >> > providerd. >> > >> > +1 >> >> >> > A minor style point: The filter/query string fields are all defined >as >> > BumpyCase, but I'd prefer us to use snake_case >(`execution_date_gte` >> > etc) - the only sort of applicable proposal I can find is >> > https://aip.dev/140 >> > >> > >> Yeah. In this case I also like snakes better than Bumpys or Camels. >> Especially python ;). >> >> >> > (As in I can't find anything in Google recs that talk about case >for >> > case of query string/filter params. I guess they suggest having it >all >> > as a single `?filter=` param :- https://aip.dev/160 ) >> > >> >> Yeah. I think I'd also want to understand better how filter examples >> woudl look like. It's not obvious from the spec (no time to dig >deeply). >> >> J.