Hello, It depends on the specific implementation of the Batch API. The Microsoft API can be used with requests. https://docs.microsoft.com/en-us/graph/json-batching
I would not like to add new endpoints to the API due to premature optimization. Adding a new endpoint - /dags/{dag_id}/dagRuns/lisst/ s opens up many questions. 1. We currently have an endpoint that has a similar URL - /dags/{dag_id}/dagRuns/{execution_date}/ so we have a collision. /dags/{dag_id}/dagRuns/list/ can be interpreter as two way: 1. /dags/{dag_id}/dagRuns/list/ 2. /dags/{dag_id}/dagRuns/{execution_date}/ with execution_date = "list" First, the endpoint URL pattern is matched, and then the parameters are verified in the next step. 2. This will also make it difficult to use the API, since DAG_ID will appear in several places, which can cause confusion First in the URL with the wildcard - "~" (dag_id) and the second time in body requests (dag_iids). POST /dags/{dag_id}/dagRuns/list/ dag_ids=DAG_A&dag_ids=DAG_B&&dag_ids=DAG_B 3. We will have two endpoints that will have similar behavior. GET /dags/{dag_id}/dagRuns/ POST /dags/{dag_id}/dagRuns/list/ This will cause confusion among users. Are there any reasons why you should add the dag_ids filter in addition to performance? Currently, the user can realize his use case and this is probably the most important thing. Best regards On Tue, May 12, 2020 at 6:34 PM Ash Berlin-Taylor <a...@apache.org> wrote: > > Such a batch endpoint is much much harder for API clients to build requests > for, and consume (you can no longer just use cURL/requests/any http client), > so I'm not a fan of that > > On 12 May 2020 17:07:12 BST, "Kamil Breguła" <kamil.breg...@polidea.com> > wrote: >> >> On Tue, May 12, 2020 at 3:49 PM Jarek Potiuk <jarek.pot...@polidea.com> >> wrote: >>> >>> >>> My 3 cents: >>> >>> >>>> But on reading Google's https://aip.dev/159 that now makes more sense, >>>> and that isn't what you were suggesting, but instaed a single, litteral >>>> `-` to mean "any dag id". Is this correct? >>>> >>> >>> That's also my understanding. >>> >>> >>>> So I think then my only ask is that we have a `dag_ids` parameter that >>>> we can use to filter on :) >>>> >>> >>> Yep. This is definitely needed. >>> >>> Additionally/related when selecting across multiple DAGs we would very >>>> >>>> quickly run in to that was fixed in github.com/apache/airflow/pull/7364 >>>> -- where asking for all the task stats for the DAGs on a single page >>>> exceeded the HTTP request line limit. >>>> >>>> i.e. this could break if the DAG ids are large, or if they is a large >>>> number of dags to be fetched. >>>> >>>> GET >>>> >>>> /dags/-/dagRuns/2020-01-01T00/taskInstances?dag_ids=...&dag_ids=...&dag_ids=... >>>> >>>> How about this as an additional endpoint: >>>> >>>> POST /dags/-/dagRuns/2020-01-01/taskInstances/list >>>> >>>> dag_ids=...&dag_ids=...&dag_ids... >>>> >>> >>> Very sensible proposal. The URL max length very much depends on the client >>> (especially when it is run from browser). where POST is virtually >>> unlimited. It's common thing to use POST in this case. >>> >> >> I'm not sure ... I think the following conventions are most commonly used >> GET /items/ - fetch list of items >> POST /items/ - create item >> >> Google recommends using the batch API endpoint in these cases. This >> involves sending a single HTTP request using the POST method, but the >> request body has many HTTP requests for multiple collections. >> >> POST /batch/v1 HTTP/1.1 >> Authorization: Bearer your_auth_token >> Content-Type: multipart/mixed; boundary=batch_foobarbaz >> Content-Length: total_content_length >> >> --batch_foobarbaz >> Content-Type: application/http >> Content-ID: <item1:12930...@barnyard.example.com> >> >> GET /dag/DAG_A/taskInstance/ >> >> --batch_foobarbaz-- >> >> GET /dag/DAG_B/taskInstance/ >> >> --batch_foobarbaz-- >> >> GET /dag/DAG_B/taskInstance/ >> >> --batch_foobarbaz-- >> >> Here is more information: >> https://developers.google.com/gmail/api/guides/batch >> >> We can, of course, come up with our approach and not rely on the idea >> of Google solutions. I refer to Google because I have experience with >> API and design principles. I would like this API to be permanent and >> there was no need to introduce breaking changes. It is always possible >> to add filtering and endpoint later, but removing it is a breaking >> change. >> >> Kubernetes doesn't provide batch API and the user build their >> solutions without it. >> >> I still looked at how Microsoft implements this. >> https://docs.microsoft.com/en-us/graph/json-batching >> >> I will only quote a fragment that can be valuable to us. This only >> convinces me that if we don't have Batch API then we should carefully >> add the filtering option in the query string. >> >>> Bypassing URL length limitations with batching >>> >>> An additional use case for JSON batching is to bypass URL length >>> limitations. In cases >>> where the filter clause is complex, the URL length might surpass >>> limitations built into >>> browsers or other HTTP clients. You can use >>> JSON batching as a workaround for running these requests because the >>> lengthy URL simply becomes part of the request payload. >> >> >> >> >> >>> >>>> >>>> (The /list suffix isn't needed here as there is no POST for >>>> taskInstances, but there is in other places, so making it explicit means >>>> we can use the same pattern for, say `POST /dags/-/dagRuns/list`) >>>> >>>> >>> >>>> >>>> If it is valid to pass just one of ExecutionDateStartRange and >>>> ExecutionDateEndRange could we call it ExecutionDateGTE etc. (for >>>> Greater-than-or-equal) so it makes more sense when a single parameter is >>>> providerd. >>>> >>>> +1 >>> >>> >>> >>>> A minor style point: The filter/query string fields are all defined as >>>> BumpyCase, but I'd prefer us to use snake_case (`execution_date_gte` >>>> etc) - the only sort of applicable proposal I can find is >>>> https://aip.dev/140 >>>> >>>> >>> Yeah. In this case I also like snakes better than Bumpys or Camels. >>> Especially python ;). >>> >>> >>>> (As in I can't find anything in Google recs that talk about case for >>>> case of query string/filter params. I guess they suggest having it all >>>> as a single `?filter=` param :- https://aip.dev/160 ) >>>> >>> >>> Yeah. I think I'd also want to understand better how filter examples >>> woudl look like. It's not obvious from the spec (no time to dig deeply). >>> >>> J.