On Tue, May 12, 2020 at 3:49 PM Jarek Potiuk <jarek.pot...@polidea.com> wrote: > > My 3 cents: > > > > But on reading Google's https://aip.dev/159 that now makes more sense, > > and that isn't what you were suggesting, but instaed a single, litteral > > `-` to mean "any dag id". Is this correct? > > > > That's also my understanding. > > > > So I think then my only ask is that we have a `dag_ids` parameter that > > we can use to filter on :) > > > > Yep. This is definitely needed. > > Additionally/related when selecting across multiple DAGs we would very > > quickly run in to that was fixed in github.com/apache/airflow/pull/7364 > > -- where asking for all the task stats for the DAGs on a single page > > exceeded the HTTP request line limit. > > > > i.e. this could break if the DAG ids are large, or if they is a large > > number of dags to be fetched. > > > > GET > > /dags/-/dagRuns/2020-01-01T00/taskInstances?dag_ids=...&dag_ids=...&dag_ids=... > > > > How about this as an additional endpoint: > > > > POST /dags/-/dagRuns/2020-01-01/taskInstances/list > > > > dag_ids=...&dag_ids=...&dag_ids... > > > > Very sensible proposal. The URL max length very much depends on the client > (especially when it is run from browser). where POST is virtually > unlimited. It's common thing to use POST in this case. >
I'm not sure ... I think the following conventions are most commonly used GET /items/ - fetch list of items POST /items/ - create item Google recommends using the batch API endpoint in these cases. This involves sending a single HTTP request using the POST method, but the request body has many HTTP requests for multiple collections. POST /batch/v1 HTTP/1.1 Authorization: Bearer your_auth_token Content-Type: multipart/mixed; boundary=batch_foobarbaz Content-Length: total_content_length --batch_foobarbaz Content-Type: application/http Content-ID: <item1:12930...@barnyard.example.com> GET /dag/DAG_A/taskInstance/ --batch_foobarbaz-- GET /dag/DAG_B/taskInstance/ --batch_foobarbaz-- GET /dag/DAG_B/taskInstance/ --batch_foobarbaz-- Here is more information: https://developers.google.com/gmail/api/guides/batch We can, of course, come up with our approach and not rely on the idea of Google solutions. I refer to Google because I have experience with API and design principles. I would like this API to be permanent and there was no need to introduce breaking changes. It is always possible to add filtering and endpoint later, but removing it is a breaking change. Kubernetes doesn't provide batch API and the user build their solutions without it. I still looked at how Microsoft implements this. https://docs.microsoft.com/en-us/graph/json-batching I will only quote a fragment that can be valuable to us. This only convinces me that if we don't have Batch API then we should carefully add the filtering option in the query string. > Bypassing URL length limitations with batching > > An additional use case for JSON batching is to bypass URL length limitations. > In cases > where the filter clause is complex, the URL length might surpass limitations > built into > browsers or other HTTP clients. You can use > JSON batching as a workaround for running these requests because the > lengthy URL simply becomes part of the request payload. > > > > > (The /list suffix isn't needed here as there is no POST for > > taskInstances, but there is in other places, so making it explicit means > > we can use the same pattern for, say `POST /dags/-/dagRuns/list`) > > > > > > > > > If it is valid to pass just one of ExecutionDateStartRange and > > ExecutionDateEndRange could we call it ExecutionDateGTE etc. (for > > Greater-than-or-equal) so it makes more sense when a single parameter is > > providerd. > > > > +1 > > > > A minor style point: The filter/query string fields are all defined as > > BumpyCase, but I'd prefer us to use snake_case (`execution_date_gte` > > etc) - the only sort of applicable proposal I can find is > > https://aip.dev/140 > > > > > Yeah. In this case I also like snakes better than Bumpys or Camels. > Especially python ;). > > > > (As in I can't find anything in Google recs that talk about case for > > case of query string/filter params. I guess they suggest having it all > > as a single `?filter=` param :- https://aip.dev/160 ) > > > > Yeah. I think I'd also want to understand better how filter examples > woudl look like. It's not obvious from the spec (no time to dig deeply). > > J.