1. list is not even close to a valid execution date, so I don't see that as a clash.
2. Respond with 400 Bad Request if URL dag id is not whatever wildcard char we pick. (Is ~ URL safe?) when dag_ids query param is provided. 3. Docs is the answer to that. "This is the same as GET, but allows for longer request bodies" Reason I'd like such a filter is that without it out request requires N http requests and N db queries instead of 1. Even doing it as batch requests would still need N queries. On 12 May 2020 18:00:50 BST, "Kamil Breguła" <kamil.breg...@polidea.com> wrote: >Hello, > >It depends on the specific implementation of the Batch API. The >Microsoft API can be used with requests. >https://docs.microsoft.com/en-us/graph/json-batching > >I would not like to add new endpoints to the API due to premature >optimization. Adding a new endpoint - /dags/{dag_id}/dagRuns/lisst/ >s opens up many questions. >1. We currently have an endpoint that has a similar URL - >/dags/{dag_id}/dagRuns/{execution_date}/ so we have a collision. >/dags/{dag_id}/dagRuns/list/ can be interpreter as two way: >1. /dags/{dag_id}/dagRuns/list/ >2. /dags/{dag_id}/dagRuns/{execution_date}/ with execution_date = >"list" >First, the endpoint URL pattern is matched, and then the parameters >are verified in the next step. > >2. This will also make it difficult to use the API, since DAG_ID will >appear in several places, which can cause confusion First in the URL >with the wildcard - "~" (dag_id) and the second time in body requests >(dag_iids). >POST /dags/{dag_id}/dagRuns/list/ > >dag_ids=DAG_A&dag_ids=DAG_B&&dag_ids=DAG_B > >3. We will have two endpoints that will have similar behavior. >GET /dags/{dag_id}/dagRuns/ >POST /dags/{dag_id}/dagRuns/list/ >This will cause confusion among users. > >Are there any reasons why you should add the dag_ids filter in >addition to performance? Currently, the user can realize his use case >and this is probably the most important thing. > >Best regards > >On Tue, May 12, 2020 at 6:34 PM Ash Berlin-Taylor <a...@apache.org> >wrote: >> >> Such a batch endpoint is much much harder for API clients to build >requests for, and consume (you can no longer just use cURL/requests/any >http client), so I'm not a fan of that >> >> On 12 May 2020 17:07:12 BST, "Kamil Breguła" ><kamil.breg...@polidea.com> wrote: >>> >>> On Tue, May 12, 2020 at 3:49 PM Jarek Potiuk ><jarek.pot...@polidea.com> wrote: >>>> >>>> >>>> My 3 cents: >>>> >>>> >>>>> But on reading Google's https://aip.dev/159 that now makes more >sense, >>>>> and that isn't what you were suggesting, but instaed a single, >litteral >>>>> `-` to mean "any dag id". Is this correct? >>>>> >>>> >>>> That's also my understanding. >>>> >>>> >>>>> So I think then my only ask is that we have a `dag_ids` parameter >that >>>>> we can use to filter on :) >>>>> >>>> >>>> Yep. This is definitely needed. >>>> >>>> Additionally/related when selecting across multiple DAGs we would >very >>>>> >>>>> quickly run in to that was fixed in >github.com/apache/airflow/pull/7364 >>>>> -- where asking for all the task stats for the DAGs on a single >page >>>>> exceeded the HTTP request line limit. >>>>> >>>>> i.e. this could break if the DAG ids are large, or if they is a >large >>>>> number of dags to be fetched. >>>>> >>>>> GET >>>>> >/dags/-/dagRuns/2020-01-01T00/taskInstances?dag_ids=...&dag_ids=...&dag_ids=... >>>>> >>>>> How about this as an additional endpoint: >>>>> >>>>> POST /dags/-/dagRuns/2020-01-01/taskInstances/list >>>>> >>>>> dag_ids=...&dag_ids=...&dag_ids... >>>>> >>>> >>>> Very sensible proposal. The URL max length very much depends on >the client >>>> (especially when it is run from browser). where POST is virtually >>>> unlimited. It's common thing to use POST in this case. >>>> >>> >>> I'm not sure ... I think the following conventions are most commonly >used >>> GET /items/ - fetch list of items >>> POST /items/ - create item >>> >>> Google recommends using the batch API endpoint in these cases. This >>> involves sending a single HTTP request using the POST method, but >the >>> request body has many HTTP requests for multiple collections. >>> >>> POST /batch/v1 HTTP/1.1 >>> Authorization: Bearer your_auth_token >>> Content-Type: multipart/mixed; boundary=batch_foobarbaz >>> Content-Length: total_content_length >>> >>> --batch_foobarbaz >>> Content-Type: application/http >>> Content-ID: <item1:12930...@barnyard.example.com> >>> >>> GET /dag/DAG_A/taskInstance/ >>> >>> --batch_foobarbaz-- >>> >>> GET /dag/DAG_B/taskInstance/ >>> >>> --batch_foobarbaz-- >>> >>> GET /dag/DAG_B/taskInstance/ >>> >>> --batch_foobarbaz-- >>> >>> Here is more information: >https://developers.google.com/gmail/api/guides/batch >>> >>> We can, of course, come up with our approach and not rely on the >idea >>> of Google solutions. I refer to Google because I have experience >with >>> API and design principles. I would like this API to be permanent >and >>> there was no need to introduce breaking changes. It is always >possible >>> to add filtering and endpoint later, but removing it is a breaking >>> change. >>> >>> Kubernetes doesn't provide batch API and the user build their >>> solutions without it. >>> >>> I still looked at how Microsoft implements this. >>> https://docs.microsoft.com/en-us/graph/json-batching >>> >>> I will only quote a fragment that can be valuable to us. This only >>> convinces me that if we don't have Batch API then we should >carefully >>> add the filtering option in the query string. >>> >>>> Bypassing URL length limitations with batching >>>> >>>> An additional use case for JSON batching is to bypass URL length >limitations. In cases >>>> where the filter clause is complex, the URL length might surpass >limitations built into >>>> browsers or other HTTP clients. You can use >>>> JSON batching as a workaround for running these requests because >the >>>> lengthy URL simply becomes part of the request payload. >>> >>> >>> >>> >>> >>>> >>>>> >>>>> (The /list suffix isn't needed here as there is no POST for >>>>> taskInstances, but there is in other places, so making it >explicit means >>>>> we can use the same pattern for, say `POST /dags/-/dagRuns/list`) >>>>> >>>>> >>>> >>>>> >>>>> If it is valid to pass just one of ExecutionDateStartRange and >>>>> ExecutionDateEndRange could we call it ExecutionDateGTE etc. (for >>>>> Greater-than-or-equal) so it makes more sense when a single >parameter is >>>>> providerd. >>>>> >>>>> +1 >>>> >>>> >>>> >>>>> A minor style point: The filter/query string fields are all >defined as >>>>> BumpyCase, but I'd prefer us to use snake_case >(`execution_date_gte` >>>>> etc) - the only sort of applicable proposal I can find is >>>>> https://aip.dev/140 >>>>> >>>>> >>>> Yeah. In this case I also like snakes better than Bumpys or >Camels. >>>> Especially python ;). >>>> >>>> >>>>> (As in I can't find anything in Google recs that talk about case >for >>>>> case of query string/filter params. I guess they suggest having >it all >>>>> as a single `?filter=` param :- https://aip.dev/160 ) >>>>> >>>> >>>> Yeah. I think I'd also want to understand better how filter >examples >>>> woudl look like. It's not obvious from the spec (no time to dig >deeply). >>>> >>>> J.