Re: API spec questions

Ash Berlin-Taylor Tue, 12 May 2020 10:39:22 -0700

1. list is not even close to a valid execution date, so I don't see that as a 
clash.


2. Respond with 400 Bad Request if URL dag id is not whatever wildcard char we 
pick. (Is ~ URL safe?) when dag_ids query param is provided.

3. Docs is the answer to that. "This is the same as GET, but allows for longer 
request bodies"

Reason I'd like such a filter is that without it out request requires N http 
requests and N db queries instead of 1. Even doing it as batch requests would 
still need N queries.

On 12 May 2020 18:00:50 BST, "Kamil Breguła" <kamil.breg...@polidea.com> wrote:
>Hello,
>
>It depends on the specific implementation of the Batch API. The
>Microsoft API can be used with requests.
>https://docs.microsoft.com/en-us/graph/json-batching
>
>I would not like to add new endpoints to the API due to premature
>optimization.  Adding a new endpoint - /dags/{dag_id}/dagRuns/lisst/
>s opens up many questions.
>1. We currently have an endpoint that has a similar URL -
>/dags/{dag_id}/dagRuns/{execution_date}/  so we have a collision.
>/dags/{dag_id}/dagRuns/list/ can be interpreter as two way:
>1. /dags/{dag_id}/dagRuns/list/
>2. /dags/{dag_id}/dagRuns/{execution_date}/ with execution_date =
>"list"
>First, the endpoint URL pattern is matched, and then the parameters
>are verified in the next step.
>
>2. This will also make it difficult to use the API, since DAG_ID will
>appear in several places, which can cause confusion First in the URL
>with the wildcard - "~" (dag_id) and the second time in body requests
>(dag_iids).
>POST /dags/{dag_id}/dagRuns/list/
>
>dag_ids=DAG_A&dag_ids=DAG_B&&dag_ids=DAG_B
>
>3.  We will have two endpoints that will have similar behavior.
>GET /dags/{dag_id}/dagRuns/
>POST /dags/{dag_id}/dagRuns/list/
>This will cause confusion among users.
>
>Are there any reasons why you should add the dag_ids filter in
>addition to performance?  Currently, the user can realize his use case
>and this is probably the most important thing.
>
>Best regards
>
>On Tue, May 12, 2020 at 6:34 PM Ash Berlin-Taylor <a...@apache.org>
>wrote:
>>
>> Such a batch endpoint is much much harder for API clients to build
>requests for, and consume (you can no longer just use cURL/requests/any
>http client), so I'm not a fan of that
>>
>> On 12 May 2020 17:07:12 BST, "Kamil Breguła"
><kamil.breg...@polidea.com> wrote:
>>>
>>> On Tue, May 12, 2020 at 3:49 PM Jarek Potiuk
><jarek.pot...@polidea.com> wrote:
>>>>
>>>>
>>>>  My 3 cents:
>>>>
>>>>
>>>>>  But on reading Google's https://aip.dev/159 that now makes more
>sense,
>>>>>  and that isn't what you were suggesting, but instaed a single,
>litteral
>>>>>  `-` to mean "any dag id". Is this correct?
>>>>>
>>>>
>>>>  That's also my understanding.
>>>>
>>>>
>>>>>  So I think then my only ask is that we have a `dag_ids` parameter
>that
>>>>>  we can use to filter on :)
>>>>>
>>>>
>>>>  Yep. This is definitely needed.
>>>>
>>>>  Additionally/related when selecting across multiple DAGs we would
>very
>>>>>
>>>>>  quickly run in to that was fixed in
>github.com/apache/airflow/pull/7364
>>>>>  -- where asking for all the task stats for the DAGs on a single
>page
>>>>>  exceeded the HTTP request line limit.
>>>>>
>>>>>  i.e. this could break if the DAG ids are large, or if they is a
>large
>>>>>  number of dags to be fetched.
>>>>>
>>>>>    GET
>>>>> 
>/dags/-/dagRuns/2020-01-01T00/taskInstances?dag_ids=...&dag_ids=...&dag_ids=...
>>>>>
>>>>>  How about this as an additional endpoint:
>>>>>
>>>>>    POST /dags/-/dagRuns/2020-01-01/taskInstances/list
>>>>>
>>>>>    dag_ids=...&dag_ids=...&dag_ids...
>>>>>
>>>>
>>>>  Very sensible proposal. The URL max length very much depends on
>the client
>>>>  (especially when it is run from browser). where POST is virtually
>>>>  unlimited. It's common thing to use POST in this case.
>>>>
>>>
>>> I'm not sure ... I think the following conventions are most commonly
>used
>>> GET /items/ - fetch list of items
>>> POST /items/ - create item
>>>
>>> Google recommends using the batch API endpoint in these cases. This
>>> involves sending a single HTTP request using the POST method, but
>the
>>> request body has many HTTP requests for multiple collections.
>>>
>>> POST /batch/v1 HTTP/1.1
>>> Authorization: Bearer your_auth_token
>>> Content-Type: multipart/mixed; boundary=batch_foobarbaz
>>> Content-Length: total_content_length
>>>
>>> --batch_foobarbaz
>>> Content-Type: application/http
>>> Content-ID: <item1:12930...@barnyard.example.com>
>>>
>>> GET /dag/DAG_A/taskInstance/
>>>
>>> --batch_foobarbaz--
>>>
>>> GET /dag/DAG_B/taskInstance/
>>>
>>> --batch_foobarbaz--
>>>
>>> GET /dag/DAG_B/taskInstance/
>>>
>>> --batch_foobarbaz--
>>>
>>> Here is more information:
>https://developers.google.com/gmail/api/guides/batch
>>>
>>> We can, of course, come up with our approach and not rely on the
>idea
>>> of Google solutions. I refer to Google because I have experience
>with
>>> API and design principles.   I would like this API to be permanent
>and
>>> there was no need to introduce breaking changes. It is always
>possible
>>> to add filtering and endpoint later, but removing it is a breaking
>>> change.
>>>
>>> Kubernetes doesn't provide batch API and the user build their
>>> solutions without it.
>>>
>>> I still looked at how Microsoft implements this.
>>> https://docs.microsoft.com/en-us/graph/json-batching
>>>
>>> I will only quote a fragment that can be valuable to us. This only
>>> convinces me that if we don't have Batch API then we should
>carefully
>>> add the filtering option in the query string.
>>>
>>>>  Bypassing URL length limitations with batching
>>>>
>>>>  An additional use case for JSON batching is to bypass URL length
>limitations. In cases
>>>>  where the filter clause is complex, the URL length might surpass
>limitations built into
>>>>  browsers or other HTTP clients. You can use
>>>>  JSON batching as a workaround for running these requests because
>the
>>>>  lengthy URL simply becomes part of the request payload.
>>>
>>>
>>>
>>>
>>>
>>>>
>>>>>
>>>>>  (The /list suffix isn't needed here as there is no POST for
>>>>>  taskInstances, but there is in other places, so making it
>explicit means
>>>>>  we can use the same pattern for, say `POST /dags/-/dagRuns/list`)
>>>>>
>>>>>
>>>>
>>>>>
>>>>>  If it is valid to pass just one of ExecutionDateStartRange and
>>>>>  ExecutionDateEndRange could we call it ExecutionDateGTE etc. (for
>>>>>  Greater-than-or-equal) so it makes more sense when a single
>parameter is
>>>>>  providerd.
>>>>>
>>>>>  +1
>>>>
>>>>
>>>>
>>>>>  A minor style point: The filter/query string fields are all
>defined as
>>>>>  BumpyCase, but I'd prefer us to use snake_case
>(`execution_date_gte`
>>>>>  etc) - the only sort of applicable proposal I can find is
>>>>>  https://aip.dev/140
>>>>>
>>>>>
>>>>  Yeah. In this case I also like snakes better than Bumpys or
>Camels.
>>>>  Especially python ;).
>>>>
>>>>
>>>>>  (As in I can't find anything in Google recs that talk about case
>for
>>>>>  case of query string/filter params. I guess they suggest having
>it all
>>>>>  as a single `?filter=` param :- https://aip.dev/160 )
>>>>>
>>>>
>>>>  Yeah. I think I'd also want to understand better how filter
>examples
>>>>  woudl look like. It's not obvious from the spec (no time to dig
>deeply).
>>>>
>>>>  J.

Re: API spec questions

Reply via email to