Re: API spec questions

Kamil Breguła Tue, 12 May 2020 10:01:22 -0700

Hello,

It depends on the specific implementation of the Batch API. The
Microsoft API can be used with requests.
https://docs.microsoft.com/en-us/graph/json-batching
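
For illustration, a request body in that JSON batch shape can be built with
the standard library alone. This is only a sketch: the "requests", "id",
"method" and "url" keys follow the Microsoft format linked above, while
build_batch_body and the Airflow-style paths are hypothetical, since we have
no batch endpoint.

```python
import json


def build_batch_body(dag_ids):
    """Build a Microsoft-style JSON batch body with one GET per DAG.

    The URL paths are hypothetical; they only mimic the shape of our spec.
    """
    batch = {
        "requests": [
            {"id": str(index), "method": "GET", "url": f"/dags/{dag_id}/dagRuns/"}
            for index, dag_id in enumerate(dag_ids, start=1)
        ]
    }
    return json.dumps(batch)


body = build_batch_body(["DAG_A", "DAG_B"])
```

The result is an ordinary JSON document, so any HTTP client that can POST a
string can send it.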


I would not like to add new endpoints to the API, as that would be
premature optimization.  Adding a new endpoint -
/dags/{dag_id}/dagRuns/list/ - opens up many questions.
1. We currently have an endpoint that has a similar URL -
/dags/{dag_id}/dagRuns/{execution_date}/  so we have a collision.
/dags/{dag_id}/dagRuns/list/ can be interpreted in two ways:
   a. /dags/{dag_id}/dagRuns/list/
   b. /dags/{dag_id}/dagRuns/{execution_date}/ with execution_date = "list"
First, the endpoint URL pattern is matched, and then the parameters
are verified in the next step.

2. This will also make the API harder to use, since the DAG ID will
appear in two places, which can cause confusion: first in the URL
with the wildcard - "~" (dag_id) - and a second time in the request
body (dag_ids).
POST /dags/{dag_id}/dagRuns/list/

dag_ids=DAG_A&dag_ids=DAG_B&dag_ids=DAG_B

3.  We will have two endpoints that will have similar behavior.
GET /dags/{dag_id}/dagRuns/
POST /dags/{dag_id}/dagRuns/list/
This will cause confusion among users.
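
The collision from point 1 can be sketched with two regular expressions.
This is not how our routing is implemented; the ROUTES table and the
matching_routes helper are hypothetical and only show that both URL
patterns accept the same path.

```python
import re

# Hypothetical route table: each URL template is written as a regex.
ROUTES = [
    (
        "get_dag_run",
        re.compile(r"^/dags/(?P<dag_id>[^/]+)/dagRuns/(?P<execution_date>[^/]+)/$"),
    ),
    (
        "list_dag_runs",
        re.compile(r"^/dags/(?P<dag_id>[^/]+)/dagRuns/list/$"),
    ),
]


def matching_routes(path):
    """Return the names of all routes whose pattern matches the path."""
    return [name for name, pattern in ROUTES if pattern.match(path)]


# Both routes match, so the router has to pick one by precedence.
print(matching_routes("/dags/DAG_A/dagRuns/list/"))
# prints ['get_dag_run', 'list_dag_runs']
```

Whether "list" is rejected as an execution_date only becomes known after the
route has been chosen, which is the ordering problem described above.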

Are there any reasons to add the dag_ids filter other than
performance?  Currently, the user can already achieve their use case,
and this is probably the most important thing.
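
One way to cover the use case with the existing endpoint is one GET per DAG.
A rough sketch, assuming a client that already knows the DAG ids; the
dag_run_urls helper and the base URL are hypothetical.

```python
from urllib.parse import quote


def dag_run_urls(base_url, dag_ids):
    """Return one dagRuns listing URL per DAG id, percent-encoding each id."""
    return [
        f"{base_url}/dags/{quote(dag_id, safe='')}/dagRuns/"
        for dag_id in dag_ids
    ]


# Hypothetical base URL; each resulting URL is fetched with a plain GET.
urls = dag_run_urls("https://airflow.example.com/api", ["DAG_A", "DAG_B"])
```

Each URL stays short regardless of how many DAGs are requested, at the cost
of more round trips.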

Best regards

On Tue, May 12, 2020 at 6:34 PM Ash Berlin-Taylor <[email protected]> wrote:
>
> Such a batch endpoint is much much harder for API clients to build requests 
> for, and consume (you can no longer just use cURL/requests/any http client), 
> so I'm not a fan of that
>
> On 12 May 2020 17:07:12 BST, "Kamil Breguła" <[email protected]> 
> wrote:
>>
>> On Tue, May 12, 2020 at 3:49 PM Jarek Potiuk <[email protected]> 
>> wrote:
>>>
>>>
>>>  My 3 cents:
>>>
>>>
>>>>  But on reading Google's https://aip.dev/159 that now makes more sense,
>>>>  and that isn't what you were suggesting, but instead a single, literal
>>>>  `-` to mean "any dag id". Is this correct?
>>>>
>>>
>>>  That's also my understanding.
>>>
>>>
>>>>  So I think then my only ask is that we have a `dag_ids` parameter that
>>>>  we can use to filter on :)
>>>>
>>>
>>>  Yep. This is definitely needed.
>>>
>>>>  Additionally/related: when selecting across multiple DAGs we would
>>>>  very quickly run into the problem that was fixed in
>>>>  github.com/apache/airflow/pull/7364 -- where asking for all the task
>>>>  stats for the DAGs on a single page exceeded the HTTP request line
>>>>  limit.
>>>>
>>>>  i.e. this could break if the DAG ids are large, or if there is a large
>>>>  number of dags to be fetched.
>>>>
>>>>    GET /dags/-/dagRuns/2020-01-01T00/taskInstances?dag_ids=...&dag_ids=...&dag_ids=...
>>>>
>>>>  How about this as an additional endpoint:
>>>>
>>>>    POST /dags/-/dagRuns/2020-01-01/taskInstances/list
>>>>
>>>>    dag_ids=...&dag_ids=...&dag_ids...
>>>>
>>>
>>>  Very sensible proposal. The maximum URL length very much depends on the
>>>  client (especially when it is run from a browser), whereas a POST body is
>>>  virtually unlimited. It's a common thing to use POST in this case.
>>>
>>
>> I'm not sure ... I think the following conventions are most commonly used
>> GET /items/ - fetch list of items
>> POST /items/ - create item
>>
>> Google recommends using the batch API endpoint in these cases. This
>> involves sending a single HTTP request using the POST method, but the
>> request body has many HTTP requests for multiple collections.
>>
>> POST /batch/v1 HTTP/1.1
>> Authorization: Bearer your_auth_token
>> Content-Type: multipart/mixed; boundary=batch_foobarbaz
>> Content-Length: total_content_length
>>
>> --batch_foobarbaz
>> Content-Type: application/http
>> Content-ID: <item1:[email protected]>
>>
>> GET /dag/DAG_A/taskInstance/
>>
>> --batch_foobarbaz
>> Content-Type: application/http
>>
>> GET /dag/DAG_B/taskInstance/
>>
>> --batch_foobarbaz
>> Content-Type: application/http
>>
>> GET /dag/DAG_B/taskInstance/
>>
>> --batch_foobarbaz--
>>
>> Here is more information: 
>> https://developers.google.com/gmail/api/guides/batch
>>
>> We can, of course, come up with our own approach and not rely on
>> Google's solutions. I refer to Google because I have experience with
>> their API and design principles.   I would like this API to be stable, so
>> that there is no need to introduce breaking changes. It is always possible
>> to add filtering and endpoints later, but removing them is a breaking
>> change.
>>
>> Kubernetes doesn't provide a batch API, and users build their
>> solutions without it.
>>
>> I also looked at how Microsoft implements this.
>> https://docs.microsoft.com/en-us/graph/json-batching
>>
>> I will only quote a fragment that can be valuable to us. This only
>> convinces me that if we don't have a Batch API, then we should be careful
>> about adding filtering options to the query string.
>>
>>>  Bypassing URL length limitations with batching
>>>
>>>  An additional use case for JSON batching is to bypass URL length
>>>  limitations. In cases where the filter clause is complex, the URL
>>>  length might surpass limitations built into browsers or other HTTP
>>>  clients. You can use JSON batching as a workaround for running these
>>>  requests because the lengthy URL simply becomes part of the request
>>>  payload.
>>
>>
>>
>>
>>
>>>
>>>>
>>>>  (The /list suffix isn't needed here as there is no POST for
>>>>  taskInstances, but there is in other places, so making it explicit means
>>>>  we can use the same pattern for, say `POST /dags/-/dagRuns/list`)
>>>>
>>>>
>>>
>>>>
>>>>  If it is valid to pass just one of ExecutionDateStartRange and
>>>>  ExecutionDateEndRange could we call it ExecutionDateGTE etc. (for
>>>>  Greater-than-or-equal) so it makes more sense when a single parameter is
>>>>  provided.
>>>>
>>>>  +1
>>>
>>>
>>>
>>>>  A minor style point: The filter/query string fields are all defined as
>>>>  BumpyCase, but I'd prefer us to use snake_case (`execution_date_gte`
>>>>  etc) - the only sort of applicable proposal I can find is
>>>>  https://aip.dev/140
>>>>
>>>>
>>>  Yeah. In this case I also like snakes better than Bumpys or Camels.
>>>  Especially python ;).
>>>
>>>
>>>>  (As in I can't find anything in Google recs that talk about case for
>>>>  case of query string/filter params. I guess they suggest having it all
>>>>  as a single `?filter=` param :- https://aip.dev/160 )
>>>>
>>>
>>>  Yeah. I think I'd also want to understand better what filter examples
>>>  would look like. It's not obvious from the spec (no time to dig deeply).
>>>
>>>  J.
