Re: API spec questions

Ash Berlin-Taylor Tue, 12 May 2020 09:35:24 -0700

Such a batch endpoint is much much harder for API clients to build requests 
for, and consume (you can no longer just use cURL/requests/any http client), so 
I'm not a fan of that


On 12 May 2020 17:07:12 BST, "Kamil Breguła" <[email protected]> wrote:
>On Tue, May 12, 2020 at 3:49 PM Jarek Potiuk <[email protected]>
>wrote:
>>
>> My 3 cents:
>>
>>
>> > But on reading Google's https://aip.dev/159 that now makes more
>sense,
>> > and that isn't what you were suggesting, but instaed a single,
>litteral
>> > `-` to mean "any dag id". Is this correct?
>> >
>>
>> That's also my understanding.
>>
>>
>> > So I think then my only ask is that we have a `dag_ids` parameter
>that
>> > we can use to filter on :)
>> >
>>
>> Yep. This is definitely needed.
>>
>> Additionally/related when selecting across multiple DAGs we would
>very
>> > quickly run in to that was fixed in
>github.com/apache/airflow/pull/7364
>> > -- where asking for all the task stats for the DAGs on a single
>page
>> > exceeded the HTTP request line limit.
>> >
>> > i.e. this could break if the DAG ids are large, or if they is a
>large
>> > number of dags to be fetched.
>> >
>> >   GET
>> >
>/dags/-/dagRuns/2020-01-01T00/taskInstances?dag_ids=...&dag_ids=...&dag_ids=...
>> >
>> > How about this as an additional endpoint:
>> >
>> >   POST /dags/-/dagRuns/2020-01-01/taskInstances/list
>> >
>> >   dag_ids=...&dag_ids=...&dag_ids...
>> >
>>
>> Very sensible proposal. The URL max length very much depends on the
>client
>> (especially when it is run from browser). where POST is virtually
>> unlimited. It's common thing to use POST in this case.
>>
>
>I'm not sure ... I think the following conventions are most commonly
>used
>GET /items/ - fetch list of items
>POST /items/ - create item
>
>Google recommends using the batch API endpoint in these cases. This
>involves sending a single HTTP request using the POST method, but the
>request body has many HTTP requests for multiple collections.
>
>POST /batch/v1 HTTP/1.1
>Authorization: Bearer your_auth_token
>Content-Type: multipart/mixed; boundary=batch_foobarbaz
>Content-Length: total_content_length
>
>--batch_foobarbaz
>Content-Type: application/http
>Content-ID: <item1:[email protected]>
>
>GET /dag/DAG_A/taskInstance/
>
>--batch_foobarbaz--
>
>GET /dag/DAG_B/taskInstance/
>
>--batch_foobarbaz--
>
>GET /dag/DAG_B/taskInstance/
>
>--batch_foobarbaz--
>
>Here is more information:
>https://developers.google.com/gmail/api/guides/batch
>
>We can, of course, come up with our approach and not rely on the idea
>of Google solutions. I refer to Google because I have experience with
>API and design principles.   I would like this API to be permanent and
>there was no need to introduce breaking changes. It is always possible
>to add filtering and endpoint later, but removing it is a breaking
>change.
>
>Kubernetes doesn't provide batch API and the user build their
>solutions without it.
>
>I still looked at how Microsoft implements this.
>https://docs.microsoft.com/en-us/graph/json-batching
>
>I will only quote a fragment that can be valuable to us. This only
>convinces me that if we don't have Batch API then we should carefully
>add the filtering option in the query string.
>
>> Bypassing URL length limitations with batching
>>
>> An additional use case for JSON batching is to bypass URL length
>limitations. In cases
>> where the filter clause is complex, the URL length might surpass
>limitations built into
>> browsers or other HTTP clients. You can use
>> JSON batching as a workaround for running these requests because the
>> lengthy URL simply becomes part of the request payload.
>
>
>
>
>>
>> >
>> > (The /list suffix isn't needed here as there is no POST for
>> > taskInstances, but there is in other places, so making it explicit
>means
>> > we can use the same pattern for, say `POST /dags/-/dagRuns/list`)
>> >
>> >
>>
>> >
>> > If it is valid to pass just one of ExecutionDateStartRange and
>> > ExecutionDateEndRange could we call it ExecutionDateGTE etc. (for
>> > Greater-than-or-equal) so it makes more sense when a single
>parameter is
>> > providerd.
>> >
>> > +1
>>
>>
>> > A minor style point: The filter/query string fields are all defined
>as
>> > BumpyCase, but I'd prefer us to use snake_case
>(`execution_date_gte`
>> > etc) - the only sort of applicable proposal I can find is
>> > https://aip.dev/140
>> >
>> >
>> Yeah. In this case I also like snakes better than Bumpys or Camels.
>> Especially python ;).
>>
>>
>> > (As in I can't find anything in Google recs that talk about case
>for
>> > case of query string/filter params. I guess they suggest having it
>all
>> > as a single `?filter=` param :- https://aip.dev/160 )
>> >
>>
>> Yeah. I think I'd also want to understand better how filter examples
>> woudl look like. It's not obvious from the spec (no time to dig
>deeply).
>>
>> J.

Re: API spec questions

Reply via email to