On Tue, May 12, 2020 at 3:49 PM Jarek Potiuk <jarek.pot...@polidea.com> wrote:
>
> My 3 cents:
>
>
> > But on reading Google's https://aip.dev/159 that now makes more sense,
> > and that isn't what you were suggesting, but instaed a single, litteral
> > `-` to mean "any dag id". Is this correct?
> >
>
> That's also my understanding.
>
>
> > So I think then my only ask is that we have a `dag_ids` parameter that
> > we can use to filter on :)
> >
>
> Yep. This is definitely needed.
>
> Additionally/related when selecting across multiple DAGs we would very
> > quickly run in to that was fixed in github.com/apache/airflow/pull/7364
> > -- where asking for all the task stats for the DAGs on a single page
> > exceeded the HTTP request line limit.
> >
> > i.e. this could break if the DAG ids are large, or if they is a large
> > number of dags to be fetched.
> >
> >   GET
> > /dags/-/dagRuns/2020-01-01T00/taskInstances?dag_ids=...&dag_ids=...&dag_ids=...
> >
> > How about this as an additional endpoint:
> >
> >   POST /dags/-/dagRuns/2020-01-01/taskInstances/list
> >
> >   dag_ids=...&dag_ids=...&dag_ids...
> >
>
> Very sensible proposal. The URL max length very much depends on the client
> (especially when it is run from browser). where POST is virtually
> unlimited. It's common thing to use POST in this case.
>

I'm not sure ... I think the following conventions are most commonly used
GET /items/ - fetch list of items
POST /items/ - create item

Google recommends using the batch API endpoint in these cases. This
involves sending a single HTTP request using the POST method, but the
request body has many HTTP requests for multiple collections.

POST /batch/v1 HTTP/1.1
Authorization: Bearer your_auth_token
Content-Type: multipart/mixed; boundary=batch_foobarbaz
Content-Length: total_content_length

--batch_foobarbaz
Content-Type: application/http
Content-ID: <item1:12930...@barnyard.example.com>

GET /dag/DAG_A/taskInstance/

--batch_foobarbaz--

GET /dag/DAG_B/taskInstance/

--batch_foobarbaz--

GET /dag/DAG_B/taskInstance/

--batch_foobarbaz--

Here is more information: https://developers.google.com/gmail/api/guides/batch

We can, of course, come up with our approach and not rely on the idea
of Google solutions. I refer to Google because I have experience with
API and design principles.   I would like this API to be permanent and
there was no need to introduce breaking changes. It is always possible
to add filtering and endpoint later, but removing it is a breaking
change.

Kubernetes doesn't provide batch API and the user build their
solutions without it.

I still looked at how Microsoft implements this.
https://docs.microsoft.com/en-us/graph/json-batching

I will only quote a fragment that can be valuable to us. This only
convinces me that if we don't have Batch API then we should carefully
add the filtering option in the query string.

> Bypassing URL length limitations with batching
>
> An additional use case for JSON batching is to bypass URL length limitations. 
> In cases
> where the filter clause is complex, the URL length might surpass limitations 
> built into
> browsers or other HTTP clients. You can use
> JSON batching as a workaround for running these requests because the
> lengthy URL simply becomes part of the request payload.




>
> >
> > (The /list suffix isn't needed here as there is no POST for
> > taskInstances, but there is in other places, so making it explicit means
> > we can use the same pattern for, say `POST /dags/-/dagRuns/list`)
> >
> >
>
> >
> > If it is valid to pass just one of ExecutionDateStartRange and
> > ExecutionDateEndRange could we call it ExecutionDateGTE etc. (for
> > Greater-than-or-equal) so it makes more sense when a single parameter is
> > providerd.
> >
> > +1
>
>
> > A minor style point: The filter/query string fields are all defined as
> > BumpyCase, but I'd prefer us to use snake_case (`execution_date_gte`
> > etc) - the only sort of applicable proposal I can find is
> > https://aip.dev/140
> >
> >
> Yeah. In this case I also like snakes better than Bumpys or Camels.
> Especially python ;).
>
>
> > (As in I can't find anything in Google recs that talk about case for
> > case of query string/filter params. I guess they suggest having it all
> > as a single `?filter=` param :- https://aip.dev/160 )
> >
>
> Yeah. I think I'd also want to understand better how filter examples
> woudl look like. It's not obvious from the spec (no time to dig deeply).
>
> J.

Reply via email to