Re: HTTP client API

Tobias Thierer Mon, 31 Oct 2016 11:14:37 -0700

Hi Michael -

thanks a lot for your response! Further comments below.

How would you rank simplicity, versatility and performance as goals for the
HTTP Client API? For example the blocking vs. non-blocking API choices will
largely depend on trade-offs, so I'd like to understand the target
applications and requirements that these trade-offs were based on.

On Fri, Oct 28, 2016 at 12:28 PM, Michael McMahon <
michael.x.mcma...@oracle.com> wrote:

>
> The intention behind those interfaces is primarily to provide a conversion
> between ByteBuffers and higher level objects that users
> are interested in. So, HttpRequest.BodyProcessor generates ByteBuffers for
> output and HttpResponse.ByteBuffer consumes them
> on input. For response bodies, there is this additional implicit
> requirement of needing to do something with the incoming data
> and that is why the option of discarding it exists (the details of that
> API are certainly open to discussion)
>
> On Publisher/Subscriber (Flow) versus InputStream/OutputStream, we wanted
> a non blocking API
>

I can understand the desire to have a non blocking API, especially if
performance is seen as more important for the API than usability. Cronet
(the Chromium networking stack) API [documentation
<https://chromium.googlesource.com/chromium/src/+/lkgr/components/cronet>]
has a

    UrlRequest.read(ByteBuffer) <-> callback.onReadCompleted(UrlRequest,
UrlResponseInfo, ByteBuffer)

call cycle that is not too different from OpenJDK 9 HTTP client's

   subscription.request(1) <-> BodyProcessor.onNext(ByteBuffer)

so the general idea is shared by others (I'm meeting with some Cronet folks
later this week so I hope to gather their input / views soon). Note that in
Cronet, the ByteBuffer is provided as part of each call requesting more
data, whereas in OpenJDK 9 it's returned once BodyProcessor.getBuffer().

At the same time, the first and third goal mentioned in JEP 110
<http://openjdk.java.net/jeps/110> are

   - *Must be easy to use for common cases, including a simple blocking
   mode.*
   -
*A simple and concise API which caters for 80-90% of application needs.  *

Blocking, pull-based APIs tend to be simpler to use because (a) one can
just step through a for loop in a debugger rather than needing breakpoints,
and (b) state can be held on the stack rather than in fields [see related
blog post
<http://code-o-matic.blogspot.co.uk/2011/01/note-on-api-design-call-stack-as-source.html>],
so I'm struggling to follow the trade-offs here. Obviously, blocking APIs
have some disadvantages as well - e.g. context switching overhead in the
case of separate Threads per request. Is there any place other than JEP 110
where I could read up on principles or target applications that drove the
trade-offs for this API?

A concrete example that I tried to prototype was reading an image file
header & body from the network, potentially aborting the request depending
on some metadata in the file header; using the OpenJDK 9 HTTP client API,
this involved a ton of boilerplate. Part of the boilerplate was because I
wasn't sure if onNext() might get a partially filled ByteBuffer that might
cut off the image file header or even an individual int32 value in that
header. Is it guaranteed that the ByteBuffer returned by
BodyProcessor.getBuffer will be full on each call to onNext(ByteBuffer),
except possibly the last? If not, perhaps it should be guaranteed?

Related to this, it appears that the documentation for
Subscription.request(long
n)
<http://cr.openjdk.java.net/~michaelm/httpclient/api.1/java/util/concurrent/Flow.Subscription.html#request-long->
specifies
that *n* specifically *denotes the number of future calls* to onNext()
<http://cr.openjdk.java.net/~michaelm/httpclient/api.1/java/util/concurrent/Flow.Subscriber.html#onNext-T->.
If this wasn't the case, BodyProcessor could have defined *n* to refer to a
number of bytes (delivered as a single onNext() call if they fit into the
supplied ByteBuffer), which might have been a really neat way for the
application to control the sizes of chunks that it gets passed. For
example, an application might want to get the file header in one go, and
then afterwards process some fixed size data record for each call. Oh well.
Do you happen to have an idea for how the application could be more in
control of the chunk sizes?

> and had originally defined
> something that was quite similar to the asynchronous model used by Flow,
> but the Flow types have a nice mechanism for
> managing flow control asynchronously. So, it made sense to use the
> standard API.
>

Regarding managing flow control, I assume you're referring to
subscription.request() which I assume is tied to WINDOW_UPDATE frames in
HTTP 2? Also, I assume that Subscription.cancel() will result in a
RST_STREAM frame being sent to the server, is that correct?

Is there a recommended way for an application to limit stream concurrency (RFC
7540 section 5.1.2 <https://tools.ietf.org/html/rfc7540#section-5.1.2>)?
Example use case: An application that loads and decodes .jpeg images from
the network might be limited by how many decompressed images it can hold in
memory at a time (the application may want to re-compress the images in a
different format, e.g. for file format conversion or to send them to the
GPU in ETC2 texture compression format). Would such an application manually
maintain a limited set of "active" streams, making sure to start calling
request() only for those streams that have become active? (With a blocking
API, an application could simply use a fixed-size thread pool, or could use
a Semaphore to guard access to a limited resource, such as memory).

We're certainly open to suggestions for improved names, but I think it is
> useful to have a way to explicitly discard a response body, especially
> since HTTP/2 has an efficient way to do that.
>

Why can't the response body instead be discarded by calling
subscription.cancel() during the call to onSubscribe() (before the first
ByteBuffer of the body data has been delivered)?

On the question of push versus pull, there seems to be differing
> philosophies around this, but
> actually the publish/subscribe model provided by the Flow types has
> characteristics of both push and pull.
> It is primarily push() as in Subscriber.onNext() delivers the next buffer
> to the subscriber. But, the subscriber
> can effectively control it through the Subscription object.
>

What does "effectively control" means? As far as I can see, the subscriber
can stop calls temporarily by not calling request(), but e.g. it can not
request a minimum number of bytes to deliver (if available before EOF) on
the next call. Is that correct or am I missing something? (See above
question about whether ByteBuffers are completely filled).

There are other benefits to having the body available synchronously with
> the rest of the response object
> and it is still possible to implement arbitrary streaming of response body
> data without having to buffer it.
> A while ago I prototyped a response processor which delivers the body
> through an InputStream.
>

What's the reason that implementation got dropped? FWIW, I suspect that a
lot of applications will be tempted to use
HttpResponse.asByteArray().getBody();
if this happens, it'll be strictly worse than a potential
HttpResponse.asInputStream() because

   - it holds the entire response body in memory as a byte[], and
   - processing only starts after the last byte of the response is received
   from the network (this means that the request can't be aborted part way
   through)

2.) HttpHeaders: I love that there is a type for this abstraction! But:
>
>    -
>
>    Why is the type an interface rather than a concrete, final class?
>    Since this is a pure value type, there doesn’t seem to be much point in
>    allowing alternative implementations?
>
> You could conceive of different implementations, with different tradeoffs
> between parsing early or lazily for instance.
>

What parsing are you referring to? I don't see any parsing logic in
HttpHeadersImpl other than in firstValueAsLong. It does however have logic
that alternative implementations could get wrong / be inconsistent with,
such as

   - whether to use case insensitive comparison for header names
   - whether the Map returned from map() is unmodifiable
   - whether the List returned by allValues() is unmodifiable (it is
   modifiable in the current implementation, but I think that's a bug)
   - the behavior of firstValueAsLong when the value exists but is not
   parseable as a long

Have you considered making HttpHeaders immutable and adding a Builder
implementation? That would make it thread safe at a small cost to the
semantic weight of the API.

>    -
>
>    Do the methods other than map() really pull their weight
>    -
>
>    (provide enough value relative to the semantic API weight that they
>    introduce)?
>    -
>
>       firstValueAsLong() looks particularly suspect: why would anyone
>       care particularly about long values? Especially since the current
>       implementation seems to throw NumberFormatException rather than 
> returning
>       an empty Optional?
>
>
> Numeric header values would be fairly common I think. Long was just chosen
> as the widest possible integer type.
> NumberFormatException is only thrown if the value is present but can't be
> parsed as a number.
>

But then you'd have to document that NFE is thrown, which an application
would have been much less likely to miss if they had called Long.valueOf()
themselves. It's also not clear that throwing NFE, rather than returning an
empty Optional, is the behavior that most applications prefer.

These methods seem to add semantic weight to the API for very little
benefit. Perhaps it'd be better to leave them out, but add them in a future
revision of the API if it turns out that a lot of applications duplicate
exactly this logic?

>    -
>
>    I haven’t made up my mind about whether the existing choices are the
>    right ones / sufficient. Perhaps if this class used the typesafe enum
>    pattern from Effective Java 1st edition rather than being an actual enum,
>    the API would maintain the option in a future version to allow
>    client-configured Redirect policies, allowing Redirect for URLs as long as
>    they are within the same host/domain?
>
>
> As Anthony mentioned, there was a more complicated API originally, but it
> was felt to be too complicated, which
> was why we settled on enums (which can be extended in future).
>

The choice of enums limits the versatility because all enum instances need
to be known at compile time; for example, it'd be impossible to provide a
Redirect policy that allows redirects to a given set of hosts.

The only benefit I can see from using specifically an enum class is that
this gives us a serialization format for free. But a conversion to/from
String values for the standard policy instances (via a valueOf(String)
method or similar) would offer the same at a fairly small cost of semantic
weight.

For example, doesn't the existence of the convenience method
HttpHeaders.firstValueAsLong() have a much smaller benefit, but similar
semantic weight, to e.g. a potential non-enum Redirect.valueOf() method
(and even so, that method could be left to a future API version)?

> 4.) HttpClient.Version:
>
>    -
>
>    Why does a HttpClient need to commit to using one HTTP version or the
>    other? What if an application wants to use HTTP/2 for those servers that
>    support it, but fall back to HTTP/1.1 for those that don’t?
>
>
> Answered.
>

Given that HttpClientBuilder.version(Version) does not actually restrict
the client to only that version, should it perhaps be called
setMaximumVersion(Version), or changed to enableVersion(Version) /
disableVersion(Version), or similar?

> 5.) CookieManager
>
>    -
>
>    Is there a common interface we could add without making the API much
>    more complex to allow us both RFC 2965 (outdated, implemented by
>    CookieManager) and RFC 6265 (new, real world, actually used) cookies? Needs
>    prototyping. I think it’s likely we’ll be able to do something similar to 
> OkHttp’s
>    CookieJar
>    <https://square.github.io/okhttp/3.x/okhttp/okhttp3/CookieJar.html>
>    which can be adapted to RFC 2965 - not 100%, but close enough that most
>    implementations of CookieManager could be reused by the new HTTP API, while
>    still taking advantage of RFC 6265 cookies.
>
>
> One suggestion has been to use the low-level CookieHandler API. But,
> another option would be just to
> evolve CookieManager to the latest RFCs, which sounds a lot like what you
> are suggesting.
>

Not exactly what I was suggesting, but similar and might work. For what
it's worth, even CookieHandler mentions "Cookie2" headers in the
documentation so this would need to be changed and the backwards
compatibility risk would need to be estimated. For reference, RFC 7540
(HTTP 2) normative [COOKIE] reference is to the (new) cookie RFC 6265 which
no longer has Cookie2 / Set-Cookie2 headers.

Tobias

Re: HTTP client API

Reply via email to