I realize that probably no one knows who I am; however, I have been using Java 
since the 1.x days.  I have probably spoken to several of you at previous 
JavaOnes.  I am also a member of the CAP program.

Anyway, I would like to say that I like the idea of the incubator modules.  
These things are hard to get right the first time.  The idea of having 
something “locked down” on the first release seems to raise the bar so high 
that helpful features can end up getting delayed.

Mark

Disclaimer:
The opinions provided herein do not necessarily state or reflect those of 
Donnell Systems, Inc. (DSI). DSI makes no warranty for and assumes no legal 
liability or responsibility for the posting.

From: net-dev [mailto:net-dev-boun...@openjdk.java.net] On Behalf Of Michael 
McMahon
Sent: Tuesday, November 29, 2016 10:47 AM
To: Tobias Thierer
Cc: OpenJDK Network Dev list
Subject: Re: HTTP client API

Hi Tobias,

Apologies for the long delay in responding. I'll hopefully catch up on your 
subsequent comments soon.
Just to give an overall update on the status of this work first: there is a 
plan afoot to make the JDK 9 module system support something we are calling 
"incubator modules". These are modules that can be delivered with the JDK, but 
are not standard parts of the platform (Java SE). So, they do not resolve by 
default, and there will be runtime and compile-time warnings to indicate that 
the contained APIs are potentially subject to change. The plan is to make this 
new HTTP client API one of these incubator modules in JDK 9. It will use a 
different package namespace to reflect this.

The upshot is that while we would like the API to be as complete as possible, 
it will not
be fixed or standardized in JDK 9. But by delivering the API and implementation 
with JDK 9, it
will get some exposure and use, so that we can standardize it in JDK 10, and 
move it from
its incubator module and namespace back into Java SE in the java.net.http 
package.
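
To make the practical consequence concrete: because incubator modules do not
resolve by default, applications will have to opt in explicitly, either with
--add-modules on the command line or in their module declaration. Roughly (the
module name below is illustrative only, since naming is not final):

    // Illustrative only -- the incubator module name is not final.
    module com.example.app {
        requires jdk.incubator.httpclient;  // incubator modules do not resolve by default
    }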

I've added some answers to your questions below, and will get to the followup 
messages soon.

Thanks,
Michael

On 31/10/2016, 18:13, Tobias Thierer wrote:
Hi Michael -

thanks a lot for your response! Further comments below.

How would you rank simplicity, versatility and performance as goals for the 
HTTP Client API? For example the blocking vs. non-blocking API choices will 
largely depend on trade-offs, so I'd like to understand the target applications 
and requirements that these trade-offs were based on.

To my mind, we should support both blocking and non-blocking in the API. By 
specifying them both in the API (rather than relying on blocking behavior being 
a wrapper around non-blocking) we get the potential benefits of both: reduced 
thread switching with blocking, versus lower thread usage with non-blocking.
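
Roughly, using the draft API in the same shape it appears later in this mail
(the blocking send variant and the file path here are just illustrative):

    HttpClient client = ...    // build the client
    HttpRequest request = ...  // build the request
    Path path = Paths.get("/tmp/body");  // illustrative target file

    // Blocking: the calling thread waits for the response.
    HttpResponse<Path> response = client.send(request, ResponseBodyHandler.asFile(path));

    // Non-blocking: a CompletableFuture completes when the response is available.
    CompletableFuture<HttpResponse<Path>> future =
        client.sendAsync(request, ResponseBodyHandler.asFile(path));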


On Fri, Oct 28, 2016 at 12:28 PM, Michael McMahon 
<michael.x.mcma...@oracle.com> wrote:

The intention behind those interfaces is primarily to provide a conversion 
between ByteBuffers and the higher-level objects that users are interested in. 
So, HttpRequest.BodyProcessor generates ByteBuffers for output and 
HttpResponse.BodyProcessor consumes them on input. For response bodies, there 
is the additional implicit requirement of needing to do something with the 
incoming data, and that is why the option of discarding it exists (the details 
of that API are certainly open to discussion).

On Publisher/Subscriber (Flow) versus InputStream/OutputStream, we wanted a 
non-blocking API

I can understand the desire to have a non-blocking API, especially if 
performance is seen as more important for the API than usability. The Cronet 
(Chromium networking stack) API 
[documentation<https://chromium.googlesource.com/chromium/src/+/lkgr/components/cronet>]
 has a

    UrlRequest.read(ByteBuffer) <-> callback.onReadCompleted(UrlRequest, UrlResponseInfo, ByteBuffer)

call cycle that is not too different from the OpenJDK 9 HTTP client's

    subscription.request(1) <-> BodyProcessor.onNext(ByteBuffer)

so the general idea is shared by others (I'm meeting with some Cronet folks 
later this week, so I hope to gather their input/views soon). Note that in 
Cronet the ByteBuffer is provided as part of each call requesting more data, 
whereas in OpenJDK 9 it is returned once, by BodyProcessor.getBuffer().

On this point, I'd say that there is a high degree of usability with the 
proposed API when using
the standard handlers/processors. Again, the blocking variant is very easy to 
use. The non-blocking
variant with CompletableFuture is not much harder, once the programmer invests 
some effort in understanding that
model.

Implementing your own processor does require familiarity with the Flow types 
and their underlying philosophy, but I see
those cases as outside the core use cases. It's likely that additional 
processors/handlers would be added in time also.
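
To make that concrete, a minimal sketch of the Flow plumbing involved might
look like the following (illustrative only; it shows just the
java.util.concurrent.Flow.Subscriber side, and a real processor would
additionally expose the accumulated result to the caller):

    // Illustrative sketch: a subscriber that accumulates the response body.
    class AccumulatingSubscriber implements Flow.Subscriber<ByteBuffer> {
        private final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        private Flow.Subscription subscription;

        public void onSubscribe(Flow.Subscription s) {
            subscription = s;
            s.request(1);                 // ask for the first buffer
        }
        public void onNext(ByteBuffer item) {
            while (item.hasRemaining()) {
                bytes.write(item.get());  // copy out this chunk
            }
            subscription.request(1);      // ask for the next buffer
        }
        public void onError(Throwable t) { /* surface the failure to the caller */ }
        public void onComplete() { /* bytes.toByteArray() is the complete body */ }
    }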


At the same time, the first and third goals mentioned in JEP 
110<http://openjdk.java.net/jeps/110> are

  *   Must be easy to use for common cases, including a simple blocking mode.
  *   A simple and concise API which caters for 80-90% of application needs.

Blocking, pull-based APIs tend to be simpler to use because (a) one can just 
step through a for loop in a debugger rather than needing breakpoints, and (b) 
state can be held on the stack rather than in fields [see related blog 
post<http://code-o-matic.blogspot.co.uk/2011/01/note-on-api-design-call-stack-as-source.html>],
 so I'm struggling to follow the trade-offs here. Obviously, blocking APIs have 
some disadvantages as well - e.g. context-switching overhead in the case of 
separate threads per request. Is there any place other than JEP 110 where I 
could read up on the principles or target applications that drove the 
trade-offs for this API?

But, I think you are talking about debugging the processor or handler there 
rather than the calling application code?


A concrete example that I tried to prototype was reading an image file header & 
body from the network, potentially aborting the request depending on some 
metadata in the file header; using the OpenJDK 9 HTTP client API, this involved 
a ton of boilerplate. Part of the boilerplate was because I wasn't sure if 
onNext() might get a partially filled ByteBuffer that might cut off the image 
file header or even an individual int32 value in that header. Is it guaranteed 
that the ByteBuffer returned by BodyProcessor.getBuffer will be full on each 
call to onNext(ByteBuffer), except possibly the last? If not, perhaps it should 
be guaranteed?

Related to this, it appears that the documentation for 
Subscription.request(long 
n)<http://cr.openjdk.java.net/%7Emichaelm/httpclient/api.1/java/util/concurrent/Flow.Subscription.html#request-long->
 specifies that n denotes the number of future calls to 
onNext()<http://cr.openjdk.java.net/%7Emichaelm/httpclient/api.1/java/util/concurrent/Flow.Subscriber.html#onNext-T->.
 If this weren't the case, BodyProcessor could have defined n to refer to a 
number of bytes (delivered as a single onNext() call if they fit into the 
supplied ByteBuffer), which might have been a really neat way for the 
application to control the sizes of the chunks it gets passed. For example, an 
application might want to get the file header in one go, and then afterwards 
process some fixed-size data record on each call. Oh well. Do you happen to 
have an idea for how the application could be more in control of the chunk 
sizes?

Yes, Subscriptions relate to the number of <T> items that can be transferred 
rather than the number of bytes. There was some discussion on that exact 
question, and we need to tighten that up so that it is easier to map from the 
byte-oriented flow control scheme used by HTTP/2 to the item-oriented model 
that Flow uses. I think, basically, that BodyProcessor needs to commit to a 
specific buffer size and guarantee that all calls to onNext() will transfer 
exactly that number of bytes in the ByteBuffer (except for the final call of a 
subscription). With that guarantee, it is possible to have a seamless mapping 
between the two flow control schemes. Whether the processor decides the buffer 
size or the HTTP library does, I'm not certain, but I think it probably needs 
to be fixed at the start of a particular request/response rather than be 
allowed to vary.
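
To illustrate the mapping (rough arithmetic only, nothing specified yet):

    // With a buffer size fixed at the start of the response, Flow item demand
    // and HTTP/2 flow-control bytes become interchangeable.
    final int BUFFER_SIZE = 16 * 1024;                 // illustrative, fixed per response
    long itemsRequested = 4;                           // subscription.request(4)
    long windowCredit = itemsRequested * BUFFER_SIZE;  // bytes to advertise via WINDOW_UPDATE
    // Conversely, a WINDOW_UPDATE of n bytes corresponds to n / BUFFER_SIZE items.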

I'd be interested to look at your prototype to see what other pain points there 
might be.



and had originally defined
something that was quite similar to the asynchronous model used by Flow, but 
the Flow types have a nice mechanism for
managing flow control asynchronously. So, it made sense to use the standard API.

Regarding managing flow control, I assume you're referring to 
subscription.request(), which I assume is tied to WINDOW_UPDATE frames in 
HTTP/2? Also, I assume that Subscription.cancel() will result in a RST_STREAM 
frame being sent to the server; is that correct?

Yes.

Is there a recommended way for an application to limit stream concurrency (RFC 
7540 section 5.1.2<https://tools.ietf.org/html/rfc7540#section-5.1.2>)? Example 
use case: An application that loads and decodes .jpeg images from the network 
might be limited by how many decompressed images it can hold in memory at a 
time (the application may want to re-compress the images in a different format, 
e.g. for file format conversion or to send them to the GPU in ETC2 texture 
compression format). Would such an application manually maintain a limited set 
of "active" streams, making sure to start calling request() only for those 
streams that have become active? (With a blocking API, an application could 
simply use a fixed-size thread pool, or could use a Semaphore to guard access 
to a limited resource, such as memory).
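
For instance, something along these lines (a rough sketch using plain
java.util.concurrent, nothing specific to the HTTP API; handleImage and
imageUris are assumed application code):

    ExecutorService pool = Executors.newFixedThreadPool(10);  // bounds requests in flight
    Semaphore imagesInMemory = new Semaphore(4);               // bounds decoded images held

    for (URI uri : imageUris) {
        pool.submit(() -> {
            imagesInMemory.acquireUninterruptibly();           // wait for memory budget
            try {
                handleImage(uri);   // assumed: blocking fetch, decode, re-compress
            } finally {
                imagesInMemory.release();
            }
        });
    }
    pool.shutdown();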

I really wouldn't use the request/response processors for anything other than 
encoding/decoding of bodies. There are a variety of ways of limiting 
request/response concurrency. As you suggest, it is obvious for the blocking 
style. For asynchronous code using CompletableFuture there are nice ways to do 
it as well. In one example, I wrote a small class called RequestLimiter (it is 
kind of like an asynchronous version of a semaphore), specified as follows:

// Asynchronous semaphore. Implementation just uses a
// LinkedList<CompletableFuture<Void>>. Very simple.

class RequestLimiter {

    RequestLimiter(int concurrency);

    // returned CF completes when caller is permitted to send request.
    // 'concurrency' requests are allowed simultaneously
    synchronized CompletableFuture<Void> obtainPermit();

    // signals that a response has been received and processed,
    // permitting another request to start
    synchronized void returnPermit();
}

// The calling code might look as follows

{
    HttpClient client = ... // build the client

    int REQUESTS = 1000; // say
    int CONCURRENCY = 10; // say
    RequestLimiter limiter = new RequestLimiter(CONCURRENCY);

    CompletableFuture<Path> futures[] = new CompletableFuture[REQUESTS];

    for (int i = 0; i < REQUESTS; i++) {
        HttpRequest request = .... // build the request
        Path path = Paths.get("/someroot", request.uri().getPath());
        futures[i] = limiter
            .obtainPermit()
            .thenCompose((v) -> client.sendAsync(request, ResponseBodyHandler.asFile(path)))
            .thenApply((HttpResponse<Path> response) -> {
                limiter.returnPermit();
                return response.body();
            });
    }

    // wait for all to complete
    CompletableFuture.allOf(futures).join();
}
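
For completeness, a rough sketch of how RequestLimiter itself could be
implemented along those lines (illustrative only; the real code may differ):

    class RequestLimiter {
        private final LinkedList<CompletableFuture<Void>> waiters = new LinkedList<>();
        private int permits;

        RequestLimiter(int concurrency) {
            this.permits = concurrency;
        }

        // completes immediately if a permit is free, otherwise when one is returned
        synchronized CompletableFuture<Void> obtainPermit() {
            if (permits > 0) {
                permits--;
                return CompletableFuture.completedFuture(null);
            }
            CompletableFuture<Void> cf = new CompletableFuture<>();
            waiters.add(cf);
            return cf;
        }

        // hand the permit to the next waiter, or return it to the pool
        synchronized void returnPermit() {
            CompletableFuture<Void> next = waiters.poll();
            if (next != null) {
                next.complete(null);
            } else {
                permits++;
            }
        }
    }
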
We're certainly open to suggestions for improved names, but I think it is 
useful to have a way to explicitly discard a response body, especially since 
HTTP/2 has an efficient way to do that.

Why can't the response body instead be discarded by calling 
subscription.cancel() during the call to onSubscribe() (before the first 
ByteBuffer of the body data has been delivered)?

That would have a similar effect, except that the subscription wouldn't 
normally be visible to the user of the response processor; some processor 
implementations could expose that capability if it makes sense. For instance, 
a processor which returns the response body as an InputStream would map 
InputStream.close() to subscription.cancel().
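
As a rough sketch of that mapping (illustrative only, not the prototype
itself), the Flow.Subscriber side could look something like this (types are
from java.io, java.nio and java.util.concurrent):

    // Illustrative sketch only. An InputStream view over a subscription:
    // read() consumes buffers as they arrive, close() cancels the subscription.
    class BodyInputStream extends InputStream implements Flow.Subscriber<ByteBuffer> {
        private static final ByteBuffer EOF = ByteBuffer.allocate(0);  // end-of-stream sentinel
        private final BlockingQueue<ByteBuffer> buffers = new LinkedBlockingQueue<>();
        private volatile Flow.Subscription subscription;
        private ByteBuffer current;

        public void onSubscribe(Flow.Subscription s) { subscription = s; s.request(1); }
        public void onNext(ByteBuffer item) { buffers.add(item); }
        public void onError(Throwable t) { buffers.add(EOF); }  // error handling elided
        public void onComplete() { buffers.add(EOF); }

        public int read() throws IOException {
            try {
                while (current != EOF && (current == null || !current.hasRemaining())) {
                    current = buffers.take();        // wait for the next chunk
                    if (current != EOF) {
                        subscription.request(1);     // ask the publisher for one more
                    }
                }
                return current == EOF ? -1 : (current.get() & 0xff);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IOException(e);
            }
        }

        public void close() {
            Flow.Subscription s = subscription;
            if (s != null) s.cancel();               // InputStream.close() -> subscription.cancel()
        }
    }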


On the question of push versus pull, there seem to be differing philosophies 
around this, but actually the publish/subscribe model provided by the Flow 
types has characteristics of both push and pull. It is primarily push, in that 
Subscriber.onNext() delivers the next buffer to the subscriber. But the 
subscriber can effectively control it through the Subscription object.

What does "effectively control" mean? As far as I can see, the subscriber can 
stop calls temporarily by not calling request(), but it cannot, for example, 
request a minimum number of bytes to be delivered (if available before EOF) on 
the next call. Is that correct, or am I missing something? (See the question 
above about whether ByteBuffers are completely filled.)

Yes, I mean that by regulating the calls to request() you control the rate of 
data flow and can stop it temporarily. I think the question you raised above 
about minimum numbers of bytes can be addressed as discussed earlier.


There are other benefits to having the body available synchronously with the 
rest of the response object
and it is still possible to implement arbitrary streaming of response body data 
without having to buffer it.
A while ago I prototyped a response processor which delivers the body through 
an InputStream.

What's the reason that implementation got dropped? FWIW, I suspect that a lot 
of applications will be tempted to use HttpResponse.asByteArray().getBody(); if 
this happens, it'll be strictly worse than a potential 
HttpResponse.asInputStream() because

  *   it holds the entire response body in memory as a byte[], and
  *   processing only starts after the last byte of the response is received 
from the network (this means that the request can't be aborted part way through)

But if you know in advance that the response body will be small, then I don't 
see the problem with reading it all into a byte[] or String, or some other 
application-defined type. If there is a possibility that response bodies might 
be very large or unbounded, then it would be better to use a response 
processor that streams the data.



2.) HttpHeaders: I love that there is a type for this abstraction! But:

·         Why is the type an interface rather than a concrete, final class? 
Since this is a pure value type, there doesn’t seem to be much point in 
allowing alternative implementations?

You could conceive of different implementations, with different tradeoffs 
between parsing early or lazily for instance.

What parsing are you referring to? I don't see any parsing logic in 
HttpHeadersImpl other than in firstValueAsLong. It does, however, have logic 
that alternative implementations could get wrong / be inconsistent with, such as

  *   whether to use case-insensitive comparison for header names
  *   whether the Map returned from map() is unmodifiable
  *   whether the List returned by allValues() is unmodifiable (it is 
modifiable in the current implementation, but I think that's a bug)
  *   the behavior of firstValueAsLong when the value exists but is not 
parseable as a long

Have you considered making HttpHeaders immutable and adding a Builder 
implementation? That would make it thread-safe at a small cost to the semantic 
weight of the API.

With HTTP/1 you can read the headers in as a blob of bytes and then just go 
looking for headers on demand, or you can parse the entire header block up 
front. We should certainly make sure that we don't preclude doing what you 
suggest here, namely that HttpHeaders might be buildable and an immutable class.
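
Something like the following, roughly (names are illustrative only, nothing is
settled):

    // Illustrative only -- not a settled API. An immutable HttpHeaders built
    // once, up front, would be thread-safe by construction.
    HttpHeaders headers = HttpHeaders.newBuilder()      // hypothetical builder factory
            .add("Content-Type", "application/json")
            .add("Accept", "text/html")
            .build();                                   // immutable from here on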



·         Do the methods other than map() really pull their weight (provide 
enough value relative to the semantic API weight that they introduce)?

o    firstValueAsLong() looks particularly suspect: why would anyone care 
particularly about long values? Especially since the current implementation 
seems to throw NumberFormatException rather than returning an empty Optional?

Numeric header values would be fairly common I think. Long was just chosen as 
the widest possible integer type.
NumberFormatException is only thrown if the value is present but can't be 
parsed as a number.

But then you'd have to document that NFE is thrown, which application 
developers would have been much less likely to miss had they called 
Long.valueOf() themselves. It's also not clear that throwing NFE, rather than 
returning an empty Optional, is the behavior most applications would prefer.

NFE is an unchecked exception. So, I don't think there is much overhead. If you 
are reading a header that is supposed to be
a numeric value I think NFE is reasonable. That is effectively a protocol error.
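
So usage would look roughly like this (the exact Optional flavour returned is
a detail I'm glossing over here):

    // Sketch of the semantics described above:
    //   header absent           -> empty Optional
    //   present but non-numeric -> NumberFormatException (effectively a protocol error)
    long contentLength = headers.firstValueAsLong("Content-Length").orElse(-1L);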

These methods seem to add semantic weight to the API for very little benefit. 
Perhaps it'd be better to leave them out, but add them in a future revision of 
the API if it turns out that a lot of applications duplicate exactly this logic?


·         I haven’t made up my mind about whether the existing choices are the 
right ones / sufficient. Perhaps if this class used the typesafe enum pattern 
from Effective Java, 1st edition, rather than being an actual enum, the API 
would maintain the option of allowing client-configured Redirect policies in a 
future version, e.g. permitting redirects only for URLs within the same 
host/domain?

As Anthony mentioned, there was a more elaborate API originally, but it was 
felt to be too complicated, which was why we settled on enums (which can be 
extended in future releases).

The choice of enums limits the versatility because all enum instances need to 
be known at compile time; for example, it'd be impossible to provide a Redirect 
policy that allows redirects to a given set of hosts.

The only benefit I can see from specifically using an enum class is that this 
gives us a serialization format for free. But a conversion to/from String 
values for the standard policy instances (via a valueOf(String) method or 
similar) would offer the same thing at a fairly small cost in semantic weight.

For example, doesn't the existence of the convenience method 
HttpHeaders.firstValueAsLong() provide a much smaller benefit, but carry 
similar semantic weight, compared to e.g. a potential non-enum 
Redirect.valueOf() method (and even so, that method could be left to a future 
API version)?
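
To sketch the kind of thing I mean (illustrative only, not a concrete API
proposal):

    // With the typesafe-enum pattern, the standard policies are constants, but
    // a future version could accept arbitrary policies (e.g. "same host only")
    // without a source-incompatible change. Predicate is java.util.function.Predicate.
    public final class Redirect {
        public static final Redirect NEVER  = new Redirect(uri -> false);
        public static final Redirect ALWAYS = new Redirect(uri -> true);

        private final Predicate<URI> policy;

        private Redirect(Predicate<URI> policy) {
            this.policy = policy;
        }

        boolean shouldFollow(URI target) {
            return policy.test(target);
        }
    }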

4.) HttpClient.Version:

·         Why does an HttpClient need to commit to using one HTTP version or 
the other? What if an application wants to use HTTP/2 for those servers that 
support it, but fall back to HTTP/1.1 for those that don’t?

Answered.

Given that HttpClientBuilder.version(Version) does not actually restrict the 
client to only that version, should it perhaps be called 
setMaximumVersion(Version), or changed to enableVersion(Version) / 
disableVersion(Version), or similar?

Perhaps requestedVersion()?


5.) CookieManager

·         Is there a common interface we could add, without making the API 
much more complex, that allows us to support both RFC 2965 cookies (outdated, 
implemented by CookieManager) and RFC 6265 cookies (new, real-world, actually 
used)? Needs prototyping. I think it’s likely we’ll be able to do something 
similar to OkHttp’s 
CookieJar<https://square.github.io/okhttp/3.x/okhttp/okhttp3/CookieJar.html>, 
which can be adapted to RFC 2965 - not 100%, but close enough that most 
implementations of CookieManager could be reused by the new HTTP API, while 
still taking advantage of RFC 6265 cookies.

One suggestion has been to use the low-level CookieHandler API. But, another 
option would be just to
evolve CookieManager to the latest RFCs, which sounds a lot like what you are 
suggesting.

Not exactly what I was suggesting, but it's similar and might work. For what 
it's worth, even CookieHandler mentions "Cookie2" headers in its 
documentation, so this would need to be changed and the backwards-compatibility 
risk would need to be estimated. For reference, RFC 7540's (HTTP/2) normative 
[COOKIE] reference is to the (new) cookie RFC 6265, which no longer has 
Cookie2 / Set-Cookie2 headers.
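
The shape I have in mind is roughly the following (names purely illustrative,
not a proposal):

    // A CookieJar-style interface deals only in header values per URI, so it
    // could be backed either by the existing (RFC 2965-era) CookieManager or
    // by an RFC 6265 implementation.
    interface CookieJarLike {
        // record the Set-Cookie header values seen on a response for this URI
        void saveFromResponse(URI uri, List<String> setCookieValues);

        // produce the Cookie header values to send on a request to this URI
        List<String> loadForRequest(URI uri);
    }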

Tobias


