Hi Tobias,
Apologies for the long delay in responding. I'll hopefully catch up on
your subsequent comments soon.
Just to give an overall update on the status of this work first. There
is a plan afoot to make the JDK 9
module system support something we are calling "incubator modules".
These are modules that
can be delivered with the JDK, but are not standard parts of the
platform (Java SE). So, they do not
resolve by default and will have runtime/compile-time warnings to
indicate that the contained APIs
are potentially subject to change. The plan is to make this new HTTP
client API one of these
incubator modules in JDK 9. It will use a different package namespace to
reflect this.
The upshot is that while we would like the API to be as complete as
possible, it will not
be fixed or standardized in JDK 9. But by delivering the API and
implementation with JDK 9, it
will get some exposure and use, so that we can standardize it in JDK 10,
and move it from
its incubator module and namespace back into Java SE in the
java.net.http package.
I've added some answers to your questions below, and will get to the
followup messages soon.
Thanks,
Michael
On 31/10/2016, 18:13, Tobias Thierer wrote:
Hi Michael -
thanks a lot for your response! Further comments below.
How would you rank simplicity, versatility and performance as goals
for the HTTP Client API? For example the blocking vs. non-blocking API
choices will largely depend on trade-offs, so I'd like to understand
the target applications and requirements that these trade-offs were
based on.
To my mind, we should support both blocking and non-blocking in the API.
By specifying them both in the API (rather than relying on blocking
behavior being a wrapper around non-blocking), we get the potential
benefits of both: reduced thread switching with blocking, versus lower
thread usage with non-blocking.
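To show the two styles side by side, here is a sketch; it is an assumption on my part, written against the java.net.http names as later standardized in JDK 11 (not the JDK 9 incubator namespace), with a tiny local server standing in for a real endpoint so it is self-contained:

```java
import com.sun.net.httpserver.HttpServer;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.concurrent.CompletableFuture;

public class BlockingVsAsync {
    public static void main(String[] args) throws Exception {
        // Tiny local server so the example needs no network access.
        HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
        server.createContext("/", exchange -> {
            byte[] body = "hello".getBytes();
            exchange.sendResponseHeaders(200, body.length);
            exchange.getResponseBody().write(body);
            exchange.close();
        });
        server.start();
        URI uri = URI.create("http://localhost:"
                + server.getAddress().getPort() + "/");

        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(uri).build();

        // Blocking style: the calling thread waits for the response.
        HttpResponse<String> blocking =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(blocking.body());

        // Non-blocking style: the same request as a CompletableFuture chain.
        CompletableFuture<String> async =
                client.sendAsync(request, HttpResponse.BodyHandlers.ofString())
                      .thenApply(HttpResponse::body);
        System.out.println(async.join());

        server.stop(0);
    }
}
```

Both calls print "hello"; the blocking form keeps state on the stack, while the asynchronous form frees the calling thread between request and response.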
On Fri, Oct 28, 2016 at 12:28 PM, Michael McMahon
<michael.x.mcma...@oracle.com <mailto:michael.x.mcma...@oracle.com>>
wrote:
The intention behind those interfaces is primarily to provide a
conversion between ByteBuffers and higher level objects that users
are interested in. So, HttpRequest.BodyProcessor generates
ByteBuffers for output and HttpResponse.BodyProcessor consumes them
on input. For response bodies, there is this additional implicit
requirement of needing to do something with the incoming data
and that is why the option of discarding it exists (the details of
that API are certainly open to discussion).
On Publisher/Subscriber (Flow) versus InputStream/OutputStream, we
wanted a non-blocking API
I can understand the desire to have a non-blocking API, especially if
performance is seen as more important for the API than usability.
Cronet (the Chromium networking stack) API [documentation
<https://chromium.googlesource.com/chromium/src/+/lkgr/components/cronet>]
has a
UrlRequest.read(ByteBuffer) <->
callback.onReadCompleted(UrlRequest, UrlResponseInfo, ByteBuffer)
call cycle that is not too different from OpenJDK 9 HTTP client's
subscription.request(1) <-> BodyProcessor.onNext(ByteBuffer)
so the general idea is shared by others (I'm meeting with some Cronet
folks later this week so I hope to gather their input / views soon).
Note that in Cronet, the ByteBuffer is provided as part of each call
requesting more data, whereas in OpenJDK 9 it is returned once by
BodyProcessor.getBuffer().
On this point, I'd say that there is a high degree of usability with the
proposed API when using
the standard handlers/processors. Again, the blocking variant is very
easy to use. The non-blocking
variant with CompletableFuture is not much harder, once the programmer
invests some effort in understanding that
model.
Implementing your own processor does require familiarity with the Flow
types and their underlying philosophy, but I see those cases as outside
the core use cases. It is also likely that additional
processors/handlers will be added over time.
At the same time, the first and third goals mentioned in JEP 110
<http://openjdk.java.net/jeps/110> are:
* Must be easy to use for common cases, including a simple blocking
mode.
* A simple and concise API which caters for 80-90% of application
needs.
Blocking, pull-based APIs tend to be simpler to use because (a) one
can just step through a for loop in a debugger rather than needing
breakpoints, and (b) state can be held on the stack rather than in
fields [see related blog post
<http://code-o-matic.blogspot.co.uk/2011/01/note-on-api-design-call-stack-as-source.html>],
so I'm struggling to follow the trade-offs here. Obviously, blocking
APIs have some disadvantages as well - e.g. context switching overhead
in the case of separate Threads per request. Is there any place other
than JEP 110 where I could read up on principles or target
applications that drove the trade-offs for this API?
But, I think you are talking about debugging the processor or handler
there rather than the calling application code?
A concrete example that I tried to prototype was reading an image file
header & body from the network, potentially aborting the request
depending on some metadata in the file header; using the OpenJDK 9
HTTP client API, this involved a ton of boilerplate. Part of the
boilerplate was because I wasn't sure if onNext() might get a
partially filled ByteBuffer that might cut off the image file header
or even an individual int32 value in that header. Is it guaranteed
that the ByteBuffer returned by BodyProcessor.getBuffer will be full
on each call to onNext(ByteBuffer), except possibly the last? If not,
perhaps it should be guaranteed?
Related to this, it appears that the documentation for
Subscription.request(long n)
<http://cr.openjdk.java.net/%7Emichaelm/httpclient/api.1/java/util/concurrent/Flow.Subscription.html#request-long-> specifies
that /n/ specifically /denotes the number of future calls/ to onNext()
<http://cr.openjdk.java.net/%7Emichaelm/httpclient/api.1/java/util/concurrent/Flow.Subscriber.html#onNext-T->.
If this wasn't the case, BodyProcessor could have defined /n/ to refer
to a number of bytes (delivered as a single onNext() call if they fit
into the supplied ByteBuffer), which might have been a really neat way
for the application to control the sizes of chunks that it gets
passed. For example, an application might want to get the file header
in one go, and then afterwards process some fixed size data record for
each call. Oh well. Do you happen to have an idea for how the
application could be more in control of the chunk sizes?
Yes, Subscriptions relate to the number of <T> items that can be
transferred rather than the number of bytes. There was some discussion
on that exact question and we need to tighten that up, so that it is
easier to map from the byte oriented flow control scheme used by HTTP/2
to the item oriented model that Flow uses. I think basically that
BodyProcessor needs to commit to specific buffer sizes, and guarantee
that all calls of onNext() will transfer that exact number of bytes in
the ByteBuffer (except for the final call of a subscription). With that
guarantee, it is possible to have a seamless mapping between the two
flow control schemes. Whether the processor decides the buffer size or
the http library, I'm not certain, but I think it probably needs to be
fixed at the start of a particular request/response, rather than be
allowed to vary.
I'd be interested to look at your prototype to see what other pain
points there might be.
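As an illustration of coping with partially filled buffers under the current Flow model, a subscriber can accumulate the first N bytes itself before deciding whether to cancel, which is roughly the image-header case described above. This is a hedged sketch using only the java.util.concurrent.Flow types; HeaderSubscriber and its fields are hypothetical names, not part of the proposed API:

```java
import java.io.EOFException;
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;

public class HeaderDemo {

    // Accumulates a fixed-size header across possibly partial ByteBuffers,
    // then cancels the subscription (e.g. to abort an unwanted download).
    static class HeaderSubscriber implements Flow.Subscriber<ByteBuffer> {
        private final byte[] header;
        private int filled;
        private Flow.Subscription subscription;
        final CompletableFuture<byte[]> result = new CompletableFuture<>();

        HeaderSubscriber(int headerSize) { header = new byte[headerSize]; }

        public void onSubscribe(Flow.Subscription s) {
            subscription = s;
            s.request(1);                  // pull one buffer at a time
        }

        public void onNext(ByteBuffer buf) {
            int n = Math.min(buf.remaining(), header.length - filled);
            buf.get(header, filled, n);    // copes with partial buffers
            filled += n;
            if (filled == header.length) {
                subscription.cancel();     // header complete: stop the stream
                result.complete(header);
            } else {
                subscription.request(1);
            }
        }

        public void onError(Throwable t) { result.completeExceptionally(t); }

        public void onComplete() {
            if (!result.isDone())
                result.completeExceptionally(new EOFException("short body"));
        }
    }

    public static void main(String[] args) {
        SubmissionPublisher<ByteBuffer> pub = new SubmissionPublisher<>();
        HeaderSubscriber sub = new HeaderSubscriber(6);
        pub.subscribe(sub);
        // Two 4-byte buffers: the 6-byte header straddles the boundary.
        pub.submit(ByteBuffer.wrap(new byte[]{1, 2, 3, 4}));
        pub.submit(ByteBuffer.wrap(new byte[]{5, 6, 7, 8}));
        System.out.println(Arrays.toString(sub.result.join()));
        pub.close();
    }
}
```

The subscriber never assumes buffers arrive full, which is the defensive stance the current specification seems to require; the fixed-buffer-size guarantee discussed above would let such code be simpler.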
and had originally defined
something that was quite similar to the asynchronous model used by
Flow, but the Flow types have a nice mechanism for
managing flow control asynchronously. So, it made sense to use the
standard API.
Regarding managing flow control, I assume you're referring to
subscription.request() which I assume is tied to WINDOW_UPDATE frames
in HTTP 2? Also, I assume that Subscription.cancel() will result in a
RST_STREAM frame being sent to the server, is that correct?
Yes.
Is there a recommended way for an application to limit stream
concurrency (RFC 7540 section 5.1.2
<https://tools.ietf.org/html/rfc7540#section-5.1.2>)? Example use
case: An application that loads and decodes .jpeg images from the
network might be limited by how many decompressed images it can hold
in memory at a time (the application may want to re-compress the
images in a different format, e.g. for file format conversion or to
send them to the GPU in ETC2 texture compression format). Would such
an application manually maintain a limited set of "active" streams,
making sure to start calling request() only for those streams that
have become active? (With a blocking API, an application could simply
use a fixed-size thread pool, or could use a Semaphore to guard access
to a limited resource, such as memory).
I really wouldn't use the request/response processors for anything other
than encoding/decoding of bodies. There are a variety of ways of
limiting request/response concurrency. As you suggest, it is obvious for
the blocking style. For asynchronous code using CompletableFuture there
are nice ways to do it also. In one example I wrote a small class called
RequestLimiter specified as follows:
(it is kind of like an asynchronous version of a semaphore)
// Asynchronous semaphore. Implementation just uses a
// LinkedList<CompletableFuture<Void>>. Very simple.
class RequestLimiter {
    RequestLimiter(int concurrency);

    // returned CF completes when caller is permitted to send request.
    // 'concurrency' requests are allowed simultaneously
    synchronized CompletableFuture<Void> obtainPermit();

    // signals that a response has been received and processed,
    // permitting another request to start
    synchronized void returnPermit();
}

// The calling code might look as follows
{
    HttpClient client = ... // build the client
    int REQUESTS = 1000; // say
    int CONCURRENCY = 10; // say
    RequestLimiter limiter = new RequestLimiter(CONCURRENCY);
    CompletableFuture<Path>[] futures = new CompletableFuture[REQUESTS];
    for (int i = 0; i < REQUESTS; i++) {
        HttpRequest request = .... // build the request
        Path path = Paths.get("/someroot", request.uri().getPath());
        futures[i] = limiter
            .obtainPermit()
            .thenCompose((v) -> client.sendAsync(request,
                    ResponseBodyHandler.asFile(path)))
            .thenApply((HttpResponse<Path> response) -> {
                limiter.returnPermit();
                return response.body();
            });
    }
    // wait for all to complete
    CompletableFuture.allOf(futures).join();
}
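A minimal implementation of the RequestLimiter sketch above might look as follows. This is my own fleshed-out guess at the class described (a queue of waiting futures behind a permit count), not the actual code referred to:

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;

// Hypothetical implementation of the RequestLimiter sketch: an
// asynchronous semaphore backed by a queue of waiting futures.
class RequestLimiter {
    private final Queue<CompletableFuture<Void>> waiters = new ArrayDeque<>();
    private int permits;

    RequestLimiter(int concurrency) {
        this.permits = concurrency;
    }

    // Returned CF completes when the caller is permitted to send a request.
    synchronized CompletableFuture<Void> obtainPermit() {
        if (permits > 0) {
            permits--;
            return CompletableFuture.completedFuture(null);
        }
        CompletableFuture<Void> cf = new CompletableFuture<>();
        waiters.add(cf);
        return cf;
    }

    // Signals that a response has been processed, releasing a permit
    // (or waking the longest-waiting caller).
    synchronized void returnPermit() {
        CompletableFuture<Void> next = waiters.poll();
        if (next != null) {
            next.complete(null);
        } else {
            permits++;
        }
    }

    // Small demonstration: with concurrency 2, a third permit only
    // becomes available once one is returned.
    public static void main(String[] args) {
        RequestLimiter limiter = new RequestLimiter(2);
        CompletableFuture<Void> a = limiter.obtainPermit();
        CompletableFuture<Void> b = limiter.obtainPermit();
        CompletableFuture<Void> c = limiter.obtainPermit();
        System.out.println(a.isDone() + " " + b.isDone() + " " + c.isDone());
        limiter.returnPermit();
        System.out.println(c.isDone());
    }
}
```

In real use the returnPermit() call should also run on exceptional completion (e.g. via whenComplete), or a failed request would leak a permit.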
We're certainly open to suggestions for improved names, but I
think it is useful to have a way to explicitly discard a response
body, especially since HTTP/2 has an efficient way to do that.
Why can't the response body instead be discarded by calling
subscription.cancel() during the call to onSubscribe() (before the
first ByteBuffer of the body data has been delivered)?
That would have a similar effect, except that the subscription wouldn't
normally be visible to the user of the response processor,
but some processor implementations could expose that capability if it
makes sense. For instance, a processor which returns
the response body as an InputStream would map InputStream.close() to
subscription.cancel().
On the question of push versus pull, there seem to be differing
philosophies around this, but actually the publish/subscribe model
provided by the Flow types has characteristics of both push and pull.
It is primarily push, in that Subscriber.onNext() delivers the next
buffer to the subscriber, but the subscriber can effectively control
the flow through the Subscription object.
What does "effectively control" mean? As far as I can see, the
subscriber can stop calls temporarily by not calling request(), but
e.g. it cannot request a minimum number of bytes to deliver (if
available before EOF) on the next call. Is that correct or am I
missing something? (See above question about whether ByteBuffers are
completely filled.)
Yes, I mean by regulating the calls to request() you control the rate of
data flow and can stop it temporarily. I think
the question you raised above about minimum numbers of bytes can be
addressed as discussed above.
There are other benefits to having the body available
synchronously with the rest of the response object
and it is still possible to implement arbitrary streaming of
response body data without having to buffer it.
A while ago I prototyped a response processor which delivers the
body through an InputStream.
What's the reason that implementation got dropped? FWIW, I suspect
that a lot of applications will be tempted to use
HttpResponse.asByteArray().getBody(); if this happens, it'll be
strictly worse than a potential HttpResponse.asInputStream() because
* it holds the entire response body in memory as a byte[], and
* processing only starts after the last byte of the response is
received from the network (this means that the request can't be
aborted part way through)
But, if you know in advance that the response body will be small then I
don't see the problem with reading it all into a byte[] or String, or
some other application-defined type. If there is a possibility that
response bodies might be very large or unbounded, then it would be
better to use a response processor that streams the data.
2.) HttpHeaders: I love that there is a type for this
abstraction! But:
* Why is the type an interface rather than a concrete, final class?
Since this is a pure value type, there doesn’t seem to be much point
in allowing alternative implementations?
You could conceive of different implementations, with different
tradeoffs between parsing early or lazily for instance.
What parsing are you referring to? I don't see any parsing logic in
HttpHeadersImpl other than in firstValueAsLong. It does however have
logic that alternative implementations could get wrong / be
inconsistent with, such as
* whether to use case insensitive comparison for header names
* whether the Map returned from map() is unmodifiable
* whether the List returned by allValues() is unmodifiable (it is
modifiable in the current implementation, but I think that's a bug)
* the behavior of firstValueAsLong when the value exists but is not
parseable as a long
Have you considered making HttpHeaders immutable and adding a Builder
implementation? That would make it thread safe at a small cost to the
semantic weight of the API.
With HTTP/1 you can read the headers in as a blob of bytes and then just
go looking for headers on demand, or you can parse
the entire header block up front. We should certainly make sure that we
don't preclude what you suggest here: that HttpHeaders might be
buildable and an immutable class.
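A buildable, immutable variant might look roughly like this. All names and behavior choices here are illustrative assumptions rather than the proposed API: header names compare case-insensitively, the exposed map and lists are unmodifiable, and an unparseable numeric value yields an empty OptionalLong instead of throwing NumberFormatException:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.OptionalLong;
import java.util.TreeMap;

// Hypothetical sketch of an immutable, buildable HttpHeaders value class.
final class ImmutableHeaders {
    private final Map<String, List<String>> map;

    private ImmutableHeaders(Map<String, List<String>> src) {
        // Case-insensitive lookup; defensive, unmodifiable copies.
        Map<String, List<String>> m =
                new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        src.forEach((k, v) ->
                m.put(k, Collections.unmodifiableList(new ArrayList<>(v))));
        this.map = Collections.unmodifiableMap(m);
    }

    Optional<String> firstValue(String name) {
        List<String> v = map.get(name);
        return (v == null || v.isEmpty()) ? Optional.empty()
                                          : Optional.of(v.get(0));
    }

    // Empty, rather than NumberFormatException, for unparseable values.
    OptionalLong firstValueAsLong(String name) {
        try {
            return firstValue(name)
                    .map(s -> OptionalLong.of(Long.parseLong(s)))
                    .orElse(OptionalLong.empty());
        } catch (NumberFormatException e) {
            return OptionalLong.empty();
        }
    }

    Map<String, List<String>> map() { return map; }

    static final class Builder {
        private final Map<String, List<String>> m =
                new TreeMap<>(String.CASE_INSENSITIVE_ORDER);

        Builder add(String name, String value) {
            m.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
            return this;
        }

        ImmutableHeaders build() { return new ImmutableHeaders(m); }
    }

    public static void main(String[] args) {
        ImmutableHeaders h = new Builder()
                .add("Content-Length", "42")
                .add("Accept", "text/html")
                .build();
        System.out.println(h.firstValue("content-length"));
        System.out.println(h.firstValueAsLong("CONTENT-LENGTH"));
        System.out.println(h.firstValueAsLong("Accept"));
    }
}
```

The build() copy makes the result safely publishable across threads, at the cost of one extra type (the Builder) in the API.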
* Do the methods other than map() really pull their weight (provide
enough value relative to the semantic API weight that they introduce)?
  o firstValueAsLong() looks particularly suspect: why would anyone
    care particularly about long values? Especially since the current
    implementation seems to throw NumberFormatException rather than
    returning an empty Optional?
Numeric header values would be fairly common I think. Long was
just chosen as the widest possible integer type.
NumberFormatException is only thrown if the value is present but
can't be parsed as a number.
But then you'd have to document that NFE is thrown, which an
application would have been much less likely to miss if they had
called Long.valueOf() themselves. It's also not clear that throwing
NFE, rather than returning an empty Optional, is the behavior that
most applications prefer.
NFE is an unchecked exception. So, I don't think there is much overhead.
If you are reading a header that is supposed to be
a numeric value I think NFE is reasonable. That is effectively a
protocol error.
These methods seem to add semantic weight to the API for very little
benefit. Perhaps it'd be better to leave them out, but add them in a
future revision of the API if it turns out that a lot of applications
duplicate exactly this logic?
* I haven’t made up my mind about whether the existing choices
are the right ones / sufficient. Perhaps if this class used
the typesafe enum pattern from Effective Java 1st edition
rather than being an actual enum, the API would maintain the
option in a future version to allow client-configured
Redirect policies, allowing Redirect for URLs as long as they
are within the same host/domain?
As Anthony mentioned, there was a more complicated API originally,
but it was felt to be too complicated, which
was why we settled on enums (whose set of constants can be extended in
future).
The choice of enums limits the versatility because all enum instances
need to be known at compile time; for example, it'd be impossible to
provide a Redirect policy that allows redirects to a given set of hosts.
The only benefit I can see from using specifically an enum class is
that this gives us a serialization format for free. But a conversion
to/from String values for the standard policy instances (via a
valueOf(String) method or similar) would offer the same at a fairly
small cost of semantic weight.
For example, doesn't the existence of the convenience method
HttpHeaders.firstValueAsLong() have a much smaller benefit than, but
similar semantic weight to, e.g. a potential non-enum
Redirect.valueOf() method (and even so, that method could be left to a
future API version)?
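The typesafe-enum-pattern alternative being argued for could be sketched as follows. Everything here is illustrative and hypothetical, not the proposed API: the standard policies are constants, yet applications can plug in policies an enum could never express, such as same-host-only redirects:

```java
import java.net.URI;

// Hypothetical typesafe-enum-style redirect policy (Effective Java,
// 1st ed. pattern): constants plus open extension.
interface RedirectPolicy {
    boolean shouldRedirect(URI from, URI to);

    // The "standard" instances, analogous to today's enum constants.
    RedirectPolicy NEVER = (from, to) -> false;
    RedirectPolicy ALWAYS = (from, to) -> true;

    // A custom policy: follow redirects only within the original host.
    static RedirectPolicy sameHost() {
        return (from, to) -> from.getHost().equalsIgnoreCase(to.getHost());
    }
}

class RedirectDemo {
    public static void main(String[] args) {
        URI a1 = URI.create("http://a.example/login");
        URI a2 = URI.create("http://A.EXAMPLE/home");
        URI b = URI.create("http://b.example/elsewhere");
        RedirectPolicy same = RedirectPolicy.sameHost();
        System.out.println(same.shouldRedirect(a1, a2)); // true
        System.out.println(same.shouldRedirect(a1, b));  // false
    }
}
```

The string round-trip that the enum gives for free could be recovered with a name on each standard instance plus a valueOf(String) lookup, as the text above suggests.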
4.) HttpClient.Version:
* Why does a HttpClient need to commit to using one HTTP version or
the other? What if an application wants to use HTTP/2 for those
servers that support it, but fall back to HTTP/1.1 for those that
don’t?
Answered.
Given that HttpClientBuilder.version(Version) does not actually
restrict the client to only that version, should it perhaps be called
setMaximumVersion(Version), or changed to enableVersion(Version) /
disableVersion(Version), or similar?
Perhaps requestedVersion()?
5.) CookieManager
* Is there a common interface we could add, without making the API
much more complex, to support both RFC 2965 (outdated, implemented by
CookieManager) and RFC 6265 (new, real-world, actually used) cookies?
Needs prototyping. I think it’s likely we’ll be able to do something
similar to OkHttp’s CookieJar
<https://square.github.io/okhttp/3.x/okhttp/okhttp3/CookieJar.html>,
which can be adapted to RFC 2965 - not 100%, but close enough that
most implementations of CookieManager could be reused by the new HTTP
API, while still taking advantage of RFC 6265 cookies.
One suggestion has been to use the low-level CookieHandler API.
But, another option would be just to
evolve CookieManager to the latest RFCs, which sounds a lot like
what you are suggesting.
Not exactly what I was suggesting, but similar and might work. For
what it's worth, even CookieHandler mentions "Cookie2" headers in the
documentation so this would need to be changed and the backwards
compatibility risk would need to be estimated. For reference, RFC 7540
(HTTP/2)'s normative [COOKIE] reference is to the (new) cookie RFC
6265, which no longer has Cookie2 / Set-Cookie2 headers.
Tobias