Hi Tobias,

Apologies for the long delay in responding. I'll hopefully catch up on your subsequent comments soon. Just to give an overall update on the status of this work first. There is a plan afoot to make the JDK 9 module system support something we are calling "incubator modules". These are modules that can be delivered with the JDK, but are not standard parts of the platform (Java SE). So, they do not resolve by default and will have runtime/compile time warnings to indicate that the contained APIs are potentially subject to change. The plan is to make this new http client API one of these incubator modules in JDK 9. It will use a different package namespace to reflect this.

The upshot is that while we would like the API to be as complete as possible, it will not be fixed or standardized in JDK 9. But by delivering the API and implementation with JDK 9, it will get some exposure and use, so that we can standardize it in JDK 10, and move it from its incubator module and namespace back into Java SE in the java.net.http package.

I've added some answers to your questions below, and will get to the followup messages soon.

Thanks,
Michael

On 31/10/2016, 18:13, Tobias Thierer wrote:
Hi Michael -

thanks a lot for your response! Further comments below.

How would you rank simplicity, versatility and performance as goals for the HTTP Client API? For example the blocking vs. non-blocking API choices will largely depend on trade-offs, so I'd like to understand the target applications and requirements that these trade-offs were based on.

To my mind, we should support both blocking and non-blocking in the API; by specifying them both in the API (rather than relying on blocking behavior being a wrapper around non-blocking), we get the potential benefits of both (reduced thread switching with blocking, versus less thread usage with non-blocking).

On Fri, Oct 28, 2016 at 12:28 PM, Michael McMahon <michael.x.mcma...@oracle.com <mailto:michael.x.mcma...@oracle.com>> wrote:


    The intention behind those interfaces is primarily to provide a
    conversion between ByteBuffers and higher level objects that users
    are interested in. So, HttpRequest.BodyProcessor generates
    ByteBuffers for output and HttpResponse.BodyProcessor consumes them
    on input. For response bodies, there is this additional implicit
    requirement of needing to do something with the incoming data
    and that is why the option of discarding it exists (the details of
    that API are certainly open to discussion)

    On Publisher/Subscriber (Flow) versus InputStream/OutputStream, we
    wanted a non blocking API


I can understand the desire to have a non-blocking API, especially if performance is seen as more important for the API than usability. Cronet (the Chromium networking stack) API [documentation <https://chromium.googlesource.com/chromium/src/+/lkgr/components/cronet>] has a

UrlRequest.read(ByteBuffer) <-> callback.onReadCompleted(UrlRequest, UrlResponseInfo, ByteBuffer)

call cycle that is not too different from OpenJDK 9 HTTP client's

   subscription.request(1) <-> BodyProcessor.onNext(ByteBuffer)

so the general idea is shared by others (I'm meeting with some Cronet folks later this week so I hope to gather their input / views soon). Note that in Cronet, the ByteBuffer is provided as part of each call requesting more data, whereas in OpenJDK 9 it's returned once by BodyProcessor.getBuffer().

On this point, I'd say that there is a high degree of usability with the proposed API when using the standard handlers/processors. Again, the blocking variant is very easy to use. The non-blocking variant with CompletableFuture is not much harder, once the programmer invests some effort in understanding that model.

Implementing your own processor does require familiarity with the Flow types and their underlying philosophy, but I see those cases as outside the core use cases. It's likely that additional processor/handlers would be added in time also.
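To make that request(1) <-> onNext() cycle concrete, here is a minimal sketch of a custom subscriber that counts body bytes, pulling one buffer at a time. This is a hypothetical illustration using the java.util.concurrent.Flow types, not part of the proposed API:

```java
import java.nio.ByteBuffer;
import java.util.concurrent.Flow;

// Hypothetical sketch: a subscriber driving the
// request(1) <-> onNext(ByteBuffer) cycle described above.
class CountingSubscriber implements Flow.Subscriber<ByteBuffer> {
    private Flow.Subscription subscription;
    private long count;

    @Override public void onSubscribe(Flow.Subscription s) {
        subscription = s;
        s.request(1);                  // pull the first buffer
    }
    @Override public void onNext(ByteBuffer buf) {
        count += buf.remaining();      // consume this chunk
        subscription.request(1);       // then pull the next one
    }
    @Override public void onError(Throwable t) { /* surface the failure */ }
    @Override public void onComplete() { /* body fully received */ }

    long count() { return count; }
}
```

The key point is that the subscriber, not the publisher, paces delivery: no further buffer arrives until request() is called again.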

At the same time, the first and third goals mentioned in JEP 110 <http://openjdk.java.net/jeps/110> are

  * Must be easy to use for common cases, including a simple blocking
    mode.
  * A simple and concise API which caters for 80-90% of application
    needs.

Blocking, pull-based APIs tend to be simpler to use because (a) one can just step through a for loop in a debugger rather than needing breakpoints, and (b) state can be held on the stack rather than in fields [see related blog post <http://code-o-matic.blogspot.co.uk/2011/01/note-on-api-design-call-stack-as-source.html>], so I'm struggling to follow the trade-offs here. Obviously, blocking APIs have some disadvantages as well - e.g. context switching overhead in the case of separate Threads per request. Is there any place other than JEP 110 where I could read up on principles or target applications that drove the trade-offs for this API?

But, I think you are talking about debugging the processor or handler there rather than the calling application code?

A concrete example that I tried to prototype was reading an image file header & body from the network, potentially aborting the request depending on some metadata in the file header; using the OpenJDK 9 HTTP client API, this involved a ton of boilerplate. Part of the boilerplate was because I wasn't sure if onNext() might get a partially filled ByteBuffer that might cut off the image file header or even an individual int32 value in that header. Is it guaranteed that the ByteBuffer returned by BodyProcessor.getBuffer will be full on each call to onNext(ByteBuffer), except possibly the last? If not, perhaps it should be guaranteed?

Related to this, it appears that the documentation for Subscription.request(long n) <http://cr.openjdk.java.net/%7Emichaelm/httpclient/api.1/java/util/concurrent/Flow.Subscription.html#request-long-> specifies that /n/ specifically /denotes the number of future calls/ to onNext() <http://cr.openjdk.java.net/%7Emichaelm/httpclient/api.1/java/util/concurrent/Flow.Subscriber.html#onNext-T->. If this wasn't the case, BodyProcessor could have defined /n/ to refer to a number of bytes (delivered as a single onNext() call if they fit into the supplied ByteBuffer), which might have been a really neat way for the application to control the sizes of chunks that it gets passed. For example, an application might want to get the file header in one go, and then afterwards process some fixed size data record for each call. Oh well. Do you happen to have an idea for how the application could be more in control of the chunk sizes?

Yes, Subscriptions relate to the number of <T> items that can be transferred rather than the number of bytes. There was some discussion on that exact question and we need to tighten that up, so that it is easier to map from the byte-oriented flow control scheme used by HTTP/2 to the item-oriented model that Flow uses.

I think basically that BodyProcessor needs to commit to a specific buffer size, and guarantee that all calls of onNext() will transfer that exact number of bytes in the ByteBuffer (except for the final call of a subscription). With that guarantee, it is possible to have a seamless mapping between the two flow control schemes. Whether the processor decides the buffer size or the http library, I'm not certain, but I think it probably needs to be fixed at the start of a particular request/response, rather than be allowed to vary.
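A sketch of how that mapping could work, assuming (as suggested) a buffer size fixed at the start of the response. The class and method names here are hypothetical, purely for illustration:

```java
// Hypothetical sketch: with a fixed per-buffer size, item-oriented Flow
// demand and byte-oriented HTTP/2 flow control map onto each other.
final class FlowControlMapper {
    private final int bufSize;    // fixed for this response, e.g. 16 KiB

    FlowControlMapper(int bufSize) { this.bufSize = bufSize; }

    // Subscriber demand (request(n)) -> WINDOW_UPDATE increment in bytes
    long windowIncrement(long itemsRequested) {
        return itemsRequested * (long) bufSize;
    }

    // Window bytes granted by the peer -> number of full buffers deliverable
    long itemsDeliverable(long windowBytes) {
        return windowBytes / bufSize;
    }
}
```

Without the fixed-size guarantee, neither direction of this conversion is well defined, which is exactly why the buffer size needs to be pinned down per request/response.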

I'd be interested to look at your prototype to see what other pain points there might be.


    and had originally defined
    something that was quite similar to the asynchronous model used by
    Flow, but the Flow types have a nice mechanism for
    managing flow control asynchronously. So, it made sense to use the
    standard API.


Regarding managing flow control, I assume you're referring to subscription.request() which I assume is tied to WINDOW_UPDATE frames in HTTP 2? Also, I assume that Subscription.cancel() will result in a RST_STREAM frame being sent to the server, is that correct?

Yes.

Is there a recommended way for an application to limit stream concurrency (RFC 7540 section 5.1.2 <https://tools.ietf.org/html/rfc7540#section-5.1.2>)? Example use case: An application that loads and decodes .jpeg images from the network might be limited by how many decompressed images it can hold in memory at a time (the application may want to re-compress the images in a different format, e.g. for file format conversion or to send them to the GPU in ETC2 texture compression format). Would such an application manually maintain a limited set of "active" streams, making sure to start calling request() only for those streams that have become active? (With a blocking API, an application could simply use a fixed-size thread pool, or could use a Semaphore to guard access to a limited resource, such as memory).

I really wouldn't use the request/response processors for anything other than encoding/decoding of bodies. There are a variety of ways of limiting request/response concurrency. As you suggest, it is obvious for the blocking style. For asynchronous code using CompletableFuture there are nice ways to do it also. In one example I wrote a small class called RequestLimiter (kind of an asynchronous version of a semaphore), specified as follows:

// Asynchronous semaphore: permits are CompletableFutures, waiters are
// queued in a LinkedList. Very simple.
class RequestLimiter {
    private final LinkedList<CompletableFuture<Void>> waiters = new LinkedList<>();
    private int permits;

    RequestLimiter(int concurrency) {
        this.permits = concurrency; // 'concurrency' requests allowed simultaneously
    }

    // returned CF completes when caller is permitted to send request
    synchronized CompletableFuture<Void> obtainPermit() {
        if (permits > 0) {
            permits--;
            return CompletableFuture.completedFuture(null);
        }
        CompletableFuture<Void> cf = new CompletableFuture<>();
        waiters.add(cf);
        return cf;
    }

    // signals that a response has been received and processed,
    // permitting another request to start
    synchronized void returnPermit() {
        CompletableFuture<Void> next = waiters.poll();
        if (next != null) next.complete(null);
        else permits++;
    }
}

// The calling code might look as follows

{
    HttpClient client = ... // build the client
    int REQUESTS = 1000; // say
    int CONCURRENCY = 10; // say
    RequestLimiter limiter = new RequestLimiter(CONCURRENCY);

    CompletableFuture<Path>[] futures = new CompletableFuture[REQUESTS];

    for (int i = 0; i < REQUESTS; i++) {
        HttpRequest request = .... // build the request
        Path path = Paths.get("/someroot", request.uri().getPath());
        futures[i] = limiter
            .obtainPermit()
            .thenCompose((v) -> client.sendAsync(request,
                                    ResponseBodyHandler.asFile(path)))
            .thenApply((HttpResponse<Path> response) -> {
                limiter.returnPermit();
                return response.body();
            });
    }
    // wait for all to complete
    CompletableFuture.allOf(futures).join();
}

    We're certainly open to suggestions for improved names, but I
    think it is useful to have a way to explicitly discard a response
    body, especially since HTTP/2 has an efficient way to do that.


Why can't the response body instead be discarded by calling subscription.cancel() during the call to onSubscribe() (before the first ByteBuffer of the body data has been delivered)?

That would have a similar effect, except that the subscription wouldn't normally be visible to the user of the response processor, but some processor implementations could expose that capability if it makes sense. For instance, a processor which returns the response body as an InputStream could map InputStream.close() to subscription.cancel().
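A rough sketch of that kind of processor (hypothetical; the names and details here are mine, not the proposed API): buffers are handed from onNext() to a blocking queue that the InputStream drains, and close() cancels the subscription:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Flow;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: expose a subscribed body as a blocking InputStream.
class InputStreamBody extends InputStream implements Flow.Subscriber<ByteBuffer> {
    private static final ByteBuffer EOS = ByteBuffer.allocate(0); // end-of-stream marker
    private final BlockingQueue<ByteBuffer> queue = new LinkedBlockingQueue<>();
    private Flow.Subscription subscription;
    private ByteBuffer current;

    @Override public void onSubscribe(Flow.Subscription s) {
        subscription = s;
        s.request(1);                       // pull the first buffer
    }
    @Override public void onNext(ByteBuffer buf) { queue.add(buf); }
    @Override public void onError(Throwable t)  { queue.add(EOS); }
    @Override public void onComplete()          { queue.add(EOS); }

    @Override public int read() throws IOException {
        try {
            while (current != EOS && (current == null || !current.hasRemaining())) {
                if (current != null) subscription.request(1); // done; pull the next
                current = queue.take();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IOException(e);
        }
        return current == EOS ? -1 : (current.get() & 0xff);
    }

    @Override public void close() {
        subscription.cancel(); // maps to RST_STREAM on the HTTP/2 stream
    }
}
```

Reading blocks until the next buffer arrives, so the Flow-level back-pressure is preserved: a new buffer is requested only once the previous one has been fully consumed.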

    On the question of push versus pull, there seem to be differing
    philosophies around this, but actually the publish/subscribe model
    provided by the Flow types has characteristics of both push and
    pull. It is primarily push, in that Subscriber.onNext() delivers
    the next buffer to the subscriber, but the subscriber can
    effectively control the flow through the Subscription object.


What does "effectively control" mean? As far as I can see, the subscriber can stop calls temporarily by not calling request(), but e.g. it cannot request a minimum number of bytes to deliver (if available before EOF) on the next call. Is that correct or am I missing something? (See above question about whether ByteBuffers are completely filled).

Yes, I mean by regulating the calls to request() you control the rate of data flow and can stop it temporarily. I think the question you raised above about minimum numbers of bytes can be addressed as discussed above.

    There are other benefits to having the body available
    synchronously with the rest of the response object
    and it is still possible to implement arbitrary streaming of
    response body data without having to buffer it.
    A while ago I prototyped a response processor which delivers the
    body through an InputStream.


What's the reason that implementation got dropped? FWIW, I suspect that a lot of applications will be tempted to use HttpResponse.asByteArray().getBody(); if this happens, it'll be strictly worse than a potential HttpResponse.asInputStream() because

  * it holds the entire response body in memory as a byte[], and
  * processing only starts after the last byte of the response is
    received from the network (this means that the request can't be
    aborted part way through)


But, if you know in advance that the response body will be small, then I don't see the problem with reading it all into a byte[] or String, or some other application-defined type. If there is a possibility that response bodies might be very large or unbounded, then it would be better to use a response processor that streams the data.

    2.) HttpHeaders: I love that there is a type for this
    abstraction! But:

     * Why is the type an interface rather than a concrete, final
       class? Since this is a pure value type, there doesn’t seem to
       be much point in allowing alternative implementations?

    You could conceive of different implementations, with different
    tradeoffs between parsing early or lazily for instance.


What parsing are you referring to? I don't see any parsing logic in HttpHeadersImpl other than in firstValueAsLong. It does however have logic that alternative implementations could get wrong / be inconsistent with, such as

  * whether to use case insensitive comparison for header names
  * whether the Map returned from map() is unmodifiable
  * whether the List returned by allValues() is unmodifiable (it is
    modifiable in the current implementation, but I think that's a bug)
  * the behavior of firstValueAsLong when the value exists but is not
    parseable as a long

Have you considered making HttpHeaders immutable and adding a Builder implementation? That would make it thread safe at a small cost to the semantic weight of the API.

With HTTP/1 you can read the headers in as a blob of bytes and then just go looking for headers on demand, or you can parse the entire header block up front. We should certainly make sure that we don't preclude what you suggest here: that HttpHeaders might be buildable and immutable.
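A sketch of what an immutable, buildable headers type could look like (the names here are hypothetical, not the proposed API); it pins down the case-insensitivity and unmodifiability questions raised above:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Optional;
import java.util.TreeMap;

// Hypothetical sketch: immutable headers with a builder. Name lookups are
// case-insensitive; returned views are unmodifiable.
final class ImmutableHeaders {
    private final TreeMap<String, List<String>> map;

    private ImmutableHeaders(TreeMap<String, List<String>> map) { this.map = map; }

    Optional<String> firstValue(String name) {
        List<String> vs = map.get(name);
        return (vs == null || vs.isEmpty()) ? Optional.empty() : Optional.of(vs.get(0));
    }

    List<String> allValues(String name) {
        List<String> vs = map.get(name);
        return vs == null ? Collections.emptyList() : vs; // already unmodifiable
    }

    static final class Builder {
        private final TreeMap<String, List<String>> m =
            new TreeMap<>(String.CASE_INSENSITIVE_ORDER);

        Builder add(String name, String value) {
            m.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
            return this;
        }

        ImmutableHeaders build() {
            TreeMap<String, List<String>> copy =
                new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
            m.forEach((k, v) -> copy.put(k,
                Collections.unmodifiableList(new ArrayList<>(v)))); // defensive copies
            return new ImmutableHeaders(copy);
        }
    }
}
```

Because build() takes defensive copies, the resulting instance is thread-safe with no synchronization, at the cost of one extra class (the Builder) in the API.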

     * Do the methods other than map() really pull their weight
       (provide enough value relative to the semantic API weight
       that they introduce)?

         o firstValueAsLong() looks particularly suspect: why would
           anyone care particularly about long values? Especially
           since the current implementation seems to throw
           NumberFormatException rather than returning an empty
           Optional?


    Numeric header values would be fairly common I think. Long was
    just chosen as the widest possible integer type.
    NumberFormatException is only thrown if the value is present but
    can't be parsed as a number.


But then you'd have to document that NFE is thrown, which an application would have been much less likely to miss if they had called Long.valueOf() themselves. It's also not clear that throwing NFE, rather than returning an empty Optional, is the behavior that most applications prefer.

NFE is an unchecked exception. So, I don't think there is much overhead. If you are reading a header that is supposed to be a numeric value, I think NFE is reasonable. That is effectively a protocol error.

These methods seem to add semantic weight to the API for very little benefit. Perhaps it'd be better to leave them out, but add them in a future revision of the API if it turns out that a lot of applications duplicate exactly this logic?
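For comparison, the empty-Optional behaviour Tobias describes would look something like this (a hypothetical helper, not the proposed API):

```java
import java.util.List;
import java.util.OptionalLong;

// Hypothetical sketch: parse the first header value as a long, treating
// both a missing header and an unparseable value as "empty" rather than
// throwing NumberFormatException.
final class HeaderNumbers {
    static OptionalLong firstValueAsLong(List<String> values) {
        if (values == null || values.isEmpty()) return OptionalLong.empty();
        try {
            return OptionalLong.of(Long.parseLong(values.get(0).trim()));
        } catch (NumberFormatException e) {
            return OptionalLong.empty(); // unparseable -> empty, no exception
        }
    }
}
```

The trade-off is visibility: the Optional variant silently conflates "absent" with "malformed", while the NFE variant makes the protocol error explicit.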

     * I haven’t made up my mind about whether the existing choices
       are the right ones / sufficient. Perhaps if this class used
       the typesafe enum pattern from Effective Java 1st edition
       rather than being an actual enum, the API would maintain the
       option in a future version to allow client-configured
       Redirect policies, allowing Redirect for URLs as long as they
       are within the same host/domain?


    As Anthony mentioned, there was a more complicated API originally,
    but it was felt to be too complicated, which
    was why we settled on enums (which can be extended in future).


The choice of enums limits the versatility because all enum instances need to be known at compile time; for example, it'd be impossible to provide a Redirect policy that allows redirects to a given set of hosts.

The only benefit I can see from using specifically an enum class is that this gives us a serialization format for free. But a conversion to/from String values for the standard policy instances (via a valueOf(String) method or similar) would offer the same at a fairly small cost of semantic weight.

For example, doesn't the existence of the convenience method HttpHeaders.firstValueAsLong() have a much smaller benefit, but similar semantic weight, to e.g. a potential non-enum Redirect.valueOf() method (and even so, that method could be left to a future API version)?
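The typesafe-enum pattern mentioned above would allow application-defined instances alongside the standard constants; a hypothetical sketch (not the proposed API):

```java
import java.net.URI;

// Hypothetical sketch of the typesafe-enum pattern: standard constants
// plus room for application-defined redirect policies.
abstract class RedirectPolicy {
    static final RedirectPolicy NEVER = new RedirectPolicy("NEVER") {
        @Override boolean shouldRedirect(URI from, URI to) { return false; }
    };
    static final RedirectPolicy ALWAYS = new RedirectPolicy("ALWAYS") {
        @Override boolean shouldRedirect(URI from, URI to) { return true; }
    };

    private final String name;
    RedirectPolicy(String name) { this.name = name; }

    abstract boolean shouldRedirect(URI from, URI to);

    @Override public String toString() { return name; }
}
```

An application could then subclass it to get, say, a SAME_HOST policy that compares from.getHost() to to.getHost() — exactly the kind of instance an enum cannot admit after compile time.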

    4.) HttpClient.Version:

     * Why does a HttpClient need to commit to using one HTTP
       version or the other? What if an application wants to use
       HTTP/2 for those servers that support it, but fall back to
       HTTP/1.1 for those that don’t?


    Answered.


Given that HttpClientBuilder.version(Version) does not actually restrict the client to only that version, should it perhaps be called setMaximumVersion(Version), or changed to enableVersion(Version) / disableVersion(Version), or similar?

Perhaps requestedVersion()?

    5.) CookieManager

     * Is there a common interface we could add without making the
       API much more complex to allow us both RFC 2965 (outdated,
       implemented by CookieManager) and RFC 6265 (new, real world,
       actually used) cookies? Needs prototyping. I think it’s
       likely we’ll be able to do something similar to OkHttp’s
       CookieJar
       <https://square.github.io/okhttp/3.x/okhttp/okhttp3/CookieJar.html>,
       which can be adapted to RFC 2965 - not 100%, but close enough
       that most implementations of CookieManager could be reused by
       the new HTTP API, while still taking advantage of RFC 6265
       cookies.


    One suggestion has been to use the low-level CookieHandler API.
    But, another option would be just to
    evolve CookieManager to the latest RFCs, which sounds a lot like
    what you are suggesting.


Not exactly what I was suggesting, but similar, and it might work. For what it's worth, even CookieHandler mentions "Cookie2" headers in its documentation, so this would need to be changed and the backwards-compatibility risk would need to be estimated. For reference, the normative [COOKIE] reference in RFC 7540 (HTTP/2) is to the new cookie RFC 6265, which no longer has Cookie2 / Set-Cookie2 headers.

Tobias


