Hi Tobias,
Apologies for the long delay in responding. I'll hopefully catch up on
your subsequent comments soon.
Just to give an overall update on the status of this work first. There
is a plan afoot to make the JDK 9
module system support something we are calling "incubator modules".
These are modules that
can be delivered with the JDK, but are not standard parts of the
platform (Java SE). So, they do not
resolve by default and will have runtime/compile-time warnings to
indicate that the contained APIs
are potentially subject to change. The plan is to make this new HTTP
client API one of these
incubator modules in JDK 9. It will use a different package namespace to
reflect this.
The upshot is that while we would like the API to be as complete as
possible, it will not
be fixed or standardized in JDK 9. But by delivering the API and
implementation with JDK 9, it
will get some exposure and use, so that we can standardize it in JDK 10,
and move it from
its incubator module and namespace back into Java SE in the
java.net.http package.
I've added some answers to your questions below, and will get to the
followup messages soon.
Thanks,
Michael
On 31/10/2016, 18:13, Tobias Thierer wrote:
Hi Michael -
thanks a lot for your response! Further comments below.
How would you rank simplicity, versatility and performance as goals
for the HTTP Client API? For example the blocking vs. non-blocking API
choices will largely depend on trade-offs, so I'd like to understand
the target applications and requirements that these trade-offs were
based on.
To my mind, we should support both blocking and non-blocking in the API.
By specifying them both in the API (rather than relying on blocking
behavior being a wrapper around non-blocking), we get the potential
benefits of both: reduced thread switching with blocking, versus lower
thread usage with non-blocking.
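To show the two styles side by side, here is a sketch; it is an assumption on my part, written against the java.net.http names as later standardized in JDK 11 (not the JDK 9 incubator namespace), with a tiny local server standing in for a real endpoint so it is self-contained:

```java
import com.sun.net.httpserver.HttpServer;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.concurrent.CompletableFuture;

public class BlockingVsAsync {
    public static void main(String[] args) throws Exception {
        // Tiny local server so the example needs no network access.
        HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
        server.createContext("/", exchange -> {
            byte[] body = "hello".getBytes();
            exchange.sendResponseHeaders(200, body.length);
            exchange.getResponseBody().write(body);
            exchange.close();
        });
        server.start();
        URI uri = URI.create("http://localhost:"
                + server.getAddress().getPort() + "/");

        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(uri).build();

        // Blocking style: the calling thread waits for the response.
        HttpResponse<String> blocking =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(blocking.body());

        // Non-blocking style: the same request as a CompletableFuture chain.
        CompletableFuture<String> async =
                client.sendAsync(request, HttpResponse.BodyHandlers.ofString())
                      .thenApply(HttpResponse::body);
        System.out.println(async.join());

        server.stop(0);
    }
}
```

Both calls print "hello"; the blocking form keeps state on the stack, while the asynchronous form frees the calling thread between request and response.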
On Fri, Oct 28, 2016 at 12:28 PM, Michael McMahon
<michael.x.mcma...@oracle.com <mailto:michael.x.mcma...@oracle.com>>
wrote:
The intention behind those interfaces is primarily to provide a
conversion between ByteBuffers and higher level objects that users
are interested in. So, HttpRequest.BodyProcessor generates
ByteBuffers for output and HttpResponse.BodyProcessor consumes them
on input. For response bodies, there is this additional implicit
requirement of needing to do something with the incoming data
and that is why the option of discarding it exists (the details of
that API are certainly open to discussion).
On Publisher/Subscriber (Flow) versus InputStream/OutputStream, we
wanted a non-blocking API
I can understand the desire to have a non-blocking API, especially if
performance is seen as more important for the API than usability.
Cronet (the Chromium networking stack) API [documentation
<https://chromium.googlesource.com/chromium/src/+/lkgr/components/cronet>]
has a
UrlRequest.read(ByteBuffer) <->
callback.onReadCompleted(UrlRequest, UrlResponseInfo, ByteBuffer)
call cycle that is not too different from OpenJDK 9 HTTP client's
subscription.request(1) <-> BodyProcessor.onNext(ByteBuffer)
so the general idea is shared by others (I'm meeting with some Cronet
folks later this week so I hope to gather their input / views soon).
Note that in Cronet, the ByteBuffer is provided as part of each call
requesting more data, whereas in OpenJDK 9 it is returned once by
BodyProcessor.getBuffer().
On this point, I'd say that there is a high degree of usability with the
proposed API when using
the standard handlers/processors. Again, the blocking variant is very
easy to use. The non-blocking
variant with CompletableFuture is not much harder, once the programmer
invests some effort in understanding that
model.
Implementing your own processor does require familiarity with the Flow
types and their underlying philosophy, but I see those cases as outside
the core use cases. It is also likely that additional
processors/handlers will be added over time.
At the same time, the first and third goals mentioned in JEP 110
<http://openjdk.java.net/jeps/110> are:
* Must be easy to use for common cases, including a simple blocking
mode.
* A simple and concise API which caters for 80-90% of application
needs.
Blocking, pull-based APIs tend to be simpler to use because (a) one
can just step through a for loop in a debugger rather than needing
breakpoints, and (b) state can be held on the stack rather than in
fields [see related blog post
<http://code-o-matic.blogspot.co.uk/2011/01/note-on-api-design-call-stack-as-source.html>],
so I'm struggling to follow the trade-offs here. Obviously, blocking
APIs have some disadvantages as well - e.g. context switching overhead
in the case of separate Threads per request. Is there any place other
than JEP 110 where I could read up on principles or target
applications that drove the trade-offs for this API?
But, I think you are talking about debugging the processor or handler
there rather than the calling application code?
A concrete example that I tried to prototype was reading an image file
header & body from the network, potentially aborting the request
depending on some metadata in the file header; using the OpenJDK 9
HTTP client API, this involved a ton of boilerplate. Part of the
boilerplate was because I wasn't sure if onNext() might get a
partially filled ByteBuffer that might cut off the image file header
or even an individual int32 value in that header. Is it guaranteed
that the ByteBuffer returned by BodyProcessor.getBuffer will be full
on each call to onNext(ByteBuffer), except possibly the last? If not,
perhaps it should be guaranteed?
Related to this, it appears that the documentation for
Subscription.request(long n)
<http://cr.openjdk.java.net/%7Emichaelm/httpclient/api.1/java/util/concurrent/Flow.Subscription.html#request-long-> specifies
that /n/ specifically /denotes the number of future calls/ to onNext()
<http://cr.openjdk.java.net/%7Emichaelm/httpclient/api.1/java/util/concurrent/Flow.Subscriber.html#onNext-T->.
If this wasn't the case, BodyProcessor could have defined /n/ to refer
to a number of bytes (delivered as a single onNext() call if they fit
into the supplied ByteBuffer), which might have been a really neat way
for the application to control the sizes of chunks that it gets
passed. For example, an application might want to get the file header
in one go, and then afterwards process some fixed size data record for
each call. Oh well. Do you happen to have an idea for how the
application could be more in control of the chunk sizes?
Yes, Subscriptions relate to the number of <T> items that can be
transferred rather than the number of bytes. There was some discussion
on that exact question and we need to tighten that up, so that it is
easier to map from the byte oriented flow control scheme used by HTTP/2
to the item oriented model that Flow uses. I think basically that
BodyProcessor needs to commit to specific buffer sizes, and guarantee
that all calls of onNext() will transfer that exact number of bytes in
the ByteBuffer (except for the final call of a subscription). With that
guarantee, it is possible to have a seamless mapping between the two
flow control schemes. Whether the processor decides the buffer size or
the http library, I'm not certain, but I think it probably needs to be
fixed at the start of a particular request/response, rather than be
allowed to vary.
I'd be interested to look at your prototype to see what other pain
points there might be.
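As an illustration of coping with partially filled buffers under the current Flow model, a subscriber can accumulate the first N bytes itself before deciding whether to cancel, which is roughly the image-header case described above. This is a hedged sketch using only the java.util.concurrent.Flow types; HeaderSubscriber and its fields are hypothetical names, not part of the proposed API:

```java
import java.io.EOFException;
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;

public class HeaderDemo {

    // Accumulates a fixed-size header across possibly partial ByteBuffers,
    // then cancels the subscription (e.g. to abort an unwanted download).
    static class HeaderSubscriber implements Flow.Subscriber<ByteBuffer> {
        private final byte[] header;
        private int filled;
        private Flow.Subscription subscription;
        final CompletableFuture<byte[]> result = new CompletableFuture<>();

        HeaderSubscriber(int headerSize) { header = new byte[headerSize]; }

        public void onSubscribe(Flow.Subscription s) {
            subscription = s;
            s.request(1);                  // pull one buffer at a time
        }

        public void onNext(ByteBuffer buf) {
            int n = Math.min(buf.remaining(), header.length - filled);
            buf.get(header, filled, n);    // copes with partial buffers
            filled += n;
            if (filled == header.length) {
                subscription.cancel();     // header complete: stop the stream
                result.complete(header);
            } else {
                subscription.request(1);
            }
        }

        public void onError(Throwable t) { result.completeExceptionally(t); }

        public void onComplete() {
            if (!result.isDone())
                result.completeExceptionally(new EOFException("short body"));
        }
    }

    public static void main(String[] args) {
        SubmissionPublisher<ByteBuffer> pub = new SubmissionPublisher<>();
        HeaderSubscriber sub = new HeaderSubscriber(6);
        pub.subscribe(sub);
        // Two 4-byte buffers: the 6-byte header straddles the boundary.
        pub.submit(ByteBuffer.wrap(new byte[]{1, 2, 3, 4}));
        pub.submit(ByteBuffer.wrap(new byte[]{5, 6, 7, 8}));
        System.out.println(Arrays.toString(sub.result.join()));
        pub.close();
    }
}
```

The subscriber never assumes buffers arrive full, which is the defensive stance the current specification seems to require; the fixed-buffer-size guarantee discussed above would let such code be simpler.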
and had originally defined
something that was quite similar to the asynchronous model used by
Flow, but the Flow types have a nice mechanism for
managing flow control asynchronously. So, it made sense to use the
standard API.
Regarding managing flow control, I assume you're referring to
subscription.request() which I assume is tied to WINDOW_UPDATE frames
in HTTP 2? Also, I assume that Subscription.cancel() will result in a
RST_STREAM frame being sent to the server, is that correct?
Yes.
Is there a recommended way for an application to limit stream
concurrency (RFC 7540 section 5.1.2
<https://tools.ietf.org/html/rfc7540#section-5.1.2>)? Example use
case: An application that loads and decodes .jpeg images from the
network might be limited by how many decompressed images it can hold
in memory at a time (the application may want to re-compress the
images in a different format, e.g. for file format conversion or to
send them to the GPU in ETC2 texture compression format). Would such
an application manually maintain a limited set of "active" streams,
making sure to start calling request() only for those streams that
have become active? (With a blocking API, an application could simply
use a fixed-size thread pool, or could use a Semaphore to guard access
to a limited resource, such as memory).
I really wouldn't use the request/response processors for anything other
than encoding/decoding of bodies. There are a variety of ways of
limiting request/response concurrency. As you suggest, it is obvious for
the blocking style. For asynchronous code using CompletableFuture there
are nice ways to do it also. In one example I wrote a small class called
RequestLimiter specified as follows:
(it is kind of like an asynchronous version of a semaphore)
// Asynchronous semaphore. Implementation just uses a
// LinkedList<CompletableFuture<Void>>. Very simple.
class RequestLimiter {
    RequestLimiter(int concurrency);

    // returned CF completes when caller is permitted to send request.
    // 'concurrency' requests are allowed simultaneously
    synchronized CompletableFuture<Void> obtainPermit();

    // signals that a response has been received and processed,
    // permitting another request to start
    synchronized void returnPermit();
}

// The calling code might look as follows
{
    HttpClient client = ... // build the client
    int REQUESTS = 1000; // say
    int CONCURRENCY = 10; // say
    RequestLimiter limiter = new RequestLimiter(CONCURRENCY);
    CompletableFuture<Path>[] futures = new CompletableFuture[REQUESTS];
    for (int i = 0; i < REQUESTS; i++) {
        HttpRequest request = .... // build the request
        Path path = Paths.get("/someroot", request.uri().getPath());
        futures[i] = limiter
            .obtainPermit()
            .thenCompose((v) -> client.sendAsync(request,
                    ResponseBodyHandler.asFile(path)))
            .thenApply((HttpResponse<Path> response) -> {
                limiter.returnPermit();
                return response.body();
            });
    }
    // wait for all to complete
    CompletableFuture.allOf(futures).join();
}
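A minimal implementation of the RequestLimiter sketch above might look as follows. This is my own fleshed-out guess at the class described (a queue of waiting futures behind a permit count), not the actual code referred to:

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;

// Hypothetical implementation of the RequestLimiter sketch: an
// asynchronous semaphore backed by a queue of waiting futures.
class RequestLimiter {
    private final Queue<CompletableFuture<Void>> waiters = new ArrayDeque<>();
    private int permits;

    RequestLimiter(int concurrency) {
        this.permits = concurrency;
    }

    // Returned CF completes when the caller is permitted to send a request.
    synchronized CompletableFuture<Void> obtainPermit() {
        if (permits > 0) {
            permits--;
            return CompletableFuture.completedFuture(null);
        }
        CompletableFuture<Void> cf = new CompletableFuture<>();
        waiters.add(cf);
        return cf;
    }

    // Signals that a response has been processed, releasing a permit
    // (or waking the longest-waiting caller).
    synchronized void returnPermit() {
        CompletableFuture<Void> next = waiters.poll();
        if (next != null) {
            next.complete(null);
        } else {
            permits++;
        }
    }

    // Small demonstration: with concurrency 2, a third permit only
    // becomes available once one is returned.
    public static void main(String[] args) {
        RequestLimiter limiter = new RequestLimiter(2);
        CompletableFuture<Void> a = limiter.obtainPermit();
        CompletableFuture<Void> b = limiter.obtainPermit();
        CompletableFuture<Void> c = limiter.obtainPermit();
        System.out.println(a.isDone() + " " + b.isDone() + " " + c.isDone());
        limiter.returnPermit();
        System.out.println(c.isDone());
    }
}
```

In real use the returnPermit() call should also run on exceptional completion (e.g. via whenComplete), or a failed request would leak a permit.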
We're certainly open to suggestions for improved names, but I
think it is useful to have a way to explicitly discard a response
body, especially since HTTP/2 has an efficient way to do that.
Why can't the response body instead be discarded by calling
subscription.cancel() during the call to onSubscribe() (before the
first ByteBuffer of the body data has been delivered)?
That would have a similar effect, except that the subscription wouldn't
normally be visible to the user of the response processor,
but some processor implementations could expose that capability if it
makes sense. For instance, a processor which returns
the response body as an InputStream would map InputStream.close() to
subscription.cancel().
On the question of push versus pull, there seem to be differing
philosophies around this, but actually the publish/subscribe model
provided by the Flow types has characteristics of both push and pull.
It is primarily push, in that Subscriber.onNext() delivers the next
buffer to the subscriber, but the subscriber can effectively control
the flow through the Subscription object.
What does "effectively control" mean? As far as I can see, the
subscriber can stop calls temporarily by not calling request(), but
e.g. it cannot request a minimum number of bytes to deliver (if
available before EOF) on the next call. Is that correct or am I
missing something? (See above question about whether ByteBuffers are
completely filled.)
Yes, I mean by regulating the calls to request() you control the rate of
data flow and can stop it temporarily. I think
the question you raised above about minimum numbers of bytes can be
addressed as discussed above.
There are other benefits to having the body available
synchronously with the rest of the response object
and it is still possible to implement arbitrary streaming of
response body data without having to buffer it.
A while ago I prototyped a response processor which delivers the
body through an InputStream.
What's the reason that implementation got dropped? FWIW, I suspect
that a lot of applications will be tempted to use
HttpResponse.asByteArray().getBody(); if this happens, it'll be
strictly worse than a potential HttpResponse.asInputStream() because
* it holds the entire response body in memory as a byte[], and
* processing only starts after the last byte of the response is
received from the network (this means that the request can't be
aborted part way through)
But, if you know in advance that the response body will be small then I
don't see the problem with reading it all into a byte[] or String, or
some other application-defined type. If there is a possibility that
response bodies might be very large or unbounded, then it would be
better to use a response processor that streams the data.
2.) HttpHeaders: I love that there is a type for this
abstraction! But:
* Why is the type an interface rather than a concrete, final class?
Since this is a pure value type, there doesn’t seem to be much point
in allowing alternative implementations?
You could conceive of different implementations, with different
tradeoffs between parsing early or lazily for instance.
What parsing are you referring to? I don't see any parsing logic in
HttpHeadersImpl other than in firstValueAsLong. It does however have
logic that alternative implementations could get wrong / be
inconsistent with, such as
* whether to use case insensitive comparison for header names
* whether the Map returned from map() is unmodifiable
* whether the List returned by allValues() is unmodifiable (it is
modifiable in the current implementation, but I think that's a bug)
* the behavior of firstValueAsLong when the value exists but is not
parseable as a long
Have you considered making HttpHeaders immutable and adding a Builder
implementation? That would make it thread safe at a small cost to the
semantic weight of the API.
With HTTP/1 you can read the headers in as a blob of bytes and then just
go looking for headers on demand, or you can parse
the entire header block up front. We should certainly make sure that we
don't preclude what you suggest here: that HttpHeaders might be
buildable and an immutable class.
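A buildable, immutable variant might look roughly like this. All names and behavior choices here are illustrative assumptions rather than the proposed API: header names compare case-insensitively, the exposed map and lists are unmodifiable, and an unparseable numeric value yields an empty OptionalLong instead of throwing NumberFormatException:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.OptionalLong;
import java.util.TreeMap;

// Hypothetical sketch of an immutable, buildable HttpHeaders value class.
final class ImmutableHeaders {
    private final Map<String, List<String>> map;

    private ImmutableHeaders(Map<String, List<String>> src) {
        // Case-insensitive lookup; defensive, unmodifiable copies.
        Map<String, List<String>> m =
                new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        src.forEach((k, v) ->
                m.put(k, Collections.unmodifiableList(new ArrayList<>(v))));
        this.map = Collections.unmodifiableMap(m);
    }

    Optional<String> firstValue(String name) {
        List<String> v = map.get(name);
        return (v == null || v.isEmpty()) ? Optional.empty()
                                          : Optional.of(v.get(0));
    }

    // Empty, rather than NumberFormatException, for unparseable values.
    OptionalLong firstValueAsLong(String name) {
        try {
            return firstValue(name)
                    .map(s -> OptionalLong.of(Long.parseLong(s)))
                    .orElse(OptionalLong.empty());
        } catch (NumberFormatException e) {
            return OptionalLong.empty();
        }
    }

    Map<String, List<String>> map() { return map; }

    static final class Builder {
        private final Map<String, List<String>> m =
                new TreeMap<>(String.CASE_INSENSITIVE_ORDER);

        Builder add(String name, String value) {
            m.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
            return this;
        }

        ImmutableHeaders build() { return new ImmutableHeaders(m); }
    }

    public static void main(String[] args) {
        ImmutableHeaders h = new Builder()
                .add("Content-Length", "42")
                .add("Accept", "text/html")
                .build();
        System.out.println(h.firstValue("content-length"));
        System.out.println(h.firstValueAsLong("CONTENT-LENGTH"));
        System.out.println(h.firstValueAsLong("Accept"));
    }
}
```

The build() copy makes the result safely publishable across threads, at the cost of one extra type (the Builder) in the API.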
* Do the methods other than map() really pull their weight (provide
enough value relative to the semantic API weight that they introduce)?
  o firstValueAsLong() looks particularly suspect: why would anyone
    care particularly about long values? Especially since the current
    implementation seems to throw NumberFormatException rather than
    returning an empty Optional?
Numeric header values would be fairly common I think. Long was
just chosen as the widest possible integer type.
NumberFormatException is only thrown if the value is present but
can't be parsed as a number.
But then you'd have to document that NFE is thrown, which an
application would have been much less likely to miss if they had
called Long.valueOf() themselves. It's also not clear that throwing
NFE, rather than returning an empty Optional, is the behavior that
most applications prefer.
NFE is an unchecked exception. So, I don't think there is much overhead.
If you are reading a header that is supposed to be
a numeric value I think NFE is reasonable. That is effectively a
protocol error.
These methods seem to add semantic weight to the API for very little
benefit. Perhaps it'd be better to leave them out, but add them in a
future revision of the API if it turns out that a lot of applications
duplicate exactly this logic?
* I haven’t made up my mind about whether the existing choices
are the right ones / sufficient. Perhaps if this class used
the typesafe enum pattern from Effective Java 1st edition
rather than being an actual enum, the API would maintain the
option in a future version to allow client-configured
Redirect policies, allowing Redirect for URLs as long as they
are within the same host/domain?
As Anthony mentioned, there was a more complicated API originally,
but it was felt to be too complicated, which
was why we settled on enums (whose set of constants can be extended in
future).
The choice of enums limits the versatility because all enum instances
need to be known at compile time; for example, it'd be impossible to
provide a Redirect policy that allows redirects to a given set of hosts.
The only benefit I can see from using specifically an enum class is
that this gives us a serialization format for free. But a conversion
to/from String values for the standard policy instances (via a
valueOf(String) method or similar) would offer the same at a fairly
small cost of semantic weight.
For example, doesn't the existence of the convenience method
HttpHeaders.firstValueAsLong() have a much smaller benefit than, but
similar semantic weight to, e.g. a potential non-enum
Redirect.valueOf() method (and even so, that method could be left to a
future API version)?
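The typesafe-enum-pattern alternative being argued for could be sketched as follows. Everything here is illustrative and hypothetical, not the proposed API: the standard policies are constants, yet applications can plug in policies an enum could never express, such as same-host-only redirects:

```java
import java.net.URI;

// Hypothetical typesafe-enum-style redirect policy (Effective Java,
// 1st ed. pattern): constants plus open extension.
interface RedirectPolicy {
    boolean shouldRedirect(URI from, URI to);

    // The "standard" instances, analogous to today's enum constants.
    RedirectPolicy NEVER = (from, to) -> false;
    RedirectPolicy ALWAYS = (from, to) -> true;

    // A custom policy: follow redirects only within the original host.
    static RedirectPolicy sameHost() {
        return (from, to) -> from.getHost().equalsIgnoreCase(to.getHost());
    }
}

class RedirectDemo {
    public static void main(String[] args) {
        URI a1 = URI.create("http://a.example/login");
        URI a2 = URI.create("http://A.EXAMPLE/home");
        URI b = URI.create("http://b.example/elsewhere");
        RedirectPolicy same = RedirectPolicy.sameHost();
        System.out.println(same.shouldRedirect(a1, a2)); // true
        System.out.println(same.shouldRedirect(a1, b));  // false
    }
}
```

The string round-trip that the enum gives for free could be recovered with a name on each standard instance plus a valueOf(String) lookup, as the text above suggests.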
4.) HttpClient.Version:
* Why does a HttpClient need to commit to using one HTTP version or
the other? What if an application wants to use HTTP/2 for those
servers that support it, but fall back to HTTP/1.1 for those that
don’t?
Answered.
Given that HttpClientBuilder.version(Version) does not actually
restrict the client to only that version, should it perhaps be called
setMaximumVersion(Version), or changed to enableVersion(Version) /
disableVersion(Version), or similar?
Perhaps requestedVersion()?
5.) CookieManager
* Is there a common interface we could add, without making the API
much more complex, to support both RFC 2965 (outdated, implemented by
CookieManager) and RFC 6265 (new, real-world, actually used) cookies?
Needs prototyping. I think it’s likely we’ll be able to do something
similar to OkHttp’s CookieJar
<https://square.github.io/okhttp/3.x/okhttp/okhttp3/CookieJar.html>,
which can be adapted to RFC 2965 - not 100%, but close enough that
most implementations of CookieManager could be reused by the new HTTP
API, while still taking advantage of RFC 6265 cookies.
One suggestion has been to use the low-level CookieHandler API.
But, another option would be just to
evolve CookieManager to the latest RFCs, which sounds a lot like
what you are suggesting.
Not exactly what I was suggesting, but similar and might work. For
what it's worth, even CookieHandler mentions "Cookie2" headers in the
documentation so this would need to be changed and the backwards
compatibility risk would need to be estimated. For reference, RFC 7540
(HTTP/2)'s normative [COOKIE] reference is to the (new) cookie RFC
6265, which no longer has Cookie2 / Set-Cookie2 headers.
Tobias