Hi Michael - Some follow-up/updates on items from my previous email inline below:
On Mon, Oct 31, 2016 at 6:13 PM, Tobias Thierer <tobi...@google.com> wrote:

> On Fri, Oct 28, 2016 at 12:28 PM, Michael McMahon <
> michael.x.mcma...@oracle.com> wrote:
>
>> The intention behind those interfaces is primarily to provide a
>> conversion between ByteBuffers and higher level objects that users
>> are interested in. So, HttpRequest.BodyProcessor generates ByteBuffers
>> for output and HttpResponse.BodyProcessor consumes them on input.
>> For response bodies, there is the additional implicit requirement of
>> needing to do something with the incoming data, and that is why the
>> option of discarding it exists (the details of that API are certainly
>> open to discussion).
>>
>> On Publisher/Subscriber (Flow) versus InputStream/OutputStream, we
>> wanted a non-blocking API.
>
> I can understand the desire to have a non-blocking API, especially if
> performance is seen as more important for the API than usability. Cronet
> (the Chromium networking stack) API [documentation
> <https://chromium.googlesource.com/chromium/src/+/lkgr/components/cronet>]
> has a
>
>     UrlRequest.read(ByteBuffer) <-> callback.onReadCompleted(UrlRequest,
>     UrlResponseInfo, ByteBuffer)
>
> call cycle that is not too different from the OpenJDK 9 HTTP client's
>
>     subscription.request(1) <-> BodyProcessor.onNext(ByteBuffer)
>
> so the general idea is shared by others (I'm meeting with some Cronet
> folks later this week, so I hope to gather their input / views soon).

I've meanwhile met with Cronet folks and now agree with your view that an async API is useful for applications that prioritize performance over simplicity. That said, I still think that many applications will prefer the simplicity of a blocking API, either provided as part of the platform or as a separate library. FYI, as an experiment to see how easy this could be, I've prototyped a blocking API on top of the async API - specifically, an adapter from OpenJDK 9's HttpResponse.BodyProcessor to an InputStream.
The way I've prototyped it was to use a PipedInputStream internally whose buffer size is 32 KBytes (2 * the size of the ByteBuffers used to deliver the response); the number of free bytes in that internal buffer (a) decreases when onResponseBodyChunk() copies bytes into the associated PipedOutputStream, and (b) increases when the application reads bytes from the PipedInputStream. The PipedInputStream is wrapped in another InputStream that calls flowController.accept(1) every time (b) frees up enough bytes in the internal buffer to fit everything that may be delivered in the next call to onResponseBodyChunk(); this ensures that onResponseBodyChunk() never blocks. The InputStream wrapper methods also take care of throwing IOException after onResponseError(Throwable) has occurred.

All of this seemed fairly straightforward, but I did run into some problems:

- I haven't succeeded in building OpenJDK 9 from source, following the
  instructions (I always run into a NullPointerException in XMLDTDProcessor
  during the BUILD_TOOLS_CORBA part of "make all"), so I've been unable to
  empirically try / benchmark my prototype code.
- Obviously, copying data into/out of the PipedInputStream's internal buffer
  is overhead; this consciously trades performance for simplicity of the API.
  (I note that ResponseContent.copyBuffer() also performs copying, but I
  haven't checked whether this can be avoided.)
- flowController.accept(1) is only called when the underlying
  PipedInputStream.read() returns, and then there will still be additional
  latency before more data is delivered to onResponseBodyChunk(). These
  delays will impair performance more than should ideally be required, but I
  haven't been able to quantify this through benchmarks because of my
  problems building OpenJDK 9 from source.
- Further contributing to this latency is the fact that the wrapper doesn't
  have good control over how many bytes will be delivered during the next
  onResponseBodyChunk().
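To make the shape of the prototype concrete, here is a rough sketch of the adapter described above. The callback names (onResponseBodyChunk, onResponseError) and the flow-controller parameter follow my description of the prototype, not the final API; the chunk/buffer sizes are the ones mentioned above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.nio.ByteBuffer;
import java.util.function.LongConsumer;

// Sketch only: adapts async body-chunk callbacks to a blocking InputStream.
final class BlockingBodyAdapter {
    private static final int CHUNK = 16 * 1024;   // assumed max bytes per chunk
    private static final int BUFFER = 2 * CHUNK;  // 32 KByte pipe buffer

    private final PipedInputStream pipeIn = new PipedInputStream(BUFFER);
    private final PipedOutputStream pipeOut = new PipedOutputStream();
    private final LongConsumer flowController;    // e.g. flowController::accept
    private volatile Throwable error;

    BlockingBodyAdapter(LongConsumer flowController) throws IOException {
        this.flowController = flowController;
        pipeIn.connect(pipeOut);
        flowController.accept(1);                 // request the first chunk
    }

    // Called on the client's delivery thread; never blocks, because a permit
    // is only granted while the pipe has room for a full chunk.
    void onResponseBodyChunk(ByteBuffer chunk) throws IOException {
        byte[] bytes = new byte[chunk.remaining()];
        chunk.get(bytes);
        pipeOut.write(bytes);                     // the extra copy noted above
    }

    void onResponseBodyComplete() throws IOException { pipeOut.close(); }

    void onResponseError(Throwable t) { error = t; }

    // The blocking InputStream handed to the application.
    InputStream inputStream() {
        return new InputStream() {
            @Override public int read() throws IOException {
                byte[] one = new byte[1];
                return read(one, 0, 1) < 0 ? -1 : one[0] & 0xff;
            }
            @Override public int read(byte[] b, int off, int len) throws IOException {
                if (error != null) throw new IOException(error);
                int n = pipeIn.read(b, off, len);
                // Grant another permit only once the free space in the pipe
                // can absorb a whole chunk, so onResponseBodyChunk never blocks.
                if (BUFFER - pipeIn.available() >= CHUNK) flowController.accept(1);
                return n;
            }
        };
    }
}
```

The third problem above is visible in the last read() method: the permit is handed out only after a read returns, so the publisher sits idle for at least one read round-trip.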
Note that I've run into this same limitation in my previous prototype, for the use case of parsing image data. Further note that Cronet's API *does* give the application control over how many bytes at most will be delivered during the next onReadCompleted(UrlRequest, UrlResponseInfo, ByteBuffer) - see my previous email. I previously thought that it was the BodyProcessor's job to construct the ByteBuffer objects, but I may be mistaken - in the latest version of the API, this seems to be an implementation detail of HttpClientImpl and not part of the public API.

>> On the question of push versus pull, there seem to be differing
>> philosophies around this, but actually the publish/subscribe model
>> provided by the Flow types has characteristics of both push and pull.
>> It is primarily push, in that Subscriber.onNext() delivers the next
>> buffer to the subscriber. But the subscriber can effectively control
>> it through the Subscription object.
>
> What does "effectively control" mean? As far as I can see, the subscriber
> can stop calls temporarily by not calling request(), but e.g. it can not
> request a minimum number of bytes to deliver (if available before EOF) on
> the next call. Is that correct, or am I missing something? (See above
> question about whether ByteBuffers are completely filled.)

I'm still interested in the answer to this question :)

2.) HttpHeaders: I love that there is a type for this abstraction! But:

>> - Why is the type an interface rather than a concrete, final class?
>>   Since this is a pure value type, there doesn't seem to be much point
>>   in allowing alternative implementations?
>
> You could conceive of different implementations, with different tradeoffs
> between parsing early or lazily, for instance.

What parsing are you referring to?
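To illustrate the push/pull point with the java.util.concurrent.Flow types themselves: request(n) counts *items* (ByteBuffers), not bytes, so a subscriber can pause delivery by withholding request() but cannot ask for a minimum byte count. A minimal sketch (using SubmissionPublisher, a JDK 9 class, as a stand-in publisher):

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Flow;

// Pulls exactly one buffer at a time; the publisher alone decides each
// buffer's size - there is no way to express "at least N bytes, please".
final class OneBufferAtATime implements Flow.Subscriber<ByteBuffer> {
    final List<Integer> sizes = new ArrayList<>();
    final CountDownLatch done = new CountDownLatch(1);
    private Flow.Subscription subscription;

    @Override public void onSubscribe(Flow.Subscription s) {
        subscription = s;
        s.request(1);                  // pull the first buffer
    }
    @Override public void onNext(ByteBuffer item) {
        sizes.add(item.remaining());   // whatever size the publisher chose
        subscription.request(1);       // pull the next one
    }
    @Override public void onError(Throwable t) { done.countDown(); }
    @Override public void onComplete() { done.countDown(); }
}
```

Not requesting in onNext() would stall the stream indefinitely, which is the only "control" the Subscription offers as far as I can tell.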
FYI, I've meanwhile investigated in more detail whether lazy parsing might be beneficial as part of HPACK decoding, but I don't see how it could be:

- An HTTP/2 implementation MUST (RFC 7540 section 4.3
  <https://tools.ietf.org/html/rfc7540#section-4.3>) process header frames
  sufficiently to update the decompression state and to signal a
  COMPRESSION_ERROR in the case of an error. I take this to mean that at
  least the HPACK decoder's Dynamic Table of header entries needs to be
  kept up to date.
- In the HPACK decoder, decoding String literals involves negligible work
  beyond figuring out where they start/end, so there doesn't seem to be any
  potential to do work lazily.
- Not even the Huffman decoding seems to offer significant opportunity for
  lazy work, because HPACK requires that "*A Huffman-encoded string literal
  containing the EOS symbol MUST be treated as a decoding error*" (RFC 7541
  section 5.2 <https://tools.ietf.org/html/rfc7541#section-5.2>). Because
  Huffman codes are generally not self-synchronizing
  <https://en.wikipedia.org/wiki/Self-synchronizing_code>, the only way to
  figure out on which bit the last symbol starts seems to be to decode the
  entire sequence.

What am I missing? I still don't see why HttpHeaders shouldn't be (or at least include) an immutable, final value type class.

In general, do you disagree that ease of providing an alternative HttpClient implementation is important?

Tobias
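For concreteness, here is the kind of thing I have in mind - a sketch of my own (not the OpenJDK API) of a concrete, final, immutable headers value class: names normalized to lower case, values defensively copied at construction, and no parsing left to do at lookup time.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Sketch of an immutable, final headers value type. All work happens in the
// constructor; lookups are read-only and case-insensitive.
final class ImmutableHeaders {
    private final Map<String, List<String>> map;

    ImmutableHeaders(Map<String, List<String>> source) {
        Map<String, List<String>> copy = new LinkedHashMap<>();
        source.forEach((name, values) ->
            copy.put(name.toLowerCase(Locale.ROOT),
                     Collections.unmodifiableList(new ArrayList<>(values))));
        this.map = Collections.unmodifiableMap(copy);
    }

    Optional<String> firstValue(String name) {
        List<String> values = map.get(name.toLowerCase(Locale.ROOT));
        return (values == null || values.isEmpty())
                ? Optional.empty() : Optional.of(values.get(0));
    }

    List<String> allValues(String name) {
        return map.getOrDefault(name.toLowerCase(Locale.ROOT),
                                Collections.emptyList());
    }
}
```

Nothing here seems to require an interface: alternative HttpClient implementations could construct instances of such a class just as easily as they could implement an interface.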