Chris Burdess <[EMAIL PROTECTED]> writes:

> David Daney wrote:
>> It seems that the current implementation of HTTPURLConnection.connect() 
>> buffers the entire response before returning.
>>
>> Is that a correct analysis?
>
> Yes.
>
>> This can be problematic if the content is larger than the heap.  It 
>> is even worse than that as it makes a copy of the content, so the 
>> content can only be half as large as the heap.
>>
>> Does anyone know the rationale behind doing it this way?
>
> Our implementation uses the inetlib HTTP client in order to leverage 
> numerous HTTP features such as chunked and compressed transfer-codings, 
> TLS, and HTTP 1.1.
>
> The design of the inetlib HTTP client is based on callbacks. You 
> register a listener to receive notification of HTTP response data, 
> rather than pulling the data yourself. This leaves the client in proper 
> control of the stream and permits correct handling of HTTP persistent 
> connections (reuse of the same TCP connection for multiple HTTP 
> requests).
>
> The design of the URLConnection API is pull-based. Therefore we either 
> have to buffer an entire response before returning, or use multiple 
> threads, a pipe, and a much more complex implementation to manage 
> cleanup of resources. Also note that with HTTP 1.1 chunked encoding, 
> you can have headers after the response body, which is not something 
> that most naive developers will expect. This means that in the 
> non-buffered implementation you could have
>
>    connection.getHeaderField("My-Header"); // null
>    connection.getInputStream();
>    // read until -1
>    connection.getHeaderField("My-Header"); // non-null
>
> In practice I haven't seen this in many servers, but it is still a 
> possibility.
>
> Tom Tromey and I have discussed the possibility of this non-buffered 
> implementation and of a hybrid model which uses a heuristic based on 
> the content length to decide which of these implementations to use, but 
> we haven't really had time to thrash it all out yet.
>
> If you are dealing with streaming servers or with very large responses, 
> you probably shouldn't be using the URLConnection API in any case - 
> consider using the inetlib client directly as it will be more 
> efficient.
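
For reference, the "multiple threads, a pipe" alternative described above
could look roughly like the sketch below. BodyListener and PushSource are
hypothetical stand-ins for a callback-based client, not the real inetlib
interfaces, and the error handling is deliberately simplistic.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical callback interface; a stand-in, not the real inetlib API. */
interface BodyListener {
    void onData(byte[] buf, int off, int len) throws IOException;
}

/** Hypothetical push-style client: invokes the listener until the body ends. */
interface PushSource {
    void deliver(BodyListener listener) throws IOException;
}

final class PipeBridge {
    /** Wraps a push-style source in a pull-style InputStream. */
    static InputStream open(PushSource source) throws IOException {
        PipedInputStream in = new PipedInputStream(8192);
        PipedOutputStream out = new PipedOutputStream(in);
        Thread worker = new Thread(() -> {
            // Closing the write end is what lets the reader see -1.
            try (PipedOutputStream o = out) {
                // write() blocks while the pipe is full, so at most
                // 8192 bytes are buffered regardless of the body size.
                source.deliver((buf, off, len) -> o.write(buf, off, len));
            } catch (IOException e) {
                // Reader closed early or the source failed. A real
                // implementation would have to surface this to the reader.
            }
        }, "pipe-bridge");
        worker.setDaemon(true);
        worker.start();
        return in;
    }

    /** Pushes a body through the bridge and pulls it back out. */
    static String roundTrip(String body) {
        byte[] bytes = body.getBytes(StandardCharsets.UTF_8);
        // Simulated client that delivers the body in two callbacks.
        PushSource source = listener -> {
            int half = bytes.length / 2;
            listener.onData(bytes, 0, half);
            listener.onData(bytes, half, bytes.length - half);
        };
        try (InputStream in = open(source)) {
            ByteArrayOutputStream copy = new ByteArrayOutputStream();
            byte[] chunk = new byte[1024];
            int n;
            while ((n = in.read(chunk)) != -1) {
                copy.write(chunk, 0, n);
            }
            return new String(copy.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("response body pulled through a pipe"));
    }
}
```

The cost is the one mentioned above: an extra thread per response and
more cleanup paths to get right (reader abandons the stream, source
fails halfway, and so on).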

I have spoken to Chris before about my own HTTP library, which uses
non-blocking IO. It would solve this problem, but it would also
require another thread (for the selector).
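
A minimal sketch of the selector-thread idea, with an in-process
java.nio.channels.Pipe standing in for the socket. The class and method
names are made up for illustration; this is not code from the library
mentioned above.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;

final class SelectorSketch {
    /** Runs on the selector thread: reads until the peer closes. */
    static void readAll(Pipe.SourceChannel source, ByteArrayOutputStream sink) {
        try (Selector selector = Selector.open()) {
            source.configureBlocking(false); // required before register()
            source.register(selector, SelectionKey.OP_READ);
            ByteBuffer buf = ByteBuffer.allocate(4096);
            boolean eof = false;
            while (!eof) {
                selector.select(); // blocks until the channel is readable
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove();
                    if (!key.isReadable()) {
                        continue;
                    }
                    int n;
                    // Drain whatever is available without blocking.
                    while ((n = source.read(buf)) > 0) {
                        sink.write(buf.array(), 0, n);
                        buf.clear();
                    }
                    if (n < 0) { // write end was closed
                        key.cancel();
                        eof = true;
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Writes a body into the pipe while the selector thread reads it. */
    static String roundTrip(String body) {
        try {
            Pipe pipe = Pipe.open();
            ByteArrayOutputStream collected = new ByteArrayOutputStream();
            Thread selectorThread =
                new Thread(() -> readAll(pipe.source(), collected), "selector");
            selectorThread.start();
            ByteBuffer out = ByteBuffer.wrap(body.getBytes(StandardCharsets.UTF_8));
            try (Pipe.SinkChannel sink = pipe.sink()) {
                while (out.hasRemaining()) {
                    sink.write(out);
                }
            } // closing the sink signals end of stream to the reader
            selectorThread.join(); // join() makes 'collected' safe to read
            pipe.source().close();
            return new String(collected.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("body read by the selector thread"));
    }
}
```

In a real client one selector thread would serve many connections, and
the data would be handed to the caller incrementally rather than
collected in memory as it is here.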

It also lacks HTTP 1.1 features such as pipelining, though I will add
them if I get the time.


Nic


_______________________________________________
Classpath mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/classpath