Ciao my dear "talponi",
I want to let you know that I seem to have resolved the problem of the
connection dropping after a chunk is retrieved. It was probably a buffer
problem. I have added a method to the io class that flushes the stream:
it re-initializes the buffer's position indexes to zero. Please try the
testnet program under the test dir with a redirected URL.
HTTP/1.1 seems to work now, but it obviously still needs improvements.
I have a proposal. Since we take advantage of persistent connections,
I want to implement these steps for each request:
1) If we already have information on the server (i.e. it has been asked
for a document before) and we know it does not accept persistent
connections, we can disable them for the request.
2) If persistent connections are available, I suggest making a HEAD
request before the GET. This lets us check both the response status
code and the Content-Type, and thus decide whether the document really
has to be retrieved (for parsing). If so, we make a further request,
this time with the GET method, retrieve the content and parse it. If
persistent connections are not available, we obviously must close the
connection, so we make only a single GET request and check the status
code and Content-Type there. If they satisfy us we commit the request;
otherwise we do nothing, and the connection closes immediately after.
I think this process could save us a lot of time. Just think of a big
image of 300K on a site with persistent connections enabled. With the
approach I proposed, we would make a HEAD request and see, with only a
few bytes downloaded, that the document does not satisfy our
requirements because of its Content-Type, so we would avoid downloading
such a huge document. On the other hand, for a document that exists and
is parsable, for example an HTML document, we would make two requests: a
HEAD request, which tells us the document is worth retrieving, followed
by a second request with the GET method. Since we take advantage of
persistent connections, I think the extra overhead will be small and, in
the end, we will have a net time gain.
Just let me know what you think about it, so I can modify the code this
way.
Ciao
-Gabriele
-------------------------------------------------
Gabriele Bartolini
U.O. Rete Civica - Comune di Prato
Prato - Italia - Europa
e-mail: [EMAIL PROTECTED]
http://www.po-net.prato.it
-------------------------------------------------
------------------------------------
To unsubscribe from the htdig3-dev mailing list, send a message to
[EMAIL PROTECTED] containing the single word "unsubscribe" in
the SUBJECT of the message.