Geoff Hutchison wrote:
>
> I thought I should post my proposed interface for the
> ExternalTransport class. I won't commit it because I can't build the
> current CVS tree...
>
> However, it uses the same script interface as ExternalParser. The
> script receives a URL to retrieve and should give back a formatted
> response:
>
> Field   Purpose
> s       Status Code
> r       Status Reason Phrase
> m       Modification Time
> c       Contents
> t       Content-Type
> l       Content-Length
> u       URL
>
> The s/r fields are a bit difficult. Should we just require scripts to
> map protocol errors to HTTP codes?
That would probably work best. It should cover just about all possible
cases, I'd hope!
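
As a rough illustration of what "map protocol errors to HTTP codes" could look
like inside a script, here is a minimal sketch for an FTP retriever. The
mapping table, the fallback to 502, the function name status_fields, and the
"field<TAB>value" line layout are all assumptions made for the example; the
interface above does not pin any of them down.

```python
from http import HTTPStatus

# Hypothetical mapping from FTP reply codes to HTTP status codes.
# The choice of equivalents is an assumption, not part of the proposal.
FTP_TO_HTTP = {
    226: 200,  # transfer complete
    250: 200,  # requested file action okay
    421: 503,  # service not available
    530: 401,  # not logged in
    550: 404,  # file unavailable / not found
}


def status_fields(ftp_code):
    """Return the 's' and 'r' header lines for an FTP reply code.

    Unknown codes fall back to 502 (Bad Gateway), an arbitrary choice.
    Lines are written as 'field<TAB>value', which is assumed here.
    """
    http_code = FTP_TO_HTTP.get(ftp_code, 502)
    reason = HTTPStatus(http_code).phrase
    return f"s\t{http_code}\nr\t{reason}\n"
```

A script for another protocol would only need its own table; htdig itself
would keep seeing plain HTTP-style codes.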
> The l field is used in case the script does not retrieve the entire
> contents.
Ah... That reminds me...
The current (and very simplistic!) method of retrieving a whole document
before parsing it is pretty bad. A much better way would be to stream the
document into the parser. The reason I didn't do that initially is that
it was harder to do! The advantages would be:
a) less memory usage... no need to store the whole document in memory.
b) faster digging... I/O sleep time can be used to parse.
The simplest solution would be to use threads to accomplish all of this, but
it can be done without threads as well.
So, assuming an external retriever and an external parser, htdig would start
the retriever and read the header. Assuming the retriever can get the requested
document, htdig can now start the parser and tie the retriever's output to
the parser's input. htdig then ties the output of the parser to the database
updater. Voila! (Boy, this would be *so* trivial in Java!!!!)
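
To make the plumbing concrete, here is a small sketch of the non-threaded
version using pipes. It is not htdig code: the function names, the blank line
that ends the header, the "field<TAB>value" layout, and the two stand-in
commands are assumptions for illustration only. The retriever's pipe is opened
unbuffered so that reading the header does not swallow part of the document
before the parser is attached.

```python
import subprocess
import sys

# Stand-in commands so the sketch can run on its own (purely hypothetical).
# The "retriever" prints a header, a blank line, then the document body.
DEMO_RETRIEVER = [sys.executable, "-c",
                  r"import sys; sys.stdout.write("
                  r"'s\t200\nt\ttext/plain\n\nhello world\nsecond line\n')"]
# The "parser" just upper-cases each line it is fed.
DEMO_PARSER = [sys.executable, "-c",
               "import sys\nfor line in sys.stdin:\n"
               "    sys.stdout.write(line.upper())"]


def read_header(stream):
    """Read 'field<TAB>value' lines up to the first blank line (assumed)."""
    header = {}
    for raw in iter(stream.readline, b""):
        line = raw.decode("utf-8", "replace").rstrip("\r\n")
        if not line:
            break
        field, _, value = line.partition("\t")
        header[field] = value
    return header


def dig(retriever_cmd, parser_cmd, url):
    """Start the retriever, read its header, then pipe the body to the parser."""
    # bufsize=0 gives a raw pipe: readline() stops right after each newline,
    # so the body stays in the pipe for the parser to read.
    retriever = subprocess.Popen(retriever_cmd + [url],
                                 stdout=subprocess.PIPE, bufsize=0)
    header = read_header(retriever.stdout)
    if header.get("s") != "200":
        retriever.stdout.close()
        retriever.wait()
        return header, []

    # Tie the retriever's output to the parser's input.
    parser = subprocess.Popen(parser_cmd, stdin=retriever.stdout,
                              stdout=subprocess.PIPE)
    retriever.stdout.close()  # the parser now owns the read end

    # Stand-in for the database updater: collect the parser's output lines.
    records = [line.decode().rstrip("\n") for line in parser.stdout]
    parser.wait()
    retriever.wait()
    return header, records
```

Calling dig(DEMO_RETRIEVER, DEMO_PARSER, "http://example.com/") returns the
parsed header plus whatever the parser emitted; the document body itself never
sits in the parent process.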
Any thoughts on this?
> The u field is used in case the script receives an external redirect,
> in which case it should set the s and u fields. The script shouldn't
> attempt to follow the new URL because it may involve a different
> protocol.
Good point.
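
On the htdig side, that could amount to something like the sketch below: the
new URL goes back onto the queue of pending URLs, so whichever transport fits
its scheme picks it up later. The function name, the set of redirect codes,
and the use of a deque for the queue are assumptions for the example.

```python
from collections import deque

# Status codes treated as redirects in this sketch (an assumed list).
REDIRECT_CODES = {"301", "302", "303", "307"}


def handle_header(header, pending):
    """Decide what to do with a retriever's header fields.

    header  -- dict of field letter to value, e.g. {"s": "302", "u": "..."}
    pending -- queue of URLs still to be fetched
    """
    status = header.get("s", "")
    if status in REDIRECT_CODES and header.get("u"):
        # Do not follow the redirect here; just queue the new URL.
        pending.append(header["u"])
        return "redirect"
    return "ok" if status == "200" else "error"
```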
> Does this seem like a reasonable interface? Should the script receive
> more information? (referring URL and credentials come to mind...)
Are those used, ever? Regardless, the header format is easily extended...
--
Andrew Scherpbier <[EMAIL PROTECTED]>
Contigo Software <http://www.contigo.com/>