Re: [htdig3-dev] Re: ExternalTransport interface

Geoff Hutchison Sun, 3 Oct 1999 10:26:50 -0700

At 10:56 AM -0700 10/3/99, Andrew Scherpbier wrote:
>a)  less memory usage...  no need to store the whole document in memory.
>b)  faster digging...  I/O sleep time can be used to parse.
>The simplest solution would be to use threads to accomplish all of this, but
>it can be done without threads as well.

Both of these would be nice. Heck, full multithreading would help in 
many places. But that's another ball of wax.
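
To make point (a) concrete without threads, here is a minimal sketch of a parser that consumes a document in chunks instead of holding all of it in memory. It is only an illustration, not htdig code: the name `iter_words` and the `chunk_size` parameter are made up for the example, and "parsing" is reduced to splitting on whitespace. Point (b), overlapping I/O waits with parsing, would additionally need non-blocking reads, which this sketch does not attempt.

```python
import io


def iter_words(stream, chunk_size=4096):
    """Yield whitespace-separated words from a byte stream.

    Only one chunk plus one partial word is held in memory at a time.
    (Hypothetical helper for illustration; not part of htdig.)
    """
    tail = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        data = tail + chunk
        words = data.split()
        # If the chunk ended mid-word, keep that fragment for the next round.
        if words and not data[-1:].isspace():
            tail = words.pop()
        else:
            tail = b""
        yield from words
    if tail:
        yield tail


if __name__ == "__main__":
    doc = io.BytesIO(b"less memory usage faster digging")
    for word in iter_words(doc, chunk_size=5):
        print(word.decode("ascii"))
```

The same loop would work on a socket or a pipe in place of `io.BytesIO`, since it only relies on `read()`.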

>So, assuming external retriever and external parser, htdig would start the
>retriever and read the header.  Assuming the retriever can get the requested
>document, htdig can now start the parser and tie the retriever's output to
>the parser's input.  htdig then ties the output of the parser to the database
>updater.  Voila!   (Boy, this would be *so* trivial in Java!!!!)
>
>Any thoughts on this?

This would be extremely useful when using both an external 
retriever and an external parser. However, I can't quite see how it 
works conceptually with an internal parser. (I just got back from a 
run so it might be obvious and I'm just in oxygen debt? ;-)
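
For the all-external case, the pipeline described above can be sketched roughly as follows. Everything specific here is an assumption made for the example: the header format ("name: value" lines ending at a blank line, with a `status` field), the tab-separated parser output, and the two inline stand-in scripts `RETRIEVER` and `PARSER`. None of it is the actual htdig interface.

```python
import subprocess
import sys

# Hypothetical stand-ins for the external retriever and parser scripts.
RETRIEVER = [
    sys.executable, "-c",
    "import sys; sys.stdout.write("
    "'status: 200\\ncontent-type: text/plain\\n\\nhello external transport\\n')",
]
PARSER = [
    sys.executable, "-c",
    "import sys\nfor w in sys.stdin.read().split(): print('w\\t' + w)",
]


def read_header(stream):
    """Read 'name: value' lines up to the first blank line (assumed format)."""
    header = {}
    while True:
        line = stream.readline().decode("ascii").strip()
        if not line:
            return header
        name, _, value = line.partition(":")
        header[name.strip().lower()] = value.strip()


def dig(retriever_cmd, parser_cmd):
    # bufsize=0 keeps our end of the pipe unbuffered, so reading the header
    # here does not swallow body bytes that the parser needs to see.
    retriever = subprocess.Popen(retriever_cmd, stdout=subprocess.PIPE, bufsize=0)
    header = read_header(retriever.stdout)
    if header.get("status") != "200":
        retriever.stdout.close()
        retriever.wait()
        return header, []
    # Tie the rest of the retriever's output straight to the parser's input.
    parser = subprocess.Popen(parser_cmd, stdin=retriever.stdout,
                              stdout=subprocess.PIPE)
    retriever.stdout.close()  # drop our copy; the parser holds its own
    out, _ = parser.communicate()
    retriever.wait()
    # Stand-in for the database updater: just collect the parsed records.
    records = [line.split("\t") for line in out.decode("ascii").splitlines()]
    return header, records


if __name__ == "__main__":
    print(dig(RETRIEVER, PARSER))
```

Note that the document body never passes through the parent process in this arrangement; only the header and the parser's output do. With an internal parser the parent would have to read the body itself, which is where the open question above comes in.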

> > Does this seem like a reasonable interface? Should the script receive
> > more information? (referring URL and credentials come to mind...)
>
>Are those used, ever?  Regardless, the header format is easily extended...

I could see someone wanting to index a non-anonymous FTP site, or 
writing an HTTPS transport script that might desire both. I'm a 
little hesitant to send credentials to the script because it smells 
of security problems, but I guess if you're allowing the script to be 
run, you know what's in it.

-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/
