At 10:56 AM -0700 10/3/99, Andrew Scherpbier wrote:
>a) less memory usage... no need to store the whole document in memory.
>b) faster digging... I/O sleep time can be used to parse.
>The simplest solution would be to use threads to accomplish all of this, but
>it can be done without threads as well.
Both of these would be nice. Heck, full multithreading would help in
many places. But that's another ball of wax.
>So, assuming external retriever and external parser, htdig would start the
>retriever and read the header. Assuming the retriever can get the requested
>document, htdig can now start the parser and tie the retriever's output to
>the parser's input. htdig then ties the output of the parser to the database
>updater. Voila! (Boy, this would be *so* trivial in Java!!!!)
>
>Any thoughts on this?
This would be extremely useful when using both an external retriever
and an external parser. However, I can't quite see how it would work
conceptually with an internal parser. (I just got back from a run, so
it might be obvious and I'm just in oxygen debt? ;-)
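For the both-external case, the plumbing described above might look roughly like the following Python sketch. Everything specific in it is an assumption for illustration: the "name: value" header ending in a blank line, the "status" field, and the two stand-in commands are hypothetical, not anything htdig defines.

```python
import subprocess
import sys

# Hypothetical stand-ins for the external programs. The "retriever" prints a
# header, a blank line, then the document body. The "parser" emits one
# upper-cased word per line.
RETRIEVER = [sys.executable, "-c",
             "print('status: 200'); print('content-type: text/plain'); "
             "print(); print('hello world')"]
PARSER = [sys.executable, "-c",
          "import sys\n"
          "for line in sys.stdin:\n"
          "    for word in line.split():\n"
          "        print(word.upper())"]


def read_header(stream):
    """Read 'name: value' lines up to (and including) the first blank line."""
    header = {}
    for raw in iter(stream.readline, b""):
        line = raw.decode("latin-1").strip()
        if not line:
            break
        name, _, value = line.partition(":")
        header[name.strip().lower()] = value.strip()
    return header


def dig(retriever_cmd, parser_cmd):
    # bufsize=0 keeps the pipe unbuffered on our side, so reading the header
    # does not swallow any of the body that the parser still has to see.
    retriever = subprocess.Popen(retriever_cmd, stdout=subprocess.PIPE, bufsize=0)
    header = read_header(retriever.stdout)
    if header.get("status") != "200":
        retriever.stdout.close()
        retriever.wait()
        return header, []
    # Tie the rest of the retriever's output to the parser's input.
    parser = subprocess.Popen(parser_cmd, stdin=retriever.stdout,
                              stdout=subprocess.PIPE)
    retriever.stdout.close()  # the parser now owns the read end of the pipe
    # Stand-in for the database updater: collect the parser's output.
    words = [line.decode().strip() for line in parser.stdout]
    parser.wait()
    retriever.wait()
    return header, words
```

The document body never sits in the controlling process: only the header is read there, and the parser consumes the body as the retriever produces it.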
> > Does this seem like a reasonable interface? Should the script receive
> > more information? (referring URL and credentials come to mind...)
>
>Are those used, ever? Regardless, the header format is easily extended...
I could see someone wanting to index a non-anonymous FTP site, or
writing an HTTPS transport script that might desire both. I'm a
little hesitant to send credentials to the script because it smells
of security problems, but I guess if you're allowing the script to be
run, you know what's in it.
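To make "easily extended" concrete, here is a minimal Python sketch of a request header with optional fields. The field names ("url", "referer", "credentials") and the line-based layout are hypothetical, chosen only for illustration. The sketch assumes the header is handed to the script on its standard input rather than on the command line, since command-line arguments are typically visible to other users in process listings.

```python
def build_request_header(url, referer=None, credentials=None):
    """Build a 'name: value' header; optional fields are simply left out."""
    fields = [("url", url)]
    if referer is not None:
        fields.append(("referer", referer))
    if credentials is not None:
        fields.append(("credentials", credentials))
    # A blank line terminates the header.
    return "".join(f"{name}: {value}\n" for name, value in fields) + "\n"


def parse_request_header(text):
    """Parse the header on the script side, stopping at the blank line."""
    header = {}
    for line in text.splitlines():
        if not line.strip():
            break
        # Split on the first colon only, so values may contain colons.
        name, _, value = line.partition(":")
        header[name.strip().lower()] = value.strip()
    return header
```

A script that only looks up "url" keeps working when new fields appear, because it never has to touch the ones it does not know about.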
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/