At 5:00 PM +0200 9/10/99, [EMAIL PROTECTED] wrote:
>         re-entrant split WordList in two classes. One is the central
>         point and has access to the database. The other is lightweight
>         and only contains the context of insertion/retrieval of words.

This is probably true. I left it as it stands because that required
the fewest changes to other parts of the code (i.e. Retriever). But
the rest of the code will need to change anyway.

As a slightly better work-around, we could change the Word() method 
to allow an optional DocID parameter for the AddDescription call. 
The method would use that DocID only temporarily, for that one call.
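
Roughly what I have in mind, as a minimal sketch. The names here
(WordListSketch, kNoDocID, SetDocID, an int DocID) are hypothetical
stand-ins and not the actual signatures in the tree; the point is only
that a defaulted parameter leaves existing callers untouched.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for WordList; not the real ht://Dig class.
class WordListSketch {
public:
    // Assumed sentinel meaning "use the current document".
    static constexpr int kNoDocID = -1;

    void SetDocID(int id) { current_doc_id_ = id; }

    // docid defaults to kNoDocID, so existing callers compile unchanged.
    // An explicit docid applies to this one call only and is not stored.
    void Word(const std::string& word, int docid = kNoDocID) {
        const int effective = (docid == kNoDocID) ? current_doc_id_ : docid;
        entries_.emplace_back(word, effective);
    }

    const std::vector<std::pair<std::string, int>>& Entries() const {
        return entries_;
    }
    int CurrentDocID() const { return current_doc_id_; }

private:
    int current_doc_id_ = 0;
    std::vector<std::pair<std::string, int>> entries_;
};
```

An AddDescription-style caller would pass the other document's ID
explicitly, and everything else keeps calling Word(w) as before.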

If we factor the code into two separate files, I suggest we make a 
WordDB class and leave WordList as a lightweight class.
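
A rough sketch of how that split could look. Everything below is
illustrative: a std::map stands in for the real database, and the
Put/Get/Word signatures are assumptions, not existing code.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical central class: the single point with database access.
// A std::map stands in for the real on-disk store.
class WordDB {
public:
    void Put(const std::string& word, int docid) {
        index_[word].push_back(docid);
    }

    // Returns the document IDs recorded for word (empty if none).
    std::vector<int> Get(const std::string& word) const {
        auto it = index_.find(word);
        if (it == index_.end()) return {};
        return it->second;
    }

private:
    std::map<std::string, std::vector<int>> index_;
};

// Hypothetical lightweight class: holds only the insertion context
// (here, a DocID) plus a reference to the shared WordDB, so several
// instances can be live at the same time.
class WordList {
public:
    WordList(WordDB& db, int docid) : db_(db), docid_(docid) {}
    void Word(const std::string& word) { db_.Put(word, docid_); }

private:
    WordDB& db_;
    int docid_;
};
```

With this shape, adding a description for another document is just a
second WordList constructed with a different DocID against the same
WordDB.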

On a related subject, we're going to need to go through the Retriever 
class with a fine-toothed comb. It works, but I'm a bit worried about 
a number of issues like those mentioned:

1) It hasn't really changed since 3.1.x; up until this week, it
still called MarkScanned, MarkGone, etc.
2) The got_word method should just take a WordRecord flag instead of 
all this mess with the factor[] array.
3) The URL validation doesn't check a URL against the server's 
robots.txt information.
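
For point 2, a minimal sketch of what a flag-based got_word might look
like. The flag names and values, WordRecordSketch, and RetrieverSketch
are all hypothetical and only illustrate the shape of the call; they
are not the definitions in the source.

```cpp
#include <string>
#include <vector>

// Hypothetical bit flags describing where a word was found.
// Names and values are illustrative only.
enum WordFlag : unsigned {
    kFlagText    = 0,
    kFlagTitle   = 1u << 0,
    kFlagHeading = 1u << 1,
    kFlagKeyword = 1u << 2,
};

// Hypothetical record: the word, its position, and its context flags.
struct WordRecordSketch {
    std::string word;
    int location;
    unsigned flags;
};

class RetrieverSketch {
public:
    // The caller states the context as flags; there is no lookup into
    // a weight array at this point.
    void got_word(const std::string& word, int location, unsigned flags) {
        records_.push_back(WordRecordSketch{word, location, flags});
    }

    const std::vector<WordRecordSketch>& Records() const { return records_; }

private:
    std::vector<WordRecordSketch> records_;
};
```

A parser would then call something like
got_word("digging", 3, kFlagTitle | kFlagHeading), and any weighting
could be derived from the stored flags later.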

-Geoff

