I got HTDig to work and indexed a few test sites (even www.htdig.org
inadvertently). So, I'm excited. However, I don't get the bigger picture.
If I want a continually indexing search engine, then I suppose I
need to set up some sort of htdig daemon using the -a option to create an
off-line database. At the same time, I can write a utility which allows
people (at first my assistant) to add URLs via the web, which end up in
the start.url document. I also understand that htmerge needs to be
run after htdig is finished. But how do I get all of this working
together?
More specifically:
If the start.url document is appended to while htdig is digging, then will
htdig see the changes?
Can htmerge be run while htdig is running? If not, then how do I get
htmerge to stop htdig and restart it where it left off?
If the start.url is constantly being appended to, when will htdig go back
and check previously-indexed sites?
It seems that htdig remembers all sites it has indexed (even once they
are removed from start.url). If that's the case, then how do I get it to
stop indexing a site? And then how do I get it to remove that site's
information from the dbs?
I suppose I can kludge together my own solutions to these issues (like
creating a start.url with only one entry, running htdig then htmerge (or
more precisely runhtdig), then creating a new start.url with another
entry, etc.). However, I'd rather not reinvent the wheel. Any advice?
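For concreteness, one pass of that kludge could be sketched roughly as
follows. This is only an illustration in Python, not a tested setup: the
config path is a made-up placeholder, build_cycle and run_cycle are
hypothetical helper names, and it assumes htdig and htmerge are on the PATH
and are both run with -a (alternate work files) and -c (config file).

```python
import subprocess

# Hypothetical config path (an assumption); adjust to the local install.
CONFIG = "/etc/htdig/htdig.conf"


def build_cycle(config=CONFIG):
    """Return the commands for one dig-then-merge pass, in order."""
    return [
        ["htdig", "-a", "-c", config],    # dig into the alternate work files
        ["htmerge", "-a", "-c", config],  # then merge those same work files
    ]


def run_cycle(config=CONFIG, runner=subprocess.run):
    """Run htdig to completion, then htmerge; stop on the first failure."""
    for cmd in build_cycle(config):
        # check=True raises if a step exits non-zero, so htmerge never
        # runs against a dig that did not finish cleanly.
        runner(cmd, check=True)
```

Calling run_cycle() after each change to start.url would give one complete
pass. It does not address the questions above about picking up changes
mid-dig or dropping a site from the databases.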
Thanks,
Frank