How does one actually clean the entire database of URLs whose status is not
200? The scenario, via ./index:

Database repairing options:
  -X1           Check inverted index for deleted URLs
  -X2           Fix database to delete deleted URLs
  -H            Recreate citation indexes and ranks

Use "index -X1" to check inverted index for URLs for which "urlword.deleted"
field is non-zero. Use "index -X2" to fix it by appending information about
deleted keys to the delta files. So if you want to remove records where
"urlword.deleted" is non-zero, run index -X2; index -D, and finally perform
SQL statements to delete unnecessary records.
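The sequence described in that paragraph can be sketched as a tiny wrapper. This is only an illustration (in Python for testability); it assumes the ASPseek index binary is on your PATH and that the steps should be run in exactly the order quoted above:

```python
# Sketch of the repair sequence quoted from the ASPseek docs above.
# Assumption: the "index" binary is on PATH; adjust the path if not.
import subprocess

REPAIR_STEPS = [
    ["index", "-X2"],  # append deleted-key info to the delta files
    ["index", "-D"],   # process the delta files
]

def run_repair(run=subprocess.run):
    """Run each repair step in order, stopping on the first failure."""
    for cmd in REPAIR_STEPS:
        result = run(cmd)
        if getattr(result, "returncode", 0) != 0:
            raise RuntimeError("step failed: " + " ".join(cmd))
```

The SQL cleanup step would follow only after both commands succeed.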

So my question is: which "SQL statements to delete unnecessary records" do I 
use to completely clean up the database tables while keeping aspseek happy at 
the same time?

I would like to "truly" clean the database of URLs that are not yet indexed 
or that returned any status other than 200. I know that if I just go into 
mysql and do this:

DELETE FROM urlword WHERE status !=200;

it's not going to clean up the other tables that need cleaning, and it will 
probably break aspseek's ability to function properly.
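For illustration of the ordering concern only, here is a hedged sketch: collect the offending url ids first, delete the matching rows from any dependent tables, and delete from urlword last so the related rows can still be found. Every table and column name here except urlword and status is an assumption (the RELATED_TABLES list and the url_id key are hypothetical) and must be checked against the actual ASPseek schema:

```python
# Hypothetical sketch of ordered cleanup. Table names in RELATED_TABLES
# and the "url_id" key column are ASSUMPTIONS -- verify against the
# real ASPseek schema before running anything like this.
RELATED_TABLES = ["citation", "stat"]  # assumed dependent tables

def cleanup_statements(url_ids):
    """Build DELETE statements for the given url ids, parent table last."""
    ids = ",".join(str(i) for i in sorted(url_ids))
    stmts = ["DELETE FROM %s WHERE url_id IN (%s);" % (t, ids)
             for t in RELATED_TABLES]
    # Delete from urlword last, so dependent rows can be located first.
    stmts.append("DELETE FROM urlword WHERE url_id IN (%s);" % ids)
    return stmts
```

The url ids themselves would come from something like `SELECT url_id FROM urlword WHERE status != 200` (again assuming the key column's name).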

So my question is: what needs to be done, and in what order? I'll be happy to 
write a Perl script to do this maintenance and also clean up the stats table, 
if only I knew everything that needs to be done.

Basically I would like to keep things as compact as possible and keep the 
database tables optimized. Any help in this matter would be appreciated.

Thanks, John


