Hello Gerrit,

I take this at face value:

[index --help]
-s status     Limit index to documents matching status (HTTP Status code)

First off, the "-s status" help text above says it limits "index" to documents 
matching a status (HTTP status code). It doesn't say anything about deleting 
based on status. Sure, I know I can do this:

./index -s 0

and have all documents with a status of zero "indexed", but what about 
deleting? I know I can do this too:

./index -C -u "http://badurl/"

and delete the URL. That isn't really spelled out in the manual, but the help 
text does at least hint that you can do this. An example, of course, would have 
been nice for those of us who don't understand all this simple manual stuff. I 
know I always provide examples when I am providing support.

What I do know is that I have read posts where Kir said you can't just 
delete a URL. He said the ONLY way to delete a URL from the database is by 
using this command:

./index -C -u "http://badurl/"

It makes perfect sense, right? The "-u" is URL and the "-C" is CLEAR. So "-C 
-u" makes perfect sense because we want to "C"lear the "U"rl from the 
database.

So when I read this post by Kir, it scared me off trying to delete any 
other way.

Put me behind a MySQL prompt and I'm a happy camper, as long as I know the 
structure of the database, the keys, and how the tables interact with one 
another. There is a brief discussion of this in the manual, but there's more 
to this program than just MySQL database tables. There are citations and a 
bunch of other files in the /usr/local/aspseek/var/ directory to worry 
about, and as far as I have seen the manual does not cover them.

According to you, I can do the following without one bit of fear that 
something will break:

./index -C -s 1
./index -C -s 202
./index -C -s 204
./index -C -s 205
./index -C -s 300
./index -C -s 301
./index -C -s 302
./index -C -s 303
./index -C -s 307
./index -C -s 400
./index -C -s 401
./index -C -s 402
./index -C -s 403
./index -C -s 404
./index -C -s 405
./index -C -s 407
./index -C -s 408
./index -C -s 410
./index -C -s 415
./index -C -s 500
./index -C -s 501
./index -C -s 502
./index -C -s 503
./index -C -s 504
./index -C -s 508

and I don't have to do anything else. I don't have to enter MySQL and at 
least run OPTIMIZE TABLE X? No fear that citations or other ASPseek-only 
files will be corrupted? I don't have to do this either, right:

./index -X1 #   Check inverted index for deleted URLs
./index -X2 #   Fix database to delete deleted URLs
./index -H  #   Recreate citation indexes and ranks

If this is true, then in what sequence should all this be done? And if it is 
this simple, why is it not included in the manual in these simple terms as a 
sample?
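
For what it's worth, the long list of "-C -s" commands above could be 
written as a loop. This is only a rough sketch: INDEX_BIN, DRY_RUN and 
clear_by_status are names I made up for the example, and whether combining 
"-C" with "-s" is actually safe is exactly what I am asking, so by default 
it only prints the commands and deletes nothing.

```shell
#!/bin/sh
# Dry-run sketch: one "index -C -s CODE" command per status code.
# INDEX_BIN, DRY_RUN and clear_by_status are hypothetical names for this example.
INDEX_BIN="${INDEX_BIN:-./index}"
DRY_RUN="${DRY_RUN:-1}"

# Status codes copied from the list above.
STATUS_CODES="1 202 204 205 300 301 302 303 307 400 401 402 403 404 405 407 408 410 415 500 501 502 503 504 508"

clear_by_status() {
    for code in $STATUS_CODES; do
        if [ "$DRY_RUN" = "1" ]; then
            # Only show what would be run; nothing is deleted.
            echo "$INDEX_BIN -C -s $code"
        else
            # Assumption: "-C -s CODE" clears only URLs with that status.
            # Stop at the first failure instead of carrying on blindly.
            "$INDEX_BIN" -C -s "$code" || return 1
        fi
    done
}

clear_by_status
```

Setting DRY_RUN=0 would run the commands for real, which I would not do 
without a database backup and an answer to the questions above.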

I thought the whole reason for having a "users" mailing list is for us 
"users" to understand how the program works and to ask questions. I always 
thought the stupid question was the question you didn't ask. You answered 
as if the question were a stupid one.

If a user knows the answer, then answer. If not, maybe Kir will answer. That's 
what a mailing list is all about, I thought. Don't tell me you have NEVER 
asked a question that was in the "manual" in your entire life. Come on now, 
be honest. You have no trouble stopping that blinking time stamp on your VCR, 
right?

You took the time to answer this post, but you didn't provide an answer. Now 
that, to me, is more stupid than the question asked. All you said is "read the 
manual". If you took the time to reply, why not answer the question and then 
refer to the "manual"? That would be much more polite.

I don't answer if I don't know the answer. I'm not going to answer posts in 
this list by stating "read the manual" just to make someone look stupid. 
That's rude and doesn't help anyone. If you don't know the answer and can't 
help, don't post. That's my personal rule. Don't just post a message to get 
your name out there like you are some "cool dude" that always reads and 
understands every manual you have ever read.

Regards,
Karen

>>I too wonder how to get rid of all those non status 200 urls. I have a few 
>>million 404's and you know 404 means not found. It also means the web site 
>>owner removed them from their html tree so they will most likely never be 
>>available. Actually anything above 200 I don't really care to have around. 
>>Why keep all this around? I think anything that's not returned as a 200 
>>should be removed if I want to remove them, but I don't know how to do 
>>this.
>>
>>Maybe Kir will be so kind to tell us how to get rid of all the non 200 
>>status urls without us breaking something. Maybe he'll have an answer why 
>>index -D causes this error too. Who knows?
>
>Do you actually bother looking at the manuals or the help?
>
>[index --help]
>-C            Clear database
>-s status     Limit index to documents matching status (HTTP Status code)
>
>I think this will help you quite a bit along the way...
>
>- G





