Hello Gerrit,

I take this at face value:

[index --help]
-s status    Limit index to documents matching status (HTTP Status code)

First off the "-s status" as it states above says Limit "index" to documents matching status (HTTP Status code). It doesn't say anything about deleting based on status. Sure I know I can do this:

    ./index -s 0

and have all documents with a status of zero "indexed", but what about deleting? I know I can do this too:

    ./index -C -u "http://badurl/"

and delete the "URL" which isn't really in the manual, but it does give a kind of reference that you can do this. An example of course would have been nice for us that don't understand all this simple manual stuff. I know I always provide examples when I am providing support.

What I do know is that I have read posts where Kir said you can't just delete a URL. He said the ONLY way to delete a URL from the database is by using this command.

    ./index -C -u "http://badurl/"

It makes perfect sense right? The "-u" is URL and the "-C" is CLEAR. So "-C -u" makes perfect sense because we want to "C"lear the "U"rl from the database. So when I read this post by Kir it had me scared about how to go about deleting any other way.

Put me behind a MySQL prompt and I'm a happy camper as long as I know the structure of the database, the keys and how each table interacts with one another. There is a brief discussion on this in the manual, but there's more to this program than just MySQL database tables. There are citations and a bunch of other files in the /usr/local/aspseek/var/ directory to worry about which is not covered in the manual that I have seen.

According to you I can do the following without one bit of fear something will break:

    ./index -C -s 1
    ./index -C -s 202
    ./index -C -s 204
    ./index -C -s 205
    ./index -C -s 300
    ./index -C -s 301
    ./index -C -s 302
    ./index -C -s 303
    ./index -C -s 307
    ./index -C -s 400
    ./index -C -s 401
    ./index -C -s 402
    ./index -C -s 403
    ./index -C -s 404
    ./index -C -s 405
    ./index -C -s 407
    ./index -C -s 408
    ./index -C -s 410
    ./index -C -s 415
    ./index -C -s 500
    ./index -C -s 501
    ./index -C -s 502
    ./index -C -s 503
    ./index -C -s 504
    ./index -C -s 508

and I don't have to do anything else. I don't have to enter MySQL and at least OPTIMIZE TABLE X. No fear citations or other aspseek only files will be corrupted? I don't have to do this either right:

    ./index -X1    # Check inverted index for deleted URLs
    ./index -X2    # Fix database to delete deleted URLs
    ./index -H     # Recreate citation indexes and ranks

If this is true then in what sequence should all this be done? And if it is this simple why is it not included in the manual in these simple terms as a sample?

I thought the whole reason of having a "users" mailing list is for us "users" to understand how the program works and to ask questions. I always thought the stupid question was the question you didn't ask. You answer the question as if the questions were stupid questions. If a user knows the answer then answer. If not maybe Kir will answer. That's what a mailing list is all about I thought.

Don't tell me you have NEVER asked a question that was in the "manual" in your entire life. Come on now be honest. You have no trouble stopping that blinking time stamp on your VCR right.

You took the time to answer this post, but you didn't provide an answer now that to me is more stupid than the question asked. All you said is "read the manual". If you took the time to reply why not answer the question and then reference to the "manual" which would be much more polite. I don't answer if I don't know the answer.

I'm not going to answer posts in this list by stating "read the manual" just to make someone look stupid. That's rude and doesn't help anyone. If you don't know the answer and can't help, don't post. That's my personal rule. Don't just post a message to get your name out there like you are some "cool dude" that always reads and understands every manual you have ever read.

Regards,
Karen

>>I too wonder how to get rid of all those non status 200 urls. I have a few
>>million 404's and you know 404 means not found. It also means the web site
>>owner removed them from their html tree so they will most likely never be
>>available. Actually anything above 200 I don't really care to have around.
>>Why keep all this around? I think anything that's not returned as a 200
>>should be removed if I want to remove them, but I don't know how to do
>>this.
>>
>>Maybe Kir will be so kind to tell us how to get rid of all the non 200
>>status urls without us breaking something. Maybe he'll have an answer why
>>index -D causes this error too. Who knows?
>
>Do you actually bother looking at the manuals or the help?
>
>[index --help]
>-C Clear database
>-s status Limit index to documents matching status (HTTP Status code)
>
>I think this will help you quite a bit along the way...
>
>- G
