Hi Mike,

>That will tell you how to stop a running index. From another shell:
>../index -E

That's the part I wasn't sure about: running it from another shell. Now I know, thanks.

>That will safely terminate an already running index. This will NOT
>update your search engine with the newly indexed sites. To do that you have
>to do this:
>
>index -D

What comes to mind is whether I should kill the process now and then, run
index -D, then restart the indexing. Do you think that would have any
benefit over running one huge process all the way through?

Mike
I don't know if it has any benefits, and it probably doesn't, but at least the indexing you have already done will be available for searching without having to wait for the whole run to complete.
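If you do decide to cycle, one round would look roughly like this, using the flags from this thread (-E to stop, -D to merge; the -N/-R values are just the ones I use for my own runs):

```
./index -E            # from another shell: safely stop the running indexer
./index -D            # push what has been indexed so far into the search database
./index -N 80 -R 64   # restart the indexer
```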

If you'll notice, I index 1.5 million URLs a day, so I'm a little concerned that you've been running the index for so long. I wonder if your settings in aspseek.conf are correct. Are you leaving enough time between re-indexing runs with the "Period" directive? For example, if you have it set like this:

Period 14d

then you have set a reindex every 14 days, and if you run the indexer for 14 days non-stop, the process will start all over again and never finish. When I run an initial crawl I set this to a very large number, like this:

Period 1y

That prevents the indexer from reindexing already fetched URLs for one year. Once my indexing is complete I might do something like this:

Period 1m14d

that is, one month and 14 days, and run the indexer against the URLs that are already in the index.

This allows me to add new URLs very quickly and have them ready for searching and then let the re-indexing commence later.
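Put together, the two phases above would look like this in aspseek.conf (values are illustrative; only one Period line would be active at a time):

```
# Phase 1: initial crawl -- don't revisit already-fetched URLs for a year
Period 1y

# Phase 2: once the initial crawl is done, allow re-indexing
# after one month and 14 days
#Period 1m14d
```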

I don't know how you are indexing, but in my case I don't want to index the entire Web or follow all URLs found on every page I index. I already had 3 million URLs that I wanted to index, so I created 15 files of 200,000 URLs each and inserted one file every 4 hours. For example:

./index -i -f ./urls.txt
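A quick sketch of how a master list can be chopped into those fixed-size files with the standard split tool (the file names and the stand-in URL list here are hypothetical; for the real 3,000,000-URL list it would be split -l 200000, producing 15 files):

```shell
# Build a small stand-in URL list (real usage: your actual urls-all.txt)
seq 1 1000 | sed 's|^|http://example.com/page|' > urls-all.txt

# Split it into 200-line chunks: urls.part.aa, urls.part.ab, ... urls.part.ae
split -l 200 urls-all.txt urls.part.

# Each chunk would then be inserted and crawled in turn, e.g.:
#   ./index -i -f urls.part.aa
#   ./index -N 80 -R 64
```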

Then, to make sure the indexer isn't hopping to links found on those pages, I changed this in aspseek.conf:

MaxHops 0

Then I run the indexer:

./index -N 80 -R 64

and within about 3-4 hours the indexer will have indexed the 200,000 URLs, and those will be ready for searching. Of course, doing things this way prevents the indexer from creating the so-called "popularity" ranks, which are calculated when it finds a link from site A to site B during the indexing process. It would be nice if index had the ability to index a single URL and then see how many documents in the index link to that page. That way you could have the best of both worlds.

Regards,
Karen
