On Fri, 18 Oct 2002 at 11:09:43 -0500, Searcher wrote:
> > ../index -N 80 -R 64
> >
> > will index handle all this? Will index eat up all the available
> > memory (2GB) trying to load all these URLs in memory? I've had problems with

My first run ever with aspseek was 'index -N 100', and it lasted three days
before it died. The machine had plenty of memory, yet I couldn't tell why it
died. I then ran 'index -D' and, once that finished, restarted with
'index -N 25'; it has been running non-stop for over a week now, with 25 GB of
indexed material.
Kir's question has a point: how many individual sites are you indexing? Are your
10,000+ URLs individual sites, or are they all URLs of a single site?

Actually, Kir, I've been working on tidying up the mutex locking around calls to
GetNextLink(), since on larger databases (~20,000,000 URLs) index can get stuck
in queueing mode for significant periods of time (depending on the distribution
of URLs over time). I now lock within AddUrls(), around the iteration through
the CIntSet of URLs going into the queue. That provides a window in which idle
threads can pop the next document regardless of the number of queued sites, and
it seems to help quite a bit.

Matt.
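
P.S. The change is roughly along the lines of the sketch below. This is just a
hand-written illustration, not the actual index source; only AddUrls(),
GetNextLink() and CIntSet correspond to real names, everything else (the queue
class, the pthread mutex member, the int-id interface) is made up here.

// Sketch of the locking pattern: take the queue mutex only around the
// iteration that moves the CIntSet contents into the queue, instead of holding
// it for the whole queueing pass, so threads waiting in GetNextLink() can grab
// it and pop the next document in between.
#include <pthread.h>
#include <set>

typedef std::set<int> CIntSet;          // stand-in for ASPseek's CIntSet

class CUrlQueue {
public:
    CUrlQueue() { pthread_mutex_init(&m_mutex, NULL); }
    ~CUrlQueue() { pthread_mutex_destroy(&m_mutex); }

    // Lock only around the iteration through the URL set; the lock is not
    // held while the caller is deciding which URLs belong in the queue.
    void AddUrls(const CIntSet& urls) {
        pthread_mutex_lock(&m_mutex);
        for (CIntSet::const_iterator it = urls.begin(); it != urls.end(); ++it)
            m_pending.insert(*it);
        pthread_mutex_unlock(&m_mutex);
    }

    // Returns the next queued URL id, or -1 if the queue is currently empty.
    int GetNextLink() {
        int urlId = -1;
        pthread_mutex_lock(&m_mutex);
        if (!m_pending.empty()) {
            urlId = *m_pending.begin();
            m_pending.erase(m_pending.begin());
        }
        pthread_mutex_unlock(&m_mutex);
        return urlId;
    }

private:
    pthread_mutex_t m_mutex;   // protects m_pending
    CIntSet m_pending;         // URL ids waiting to be fetched
};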
