Good [morning|day|evening|night],

A new message has been posted to the DataparkSearch Engine forum at
http://www.dataparksearch.org/

- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Pokia
Subject: Re: Spidering

Very sorry for replying so late.

Hardware specs: Athlon 64 3200+ (512 KB L2 cache), which actually runs at
2200 MHz, 1 GB DDR RAM, a 200 GB IDE disk (7200 rpm, 8 MB cache), and so on.
The time constraint is fairly loose, so I have some time. For the db part, I
will decide between MySQL 4 and PostgreSQL 8 (which is in beta). The distro
will probably be Debian, but I'm not sure yet.

I asked whether this software can do it or not. Can someone please tell me what
the theoretical limits are? What I noticed while searching for a good spider is
that, interestingly, no one really pays much attention to spidering. Indexer
theory is quite good, so there are lots of indexers: swish/++, htdig, etc. Of
course there are spiders, but most are really experimental.

My 2 cents: holding the site list in memory is good as long as it fits in
memory. Storing it in a db may slow things down considerably. One solution may
be to use an in-memory table type on the SQL side.
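
To illustrate the in-memory table idea, here is a minimal sketch in Python. It
uses SQLite's ":memory:" database from the standard library only as a stand-in.
The table name "sites", its columns, and the helper functions are made up for
this example and are not part of DataparkSearch. With MySQL, a MEMORY (formerly
HEAP) table would presumably play a similar role.

```python
import sqlite3


def make_site_table():
    """Create a site list that lives entirely in RAM (hypothetical schema)."""
    # ":memory:" keeps the whole database in memory; nothing is written to disk.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sites ("
        " url TEXT PRIMARY KEY,"
        " visited INTEGER NOT NULL DEFAULT 0)"
    )
    return conn


def add_site(conn, url):
    """Queue a URL; INSERT OR IGNORE skips URLs already in the list."""
    conn.execute("INSERT OR IGNORE INTO sites (url) VALUES (?)", (url,))


def next_unvisited(conn):
    """Return the oldest URL not yet fetched, or None if the list is done."""
    row = conn.execute(
        "SELECT url FROM sites WHERE visited = 0 ORDER BY rowid LIMIT 1"
    ).fetchone()
    return row[0] if row else None


def mark_visited(conn, url):
    """Flag a URL as fetched so it is not handed out again."""
    conn.execute("UPDATE sites SET visited = 1 WHERE url = ?", (url,))
```

The spider would call add_site() for every link it finds, then loop on
next_unvisited() and mark_visited(). The catch is the same as with any
in-memory list: the contents are gone when the process exits, so the list would
have to be dumped to disk from time to time if a crawl must survive a restart.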
- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=02;topic_id=1103529110

Reply via email to