Good [morning|day|evening|night],

A new message has been posted to the DataparkSearch Engine forum at http://www.dataparksearch.org/
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Pokia
Subject: Re: Spidering

Very sorry for replying so late.

The hardware specs are: Athlon 64 3200+ (512 KB L2 cache), which actually runs at 2200 MHz; 1 GB DDR RAM; a 200 GB IDE disk (7200 rpm, 8 MB cache); and so on. The time constraint is a bit loose, so I have some time. For the DB part, I will decide between MySQL 4 and PostgreSQL 8 (which is in beta). The distro will probably be Debian, but I'm not sure yet.

I asked whether this software can do it or not. Can someone please tell me what the theoretical limits are?

What I noticed while searching for a good spider is that, interestingly, no one really pays much attention to spidering. Indexer theory is quite good, so there are lots of indexers: swish/++, htdig, etc. Of course there are spiders too, but most are really experimental.

My 2 cents: holding the site list in memory is good as long as the memory can hold it. Storing it in a DB may slow things down greatly. One possible solution is to use an in-memory table type on the SQL side.
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Read the full topic here: http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=02;topic_id=1103529110
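
To make the in-memory table idea from the post a bit more concrete, here is a minimal sketch in Python. It is only an illustration, not how DataparkSearch stores its site list: it uses SQLite's `:memory:` database as a stand-in for an in-memory SQL table, and the table name `frontier` and the helpers `open_frontier`, `add_url` and `next_url` are made up for this example.

```python
import sqlite3


def open_frontier():
    """Create a URL list that lives entirely in RAM (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")  # nothing is written to disk
    conn.execute(
        "CREATE TABLE frontier ("
        " url TEXT PRIMARY KEY,"                 # primary key rejects duplicates
        " fetched INTEGER NOT NULL DEFAULT 0)"   # 0 = still waiting, 1 = handed out
    )
    return conn


def add_url(conn, url):
    """Queue a URL; return True if it was new, False if already known."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO frontier (url) VALUES (?)", (url,)
    )
    return cur.rowcount == 1


def next_url(conn):
    """Return the oldest URL not yet handed out, or None if the list is empty."""
    row = conn.execute(
        "SELECT url FROM frontier WHERE fetched = 0 ORDER BY rowid LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    conn.execute("UPDATE frontier SET fetched = 1 WHERE url = ?", (row[0],))
    return row[0]


if __name__ == "__main__":
    conn = open_frontier()
    add_url(conn, "http://example.org/")
    add_url(conn, "http://example.org/")      # duplicate, silently ignored
    add_url(conn, "http://example.org/docs")
    while (url := next_url(conn)) is not None:
        print("would fetch:", url)
```

With MySQL 4 the rough equivalent would presumably be a table of the HEAP type (called MEMORY in later versions); that variant is not shown or tested here. Either way the trade-off from the post still applies: the list is fast only as long as it fits in RAM, and it is lost when the process or server stops.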
