Good day,

A new message has been posted to DataparkSearch Engine forum at 
http://www.dataparksearch.org/

- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Pokia
Subject: Re: Spidering

For the indexing and searching part, this server is not suitable for production use. 
What I do with that machine is just spidering, which does not depend much on 
hardware. Also, because my time constraints are fairly loose, I can wait during 
indexing. Is there any compression process? If so, how effective is it? 
How does the software search against the index? Is there any partitioning across 
several servers? Of course, searching a 10 GB index is terribly slow even on very 
high-end hardware. One solution may be to keep the index of highly-ranked pages on 
fast servers, that of low-ranked pages on slow servers, and so on. What about this? 
The index would be partitioned and thus smaller, giving faster search results. How 
does an incremental update take place? Must the whole index be re-generated?
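The tiered-partitioning idea above could be sketched roughly like this. Note this is only an illustration of the concept, not how DataparkSearch works; the tier names, rank thresholds, and data shapes are all invented:

```python
# Rough sketch of rank-tiered index partitioning: high-ranked pages go to a
# small "fast" partition on good hardware, the long tail to cheaper machines.
# Thresholds and tier names are hypothetical.

def assign_tier(page_rank, thresholds=(0.7, 0.3)):
    """Pick a tier for a page based on its rank score in [0, 1]."""
    hi, mid = thresholds
    if page_rank >= hi:
        return "fast"    # small, hot partition on high-end servers
    if page_rank >= mid:
        return "medium"
    return "slow"        # bulk of the index on slow/cheap servers

def partition_index(pages):
    """Split a {url: rank} map into per-tier partitions."""
    tiers = {"fast": {}, "medium": {}, "slow": {}}
    for url, rank in pages.items():
        tiers[assign_tier(rank)][url] = rank
    return tiers
```

A search would then hit the "fast" tier first and only fall through to the slower tiers when it finds too few results, so most queries never touch the big, slow partitions.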

Another issue starts after indexing: we have to spread the index to several 
frontend machines, at least 20, for a good search time.
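Querying that many frontends usually means fanning the query out in parallel and merging the ranked results. A minimal scatter-gather sketch, assuming a hypothetical frontend interface that returns (url, score) pairs:

```python
# Scatter-gather over several index frontends. The frontend call interface
# (a callable returning a list of (url, score) tuples) is an assumption,
# not anything DataparkSearch actually exposes.
from concurrent.futures import ThreadPoolExecutor

def search_all(frontends, query, limit=10):
    """Send the query to every frontend in parallel, merge hits by score."""
    with ThreadPoolExecutor(max_workers=len(frontends)) as pool:
        partials = pool.map(lambda fe: fe(query), frontends)
    merged = [hit for part in partials for hit in part]
    merged.sort(key=lambda hit: hit[1], reverse=True)  # sort by score desc
    return merged[:limit]
```

With ~20 frontends, total latency is roughly that of the slowest shard rather than the sum, which is why adding machines improves search time.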

SpiderWeb, what spider did you use for crawling 50,000,000 pages? My current 
real problem is a good spider.
- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=02;topic_id=1103529110;page=2

Reply via email to