Good day,

A new message has been posted to the DataparkSearch Engine forum at http://www.dataparksearch.org/
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Pokia
Subject: Re: Spidering

For the indexing and searching part, this server is not suitable for production use. What I do with that machine is just spidering, which does not depend much on hardware. And since my time constraints are fairly loose, I can wait while indexing runs.

Is there any compression step? If so, how effective is it? How does the software search the index? Is there any partitioning across several servers? Of course, searching a 10 GB index is terribly slow even on very high-end hardware. One solution might be to keep the index of highly-ranked pages on fast servers and the index of low-ranked pages on slow servers, and so on. What about this: the index is partitioned, so each partition is smaller, so searches return results faster.

How does incremental updating take place? Must the whole index be regenerated each time? Another issue starts after indexing: we have to spread the index across several frontend machines, at least 20, to get good search times.

SpiderWeb, what spider did you use to crawl 50,000,000 pages? My current real problem is finding a good spider.
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Read the full topic here: http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=02;topic_id=1103529110;page=2
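The tiered-partitioning idea in the post (highly-ranked pages on fast servers, low-ranked ones on slow servers) could be sketched roughly as below. This is only an illustration of the routing logic; the names (`Tier`, `route_page`, `search`) and the rank cutoffs are made up for this sketch and are not DataparkSearch APIs.

```python
# Hypothetical sketch of rank-tiered index partitioning: high-rank pages
# land on a "fast" tier that is queried first; low-rank pages land on a
# "slow" tier that is only consulted if the fast tier yields too few hits.
# All names and thresholds here are illustrative assumptions.

from dataclasses import dataclass, field

@dataclass
class Tier:
    name: str
    pages: dict = field(default_factory=dict)  # url -> (rank, text)

def route_page(tiers, cutoffs, url, rank, text):
    """Place a page in the first tier whose rank cutoff it meets."""
    for tier, cutoff in zip(tiers, cutoffs):
        if rank >= cutoff:
            tier.pages[url] = (rank, text)
            return tier.name
    tiers[-1].pages[url] = (rank, text)
    return tiers[-1].name

def search(tiers, term, limit=10):
    """Query high-rank tiers first; stop once enough hits are found."""
    hits = []
    for tier in tiers:
        for url, (rank, text) in tier.pages.items():
            if term in text:
                hits.append((rank, url))
        if len(hits) >= limit:
            break  # the slow tier is never touched for popular queries
    return sorted(hits, reverse=True)[:limit]

fast, slow = Tier("fast"), Tier("slow")
tiers, cutoffs = [fast, slow], [0.8, 0.0]
route_page(tiers, cutoffs, "http://a.example/", 0.9, "dataparksearch spider")
route_page(tiers, cutoffs, "http://b.example/", 0.1, "spider notes")
print(search(tiers, "spider", limit=1))  # satisfied from the fast tier alone
```

The point of the sketch is the early exit: if the fast tier can satisfy the query, the larger and slower partitions are never consulted, which is where the speedup in the poster's proposal would come from.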
