A list member wrote:
> What happens with htdig if I try to index and search 50 GB
> of text? It's serious - I have to do it but can't make an

First off, you'll need a monster of a machine. I'd guess you'll need at
least 100GB for storage and temporary space. You'll probably also want
something around 1GB of RAM. These are first guesses, I'm probably on
the low end since I've never indexed anywhere near that amount of text.
(Note that these requirements are not limited to ht://Dig--the nature of
indexing that amount of text is going to require those resources.)

> assumption on how much time a search will take, or what
> algorithm to choose so as not to get 1,000,000 results...

The first step towards that is to trim out very common words. IMHO,
you don't get anything useful from a search that returns 1,000,000
documents anyway. If you agree, you can look at the most common words
in db.wordlist (for example: cut -f 1 -d ' ' db.wordlist | sort |
uniq -c | sort -rn) and add the most frequent of these to the
bad_words list.
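As a concrete sketch of that pipeline (the sample data and the /tmp
path here are just illustrative -- the real db.wordlist sits in your
database directory, and this assumes the word is the first
space-delimited field on each line):

```shell
# Fake a tiny db.wordlist in the assumed layout: word first, then location data.
printf 'the doc1\nthe doc2\nthe doc3\ndig doc1\nsearch doc2\n' > /tmp/db.wordlist

# Extract the word column, count how often each word appears,
# and list the most common words first.
cut -f 1 -d ' ' /tmp/db.wordlist | sort | uniq -c | sort -rn
```

The words at the top of that listing are the candidates for the
bad_words file; rebuild the index after changing it so the excluded
words actually drop out of the database.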

-- 
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/