On Wed, 1 Dec 2004 22:11:49 -0700 (MST) Jim <[EMAIL PROTECTED]> wrote:
> If you could come up with a decent algorithm for determining whether a
> document is in the language you are interested in, you might be able to
> filter at this level.

It shouldn't be too hard. I did a word count on a 63,000-word document and found that the five words [the to of and that] made up 2.3% of the total. I don't have any equivalent German documents on disk, but a quick look at a German book suggests that [das der den zu und] would probably make up about the same percentage. (I left out "die" because it is also a fairly common English word.)

Mike
--
Mike Causer
Email - mailto:[EMAIL PROTECTED]
GPG KeyID 1C2DDA07
WWW - http://www.mikecauser.com
Flood the fen again! - Wicken Fen enlargement - http://www.wicken.org.uk

_______________________________________________
ht://Dig general mailing list: <[EMAIL PROTECTED]>
ht://Dig FAQ: http://htdig.sourceforge.net/FAQ.html
List information (subscribe/unsubscribe, etc.): https://lists.sourceforge.net/lists/listinfo/htdig-general
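The word-count heuristic Mike describes can be sketched roughly as follows. This is a minimal illustration, assuming plain-text input. It counts what fraction of a document's words fall in each language's marker list (the English and German lists from the post, with "die" omitted as he suggests) and picks the language with the highest fraction. The names `guess_language`, `stopword_ratio`, and `tokenize`, and the 1% `min_ratio` cutoff, are hypothetical choices made for this sketch. They are not part of ht://Dig, and the sketch does not show how such a check would be hooked into indexing.

```python
import re

# Marker words taken from the post. "die" is deliberately left out of the
# German list because it is also a common English word.
STOPWORDS = {
    "en": {"the", "to", "of", "and", "that"},
    "de": {"das", "der", "den", "zu", "und"},
}

# Runs of letters only (Unicode-aware, so words like "für" stay whole).
WORD_RE = re.compile(r"[^\W\d_]+")


def tokenize(text):
    """Lower-case the text and split it into alphabetic words."""
    return WORD_RE.findall(text.lower())


def stopword_ratio(words, stopwords):
    """Fraction of words that belong to the given marker set."""
    if not words:
        return 0.0
    return sum(1 for w in words if w in stopwords) / len(words)


def guess_language(text, min_ratio=0.01):
    """Return the language code whose marker words are most frequent,
    or None if no language reaches min_ratio (a hypothetical cutoff)."""
    words = tokenize(text)
    scores = {lang: stopword_ratio(words, sw) for lang, sw in STOPWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= min_ratio else None


if __name__ == "__main__":
    print(guess_language("The cat sat on the mat and looked to the door."))
    print(guess_language("Der Hund und die Katze spielen zu Hause."))
```

The 2.3% figure in the post came from a long document. On short pages the ratios will be noisy, so any real cutoff would need tuning against sample documents in the languages of interest.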

