On Wed, 1 Dec 2004 22:11:49 -0700 (MST) Jim <[EMAIL PROTECTED]> wrote:

> If you could come up with a decent algorithm for determining whether a
> document is in the language you are interested in, you might be able to
> filter at this level.

It shouldn't be too hard. I did a word count on a 63,000-word document
and found that the five words [the to of and that] made up 2.3% of the
total. I don't have any equivalent German documents on disk, but a
quick look at a German book suggests that [das der den zu und] would
probably make up a similar percentage. (I left out "die" because it is
also a fairly common English word.)
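A minimal sketch of that marker-word check follows, assuming plain text has already been extracted from the document. The marker lists come from the message above. The function names and the min_fraction cutoff (1%) are hypothetical choices for illustration, not anything from ht://Dig. Short snippets will give noisy fractions, so a real filter would need tuning on actual documents.

```python
import re
from collections import Counter

# Marker words taken from the message above; "die" is deliberately left
# out of the German set because it is also a common English word.
MARKERS = {
    "english": {"the", "to", "of", "and", "that"},
    "german": {"das", "der", "den", "zu", "und"},
}

def marker_fractions(text):
    """Return, per language, the fraction of words that are marker words."""
    # [^\W\d_]+ matches runs of Unicode letters, so umlauts are kept.
    words = re.findall(r"[^\W\d_]+", text.lower())
    if not words:
        return {lang: 0.0 for lang in MARKERS}
    counts = Counter(words)
    total = len(words)
    return {
        lang: sum(counts[w] for w in markers) / total
        for lang, markers in MARKERS.items()
    }

def guess_language(text, min_fraction=0.01):
    """Pick the language whose marker words are most frequent.

    min_fraction is a hypothetical cutoff: if no language reaches it,
    the document is treated as neither and None is returned.
    """
    fractions = marker_fractions(text)
    lang, best = max(fractions.items(), key=lambda item: item[1])
    return lang if best >= min_fraction else None

print(guess_language("The cat sat on the mat and looked at the dog."))
print(guess_language("Der Hund und die Katze spielen zu Hause."))
```

An indexer could then skip any document for which guess_language() does not return the wanted language. Counting whole-word matches against a small fixed set keeps the check cheap enough to run on every page.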



Mike
-- 
Mike Causer                          Email - mailto:[EMAIL PROTECTED]
GPG KeyID 1C2DDA07                       WWW - http://www.mikecauser.com
Flood the fen again! - Wicken Fen enlargement - http://www.wicken.org.uk

_______________________________________________
ht://Dig general mailing list: <[EMAIL PROTECTED]>
ht://Dig FAQ: http://htdig.sourceforge.net/FAQ.html
List information (subscribe/unsubscribe, etc.)
https://lists.sourceforge.net/lists/listinfo/htdig-general