Hi Abhishek,

        You need to define what you mean by "probability" first.
        For example:
1. keyword occurrences / total word count
2. keyword occurrences / total number of sentences
3. number of files that contain the keyword / total number of files
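
A minimal sketch of these three ratios in Python, in case it helps. Everything here is an assumption for illustration: the function name phrase_metrics, the naive sentence split on ".", "!" and "?", case-insensitive matching, and counting a phrase only when it falls inside a single sentence.

```python
import re


def phrase_metrics(documents, phrase):
    """Compute three candidate "probability" ratios for a phrase.

    documents: list of strings, one per file (assumed already loaded).
    phrase:    the phrase to look for, e.g. "I am".
    """
    phrase_tokens = phrase.lower().split()
    n = len(phrase_tokens)

    occurrences = 0       # phrase hits across the whole corpus
    total_words = 0
    total_sentences = 0
    docs_with_phrase = 0

    for text in documents:
        # Naive sentence split; a real tokenizer would handle abbreviations.
        sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
        total_sentences += len(sentences)

        doc_hits = 0
        for sentence in sentences:
            words = re.findall(r"[a-z0-9']+", sentence.lower())
            total_words += len(words)
            # Slide a window of n words and compare it with the phrase.
            doc_hits += sum(
                1
                for i in range(len(words) - n + 1)
                if words[i:i + n] == phrase_tokens
            )

        occurrences += doc_hits
        if doc_hits > 0:
            docs_with_phrase += 1

    return {
        "per_word": occurrences / total_words if total_words else 0.0,
        "per_sentence": occurrences / total_sentences if total_sentences else 0.0,
        "per_document": docs_with_phrase / len(documents) if documents else 0.0,
    }


if __name__ == "__main__":
    corpus = ["I am here. You are there.", "I am what I am!", "Nothing to see."]
    print(phrase_metrics(corpus, "I am"))
```

Note that the second ratio can exceed 1 when a phrase occurs several times in one sentence, so it is a rate rather than a true probability; counting sentences that contain the phrase at least once would keep it between 0 and 1.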


Best Regards, 
    James Fang

-----Original Message-----
From: algogeeks@googlegroups.com [mailto:[EMAIL PROTECTED] On Behalf Of
Abhishek
Sent: December 3, 2007 16:10
To: Algorithm Geeks
Subject: [algogeeks] Probability of a phrase in a text document?


Hi,
    If I have a large corpus of text documents and I need to find the
probability of occurrence of a phrase like "I am" in the given set of
text documents, how do I go about finding the value?
I can easily count how many times the phrase "I am" occurs in
the whole set of text documents, including all the sentences, but what
do I divide the count by?
Thanks

With Regards,
Abhishek S



