Great point!
But consider this situation: a long document contains only one phrase,
and everything else is just a collection of simple words.
How do you handle this with the phrase count / total phrase count
probability? Is the probability of that single phrase 100%, even though
it occurs rarely in the document as a whole?
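
One possible way around this edge case, as a rough sketch and not something
anyone in this thread has proposed: divide the number of matches by the
number of n-word windows in the text, where n is the length of the phrase.
A rare phrase then stays rare even if it is the only phrase. The name
ngram_probability and the simple regex tokenizer are assumptions made for
illustration only.

```python
import re

def ngram_probability(text, phrase):
    """Estimate P(phrase) as matches / number of n-word windows in text.

    Assumption: words are runs of letters and apostrophes, case-insensitive.
    """
    words = re.findall(r"[a-z']+", text.lower())
    target = re.findall(r"[a-z']+", phrase.lower())
    n = len(target)
    windows = len(words) - n + 1  # how many n-word windows the text has
    if n == 0 or windows <= 0:
        return 0.0
    hits = sum(1 for i in range(windows) if words[i:i + n] == target)
    return hits / windows

# A 100-word document in which "I am" occurs exactly once.
doc = "I am here. " + "word " * 97
print(ngram_probability(doc, "I am"))  # 1 match out of 99 two-word windows
```

With this divisor the single phrase gets about 1%, not 100%. Note that the
windows ignore punctuation, so a match may span a sentence boundary.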

Best Regards, 
    James Fang


-----Original Message-----
From: algogeeks@googlegroups.com [mailto:[EMAIL PROTECTED] On Behalf Of
Shobhit Sinha
Sent: December 5, 2007 14:34
To: Algorithm Geeks
Subject: [algogeeks] Re: Probability of a phrase in a text document?


Hi Abhishek,
   Regarding your question about finding the probability of occurrence
of a phrase:
I guess it's not as straightforward as dividing the total number of
occurrences by the total number of words/keywords/sentences.
In my view the question is about finding the probability of a
particular 'phrase' out of 'all the phrases in the document'.
Therefore the divisor should be the total number of phrases.
  Now the task becomes counting the total number of phrases, which has to
take a couple of things into account, like full stops and commas.
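
A minimal sketch of that idea, assuming a "phrase" is simply the text
between two punctuation marks. The helper names split_phrases and
phrase_probability and the exact set of delimiters are assumptions for
illustration, not a definitive definition of a phrase.

```python
import re

def split_phrases(text):
    """Split text into phrases at full stops, commas and similar marks."""
    parts = re.split(r"[.,;:!?]+", text.lower())
    # Normalise inner whitespace and drop empty fragments.
    return [" ".join(p.split()) for p in parts if p.strip()]

def phrase_probability(text, phrase):
    """Occurrences of the phrase divided by the total number of phrases."""
    phrases = split_phrases(text)
    if not phrases:
        return 0.0
    target = " ".join(phrase.lower().split())
    return phrases.count(target) / len(phrases)

sample = "I am, I think, happy. I am. You are not."
print(split_phrases(sample))
print(phrase_probability(sample, "I am"))  # 2 of the 5 phrases are "i am"
```

This only counts segments that are exactly equal to the phrase, so "I am"
inside a longer segment such as "I am happy" would not be counted.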

Maybe it's complex, but in my view this is what is ideally required.
Thanks


On Dec 3, 11:48 pm, Abhishek <[EMAIL PROTECTED]> wrote:
> Hi,
>
>   I just came across an idea wherein they wanted to find out how
> frequently a particular phrase occurs in a set of documents.
> So they refer to the probability of that phrase coming in a set of
> documents put together.
> I was just wondering how they find the probability of the phrase in
> the whole set of documents.
>
> With Regards,
> Abhishek S
>
> On Dec 3, 9:51 pm, "James Fang" <[EMAIL PROTECTED]> wrote:
>
>
>
> > No problem, man.
> > I wonder what the user scenario for your probability is. Whether it is one
> > document or multiple documents does not matter, because you can actually
> > combine their statistics together.
>
> > Best Regards,
> >     James Fang
>
> > -----Original Mail-----
> > From: algogeeks@googlegroups.com [mailto:[EMAIL PROTECTED] On Behalf Of
> > Abhishek
> > Sent: December 3, 2007 17:17
> > To: Algorithm Geeks
> > Subject: [algogeeks] Re: Probability of a phrase in a text document?
>
> > Thanks, James. I was thinking along the same lines too.
> > I guess I have some homework to do in this regard :)
>
> > With Regards,
> > Abhishek S
>
> > On Dec 3, 1:27 pm, "James Fang" <[EMAIL PROTECTED]> wrote:
>
> > > Hi Abhishek,
>
> > >         You need to define your metric for "probability" first.
> > >         For example:
> > > 1. keyword occurrences / total word count
> > > 2. keyword occurrences / total number of sentences
> > > 3. number of files that contain the keyword / total number of files
>
> > > Best Regards,
> > >     James Fang
>
> > > -----Original Message-----
> > > From: algogeeks@googlegroups.com [mailto:[EMAIL PROTECTED] On Behalf Of
> > > Abhishek
> > > Sent: December 3, 2007 16:10
> > > To: Algorithm Geeks
> > > Subject: [algogeeks] Probability of a phrase in a text document?
>
> > > Hi,
> > >     If I have a large corpus of text documents and I need to find the
> > > probability of occurrence of a phrase like "I am" in the given set of
> > > text documents, how do I go about finding the value?
> > > I can easily count how many times the phrase "I am" occurs in
> > > the whole set of text documents, including all the sentences, but what
> > > do I divide the count by?
> > > Thanks
>
> > > With Regards,
> > > > Abhishek S

