Re: co-occurrence paper and code

Dmitriy Lyubimov Thu, 07 Aug 2014 14:01:44 -0700

If exploration and bootstrapping are the concerns: in my case saturation is
achieved by a different methodology. I want this threshold to be (1)
optional, of course, and (2) expressed as a confidence level in %, just to
understand the ballpark in each case.


OK, I think I understand the code to convert a confidence level into an LLR
threshold (and vice versa). Thanks.
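
For the archive, here is a minimal sketch of that conversion as I understand
it. It assumes the LLR score is the usual G^2 statistic (factor of 2 already
included), treated as chi^2 with 1 degree of freedom as described in the
quoted reply below. The function names are made up for illustration and are
not part of any existing API.

```python
import math
from statistics import NormalDist


def llr_to_p_value(llr):
    """Upper-tail p-value of an LLR score under chi^2 with 1 degree of freedom.

    For 1 degree of freedom, P(chi^2 > x) = erfc(sqrt(x / 2)).
    """
    if llr < 0:
        raise ValueError("LLR score must be non-negative")
    return math.erfc(math.sqrt(llr / 2.0))


def confidence_to_llr_threshold(confidence):
    """LLR threshold for a confidence level given as a fraction in (0, 1)."""
    alpha = 1.0 - confidence
    # A chi^2(1) variable is the square of a standard normal variable, so the
    # critical value is the square of the two-sided normal quantile.
    z = -NormalDist().inv_cdf(alpha / 2.0)
    return z * z


def llr_threshold_to_confidence(threshold):
    """Confidence level (fraction) implied by an LLR threshold."""
    return 1.0 - llr_to_p_value(threshold)


if __name__ == "__main__":
    for pct in (90.0, 95.0, 99.0, 99.9):
        print(pct, confidence_to_llr_threshold(pct / 100.0))
```

On the signed-square-root scale the cutoff is just the square root of this
threshold, so a cutoff of 5 corresponds to an LLR score of 25.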


On Thu, Aug 7, 2014 at 1:38 PM, Ted Dunning <[email protected]> wrote:

> Yes.  This is a good thresholding to do.
>
> Typically I have done this by simply providing a threshold on the LLR score
> itself.  It is convenient to restate the score itself as the signed square
> root of the score since that lets you add information about whether the
> cooccurrence is more or less common than expected and it puts the scale
> essentially on the same scale as standard deviations from a normal
> distribution.  On that scale, a cutoff in the range of 5 to 15 is commonly
> used.  The fact that 5 standard deviations represents a p-value of about 3
> x 10^-7 is indicative of how stringent this criterion would be if it were a
> frequentist hypothesis test.
>
> In practice, I haven't found that the cutoff is that useful.  Part of the
> reason for this is that I would just as soon have some wild-eyed behavior
> happen with low data situations so that some kind of recommendations happen
> and we can gather more data.
>
>
> I don't see this threshold in our current RowSimilarityJob.  There is a
> threshold, but it is applied to counts on only some similarity classes.
>
>
>
> On Wed, Aug 6, 2014 at 5:07 PM, Dmitriy Lyubimov <[email protected]>
> wrote:
>
> > On Wed, Aug 6, 2014 at 5:04 PM, Ted Dunning <[email protected]> wrote:
> >
> > > On Wed, Aug 6, 2014 at 6:01 PM, Dmitriy Lyubimov <[email protected]>
> > > wrote:
> > >
> > > > > LLR is a classic test.
> > > >
> > > >
> > > > What I meant here is that it doesn't produce a p-value. Or does it?
> > > >
> > >
> > > It produces an asymptotically chi^2 distributed statistic with 1 degree
> > > of freedom (for our case of 2x2 contingency tables) which can be reduced
> > > trivially to a p-value in the standard way.
> > >
> >
> > Great. So that means that we can do h_0 rejection based on a level
> > expressed in %?
> >
>
