If exploration and bootstrap are the concerns, in my case saturation is achieved by a different methodology. I want this threshold to be (1) optional, of course, and (2) expressed as a confidence level in %, just to understand the ballpark in each case.
OK, I think I understand the code to convert confidence into an LLR threshold (and vice versa). Thanks.

On Thu, Aug 7, 2014 at 1:38 PM, Ted Dunning <[email protected]> wrote:

> Yes. This is a good thresholding to do.
>
> Typically I have done this by simply providing a threshold on the LLR score
> itself. It is convenient to restate the score itself as the signed square
> root of the score since that lets you add information about whether the
> cooccurrence is more or less common than expected and it puts the scale
> essentially on the same scale as standard deviations from a normal
> distribution. On that scale, a cutoff in the range of 5 to 15 is commonly
> used. The fact that 5 standard deviations represents a p-value of about 3
> x 10^-7 is indicative of how stringent this criterion would be if it were a
> frequentist hypothesis test.
>
> In practice, I haven't found that the cutoff is that useful. Part of the
> reason for this is that I would just as soon have some wild-eyed behavior
> happen with low data situations so that some kind of recommendations happen
> and we can gather more data.
>
> I don't see this threshold in our current RowSimilarityJob. There is a
> threshold, but it is applied to counts on only some similarity classes.
>
> On Wed, Aug 6, 2014 at 5:07 PM, Dmitriy Lyubimov <[email protected]>
> wrote:
>
> > On Wed, Aug 6, 2014 at 5:04 PM, Ted Dunning <[email protected]>
> > wrote:
> >
> > > On Wed, Aug 6, 2014 at 6:01 PM, Dmitriy Lyubimov <[email protected]>
> > > wrote:
> > >
> > > > > LLR is a classic test.
> > > >
> > > > What i meant here it doesn't produce a p-value. or does it?
> > >
> > > It produces an asymptotically chi^2 distributed statistic with 1-degree of
> > > freedom (for our case of 2x2 contingency tables) which can be reduced
> > > trivially to a p-value in the standard way.
> >
> > Great. so that means that we can do h_0 rejection based on a %-expressed
> > level?
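
For reference, the confidence-to-threshold conversion discussed in this thread can be sketched roughly as follows. This is only an illustration in Python using the standard library, not the code in RowSimilarityJob or anywhere else in the project; the function names are made up for the example. It assumes the LLR score is treated as a chi^2 statistic with 1 degree of freedom, as stated in the quoted message.

```python
import math
from statistics import NormalDist


def llr_to_p_value(llr: float) -> float:
    """Upper-tail p-value of an LLR score, treated as chi^2 with 1 dof."""
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(llr / 2.0))


def confidence_to_llr_threshold(confidence: float) -> float:
    """LLR cutoff that rejects h_0 at the given confidence level (0 < c < 1)."""
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    p = 1.0 - confidence
    # A chi^2(1) variable is the square of a standard normal, so the cutoff
    # is the square of the two-sided normal quantile. Using inv_cdf(p / 2)
    # avoids the precision loss of computing 1 - p / 2 for tiny p.
    z = -NormalDist().inv_cdf(p / 2.0)
    return z * z


def signed_root_llr(llr: float, observed: float, expected: float) -> float:
    """Signed square root of the score, on the scale of standard deviations."""
    # Hypothetical convention: positive when the cooccurrence is more common
    # than expected, negative when it is less common.
    root = math.sqrt(llr)
    return root if observed >= expected else -root


if __name__ == "__main__":
    print(f"{confidence_to_llr_threshold(0.95):.2f}")  # 3.84
    print(f"{llr_to_p_value(25.0):.1e}")               # 5.7e-07
```

Note that a signed root of 5 corresponds to LLR = 25, whose chi^2 p-value is about 5.7 x 10^-7; the one-sided normal tail at 5 standard deviations is half of that, which agrees with the figure of about 3 x 10^-7 quoted above.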
