Asking because I am considering pulling this implementation, but for some (mostly political) reasons people want to try different things here.
I may also have to start with a different way of constructing co-occurrences, and may do a few optimizations there (e.g. the priority queue queuing/enqueuing does twice the work it really needs to do, etc.).

On Wed, Aug 6, 2014 at 5:05 PM, Sebastian Schelter <ssc.o...@googlemail.com> wrote:

> I chose against porting all the similarity measures to the DSL version of the cooccurrence analysis for two reasons. First, adding the measures in a generalizable way makes the code super hard to read. Second, in practice, I have never seen anything give better results than LLR. As Ted pointed out, a lot of the foundations of using similarity measures come from wanting to predict ratings, which people never do in practice. I think we should restrict ourselves to approaches that work with implicit, count-like data.
>
> -s
>
> On 06.08.2014 at 16:58, "Ted Dunning" <ted.dunn...@gmail.com> wrote:
>
> > On Wed, Aug 6, 2014 at 5:49 PM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
> >
> > > On Wed, Aug 6, 2014 at 4:21 PM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
> > >
> > > > I suppose in that context LLR is considered a distance (higher scores mean more `distant` items, co-occurring by chance only)?
> > >
> > > Self-correction on this one -- having given a quick look at the LLR paper again, it looks like it is actually a similarity (higher scores meaning more stable co-occurrences, i.e. it moves in the opposite direction of the p-value if it had been a classic test).
> >
> > LLR is a classic test. It is essentially Pearson's chi^2 test without the normal approximation. See my papers[1][2] introducing the test into computational linguistics (which ultimately brought it into all kinds of fields including recommendations) and also references for the G^2 test[3].
> >
> > [1] http://www.aclweb.org/anthology/J93-1003
> > [2] http://arxiv.org/abs/1207.1847
> > [3] http://en.wikipedia.org/wiki/G-test
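
For reference, here is a minimal sketch of the LLR (G^2) score discussed in the quoted thread, computed from a 2x2 co-occurrence table. It is not the Mahout code; the function names and the k11/k12/k21/k22 argument layout are assumptions made for illustration (k11 = both items seen together, k12 and k21 = one item without the other, k22 = neither). It uses the entropy form of G^2, which should agree with 2 * sum(k * log(k * N / (row * col))).

```python
import math


def x_log_x(x):
    # Convention: 0 * log(0) is treated as 0.
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    # Unnormalized Shannon entropy: N*log(N) - sum(k*log(k)).
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)


def llr(k11, k12, k21, k22):
    # G^2 = 2 * (H(rows) + H(cols) - H(cells)) on the raw counts.
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point round-off.
    return max(0.0, 2.0 * (row + col - mat))


if __name__ == "__main__":
    # Independent counts: score near zero (co-occurrence by chance only).
    print(llr(10, 10, 10, 10))
    # Perfectly associated counts: large score.
    print(llr(100, 0, 0, 100))
```

Higher scores indicate co-occurrence that is less plausibly explained by chance, which matches the "similarity" reading in the self-correction above.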