What I mean here is that I probably need to refactor it a little, so that there's a part of the algorithm that accepts co-occurrence input directly and is somewhat decoupled from the part that accepts user x item input and does the downsampling and co-occurrence construction. That way I could do some customization of my own to the co-occurrence construction. Would it be reasonable if I did that?
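Here is a minimal sketch of that split. It is in Python rather than the Mahout Scala DSL, and the function names (`build_cooccurrence`, `top_k_similar`) are hypothetical, not Mahout's API. Stage 1 turns user x item histories into downsampled co-occurrence counts. Uniform random downsampling per user is an assumption. Stage 2 accepts those counts directly, scores each pair with a pluggable `score_fn`, and keeps the top k neighbours per item in a bounded min-heap. Because stage 2 never sees raw interactions, stage 1 can be replaced by a custom construction. The heap also shows one way to avoid the double queue work mentioned later in the thread: when the heap is full, `heapq.heapreplace` swaps out the current minimum with a single sift instead of a push followed by a pop.

```python
import heapq
import random
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence(user_items, max_items_per_user=500, seed=0):
    """Stage 1 (hypothetical): user x item histories -> co-occurrence counts.

    Each user's distinct items are uniformly downsampled to at most
    max_items_per_user (an assumption), then item pairs and per-item
    user counts are tallied on the downsampled data.
    """
    rng = random.Random(seed)
    pair_counts = Counter()
    item_counts = Counter()
    for items in user_items.values():
        items = sorted(set(items))
        if len(items) > max_items_per_user:
            items = sorted(rng.sample(items, max_items_per_user))
        item_counts.update(items)
        # Items are sorted, so each unordered pair is keyed as (smaller, larger).
        pair_counts.update(combinations(items, 2))
    return pair_counts, item_counts, len(user_items)


def top_k_similar(pair_counts, item_counts, n_users, score_fn, k=10):
    """Stage 2 (hypothetical): co-occurrence counts -> top-k neighbours per item.

    Takes co-occurrence counts directly, so stage 1 can be swapped out.
    score_fn(k11, count_a, count_b, n_users) returns a score where higher
    means more related.
    """
    heaps = defaultdict(list)  # item -> min-heap of (score, neighbour), len <= k
    for (a, b), k11 in pair_counts.items():
        score = score_fn(k11, item_counts[a], item_counts[b], n_users)
        for item, other in ((a, b), (b, a)):
            heap = heaps[item]
            if len(heap) < k:
                heapq.heappush(heap, (score, other))
            elif (score, other) > heap[0]:
                # Replace the current minimum with one sift, rather than
                # pushing first and then popping the smallest entry.
                heapq.heapreplace(heap, (score, other))
    return {item: sorted(heap, reverse=True) for item, heap in heaps.items()}


users = {"u1": ["a", "b", "c"], "u2": ["a", "b"],
         "u3": ["b", "c"], "u4": ["a", "b", "d"]}
pairs, counts, n = build_cooccurrence(users)
# Placeholder scorer: the raw co-occurrence count (not LLR).
result = top_k_similar(pairs, counts, n,
                       score_fn=lambda k11, ca, cb, total: k11, k=2)
print(result["a"])  # [(3, 'b'), (1, 'd')]
```

To plug in LLR instead of the placeholder, `score_fn` would build the usual 2x2 table from its arguments: `k12 = count_a - k11`, `k21 = count_b - k11`, `k22 = n_users - count_a - count_b + k11`.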
On Wed, Aug 6, 2014 at 5:12 PM, Dmitriy Lyubimov <[email protected]> wrote:

> Asking because I am considering pulling this implementation, but for some
> (mostly political) reasons people want to try different things here.
>
> I may also have to start with a different way of constructing
> co-occurrences, and may do a few optimizations there (e.g. the priority
> queue enqueuing/dequeuing does twice the work it really needs to do, etc.)
>
>
> On Wed, Aug 6, 2014 at 5:05 PM, Sebastian Schelter <
> [email protected]> wrote:
>
>> I chose against porting all the similarity measures to the DSL version of
>> the cooccurrence analysis for two reasons. First, adding the measures in a
>> generalizable way makes the code super hard to read. Second, in practice, I
>> have never seen anything giving better results than LLR. As Ted pointed
>> out, a lot of the foundations of using similarity measures come from
>> wanting to predict ratings, which people never do in practice. I think we
>> should restrict ourselves to approaches that work with implicit,
>> count-like data.
>>
>> -s
>>
>> On 06.08.2014 16:58, "Ted Dunning" <[email protected]> wrote:
>>
>> > On Wed, Aug 6, 2014 at 5:49 PM, Dmitriy Lyubimov <[email protected]>
>> > wrote:
>> >
>> > > On Wed, Aug 6, 2014 at 4:21 PM, Dmitriy Lyubimov <[email protected]>
>> > > wrote:
>> > >
>> > > > I suppose in that context LLR is considered a distance (higher scores
>> > > > mean more `distant` items, co-occurring by chance only)?
>> > >
>> > > Self-correction on this one: having taken a quick look at the LLR paper
>> > > again, it looks like it is actually a similarity (higher scores meaning
>> > > more stable co-occurrences, i.e. it moves in the opposite direction of
>> > > the p-value if it had been a classic test).
>> >
>> > LLR is a classic test. It is essentially Pearson's chi^2 test without the
>> > normal approximation. See my papers [1][2] introducing the test into
>> > computational linguistics (which ultimately brought it into all kinds of
>> > fields including recommendations) and also the references for the G^2
>> > test [3].
>> >
>> > [1] http://www.aclweb.org/anthology/J93-1003
>> > [2] http://arxiv.org/abs/1207.1847
>> > [3] http://en.wikipedia.org/wiki/G-test
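To make the LLR/G^2 point concrete, here is a small Python sketch. The function names are hypothetical, and this is not claimed to be any library's implementation. It computes LLR for a 2x2 co-occurrence table in an entropy form and checks it numerically against the textbook G^2 = 2 * sum(O * ln(O / E)). It also computes Pearson's chi^2 on the same tables and converts a statistic to a p-value using the chi^2 distribution with one degree of freedom. This also settles the distance-versus-similarity question from the thread: a larger LLR gives a smaller p-value, which means stronger evidence that the co-occurrence is not due to chance.

```python
import math


def _xlogx(x):
    return 0.0 if x == 0 else x * math.log(x)


def _entropy(*counts):
    # Unnormalized entropy: N*H(p) = N ln N - sum(k ln k), with N = sum(counts).
    return _xlogx(sum(counts)) - sum(_xlogx(k) for k in counts)


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio (G^2) for a 2x2 table of co-occurrence counts.

    k11: A and B together, k12: A without B, k21: B without A, k22: neither.
    """
    row = _entropy(k11 + k12, k21 + k22)
    col = _entropy(k11 + k21, k12 + k22)
    mat = _entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, 2.0 * (row + col - mat))


def _expected(table):
    # Expected counts under independence: row_total * col_total / n.
    n = sum(map(sum, table))
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    return [[r * c / n for c in cols] for r in rows]


def g_test(table):
    """Textbook G^2 = 2 * sum(O * ln(O / E)); cells with O == 0 contribute 0."""
    exp = _expected(table)
    return 2.0 * sum(o * math.log(o / e)
                     for orow, erow in zip(table, exp)
                     for o, e in zip(orow, erow) if o > 0)


def pearson_chi2(table):
    """Pearson's chi^2 = sum((O - E)^2 / E); assumes no empty row or column."""
    exp = _expected(table)
    return sum((o - e) ** 2 / e
               for orow, erow in zip(table, exp)
               for o, e in zip(orow, erow))


def p_value_1dof(stat):
    # Survival function of a chi^2 distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(stat / 2.0))


strong = llr(100, 10, 10, 1000)
print(abs(strong - g_test([[100, 10], [10, 1000]])) < 1e-6)  # True
print(llr(10, 10, 10, 10) < 1e-9, p_value_1dof(0.0))  # True 1.0
print(pearson_chi2([[1, 0], [0, 1000]]), g_test([[1, 0], [0, 1000]]))
```

Under independence, both statistics are asymptotically chi^2 with one degree of freedom. They can still differ a lot on sparse tables. In the last line, where there is a single co-occurrence among 1001 observations, Pearson's chi^2 comes out at about 1001, while G^2 is only about 16. Rare events like this are where the normal approximation behind chi^2 is weakest.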
