What I mean here is that I probably need to refactor it a little, so that
there's a part of the algorithm that accepts co-occurrence input directly
and is somewhat decoupled from the part that accepts user x item input and
does the downsampling and co-occurrence construction. That way I could
apply some customizations of my own to the co-occurrence construction.
Would it be reasonable if I did that?
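A minimal sketch of that split, in Python purely for illustration. Everything here is hypothetical: `build_cooccurrences`, `indicators`, `max_per_user` and the pluggable `score` callback are not Mahout's API, and the random per-user truncation is only an assumed stand-in for whatever downsampling the real implementation does. The point is the boundary: stage 2 takes co-occurrence and item counts directly, so a custom stage 1 can feed it.

```python
import random
from collections import Counter
from itertools import combinations


# Stage 1 (hypothetical): user x item interactions -> co-occurrence counts.
def build_cooccurrences(user_items, max_per_user=500, seed=0):
    """Downsample each user's items, then count item pairs and item totals."""
    rng = random.Random(seed)
    pair_counts = Counter()
    item_counts = Counter()
    for items in user_items.values():
        items = sorted(set(items))
        if len(items) > max_per_user:
            # Assumed downsampling: keep a random subset of a heavy user's items.
            items = sorted(rng.sample(items, max_per_user))
        item_counts.update(items)
        for a, b in combinations(items, 2):
            pair_counts[(a, b)] += 1
    return pair_counts, item_counts, len(user_items)


# Stage 2 (hypothetical): accepts co-occurrence input directly.
def indicators(pair_counts, item_counts, n_users, score, top_k=10):
    """Score every co-occurring pair with `score` and keep the top_k per item."""
    scored = {}
    for (a, b), k11 in pair_counts.items():
        s = score(k11, item_counts[a], item_counts[b], n_users)
        scored.setdefault(a, []).append((s, b))
        scored.setdefault(b, []).append((s, a))
    return {item: [other for _, other in sorted(cands, reverse=True)[:top_k]]
            for item, cands in scored.items()}


def count_score(k11, count_a, count_b, n_users):
    return k11  # placeholder scorer: raw co-occurrence count


user_items = {"u1": ["a", "b", "c"], "u2": ["a", "b"], "u3": ["b", "c"]}
pairs, items, n = build_cooccurrences(user_items)
print(indicators(pairs, items, n, count_score, top_k=1))
```

Swapping `count_score` for an LLR scorer, or replacing stage 1 with a different co-occurrence construction, leaves the other stage untouched.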


On Wed, Aug 6, 2014 at 5:12 PM, Dmitriy Lyubimov <[email protected]> wrote:

> I'm asking because I am considering pulling in this implementation, but for
> some (mostly political) reasons people want to try different things here.
>
> I may also have to start with a different way of constructing
> co-occurrences, and may do a few optimizations there (e.g. priority queue
> enqueuing/dequeuing does twice the work it really needs to, etc.).
>
>
>
>
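One plausible reading of the priority-queue remark, sketched in Python (the functions are hypothetical and not taken from Mahout): when keeping the top k scores, pushing every candidate and then popping the minimum costs two heap operations per candidate, whereas comparing against the current minimum first costs at most one sift, and none at all for candidates that cannot make the cut.

```python
import heapq


def top_k_naive(scores, k):
    """Push every candidate, then pop the smallest whenever the heap overflows."""
    heap = []
    for item in scores:
        heapq.heappush(heap, item)       # sift-up
        if len(heap) > k:
            heapq.heappop(heap)          # sift-down: a second heap operation
    return sorted(heap, reverse=True)


def top_k_single_pass(scores, k):
    """Check against the current minimum first; at most one sift per candidate."""
    if k <= 0:
        return []
    heap = []
    for item in scores:
        if len(heap) < k:
            heapq.heappush(heap, item)
        elif item > heap[0]:
            heapq.heapreplace(heap, item)  # pop min and push new item in one sift
        # otherwise the candidate cannot enter the top k: no heap work at all
    return sorted(heap, reverse=True)


scores = [(0.9, "x"), (0.1, "y"), (0.5, "z"), (0.7, "w")]
print(top_k_single_pass(scores, 2))  # [(0.9, 'x'), (0.7, 'w')]
```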
> On Wed, Aug 6, 2014 at 5:05 PM, Sebastian Schelter <
> [email protected]> wrote:
>
>> I decided against porting all the similarity measures to the DSL version of
>> the co-occurrence analysis for two reasons. First, adding the measures in a
>> generalizable way makes the code super hard to read. Second, in practice, I
>> have never seen anything give better results than LLR. As Ted pointed out,
>> a lot of the foundations of using similarity measures come from wanting to
>> predict ratings, which people never do in practice. I think we should
>> restrict ourselves to approaches that work with implicit, count-like data.
>>
>> -s
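To illustrate that LLR needs nothing but implicit counts, here is a sketch of the G^2 (LLR) score for a 2x2 contingency table, written in the unnormalized-entropy form of G^2. The helper names are hypothetical, and `item_pair_llr` assumes the table is built as: users with both items, users with A but not B, users with B but not A, and users with neither.

```python
import math


def x_log_x(x):
    """x * ln(x), with the convention 0 * ln(0) = 0."""
    return x * math.log(x) if x > 0 else 0.0


def unnormalized_entropy(*counts):
    """N * H(p) for counts summing to N, i.e. N ln N - sum(k ln k)."""
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)


def llr(k11, k12, k21, k22):
    """G^2 statistic for a 2x2 table of counts."""
    row = unnormalized_entropy(k11 + k12, k21 + k22)
    col = unnormalized_entropy(k11 + k21, k12 + k22)
    mat = unnormalized_entropy(k11, k12, k21, k22)
    return max(0.0, 2.0 * (row + col - mat))  # clamp tiny negative round-off


def item_pair_llr(both, count_a, count_b, n_users):
    """Hypothetical helper: 2x2 table from co-occurrence and item counts."""
    k11 = both                        # users with A and B
    k12 = count_a - both              # A without B
    k21 = count_b - both              # B without A
    k22 = n_users - k11 - k12 - k21   # neither
    return llr(k11, k12, k21, k22)


print(llr(10, 10, 10, 10))                   # independent counts: about 0
print(item_pair_llr(90, 100, 100, 10_000))   # strong co-occurrence: large score
```

No ratings appear anywhere: the inputs are the same counts the co-occurrence step already produces.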
>> On 06.08.2014 16:58, "Ted Dunning" <[email protected]> wrote:
>>
>> > On Wed, Aug 6, 2014 at 5:49 PM, Dmitriy Lyubimov <[email protected]>
>> > wrote:
>> >
>> > > On Wed, Aug 6, 2014 at 4:21 PM, Dmitriy Lyubimov <[email protected]>
>> > > wrote:
>> > >
>> > > > I suppose in that context LLR is considered a distance (higher scores
>> > > > mean more `distant` items, co-occurring by chance only)?
>> > > >
>> > >
>> > > Self-correction on this one: having taken a quick look at the LLR paper
>> > > again, it looks like it is actually a similarity (higher scores mean
>> > > more stable co-occurrences, i.e. it moves in the opposite direction of
>> > > the p-value, had it been a classic test).
>> > >
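The direction question can be checked numerically. Assuming LLR is treated as a chi^2 statistic with one degree of freedom (the usual reference distribution for a 2x2 table), its p-value is erfc(sqrt(LLR / 2)), so a larger LLR always means a smaller p-value, i.e. a co-occurrence less likely to be due to chance alone. `llr_p_value` is a hypothetical helper.

```python
import math


def llr_p_value(llr):
    """Upper-tail p-value of a chi^2 statistic with 1 degree of freedom.

    With one degree of freedom X = Z**2 for standard normal Z, so
    P(X >= x) = P(|Z| >= sqrt(x)) = erfc(sqrt(x / 2)).
    """
    if llr <= 0:
        return 1.0
    return math.erfc(math.sqrt(llr / 2.0))


for score in [0.0, 1.0, 3.84, 10.83, 50.0]:
    print(f"LLR={score:6.2f}  p={llr_p_value(score):.3g}")
```

3.84 and 10.83 are the familiar chi^2(1) critical values for p = 0.05 and p = 0.001.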
>> >
>> > LLR is a classic test. It is essentially Pearson's chi^2 test without the
>> > normal approximation. See my papers [1][2] introducing the test into
>> > computational linguistics (which ultimately brought it into all kinds of
>> > fields, including recommendations), and also the references for the G^2
>> > test [3].
>> >
>> > [1] http://www.aclweb.org/anthology/J93-1003
>> > [2] http://arxiv.org/abs/1207.1847
>> > [3] http://en.wikipedia.org/wiki/G-test
>> >
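To make "chi^2 without the normal approximation" concrete, a sketch comparing G^2 = 2 * sum O ln(O / E) with Pearson's X^2 = sum (O - E)^2 / E on the same 2x2 tables (function names are illustrative). X^2 is the second-order approximation of G^2 around O = E, so the two agree when counts are large and balanced and diverge when a cell has a small expected count, as happens with rare events.

```python
import math


def expected_counts(table):
    """E_ij = row_i * col_j / N under independence (assumes no empty row/column)."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    return [[r * c / n for c in cols] for r in rows]


def g_squared(table):
    """G^2 = 2 * sum O * ln(O / E); cells with O = 0 contribute 0."""
    exp = expected_counts(table)
    return 2.0 * sum(o * math.log(o / e)
                     for row_o, row_e in zip(table, exp)
                     for o, e in zip(row_o, row_e) if o > 0)


def pearson_chi2(table):
    """Pearson's X^2 = sum (O - E)^2 / E."""
    exp = expected_counts(table)
    return sum((o - e) ** 2 / e
               for row_o, row_e in zip(table, exp)
               for o, e in zip(row_o, row_e))


for table in ([[50, 50], [50, 50]], [[60, 40], [40, 60]], [[5, 0], [1, 9994]]):
    print(table, round(g_squared(table), 2), round(pearson_chi2(table), 2))
```

For the moderate table the two statistics nearly coincide; for the skewed table with a rare item, X^2 comes out roughly a hundred times larger than G^2, which is where the normal approximation breaks down.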
>>
>
>
