Re: co-occurrence paper and code

Sebastian Schelter Wed, 06 Aug 2014 17:21:35 -0700

Sounds good to me.

-s
On 06.08.2014 17:15, "Dmitriy Lyubimov" <[email protected]> wrote:


> What I mean here is that I probably need to refactor it a little, so that
> the part of the algorithm that accepts co-occurrence input directly is
> somewhat decoupled from the part that accepts user x item input and does
> downsampling and co-occurrence construction. That way I could do some
> customization of my own to co-occurrence construction. Would it be
> reasonable if I did that?
>
>
> On Wed, Aug 6, 2014 at 5:12 PM, Dmitriy Lyubimov <[email protected]>
> wrote:
>
> > Asking because i am considering pulling this implementation but for some
> > (mostly political) reasons people want to try different things here.
> >
> > I may also have to start with a different way of constructing
> > co-occurrences, and may do a few optimizations there (i.e. priority queue
> > queuing/enqueuing does twice the work it really needs to do etc.)
> >
> >
> >
> >
> > On Wed, Aug 6, 2014 at 5:05 PM, Sebastian Schelter <
> > [email protected]> wrote:
> >
> >> I decided against porting all the similarity measures to the DSL version
> >> of the cooccurrence analysis for two reasons. First, adding the measures
> >> in a generalizable way makes the code super hard to read. Second, in
> >> practice, I have never seen anything give better results than LLR. As Ted
> >> pointed out, a lot of the foundations of using similarity measures come
> >> from wanting to predict ratings, which people never do in practice. I
> >> think we should restrict ourselves to approaches that work with implicit,
> >> count-like data.
> >>
> >> -s
> >> On 06.08.2014 16:58, "Ted Dunning" <[email protected]> wrote:
> >>
> >> > On Wed, Aug 6, 2014 at 5:49 PM, Dmitriy Lyubimov <[email protected]>
> >> > wrote:
> >> >
> >> > > On Wed, Aug 6, 2014 at 4:21 PM, Dmitriy Lyubimov <[email protected]>
> >> > > wrote:
> >> > >
> >> > > > I suppose in that context LLR is considered a distance (higher scores
> >> > > > mean more `distant` items, co-occurring by chance only)?
> >> > > >
> >> > >
> >> > > Self-correction on this one -- having taken a quick look at the LLR
> >> > > paper again, it looks like it is actually a similarity (higher scores
> >> > > meaning more stable co-occurrences, i.e. it moves in the opposite
> >> > > direction of the p-value if it had been a classic test).
> >> > >
> >> >
> >> > LLR is a classic test.  It is essentially Pearson's chi^2 test without
> >> > the normal approximation.  See my papers[1][2] introducing the test into
> >> > computational linguistics (which ultimately brought it into all kinds of
> >> > fields including recommendations) and also references for the G^2
> >> > test[3].
> >> >
> >> > [1] http://www.aclweb.org/anthology/J93-1003
> >> > [2] http://arxiv.org/abs/1207.1847
> >> > [3] http://en.wikipedia.org/wiki/G-test
> >> >
> >>
> >
> >
>
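
To make the LLR discussion above concrete, here is a minimal sketch of the
G^2 statistic for a 2x2 co-occurrence table, next to Pearson's chi^2 for
comparison. This is not Mahout's code: the function names (expected_counts,
llr, pearson_chi2) and the k11/k12/k21/k22 argument layout are assumptions
made for illustration. k11 counts users who interacted with both items, k12
and k21 those who interacted with only one of them, and k22 those who
interacted with neither.

```python
import math


def expected_counts(k11, k12, k21, k22):
    """Expected cell counts under independence: row_total * col_total / n."""
    n = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    return [[rows[i] * cols[j] / n for j in range(2)] for i in range(2)]


def llr(k11, k12, k21, k22):
    """G^2 = 2 * sum(observed * ln(observed / expected)) over the four cells."""
    observed = [[k11, k12], [k21, k22]]
    expected = expected_counts(k11, k12, k21, k22)
    total = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # a zero cell contributes nothing (limit of x * ln x)
                total += o * math.log(o / expected[i][j])
    return 2.0 * total


def pearson_chi2(k11, k12, k21, k22):
    """Pearson's chi^2 = sum((observed - expected)^2 / expected)."""
    observed = [[k11, k12], [k21, k22]]
    expected = expected_counts(k11, k12, k21, k22)
    total = 0.0
    for i in range(2):
        for j in range(2):
            e = expected[i][j]
            if e > 0:  # skip cells whose row or column total is zero
                total += (observed[i][j] - e) ** 2 / e
    return total


if __name__ == "__main__":
    # Items that co-occur exactly as often as chance predicts.
    print(llr(10, 10, 10, 10), pearson_chi2(10, 10, 10, 10))
    # Items that always occur together.
    print(llr(10, 0, 0, 10), pearson_chi2(10, 0, 0, 10))
```

A table that matches independence exactly scores 0 under both statistics,
and the score grows as the observed counts move away from what chance
predicts, which is why LLR behaves like a similarity here. Note that the raw
statistic also grows when two items co-occur less often than chance would
predict, so a recommender would need to handle that case separately.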
