Poof. At your command. My best explanation so far depends on mutual information. I blogged about this once upon a time here: http://tdunning.blogspot.com/2008/03/surprise-and-coincidence.html
I am not entirely sure, but I think that the case in question involves determining whether two items have interesting cooccurrence in different user histories (correct me if that isn't what we are looking at). This gives us a 2 x 2 matrix of four counts that represent the four combinations of two binary conditions. These conditions are "item 1/not item 1" and "item 2/not item 2". The counts are the number of users that had that combination of conditions.

What we want to know is whether knowing that a user has interacted with item 1 will help us predict whether they interacted with item 2. The reverse condition is actually identical (whether knowing item 2 helps predict item 1). Further, we are really only interested in knowing whether the relationship exists, not how strong it is (because we will handle weighting according to strength elsewhere).

Mathematically, there is a nice quantity called mutual information that measures just what we want, except that it is focused on how much predictive power one item provides relative to the other. We can adjust this to get a universal metric of connectedness by multiplying by the total number of samples observed. This is an amazing thing and based on deep mathematics, so don't be upset if it doesn't seem obvious that such a simple correction is, in fact, about as good as we can do.

Mutual information is nicely computed by taking the entropies of the row sums and the column sums and subtracting the entropy of the entire matrix. That is essentially what these logL functions are doing. My blog post mentioned above has a bit more information about this.

My original paper (that Robin so kindly references in his response) didn't use this formulation and as a result is a bit harder to follow. If I had been more clever two decades ago, I would have realized that the mutual information formula is much, much simpler, but I wasn't and the paper is a bit harder to understand than it should have been.
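To make that concrete, here is a small sketch in Python of the entropy-based computation described above. It assumes the standard Shannon entropy convention (entropy in nats, computed from raw counts), and the function names (`entropy`, `llr_2x2`) and the argument order of the four counts are my own for illustration:

```python
import math

def entropy(*counts):
    """Shannon entropy (in nats) of a distribution given as raw counts."""
    n = sum(counts)
    return -sum(k / n * math.log(k / n) for k in counts if k > 0)

def llr_2x2(k11, k12, k21, k22):
    """Log-likelihood ratio score for a 2 x 2 cooccurrence table.

    k11: users who interacted with both items
    k12: users who interacted with item 1 but not item 2
    k21: users who interacted with item 2 but not item 1
    k22: users who interacted with neither

    Mutual information is the entropy of the row sums plus the entropy
    of the column sums minus the entropy of the whole matrix; scaling
    by twice the total count gives the log-likelihood ratio statistic.
    """
    n = k11 + k12 + k21 + k22
    row_entropy = entropy(k11 + k12, k21 + k22)
    col_entropy = entropy(k11 + k21, k12 + k22)
    matrix_entropy = entropy(k11, k12, k21, k22)
    return 2.0 * n * (row_entropy + col_entropy - matrix_entropy)
```

As a sanity check, a table whose rows are proportional (independent conditions) scores essentially zero, e.g. `llr_2x2(10, 20, 30, 60)`, while a strongly diagonal table like `llr_2x2(100, 1, 1, 100)` scores large and positive. The logL functions in Mahout implement the same quantity from the same four counts, though you should check the source for the exact argument order.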
Ping me if this isn't the explanation you needed.

On Thu, Mar 18, 2010 at 5:06 AM, Sean Owen <[email protected]> wrote:
> Maybe we can conjure Ted to explain it, since he's the person I nicked
> this from, and I have trouble explaining it intuitively.
