> On July 14, 2014, 6:57 p.m., Matthew Hayes wrote:
> > datafu-pig/src/main/java/datafu/pig/stats/LogScoringFunction.java, line 14
> > <https://reviews.apache.org/r/23384/diff/1/?file=627467#file627467line14>
> >
> >     Positions 0,1,2,3 yield values:
> >
> >     0 => 1
> >     1 => LOG2/Math.log(3)
> >     2 => LOG2/Math.log(4)
> >     3 => LOG2/Math.log(5)
> >
> >     Is this right?
> >
> >     http://en.wikipedia.org/wiki/Discounted_cumulative_gain
>
> Joshua Hartman wrote:
>     As I've implemented it so far, yes. This is not completely in line
>     with the first definition, though I like it better :)
>
>     First, there's the 2nd method listed on Wikipedia, which weights top
>     positions more heavily and is documented in these papers:
>
>     http://nlp.stanford.edu/IR-book/html/htmledition/evaluation-of-ranked-retrieval-results-1.html
>
>     http://research.microsoft.com/en-us/um/people/cburges/papers/icml_ranking.pdf
>
>     The logarithmic one is mentioned in this text:
>     http://www.pearsonhighered.com/croft1epreview/pdf/chap8.pdf
>
>     You can pick any discounting function, and for this function you
>     could pick any logarithm. This implementation adds an extra 1 to the
>     position so that positions 0 and 1 will not have the same value. Note
>     that the 2nd discounting function listed on Wikipedia also uses
>     1 + position (and position seems to be 1-based there). I can change
>     this to be in line with the first Wikipedia definition. Would you
>     suggest that? The definition as implemented is probably a bit better,
>     but might confuse people.
>
>     Wikipedia itself notes the confusion; see the paragraph:
>     "In Croft, Metzler and Strohman (page 320, 2010), the authors mistakenly
>     claim that these two formulations of DCG are the same when the relevance
>     values of documents are binary; rel_{i} \in \{0,1\}. To see that they are
>     not the same, let there be one relevant document and that relevant
>     document is at rank 2. The first version of DCG equals 1 / log2(2) = 1.
>     The second version of DCG equals 1 / log2(2+1) = 0.631. The way that the
>     two formulations of DCG are the same for binary judgments is in the way
>     gain in the numerator is calculated. For both formulations of DCG, binary
>     relevance produces gain at rank i of 0 or 1. No matter the number of
>     relevance grades, the two formulations differ in their discount of gain."
>
> Joshua Hartman wrote:
>     V2 uses standard definitions to reduce confusion as much as possible...
>     Even though I think the log2 one is stupid.
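
For readers following the thread, the discounts under discussion can be compared side by side. This is only a sketch: the class and method names are hypothetical and are not taken from the patch, and it assumes the first patch computes LOG2 / Math.log(position + 2) with a 0-based position, as the values listed at the top of the thread suggest. Ranks in the two Wikipedia formulations are taken to be 1-based, with the first formulation applying no discount at rank 1.

```java
// Hypothetical sketch: class and method names are illustrative, not DataFu's API.
final class DcgDiscountSketch {
    private static final double LOG2 = Math.log(2.0);

    /** Base-2 logarithm built from the natural log. */
    static double log2(double x) {
        return Math.log(x) / LOG2;
    }

    /** First Wikipedia formulation: no discount at rank 1, then 1 / log2(rank). Rank is 1-based. */
    static double firstFormDiscount(int rank) {
        return rank <= 1 ? 1.0 : 1.0 / log2(rank);
    }

    /** Second Wikipedia formulation: 1 / log2(rank + 1). Rank is 1-based. */
    static double secondFormDiscount(int rank) {
        return 1.0 / log2(rank + 1);
    }

    /** Assumed form of the first patch: LOG2 / ln(position + 2). Position is 0-based. */
    static double patchDiscount(int position) {
        return LOG2 / Math.log(position + 2);
    }

    public static void main(String[] args) {
        // One relevant document (gain 1) at rank 2, as in the quoted Wikipedia example.
        System.out.printf("first form:  %.3f%n", firstFormDiscount(2));
        System.out.printf("second form: %.3f%n", secondFormDiscount(2));
        // Ranks 1 and 2 tie under the first form; the patch's extra +1 separates them.
        System.out.printf("first form, ranks 1 and 2: %.3f %.3f%n",
                firstFormDiscount(1), firstFormDiscount(2));
        System.out.printf("patch, positions 0 and 1:  %.3f %.3f%n",
                patchDiscount(0), patchDiscount(1));
    }
}
```

Under these assumptions, the single relevant document at rank 2 scores 1 with the first form and about 0.631 with the second, matching the quoted figures. The patch's discount at 0-based position p also works out to the second form's discount at rank p + 1, which is why positions 0 and 1 no longer tie.
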
I think it's worth following the standard definitions so people aren't
confused. We can include improved scoring functions as alternatives :)

- Matthew

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23384/#review47725
-----------------------------------------------------------

On July 21, 2014, 5:06 p.m., Joshua Hartman wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/23384/
> -----------------------------------------------------------
>
> (Updated July 21, 2014, 5:06 p.m.)
>
>
> Review request for DataFu and Matthew Hayes.
>
>
> Repository: datafu
>
>
> Description
> -------
>
> DATAFU-60 Support for NDCG
>
>
> Diffs
> -----
>
>   datafu-pig/src/main/java/datafu/pig/stats/ExponentialWeightedLog2ScoringFunction.java PRE-CREATION
>   datafu-pig/src/main/java/datafu/pig/stats/Log2ScoringFunction.java PRE-CREATION
>   datafu-pig/src/main/java/datafu/pig/stats/Ndcg.java PRE-CREATION
>   datafu-pig/src/main/java/datafu/pig/stats/PositionScoringFunction.java PRE-CREATION
>   datafu-pig/src/main/java/datafu/pig/stats/RangeScoringFunction.java PRE-CREATION
>   datafu-pig/src/main/java/datafu/pig/stats/UnaryScoringFunction.java PRE-CREATION
>   datafu-pig/src/main/java/datafu/pig/util/NumericalRange.java PRE-CREATION
>   datafu-pig/src/main/java/datafu/pig/util/RangeMap.java PRE-CREATION
>   datafu-pig/src/test/java/datafu/test/pig/stats/NdcgTests.java PRE-CREATION
>   datafu-pig/src/test/java/datafu/test/pig/util/TestRange.java PRE-CREATION
>   datafu-pig/src/test/java/datafu/test/pig/util/TestRangeMap.java PRE-CREATION
>
> Diff: https://reviews.apache.org/r/23384/diff/
>
>
> Testing
> -------
>
> Unit tests, Pig tests attached
>
>
> Thanks,
>
> Joshua Hartman
>
>