> On July 14, 2014, 6:57 p.m., Matthew Hayes wrote:
> > datafu-pig/src/main/java/datafu/pig/stats/LogScoringFunction.java, line 14
> > <https://reviews.apache.org/r/23384/diff/1/?file=627467#file627467line14>
> >
> >     Positions 0,1,2,3 yield values:
> >     
> >     0 => 1
> >     1 => LOG2/Math.log(3)
> >     2 => LOG2/Math.log(4)
> >     3 => LOG2/Math.log(5)
> >     
> >     Is this right?
> >     
> >     http://en.wikipedia.org/wiki/Discounted_cumulative_gain
> >
> 
> Joshua Hartman wrote:
>     As I've implemented it so far, yes. This is not completely 
> in line with the first definition, though I like it better :)
>     
>     First, there's the 2nd method listed on Wikipedia, which weights top 
> positions more heavily and is documented in these references:
>     
> http://nlp.stanford.edu/IR-book/html/htmledition/evaluation-of-ranked-retrieval-results-1.html
>     
> http://research.microsoft.com/en-us/um/people/cburges/papers/icml_ranking.pdf
>     
>     The logarithmic one is mentioned in this text:
>     http://www.pearsonhighered.com/croft1epreview/pdf/chap8.pdf
>     
>     You can pick any discounting function, and for a logarithmic one you 
> can pick any base. This implementation adds an extra 1 to the position 
> so that positions 0 and 1 will not have the same value. Note that the 2nd 
> discounting function listed on Wikipedia also uses 1 + position (and 
> position seems to be 1-based there). I can change this to be in line with 
> the first Wikipedia definition. Would you suggest that? The definition as 
> implemented is probably a bit better, but might confuse people.
>     
>     Wikipedia itself notes the confusion; see the paragraph:
>     "In Croft, Metzler and Strohman (page 320, 2010), the authors mistakenly 
> claim that these two formulations of DCG are the same when the relevance 
> values of documents are binary; rel_{i} \in \{0,1\}. To see that they are not 
> the same, let there be one relevant document and that relevant document is at 
> rank 2. The first version of DCG equals 1 / log2(2) = 1. The second version 
> of DCG equals 1 / log2(2+1) = 0.631. The way that the two formulations of DCG 
> are the same for binary judgments is in the way gain in the numerator is 
> calculated. For both formulations of DCG, binary relevance produces gain at 
> rank i of 0 or 1. No matter the number of relevance grades, the two 
> formulations differ in their discount of gain."
> 
> Joshua Hartman wrote:
>     V2 uses standard definitions to reduce confusion as much as possible... 
> Even though I think the log2 one is stupid

I think it's worth following the standard definitions so people aren't 
confused.  We can include improved scoring functions as alternatives :)
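
To make the difference between the two discounts concrete, here is a rough 
sketch (not the code under review). The class and method names are made up 
for illustration, and it assumes linear gain, which matches both Wikipedia 
formulations when relevance is binary. shiftedLogDiscount is the 
LOG2/Math.log(position + 2) behaviour discussed above; standardLogDiscount 
is my reading of the first Wikipedia definition, where rank 1 is not 
discounted and rank i >= 2 is divided by log2(i).

```java
// Sketch only: class and method names are hypothetical, not the DataFu API.
final class DcgDiscountSketch {
    private static final double LOG2 = Math.log(2);

    /** Discount discussed in the review: LOG2 / log(position + 2), position is 0-based. */
    static double shiftedLogDiscount(int position) {
        return LOG2 / Math.log(position + 2);
    }

    /** First Wikipedia formulation: rank 1 undiscounted, rank i >= 2 uses 1 / log2(i). */
    static double standardLogDiscount(int position) {
        int rank = position + 1; // convert 0-based position to 1-based rank
        return rank == 1 ? 1.0 : LOG2 / Math.log(rank);
    }

    /** DCG with linear gain; gains[p] is the relevance at 0-based position p. */
    static double dcg(double[] gains, boolean shifted) {
        double sum = 0.0;
        for (int p = 0; p < gains.length; p++) {
            double discount = shifted ? shiftedLogDiscount(p) : standardLogDiscount(p);
            sum += gains[p] * discount;
        }
        return sum;
    }

    public static void main(String[] args) {
        // One relevant document at rank 2, as in the Wikipedia paragraph quoted above.
        double[] gains = {0.0, 1.0};
        System.out.println("standard: " + dcg(gains, false));
        System.out.println("shifted:  " + dcg(gains, true));
    }
}
```

With the standard discount, positions 0 and 1 get the same weight, which is 
the behaviour the extra 1 was meant to avoid. With the shifted discount 
every position gets a strictly smaller weight than the one before it.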


- Matthew


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23384/#review47725
-----------------------------------------------------------


On July 21, 2014, 5:06 p.m., Joshua Hartman wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/23384/
> -----------------------------------------------------------
> 
> (Updated July 21, 2014, 5:06 p.m.)
> 
> 
> Review request for DataFu and Matthew Hayes.
> 
> 
> Repository: datafu
> 
> 
> Description
> -------
> 
> DATAFU-60 Support for NDCG
> 
> 
> Diffs
> -----
> 
>   datafu-pig/src/main/java/datafu/pig/stats/ExponentialWeightedLog2ScoringFunction.java PRE-CREATION
>   datafu-pig/src/main/java/datafu/pig/stats/Log2ScoringFunction.java PRE-CREATION
>   datafu-pig/src/main/java/datafu/pig/stats/Ndcg.java PRE-CREATION
>   datafu-pig/src/main/java/datafu/pig/stats/PositionScoringFunction.java PRE-CREATION
>   datafu-pig/src/main/java/datafu/pig/stats/RangeScoringFunction.java PRE-CREATION
>   datafu-pig/src/main/java/datafu/pig/stats/UnaryScoringFunction.java PRE-CREATION
>   datafu-pig/src/main/java/datafu/pig/util/NumericalRange.java PRE-CREATION
>   datafu-pig/src/main/java/datafu/pig/util/RangeMap.java PRE-CREATION
>   datafu-pig/src/test/java/datafu/test/pig/stats/NdcgTests.java PRE-CREATION
>   datafu-pig/src/test/java/datafu/test/pig/util/TestRange.java PRE-CREATION
>   datafu-pig/src/test/java/datafu/test/pig/util/TestRangeMap.java PRE-CREATION
> 
> Diff: https://reviews.apache.org/r/23384/diff/
> 
> 
> Testing
> -------
> 
> Unit tests, Pig tests attached
> 
> 
> Thanks,
> 
> Joshua Hartman
> 
>
