Re: [MLLIB] RankingMetrics.precisionAt

2016-12-05 Thread Sean Owen
I read it again and that looks like it implements mean precision@k as I
would expect. What is the issue?

On Tue, Dec 6, 2016, 07:30 Maciej Szymkiewicz wrote:

> Hi,
>
> Could I ask for a fresh pair of eyes on this piece of code:
>
>
> https://github.com/apache/spark/blob/f830bb9170f6b853565d9dd30ca7418b93a54fe3/mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala#L59-L80
>
>   @Since("1.2.0")
>   def precisionAt(k: Int): Double = {
> require(k > 0, "ranking position k should be positive")
> predictionAndLabels.map { case (pred, lab) =>
>   val labSet = lab.toSet
>
>   if (labSet.nonEmpty) {
> val n = math.min(pred.length, k)
> var i = 0
> var cnt = 0
> while (i < n) {
>   if (labSet.contains(pred(i))) {
> cnt += 1
>   }
>   i += 1
> }
> cnt.toDouble / k
>   } else {
> logWarning("Empty ground truth set, check input data")
> 0.0
>   }
> }.mean()
>   }
>
>
> Am I the only one who thinks this doesn't do what it claims? Just for
> reference:
>
>
> - https://web.archive.org/web/20120415101144/http://sas.uwaterloo.ca/stats_navigation/techreports/04WorkingPapers/2004-09.pdf
> - https://github.com/benhamner/Metrics/blob/master/Python/ml_metrics/average_precision.py
>
> --
> Best,
> Maciej
>
>


Re: [MLLIB] RankingMetrics.precisionAt

2016-12-06 Thread Maciej Szymkiewicz
Thank you Sean.

Maybe I am just confused about the language. When I read that it returns
"the average precision at the first k ranking positions" I somehow
expected there would be ap@k in there and that the final output would be
MAP@k, not the average of the precision at the k-th position.

I guess it is not enough sleep.
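
Just to make concrete what I mean, here is a toy sketch (not Spark code;
the ap@k part only loosely follows the ml_metrics implementation linked
below, and the names are placeholders):

    // Toy sketch only, not Spark code. One (prediction, labels) pair,
    // comparing precision@k with what I would call ap@k.
    val pred = Array(1, 2, 3)
    val labSet = Set(1, 3)
    val k = 3

    // precision@k, the way precisionAt computes it per query: hits / k
    val precisionAtK =
      pred.take(k).count(labSet.contains).toDouble / k            // 2/3 ≈ 0.67

    // ap@k in the ml_metrics sense: precision at each hit position,
    // summed and normalised by min(|labels|, k)
    val hitPrecisions = pred.take(k).zipWithIndex.collect {
      case (p, i) if labSet.contains(p) =>
        pred.take(i + 1).count(labSet.contains).toDouble / (i + 1)
    }
    val apAtK = hitPrecisions.sum / math.min(labSet.size, k)      // (1 + 2/3) / 2 ≈ 0.83

For the same input the two numbers already differ, which is what threw me
off about the wording.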

On 12/06/2016 02:45 AM, Sean Owen wrote:
> I read it again and that looks like it implements mean precision@k as
> I would expect. What is the issue?
>
> On Tue, Dec 6, 2016, 07:30 Maciej Szymkiewicz wrote:
>
> Hi,
>
> Could I ask for a fresh pair of eyes on this piece of code:
>
> 
> https://github.com/apache/spark/blob/f830bb9170f6b853565d9dd30ca7418b93a54fe3/mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala#L59-L80
>
>   @Since("1.2.0")
>   def precisionAt(k: Int): Double = {
> require(k > 0, "ranking position k should be positive")
> predictionAndLabels.map { case (pred, lab) =>
>   val labSet = lab.toSet
>
>   if (labSet.nonEmpty) {
> val n = math.min(pred.length, k)
> var i = 0
> var cnt = 0
> while (i < n) {
>   if (labSet.contains(pred(i))) {
> cnt += 1
>   }
>   i += 1
> }
> cnt.toDouble / k
>   } else {
> logWarning("Empty ground truth set, check input data")
> 0.0
>   }
> }.mean()
>   }
>
>
> Am I the only one who thinks this doesn't do what it claims? Just
> for reference:
>
>   * https://web.archive.org/web/20120415101144/http://sas.uwaterloo.ca/stats_navigation/techreports/04WorkingPapers/2004-09.pdf
>   * https://github.com/benhamner/Metrics/blob/master/Python/ml_metrics/average_precision.py
>
> -- 
> Best,
> Maciej
>

-- 
Maciej Szymkiewicz



Re: [MLLIB] RankingMetrics.precisionAt

2016-12-06 Thread Sean Owen
As I understand it, this might best be called "mean precision@k", not "mean
average precision, up to k".

On Tue, Dec 6, 2016 at 9:43 PM Maciej Szymkiewicz wrote:

> Thank you Sean.
>
> Maybe I am just confused about the language. When I read that it returns "the
> average precision at the first k ranking positions" I somehow expected there
> would be ap@k in there and that the final output would be MAP@k, not the
> average of the precision at the k-th position.
>
> I guess it is not enough sleep.
> On 12/06/2016 02:45 AM, Sean Owen wrote:
>
> I read it again and that looks like it implements mean precision@k as I
> would expect. What is the issue?
>
> On Tue, Dec 6, 2016, 07:30 Maciej Szymkiewicz wrote:
>
> Hi,
>
> Could I ask for a fresh pair of eyes on this piece of code:
>
>
> https://github.com/apache/spark/blob/f830bb9170f6b853565d9dd30ca7418b93a54fe3/mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala#L59-L80
>
>   @Since("1.2.0")
>   def precisionAt(k: Int): Double = {
> require(k > 0, "ranking position k should be positive")
> predictionAndLabels.map { case (pred, lab) =>
>   val labSet = lab.toSet
>
>   if (labSet.nonEmpty) {
> val n = math.min(pred.length, k)
> var i = 0
> var cnt = 0
> while (i < n) {
>   if (labSet.contains(pred(i))) {
> cnt += 1
>   }
>   i += 1
> }
> cnt.toDouble / k
>   } else {
> logWarning("Empty ground truth set, check input data")
> 0.0
>   }
> }.mean()
>   }
>
>
> Am I the only one who thinks this doesn't do what it claims? Just for
> reference:
>
>
> - https://web.archive.org/web/20120415101144/http://sas.uwaterloo.ca/stats_navigation/techreports/04WorkingPapers/2004-09.pdf
> - https://github.com/benhamner/Metrics/blob/master/Python/ml_metrics/average_precision.py
>
> --
> Best,
> Maciej
>
>
> --
> Maciej Szymkiewicz
>
>


Re: [MLLIB] RankingMetrics.precisionAt

2016-12-06 Thread Maciej Szymkiewicz
This sounds much better.

A follow-up question is whether we should provide MAP@k, which I believe is
the more widely used metric.
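
Roughly what I have in mind is sketched below. This is purely hypothetical
(the method name, and normalising by min(|labels|, k) as the ml_metrics code
does, are my own choices); it just mirrors the shape of the existing
precisionAt:

    // Hypothetical sketch only, not an existing method. Per query it sums
    // the precision at each hit position within the top k and normalises
    // by min(|labels|, k) (ap@k), then takes the mean across queries (MAP@k).
    def meanAveragePrecisionAt(k: Int): Double = {
      require(k > 0, "ranking position k should be positive")
      predictionAndLabels.map { case (pred, lab) =>
        val labSet = lab.toSet
        if (labSet.nonEmpty) {
          val n = math.min(pred.length, k)
          var i = 0
          var hits = 0
          var score = 0.0
          while (i < n) {
            if (labSet.contains(pred(i))) {
              hits += 1
              score += hits.toDouble / (i + 1)
            }
            i += 1
          }
          score / math.min(labSet.size, k)
        } else {
          logWarning("Empty ground truth set, check input data")
          0.0
        }
      }.mean()
    }

The exact normalisation is one of the details that would need to be pinned
down before anything like this goes in.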


On 12/06/2016 09:52 PM, Sean Owen wrote:
> As I understand it, this might best be called "mean precision@k", not
> "mean average precision, up to k".
>
> On Tue, Dec 6, 2016 at 9:43 PM Maciej Szymkiewicz
> <mszymkiew...@gmail.com> wrote:
>
> Thank you Sean.
>
> Maybe I am just confused about the language. When I read that it
> returns "the average precision at the first k ranking positions" I
> somehow expected there would be ap@k in there and that the final output
> would be MAP@k, not the average of the precision at the k-th position.
>
> I guess it is not enough sleep.
>
> On 12/06/2016 02:45 AM, Sean Owen wrote:
>> I read it again and that looks like it implements mean
>> precision@k as I would expect. What is the issue?
>>
>> On Tue, Dec 6, 2016, 07:30 Maciej Szymkiewicz
>> <mszymkiew...@gmail.com> wrote:
>>
>> Hi,
>>
>> Could I ask for a fresh pair of eyes on this piece of code:
>>
>> 
>> https://github.com/apache/spark/blob/f830bb9170f6b853565d9dd30ca7418b93a54fe3/mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala#L59-L80
>>
>>   @Since("1.2.0")
>>   def precisionAt(k: Int): Double = {
>> require(k > 0, "ranking position k should be positive")
>> predictionAndLabels.map { case (pred, lab) =>
>>   val labSet = lab.toSet
>>
>>   if (labSet.nonEmpty) {
>> val n = math.min(pred.length, k)
>> var i = 0
>> var cnt = 0
>> while (i < n) {
>>   if (labSet.contains(pred(i))) {
>> cnt += 1
>>   }
>>   i += 1
>> }
>> cnt.toDouble / k
>>   } else {
>> logWarning("Empty ground truth set, check input data")
>> 0.0
>>   }
>> }.mean()
>>   }
>>
>>
>> Am I the only one who thinks this doesn't do what it claims?
>> Just for reference:
>>
>>   * https://web.archive.org/web/20120415101144/http://sas.uwaterloo.ca/stats_navigation/techreports/04WorkingPapers/2004-09.pdf
>>   * https://github.com/benhamner/Metrics/blob/master/Python/ml_metrics/average_precision.py
>>
>> -- 
>> Best,
>> Maciej
>>
>
> -- 
> Maciej Szymkiewicz
>

-- 
Maciej Szymkiewicz