[code jam] 2021 Qualification "Cheating Detection": short solution fully solving test data

Malcolm Sharpe Tue, 06 Apr 2021 13:23:12 -0700

This solution is not mentioned in the analysis. It has zero errors on the 
official test data, and locally I estimate 0.4% error rate.


Link to my 
code: 
https://codingcompetitions.withgoogle.com/codejam/submissions/000000000043580a/Ymx1ZWJsaW1w

The explanation:

We wish to accuse the player i that maximizes P(player i cheated | data). 
Because P(player i cheated) and P(data) are constants w.r.t. i, this is 
proportional to P(data | player i cheated), so it is equivalent to maximize 
that.

It is too hard to calculate that exactly, so we make a crude approximation 
using Elos computed similarly to what's described in the analysis. 
Crucially, we use a different Elo if the player cheated:

question j elo = logit(# players solving j / # players)
player i legit elo = logit(# questions solved by i / # questions)
player i cheat elo = logit( (# questions solved by i - (# questions / 2)) / 
(# questions / 2) )

Since the formula for "player i cheat elo" is only defined when player i 
solves more than half of questions, otherwise we'll assume that player i 
didn't cheat.

Using these approximations, it's straightforward to find the player that 
maximizes log P(data | player i cheated), but there's a little trick here 
that makes it more convenient. Observe that

log P(data | player i cheated)
= log P(player i data | cheat) + Sum_{i' != i} log P(player i' data | legit)
= log P(player i data | cheat) - log P(player i data | legit) + Sum_{i'} 
log P(player i' data | legit)

The last factor is a constant w.r.t. i, so we don't need it to make 
comparisons. Thus we just find the player i that maximizes

log P(player i data | cheat) - log P(player i data | legit)
= Sum_j
{ log(0.5 + 0.5 expit(z_cheat)) - log(expit(z_legit))     if i solved j
{ log(0.5 * expit(-z_cheat)) - log(expit(-z_legit))       if i failed j
where
            z_legit = player i legit elo - question j elo
            z_cheat = player i cheat elo - question j elo

That's all. It took me a few hours of failing other ideas before I stumbled 
on this one, but it makes sense and it's easy to implement. I was very 
surprised at the low error rate given the crudeness of the Elo 
approximation.

-- 
-- You received this message because you are subscribed to the Google Groups 
Code Jam group. To post to this group, send email to 
google-code@googlegroups.com. To unsubscribe from this group, send email to 
google-code+unsubscr...@googlegroups.com. For more options, visit this group at 
https://groups.google.com/d/forum/google-code?hl=en
--- 
You received this message because you are subscribed to the Google Groups 
"Google Code Jam" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to google-code+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/google-code/2b2eebe5-b79d-4fb2-87dc-864d2227ce32n%40googlegroups.com.

[code jam] 2021 Qualification "Cheating Detection": short solution fully solving test data

Reply via email to