On Mon, Oct 31, 2016 at 1:13 PM, Justin Ormont <[email protected]> wrote:
> Did you add any honey-pot answers? Answers where you know the results
> quite well (via many judges agreeing), or are very obvious (q=Obama,
> results=[en:Presidency of Barack Obama, en:A4 Paper]).
>
> I've set these up as a pre-test before starting the judgment session to
> check that the judge understands the instructions, and randomly included to
> weed out judges that randomly select answers.
>

We don't have any honey-pot answers yet, we had thought about it but had
hoped that since there was no real benefit to users of doing a bad job (no
payments, no leaderboard to get on) it wouldn't be necessary. We may have
to re-evaluate that though, it seems a common way to deal with
crowd-sourced data.

> Investigating the labels (individual query-result pair) with the most
> disagreement may be useful, along with the judges with the most
> disagreement.
>

Good idea, will be looking into it soon.

>
> --justin
>
> On Mon, Oct 31, 2016 at 7:43 AM, Trey Jones <[email protected]> wrote:
>
>> Interesting stats, Erik. Thanks for sharing these.
>>
>> More clarity in the documentation is always good.
>>
>> For some of the negative alpha agreement values, a couple of possible
>> sources come to mind. There could be bad faith actors, who either didn't
>> really try very hard, or purposely put in incorrect or random values. There
>> could also be genuine disagreement between the scorers about the relevance
>> of the results -- David and I discussed one that we both scored, and we
>> disagreed like that. I can see where he was coming from, but it wasn't how
>> I thought of it. In both of these cases, additional scores would help.
>>
>> One thing I noticed that has been inconsistent in my own scoring is that
>> early on when I got a mediocre query (i.e., I wouldn't expect any
>> *really* good results), I tended to grade on a curve. I'd give
>> "Relevant" to the best result even if it wasn't actually a great result.
>> After grading a couple of queries for which there were clearly *no* good
>> results (i.e., *everything* was irrelevant), I think I stopped grading
>> on a curve.
>>
>> My point there is that's one place we could improve the documentation:
>> explicitly state that not every query has good results. It's okay to not
>> have any result rated as "relevant" -- or this could already be in the docs,
>> and the problem is that no one reads them. :(
>>
>> Another thing that Erik has suggested was trying to filter out wildly
>> non-encyclopedic queries (like "SANTA CLAUS PRINT OUT 3D PAPERTOYS"), and
>> maybe really vague queries (like "antonio parent"), but that's potentially
>> more work than filtering PII, and much more subjective.
>>
>> It might also be informative to review some of the scores for the
>> negative alphas and see if something obvious is going on, in which case
>> we'd know the alpha calculation is doing its job.
>>
>>
>> On Thu, Oct 27, 2016 at 7:21 PM, Erik Bernhardson <
>> [email protected]> wrote:
>>
>>> To follow up a little here, i implemented Krippendorff's Alpha and ran
>>> it against all the data we currently have in discernatron, the distribution
>>> looks something like:
>>>
>>> constraint                count
>>> alpha >= 0.80             11
>>> 0.667 <= alpha < 0.80     18
>>> 0.500 <= alpha < 0.667    20
>>> 0.333 <= alpha < 0.500    26
>>> 0 <= alpha < 0.333        43
>>> alpha < 0                 31
>>>
>>> This is a much lower level of agreement than i was expecting. The
>>> literature suggests 0.80 as a reliable cutoff, and 0.667 as a cutoff from
>>> which you can draw tentative conclusions. Below 0 indicates there is less
>>> agreement than random chance, and we need to re-evaluate the instructions
>>> to be more clear (probably true).
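
For anyone following the thread who wants to see how an alpha value like
the ones in the table comes about, here is a minimal sketch of
Krippendorff's Alpha for nominal labels. It is not the implementation that
was run against discernatron; the thread does not say which distance
metric that one uses, and relevance grades may well call for an ordinal
metric instead. The function name krippendorff_alpha_nominal and the
"units" input layout are made up for illustration.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha with the nominal metric (labels match or they don't).

    `units` is a hypothetical layout: one list per judged item, holding the
    labels given by whichever judges scored it (missing judgments are omitted).
    """
    # Coincidence matrix: every ordered pair of labels inside a unit
    # contributes 1 / (m - 1), where m is the number of labels in that unit.
    coincidences = Counter()
    for labels in units:
        m = len(labels)
        if m < 2:
            continue  # a single judgment says nothing about agreement
        for a, b in permutations(labels, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)

    # Marginal total per label, and the total number of pairable judgments.
    totals = Counter()
    for (a, _b), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())
    if n <= 1:
        raise ValueError("need at least two pairable judgments")

    # Observed disagreement: coincidence mass off the diagonal.
    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    # Disagreement expected by chance, from the label marginals.
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n - 1)
    if expected == 0:
        raise ValueError("only one distinct label was used; alpha is undefined")

    return 1.0 - observed / expected
```

Two judges who always pick the same label give alpha = 1, and two judges
who systematically pick different labels give a negative value, which
matches the "less agreement than random chance" reading of the bottom row
of the table.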
_______________________________________________
discovery mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/discovery
