Isaac added a comment.

  Updated the API to be slightly more robust to instance-of-only edge cases and 
to return the individual features. Output for 
https://wikidata-quality.wmcloud.org/api/item-scores?qid=Q67559155:
  
    {
      "item": "https://www.wikidata.org/wiki/Q67559155",
      "features": {
        "ref-completeness": 0.9055531797461024,
        "claim-completeness": 0.903502532415779,
        "label-desc-completeness": 1.0,
        "num-claims": 11
      },
      "predicted-completeness": "A",
      "predicted-quality": "C"
    }
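
A minimal sketch of how a client might call this endpoint and read the response. The URL, the `qid` parameter, and the field names come from the output above; the helper names (`build_url`, `fetch_item_scores`, `summarize`) and the timeout are hypothetical and only for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint as given in the comment above.
API_URL = "https://wikidata-quality.wmcloud.org/api/item-scores"


def build_url(qid):
    """Return the item-scores URL for a QID such as 'Q67559155'."""
    return f"{API_URL}?{urlencode({'qid': qid})}"


def fetch_item_scores(qid, timeout=10):
    """Fetch and decode the JSON response (requires network access)."""
    with urlopen(build_url(qid), timeout=timeout) as response:
        return json.load(response)


def summarize(payload):
    """Format the two predicted classes and the features on one line."""
    features = payload["features"]
    parts = [f"{name}={value}" for name, value in sorted(features.items())]
    return (
        f"completeness={payload['predicted-completeness']} "
        f"quality={payload['predicted-quality']} " + " ".join(parts)
    )
```

Calling `summarize(fetch_item_scores("Q67559155"))` would print the classes and features for the item shown above, assuming the service is reachable and still returns this shape.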
  
  Details:
  
  - `ref-completeness`: what proportion of the expected references does the item 
have? References that are internal to Wikimedia get only half credit, while 
external links / identifiers get full credit. Expectations are based on the 
proportion of claims for a given property that typically have references on 
Wikidata. Missing statements are also taken into account.
  - `claim-completeness`: what proportion of the expected claims does the item 
have? Data is taken from Recoin <https://www.wikidata.org/wiki/Wikidata:Recoin>, 
where less common properties for a given instance-of are weighted less.
  - `label-desc-completeness`: what proportion of expected labels/descriptions 
are present. Right now the expected labels/descriptions are English plus any 
language for which the item has a sitelink.
  - `num-claims`: the total number of properties the item has, so the name is a 
misnomer that I'll fix at some point (I don't give more credit for, e.g., 
having 3 authors instead of 1 author for a scientific paper).
  - `predicted-completeness`: E (worst) to A (best) (see guidelines 
<https://www.wikidata.org/wiki/Wikidata:Item_quality>), using just the 
proportional `*-completeness` features.
  - `predicted-quality`: same classes, but also includes the more generic 
`num-claims` feature.
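
To make two of the feature definitions above concrete, here is a rough sketch, not the actual implementation. It assumes that labels and descriptions are weighted equally, that sitelinks have already been mapped to language codes, and that claims arrive as a mapping from property ID to a list of statements (as in Wikidata entity JSON). The function names are hypothetical.

```python
def label_desc_completeness(labels, descriptions, sitelink_langs):
    """Share of expected labels and descriptions that are present.

    Expected languages are English plus every sitelink language.
    Assumption: each label and each description counts equally.
    """
    expected = {"en"} | set(sitelink_langs)
    present = sum(
        (lang in labels) + (lang in descriptions) for lang in expected
    )
    return present / (2 * len(expected))


def count_distinct_properties(claims):
    """Count properties, not statements: 3 authors still count once."""
    return len(claims)
```

For example, an item with sitelinks in `de` and `fr`, labels in `en` and `de`, and a description only in `en` has 3 of 6 expected entries, so it would score 0.5 under these assumptions.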
  
  Regarding T332021 <https://phabricator.wikimedia.org/T332021>, I'll have to 
think about how to count that for the label-desc score. Probably no change for 
descriptions but for labels, perhaps accept it in place of English but still 
expect language-specific labels for any languages that have a sitelink? Either 
way, labels/descriptions are not a major feature, so it won't greatly affect the 
model.

TASK DETAIL
  https://phabricator.wikimedia.org/T321224
