Isaac added a comment.
Weekly update:
- I cleaned up the results notebook <https://public.paws.wmcloud.org/User:Isaac_(WMF)/Annotation%20Gap/eval_wikidata_quality_model.ipynb#Results>. The original ORES model does better on the labeled data than my initial model. This isn't a big surprise: it was trained directly on that data and uses many more features. A few takeaways:
  - Comparing the feature lists, I think one salient thing to take from the ORES model is boosting the importance of having an image when that's a common property for similar items.
  - The real benefit of this new model, as I see it, will be its simplicity and flexibility. If we had updated test data, I think the new model would perform much better comparatively, because it shouldn't go stale the way the ORES model does: instead of hard-coding lots of rules, I'm letting the model adapt and learn from the current state of Wikidata.
  - The ordinal logistic regression approach that I used might also not be working well. Even though it's a good theoretical match for the data, I never really planned to keep it, because I think a simpler classification or linear regression model with cut-offs would be just as reasonable. I also trained it on only about 200 items so I'd have plenty of test data, so there's certainly plenty of room to scale that up.
  - My model includes no features for the actual number of statements. They're implicitly included in the completeness proportions (e.g., what proportion of expected claims exist), but I suspect that humans labeling items pay much more attention to the sheer quantity of statements, regardless of what's actually expected for an item of a given type. I'm not sure whether this is a drawback, but I like that it theoretically allows an item to be high quality even if it has only a few statements.
- The other big next step will be considering how to scale up the model so it could potentially run on LiftWing, if that's desired. It has a few semi-large data dependencies, which might pose a challenge.
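To make the completeness-proportion idea concrete, here is a minimal Python sketch. It is not the model's actual code and rests on several assumptions: an item is reduced to the property IDs it has claims for, "similar items" means items sharing a class, and each expected property is weighted by how common it is among those similar items (one hypothetical way to get the image boost described above; P18 is Wikidata's image property). All function names and the toy data are illustrative only.

```python
from collections import Counter


def expected_property_weights(similar_items, min_share=0.0):
    """Weight each property by the share of similar items that have it.

    similar_items: list of sets of property IDs (hypothetical input format),
    e.g. items sharing the same "instance of" (P31) class.
    Properties held by fewer than `min_share` of the items are dropped.
    """
    counts = Counter(p for props in similar_items for p in set(props))
    n = len(similar_items)
    return {p: c / n for p, c in counts.items() if c / n >= min_share}


def completeness(item_props, weights):
    """Weighted proportion of expected properties present on the item."""
    total = sum(weights.values())
    if total == 0:
        return 0.0
    present = sum(w for p, w in weights.items() if p in item_props)
    return present / total


def item_features(item_claims, weights):
    """item_claims: hypothetical mapping of property ID -> list of statement values."""
    return {
        "completeness": completeness(set(item_claims), weights),
        # Raw statement count, independent of what is expected for the item type.
        "n_statements": sum(len(values) for values in item_claims.values()),
    }


# Toy data, not from the notebook.
similar = [
    {"P31", "P18", "P569"},
    {"P31", "P18"},
    {"P31", "P569"},
    {"P31", "P18", "P569", "P106"},
]
weights = expected_property_weights(similar)
print(item_features({"P31": ["Q5"], "P569": ["1900", "1901"]}, weights))
```

Here P18 appears on three of the four similar items and P106 on only one, so an item missing the image loses 0.75/2.75 of its completeness while one missing P106 loses only 0.25/2.75. The raw `n_statements` count is kept separate, so it could be tried as an extra feature without changing the completeness proportion.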
TASK DETAIL https://phabricator.wikimedia.org/T321224