Isaac added a comment.
From discussion with Lydia/Diego:
- The concept of `completeness` feels closer to what we want than `quality`, i.e. allowing for more nuance in how many statements are associated with a given item. We came up with a few ideas for how to make assessing item completeness easier (because otherwise it would require very extensive knowledge of a domain area to know how many statements should be associated with an item). I suggested providing both the completeness score and the quality score and asking the evaluator which was more appropriate, but I like Lydia's idea better: provide just the completeness score and ask the evaluator whether they felt the actual score was lower, the same, or higher.
- Putting together a dataset like this would be fairly straightforward. The main challenge is having a nice stratified dataset, and one that provides information on top of the original quality-oriented dataset. For example, for highly extensive items, both models tend to agree that the item is A-class, so collecting a lot more annotations won't tell us much. It's only for the shorter items that we begin to see discrepancies, so that's where we should probably focus our efforts. Plus, because the model is very specific to the instance-of/occupation properties, we should make sure to have a diversity of items by those properties. This is my main TODO.
- I read through the paper <https://link.springer.com/chapter/10.1007/978-3-030-49461-2_11> describing the new proposed Wikidata Property Suggester approach. My understanding of the existing item-completeness/recommender systems:
  - Existing Wikidata Property Suggester: makes recommendations for properties to add based on statistics on the co-occurrence of properties. It ignores the values of these properties, except for instance-of/subclass-of, where the statistics are based on the value. Recommendations are ranked by probability of co-occurrence.
  - Recoin: similar to the above, but it only uses the instance-of property for determining missing properties, and it adds a refinement by occupation if the item is a human.
  - Proposed Wikidata Property Suggester: a more advanced system for finding likely co-occurring properties based on more fine-grained association rules, i.e. it doesn't just merge all the individual "if Property A -> Property B k% of the time" rules but instead uses rules like "if Property A and Property B and ... -> Property N k% of the time". It also takes into account instance-of/subclass-of property values like the existing suggester. This seems like a pretty reasonable enhancement, and their approach is quite lightweight (~1.5GB RAM for holding the data structure).
- I am following the Recoin approach in my model. If the new Property Suggester proves successful and provides the data needed to incorporate into the model (a list of likely missing properties + confidence scores), it would be very reasonable to use it in place of the Recoin model at a later point. That could also solve some of the problems that @diego was considering addressing via Wikidata embeddings (more nuanced recommendations of missing properties).

TASK DETAIL
  https://phabricator.wikimedia.org/T321224
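
To make the difference between the two suggester styles in the comment concrete, here is a minimal sketch on made-up data. It is not the code of either Property Suggester nor of Recoin: the toy `ITEMS` list, the function names, and the choice to score a missing property by its best pairwise probability are all assumptions made for illustration. `cooccurrence_probs` and `suggest` mimic the pairwise "if Property A -> Property B k% of the time" statistics, while `rule_confidence` conditions on several properties at once, as in the finer-grained association rules.

```python
from collections import Counter
from itertools import permutations

# Toy items, each reduced to the set of property IDs it uses (made-up data).
# P31 = instance of, P106 = occupation, P569 = date of birth, P19 = place of birth.
ITEMS = [
    {"P31", "P106", "P569", "P19"},
    {"P31", "P106", "P569"},
    {"P31", "P569", "P19"},
    {"P31", "P106"},
]


def cooccurrence_probs(items):
    """Estimate P(B present | A present) for every ordered property pair."""
    single = Counter()
    pair = Counter()
    for props in items:
        single.update(props)
        pair.update(permutations(sorted(props), 2))
    return {(a, b): n / single[a] for (a, b), n in pair.items()}


def suggest(item_props, probs):
    """Rank missing properties by their best pairwise co-occurrence score.

    Taking the max over the item's existing properties is an assumption of
    this sketch, not necessarily how the real suggester aggregates.
    """
    scores = {}
    for (a, b), p in probs.items():
        if a in item_props and b not in item_props:
            scores[b] = max(scores.get(b, 0.0), p)
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


def rule_confidence(items, antecedent, consequent):
    """Confidence of the rule 'all of antecedent -> consequent'."""
    matching = [props for props in items if antecedent <= props]
    if not matching:
        return 0.0
    return sum(consequent in props for props in matching) / len(matching)


PROBS = cooccurrence_probs(ITEMS)
SUGGESTIONS = suggest({"P31", "P106"}, PROBS)
JOINT = rule_confidence(ITEMS, {"P31", "P106"}, "P569")
```

On this toy data the pairwise view ranks P569 first with a score of 0.75 (driven by P31 alone), while the joint rule on P31 and P106 gives P569 a confidence of 2/3, which illustrates how conditioning on several properties at once can change the estimate.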