[Wikidata-bugs] [Maniphest] T321224: Wikidata Item Quality Model

2023-08-03 Thread Isaac
Isaac added a comment.


  I'm going to be out for the next several weeks, so FYI you likely won't hear 
updates on this until mid-September. Thanks for these additional details though!
  
  > Now there are several Properties that can represent such relations. The 
  > main ones we should probably focus on are instance of, subclass of and 
  > part of, as explained on 
  > https://www.wikidata.org/wiki/Help:Basic_membership_properties.
  
  Everything is currently based on instance-of values, but it looks like I need 
to also allow `subclass of` and `part of`. The tricky thing there is that I 
assume most specific `subclass of` and `part of` values are pretty rare -- 
e.g., there are only so many items with `subclass of` `physicist` -- so it's 
hard to learn the expectations for `subclass of physicist`. My best bet is 
probably a more generic set of expected properties for any item that has a 
`subclass of` property, regardless of its value (and the same for `part of`). 
I'll have to see how consistent those properties are, though, because if 
they're highly specific to the value of `subclass of`, the model won't be able 
to do anything useful with them. I'm hopeful that this small change will fix 
these various outliers and get us to a place where we can reasonably verify 
that the model is doing what we expect. A minimal sketch of the fallback idea 
is below.
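
A minimal sketch of that fallback, assuming item claims have already been 
parsed into a `{property_id: [values]}` mapping per item (the data layout and 
helper names here are hypothetical, not the actual pipeline):

from collections import Counter, defaultdict

def build_expected_properties(items):
    """Build expected-property weights keyed by P31 (instance of) value, plus one
    generic expectation set for any item with P279 (subclass of) or P361 (part of),
    regardless of the value of those properties."""
    by_instance_of = defaultdict(Counter)   # P31 value -> property frequencies
    generic_subclass = Counter()            # any item with P279
    generic_part_of = Counter()             # any item with P361
    n_instance_of = defaultdict(int)
    n_subclass = 0
    n_part_of = 0

    for _qid, claims in items:
        props = set(claims)
        for value in claims.get('P31', []):
            by_instance_of[value].update(props)
            n_instance_of[value] += 1
        if 'P279' in claims:
            generic_subclass.update(props)
            n_subclass += 1
        if 'P361' in claims:
            generic_part_of.update(props)
            n_part_of += 1

    # Convert counts to proportions so they can be used as expectation weights.
    def normalize(counter, total):
        return {p: c / total for p, c in counter.items()} if total else {}

    return ({v: normalize(c, n_instance_of[v]) for v, c in by_instance_of.items()},
            normalize(generic_subclass, n_subclass),
            normalize(generic_part_of, n_part_of))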

TASK DETAIL
  https://phabricator.wikimedia.org/T321224



[Wikidata-bugs] [Maniphest] T321224: Wikidata Item Quality Model

2023-07-28 Thread Lydia_Pintscher
Lydia_Pintscher added a comment.


  Thanks for this!
  In general it is pretty important for Items to be classified and put into the 
right place in the larger ontology, so these statements do imho deserve some 
sort of special status: they are generally more important than other statements.
  Now there are several Properties that can represent such relations. The main 
ones we should probably focus on are instance of, subclass of and part of, as 
explained on https://www.wikidata.org/wiki/Help:Basic_membership_properties.



[Wikidata-bugs] [Maniphest] T321224: Wikidata Item Quality Model

2023-07-25 Thread Isaac
Isaac added a comment.


  > That's quite an interesting table! Would it be possible to get the actual 
  > Item IDs for the last two rows? It could be instructive to know which Items 
  > the model thinks are very incomplete but have excellent quality :)
  
  @Michael thanks for the questions! Some context: I think the completeness 
model is better suited for evaluating items (it's much more nuanced than the 
quality model, which largely just takes into consideration the number of 
statements an item has). Hopefully this analysis will do two things: 1) help us 
find some places where the completeness model doesn't do great so we can tweak 
it, and 2) build a sample of items to give to Wikidata experts to check that 
the completeness model is in fact capturing their expectations better than the 
quality model.
  
  Looking into the extreme ends of the data, most of the 241 items that were 
low completeness and high quality have many statements but lack an instance-of 
property, which I use to categorize an item and determine which properties it 
should have. I assume that it is okay for items to lack an instance-of if 
they're subclasses of another item? Perhaps I can make a special case for items 
that lack an instance-of but do have a subclass-of property? Though in checking 
a few examples, I don't know if there is a particularly consistent set of 
expectations around which properties should exist for these items. Example: 
Q22698 (park). The other set are items like Q7473516 (Tokyo), which have a 
bunch of statements but are lacking references and also have a bunch of 
instance-ofs, so they're missing some expected statements too.
  
Data -- instead of the labels, I'm outputting the raw scores which range 
from 0 (very bad) to 1 (very good) for both the individual features and the 
overall completeness/quality scores. Number of statements is what it sounds 
like.

+---+---+---+---+---+---+---+
|item|claims_score|refs_score|labels_score|num_statements|completeness_score|quality_score|
+---+---+---+---+---+---+---+
|https://www.wikidata.org/wiki/Q907112|0.37053642|0.0948047|0.65625|102.0|0.334656685590744|1.0|
|https://www.wikidata.org/wiki/Q34754|0.36810225|0.15446919|0.60655737|80.0|0.3423793315887451|0.9698982238769531|
|https://www.wikidata.org/wiki/Q23427|0.36427906|0.20430791|0.597|75.0|0.35162511467933655|0.9493570327758789|
|https://www.wikidata.org/wiki/Q170174|0.3615961|0.16795586|0.653|62.0|0.34746548533439636|0.8789964914321899|
|https://www.wikidata.org/wiki/Q43287|0.3493883|0.16088052|0.639|70.0|0.33629995584487915|0.9208924174308777|
|https://www.wikidata.org/wiki/Q7473516|0.34005976|0.13500817|0.734375|59.0|0.33547067642211914|0.8670973181724548|
|https://www.wikidata.org/wiki/Q28179|0.33763435|0.12792718|0.6805556|58.0|0.325609028339386|0.8516228199005127|
|https://www.wikidata.org/wiki/Q12280|0.33676738|0.21568704|0.6069182|85.0|0.3385910391807556|1.0|
|https://www.wikidata.org/wiki/Q40362|0.33630428|0.13121563|0.60952383|70.0|0.31699275970458984|0.9111775159835815|
|https://www.wikidata.org/wiki/Q81931|0.33004636|0.26839557|0.6395349|85.0|0.3518647849559784|1.0|
|https://www.wikidata.org/wiki/Q7318|0.32431|0.124200575|0.6560284|61.0|0.31338077783584595|0.8646677136421204|
|https://www.wikidata.org/wiki/Q133356|0.32018384|0.105077215|0.6369048|62.0|0.30359283089637756|0.8645642399787903|
|https://www.wikidata.org/wiki/Q39473|0.30660433|0.13555892|0.6276|66.0|0.30183497071266174|0.890287458896637|
|https://www.wikidata.org/wiki/Q4948|0.30622554|0.16706112|0.59638554|64.0|0.3058503270149231|0.878718376159668|
|https://www.wikidata.org/wiki/Q35666|0.30606884|0.30240446|0.56299216|76.0|0.33634650707244873|0.9609817862510681|
|https://www.wikidata.org/wiki/Q5684|0.2750341|0.24169385|0.6298077|70.0|0.3096027374267578|0.9274186491966248|
|https://www.wikidata.org/wiki/Q170468|0.27139774|0.14692228|0.6081081|67.0|0.2804390490055084|0.8926790356636047|
|https://www.wikidata.org/wiki/Q180573|0.26850355|0.12669751|0.6474359|58.0|0.2782377600669861|0.8423773050308228|
|https://www.wikidata.org/wiki/Q9158|0.26563877|0.27638257|

[Wikidata-bugs] [Maniphest] T321224: Wikidata Item Quality Model

2023-07-24 Thread Michael
Michael added a comment.


  In T321224#9035684, @Isaac wrote:
  
  > Oooh and the job worked! High-level data on the overlap between the two 
  > scores. They use the same features, except that completeness only takes 
  > into account how many of the expected claims/refs/labels are present, 
  > while quality also adds the total number of claims as a feature:
  >
  >   +--+-+-+
  >   |completeness_label|quality_label|num_items|
  >   +--+-+-+
  >   |D |D|29955491 |
  >   |A |C|28315614 |
  >   |A |B|14986978 |
  >   |D |C|11287166 |
  >   |E |D|6428229  |
  >   |E |E|4929743  |
  >   |A |D|3697974  |
  >   |D |E|1760575  |
  >   |D |B|1361759  |
  >   |D |A|207834   |
  >   |E |C|55665|
  >   |A |A|45423|
  >   |E |B|2087 |
  >   |E |A|241  |
  >   |A |E|6|
  >   +--+-+-+
  
  That's quite an interesting table! Would it be possible to get the actual 
Item IDs for the last two rows? It could be instructive to know which Items the 
model thinks are very incomplete but have excellent quality :)



[Wikidata-bugs] [Maniphest] T321224: Wikidata Item Quality Model

2023-07-21 Thread Isaac
Isaac added a comment.


  Oooh and the job worked! High-level data on the overlap between the two 
scores. They use the same features, except that completeness only takes into 
account how many of the expected claims/refs/labels are present, while quality 
also adds the total number of claims as a feature:
  
+--+-+-+
|completeness_label|quality_label|num_items|
+--+-+-+
|D |D|29955491 |
|A |C|28315614 |
|A |B|14986978 |
|D |C|11287166 |
|E |D|6428229  |
|E |E|4929743  |
|A |D|3697974  |
|D |E|1760575  |
|D |B|1361759  |
|D |A|207834   |
|E |C|55665|
|A |A|45423|
|E |B|2087 |
|E |A|241  |
|A |E|6|
+--+-+-+
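
For reference, a cross-tab like this can be produced with a simple groupBy over 
the bulk scores; a minimal PySpark sketch, assuming the scores were written out 
with `completeness_label` and `quality_label` columns (the path and column 
names here are hypothetical):

from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical output of the bulk-scoring job: one row per item with both labels.
scores = spark.read.parquet('wikidata_item_scores.parquet')

overlap = (scores
           .groupBy('completeness_label', 'quality_label')
           .agg(F.count('*').alias('num_items'))
           .orderBy(F.desc('num_items')))
overlap.show(truncate=False)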



[Wikidata-bugs] [Maniphest] T321224: Wikidata Item Quality Model

2023-07-21 Thread Isaac
Isaac added a comment.


  Updates:
  
  - Finally ported all the code from the API to work on the cluster. I don't 
know if it'll run to completion yet, but I ran it on a subset and the results 
largely matched the API: 
https://gitlab.wikimedia.org/isaacj/miscellaneous-wikimedia/-/blob/master/annotation-gap/wikidata-completeness.ipynb
    - Notably, I got rid of the statsmodels ordinal logistic regression 
dependency, which was painful, and now just take the parameters/thresholds from 
the model and do the math myself (a minimal sketch of that idea follows this 
list).
  - Next step will be running this fully, or on a sample of data, and then 
choosing a sample of items to provide to raters so we can compare the scores 
and decide whether the quality or the completeness model best captures the 
concept of "this Wikidata item is in good shape".



[Wikidata-bugs] [Maniphest] T321224: Wikidata Item Quality Model

2023-06-30 Thread Isaac
Isaac added a comment.


  Updates:
  
  - Wrestling with re-adapting everything to the cluster, but making good 
progress. One of the main challenges is that the Wikidata item schema differs 
between the cluster and the API, so there are lots of little errors that I'm 
having to discover and correct as I adapt the source data.



[Wikidata-bugs] [Maniphest] T321224: Wikidata Item Quality Model

2023-06-23 Thread Isaac
Isaac added a comment.


  Updates:
  
  - Successfully generated the property data I need so now I have the necessary 
data to run the model in bulk on the cluster and can turn towards generating a 
dataset for sampling. Notebook: 
https://gitlab.wikimedia.org/isaacj/miscellaneous-wikimedia/-/blob/master/annotation-gap/generate_wikidata_propertyfreq_data.ipynb



[Wikidata-bugs] [Maniphest] T321224: Wikidata Item Quality Model

2023-06-16 Thread Isaac
Isaac added a comment.


  Updates:
  
  - Began the process of regenerating the property-frequency table on the 
cluster, given that we shouldn't depend on Recoin for bulk computation even if 
it greatly simplifies the API prototype. Working out a few bugs, but I feel 
like I have the right approach and it's relatively simple.



[Wikidata-bugs] [Maniphest] T321224: Wikidata Item Quality Model

2023-05-12 Thread Isaac
Isaac added a comment.


  Still no updates given prep for Wiki Workshop and the hackathon, but I'm 
hoping to get back to this after next week!



[Wikidata-bugs] [Maniphest] T321224: Wikidata Item Quality Model

2023-04-11 Thread Isaac
Isaac added a comment.


  From discussion with Lydia/Diego:
  
  - The concept of `completeness` feels closer to what we want than `quality` 
-- i.e. it allows for more nuance in how many statements are associated with a 
given item. We came up with a few ideas for how to make assessing item 
completeness easier (because otherwise it would require very extensive 
knowledge of a domain area to know how many statements should be associated 
with an item). I suggested providing both the completeness score and the 
quality score and asking the evaluator which was more appropriate, but I like 
Lydia's idea better, which was to just provide the completeness score and ask 
the evaluator whether they felt the actual score should be lower, the same, or 
higher.
  - Putting together a dataset like this would be fairly straightforward -- the 
main challenge is having a nicely stratified dataset, one that provides 
information on top of the original quality-oriented dataset. For example, for 
highly-extensive items, both models tend to agree that the item is A-class, so 
collecting a lot more annotations won't tell us much. It's only for the shorter 
items that we begin to see discrepancies, so that's where we should probably 
focus our efforts. Plus, because the model is very specific to the 
instance-of/occupation properties, we should make sure to have a diversity of 
items across those properties (see the sampling sketch after this list). This 
is my main TODO.
  - I read through the paper describing the new proposed Wikidata Property 
Suggester approach. My understanding of the existing item-completeness / 
recommender systems:
    - Existing Wikidata Property Suggester: makes recommendations for 
properties to add based on statistics on the co-occurrence of properties. 
Ignores the values of these properties except for instance-of/subclass-of, 
where the statistics are based on the value. Recommendations are ranked by 
probability of co-occurrence.
    - Recoin: similar to the above, but it only uses the instance-of property 
for determining missing properties, with a refinement based on which occupation 
the item has if it's a human.
    - Proposed Wikidata Property Suggester: a more advanced system for finding 
likely co-occurring properties based on more fine-grained association rules -- 
i.e. it doesn't just merge all the individual "if Property A -> Property B k% 
of the time" rules but instead does things like "if Property A and Property B 
and ... -> Property N k% of the time". It also takes into account 
instance-of/subclass-of property values like the existing suggester. This seems 
like a pretty reasonable enhancement, and their approach is quite lightweight 
(~1.5GB RAM for holding the data structure).
  - I am following the Recoin approach in my model, though if the new Property 
Suggester proves successful and provides the data needed to incorporate into 
the model (a list of likely missing properties + confidence scores), it would 
be very reasonable to use it in place of the Recoin data at a later point; that 
would also solve some of the problems that @diego was considering addressing 
via Wikidata embeddings (more nuanced recommendations of missing properties).
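
A minimal sketch of the stratified sampling mentioned as the main TODO above, 
assuming the bulk scores are in a pandas DataFrame with `completeness_label`, 
`quality_label`, and `instance_of` columns (column names and sample sizes are 
hypothetical):

import pandas as pd

def stratified_sample(scores: pd.DataFrame, per_stratum: int = 20,
                      seed: int = 0) -> pd.DataFrame:
    """Sample items evenly across (completeness, quality) label strata,
    capping how many items any one instance-of value contributes within a
    stratum so no single item type dominates the annotation set."""
    samples = []
    for _, group in scores.groupby(['completeness_label', 'quality_label']):
        # At most a few items per instance-of value within this stratum.
        capped = (group.groupby('instance_of', group_keys=False)
                       .apply(lambda g: g.sample(min(len(g), 3), random_state=seed)))
        samples.append(capped.sample(min(len(capped), per_stratum), random_state=seed))
    return pd.concat(samples, ignore_index=True)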



[Wikidata-bugs] [Maniphest] T321224: Wikidata Item Quality Model

2023-04-03 Thread leila
leila moved this task from FY2022-23-Research-January-March to In Progress on 
the Research board.
leila edited projects, added Research; removed Research 
(FY2022-23-Research-January-March).



[Wikidata-bugs] [Maniphest] T321224: Wikidata Item Quality Model

2023-04-03 Thread leila
leila added a parent task: T333892: Develop a new generation of ML models for 
Wikidata.



[Wikidata-bugs] [Maniphest] T321224: Wikidata Item Quality Model

2023-04-03 Thread leila
leila removed a parent task: T293478: Content Tagging Models.



[Wikidata-bugs] [Maniphest] T321224: Wikidata Item Quality Model

2023-03-24 Thread Isaac
Isaac added a comment.


  Updated API to be slightly more robust to instance-of-only edge cases and 
provide the individual features. Output for 
https://wikidata-quality.wmcloud.org/api/item-scores?qid=Q67559155:
  
{
  "item": "https://www.wikidata.org/wiki/Q67559155;,
  "features": {
"ref-completeness": 0.9055531797461024,
"claim-completeness": 0.903502532415779,
"label-desc-completeness": 1.0,
"num-claims": 11
  },
  "predicted-completeness": "A",
  "predicted-quality": "C"
}
  
  Details:
  
  - `ref-completeness`: what proportion of expected references does the item 
have? References that are internal to Wikimedia are only given half-credit 
while external links / identifiers are given full credit. Based on what 
proportion of claims for a given property typically have references on 
Wikidata. Also takes into account missing statements.
  - `claim-completeness`: what proportion of the expected claims does the item 
have? Data taken from Recoin, where less common properties for a given 
instance-of are weighted less.
  - `label-desc-completeness`: what proportion of expected labels/descriptions 
are present. Right now the expected labels/descriptions are English plus any 
language for which the item has a sitelink.
  - `num-claims`: how many total properties the item actually has, so the name 
is a misnomer and something I'll fix at some point (I don't give more credit 
for, e.g., having 3 authors instead of 1 author for a scientific paper).
  - `predicted-completeness`: E (worst) to A (best) based on the item quality 
guidelines, using just the proportional `*-completeness` features.
  - `predicted-quality`: same classes but now also includes the more generic 
`num-claims` feature too.
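
For illustration, a minimal sketch of how the proportional `claim-completeness` 
and `ref-completeness` features can be computed from per-property expectations; 
the expectation tables, weights, and data layout below are hypothetical, not 
the deployed values:

# Hypothetical expectation data for one instance-of grouping: how often each
# property appears on such items (claim expectations) and how often claims with
# that property carry a reference on Wikidata (reference expectations).
EXPECTED_PROPS = {'P569': 0.9, 'P19': 0.6, 'P18': 0.4}
REF_RATE = {'P569': 0.8, 'P19': 0.7, 'P18': 0.2}

def claim_completeness(item_props, expected=EXPECTED_PROPS):
    """Weighted share of expected properties that the item actually has."""
    total = sum(expected.values())
    present = sum(w for p, w in expected.items() if p in item_props)
    return present / total if total else 1.0

def ref_completeness(item_refs, expected=EXPECTED_PROPS, ref_rate=REF_RATE):
    """Share of expected references that are present.

    Internal references get half credit; expected properties that are missing
    entirely count as missing references too. `item_refs` maps property ->
    'external', 'internal', or None for each claim the item has.
    """
    total = sum(expected[p] * ref_rate.get(p, 0.0) for p in expected)
    credit = 0.0
    for p, w in expected.items():
        ref = item_refs.get(p)
        if ref == 'external':
            credit += w * ref_rate.get(p, 0.0)
        elif ref == 'internal':
            credit += 0.5 * w * ref_rate.get(p, 0.0)
    return credit / total if total else 1.0

print(claim_completeness({'P569', 'P18'}))
print(ref_completeness({'P569': 'external', 'P18': 'internal'}))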
  
  Regarding T332021, I'll have to think about how to count that for the 
label-desc score. Probably no change for descriptions, but for labels, perhaps 
accept it in place of English while still expecting language-specific labels 
for any languages that have a sitelink? Either way, labels/descriptions are not 
a major feature, so it won't greatly affect the model.



[Wikidata-bugs] [Maniphest] T321224: Wikidata Item Quality Model

2023-03-17 Thread Isaac
Isaac added a comment.


  I still need to do some checks because I know, e.g., this fails when the item 
lacks statements, but I put together an API for testing the model. It has two 
outputs: a quality class (E worst to A best) that uses the number of claims on 
the item as a feature (along with labels/refs/claims completeness) and 
corresponds very closely to the ORES model outputs and the annotated data; and 
a completeness class (same set of labels) that does not include the number of 
claims as a feature and so is more a measure of how complete an item is (a la 
the Recoin approach).
  
  Example: https://wikidata-quality.wmcloud.org/api/item-scores?qid=Q67559155
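
A minimal usage sketch for querying the test API from Python (the endpoint and 
`qid` parameter come from the example URL above; the response keys follow the 
JSON output shown earlier in this thread):

import requests

# Query the prototype scoring API for one item and print both predicted classes.
resp = requests.get(
    'https://wikidata-quality.wmcloud.org/api/item-scores',
    params={'qid': 'Q67559155'},
    timeout=30,
)
resp.raise_for_status()
scores = resp.json()
print(scores.get('predicted-completeness'), scores.get('predicted-quality'))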



[Wikidata-bugs] [Maniphest] T321224: Wikidata Item Quality Model

2023-03-10 Thread Isaac
Isaac added a comment.


  Weekly updates:
  
  - Discussed with Diego the challenge of whether our annotated data is really 
assessing what we want it to. I'll try to join the next meeting with Lydia to 
hear more and figure out our options.
  - Diego is also considering how embeddings might help with better 
missing-property / out-of-date-property / quality predictions for Wikidata 
subgraphs where we have a lot more data and where the sorts of properties you 
might expect vary at finer-grained levels than just instance-of/occupation -- 
for example, cases where country of citizenship or age might further mediate 
which claims you'd expect. This could also be useful for fine-grained 
similarity, e.g., to identify similar Wikidata items to use as examples or to 
improve as well.



[Wikidata-bugs] [Maniphest] T321224: Wikidata Item Quality Model

2023-03-03 Thread Isaac
Isaac added a comment.


  I slightly tweaked the model, but I also experimented with adding just a 
simple square root of the number of existing claims as a feature and found that 
that is essentially all that is needed to almost match the ORES quality model 
(which is near perfect) at predicting item quality. That said, I think this is 
mainly an issue with the assessment data, as opposed to Wikidata quality really 
just being about the number of statements. For example, the dataset has many 
Wikidata items that are for disambiguation pages, and they're almost all rated 
E-class (lowest) because their only property is their instance-of. I'd argue, 
though, that that's perfectly acceptable for almost all disambiguation pages 
and that these items are nearly complete even with just that one property (you 
can see the frequency of other properties that occur for these pages, but 
they're pretty low: 
https://recoin.toolforge.org/getbyclassid.php?subject=Q4167410=200). So while 
the number of claims is a useful feature for matching human perception of 
quality, I think we'd actually want to leave it out to get closer to the 
concept of "to what degree is an item missing major information". Under that 
framing, most disambiguation pages would do just fine, but human items that 
have many more statements (and also much higher expectations) wouldn't do as 
well.
  
  Notebook: https://public.paws.wmcloud.org/User:Isaac_(WMF)/Annotation%20Gap/v2_eval_wikidata_quality_model.ipynb
  Quick summary:

38.7% correct (62.6% within 1 class) using features ['label_s'].
56.7% correct (77.0% within 1 class) using features ['claim_s'].
44.8% correct (72.7% within 1 class) using features ['ref_s'].
77.3% correct (98.1% within 1 class) using features ['sqrt_num_claims'].
55.0% correct (75.3% within 1 class) using features ['label_s', 'claim_s'].
50.2% correct (74.5% within 1 class) using features ['label_s', 'ref_s'].
76.5% correct (98.4% within 1 class) using features ['label_s', 'sqrt_num_claims'].
54.2% correct (76.6% within 1 class) using features ['label_s', 'claim_s', 'ref_s'].
75.1% correct (98.3% within 1 class) using features ['label_s', 'claim_s', 'sqrt_num_claims'].
79.4% correct (97.7% within 1 class) using features ['label_s', 'ref_s', 'sqrt_num_claims'].
55.0% correct (78.4% within 1 class) using features ['claim_s', 'ref_s'].
75.3% correct (98.0% within 1 class) using features ['claim_s', 'sqrt_num_claims'].
78.8% correct (98.3% within 1 class) using features ['claim_s', 'ref_s', 'sqrt_num_claims'].
79.4% correct (98.7% within 1 class) using features ['ref_s', 'sqrt_num_claims'].
78.3% correct (97.9% within 1 class) using features ['label_s', 'claim_s', 'ref_s', 'sqrt_num_claims'].

ORES is at (remembering it's trained on 2x more data, including what I'm 
evaluating it on here):

87.1% correct and 98.3% within 1 class
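
For context, numbers like those above come from sweeping feature subsets and 
scoring each; a minimal sketch of such a sweep, assuming a pandas DataFrame 
`df` with the four feature columns and an integer-coded `label` column for the 
annotated class (a plain multinomial logistic regression stands in here for the 
ordinal model used in the notebook):

from itertools import combinations

import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

FEATURES = ['label_s', 'claim_s', 'ref_s', 'sqrt_num_claims']

def sweep(df: pd.DataFrame):
    train, test = train_test_split(df, test_size=0.5, random_state=0)
    for k in range(1, len(FEATURES) + 1):
        for subset in combinations(FEATURES, k):
            clf = LogisticRegression(max_iter=1000)
            clf.fit(train[list(subset)], train['label'])
            pred = clf.predict(test[list(subset)])
            correct = np.mean(pred == test['label'])
            within_1 = np.mean(np.abs(pred - test['label']) <= 1)
            print(f"{correct:.1%} correct ({within_1:.1%} within 1 class) "
                  f"using features {list(subset)}.")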



[Wikidata-bugs] [Maniphest] T321224: Wikidata Item Quality Model

2023-02-16 Thread Isaac
Isaac added a comment.


  Weekly update:
  
  - I cleaned up the results notebook. The original ORES model does better on 
the labeled data than my initial model. This isn't a big surprise -- it was 
trained directly on that data and uses many more features. A few takeaways:
    - One salient thing to take from comparing feature lists with the ORES 
model is boosting the importance of having an image when that's a common 
property for similar items.
    - The real perceived benefit of this new model will be its simplicity and 
flexibility. If we had updated test data, I think the new model would perform 
much better comparatively, because it shouldn't go stale in the way the ORES 
model does: I'm not hard-coding lots of rules but allowing the model to adapt 
and learn from the current state of Wikidata.
    - The ordinal logistic regression approach that I used might also not be 
working well. I never really planned to keep it, even though it's a good 
theoretical match for the data, because I think a simpler classification or 
linear regression model w/ cut-offs would be just as reasonable. I also only 
trained it on about 200 items (so that I'd have plenty of test data), so there 
is certainly plenty of room to scale that up.
    - My model includes no features regarding the actual number of statements. 
They are implicitly included in the completeness proportions (e.g., what 
proportion of expected claims exist), but I suspect humans labeling items pay 
much more attention to the sheer quantity of statements regardless of what's 
actually expected for an item of a given type. Not sure if this is a drawback 
or not, but I like that it theoretically allows for an item to be high quality 
even if it only has a few statements.
  - Other big next step will be considering how to scale up the model so it 
could potentially run on LiftWing if that's desired. It has a few semi-large 
data dependencies and that might pose a challenge.



[Wikidata-bugs] [Maniphest] T321224: Wikidata Item Quality Model

2023-02-10 Thread Isaac
Isaac added a comment.


  > Recoin I believe didn't exist at that point. It was also not integrated 
  > into the existing production systems. I don't think we ever did a proper 
  > analysis of what it's currently capable of and how good it is for judging 
  > Item quality.
  
  Thanks -- useful context. I'll see about evaluating it then and report back. 
I've been working on a prototype that essentially uses Recoin + additional 
rules for labels / references to generate a score. I'll then compare it against 
the labeled data from the original ORES campaign. You can see a super raw 
prototype here (scores at the very bottom of the notebook) but I'd wait a week 
or so until I can generate more interesting figures and actually fine-tune it: 
https://public.paws.wmcloud.org/User:Isaac_(WMF)/Annotation%20Gap/eval_wikidata_quality_model.ipynb



[Wikidata-bugs] [Maniphest] T321224: Wikidata Item Quality Model

2023-02-03 Thread Lydia_Pintscher
Lydia_Pintscher added a comment.


  In T321224#8521681, @Isaac wrote:
  
  > @Lydia_Pintscher I was reminded recently of Recoin (and the closely related 
  > PropertySuggester) and that got me wondering: is there a reason that the 
  > ORES model was used instead of Recoin? Or maybe more specifically, is there 
  > any reason not to use Recoin for assessing Wikidata item quality? What are 
  > its drawbacks?
  >
  > Looking through it, my impression was that it's quite good and that my 
  > approach likely would have been very similar. I do see a few places we 
  > could augment it:
  >
  > - Also assessing references in a similar way (based on how often a property 
  > is referenced on other items) to identify claims where references are 
  > missing or could be improved (e.g., imported from Wikipedia)
  > - Also assessing labels/descriptions based on which language sitelinks 
  > exist for the item -- e.g., if there's a Japanese Wikipedia article, the 
  > item should also have a Japanese label/description
  >
  > And then I know you asked about Properties / Lexemes -- presumably this 
  > same strategy could be adopted for them if it's indeed working well for 
  > items!
  
  Recoin I believe didn't exist at that point. It was also not integrated into 
the existing production systems. I don't think we ever did a proper analysis of 
what it's currently capable of and how good it is for judging Item quality.



[Wikidata-bugs] [Maniphest] T321224: Wikidata Item Quality Model

2023-01-27 Thread Isaac
Isaac added a comment.


  I started a PAWS notebook where I will evaluate the proposed strategy (Recoin 
with the addition of reference/label rules) against the 2020 dataset (~4k 
items) of assessed Wikidata item quality. This will allow me to relatively 
cheaply assess the method before trying to scale up.
  
  Notebook: 
https://public.paws.wmcloud.org/User:Isaac_(WMF)/Annotation%20Gap/eval_wikidata_quality_model.ipynb



[Wikidata-bugs] [Maniphest] T321224: Wikidata Item Quality Model

2023-01-24 Thread Isaac
Isaac moved this task from FY2022-23-Research-October-December to 
FY2022-23-Research-January-March on the Research board.
Isaac edited projects, added Research (FY2022-23-Research-January-March); 
removed Research (FY2022-23-Research-October-December).



[Wikidata-bugs] [Maniphest] T321224: Wikidata Item Quality Model

2023-01-12 Thread Isaac
Isaac added a subscriber: Lydia_Pintscher.
Isaac added a comment.


  @Lydia_Pintscher I was reminded recently of Recoin (and the closely related 
PropertySuggester) and that got me wondering: is there a reason that the ORES 
model was used instead of Recoin? Or maybe more specifically, is there any 
reason not to use Recoin for assessing Wikidata item quality? What are its 
drawbacks?
  
  Looking through it, my impression was that it's quite good and that my 
approach likely would have been very similar. I do see a few places we could 
augment it:
  
  - Also assessing references in a similar way (based on how often a property 
is referenced on other items) to identify claims where references are missing 
or could be improved (e.g., imported from Wikipedia)
  - Also assessing labels/descriptions based on which language sitelinks exist 
for the item -- e.g., if there's a Japanese Wikipedia article, the item should 
also have a Japanese label/description
  
  And then I know you asked about Properties / Lexemes -- presumably this same 
strategy could be adopted for them if it's indeed working well for items!
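
A minimal sketch of the sitelink-based label expectation, assuming the standard 
Wikibase JSON layout for `sitelinks` and `labels`; the sitelink-suffix handling 
is a simplification for illustration:

def missing_label_languages(item_json, always_expect=('en',)):
    """Return languages for which the item has a sitelink (or is in
    `always_expect`) but no label. Sitelink keys like 'jawiki' or 'jawikiquote'
    are mapped to a bare language code by stripping the trailing project name;
    this is a simplification (e.g., it does not special-case 'commonswiki')."""
    expected = set(always_expect)
    for site in item_json.get('sitelinks', {}):
        for suffix in ('wikipedia', 'wikisource', 'wikiquote', 'wikivoyage',
                       'wikinews', 'wikibooks', 'wikiversity', 'wiktionary', 'wiki'):
            if site.endswith(suffix):
                expected.add(site[:-len(suffix)] or 'en')
                break
    have = set(item_json.get('labels', {}))
    return sorted(expected - have)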



[Wikidata-bugs] [Maniphest] T321224: Wikidata Item Quality Model

2022-12-22 Thread diego
diego added a comment.


  I'm trying to implement a link-prediction task on Wikidata, to be used as a 
proxy for claims coverage. I'm building on top of Goyal & Ferrara's work. The 
existing libraries might require some tweaks to work on the full Wikidata 
graph, but before addressing the scalability issues I want to test this 
approach on a small sample to see whether it is suitable.
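
As a far simpler stand-in for the embedding-based approach (not the Goyal & 
Ferrara method itself), a small-sample sanity check can be run with a classical 
link-prediction heuristic; a minimal sketch on a toy graph, assuming items and 
their statement targets have been loaded into an undirected networkx graph:

import networkx as nx

# Toy graph: nodes are items, edges are existing item-to-item statements.
G = nx.Graph()
G.add_edges_from([
    ('Q1', 'Q2'), ('Q1', 'Q3'), ('Q2', 'Q3'),
    ('Q2', 'Q4'), ('Q3', 'Q4'), ('Q4', 'Q5'),
])

# Score non-edges with the Jaccard coefficient of shared neighbors;
# high-scoring pairs are candidate "missing" links.
candidates = sorted(nx.jaccard_coefficient(G), key=lambda t: t[2], reverse=True)
for u, v, score in candidates[:5]:
    print(f'{u} -- {v}: {score:.2f}')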



[Wikidata-bugs] [Maniphest] T321224: Wikidata Item Quality Model

2022-12-22 Thread Isaac
Isaac added a comment.


  Weekly updates:
  
  - I focused on the references component of the model this week. I built 
heavily on Amaral, Gabriel, Alessandro Piscopo, Lucie-Aimée Kaffee, Odinaldo 
Rodrigues, and Elena Simperl. "Assessing the quality of sources in Wikidata 
across languages: a hybrid approach." Journal of Data and Information Quality 
(JDIQ) 13, no. 4 (2021): 1-35. 
  - I wrote a Python function (code below) that takes the references for a 
claim and maps them to high-level categories that tell us something about the 
quality of the referencing -- e.g., has an external URL associated with it vs. 
referring to an internal Wikidata item or an import from another Wikimedia 
project. I can imagine weak and strong recommendations based on this -- e.g., 
high priority would be adding missing references, lower priority might be 
upgrading an "imported from Wikimedia project" reference to an external URL, 
and very low priority might be adding a second reference.
  - Using that function, I can generate basic descriptive stats on reference 
distributions on Wikidata (table below) and split them by property (top 100 
most common properties below). From this data, you can see that we might be 
able to automatically infer which properties definitely need references, which 
ones probably should have references, and which ones probably don't, just by 
setting some basic heuristics. One challenge will be whether we use the current 
state of Wikidata (which is heavily bot-influenced, so for certain properties 
it reflects the choices of a few people) or try to build a more nuanced dataset 
from the edit history of which properties have references when editors add 
them.
  
# Code for categorizing references for a claim per a simple taxonomy that by
# proxy tells us something about authority/accessibility/usefulness of the reference.
# Types of references ordered from least -> best, so if a claim has two references
# and one is Internal-Stated and one is External-Direct, we keep External-Direct.
REF_ORDER = {r: i for i, r in enumerate(
    ['Internal-Inferred', 'Internal-Stated', 'Internal-Wikimedia',
     'External-Identifier', 'External-Direct'])}

# All Wikidata properties that are external IDs -- used for detecting when one
# is used as part of a reference.
# TODO: Maybe update to a SPARQL query for external-identifier properties ONLY
# with URL formatter properties? (maybe that's essentially the same thing?)
# https://quarry.wmcloud.org/query/69919
EXTERNAL_ID_PROPERTIES = set()
with open('quarry-69919-wikidata-external-ids-run692643.tsv', 'r') as fin:
    for line in fin:
        EXTERNAL_ID_PROPERTIES.add(f'P{line.strip()}')

def getReferenceType(references):
    """Map references for a claim to different categories.

    Heavily inspired by: https://arxiv.org/pdf/2109.09405.pdf
    Also: https://www.wikidata.org/wiki/Help:Sources
    """
    if references is None:
        ref_count = 'unreferenced'
        best_ref_type = None
    else:
        ref_count = 'single' if len(references) == 1 else 'multiple'
        best_ref_types = []
        for ref in references:
            # reference URL OR official website OR archive URL OR URL OR
            # external data available at
            if ('P854' in ref['snaksOrder'] or 'P856' in ref['snaksOrder']
                    or 'P1065' in ref['snaksOrder'] or 'P953' in ref['snaksOrder']
                    or 'P2699' in ref['snaksOrder'] or 'P1325' in ref['snaksOrder']):
                best_ref_types.append('External-Direct')
                break
            # any external-identifier property used as a reference
            elif [p for p in ref['snaksOrder'] if p in EXTERNAL_ID_PROPERTIES]:
                best_ref_types.append('External-Identifier')
            # Wikimedia import URL OR imported from Wikimedia project
            elif 'P4656' in ref['snaksOrder'] or 'P143' in ref['snaksOrder']:
                best_ref_types.append('Internal-Wikimedia')
            # stated in
            elif 'P248' in ref['snaksOrder']:
                best_ref_types.append('Internal-Stated')
            # inferred from Wikidata item OR based on heuristic OR based on
            elif ('P3452' in ref['snaksOrder'] or 'P887' in ref['snaksOrder']
                    or 'P144' in ref['snaksOrder']):
                best_ref_types.append('Internal-Inferred')
            # title OR published in -- hard to interpret without more info
            # but probably links to a Wikidata item
            elif 'P1476' in ref['snaksOrder'] or 'P1433' in ref['snaksOrder']:
                best_ref_types.append('Internal-Stated')
            else:
                best_ref_types.append(f'Unknown: {ref["snaksOrder"]}')
        best_ref_type = max(best_ref_types, key=lambda x: REF_ORDER.get(x, -1))
    return (ref_count, best_ref_type)
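
A quick usage sketch with synthetic references (the `snaksOrder` key matches 
the schema assumed by the function above; real references come from the item 
JSON or the cluster tables, whose layouts differ as noted elsewhere in this 
thread):

# One "imported from" reference (P143) and one external-URL reference (P854 + P813).
fake_references = [
    {'snaksOrder': ['P143']},
    {'snaksOrder': ['P854', 'P813']},
]
print(getReferenceType(fake_references))   # -> ('multiple', 'External-Direct')
print(getReferenceType(None))              # -> ('unreferenced', None)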
  
  
  
High-level descriptive stats for every num_refs/best_ref category over 1000 
claims:
I manually inspect the top 

[Wikidata-bugs] [Maniphest] T321224: Wikidata Item Quality Model

2022-12-16 Thread Isaac
Isaac added a comment.


  Able to start thinking about this again and a few thoughts:
  
  - Machine-in-the-loop: when we built quality models for the Wikipedia 
language communities, it was with the idea that the models could potentially 
support the existing editor processes for assigning article quality scores -- 
e.g., https://en.wikipedia.org/wiki/Wikipedia:Content_assessment. This 
generally aligns with our machine-in-the-loop practice of only building models 
that clearly could support, and receive feedback from, existing community 
processes. For Wikidata, while there are reasonable guidelines for item 
quality, the only community-generated data was a one-off labeling campaign from 
2020 via Wiki labels. This presents a major challenge: how do we improve on the 
existing ORES model to make it more maintainable / effective without a clear 
feedback loop that can be used to validate/update the model? One possible 
approach is to instead treat this as a task-identification model -- i.e. 
instead of seeking to model quality directly (and therefore allowing vague 
features like the total # of references), we could design a model that seeks to 
explicitly build a list of missing/to-be-improved 
properties/aliases/descriptions/references. This list of changes could then 
always be converted into a quality score -- e.g., by computing a simple ratio 
of existing properties to missing properties or something like that -- but that 
would be secondary to the model. The community process that can provide 
feedback for this style of model is then just the regular editing process 
(albeit quite weakly, because an edit doesn't tell you what else is missing). 
Eventually, it could feed into an actual interface similar to the Growth team's 
structured tasks that would provide even more direct feedback, but in the 
meantime this still feels much more machine-in-the-loop than a direct quality 
model.
  - Reducing data drift: alongside this shift in design from quality -> task 
identification, we can also make the model more sustainable by doing less 
hard-coding of outliers (like asteroids) and instead redesigning the model to 
adapt to the existing structure of Wikidata when it is trained. For example, 
taking more of the approach previously taken for external identifiers / media, 
where the relevant data structures that inform the model are easy to 
auto-generate and thus could be updated with each model training. This could be 
extended to, e.g., lists of properties that commonly have references and lists 
of properties that commonly appear for a given instance-of.
    - Then the model would take an item as input and perhaps go something like:
      - Extract its instance-of and sitelinks
      - Sitelinks would be used to help determine which aliases/descriptions 
should exist
      - Instance-ofs would be used to identify which properties are expected
      - For each of those expected properties, it would either be rated as 
missing, incomplete (missing reference etc.), or complete
      - And then all of this information could be compiled as specific tasks
      - And for the quality score, the list of tasks could be compared against 
the existing data to come to some general score.
    - The challenge then is still in the smart compiling of expected properties 
for a given instance-of, but I feel much better about the structure of this 
model because it's more transparent: anyone who is familiar with Wikidata could 
easily inspect the list of expected properties for a given instance-of and 
tweak it.
    - I'm now working on extracting the list of existing properties for each 
instance-of to see if most have a clear set of common properties (a minimal 
sketch of that aggregation follows).
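
A minimal sketch of that extraction, assuming the claims have already been 
exploded into a Spark DataFrame with one row per (item, instance-of value, 
property); the path and column names are hypothetical:

from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical input: one row per (item, instance-of value, property on that item).
claims = spark.read.parquet('item_instanceof_property.parquet')

items_per_class = claims.groupBy('instance_of_value').agg(
    F.countDistinct('item').alias('n_items'))

prop_freq = (claims
             .groupBy('instance_of_value', 'property')
             .agg(F.countDistinct('item').alias('n_items_with_prop'))
             .join(items_per_class, 'instance_of_value')
             .withColumn('share', F.col('n_items_with_prop') / F.col('n_items')))

# Properties present on most items of a class are the candidate "expected" ones.
prop_freq.filter(F.col('share') >= 0.5).orderBy(F.desc('share')).show(20, truncate=False)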



[Wikidata-bugs] [Maniphest] T321224: Wikidata Item Quality Model

2022-12-02 Thread Isaac
Isaac added a comment.


  Update: the past few weeks have been busy so I haven't had a chance to look 
into this, but I'm hoping to get more time in December to focus on it.



[Wikidata-bugs] [Maniphest] T321224: Wikidata Item Quality Model

2022-11-14 Thread AOdit_WMF
AOdit_WMF added a project: Linked-Open-Data-Network-Program.



[Wikidata-bugs] [Maniphest] T321224: Wikidata Item Quality Model

2022-11-04 Thread Isaac
Isaac added a comment.


  Weekly update:
  
  - Summarizing some past research and further examinations of the existing 
ORES model shared by LP:
    - We have to be careful to adjust expectations for a given claim depending 
on its property type (distribution of property types on Wikidata) -- e.g., no 
references expected for `external-id` properties. The current model uses a 
static list for this, but we might want to re-evaluate that.
    - Even though the number of sitelinks might correlate positively with 
quality, it's a feature we should avoid as it's really a proxy for popularity 
and not item quality.
    - Wikidata is constantly shifting in big ways, and out-of-date data / rules 
can lead to models handling particular instance-ofs poorly. We should do our 
best to make aspects of the model unsupervised, or at least not dependent on a 
fixed set of data, so it can adapt easily.
    - The current model is actually pretty good, so maybe this is less about 
iterating on it significantly and more about redesigning it for the new 
LiftWing paradigm and making it less susceptible to data drift.
  - Something I've been mulling over is how to ensure the model is actionable 
in a way that aligns with community goals and points to specific steps a 
contributor could take to raise quality.
    - For instance, adding/improving references is quite actionable and 
important. For the verifiability component, then, it's worthwhile to ensure 
that the model handles this well -- i.e. has a good sense of which statements 
do and do not need references and differentiates between the different types of 
references (external vs. Wikipedia).
    - If we're less concerned about making items super extensive but do want to 
"require" a core set of basic properties (similar to Schemas or inteGraality), 
we might try to identify that core set of properties for each instance-of and 
rely less on raw counts of statements in determining scores.
    - What about consistency -- is there some way to capture how well an item 
matches related ones? And if so, should an item be penalized for being "unique"?
  - LP also asked us to consider how to extend this to Lexemes and Properties. 
I will have to think through that and whether we can reuse some of the 
resulting model for those entity types or if they require fully separate 
approaches.



[Wikidata-bugs] [Maniphest] T321224: Wikidata Item Quality Model

2022-10-30 Thread Lydia_Pintscher
Lydia_Pintscher added a project: Wikidata.
