[Wikidata-bugs] [Maniphest] T321224: Wikidata Item Quality Model

2023-08-03 Thread Isaac
Isaac added a comment. I'm going to be out the next several weeks so FYI likely won't hear updates until mid-September on this. Thanks for these additional details though! > Now there are several Properties that can represent such relations. The main ones we should probably fo

[Wikidata-bugs] [Maniphest] T321224: Wikidata Item Quality Model

2023-07-25 Thread Isaac
Isaac added a comment. > That's quite an interesting table! Would it be possible to get the actual Item IDs for the last two rows? It could be instructive to know which Items the model thinks are very incomplete but have excellent quality :) @Michael thanks for the questions! S

[Wikidata-bugs] [Maniphest] T321224: Wikidata Item Quality Model

2023-07-21 Thread Isaac
Isaac added a comment. Oooh and the job worked! High-level data on overlap between the two scores, which are the same except that completeness just takes into account how many of the expected claims/refs/labels are there, while quality adds the total number of claims to the features too

[Wikidata-bugs] [Maniphest] T321224: Wikidata Item Quality Model

2023-07-21 Thread Isaac
Isaac added a comment. Updates: - Finally ported all the code from the API to work on the cluster. I don't know if it'll run to completion yet but I ran it on a subset and the results largely matched the API: https://gitlab.wikimedia.org/isaacj/miscellaneous-wikimedia/-/blob/master

[Wikidata-bugs] [Maniphest] T321224: Wikidata Item Quality Model

2023-06-30 Thread Isaac
Isaac added a comment. Updates: - Wrestling with re-adapting everything to the cluster but making good progress. One of the main challenges is that the wikidata item schema is different between cluster and API so lots of little errors that I'm having to discover and correct as I make

[Wikidata-bugs] [Maniphest] T321224: Wikidata Item Quality Model

2023-06-23 Thread Isaac
Isaac added a comment. Updates: - Successfully generated the property data I need so now I have the necessary data to run the model in bulk on the cluster and can turn towards generating a dataset for sampling. Notebook: https://gitlab.wikimedia.org/isaacj/miscellaneous-wikimedia

[Wikidata-bugs] [Maniphest] T321224: Wikidata Item Quality Model

2023-06-16 Thread Isaac
Isaac added a comment. Updates: - Began process of regenerating property-frequency table on cluster given that we shouldn't depend on RECOIN for bulk computation even if it greatly simplifies the API prototype. Working out a few bugs but feel like I have the right approach

[Wikidata-bugs] [Maniphest] T321224: Wikidata Item Quality Model

2023-05-12 Thread Isaac
Isaac added a comment. No updates still with prep for wikiworkshop/hackathon but after next week, hoping to get back to this! TASK DETAIL https://phabricator.wikimedia.org/T321224 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Isaac Cc: Michael

[Wikidata-bugs] [Maniphest] T321224: Wikidata Item Quality Model

2023-04-11 Thread Isaac
Isaac added a comment. From discussion with Lydia/Diego: - The concept of `completeness` feels closer to what we want than `quality` -- i.e. allowing for more nuance in how many statements are associated with a given item. We came up with a few ideas for how to make assessing item

[Wikidata-bugs] [Maniphest] T321224: Wikidata Item Quality Model

2023-03-24 Thread Isaac
Isaac added a comment. Updated API to be slightly more robust to instance-of-only edge cases and provide the individual features. Output for https://wikidata-quality.wmcloud.org/api/item-scores?qid=Q67559155: { "item": "https://www.wikidata.org/wiki/Q67559155",
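The request quoted in this comment can be reproduced with a short script. The sketch below only builds the request URL from the endpoint and the `qid` parameter shown in the comment. The function name `build_item_scores_url` and the QID format check are illustrative assumptions, and nothing is assumed about the response beyond what the truncated output shows.

```python
from urllib.parse import urlencode

# Endpoint as quoted in the comment; everything else here is illustrative.
API_BASE = "https://wikidata-quality.wmcloud.org/api/item-scores"


def build_item_scores_url(qid: str) -> str:
    """Return the item-scores URL for a Wikidata item ID such as 'Q67559155'."""
    # Assumption: item IDs are the letter Q followed by digits.
    if not (qid.startswith("Q") and qid[1:].isdigit()):
        raise ValueError(f"not a Wikidata item ID: {qid!r}")
    return f"{API_BASE}?{urlencode({'qid': qid})}"
```

Calling `build_item_scores_url("Q67559155")` yields the URL quoted in the comment; fetching it is left to whichever HTTP client is at hand.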

[Wikidata-bugs] [Maniphest] T321224: Wikidata Item Quality Model

2023-03-17 Thread Isaac
Isaac added a comment. I still need to do some checks because I know e.g., this fails when the item lacks statements, but I put together an API for testing the model. It has two outputs: a quality class (E worst to A best) that uses the number of claims on the item as a feature (along

[Wikidata-bugs] [Maniphest] T321224: Wikidata Item Quality Model

2023-03-10 Thread Isaac
Isaac added a comment. Weekly updates: - Discussed with Diego the challenge of whether our annotated data is really assessing what we want it to. I'll try to join the next meeting with Lydia to hear more and figure out our options. - Diego is also considering how embeddings might help

[Wikidata-bugs] [Maniphest] T321224: Wikidata Item Quality Model

2023-03-03 Thread Isaac
Isaac added a comment. I slightly tweaked the model but also experimented with adding just a simple square-root of the number of existing claims to the model and found that this is essentially all that is needed to almost match ORES quality (which is near perfect) for predicting item
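A minimal sketch of the feature mentioned in this comment: the square root of an item's claim count. The function name is hypothetical, and how the feature is combined with the rest of the model is not shown in the comment and is not assumed here.

```python
import math


def sqrt_claims_feature(num_claims: int) -> float:
    """Square root of the number of existing claims on an item."""
    if num_claims < 0:
        raise ValueError("claim count cannot be negative")
    return math.sqrt(num_claims)
```

The square root dampens large counts: under this transform, going from 1 to 4 claims moves the feature by the same amount as going from 4 to 9.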

[Wikidata-bugs] [Maniphest] T321224: Wikidata Item Quality Model

2023-02-16 Thread Isaac
Isaac added a comment. Weekly update: - I cleaned up the results notebook <https://public.paws.wmcloud.org/User:Isaac_(WMF)/Annotation%20Gap/eval_wikidata_quality_model.ipynb#Results>. The original ORES model does better on the labeled data than my initial model. This isn't

[Wikidata-bugs] [Maniphest] T321224: Wikidata Item Quality Model

2023-02-10 Thread Isaac
Isaac added a comment. > Recoin I believe didn't exist at that point. It was also not integrated in the existing production systems. I don't think we ever did a proper analysis of what it's currently capable of and how good it is for judging Item quality. Thanks -- useful context. I

[Wikidata-bugs] [Maniphest] T321224: Wikidata Item Quality Model

2023-01-27 Thread Isaac
Isaac added a comment. I started a PAWS notebook where I will evaluate the proposed strategy (Recoin with the addition of reference/label rules) against the 2020 dataset (~4k items) of assessed Wikidata item qualities. This will allow me to relatively cheaply assess the method before trying

[Wikidata-bugs] [Maniphest] T321224: Wikidata Item Quality Model

2023-01-24 Thread Isaac
Isaac moved this task from FY2022-23-Research-October-December to FY2022-23-Research-January-March on the Research board. Isaac edited projects, added Research (FY2022-23-Research-January-March); removed Research (FY2022-23-Research-October-December). TASK DETAIL https

[Wikidata-bugs] [Maniphest] T321224: Wikidata Item Quality Model

2023-01-12 Thread Isaac
Isaac added a subscriber: Lydia_Pintscher. Isaac added a comment. @Lydia_Pintscher I was reminded recently of Recoin <https://www.wikidata.org/wiki/Wikidata:Recoin> (and the closely related PropertySuggester <https://www.mediawiki.org/wiki/Extension:PropertySuggester>) and

[Wikidata-bugs] [Maniphest] T321224: Wikidata Item Quality Model

2022-12-22 Thread Isaac
Isaac added a comment. Weekly updates: - I focused on the references component of the model this week. I built heavily on Amaral, Gabriel, Alessandro Piscopo, Lucie-Aimée Kaffee, Odinaldo Rodrigues, and Elena Simperl. "Assessing the quality of sources in Wikidata across lang

[Wikidata] Re: Wikidata Atlas: a geographic view of Wikidata entities [feedback welcome!]

2022-12-20 Thread Isaac Johnson
b. 2015. <http://www2015.thewebconf.org/documents/proceedings/proceedings/p12.pdf> - Sen, Shilad, et al. "Toward Universal Spatialization Through Wikipedia-Based Semantic Enhancement." ACM Transactions on Interactive Intelligent Systems (TiiS) 9.2-3 (2019): 1-29. <

[Wikidata-bugs] [Maniphest] T321224: Wikidata Item Quality Model

2022-12-16 Thread Isaac
Isaac added a comment. Able to start thinking about this again and a few thoughts: - Machine-in-the-loop: when we built quality models for the Wikipedia language communities, it was with the idea that the models could potentially support the existing editor processes for assigning

[Wikidata-bugs] [Maniphest] T321224: Wikidata Item Quality Model

2022-12-02 Thread Isaac
Isaac added a comment. Update: past few weeks have been busy so I haven't had a chance to look into this but I'm hoping to get more time in December to focus on it.

[Wikidata-bugs] [Maniphest] T321224: Wikidata Item Quality Model

2022-11-04 Thread Isaac
Isaac added a comment. Weekly update: - Summarizing some past research shared / further examinations of the existing ORES model shared by LP: - We have to be careful to adjust expectations for a given claim depending on its property type (distribution of property types on Wikidata

Re: [Wikidata] Edit history-revisions

2020-09-14 Thread Isaac Johnson
imedia.org/wikidatawiki/20200701/wikidatawiki-20200701-change_tag.sql.gz>). I'm not familiar with Wikidata tags so you probably want to do some examination of what they're actually detecting to make sure it's what you are looking for before you rely on them for analysis. Best, Isaac On Fri, Sep

[Wikidata-bugs] [Maniphest] T249654: Categorize different types of Wikidata re-use within Wikimedia projects

2020-08-28 Thread Isaac
Isaac closed this task as "Resolved". TASK DETAIL https://phabricator.wikimedia.org/T249654 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Isaac Cc: Akuckartz, calbon, Addshore, Lydia_Pintscher, Nuria, MGerlach, GoranSMilovano

[Wikidata-bugs] [Maniphest] T249654: Categorize different types of Wikidata re-use within Wikimedia projects

2020-08-21 Thread Isaac
Isaac updated the task description.

[Wikidata-bugs] [Maniphest] T249654: Categorize different types of Wikidata re-use within Wikimedia projects

2020-08-21 Thread Isaac
Isaac added a comment. Weekly update: - cleaned up the meta page a little: https://meta.wikimedia.org/wiki/Research:External_Reuse_of_Wikimedia_Content/Wikidata_Transclusion - this task is essentially done but I'm going to leave the task open at least another week to allow

[Wikidata-bugs] [Maniphest] T249654: Categorize different types of Wikidata re-use within Wikimedia projects

2020-08-13 Thread Isaac
Isaac added a comment. @GoranSMilovanovic thanks! I'm pretty open on next steps. This work was done in part to help guide interpretation of potential WMF metrics around measuring transclusion but I would love to see some improvements made to the way we monitor transclusion if possible too

[Wikidata-bugs] [Maniphest] T249654: Categorize different types of Wikidata re-use within Wikimedia projects

2020-08-06 Thread Isaac
Isaac added a comment. > Thank you for this analysis - really useful! Thanks! Glad to hear :) Additionally, I made some notes here about how these findings may inform patrolling of Wikidata transclusion (T246709#6367012 <https://phabricator.wikimedia.org/T246709#6367012>

[Wikidata-bugs] [Maniphest] T246709: What proportion of a Wikipedia article's edit history might reasonably be changes via Wikidata transclusion?

2020-08-06 Thread Isaac
Isaac added a comment. The results reported in T249654#6352573 <https://phabricator.wikimedia.org/T249654#6352573> have some potential insight into how we think about supporting patrolling of Wikidata transclusion within Wikipedia articles so I wanted to record some of my initial th

[Wikidata-bugs] [Maniphest] T249654: Categorize different types of Wikidata re-use within Wikimedia projects

2020-07-31 Thread Isaac
Isaac added a comment. > This is "overall articles for all projects", correct? It's actually just for English Wikipedia. The number from the WMDE dashboard <https://wmdeanalytics.wmflabs.org/WD_percentUsageDashboard/> for all Wikipedia projects is 31.99% (i.e. the i

Re: [Wikidata] Partial RDF dumps

2020-05-01 Thread Isaac Johnson
PAWS example: https://paws-public.wmflabs.org/paws-public/User:Isaac_(WMF)/Simplified_Wikidata_Dumps.ipynb Best, Isaac On Thu, Apr 30, 2020 at 1:33 AM raffaele messuti wrote: > On 27/04/2020 18:02, Kingsley Idehen wrote: > >> [1] https://w.wiki/PBi <https://w.wiki/PBi> >

[Wikidata-bugs] [Maniphest] [Commented On] T246709: What proportion of a Wikipedia article's edit history might reasonably be changes via Wikidata transclusion?

2020-03-03 Thread Isaac
Isaac added a comment. @Lydia_Pintscher that makes sense and thanks for reaching out. I'm not going to schedule the meeting right now because I don't want to use up your time if we don't end up prioritizing this work, but when we do, I'll reach out! TASK DETAIL https

[Wikidata-bugs] [Maniphest] [Commented On] T246709: What proportion of a Wikipedia article's edit history might reasonably be changes via Wikidata transclusion?

2020-03-02 Thread Isaac
Isaac added a comment. > If we have a concrete example to look at I can try to figure that out :) Actually, I think I found the reason for most of the pages: https://en.wikipedia.org/wiki/Template:Authority_control It's generic because it pulls any external identifiers so ca

[Wikidata-bugs] [Maniphest] [Retitled] T246709: What proportion of a Wikipedia article's edit history might reasonably be changes via Wikidata transclusion?

2020-03-02 Thread Isaac
Isaac renamed this task from "What percentage of edits via Wikidata transclusion are missing on Recent Changes?" to "What proportion of a Wikipedia article's edit history might reasonably be changes via Wikidata transclusion?". TASK DETAIL https://phabricator.wikimedi

[Wikidata-bugs] [Maniphest] [Commented On] T246709: What percentage of edits via Wikidata transclusion are missing on Recent Changes?

2020-03-02 Thread Isaac
Isaac added a comment. Thanks for the additional details @Addshore ! Some context: this task isn't being worked right now. I just created it as a potential future analysis because I had just become aware that Wikidata item properties were tracked specifically in wbc_entity_usage

[Wikidata-bugs] [Maniphest] [Commented On] T209655: Copy Wikidata dumps to HDFS

2020-01-14 Thread Isaac
Isaac added a comment. > @JAllemandou Thank you - as ever! +1: these wikidata parquet (specifically item_page_link) dumps are super useful for us! TASK DETAIL https://phabricator.wikimedia.org/T209655 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/pa

[Wikidata-bugs] [Maniphest] [Commented On] T215616: Improve interlingual links across wikis through Wikidata IDs

2019-02-26 Thread Isaac
Isaac added a comment. Hey @JAllemandou - this is great! thanks for catching that - looks all good to me now too. TASK DETAIL https://phabricator.wikimedia.org/T215616 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Isaac Cc: Marostegui, Isaac

[Wikidata-bugs] [Maniphest] [Commented On] T215616: Improve interlingual links across wikis through Wikidata IDs

2019-02-25 Thread Isaac
Isaac added a comment. Hey @JAllemandou, some debugging: a number of items aren't showing up and I can't for the life of me figure out why. The few I've looked at are pretty normal articles (for example: https://de.wikipedia.org/wiki/Gregor_Grillemeier) and show up in the original parquet files

[Wikidata-bugs] [Maniphest] [Commented On] T215616: Improve interlingual links across wikis through Wikidata IDs

2019-02-19 Thread Isaac
Isaac added a comment. @diego: my interpretation is that right now in the revision history version, the same wikidb/page ID/title is associated with the same wikidata ID regardless of when the revision occurred. what is the use for that over a table that has just one entry per wikidb/page ID/title

[Wikidata-bugs] [Maniphest] [Commented On] T215616: Improve interlingual links across wikis through Wikidata IDs

2019-02-19 Thread Isaac
Isaac added a comment. thank you @JAllemandou this is awesome!!! completely unblocks me (i have a bunch of page titles across all the wikipedias and need to check whether a pair of them match the same wikidata item)!
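The check described in this comment (whether two page titles resolve to the same Wikidata item) can be sketched as a lookup against a table keyed by wiki and page title. This is only an illustration: the function name, the in-memory dictionary, and the placeholder titles and item IDs are all made up, and the schema of the actual table is not shown in the comment.

```python
from typing import Dict, Optional, Tuple

# (wiki database name, page title), e.g. ("dewiki", "Some title")
PageKey = Tuple[str, str]


def same_item(mapping: Dict[PageKey, str], a: PageKey, b: PageKey) -> Optional[bool]:
    """Return True/False if both pages are mapped, or None if either is missing."""
    item_a = mapping.get(a)
    item_b = mapping.get(b)
    if item_a is None or item_b is None:
        return None  # cannot decide without both mappings
    return item_a == item_b


# Placeholder rows for illustration only; not real titles or item IDs.
example_mapping: Dict[PageKey, str] = {
    ("enwiki", "Example A"): "Q_PLACEHOLDER_1",
    ("dewiki", "Beispiel A"): "Q_PLACEHOLDER_1",
    ("frwiki", "Exemple B"): "Q_PLACEHOLDER_2",
}
```

Returning `None` for unmapped pages keeps "different items" distinct from "unknown", which matters when some pages are missing from the table.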

[Wikidata-bugs] [Maniphest] [Commented On] T215413: Image Classification Working Group

2019-02-07 Thread Isaac
Isaac added a comment. If we go down that pathway of trying to identify what images are photographs, we should look into work by a former colleague of mine on detecting visualizations on Commons (in some ways, the inverse task): http://brenthecht.com/publications/www18_vizbywiki.pdf He (Allen Lin

Re: [Wikidata-l] Data templates

2015-04-19 Thread Antoine Isaac
On 4/18/15 5:48 PM, Ricordisamoa wrote: On 11/04/2015 13:29, Antoine Isaac wrote: Hi, Is the 'template' word so bad? Paraphrasing Daniel's definition of the MediaWiki template, one could see a 'WikiData template' as a set of properties that can be re-used, e.g. to make create

Re: [Wikidata-l] Data templates

2015-04-11 Thread Antoine Isaac
things like this together with the Wikidata data would be great for data-reusers like us, instead of having to fetch it from elsewhere! Antoine --- Antoine Isaac RD Manager, Europeana.eu On 4/7/15 3:21 PM, Valentine Charles wrote: Hello, Yes I might not use the right term here especially

Re: [Wikidata-l] Wikidata-Freebase mappings

2015-04-09 Thread Antoine Isaac
Hi everyone All this sounds really good and useful! I was wondering: is there a relation between the Samsung mappings, and the Freebase/Wikidata script that Thomas Steiner has recently shared? https://github.com/google/primarysources/tree/master/frontend Cheers, Antoine On 4/8/15 9:52 PM,

Re: [Wikidata-l] Questions about statement qualifiers

2013-11-05 Thread Antoine Isaac
Hi Antoine, all, I was also a bit puzzled by this. If you want more discussion, there is stuff on Gerard's blog [1,2]. After some patient explanations of the kind on this list, I think I understood what qualifiers are about. Still I disagree with a part of what Markus said. Trying to

Re: [Wikidata-l] Wikidata-l Digest, Vol 22, Issue 22

2013-09-23 Thread Antoine Isaac
: Antoine Isaac ais...@few.vu.nl To: wikidata-l@lists.wikimedia.org Subject: [Wikidata-l] 'Person' or 'human', upper ontologies and migrating 4 million claims Message-ID: 523f5200.7080...@few.vu.nl

[Wikidata-l] 'Person' or 'human', upper ontologies and migrating 4 million claims

2013-09-22 Thread Antoine Isaac
;-) Best, Antoine --- Antoine Isaac Scientific coordinator, Europeana.eu [1] https://www.wikidata.org/wiki/Wikidata:Requests_for_comment/Migrating_away_from_GND_main_type [2] http://lists.wikimedia.org/pipermail/wikidata-l/2013-September/002815.html [3] http://lists.wikimedia.org/pipermail/wikidata-l