Isaac added a comment.
I'm going to be out for the next several weeks, so FYI you likely won't hear
updates on this until mid-September. Thanks for these additional details though!
> Now there are several Properties that can represent such relations. The
main ones we should probably fo
Isaac added a comment.
> That's quite an interesting table! Would it be possible to get the actual
Item IDs for the last two rows? It could be instructive to know which Items the
model thinks are very incomplete but have excellent quality :)
@Michael thanks for the questions! S
Isaac added a comment.
Oooh, and the job worked! High-level data on overlap between the two scores:
they are the same except that completeness only takes into account how many of
the expected claims/refs/labels are present, while quality adds the total
number of claims to the features too
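To make the relationship between the two scores concrete, here is a minimal sketch; the feature names, the dict format for items, and the expected-value inputs are all assumptions for illustration, not the actual model code:

```python
def completeness_features(item, expected):
    """Fraction of the expected claims/refs/labels that are present.
    `item` and `expected` use a simplified, hypothetical format."""
    return {
        "claims_present": len(item["claims"]) / max(1, expected["claims"]),
        "refs_present": item["num_refs"] / max(1, expected["refs"]),
        "labels_present": len(item["labels"]) / max(1, expected["labels"]),
    }

def quality_features(item, expected):
    """Same features as completeness, plus the raw total number of claims."""
    feats = completeness_features(item, expected)
    feats["total_claims"] = len(item["claims"])
    return feats

item = {"claims": ["P31", "P21", "P569"], "num_refs": 2, "labels": {"en": "x"}}
expected = {"claims": 6, "refs": 4, "labels": 2}
print(quality_features(item, expected))
```

The only difference between the two outputs is the extra `total_claims` term, which is what makes the quality score sensitive to sheer item size.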
Isaac added a comment.
Updates:
- Finally ported all the code from the API to work on the cluster. I don't
know if it'll run to completion yet, but I ran it on a subset and the results
largely matched the API:
https://gitlab.wikimedia.org/isaacj/miscellaneous-wikimedia/-/blob/master
Isaac added a comment.
Updates:
- Wrestling with re-adapting everything to the cluster but making good
progress. One of the main challenges is that the Wikidata item schema is
different between the cluster and the API, so there are lots of little errors
that I'm having to discover and correct as I make
Isaac added a comment.
Updates:
- Successfully generated the property data I need so now I have the necessary
data to run the model in bulk on the cluster and can turn towards generating a
dataset for sampling. Notebook:
https://gitlab.wikimedia.org/isaacj/miscellaneous-wikimedia
Isaac added a comment.
Updates:
- Began the process of regenerating the property-frequency table on the
cluster, given that we shouldn't depend on RECOIN for bulk computation even if
it greatly simplifies the API prototype. Working out a few bugs but I feel like
I have the right approach
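The property-frequency idea behind this (and behind Recoin) can be sketched as follows; the input format is a hypothetical simplification of Wikidata item data, not the actual cluster schema:

```python
from collections import Counter, defaultdict

def property_frequencies(items):
    """For each instance-of (P31) class, compute the share of class members
    that use each property. Input is a list of simplified item dicts where
    item["claims"] maps property IDs to values."""
    counts = defaultdict(Counter)
    totals = Counter()
    for item in items:
        props = set(item["claims"])
        for cls in item["claims"].get("P31", []):
            totals[cls] += 1
            counts[cls].update(props)
    # Normalize raw counts into per-class usage frequencies
    return {cls: {p: n / totals[cls] for p, n in c.items()}
            for cls, c in counts.items()}

items = [
    {"claims": {"P31": ["Q5"], "P569": ["1980"], "P106": ["Q82955"]}},
    {"claims": {"P31": ["Q5"], "P569": ["1990"]}},
]
print(property_frequencies(items)["Q5"])
```

A table like this is what lets the model ask "how many of the properties expected for items of this class are actually present?" without calling out to the Recoin API.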
Isaac added a comment.
Still no updates given prep for wikiworkshop/hackathon, but after next week
I'm hoping to get back to this!
TASK DETAIL
https://phabricator.wikimedia.org/T321224
EMAIL PREFERENCES
https://phabricator.wikimedia.org/settings/panel/emailpreferences/
To: Isaac
Cc: Michael
Isaac added a comment.
From discussion with Lydia/Diego:
- The concept of `completeness` feels closer to what we want than `quality`
-- i.e. allowing for more nuance in how many statements are associated with a
given item. We came up with a few ideas for how to make assessing item
Isaac added a comment.
Updated API to be slightly more robust to instance-of-only edge cases and
provide the individual features. Output for
https://wikidata-quality.wmcloud.org/api/item-scores?qid=Q67559155:
{
"item": "https://www.wikidata.org/wiki/Q67559155;,
Isaac added a comment.
I still need to do some checks because I know e.g., this fails when the item
lacks statements, but I put together an API for testing the model. It has two
outputs: a quality class (E worst to A best) that uses the number of claims on
the item as a feature (along
Isaac added a comment.
Weekly updates:
- Discussed with Diego the challenge of whether our annotated data is really
assessing what we want it to. I'll try to join the next meeting with Lydia to
hear more and figure out our options.
- Diego is also considering how embeddings might help
Isaac added a comment.
I slightly tweaked the model, but I also experimented with adding just a simple
square root of the number of existing claims to the model and found that that's
essentially all that is needed to almost match ORES quality (which is
near perfect) for predicting item
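The square-root feature mentioned above is tiny to implement; this sketch is illustrative (the feature names and model wiring are assumptions), but it shows why the transform is attractive: it keeps claim count informative while damping the gap between moderately and extremely large items:

```python
import math

def with_sqrt_claims(features, num_claims):
    """Add a dampened claim-count term to an existing feature dict
    (hypothetical feature names)."""
    feats = dict(features)
    feats["sqrt_num_claims"] = math.sqrt(num_claims)
    return feats

# 16 claims vs 100 claims: raw ratio is ~6x, sqrt ratio only 2.5x
print(with_sqrt_claims({"claims_present": 0.5}, 16))
print(with_sqrt_claims({"claims_present": 0.9}, 100))
```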
Isaac added a comment.
Weekly update:
- I cleaned up the results notebook
<https://public.paws.wmcloud.org/User:Isaac_(WMF)/Annotation%20Gap/eval_wikidata_quality_model.ipynb#Results>.
The original ORES model does better on the labeled data than my initial model.
This isn't
Isaac added a comment.
> Recoin I believe didn't exist at that point. It was also not integrated in
the existing production systems. I don't think we ever did a proper analysis of
what it's currently capable of and how good it is for judging Item quality.
Thanks -- useful context. I
Isaac added a comment.
I started a PAWS notebook where I will evaluate the proposed strategy (Recoin
with the addition of reference/label rules) against the 2020 dataset (~4k
items) of assessed Wikidata item qualities. This will allow me to relatively
cheaply assess the method before trying
Isaac moved this task from FY2022-23-Research-October-December to
FY2022-23-Research-January-March on the Research board.
Isaac edited projects, added Research (FY2022-23-Research-January-March);
removed Research (FY2022-23-Research-October-December).
Isaac added a subscriber: Lydia_Pintscher.
Isaac added a comment.
@Lydia_Pintscher I was reminded recently of Recoin
<https://www.wikidata.org/wiki/Wikidata:Recoin> (and the closely related
PropertySuggester <https://www.mediawiki.org/wiki/Extension:PropertySuggester>)
and
Isaac added a comment.
Weekly updates:
- I focused on the references component of the model this week. I built
heavily on Amaral, Gabriel, Alessandro Piscopo, Lucie-Aimée Kaffee, Odinaldo
Rodrigues, and Elena Simperl. "Assessing the quality of sources in Wikidata
across lang
b. 2015.
<http://www2015.thewebconf.org/documents/proceedings/proceedings/p12.pdf>
- Sen, Shilad, et al. "Toward Universal Spatialization Through
Wikipedia-Based Semantic Enhancement." ACM Transactions on Interactive
Intelligent Systems (TiiS) 9.2-3 (2019): 1-29.
<
Isaac added a comment.
Able to start thinking about this again and a few thoughts:
- Machine-in-the-loop: when we built quality models for the Wikipedia
language communities, it was with the idea that the models could potentially
support the existing editor processes for assigning
Isaac added a comment.
Update: past few weeks have been busy so I haven't had a chance to look into
this but I'm hoping to get more time in December to focus on it.
Isaac added a comment.
Weekly update:
- Summarizing some past research shared / further examinations of the
existing ORES model shared by LP:
- We have to be careful to adjust expectations for a given claim depending
on its property type (distribution of property types on Wikidata
imedia.org/wikidatawiki/20200701/wikidatawiki-20200701-change_tag.sql.gz>).
I'm not familiar with Wikidata tags so you probably want to do some
examination of what they're actually detecting to make sure it's what you
are looking for before you rely on them for analysis.
Best,
Isaac
On Fri, Sep
Isaac closed this task as "Resolved".
TASK DETAIL
https://phabricator.wikimedia.org/T249654
EMAIL PREFERENCES
https://phabricator.wikimedia.org/settings/panel/emailpreferences/
To: Isaac
Cc: Akuckartz, calbon, Addshore, Lydia_Pintscher, Nuria, MGerlach,
GoranSMilovanovic, Isaac
Isaac updated the task description.
Isaac added a comment.
Weekly update:
- cleaned up the meta page a little:
https://meta.wikimedia.org/wiki/Research:External_Reuse_of_Wikimedia_Content/Wikidata_Transclusion
- this task is essentially done but I'm going to leave the task open at least
another week to allow
Isaac added a comment.
@GoranSMilovanovic thanks! I'm pretty open on next steps. This work was done
in part to help guide interpretation of potential WMF metrics around measuring
transclusion but I would love to see some improvements made to the way we
monitor transclusion if possible too
Isaac added a comment.
> Thank you for this analysis - really useful!
Thanks! Glad to hear :)
Additionally, I made some notes here about how these findings may inform
patrolling of Wikidata transclusion (T246709#6367012
<https://phabricator.wikimedia.org/T246709#6367012>
Isaac added a comment.
The results reported in T249654#6352573
<https://phabricator.wikimedia.org/T249654#6352573> have some potential insight
into how we think about supporting patrolling of Wikidata transclusion within
Wikipedia articles so I wanted to record some of my initial th
Isaac added a comment.
> This is "overall articles for all projects", correct?
It's actually just for English Wikipedia. The number from the WMDE dashboard
<https://wmdeanalytics.wmflabs.org/WD_percentUsageDashboard/> for all Wikipedia
projects is 31.99% (i.e. the i
PAWS example:
https://paws-public.wmflabs.org/paws-public/User:Isaac_(WMF)/Simplified_Wikidata_Dumps.ipynb
Best,
Isaac
On Thu, Apr 30, 2020 at 1:33 AM raffaele messuti
wrote:
> On 27/04/2020 18:02, Kingsley Idehen wrote:
> >> [1] https://w.wiki/PBi <https://w.wiki/PBi>
>
Isaac added a comment.
@Lydia_Pintscher that makes sense and thanks for reaching out. I'm not going
to schedule the meeting right now because I don't want to use up your time if
we don't end up prioritizing this work, but when we do, I'll reach out!
Isaac added a comment.
> If we have a concrete example to look at I can try to figure that out :)
Actually, I think I found the reason for most of the pages:
https://en.wikipedia.org/wiki/Template:Authority_control
It's generic because it pulls any external identifiers so ca
Isaac renamed this task from "What percentage of edits via Wikidata
transclusion are missing on Recent Changes?" to "What proportion of a Wikipedia
article's edit history might reasonably be changes via Wikidata transclusion?".
Isaac added a comment.
Thanks for the additional details @Addshore !
Some context: this task isn't being worked on right now. I just created it as a
potential future analysis because I had just become aware that Wikidata item
properties were tracked specifically in wbc_entity_usage
Isaac added a comment.
> @JAllemandou Thank you - as ever!
+1: these wikidata parquet (specifically item_page_link) dumps are super
useful for us!
TASK DETAIL
https://phabricator.wikimedia.org/T209655
EMAIL PREFERENCES
https://phabricator.wikimedia.org/settings/panel/emailpreferences/
Isaac added a comment.
Hey @JAllemandou - this is great! thanks for catching that - looks all good
to me now too.
TASK DETAIL
https://phabricator.wikimedia.org/T215616
EMAIL PREFERENCES
https://phabricator.wikimedia.org/settings/panel/emailpreferences/
To: Isaac
Cc: Marostegui, Isaac
Isaac added a comment.
Hey @JAllemandou, some debugging: a number of items aren't showing up and I
can't for the life of me figure out why. The few I've looked at are pretty
normal articles (for example: https://de.wikipedia.org/wiki/Gregor_Grillemeier)
and show up in the original parquet files
Isaac added a comment.
@diego: my interpretation is that right now, in the revision-history version, the same wikidb/page ID/title is associated with the same Wikidata ID regardless of when the revision occurred. What is the use of that over a table that has just one entry per wikidb/page ID/title
Isaac added a comment.
thank you @JAllemandou this is awesome!!! completely unblocks me (i have a bunch of page titles across all the wikipedias and need to check whether a pair of them match the same wikidata item)!
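The use case described above (checking whether two page titles on different wikis map to the same Wikidata item) can be sketched with a simple lookup; the row format here is an assumed simplification of the item_page_link parquet data:

```python
def build_lookup(rows):
    """rows: iterable of (wiki_db, page_title, item_id) tuples,
    e.g. as read from the item_page_link parquet files."""
    return {(wiki, title): qid for wiki, title, qid in rows}

def same_item(lookup, page_a, page_b):
    """True iff both (wiki, title) pairs resolve to the same Wikidata item."""
    qid_a, qid_b = lookup.get(page_a), lookup.get(page_b)
    return qid_a is not None and qid_a == qid_b

rows = [("enwiki", "Berlin", "Q64"), ("dewiki", "Berlin", "Q64"),
        ("enwiki", "Paris", "Q90")]
lk = build_lookup(rows)
print(same_item(lk, ("enwiki", "Berlin"), ("dewiki", "Berlin")))  # → True
```

In practice this join would run on the cluster over the full parquet tables rather than in an in-memory dict, but the logic is the same.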
Isaac added a comment.
If we go down that pathway of trying to identify what images are photographs, we should look into work by a former colleague of mine on detecting visualizations on Commons (in some ways, the inverse task): http://brenthecht.com/publications/www18_vizbywiki.pdf
He (Allen Lin
On 4/18/15 5:48 PM, Ricordisamoa wrote:
Il 11/04/2015 13:29, Antoine Isaac ha scritto:
Hi,
Is the 'template' word so bad? Paraphrasing Daniel's definition of the
MediaWiki template, one could see a 'WikiData template' as
a set of properties that can be re-used, e.g. to create
things like this together with the
Wikidata data would be great for data-reusers like us, instead of having to
fetch it from elsewhere!
Antoine
---
Antoine Isaac
R&D Manager, Europeana.eu
On 4/7/15 3:21 PM, Valentine Charles wrote:
Hello,
Yes I might not use the right term here especially
Hi everyone
All this sounds really good and useful!
I was wondering: is there a relation between the Samsung mappings, and the
Freebase/Wikidata script that Thomas Steiner has recently shared?
https://github.com/google/primarysources/tree/master/frontend
Cheers,
Antoine
On 4/8/15 9:52 PM,
Hi Antoine, all,
I was also a bit puzzled by this. If you want more discussion, there is stuff
on Gerard's blog [1,2].
After some patient explanations of the kind on this list, I think I understood
what qualifiers are about.
Still I disagree with a part of what Markus said. Trying to
From: Antoine Isaac <ais...@few.vu.nl>
To: wikidata-l@lists.wikimedia.org
Subject: [Wikidata-l] 'Person' or 'human', upper ontologies and
migrating 4 million claims
Message-ID: <523f5200.7080...@few.vu.nl>
;-)
Best,
Antoine
---
Antoine Isaac
Scientific coordinator, Europeana.eu
[1]
https://www.wikidata.org/wiki/Wikidata:Requests_for_comment/Migrating_away_from_GND_main_type
[2] http://lists.wikimedia.org/pipermail/wikidata-l/2013-September/002815.html
[3] http://lists.wikimedia.org/pipermail/wikidata-l