Hi! Yes, I already tested those two ways. I used the mediarequests API 
(https://wikimedia.org/api/rest_v1/metrics/mediarequests/top/en.wikipedia.org/image/2022/05/all-days),
but since it returns only the top 1000 files, most of them are icons, 
buttons, etc., while I’d like to focus on the images that illustrate an article.
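
A minimal sketch of how the top-files response could be narrowed down to likely article illustrations. It assumes the response looks like items[0]["files"], with each entry carrying "file_path" and "requests"; the extension and keyword lists are hypothetical heuristics, not something the API provides.

```python
import json
import urllib.request

TOP_URL = ("https://wikimedia.org/api/rest_v1/metrics/mediarequests/top/"
           "en.wikipedia.org/image/2022/05/all-days")

# Hypothetical heuristics: illustrations are mostly raster files,
# while interface elements are often SVGs or have telling names.
PHOTO_EXTENSIONS = (".jpg", ".jpeg", ".png", ".tif", ".tiff", ".webp")
UI_KEYWORDS = ("icon", "logo", "button", "flag", "symbol")


def looks_like_illustration(file_path):
    """Return True if the file name passes the (rough) heuristics above."""
    name = file_path.rsplit("/", 1)[-1].lower()
    if not name.endswith(PHOTO_EXTENSIONS):
        return False
    return not any(keyword in name for keyword in UI_KEYWORDS)


def filter_top_files(response):
    """Keep only entries of the top list that look like illustrations.

    Assumes the response shape {"items": [{"files": [{"file_path": ...}]}]}.
    """
    files = response["items"][0]["files"]
    return [f for f in files if looks_like_illustration(f["file_path"])]


def fetch_top_files(url=TOP_URL):
    """Download and parse the top list (needs network access)."""
    request = urllib.request.Request(
        url, headers={"User-Agent": "mediarequests-sketch/0.1 (research script)"}
    )
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)
```

Calling filter_top_files(fetch_top_files()) would still be limited to the 1000 entries the endpoint returns, so this only cleans the list; it does not lengthen it.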

I wrote a script to download all the dumps, then open, sort and filter them to 
get a longer list, but it’s very time-consuming.
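
One way the "open, sort and filter" step might be made cheaper is to stream each dump and keep only a bounded set of candidates instead of sorting everything. This is a sketch under assumptions: it treats the dumps as tab-separated bz2 files with the file path in column 1 and the total number of transfers in column 3 (the Mediacounts page on wikitech has the actual layout), and the function names are hypothetical.

```python
import bz2
import heapq


def iter_counts(lines, path_col=0, count_col=2):
    """Yield (file_path, transfers) pairs, skipping malformed lines.

    The column positions are assumptions about the dump layout.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= max(path_col, count_col):
            continue
        try:
            yield fields[path_col], int(fields[count_col])
        except ValueError:
            continue


def top_files(lines, n=10000, keep=lambda path: True):
    """Return the n most transferred files without sorting the whole dump.

    heapq.nlargest keeps at most n candidates in memory at a time.
    """
    pairs = (pc for pc in iter_counts(lines) if keep(pc[0]))
    return heapq.nlargest(n, pairs, key=lambda pc: pc[1])


def top_files_from_dump(dump_path, n=10000, keep=lambda path: True):
    """Stream one bz2-compressed dump from disk, line by line."""
    with bz2.open(dump_path, "rt", encoding="utf-8") as fh:
        return top_files(fh, n, keep)
```

The keep callback is where a filter on extension or file name could go, so icons are dropped before they ever reach the heap. Note that merging per-day top lists only approximates a monthly ranking; an exact one would need the counts summed per file first.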

In the past I used article popularity as a proxy, but I was looking for a more 
granular approach that also considers the usage of images across different 
language versions.

Best

Michele

From: Dan Andreescu <dandree...@wikimedia.org>
Date: Friday, 4 November 2022 at 15:17
To: Michele Mauri <michele.ma...@polimi.it>
Cc: A mailing list for the Analytics Team at WMF and everybody who has an 
interest in Wikipedia and analytics. <analytics@lists.wikimedia.org>
Subject: [Analytics] Re: Mediacounts fields
I see.  In practice, the mediaviewer instrumentation also had some 
inaccuracies.  For example, the code pre-fetched certain images when opening a 
gallery even if the viewer never ended up looking at them.  I think they 
adjusted the instrumentation to account for that, but I don't remember the 
details.

One thought I had is, have you checked the mediarequests 
API<https://wikitech.wikimedia.org/wiki/Analytics/AQS/Mediarequests>?  It's 
used to power metrics like top media 
requests<https://stats.wikimedia.org/#/en.wikipedia.org/content/top-mediarequests>
 (per project per month).  And you can query it 
directly<https://wikimedia.org/api/rest_v1/metrics/mediarequests/per-file/all-referers/all-agents/%2Fwikipedia%2Fcommons%2F1%2F1a%2FFlag_of_Argentina.svg/monthly/2022010100/2022100100>
 for specific images.  It's backed by the same mediacounts data, so you're 
right, it counts all transfers.  But that's a pretty good proxy for what was 
seen by a user.  If you look at the top 1000 files requested I linked, you'll 
see a lot of icons and flags at the top, which makes sense.  But in between all 
that you'll see real images like Liz Truss's portrait and Socrates and all 
that.  You could filter to only larger images by downloading the image and 
checking its size.
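
A small sketch of how the per-file query linked above could be built for any file. The URL layout is copied from that link; the "items"/"requests" shape of the response is an assumption, and the helper names are hypothetical.

```python
from urllib.parse import quote

BASE = "https://wikimedia.org/api/rest_v1/metrics/mediarequests/per-file"


def per_file_url(file_path, start, end, referer="all-referers",
                 agent="all-agents", granularity="monthly"):
    """Build a per-file mediarequests URL.

    file_path is the upload path, e.g.
    "/wikipedia/commons/1/1a/Flag_of_Argentina.svg"; every "/" in it
    must be percent-encoded, hence safe="".
    """
    encoded = quote(file_path, safe="")
    return f"{BASE}/{referer}/{agent}/{encoded}/{granularity}/{start}/{end}"


def total_requests(response):
    """Sum the request counts over all returned time buckets.

    Assumes a response of the form {"items": [{"requests": int, ...}]}.
    """
    return sum(item["requests"] for item in response.get("items", []))
```

For example, per_file_url("/wikipedia/commons/1/1a/Flag_of_Argentina.svg", "2022010100", "2022100100") reproduces the link above. The referer segment may also accept a project hostname such as en.wikipedia.org (as the top endpoint does), which would allow comparing language versions, but that should be checked against the API docs.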

Or you can go another way and look at the top 1000 
articles<https://stats.wikimedia.org/#/en.wikipedia.org/reading/top-viewed-articles>
 on a wiki, find all their images, and analyze those.
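
The "find all their images" step could be sketched with the MediaWiki Action API (action=query with prop=images). The helper names are hypothetical, and the parser assumes the default JSON format, where pages are keyed by page ID; following "continue" for long result sets is left out.

```python
from urllib.parse import urlencode


def images_query_url(titles, wiki="en.wikipedia.org"):
    """Build an Action API URL listing the files used on the given pages."""
    params = {
        "action": "query",
        "prop": "images",
        "titles": "|".join(titles),  # several titles per request
        "imlimit": "max",
        "format": "json",
    }
    return f"https://{wiki}/w/api.php?{urlencode(params)}"


def extract_images(response):
    """Map each page title to the list of file titles used on it.

    Pages without any image simply lack the "images" key.
    """
    result = {}
    for page in response.get("query", {}).get("pages", {}).values():
        result[page["title"]] = [img["title"] for img in page.get("images", [])]
    return result
```

The list returned this way would still include icons pulled in by templates, so some filtering on file name or size would probably be needed afterwards. Changing the wiki argument gives the same lookup on another language version.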

Take a look around at the APIs and see if there's a way forward through that 
data (the stats.wikimedia.org site queries the API directly on the client-side, 
so if you open up your browser's developer tools you can discover the API that 
way.  You can of course also browse the dynamic 
docs<https://wikimedia.org/api/rest_v1/#/Mediarequests%20data> :))

On Thu, Nov 3, 2022 at 5:52 PM Michele Mauri <michele.ma...@polimi.it> wrote:
Thanks. My goal is to understand which images on Commons are the most viewed 
through Wikipedia. According to the mediacounts description, it is possible to 
get the number of transfers. But if I understood correctly, it counts all the 
images transferred to the user, making it difficult to tell which ones have 
really been “seen” by the user. Furthermore, it includes all the interface 
images and icons, making it difficult to filter down to the images used to 
illustrate the article.

Focusing only on media viewer clicks seemed a possible way to solve those 
issues. If you have other suggestions, they are welcome!

Best

Michele

From: Dan Andreescu <dandree...@wikimedia.org>
Date: Thursday, 3 November 2022 at 22:30
To: A mailing list for the Analytics Team at WMF and everybody who has an 
interest in Wikipedia and analytics. <analytics@lists.wikimedia.org>
Cc: Michele Mauri <michele.ma...@polimi.it>
Subject: Re: [Analytics] Mediacounts fields
We don't have any public data on media viewer interactions specifically.  We 
used to have instrumentation on that feature but we haven't tracked it since 
last year.  To get access to some of the old sanitized data that was retained 
for research purposes, you'd have to file a formal research proposal, and it 
doesn't seem likely to get approved, but maybe tell us more about what you're 
trying to do?

What questions are you hoping to answer?  Maybe there's another way or another 
kind of dataset that would serve more use cases?

On Thu, Nov 3, 2022 at 4:12 PM Michele Mauri via Analytics 
<analytics@lists.wikimedia.org> wrote:
Hello,

For an academic research project, I'd like to find out which images are the 
most viewed through the "media viewer".

Do you know if it’s possible to get this information? I looked on the wikitech 
portal, but I found only the mediacounts dataset 
(https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/Mediacounts), 
which is not what I’m looking for.

Thank you

Michele
_______________________________________________
Analytics mailing list -- analytics@lists.wikimedia.org
To unsubscribe send an email to analytics-le...@lists.wikimedia.org