[prometheus-users] Re: How to run Prometheus on AWS ECS?

2022-01-02 Thread Nuria Ruiz
There is quite a bit of work involved in doing that; this post goes over all the steps: https://aws.amazon.com/blogs/mt/monitor-and-scale-your-amazon-ecs-on-aws-fargate-application-using-prometheus-metrics/ On Sunday, January 2, 2022 at 2:19:07 PM UTC-8 vickyrat...@gmail.com wrote: > I am having

Re: [Analytics] EventLogging blocked by ad blockers

2020-09-22 Thread Nuria Ruiz
Hello, What are the problems you see with the beacon being blocked when it comes to extracting value from data? In most instances, what we look at when deriving insights are ratios. For example: "of the people that saw the red link, how many clicked it". In this scenario, with an adequate sample
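The ratio argument in the message above can be illustrated with a small Python sketch. The rest of the message is not shown here, so this is only one reading of it: the numbers are made up, and the key assumption is that ad blockers drop the beacon independently of whether a user clicks. Under that assumption the observed counts shrink but the ratio does not.

```python
# Hypothetical numbers, chosen only to illustrate the ratio argument.
true_impressions = 10_000   # people who saw the red link
true_clicks = 1_200         # of those, people who clicked it

# Assumption: blockers drop the beacon for a fixed share of users,
# regardless of whether those users click.
blocked_fraction = 0.25


def click_ratio(impressions: float, clicks: float) -> float:
    """Share of impressions that led to a click."""
    return clicks / impressions


# Both counts are reduced by the same factor.
observed_impressions = true_impressions * (1 - blocked_fraction)
observed_clicks = true_clicks * (1 - blocked_fraction)

true_ratio = click_ratio(true_impressions, true_clicks)
observed_ratio = click_ratio(observed_impressions, observed_clicks)
print(true_ratio, observed_ratio)
```

If blocking were correlated with clicking behaviour, the two ratios would diverge, so the independence assumption is the part worth checking against real data.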

Re: [Analytics] Translations in wikistats

2020-08-31 Thread Nuria Ruiz
On Fri, Aug 28, 2020 at 19:45, Nuria Ruiz wrote: > >> Ruben:

[Analytics] Translations in wikistats

2020-08-28 Thread Nuria Ruiz
Ruben: Thanks for your question about translations in wikistats ( http://stats.wikimedia.org). You can contribute translations to wikistats via translate wiki. https://translatewiki.net/wiki/Translating:Wikistats_2.0 I think on our end we need to do a bit better at making obvious this is the

Re: [Analytics] nefarious bot/automated traffic analysis

2020-06-16 Thread Nuria Ruiz
Scott: A good place to start to read about "bot spam" and its impact on the data is this one: https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/BotDetection We recently released a new classification for traffic. Besides classifying traffic as "user" or "spider" we also have now

Re: [Analytics] Clickstream: mobile vs. desktop, empty referrers

2020-06-09 Thread Nuria Ruiz
Hello, See https://phabricator.wikimedia.org/T195880 for info on "none" referrers. Thanks, Nuria On Tue, Jun 9, 2020 at 6:10 AM Joseph Allemandou wrote: > Hi Robert > > From the `WHERE` clause here: > >

Re: [Analytics] "automated" marker added to pageview data

2020-05-18 Thread Nuria Ruiz
uest more details if they have a legitimate need for them. > > On Tue, 5 May 2020 at 02:40, Nuria Ruiz wrote: > >> Hello: >> >> We have added the 'automated' marker to Wikimedia's pageview data. Up to >> now pageview agents were classified as 'spider' (self reporte

[Analytics] "automated" marker added to pageview data

2020-05-04 Thread Nuria Ruiz
Hello: We have added the 'automated' marker to Wikimedia's pageview data. Up to now pageview agents were classified as 'spider' (self-reported bots like 'google bot' or 'bing bot') and 'user'. We have known for a while that some requests classified as 'user' were, in fact, coming from automated

Re: [gcj] Re: Java: Parenting Partnering Returns WA

2020-04-15 Thread Nuria Ruiz

Re: [gcj] Re: Java: Parenting Partnering Returns WA

2020-04-14 Thread Nuria Ruiz

Re: [gcj] Re: Java: Parenting Partnering Returns WA

2020-04-13 Thread Nuria Ruiz

[gcj] Java: Parenting Partnering Returns WA

2020-04-10 Thread Nuria Ruiz
Hi, I'm trying to solve Parenting Partnering Returns, but I'm getting WA (wrong answer), even though my own results look correct. My approach is to create a class named Activity and override its equals method so that it checks whether activity A intersects with activity B.
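The intersection check described above can be sketched as follows. This is a Python illustration, not the poster's Java code, which is not shown in this snippet. It assumes activities are half-open intervals `[start, end)`, so an activity that ends at the minute another starts does not count as overlapping; the `overlaps` method name is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Activity:
    """An activity occupying the half-open interval [start, end)."""
    start: int
    end: int

    def overlaps(self, other: "Activity") -> bool:
        # Two half-open intervals intersect exactly when each one
        # starts before the other ends.
        return self.start < other.end and other.start < self.end


a = Activity(0, 10)
b = Activity(5, 15)
c = Activity(10, 20)
print(a.overlaps(b), a.overlaps(c), b.overlaps(c))
```

The check is kept in its own method here rather than in the equality operator because overlap is not transitive (in the usage above, `a` overlaps `b` and `b` overlaps `c`, but `a` does not overlap `c`), which collections relying on equality generally do not expect.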

Re: [gcj] Submit problem for "Parenting Partnering Returns"

2020-04-07 Thread Nuria Ruiz

[gcj] WA when submit attempt

2020-04-06 Thread Nuria Ruiz
Dear all, Yesterday I tried to submit my solution for Parenting Partnering Returns, but I got a WA error. I don't know what that error means; in my own tests the result is OK, yet it fails the first test. Here is the code: import java.io.*; import java.util.*;

Re: [Analytics] [Research-Internal] Kerberos ticket expiry, Jupyterhub on stat1004/1006 and new memory/cpu limits for stat/notebook hosts

2020-03-12 Thread Nuria Ruiz
Hello, >We deployed jupyterhub on stat1004 and stat1006, So that we are all clear on what this implies: disk space constraints in Jupyter notebooks are no longer an issue. The stats machines have much more disk available than the notebook hosts. That being said, the answer to larger

Re: [Analytics] SparkContext stopped and cannot be restarted

2020-02-25 Thread Nuria Ruiz
Hello: Following up on this issue: we think many of Neil's problems come from the fact that a Kerberos ticket expires after 24 hours, and once it does your Spark session will no longer work. We will be extending ticket expiration somewhat, to 2-3 days, but the main point to take home is that

Re: [Analytics] Announcement - Mediawiki History Dumps

2020-02-17 Thread Nuria Ruiz
Hello, We have added a footer to dumps pages with the CC-0 note. Please see: https://dumps.wikimedia.org/other/analytics/ For other changes that you think are needed please do file a phab ticket. Thanks, Nuria On Tue, Feb 11, 2020 at 2:50 PM Nuria Ruiz wrote: > Regarding Licens

Re: [Analytics] Announcement - Mediawiki History Dumps

2020-02-11 Thread Nuria Ruiz
Regarding licensing, there is already a ticket: https://phabricator.wikimedia.org/T244685 If you take a look at the bottom of wikistats (https://stats.wikimedia.org/v2) you will see that the dedication is CC0. The data in both systems is the same but, of course, it can be made more explicit. Thanks,

Re: [Analytics] SparkContext stopped and cannot be restarted

2020-02-07 Thread Nuria Ruiz
wikitech.wikimedia.org/wiki/Analytics#Contact> so it stays clear. > > On Fri, 7 Feb 2020 at 07:48, Nuria Ruiz wrote: > >> Hello, >> >> Probably this discussion is not of wide interest to this public list, I >> suggest to move it to analytics-internal? >> >> Tha

Re: [Analytics] SparkContext stopped and cannot be restarted

2020-02-07 Thread Nuria Ruiz
Hello, This discussion is probably not of wide interest to this public list; I suggest moving it to analytics-internal. Thanks, Nuria On Fri, Feb 7, 2020 at 6:53 AM Andrew Otto wrote: > Hm, interesting! I don't think many of us have used > SparkSession.builder.getOrCreate > repeatedly in

Re: [Analytics] Hourly projectviews by country

2020-01-13 Thread Nuria Ruiz
>Is there any way I can get an hourly time series of which countries are viewing which Wikipedias? Even a (country x project) resolution summary of average views > for the 24 hours of the day would be helpful, if that data exists anywhere. The public data that exists on this regard is aggregated

Re: [Analytics] [Wiki-research-l] Active meta users v active wikimedia users

2020-01-06 Thread Nuria Ruiz
>I was looking to try and work out what percent of the active wikimedia community are participating on meta and comparing to another wiki farm. Any thoughts on that? I think it would help to give an example of why you are looking for this information and why it is important. Participating

Re: [Analytics] Pageviews anomaly‏

2019-12-22 Thread Nuria Ruiz
Hello, This spike is probably caused by bot traffic. I would disregard it entirely. Please see, for example, a similar problem in all top pageviews in hungarian wikipedia for last month. https://phabricator.wikimedia.org/T237282 Thanks, Nuria On Sun, Dec 22, 2019 at 2:42 PM Brian Keegan

Re: [Analytics] Availability of hourly pagecounts files

2019-12-16 Thread Nuria Ruiz
> thought that the hourly files were the source of data for the tool. Is there any estimate of when the missing files will be available? The source of data for the tool is the pageview API: https://wikitech.wikimedia.org/wiki/Analytics/AQS/Pageviews#Pageview_counts_by_article Thanks, Nuria On

[Analytics] Releasing a dataset for caching research and tunning

2019-12-05 Thread Nuria Ruiz
Hello, The Analytics team would like to announce the release of a new dataset for caching research and tuning. Please take a look; these datasets are used by the research community for evaluations of caching algorithms. https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/Caching

Re: [Analytics] Statistics

2019-08-27 Thread Nuria Ruiz
Emin: You can see identified bot traffic versus user traffic in this graph: https://stats.wikimedia.org/v2/#/az.wikipedia.org/reading/total-page-views/normal|bar|2-year|agent~user*spider|monthly , sometimes bot traffic is about 30% of the traffic. As the prior reply said we know some of the user

Re: [Analytics] [Wiki-research-l] Analytics clients (stat/notebook hosts) and backups of home directories

2019-07-10 Thread Nuria Ruiz
>I have one question for you: As you allow/encourage for more copies of >the files to exist To be extra clear, we do not encourage keeping data on the notebook hosts at all; they have no capacity to either process or host large amounts of data. Data that you are working with is best

Re: [Analytics] project Cultural Diversity Observatory / accessing analytics hadoop databases

2019-07-09 Thread Nuria Ruiz
I'll let you know when I have more info. > > Thanks again. > Best, > > Marc Miquel > > > Message from Nuria Ruiz on Tue, Jul 9, 2019 > at 1:44: > >> >Will there be a release for these two tables? >> No, sorry, there will not be. The data

Re: [Analytics] project Cultural Diversity Observatory / accessing analytics hadoop databases

2019-07-08 Thread Nuria Ruiz
parameters for the entire > table or for specific parts (using batches). > > Will there be a release for these two tables? Could I connect to the > Hadoop to see if the queries on pagelinks and categorylinks run faster? > > If there is any other alternative we'd be happy to try as we cann

Re: [Analytics] project Cultural Diversity Observatory / accessing analytics hadoop databases

2019-07-08 Thread Nuria Ruiz
Hello, From your description it seems that your main problem is not one of computation but rather one of data extraction. The labs replicas are not meant for big data extraction jobs, as you have just found out. Neither is Hadoop. Now, our team will soon be releasing a dataset of edit

Re: [Wikitech-l] [Tech Talks] June 25, 2019, 6 PM UTC, Just what is Analytics doing back there?

2019-06-25 Thread Nuria Ruiz
The talk has started; this is the YouTube stream: https://www.youtube.com/watch?reload=9=GD0PEDFysfM Thanks, Nuria On Mon, Jun 10, 2019 at 8:53 AM Subramanya Sastry wrote: > Hi Everyone, > > It's time for Wikimedia Tech Talks 2019 Episode 5! > This month's talk will take place *June 25,

[Wikitech-l] New time selector in wikistats2 UI

2019-05-17 Thread Nuria Ruiz
Hello, Over the last couple months we've been working on improving the experience of looking through the past on Wikistats2. Until now simple questions like "who were the top editors in June 2010" or "what countries were visiting Arabic Wikipedia the most in 2004" were difficult to answer because

Re: [Analytics] Superset 0.32 upgrade coming tomorrow (May 15th, early EU morning)

2019-05-15 Thread Nuria Ruiz
Hello, Superset has now been upgraded. There are notable fixes in this version, and you can now go crazy creating histograms because they actually work. An example: histogram of response sizes as reported by varnish last week: https://bit.ly/2vYB966 Also, there is a new dataset available called

Re: [Analytics] [ISSUE] dumps.wikimedia.org stop working

2019-04-04 Thread Nuria Ruiz
Hello, This issue should be corrected by now. Please check. Thanks, Nuria On Wed, Apr 3, 2019 at 9:18 AM Nuria Ruiz wrote: > > Sorry this has broken, Erik Z. retired recently and we are moving some of > the work he did to run somewhat differently. You can follow this issue:

Re: [Analytics] [ISSUE] dumps.wikimedia.org stop working

2019-04-03 Thread Nuria Ruiz
Sorry this has broken, Erik Z. retired recently and we are moving some of the work he did to run somewhat differently. You can follow this issue: https://phabricator.wikimedia.org/T220012 On Wed, Apr 3, 2019 at 6:36 AM Mauro Mascia wrote: > Hi, > > it seems that the daily dumps of pagecounts,

Re: [Analytics] Trouble getting yesterday's pageviews data

2019-04-02 Thread Nuria Ruiz
Outage docs now available: https://wikitech.wikimedia.org/wiki/Incident_documentation/20190402-0401KafkaJumbo On Tue, Apr 2, 2019 at 6:15 AM Luca Toscano wrote: > Hi Collin, > > you have anticipated my email :) We are tracking the issue in > https://phabricator.wikimedia.org/T219842, we had a

[Analytics] Easier mapping from Wikistats1 to Wikistats2 metrics

2019-03-28 Thread Nuria Ruiz
Hello! The Analytics team would like to announce a couple of changes. We are working towards an easier way to navigate metrics that appear in both Wikistats1 and Wikistats2 and to compare numbers. Please take a look at the changes deployed today (for example, for Italian Wikipedia):

[Wikitech-l] Easier mapping from Wikistats1 to Wikistats2 metrics

2019-03-28 Thread Nuria Ruiz
Hello! The Analytics team would like to announce a couple of changes. We are working towards an easier way to navigate metrics that appear in both Wikistats1 and Wikistats2 and to compare numbers. Please take a look at the changes deployed today (for example, for Italian Wikipedia):

Re: [Analytics] Availability of data on Wikipedia Zero rollout

2019-03-25 Thread Nuria Ruiz
Sneha, Some of the data that would be key to estimate the "increase of participation" you mention has either never been collected ("Whether those edits were being made using a device that accessed WP through WP Zero") or it was only retained short term, 90 days (" The kind of device being used

Re: [Analytics] R: Analytics Digest, Vol 85, Issue 3

2019-03-11 Thread Nuria Ruiz
ore specific > than "Re: Contents of Analytics digest..." > > > Today's Topics: > >1. R: Analytics Digest, Vol 85, Issue 2 (viviana paga) >2. Re: R: Analytics Digest, Vol 85, Issue 2 (Nuria Ruiz) > > > -

Re: [Analytics] R: Analytics Digest, Vol 85, Issue 2

2019-03-08 Thread Nuria Ruiz
>I thought having some stats by api-user-agent from backend could help me to understand these points and improve in the future my project in the best way. What do you >think ? Is there a procedure that can I follow to have these stats? The stats would be the same, Viviana: raw counts of calls from

Re: [Analytics] Further Development of Wikipedia statistics

2019-02-07 Thread Nuria Ruiz
Hello, Several things come to mind: Topviews provides much of this info, digested in a way that makes it easy to calculate what you want; it gets its data from the pageview API and does some useful filtering: https://tools.wmflabs.org/topviews/?project=de.wikipedia.org=all-access=last-month= You

Re: [Analytics] [Research-Internal] Article about ML in production woes

2019-02-07 Thread Nuria Ruiz
Team, Since everyone is here, we will be working on a machine learning infrastructure program this year. I will set up meetings with everyone on this thread and some others in SRE and Audiences to get a "bag of requests" of things that are missing, first round of talks that I hope to finish next

Re: [Wikitech-l] A difficult goodbye

2019-01-15 Thread Nuria Ruiz
litigation against the NSA, has instituted a process to better do promotions and has hired support staff that has made the work of all of us in the department much (much!) easier. On Mon, Jan 14, 2019 at 12:02 PM Nuria Ruiz wrote: > Many thanks for your work in these two years, Victoria. You le

Re: [Wikitech-l] A difficult goodbye

2019-01-14 Thread Nuria Ruiz
Many thanks for your work in these two years, Victoria. You leave the Technology team in much better shape than you found it. For those of you not in the know, I think it is worth mentioning that in her tenure here Victoria has created the Technical Engagement team to better attend technical

Re: [Analytics] Does prefetch count as a pageview?

2018-12-20 Thread Nuria Ruiz
here is a native browser feature that, when searching through the address >>>> bar (Google powered) by default silently starts loading the url of the top >>>> result shown below the address bar. Maybe there's a way we opted out, but I >>>> think it applies

Re: [Analytics] Does prefetch count as a pageview?

2018-12-19 Thread Nuria Ruiz
> I think that's for the Page Previews feature (i.e., when a user hovers over a link on desktop Wikipedia) or > its corresponding feature in the Wikipedia for Android (triggered by default on link tap) The code that Fran pointed to only discounts "previews" by the Android app, as we established that

Re: [Analytics] Superset going down for a few hours

2018-12-13 Thread Nuria Ruiz
Superset is back up (should have said: "going down for a few minutes") , We have rolled back the upgrade in progress. Thanks, Nuria On Thu, Dec 13, 2018 at 1:00 PM Nuria Ruiz wrote: > Team: > > Superset will be going down for a few hours today as we rollback the > updat

[Analytics] Superset going down for a few hours

2018-12-13 Thread Nuria Ruiz
Team: Superset will be going down for a few hours today as we roll back the update we were trying to do. It turns out that the newest versions of Superset are VERY non backwards compatible: they use Python 3.6, which is not available on our Debian distro, and they introduce a bunch of other bugs.

[Analytics] Wikistats2 - Metrics available for project families

2018-12-12 Thread Nuria Ruiz
Hello! The Analytics team would like to announce that Wikistats2 now has metrics available for what we are calling (for lack of a better name) "project families". That is, "all wikipedias", "all wikibooks", etc. See, for example, bytes added by users to all wikibooks in the last month:

[Wikitech-l] Wikistats2 - Metrics available for project families

2018-12-12 Thread Nuria Ruiz
Hello! The Analytics team would like to announce that Wikistats2 now has metrics available for what we are calling (for lack of a better name) "project families". That is, "all wikipedias", "all wikibooks", etc. See, for example, bytes added by users to all wikibooks in the last month:

Re: [Analytics] EventLogging Hive Refine currently stalled for some Schemas

2018-11-19 Thread Nuria Ruiz
ice, once per hour and once daily looking 4 days back. Data >> should appear once daily job runs for the "holes" missing. > > +1 The EL2Druid daily loading job will cover up the holes for the 12th > and 13th in 1 or 2 days. > > On Thu, Nov 15, 2018 at 5:03 PM Nuria Rui

Re: [Analytics] EventLogging Hive Refine currently stalled for some Schemas

2018-11-15 Thread Nuria Ruiz
Hello, Not all data sources are populated at the same time. The data in Druid is ingested twice: once per hour, and once daily looking 4 days back. Data for the missing "holes" should appear once the daily job runs. Thanks, Nuria On Thu, Nov 15, 2018 at 7:49 AM Andrew Otto wrote: > > Does "fixed"

Re: [Analytics] Pageviews by agent for May 18-21 2015

2018-11-13 Thread Nuria Ruiz
Hello, > One question we have is whether the pageviews we observe are driven by bots and spiders. We know that the > wikimedia rest api provides this information going back to July 1 2015. Please keep in mind that these are only self-identified bots; there is probably about 1-5% of bot pageview

Re: [Analytics] Wiktionary word page views?

2018-10-23 Thread Nuria Ruiz
The pageview API has that data as long as "individual words" are considered "articles". See sample query: https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/en.wiktionary/all-access/all-agents/table/daily/2017100100/2017103100 Docs:
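The sample query above follows a fixed path layout, which can be assembled programmatically. Below is a minimal Python sketch that only builds the URL and makes no network request. The path order is taken from the sample query itself; the helper name `per_article_url` and its default arguments are hypothetical.

```python
from urllib.parse import quote

# Base path copied from the sample query in the message above.
BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def per_article_url(project: str, article: str, start: str, end: str,
                    access: str = "all-access",
                    agent: str = "all-agents",
                    granularity: str = "daily") -> str:
    """Build a per-article pageviews URL (hypothetical helper).

    start and end use the timestamp format seen in the sample
    query, e.g. "2017100100".
    """
    # Titles use underscores instead of spaces and are percent-encoded.
    title = quote(article.replace(" ", "_"), safe="")
    return "/".join([BASE, project, access, agent, title,
                     granularity, start, end])


# Rebuilds the sample query for the Wiktionary entry "table".
url = per_article_url("en.wiktionary", "table", "2017100100", "2017103100")
print(url)
```

Any HTTP client can then fetch the resulting URL; the response format is not shown in this thread, so it is left out of the sketch.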

Re: [Analytics] Academic paper of Wikimedia' statistics v2?

2018-10-23 Thread Nuria Ruiz
Abel, If you are talking about http://stats.wikimedia.org/v2 the metric definition has not changed from the (now-called) "legacy wikistats 1" ( http://stats.wikimedia.org) . In the V2 system, metrics are surfaced over a new UI and also new APIs, so they are available programmatically. Some docs:

Re: [Analytics] Community health metrics kit: Input needed!

2018-10-22 Thread Nuria Ruiz
This seems like a start towards a way of conveying "community health" that anyone can grasp: https://meta.m.wikimedia.org/wiki/Grants:IdeaLab/Health_rating_radio_button_template_on_talk_pages On Mon, Oct 22, 2018 at 4:10 AM ABEL SERRANO JUSTE wrote: > Thank you for opening the discussion. In our

[Analytics] New reports in wikistats2: "top editors" (a.k.a most prolific contributors) and "top edited articles"

2018-10-11 Thread Nuria Ruiz
Hello, The analytics team would like to announce two new metrics available in wikistats2: 1. Top editors (a.k.a most prolific contributors) See example for Italian wikipedia: https://stats.wikimedia.org/v2/#/it.wikipedia.org/contributing/top-editors/normal|table|1-Month|~total 2. Top edited

[Wikitech-l] New Reports in wikistats2: "top editors" (a.k.a most prolific contributors) and "top edited articles"

2018-10-11 Thread Nuria Ruiz
Hello, The analytics team would like to announce two new metrics available in wikistats2: 1. Top editors (a.k.a most prolific contributors) See example for Italian wikipedia: https://stats.wikimedia.org/v2/#/it.wikipedia.org/contributing/top-editors/normal|table|1-Month|~total 2. Top edited

Re: [Analytics] When is the new pages API updated?

2018-10-10 Thread Nuria Ruiz
>Wikistats 1 generates data on content pages with a delay of 10-15 days after the end of the month This is true for full snapshots (for the reasons we have discussed before and that Dan has described on this thread). You can expect data to be available on the API soon after the 10th, but it is

Re: [Wikitech-l] My Phabricator account has been disabled

2018-08-15 Thread Nuria Ruiz
Rewriting the CoC in a positive rights framework is a daunting > >> project, > >> >> but > >> >> > it might be fun. > >> >> > > >> >> > Regards, > >> >> > Adam > >> >> > &g

Re: [Wikitech-l] My Phabricator account has been disabled

2018-08-11 Thread Nuria Ruiz
>After several negative examples discussed in the last few months on this list,* this action conclusively proves in my eyes the failure of the Code of conduct to be a positive force for our community, at least so far >and in the present conditions. The CoC will prioritize the safety of the

[Analytics] Wikistats2 Better maps and new metric: Legacy Pageviews (a.k.a Pagecounts)

2018-07-11 Thread Nuria Ruiz
Hello! Just a brief note to announce that we have two new things in Wikistats2 this quarter. We have reviewed maps and we now report more precise pageviews per country. Check, for example, pageviews for Portuguese Wikipedia on the world for last month:

[Wikitech-l] Better maps in Wikistats2 and new metric: Legacy Pageviews (a.k.a Pagecounts)

2018-07-11 Thread Nuria Ruiz
On Wed, Jul 11, 2018 at 1:26 PM, Nuria Ruiz wrote: > Hello! > > Just a brief note to announce that we have two new things in wikistats > this quarter. We have reviewed maps by popular demand to give more precise > pageviews per country. > > Check, for example, pageviews for

[Wikitech-l] Batter maps in Wikistats2 and new metric: Legacy Pageviews (a.k.a Pagecounts)

2018-07-11 Thread Nuria Ruiz
Hello! Just a brief note to announce that we have two new things in wikistats this quarter. We have reviewed maps by popular demand to give more precise pageviews per country. Check, for example, pageviews for portuguese wikipedia on the world for last month:

Re: [Analytics] most popular articles per country

2018-07-09 Thread Nuria Ruiz
Amir: FYI, this data has a couple of caveats: 1) the "-" is pageviews for a page for which we cannot extract a title. 2) the data is very much affected by bot spikes (you can mitigate that by filtering by agent_type="user", but a significant portion of bot traffic is still not labeled as such).
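The two caveats above translate into a simple filter. The Python sketch below uses made-up rows; only `agent_type="user"` and the "-" title come from the message, while the other field names (`page_title`, `country`, `views`) and the helper name are hypothetical.

```python
# Made-up rows standing in for per-country pageview data.
rows = [
    {"page_title": "Main_Page", "country": "ES", "agent_type": "user", "views": 120},
    {"page_title": "-", "country": "ES", "agent_type": "user", "views": 40},
    {"page_title": "Main_Page", "country": "ES", "agent_type": "spider", "views": 300},
    {"page_title": "Madrid", "country": "ES", "agent_type": "user", "views": 75},
]


def titled_user_views(rows):
    """Keep rows labeled as user traffic whose title could be extracted."""
    return [
        row for row in rows
        if row["agent_type"] == "user" and row["page_title"] != "-"
    ]


filtered = titled_user_views(rows)
for row in sorted(filtered, key=lambda r: r["views"], reverse=True):
    print(row["page_title"], row["views"])
```

As the message notes, this only removes traffic already labeled as non-user; unlabeled bot spikes would still pass the filter.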

[Analytics] Backfilling some eventlogging data on hadoop

2018-07-06 Thread Nuria Ruiz
Hello: An FYI that we are rerunning some of our jobs to backfill some eventlogging data on hadoop. The job should take about a day. Schemas affected are listed on the ticket: https://phabricator.wikimedia.org/T198906 Thanks, Nuria ___ Analytics mailing list

Re: [Analytics] EventLogging MariaDB indexes

2018-05-27 Thread Nuria Ruiz
You can open a ticket and either our team or the DBAs might be able to do it. Best might be looking at the data in Hadoop, where you can query large amounts of it more easily. EventLogging data can be found in the "events" db on Hive. Thanks, Nuria On Fri, May 25, 2018 at 11:22 AM Gilles Dubuc

Re: [Analytics] Content of wmf.wdqs_extract

2018-05-08 Thread Nuria Ruiz
Adrian: Please note that this table might disappear soon, as the research it was created for has finished. Also, we will be rolling out (hopefully) next quarter similar tables that split our large dataset into smaller ones. That work is still WIP. Thanks, Nuria On Tue, May 8, 2018 at 12:22 AM,

[Analytics] Wikistats Data Outage issues

2018-04-23 Thread Nuria Ruiz
Hello! We are investigating a recent outage with data in wikistats. We shall report more as our understanding of issues progresses. Thanks, Nuria ___ Analytics mailing list Analytics@lists.wikimedia.org

Re: [Analytics] How to get the traces of requests to the Wikipedia site in each web server

2018-04-18 Thread Nuria Ruiz
> Is there any download link available for the *webrequest *datasets ? No, sorry, there is no download of webrequest data nor is it kept long term. As I mentioned before the best dataset that might fit your needs is this one: https://analytics.wikimedia.org/datasets/archive/public-

Re: [Wikitech-l] Wikistats 2.0 - Now with Maps!

2018-04-16 Thread Nuria Ruiz
> to table view. I get a list of countries - US, France, Spain, --, > Japan, > > > and so on. It's a link, and clicking it opens unexisting wiki article > > with > > > the same name. > > > Igal > > > > > > > >

Re: [Analytics] Licensing for screenshots of pageviews data

2018-04-13 Thread Nuria Ruiz
My 2 cents: Data on pageviews endpoint is available under: https://creativecommons.org/publicdomain/zero/1.0/ (you need to expand each endpoint to see this, sorry, that UX could be better). You can add to pageview tool a note about licensing of the features it provides. For example: see the

Re: [Analytics] [Research-Internal] Spark2 upgraded to Spark 2.3.0, Spark 1 on the way out

2018-04-10 Thread Nuria Ruiz
FYI that this is happening today. Users may see slowness and paused jobs. We will send a note when upgrade is complete. Thanks, Nuria On Thu, Apr 5, 2018 at 1:22 PM, Andrew Otto wrote: > Hi all! > > I just upgraded spark2 across the cluster to Spark 2.3.0 >

Re: [Analytics] How to get the traces of requests to the Wikipedia site in each web server

2018-04-09 Thread Nuria Ruiz
Hello, I do not think our downloads or API provide a dataset like the one you are interested in. From your question I get the feeling that your assumptions about how our system works do not match reality; Wikipedia might not be the best fit for your study. The closest data to what you are asking

Re: [Analytics] Monitor the number of Wikipedia sites and the number of articles in each site

2018-04-03 Thread Nuria Ruiz
Zainan: Labs is our cloud environment for volunteers, you can direct questions about that to cloud e-mail list. https://wikitech.wikimedia.org/wiki/Help:Cloud_Services_Introduction Thanks, Nuria On Mon, Apr 2, 2018 at 7:44 PM, Zainan Zhou (a.k.a Victor) wrote: > Thanks Dan,

Re: [Analytics] [Services] Getting more than just 1000 top articles from REST API

2018-04-02 Thread Nuria Ruiz
>are trying to rebuild our stale encyclopedia apps for offline usage but are space-limited and would only like to include the most likely pages that would be looked at that can fit within a size envelope >that varies with the device in question (up to 100k article limit probably) For this use case

[Wikitech-l] Better Support for Mobile in Wikistats2

2018-03-29 Thread Nuria Ruiz
Hello! Analytics is working on better support for mobile in wikistats2: http://stats.wikimedia.org/v2 Do take a look at latest changes and if there are issues that prevent you from using this site in mobile let us know. A phabricator ticket (http://phabricator.wikimedia.org) with a screenshot

Re: [Analytics] Migrated Reportcard with Updated Data

2018-03-11 Thread Nuria Ruiz
ighted e.g. in our monthly reports, > and IIRC that report card dashboard also included regional numbers). > > Have we preserved this data somewhere? > > On Fri, Apr 7, 2017 at 11:30 AM, Nuria Ruiz <nu...@wikimedia.org> wrote: > >> Hello! >> >> The

Re: [Analytics] Wikipedia internal search clickstream

2018-03-05 Thread Nuria Ruiz
Short answer: no. This data is not publicly available in a form that would let you compute the dataset yourself, as it is private data. Thanks, Nuria On Mon, Mar 5, 2018 at 11:31 AM, Georg Sorst wrote: > Hi all, > > sorry for this messy post - I forgot to subscribe to the list so I

Re: [Wikitech-l] You can now translate Phabricator to your language

2018-03-02 Thread Nuria Ruiz
What an awesome project translatewiki is. Nuria On Fri, Mar 2, 2018 at 11:59 AM, Daniel Zahn wrote: > Great work! It made me create a fresh TranslateWiki user and add some > German translations of Phabricator strings. > ___ >

Re: [Analytics] PageView

2018-03-02 Thread Nuria Ruiz
>Or is there another method you also count that is gathered for other companies that collect views? Companies that do this, such as comScore, have their participants install software (normally desktop software) on their machines and track the page views those participants generate. It was the case

Re: [Analytics] Wikipedia internal search clickstream

2018-03-02 Thread Nuria Ruiz
>Did I miss something? Is this data available somewhere? You can find more information about clickstream datasets here: https://blog.wikimedia.org/2018/01/16/wikipedia-rabbit-hole-clickstream/ The datasets do not include simple wiki; they are calculated for a few wikis, some of which are not very

Re: [Wikitech-l] Wikistats 2.0 - Now with Maps!

2018-03-01 Thread Nuria Ruiz
e. [1] http://numeraljs.com/ Pull requests for locales: https://github.com/ adamwdraper/Numeral-js/tree/master/locales [2] https://imgur.com/a/sqHMZ [3] https://imgur.com/a/1FsBE [4] https://imgur.com/a/PBMrY On Thu, Feb 15, 2018 at 8:33 AM, Nuria Ruiz <nu...@wikimedia.org> wrote: > >

Re: [Wikitech-l] Wikistats 2.0 - Now with Maps!

2018-02-28 Thread Nuria Ruiz
ow is this new metric going to differ from that? Or are we talking about > the same thing? > > On Mon, Feb 26, 2018 at 8:42 AM, Nuria Ruiz <nu...@wikimedia.org> wrote: > > > Created a ticket to compute "active editors for all wikis": > > https://phabricator.wikimedia.or

Re: [Wikitech-l] Wikistats 2.0 - Now with Maps!

2018-02-26 Thread Nuria Ruiz
Created a ticket to compute "active editors for all wikis": https://phabricator.wikimedia.org/T188265 On Sat, Feb 24, 2018 at 3:56 PM, Nuria Ruiz <nu...@wikimedia.org> wrote: > >By the way, is there somewhere where I could find total active editors > of all wiki

Re: [Wikitech-l] Wikistats 2.0 - Now with Maps!

2018-02-24 Thread Nuria Ruiz
e wikisourcerer to evaluate number of attendees we might set for the > next Wikisource conference. > > Cheers. > > Le 14/02/2018 à 23:15, Nuria Ruiz a écrit : > > Hello from Analytics team: > > Just a brief note to announce that Wikistats 2.0 includes data about >

Re: [Analytics] How to get old page views data?

2018-02-22 Thread Nuria Ruiz
Peter: Do submit a Phabricator task with your request; it'll be easier to follow up on it there than via e-mail. Our backlog: https://phabricator.wikimedia.org/tag/analytics/ I assume you know that per-article views are available since 2015, a way to see those:

Re: [Analytics] Wikistats 2.0 - Now with Maps!

2018-02-22 Thread Nuria Ruiz
e that search bots and other obscure automated processes are distorting >> this data, and are there ways to filter that out in order to know where are >> the actual humans interested in a Wikimedia project? >> >> >> On Wed, Feb 14, 2018 at 11:15 PM, Nuria Ruiz <nu...@wik

Re: [Wikitech-l] Wikistats 2.0 - Now with Maps!

2018-02-15 Thread Nuria Ruiz
this ticket: https://phabricator.wikimedia.org/T187205 On Thu, Feb 15, 2018 at 9:04 AM, Nuria Ruiz <nu...@wikimedia.org> wrote: > >Sorry, I can't. I opened the link you gave on hewiki. Changed from map > view > >to table view. I get a list of countries - US, France, Spain, --,

Re: [Wikitech-l] Wikistats 2.0 - Now with Maps!

2018-02-15 Thread Nuria Ruiz
mail.com> wrote: > Sorry, I can't. I opened the link you gave on hewiki. Changed from map view > to table view. I get a list of countries - US, France, Spain, --, Japan, > and so on. It's a link, and clicking it opens unexisting wiki article with > the same name. > Igal > >

Re: [Wikitech-l] Wikistats 2.0 - Now with Maps!

2018-02-15 Thread Nuria Ruiz
Wed, Feb 14, 2018 at 11:15 PM, Nuria Ruiz <nu...@wikimedia.org> wrote: > > Hello from Analytics team: > > > > Just a brief note to announce that Wikistats 2.0 includes data about > > pageviews per project per country for the current month. > > > > Tak

Re: [Wikitech-l] Wikistats 2.0 - Now with Maps!

2018-02-15 Thread Nuria Ruiz
st of countries - US, France, Spain, --, > Japan, > > > and so on. It's a link, and clicking it opens unexisting wiki article > > with > > > the same name. > > > Igal > > > > > > > > > On Feb 15, 2018 02:23, "Nuria Ruiz" <

Re: [Wikitech-l] Wikistats 2.0 - Now with Maps!

2018-02-14 Thread Nuria Ruiz
"--"? > Igal (User:IKhitron) > > > On Feb 15, 2018 00:15, "Nuria Ruiz" <nu...@wikimedia.org> wrote: > > Hello from Analytics team: > > Just a brief note to announce that Wikistats 2.0 includes data about > pageviews per project per country for the c

[Wikitech-l] Wikistats 2.0 - Now with Maps!

2018-02-14 Thread Nuria Ruiz
Hello from Analytics team: Just a brief note to announce that Wikistats 2.0 includes data about pageviews per project per country for the current month. Take a look, pageviews for Spanish Wikipedia this current month: https://stats.wikimedia.org/v2/#/es.wikipedia.org/reading/pageviews-by-country

Re: [Analytics] Page hourly views

2018-02-11 Thread Nuria Ruiz
Sorry, not sure we understand this question. Can you elaborate? On Sun, Feb 11, 2018 at 12:10 PM, Bo Han wrote: > Hello, > > Is the process for generating pageview hourly backed up? > > Thank you > > ___ > Analytics mailing list

Re: [Analytics] [Ops] How best to accurately record page interactions in Page Previews

2018-02-07 Thread Nuria Ruiz
>Regarding the last few posts about the geolocation information, from the data analysis perspective, there is indeed another, more serious concern about using the GeoIP cookie: >It will create significant discrepancies with the existing geolocation data we record for pageviews, where we have

Re: [Analytics] [Ops] How best to accurately record page interactions in Page Previews

2018-02-01 Thread Nuria Ruiz
>Wow Sam, yeah, if this cookie works for you, it will make many things much easier for us This is how it is done on performance schemas for Navigation timing data per country, so there is a precedent.

Re: [Analytics] [Product] Fwd: Session #6 and into all hands

2018-01-31 Thread Nuria Ruiz
Sorry, my last correspondence was for analytics-internal@ On Wed, Jan 31, 2018 at 8:29 AM, Nuria Ruiz <nu...@wikimedia.org> wrote: > If you have time, do skim through these docs. I will do the same between > today and tomorrow, they are pretty informative as to how annual plan i

[Analytics] Fwd: [Product] Fwd: Session #6 and into all hands

2018-01-31 Thread Nuria Ruiz
If you have time, do skim through these docs. I will do the same between today and tomorrow, they are pretty informative as to how annual plan is and what audiences is doing. -- Forwarded message -- From: Jon Katz Date: Tue, Jan 30, 2018 at 8:16 PM
