Re: [Analytics] Wiki Workshop 2020 Announcement and Call for Papers

2020-03-15 Thread Pine W
Hi Leila,

Thank you for the updates. I have one small question. Will the
sessions which are available on Zoom also be recorded for later
viewing? On occasion, I watch or share presentations after they have
occurred.

Pine
( https://meta.wikimedia.org/wiki/User:Pine )

On Fri, Mar 13, 2020 at 9:43 PM Leila Zia  wrote:
>
> Hi all,
>
> We have an update for you regarding Wiki Workshop 2020 [0] in light of
> the global health situation related to COVID-19.
>
> ==Summary==
> We have turned Wiki Workshop 2020 from an in-person event to a fully
> virtual event. This was not an easy decision for us to make. We know
> that past years' attendees gained a lot from the in-person set-up of
> the workshop. This being said, we're excited about the opportunity of
> organizing the workshop in a virtual set-up: this allows us to reduce
> our carbon foot-print and to allow more people to benefit from the
> workshop.
>
> For this year's workshop, we have decided to remove the registration
> cost, which removes one more barrier to participation. The
> workshop will take place, as originally planned, on April 21 2020. We
> have changed the time of the workshop from Taipei's local time to
> afternoon UTC until evening UTC. (Exact times will be announced in the
> coming couple of weeks.) We also want you to know that we're working
> hard to transform the workshop program to one that can be engaging in
> a virtual set-up. We are making good progress on this front, thanks to
> the immense flexibility of everyone who is working with us including
> our speakers and the authors of the papers. Look for more information
> in the coming weeks about how to register (for free). If you want to
> know more, please read on! :)
>
> ==Where?==
> Wiki Workshop 2020 is going fully virtual. All talks, conversations,
> poster sessions, and one on one meetings are moved to a virtual
> environment.
>
> ==When?==
> April 21, 2020. We will start in the afternoon UTC and will end in the
> evening UTC. Note that this is a change from the original plan to
> start at 9:00 local time in Taipei. We expect to be able to finalize
> the start and end times of the workshop no later than 2020-03-27.
>
> ==How?==
> We are testing a few different video communication options and most
> likely we will go with Zoom [1]. There is no cost for downloading
> Zoom, and there is even a web browser version of it. However, some of
> the features we will use, such as breakout rooms, will work more
> smoothly if you download Zoom. We will send specific instructions for
> how to connect to those who register for the event.
>
> ==Registration==
> If you are not an author of an accepted archival paper, you can
> request to attend the event for free. We will send the details for how
> to submit your request by 2020-03-27. We will review all registration
> requests and will let you know if your registration is through.
>
> If you are an author of an accepted paper in the workshop, you will
> need to make sure at least one of the authors of your paper is
> registered for a 1-day (or more) in-person registration option offered
> by the Web Conference 2020 organizers or the 5-day virtual attendance
> registration for the conference. Link to register:
> https://www2020.thewebconf.org/registration . Please note that your
> paper will be removed from the proceedings of the conference if you do
> not take this step and we, as workshop organizers, don't have any
> means to fix that for you.
>
> ==Program==
> We traditionally had 5-6 invited talks (45-min each) in Wiki Workshop
> along with a Featured and Lightning Talk session by the authors of the
> accepted papers followed by a poster session. The duration of the
> workshop in the old set-up was 8 hours.
>
> We have no doubt that our traditional model for the program has to
> change for this year. We know that the dynamics of engagement in the
> virtual set-ups are different from the in-person set-ups. Here is what
> we're thinking about the high-level format, the details to be
> announced as we finalize the program in the coming few weeks:
> * Ice-breaker: unchanged.
> * Introductions: some way of making sure every person knows at least a
> few other of the participants. (We can do 15-sec intros if we have 10s
> of attendees, but we can't scale that if we have many more.)
> * Keynote plus Q&A
> * A conversation: an interview style back-and-forth between two people
> with room for questions at the end.
> * A conversation: a panel style of conversation. We have a couple of
> topics in mind for the panel and are sorting out details.
> * Featured Talks and Lightning Talks: 10-min or 3-min presentations by
> the authors of the accepted papers. This is unchanged from last year,
> except that after this session we would go to a poster session and
> this year we are working on a topic-based virtual poster session. The
> authors will receive more information about this.
> * The virtual poster session
> * Wrap-up
> * One on one meetings: This 

[Analytics] Fwd: [Wikitech-l] [Wikimedia Technical Talks] Data and Decision Science at Wikimedia with Kate Zimmerman, 26 February 2020 @ 6PM UTC

2020-02-24 Thread Pine W
Hello colleagues,

I'm forwarding this announcement to additional email lists.

Most public WMF meetings that are livestreamed on Youtube remain
available for replay after the meeting, and I'm guessing that this one
will be also.

Pine
( https://meta.wikimedia.org/wiki/User:Pine )


---------- Forwarded message ---------
From: Srishti Sethi 
Date: Mon, Feb 24, 2020 at 8:59 PM
Subject: Re: [Wikitech-l] [Wikimedia Technical Talks] Data and
Decision Science at Wikimedia with Kate Zimmerman, 26 February 2020 @
6PM UTC
To: Wikimedia developers 


Hello folks,

Just a reminder that this talk will take place Wednesday 26 February 2020
at 6 PM UTC.

Hope to see you there!

Cheers,
Srishti
*Srishti Sethi*
Developer Advocate
Wikimedia Foundation 



On Tue, Feb 18, 2020 at 2:57 PM Sarah R  wrote:

> Hello Everyone,
>
> It's time for Wikimedia Tech Talks 2020 Episode 1! This talk will take
> place on *26 February 2020 at 6 PM UTC*.
>
> This month's talk will be in an interview format. You are invited to send
> questions ahead of time by replying to this email, or you can ask during Q
> & A section of the live talk by asking through IRC or the Youtube
> Livestream.
>
> Title: Data and Decision Science at Wikimedia
>
> Speaker:  Kate Zimmerman,  Head of Product Analytics at Wikimedia
>
> Summary:
>
> How do teams at the Foundation use data to inform decisions?
>
> Sarah R. Rodlund talks with Kate Zimmerman, Head of Product Analytics at
> Wikimedia, about what sorts of data her team uses and how insights from
> their analysis have shaped product decisions.
>
> Kate Zimmerman holds an MS in Psychology & Behavioral Decision Research
> from Carnegie Mellon University and has over 15 years of experience in
> quantitative and experimental methods. Before joining Wikimedia, she built
> data teams from scratch at ModCloth and SmugMug, evolving their data
> capabilities from basic reports to strategic analysis, automated
> dashboards, and advanced modeling.
>
> The link to the Youtube Livestream can be found here:
> https://www.youtube.com/watch?v=J-CRsiwYM9w
>
> During the live talk, you are invited to join the discussion on IRC at
> #wikimedia-office
>
> You can watch past Tech Talks here:
> https://www.mediawiki.org/wiki/Tech_talks
>
> If you are interested in giving your own tech talk, you can learn more
> here:
>
> https://www.mediawiki.org/wiki/Project:Calendar/How_to_schedule_an_event#Tech_talks
>
> Note: This is a public talk. Feel free to distribute through appropriate
> email and social channels!
>
> Many kindnesses,
>
> Sarah R. Rodlund
> Technical Writer, Developer Advocacy
> srodl...@wikimedia.org
> ___
> Wikitech-l mailing list
> wikitec...@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l

___
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics


Re: [Analytics] Announcement - Mediawiki History Dumps

2020-02-10 Thread Pine W
I was thinking about the licensing issue some more. Apparently there
was a relevant United States court case regarding metadata several
years ago, but it's unclear to me from my brief web search whether
this holding would apply to metadata from every nation. Also, I don't
know if the underlying statutes have changed since the time of that
ruling. I think that WMF Legal should be
consulted regarding the copyright status of the metadata. Also, I
think that the licensing of metadata should be explicitly addressed in
the Terms of Use or a similar document which is easily accessible to
all contributors to Wikimedia sites.

Pine
( https://meta.wikimedia.org/wiki/User:Pine )

On Tue, Feb 11, 2020 at 12:17 AM Pine W  wrote:
>
> Hi Joseph,
>
> Thanks for this announcement.
>
> I am looking for license information regarding the dumps, and I'm not
> finding it in the pages that you linked at [1] or [2]. The license
> that applies to text on Wikimedia sites is often CC-BY-SA 3.0, and the
> WMF Terms of Use at https://foundation.wikimedia.org/wiki/Terms_of_Use
> do not appear to provide any exception for metadata. In the absence of
> a specific license, I think that the CC-BY-SA or other relevant
> licenses would apply to the metadata, and that the licensing
> information should be prominently included on relevant pages and in
> the dumps themselves.
>
> What do you think?
>
> Pine
> ( https://meta.wikimedia.org/wiki/User:Pine )
>
> On Mon, Feb 10, 2020 at 4:28 PM Joseph Allemandou
>  wrote:
> >
> > Hi Analytics People,
> >
> > The Wikimedia Analytics Team is pleased to announce the release of the most 
> > complete dataset we have to date to analyze content and contributors 
> > metadata: Mediawiki History [1] [2].
> >
> > Data is in TSV format, released monthly around the 3rd of the month 
> > usually, and every new release contains the full history of metadata.
> >
> > The dataset contains an enhanced [3] and historified [4] version of user,
> > page and revision metadata and serves as the base for the Wikistats API on
> > edits, users and pages [5] [6].
> >
> > We hope you will have as much fun playing with the data as we have building 
> > it, and we're eager to hear from you [7], whether for issues, ideas or 
> > usage of the data.
> >
> > Analytically yours,
> >
> > --
> > Joseph Allemandou (joal) (he / him)
> > Sr Data Engineer
> > Wikimedia Foundation
> >
> > [1] https://dumps.wikimedia.org/other/mediawiki_history/readme.html
> > [2] 
> > https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Edits/Mediawiki_history_dumps
> > [3] Many pre-computed fields are present in the dataset, from edit-counts 
> > by user and page to reverts and reverted information, as well as time 
> > between events.
> > [4] Historical usernames and page titles (as well as user groups and
> > blocks), as accurate as possible, are available in addition to current
> > values, and are provided in a denormalized way for every event of the dataset.
> > [5] https://wikitech.wikimedia.org/wiki/Analytics/AQS/Wikistats_2
> > [6] https://wikimedia.org/api/rest_v1/
> > [7] 
> > https://phabricator.wikimedia.org/maniphest/task/edit/?title=Mediawiki%20History%20Dumps=Analytics-Wikistats,Analytics
> > ___
> > Analytics mailing list
> > Analytics@lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/analytics
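
For readers who want to try the TSV files mentioned in the announcement, here is a minimal sketch of tallying events by entity and type. It assumes the files are headerless and tab-separated, and the column positions used below are hypothetical placeholders; the readme linked at [1] is the place to confirm the real field order. The sample rows are made up.

```python
import csv
import io
from collections import Counter

# ASSUMPTION: these column positions are illustrative only; confirm the
# real field order in the dump readme before relying on them.
EVENT_ENTITY_COL = 1  # hypothetical index of the entity field (user/page/revision)
EVENT_TYPE_COL = 2    # hypothetical index of the event type field


def count_events(tsv_lines, entity_col=EVENT_ENTITY_COL, type_col=EVENT_TYPE_COL):
    """Count (entity, event type) pairs in headerless tab-separated rows."""
    reader = csv.reader(tsv_lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    counts = Counter()
    for row in reader:
        if len(row) <= max(entity_col, type_col):
            continue  # skip short or malformed rows
        counts[(row[entity_col], row[type_col])] += 1
    return counts


# Made-up sample rows, not real dump content.
sample = io.StringIO(
    "examplewiki\trevision\tcreate\t2020-01-01 00:00:00.0\n"
    "examplewiki\trevision\tcreate\t2020-01-02 00:00:00.0\n"
    "examplewiki\tpage\tmove\t2020-01-03 00:00:00.0\n"
)
counts = count_events(sample)
```

Because each monthly release contains the full history, a streaming pass like this avoids loading a whole file into memory.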



Re: [Analytics] Announcement - Mediawiki History Dumps

2020-02-10 Thread Pine W
Hi Joseph,

Thanks for this announcement.

I am looking for license information regarding the dumps, and I'm not
finding it in the pages that you linked at [1] or [2]. The license
that applies to text on Wikimedia sites is often CC-BY-SA 3.0, and the
WMF Terms of Use at https://foundation.wikimedia.org/wiki/Terms_of_Use
do not appear to provide any exception for metadata. In the absence of
a specific license, I think that the CC-BY-SA or other relevant
licenses would apply to the metadata, and that the licensing
information should be prominently included on relevant pages and in
the dumps themselves.

What do you think?

Pine
( https://meta.wikimedia.org/wiki/User:Pine )

On Mon, Feb 10, 2020 at 4:28 PM Joseph Allemandou
 wrote:
>
> Hi Analytics People,
>
> The Wikimedia Analytics Team is pleased to announce the release of the most 
> complete dataset we have to date to analyze content and contributors 
> metadata: Mediawiki History [1] [2].
>
> Data is in TSV format, released monthly around the 3rd of the month usually, 
> and every new release contains the full history of metadata.
>
> The dataset contains an enhanced [3] and historified [4] version of user,
> page and revision metadata and serves as the base for the Wikistats API on
> edits, users and pages [5] [6].
>
> We hope you will have as much fun playing with the data as we have building 
> it, and we're eager to hear from you [7], whether for issues, ideas or usage 
> of the data.
>
> Analytically yours,
>
> --
> Joseph Allemandou (joal) (he / him)
> Sr Data Engineer
> Wikimedia Foundation
>
> [1] https://dumps.wikimedia.org/other/mediawiki_history/readme.html
> [2] 
> https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Edits/Mediawiki_history_dumps
> [3] Many pre-computed fields are present in the dataset, from edit-counts by 
> user and page to reverts and reverted information, as well as time between 
> events.
> [4] Historical usernames and page titles (as well as user groups and
> blocks), as accurate as possible, are available in addition to current
> values, and are provided in a denormalized way for every event of the dataset.
> [5] https://wikitech.wikimedia.org/wiki/Analytics/AQS/Wikistats_2
> [6] https://wikimedia.org/api/rest_v1/
> [7] 
> https://phabricator.wikimedia.org/maniphest/task/edit/?title=Mediawiki%20History%20Dumps=Analytics-Wikistats,Analytics
> ___
> Analytics mailing list
> Analytics@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/analytics



[Analytics] Effects on Wikimedia web traffic trends from sites that reuse Wikimedia content and/or trademarks

2019-07-27 Thread Pine W
Hi WMF Analytics,

In my web searches in the past few months I am seeing an increasing number
of websites that have republished Wikimedia content, sometimes in ways that
I suspect are in violation of trademark and/or Creative Commons licensing
rules. (My guess is that these sites make money through advertising that
they place on their sites.) Has WMF observed any negative effects in web
traffic that can be attributed to other websites reusing Wikimedia content
and/or trademarks?

It might be interesting if WMF can obtain statistics from web search
providers regarding how many times users click on search engine links to
sites that reuse Wikimedia content and/or trademarks.

Pine
( https://meta.wikimedia.org/wiki/User:Pine )


Re: [Analytics] Pageviews by agent for May 18-21 2015

2018-11-13 Thread Pine W
Hi there,

Although this doesn't answer your specific question, I thought that I'd
share that my observations from watching traffic patterns on some Wikimedia
pages suggest that the classification of readers into bot, spider, or
human has some margin of error, but I don't know how large that margin
is. It might be worth considering as you analyze the traffic that
interests you, especially if you have reason to believe that the margin
of error is large enough to affect your results.

Pine
( https://meta.wikimedia.org/wiki/User:Pine )


On Tue, Nov 13, 2018 at 2:41 PM Jennifer Pan  wrote:

> Hi there,
>
>
> I'm an assistant professor in the Department of Communication at Stanford.
> My co-author, Molly Roberts (Political Science, UCSD), and I are working on
> a paper examining the effect of China's 2015 block of Chinese language
> wikipedia on pageviews, which builds on our previous work on censorship in
> China.
>
> We are using the block to conduct an interrupted time series design to
> measure the effect of censorship on Chinese users. Our main finding is that
> Chinese users were using Wikipedia to browse (starting at the home page),
> and the block influenced users' ability to explore and encounter unexpected
> information. One question we have is whether the pageviews we observe are
> driven by bots and spiders. We know that the Wikimedia REST API provides
> this information going back to July 1 2015. Since the China block of
> Wikipedia was on May 19, 2015, we are wondering if there is pageview data
> by agent type for zh.wikipedia.org pages (all or some subset like most
> popular) going back to May 2015 (specifically May 18-21, 2015)? From
> https://meta.wikimedia.org/wiki/Research:Timeline_of_Wikimedia_analytics,
> it says that pageview data is available in bulk starting on May 1, 2015,
> so we thought maybe there was some chance this data exists.
>
> Any suggestions would be greatly appreciated, and if this is not possible,
> please let us know.
>
> Thank you!
> Jennifer Pan
>
> ___
> Analytics mailing list
> Analytics@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/analytics
>
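
As a small illustration of the per-agent data discussed in this thread, the sketch below only builds request URLs for the aggregate pageviews route of the Wikimedia REST API, split by agent type. The path layout (project, access, agent, granularity, start, end, with YYYYMMDDHH timestamps) is my understanding of that route and should be checked against the API documentation; `pageviews_url` is a hypothetical helper name. The dates start on July 1, 2015 because, per the message above, that is as far back as the API goes, so this does not cover the May 2015 window in question.

```python
from urllib.parse import quote

# ASSUMPTION: route layout as I understand it; verify against the REST API docs.
BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/aggregate"


def pageviews_url(project, agent, start, end, access="all-access", granularity="daily"):
    """Build an aggregate-pageviews URL; start and end are YYYYMMDD strings."""
    # The aggregate route is assumed to take YYYYMMDDHH, so pad with hour 00.
    parts = [project, access, agent, granularity, start + "00", end + "00"]
    return BASE + "/" + "/".join(quote(p, safe="") for p in parts)


# One URL per agent type, so human and spider traffic can be compared.
urls = {
    agent: pageviews_url("zh.wikipedia.org", agent, "20150701", "20150707")
    for agent in ("user", "spider")
}
```

No request is sent here; the URLs can be fetched with any HTTP client, ideally with a descriptive User-Agent header.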


Re: [Analytics] Statistics about republication of Wikimedia content

2018-10-19 Thread Pine W
On Wed, Oct 17, 2018 at 7:38 PM Leila Zia  wrote:

> Hi Pine,
>
>
> On Tue, Sep 18, 2018 at 12:11 PM Pine W  wrote:
> >
> > Hi Analytics,
> >
> > Are views of republished Wikimedia content, such as on Google and
> Youtube, something that we could include in addition to Wikimedia pageview
> statistics? I imagine that this would require cooperation from Alphabet and
> other companies that reuse Wikimedia content. It would be nice if we could
> get that cooperation.
>
> This is an interesting idea, and as Dan has mentioned in his response,
> something that we're generally interested in. Measuring re-use can
> open up a lot of opportunities for us as a Movement: that the
> importance of Wikipedia does not end in Wikipedia, that the content
> and knowledge is presented to different audiences through a variety of
> channels.
>
> While we may be able to start getting some raw numbers for re-use from
> specific platforms (through cooperations that you called out or other
> means), the problem is much more complex than what those raw numbers
> can show and a part of me is interested to address that more
> fundamental question and not summarize the value of Wikipedia with
> direct pageviews. We all know that the value of WP doesn't end in
> Wikipedia. For example, the exact/rough value of Wikipedia for
> Knowledge Vault [1] which was/is the underlying mechanism for
> surfacing search results in Google and other major websites' products
> is unknown to us. It is easier to measure how many times Wikipedia is
> directly used in Google Home, Alexa, Google/bing/etc. search, and
> harder to see the value of Wikipedia for many of the services we enjoy
> using today on and outside of the Web (including search logic,
> Google/Yandex/etc. translation machines, many of the advancements in
> AI and ML fields (NLP field has highly benefited from WP for example
> and NLP is heavily used across many industries), ...).
>
> From the research perspective, the really interesting and informative
> research question is: what is the value of Wikipedia (both economic
> and otherwise) across the many languages? It would be great to be able
> to get to the bottom of this question. If we can measure this, we will
> have a powerful way to open more doors for Wikipedia.
>
> Best,
> Leila
>
> [1]
> https://storage.googleapis.com/pub-tools-public-publication-data/pdf/45634.pdf
>
>
Hi Leila,

I like how you're thinking about this. I think that Lisa in Fundraising
made a public statement along similar lines earlier this year, which went
something like "Wikipedia is like a public utility that people take for
granted."

Pine
( https://meta.wikimedia.org/wiki/User:Pine )


[Analytics] Statistics about republication of Wikimedia content

2018-09-18 Thread Pine W
Hi Analytics,

Are views of republished Wikimedia content, such as on Google and Youtube,
something that we could include in addition to Wikimedia pageview
statistics? I imagine that this would require cooperation from Alphabet and
other companies that reuse Wikimedia content. It would be nice if we could
get that cooperation.

Also, is this republication taken into account in website traffic rankings?
My guess is that the answer is no, and that other types of republication
such as embedded Youtube videos are not taken into account for their
content provider's site rankings, although I think that Youtube would count
views of embedded videos in its own statistics of video views. I am
thinking that for Youtube and Wikipedia, and other similar sites for which
republication or embedding are common, site rankings which are based on
pageviews could significantly underestimate the popularity and influence of
the sites.

Regards,

Pine
( https://meta.wikimedia.org/wiki/User:Pine )


[Analytics] Fwd: [Wikitech-l] Change tag reading from the new column in the beta cluster

2018-09-06 Thread Pine W
Forwarding to Analytics and Research in case this is of interest.

Pine
( https://meta.wikimedia.org/wiki/User:Pine )


---------- Forwarded message ---------
From: Amir Sarabadani 
Date: Thu, Sep 6, 2018 at 9:19 PM
Subject: [Wikitech-l] Change tag reading from the new column in the beta
cluster
To: Wikimedia developers 


Hello,
As part of normalizing change tag schema [1] I just switched on reading
from the new column (ct_tag_id in change_tag table, a foreign key to ctd_id
from change_tag_def table) in beta cluster [2] which means new rows will
have empty string as their value of ct_tag. [3]

We are not rushing to flip the switch in production, but I wanted to
send this email asking people who test in the beta cluster to file a
Phabricator ticket if they see anything unexpected there that might be
related to change tags. This table is read when someone checks history,
recent changes, watchlist, user contributions, or a whole lot of other
special pages, plus lots of API queries. I checked everything I could
think of, but I might have missed something. Any extra pair of eyes would be
extremely appreciated.

[1]: https://phabricator.wikimedia.org/T185355
[2]: https://phabricator.wikimedia.org/T196671
[3]: For example:
MariaDB [enwiki]> select ct_id, ct_rc_id, ct_rev_id, ct_tag, ct_tag_id from
change_tag order by ct_id desc limit 10;
+--------+----------+-----------+-----------------------+-----------+
| ct_id  | ct_rc_id | ct_rev_id | ct_tag                | ct_tag_id |
+--------+----------+-----------+-----------------------+-----------+
| 217824 |   633991 |    384018 |                       |         3 |
| 217823 |   633990 |    384017 |                       |         3 |
| 217822 |   633989 |    384016 |                       |         3 |
| 217821 |   633988 |    384015 |                       |         3 |
| 217820 |   633987 |    384014 |                       |         3 |
| 217819 |   633986 |    384013 | mw-undo               |         2 |
| 217818 |   633985 |    384012 | mw-undo               |         2 |
| 217817 |   633984 |    384011 | visualeditor-wikitext |        29 |
| 217816 |   633983 |    384010 | mobile web edit       |        16 |
| 217815 |   633983 |    384010 | mobile edit           |        15 |
+--------+----------+-----------+-----------------------+-----------+
10 rows in set (0.00 sec)
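
Since new rows carry an empty ct_tag, code that needs the tag name has to resolve it through ct_tag_id. The following is a minimal sketch of that lookup using an in-memory SQLite database rather than the real MariaDB schema. The tables are cut down to a few columns, the name column `ctd_name` is an assumption (the message above only names `ctd_id`), and row 217900 is made up to stand in for a new-style row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Simplified stand-ins for the real tables; ctd_name is an assumed column.
    CREATE TABLE change_tag_def (ctd_id INTEGER PRIMARY KEY, ctd_name TEXT);
    CREATE TABLE change_tag (
        ct_id INTEGER PRIMARY KEY,
        ct_rev_id INTEGER,
        ct_tag TEXT,
        ct_tag_id INTEGER REFERENCES change_tag_def (ctd_id)
    );
    """
)
# Tag ids and names taken from the example output above.
conn.executemany(
    "INSERT INTO change_tag_def VALUES (?, ?)",
    [(2, "mw-undo"), (15, "mobile edit"), (16, "mobile web edit")],
)
conn.executemany(
    "INSERT INTO change_tag VALUES (?, ?, ?, ?)",
    [
        (217815, 384010, "mobile edit", 15),
        (217816, 384010, "mobile web edit", 16),
        (217900, 384100, "", 2),  # made-up new-style row: empty ct_tag
    ],
)
# Resolve the tag name via the foreign key instead of reading ct_tag.
rows = conn.execute(
    """
    SELECT ct.ct_id, ctd.ctd_name
    FROM change_tag AS ct
    JOIN change_tag_def AS ctd ON ctd.ctd_id = ct.ct_tag_id
    ORDER BY ct.ct_id
    """
).fetchall()
```

The join returns a name for old and new rows alike, which is why queries written this way should keep working once ct_tag is no longer populated.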

Thank you!
Best
-- 
Amir Sarabadani
Software Engineer

Wikimedia Deutschland e. V. | Tempelhofer Ufer 23-24 | 10963 Berlin
Tel. (030) 219 158 26-0
http://wikimedia.de

Stellen Sie sich eine Welt vor, in der jeder Mensch an der Menge allen
Wissens frei teilhaben kann. Helfen Sie uns dabei!
http://spenden.wikimedia.de/

Wikimedia Deutschland – Gesellschaft zur Förderung Freien Wissens e. V.
Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg unter
der Nummer 23855 B. Als gemeinnützig anerkannt durch das Finanzamt für
Körperschaften I Berlin, Steuernummer 27/029/42207.


[Analytics] Fwd: [Wikitech-l] huwiki, arwiki to be treated as 'big wikis' and run parallel jobs

2018-08-20 Thread Pine W
More changes are coming for dumps, this time for Hungarian Wikipedia
(approximately 436,000 articles) and Arabic Wikipedia (approximately
595,000 articles).

Pine
( https://meta.wikimedia.org/wiki/User:Pine )


---------- Forwarded message ---------
From: Ariel Glenn WMF 
Date: Mon, Aug 20, 2018 at 10:27 AM
Subject: [Wikitech-l] huwiki, arwiki to be treated as 'big wikis' and run
parallel jobs
To: Wikipedia Xmldatadumps-l ,
Wikimedia developers 


Starting September 1, huwiki and arwiki, which both take several days to
complete the revision history content dumps, will be moved to the 'big
wikis' list, meaning that they will run jobs in parallel as do frwiki,
ptwiki and others now, for a speedup.

Please update your scripts accordingly.  Thanks!

Task for this: https://phabricator.wikimedia.org/T202268

Ariel


[Analytics] Fwd: [Wikitech-l] MultiContent Revisions and changes to the XML dumps

2018-08-03 Thread Pine W
Forwarding in case this is of interest to people on the Analytics or
Research lists who don't subscribe to Wikitech-l or Xmldatadumps-l.

Pine
( https://meta.wikimedia.org/wiki/User:Pine )

---------- Forwarded message ----------
From: Ariel Glenn WMF 
Date: Thu, Aug 2, 2018 at 2:40 PM
Subject: [Wikitech-l] MultiContent Revisions and changes to the XML dumps
To: Wikipedia Xmldatadumps-l ,
Wikimedia developers 


As many of you may know, MultiContent Revisions are coming soon (October?)
to a wiki near you. This means that we need changes to the XML dumps
schema; these changes will likely NOT be backwards compatible.

Initial discussion will take place here:
https://phabricator.wikimedia.org/T199121

For background on MultiContent Revisions and their use on e.g. Commons or
Wikidata, see:

https://phabricator.wikimedia.org/T200903 (Commons media metadata)
https://phabricator.wikimedia.org/T194729 (Wikidata entities)
https://www.mediawiki.org/wiki/Requests_for_comment/Multi-Content_Revisions
(MCR generally)

There may be other, better tickets/pages for background; feel free to
supplement this list if you have such links.

Ariel


[Analytics] Fwd: [Wikitech-l] hewiki dump to be added to 'big wikis' and run with multiple processes

2018-07-23 Thread Pine W
Forwarding in case this is of interest to anyone on the Analytics or
Research lists who doesn't subscribe to Wikitech-l or Xmldatadumps-l.

Pine
( https://meta.wikimedia.org/wiki/User:Pine )

---------- Forwarded message ----------
From: Ariel Glenn WMF 
Date: Fri, Jul 20, 2018 at 5:53 AM
Subject: [Wikitech-l] hewiki dump to be added to 'big wikis' and run with
multiple processes
To: Wikipedia Xmldatadumps-l ,
Wikimedia developers 


Good morning!

The pages-meta-history dumps for hewiki take 70 hours these days, the
longest of any wiki not already running with parallel jobs. I plan to add
it to the list of 'big wikis' starting August 1st, meaning that 6 jobs will
run in parallel producing the usual numbered file output; look at e.g.
frwiki dumps for an example.

Please adjust any download/processing scripts accordingly.

Thanks!

Ariel
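
For anyone adjusting a download or processing script, here is a rough sketch of handling both the single-file layout and the numbered output from parallel jobs. The file name pattern is an assumption based on how I understand the 'big wikis' listings to look (a part number after `pages-meta-history` and a page range suffix); compare it with a real frwiki listing before use. `group_history_files` is a hypothetical helper and the sample names are made up.

```python
import re
from collections import defaultdict

# ASSUMPTION: file name layout; compare with a real dump listing (e.g. frwiki).
PART_RE = re.compile(
    r"^(?P<wiki>\w+)-(?P<date>\d{8})-pages-meta-history"
    r"(?P<part>\d*)\.xml(?:-p(?P<first>\d+)p(?P<last>\d+))?\.(?P<ext>bz2|7z)$"
)


def group_history_files(names):
    """Group history dump files by (wiki, date), ordered by part and page range."""
    groups = defaultdict(list)
    for name in names:
        m = PART_RE.match(name)
        if m is None:
            continue  # not a pages-meta-history file
        part = int(m["part"] or 0)    # 0 means the single-file layout
        first = int(m["first"] or 0)  # first page id in the file, if present
        groups[(m["wiki"], m["date"])].append((part, first, name))
    return {key: [n for _, _, n in sorted(files)] for key, files in groups.items()}


# Made-up file names for illustration.
names = [
    "hewiki-20180801-pages-meta-history2.xml-p1000p5000.bz2",
    "hewiki-20180801-pages-meta-history1.xml-p1p999.bz2",
    "hewiki-20180701-pages-meta-history.xml.bz2",
    "hewiki-20180801-md5sums.txt",
]
grouped = group_history_files(names)
```

Treating the single file as part 0 lets the same loop process runs from before and after the switch.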


[Analytics] Wikiscan statistics tool for Wikimedia projects

2017-07-30 Thread Pine W
Wikiscan is an interesting tool for statistics fans. I suggest briefly
reading this IEG page, then playing with the tool on https://wikiscan.org/

Pine


Re: [Analytics] Fwd: follow-up on editors

2017-04-11 Thread Pine W
Maybe looping back a little to Aaron's original question, which I'm guessing
is from Shannon: we don't have a way to measure "Number of editors who
contribute 1 edit per month" as we don't have ways to accurately identify
people who use multiple accounts, IPs, etc. We do have ways to track the
number of unique accounts, but that's different from the number of unique
editors. Although it would be nice to know how many people edit the projects
in any given month, that's impossible to know exactly, although maybe Nuria
and the other analytics folks would have some ideas on how to get an
approximation.

Pine
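
To make the distinction between accounts and editors concrete, here is a small sketch that counts accounts per month at the 1+, 5+ and 100+ edit thresholds discussed in this thread. It counts accounts only, so a person using several accounts or IPs is counted more than once. The function name and the input shape (one `(account, month)` pair per edit) are assumptions for illustration, and the data is made up.

```python
from collections import Counter


def accounts_by_threshold(edits, thresholds=(1, 5, 100)):
    """edits: iterable of (account, 'YYYY-MM') pairs, one pair per edit.

    Returns {month: {threshold: number of accounts with >= threshold edits}}.
    Counts accounts, not people: one person may use several accounts or IPs.
    """
    per_account = Counter(edits)  # (account, month) -> edit count
    result = {}
    for (account, month), n in per_account.items():
        buckets = result.setdefault(month, {t: 0 for t in thresholds})
        for t in thresholds:
            if n >= t:
                buckets[t] += 1
    return result


# Made-up data: account A makes 120 edits, B makes 5, C makes 1.
edits = [("A", "2017-03")] * 120 + [("B", "2017-03")] * 5 + [("C", "2017-03")]
summary = accounts_by_threshold(edits)
```

Reporting the thresholds as plain numbers keeps the output free of labels such as "active editor".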


On Tue, Apr 11, 2017 at 9:48 PM, Pine W <wiki.p...@gmail.com> wrote:

> If we're going to have a conversation about terminology, I would like to
> drop the terms "active editor" and "highly active editor" and replace them
> with "5+ edits per month" and "100+ edits per month". There are multiple
> ways of measuring productivity, and I'm wary of the amount of prominence
> that's given to the number of edits as the primary metric of productivity.
> Also, I don't think it's clear to analytics newbies that "active editor"
> is a term with a specific definition rather than a general description of
> people who edit "actively" (whatever that means). I'm fine with using 5+
> edits per month and 100+ edits per month as measures of productivity, but I
> would prefer to drop the terms "active editor" and "very active editor".
> I'd also like to see more prominence given to other metrics such as bytes
> changed and logged non-edit actions.
>
> Pine
>
>
> On Tue, Apr 11, 2017 at 10:22 AM, Erik Zachte <ezac...@wikimedia.org>
> wrote:
>
>> Aaron,
>>
>>
>>
>> Yeah my analogy is arguably imprecise.
>>
>> And for your analogy, you assume that the public astronomy database is
>> guarded Nupedia style, with credentials. Could be, explicit mention of this
>> assumption would resolve ambiguity ;-)
>>
>> > Our licensing asserts that they must be attributed.
>>
>>
>>
>> Sure these people who did one edit must be attributed whenever the page
>> they edited is published somewhere else.
>>
>> But do we ever do that for real these days? Seems like a dead clause from
>> a distant past, except for our onwiki history page.
>>
>>
>>
>> Also giving credit is something else than counting, and publishing that
>> count as some meaningful metric (not saying that you want to do that, but
>> others will find the factoid and run with it)
>>
>> We can discuss semantics. But when a person writes one word a year we
>> wouldn't call that person a 'writer', would we?
>>
>> Words lose their meaning if their definition is stretched in extremo,
>> beyond common sense, beyond what any audience assumes those words mean.
>>
>>
>>
>> Long ago we found that a huge amount of registered users made not even
>> one edit.
>>
>> One explanation might be that many people habitually sign up, just out of
>> habit. Or that they want to tweak the UI (e.g. red links in preferences).
>>
>>
>>
>> My point: count as you like, but could we avoid using a term with so many
>> connotations for these edge cases, so as not to confuse people even more
>> about our metrics?
>>
>>
>>
>> Erik
>>
>>
>>
>>
>>
>> *From:* Analytics [mailto:analytics-boun...@lists.wikimedia.org] *On
>> Behalf Of *Aaron Halfaker
>> *Sent:* Tuesday, April 11, 2017 16:55
>>
>> *To:* A mailing list for the Analytics Team at WMF and everybody who has
>> an interest in Wikipedia and analytics.
>> *Subject:* Re: [Analytics] Fwd: follow-up on editors
>>
>>
>>
>> Erik,
>>
>>
>>
>> I appreciate pushing back on just looking for bigger metrics, but there's
>> something more important when it comes to measuring people who contribute
>> at least a little bit.  Our licensing asserts that they must be
>> attributed.  After all, they have contributed something.
>>
>>
>>
>> Also, for your astronomy comparison, this would be more like saying that
>> anyone who contributes to publicly recorded astronomy observations is an
>> astronomer -- even if they have only done so once.  In my estimation, that
>> doesn't sound crazy.  Your comparison to "looking at the night sky" is a
>> lot more like reading Wikipedia.
>>
>>
>>
>> -Aaron
>>
>>
>>
>> On Tue, Apr 11, 2017 at 6:35 AM, Erik Zachte <ezac...@wikimedia.org>
>> wrote:
>>
>> Abou

Re: [Analytics] Fwd: follow-up on editors

2017-04-11 Thread Pine W
If we're going to have a conversation about terminology, I would like to
drop the terms "active editor" and "very active editor" and replace them
with "5+ edits per month" and "100+ edits per month". There are multiple
ways of measuring productivity, and I'm wary of the amount of prominence
that's given to the number of edits as the primary metric of productivity.
Also, I don't think it's clear to analytics newbies that "active editor"
is a term with a specific definition rather than a general description of
people who edit "actively" (whatever that means). I'm fine with using 5+
edits per month and 100+ edits per month as measures of productivity, but I
would prefer to drop the terms "active editor" and "very active editor".
I'd also like to see more prominence given to other metrics such as bytes
changed and logged non-edit actions.
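
To make the thresholds concrete, here is a minimal sketch of bucketing editors by their edit count in a single month, using the 5+ and 100+ cut-offs named above. The input mapping and the function name are hypothetical, and this is not a description of how Wikistats computes its numbers.

```python
from collections import Counter


def bucket_editors(edits_per_user):
    """Count editors per activity bucket for one month.

    edits_per_user maps a user name to that user's edit count in the
    month (hypothetical input; real counts would come from a dump or a
    database query). Buckets are cumulative: an editor with 100+ edits
    is also counted under "5+ edits" and "1+ edits".
    """
    buckets = Counter()
    for count in edits_per_user.values():
        if count >= 1:
            buckets["1+ edits"] += 1
        if count >= 5:
            buckets["5+ edits"] += 1
        if count >= 100:
            buckets["100+ edits"] += 1
    return dict(buckets)


# Made-up sample data, for illustration only.
sample = {"A": 1, "B": 7, "C": 250, "D": 0}
print(bucket_editors(sample))
```

Labelling the buckets by their thresholds, rather than by words like "active", keeps the definition visible wherever the number is quoted.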

Pine


On Tue, Apr 11, 2017 at 10:22 AM, Erik Zachte  wrote:

> Aaron,
>
>
>
> Yeah my analogy is arguably imprecise.
>
> And for your analogy, you assume that the public astronomy database is
> guarded Nupedia style, with credentials. Could be, explicit mention of this
> assumption would resolve ambiguity ;-)
>
> > Our licensing asserts that they must be attributed.
>
>
>
> Sure these people who did one edit must be attributed whenever the page
> they edited is published somewhere else.
>
> But do we ever do that for real these days? Seems like a dead clause from
> a distant past, except for our onwiki history page.
>
>
>
> Also giving credit is something else than counting, and publishing that
> count as some meaningful metric (not saying that you want to do that, but
> others will find the factoid and run with it)
>
> We can discuss semantics. But when a person writes one word a year we
> wouldn't call that person a 'writer', would we?
>
> Words lose their meaning if their definition is stretched in extremo,
> beyond common sense, beyond what any audience assumes those words mean.
>
>
>
> Long ago we found that a huge number of registered users made not even one
> edit.
>
> One explanation might be that many people habitually sign up, just out of
> habit. Or that they want to tweak the UI (e.g. red links in preferences).
>
>
>
> My point: count as you like, but could we avoid using a term with so many
> connotations for these edge cases, so as not to confuse people even more
> about our metrics?
>
>
>
> Erik
>
>
>
>
>
> *From:* Analytics [mailto:analytics-boun...@lists.wikimedia.org] *On
> Behalf Of *Aaron Halfaker
> *Sent:* Tuesday, April 11, 2017 16:55
>
> *To:* A mailing list for the Analytics Team at WMF and everybody who has
> an interest in Wikipedia and analytics.
> *Subject:* Re: [Analytics] Fwd: follow-up on editors
>
>
>
> Erik,
>
>
>
> I appreciate pushing back on just looking for bigger metrics, but there's
> something more important when it comes to measuring people who contribute
> at least a little bit.  Our licensing asserts that they must be
> attributed.  After all, they have contributed something.
>
>
>
> Also, for your astronomy comparison, this would be more like saying that
> anyone who contributes to publicly recorded astronomy observations is an
> astronomer -- even if they have only done so once.  In my estimation, that
> doesn't sound crazy.  Your comparison to "looking at the night sky" is a
> lot more like reading Wikipedia.
>
>
>
> -Aaron
>
>
>
> On Tue, Apr 11, 2017 at 6:35 AM, Erik Zachte 
> wrote:
>
> About 'Number of editors who contribute 1 edit per month?'
>
>
>
> I'm hoping we're not going to use that number for our next fundraiser ;-)
>
> The more inclusive our numbers are, the less meaningful, bordering on
> alternative facts.
>
>
>
> A person with one edit in any given month is as much an editor as a person
> who looks at the night sky a few times a year is an astronomer.
>
> We have billions of those on this planet!
>
>
>
> Erik
>
>
>
>
>
> *From:* Analytics [mailto:analytics-boun...@lists.wikimedia.org] *On
> Behalf Of *Neil Patel Quinn
> *Sent:* Friday, March 31, 2017 23:06
> *To:* A mailing list for the Analytics Team at WMF and everybody who has
> an interest in Wikipedia and analytics.
> *Subject:* Re: [Analytics] Fwd: follow-up on editors
>
>
>
> Funny story: I noticed that Aaron's graph has the 1-month new editor
> retention on enwiki at about 7%, while I had recently done some queries
> that put it a little under 4%.
>
> It turns out I made an error in my Unix timestamp math, and I was looking
> at the *12-hour* new editor retention rate. It'll be interesting to see
> if the ranking of wikis by retention changes significantly when I correct
> that.
>
>
>
> On Wed, Mar 29, 2017 at 2:15 PM, Aaron Halfaker 
> wrote:
>
>
> https://commons.wikimedia.org/wiki/File:Enwiki.monthly_user_retention.survival_proportion.svg
>
>
>
> 
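
The timestamp slip Neil mentions above is easy to make when window lengths are written as hand-typed second counts. The sketch below derives the windows from `timedelta` instead. The retention definition used here (any edit at or after the cutoff) is an assumption for illustration only; the thread does not say which definition the graph or the queries used, and the function names are hypothetical.

```python
from datetime import timedelta


def window_seconds(days=0, hours=0):
    """Length of a retention window in seconds, derived rather than hand-typed."""
    return int(timedelta(days=days, hours=hours).total_seconds())


def is_retained(first_edit, later_edits, window):
    """Assumed definition: retained if any later edit happens at or after
    first_edit + window. All values are Unix timestamps in seconds."""
    cutoff = first_edit + window
    return any(ts >= cutoff for ts in later_edits)


ONE_MONTH = window_seconds(days=30)      # 2,592,000 seconds
TWELVE_HOURS = window_seconds(hours=12)  # 43,200 seconds

# An editor who comes back after about 14 hours passes the 12-hour
# window but not the 30-day one.
later = [50_000]
print(is_retained(0, later, TWELVE_HOURS), is_retained(0, later, ONE_MONTH))
```

The two windows differ by a factor of 60, so mixing them up changes which editors count as retained.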

[Analytics] Fwd: [Wiki-research-l] Research Scientist position at WMF

2017-03-28 Thread Pine W
Forwarding.

Pine


-- Forwarded message --
From: Leila Zia 
Date: Tue, Mar 28, 2017 at 10:36 AM
Subject: [Wiki-research-l] Research Scientist position at WMF
To: Research into Wikimedia content and communities <
wiki-researc...@lists.wikimedia.org>


Hi all,

The Research team at the Wikimedia Foundation has just opened a full-time
research scientist position. In the past years, the team has worked on a
variety of projects, including: building ML-based scoring systems for
Wikipedia and Wikidata, recommendation systems for article creation, models
to detect harassment and personal attacks, and more. We are looking to add
one more full-time role to our team to expand our research capacity and
strengthen our collaborations with academia and industry.

If this is the kind of job you're interested in, please consider applying.
If you know people in your network who may be a good fit, please encourage
them to apply.

Best,
Leila

--
Leila Zia
Senior Research Scientist
Wikimedia Foundation
___
Wiki-research-l mailing list
wiki-researc...@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
___
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics


Re: [Analytics] Fwd: follow-up on editors

2017-03-22 Thread Pine W
Hi Aaron and Shannon,

Could you clarify "Average hours spent by editors by segment (5+ edits
and 100+ edits)?" If that's referring to logged-in time, that information
might be available, but keep in mind that a number of contributors do a
variety of Wikimedia-related activities off-wiki, so the measure of time
spent on-wiki will understate the total number of hours spent by
contributors on Wikimedia-related activities.

Regarding numbers of editors, keep in mind that some humans may have more
than 1 account, sometimes for legitimate purposes and sometimes for
illegitimate purposes. Also, getting an accurate count of "anonymous"
editors is difficult.

I don't mean to sound critical of the questions; I just want to emphasize
that answers are likely to be incomplete. (:

Regards,

Pine


On Wed, Mar 22, 2017 at 2:43 PM, Aaron Halfaker 
wrote:

> Hey folks,
>
> I just got the following data requests emailed to me and I figured that
> this list is probably best equipped to answer:
>
> · Number of editors who contribute 1 edit per month?
>
> · Is it possible/feasible to run editor retention metrics
> globally (versus just based on a single project)?
>
> · Total number of editors on all projects over the past 16 years
> (not just ENWP)?
>
> · Global distribution of editors by region (or country), 2016
> (the last I saw is from 2008)?
>
> · Average hours spent by editors by segment (5+ edits and 100+
> edits)?
>
>
>


[Analytics] Fwd: [Wikidata] WMDE looking for a data analyst

2017-02-07 Thread Pine W
Forwarding.

Pine


-- Forwarded message --
From: Léa Lacroix 
Date: Tue, Feb 7, 2017 at 2:45 AM
Subject: [Wikidata] WMDE looking for a data analyst
To: "Discussion list for the Wikidata project." <
wikid...@lists.wikimedia.org>


Hello all,

Our development team is looking for a freelance data analyst to work
remotely, mostly on Wikidata.

The person will:

   - Work closely with product managers and UX researchers to maintain and
   improve detailed on-going analysis of the department’s products, their
   usage patterns and performance.
   - Write database queries and supporting code to analyze usage volume,
   user behaviour and performance data to identify opportunities and areas for
   improvement.
   - Collaborate with other analysts in the department to maintain our
   department’s dashboards, ensuring they are up-to-date, accurate, fair and
   focussed on representations of our product efficiency.
   - Support product managers through rapidly surfacing positive and
   adverse data trends, and complete ad hoc analysis support as needed.
   - Communicate your findings clearly and responsively to a range of
   departmental, organisational, volunteer and public stakeholders in order to
   inform and educate them.

If you want to know more and apply:
https://software.wikimedia.de/jobs/data-analyst

See also our other job offers: https://software.wikimedia.de/jobs

-- 
Léa Lacroix
Project Manager Community Communication for Wikidata

Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin
www.wikimedia.de

Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.

Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg unter
der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das Finanzamt für
Körperschaften I Berlin, Steuernummer 27/029/42207.

___
Wikidata mailing list
wikid...@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Analytics] Making Charts More Interactive

2016-11-16 Thread Pine W
Hi Dhaya,

Thanks for your interest in charts on Wikipedia. I am interested in this
topic too. For future reference, I believe that this thread should be on
the Multimedia mailing list instead of the Analytics mailing list. (: I'm
copying this thread to that list because the thread is likely to be of
interest to people there.

Have a look at https://lists.wikimedia.org/mailman/listinfo/multimedia

Pine


On Wed, Nov 16, 2016 at 7:17 AM, Andrew Otto  wrote:

> Also CC yurik, as he’s doing tons of awesome stuff with interactive maps
> these days.
>
>
>
> On Wed, Nov 16, 2016 at 10:06 AM, Marcel Ruiz Forns 
> wrote:
>
>> Dear Dhaya,
>>
>> Thanks for your comments!
>>
>>> The general legibility of charts in Wikipedia is relatively poor.
>>> We can improve it by making them more interactive and dynamic.
>>
>>
>> I agree with you that there is room for improvement when it comes to
>> visualizations in Wikipedia.
>> Actually, "Handling wiki content beyond plaintext" (which includes
>> graphs) is one of the hot topics of the Mediawiki Developer Summit[1] in
>> January 2017.
>> Also, there's the awesome Graph Extension[2] that lets you add
>> interactive dynamic visualizations to the wiki pages.
>>
>> Cheers!
>>
>> [1] https://www.mediawiki.org/wiki/Wikimedia_Developer_Summit
>> [2] https://www.mediawiki.org/wiki/Extension:Graph
>>
>>
>> On Wed, Nov 2, 2016 at 6:53 PM, dhayakar marur 
>> wrote:
>>
>>> Dear Analytics team,
>>>
>>> The general legibility of charts in Wikipedia is relatively poor.
>>> We can improve it by making them more interactive and dynamic.
>>> Please refer to the chart in the attachment (Boloid Events.jpg).
>>>
>>> The chart represents the distribution of bolide events from 1994-2013 on
>>> the world map.
>>> The legend describes the magnitude of each event in joules.
>>> From the chart, can you count the number of 10GJ bolide events in Africa?
>>> You can, but it takes an awfully long time to find the answer.
>>>
>>> If we were to make the legend interactive and the world map dynamic, we
>>> could improve legibility.
>>> We should make all the values (1GJ, 10GJ, etc.) in the legend
>>> clickable buttons.
>>> On clicking, say, 10GJ, the world map should show bolide events of 10GJ
>>> magnitude and remove the rest. This will make it easier to answer my
>>> earlier question.
>>>
>>> Regards
>>> Dhaya
>>>
>>>
>>>
>>
>>
>> --
>> *Marcel Ruiz Forns*
>> Analytics Developer
>> Wikimedia Foundation
>>
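
The legend-as-filter idea in Dhaya's message quoted above comes down to a small data operation: keep only the events in the selected legend bucket, then count or redraw them. A minimal sketch follows, with hypothetical names and made-up sample events. It is not tied to the Graph extension or to any particular charting library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BolideEvent:
    region: str        # e.g. "Africa" (hypothetical field)
    energy_label: str  # legend bucket, e.g. "10GJ" (hypothetical field)


def filter_by_legend(events, selected_label):
    """Events that remain visible after clicking one legend entry."""
    return [e for e in events if e.energy_label == selected_label]


def count_in_region(events, selected_label, region):
    """How many events of the selected magnitude fall in one region."""
    visible = filter_by_legend(events, selected_label)
    return sum(1 for e in visible if e.region == region)


# Made-up events, for illustration only.
events = [
    BolideEvent("Africa", "10GJ"),
    BolideEvent("Africa", "1GJ"),
    BolideEvent("Asia", "10GJ"),
    BolideEvent("Africa", "10GJ"),
]
print(count_in_region(events, "10GJ", "Africa"))
```

An interactive chart would call something like `filter_by_legend` on each click and redraw only the returned events.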


Re: [Analytics] pageviews data

2016-11-16 Thread Pine W
Hi Alexander,

If you have relevant skills with developing software, and the interest and
time to help out, then WMF Project Grants might be able to provide you with
some financial support for work in this area. Have a look at
https://meta.wikimedia.org/wiki/Grants:Project



Pine


On Wed, Nov 16, 2016 at 7:24 AM, Dan Andreescu 
wrote:

> Alexander, thanks for writing.  It's possible to get data by category and
> country, though it is quite hard and limited to internal use at the
> moment.  We are working to both make it easier and available for publishing
> to the world, but there is a lot of work to be done.  We're an open source
> project, so of course you can contribute to that work, I can link you and
> others interested.  You can also apply for a research project here:
>
> https://meta.wikimedia.org/wiki/Research:New_project
>
> If you apply for a research project, you'll have to sign an NDA to get
> access to this data, and meet all the requirements of the research team.
>
> Either way, I hope that within a year or so, the kind of question you're
> asking will be possible to answer with public data.
>
> On Wed, Nov 2, 2016 at 3:22 PM, Alexander Ugarov 
> wrote:
>
>> Hi!
>>
>> I'm a Ph.D. student in economics, using some of the Wikimedia data in my
>> research. My question is whether it's possible to get the data on Wikipedia
>> pageviews by country and article category? Currently the Wikimedia
>> Foundation provides the aggregate data on pageviews by country and the less
>> aggregate data on pageviews by article, but it looks like there is no way
>> to find out, for example, the pageviews of math articles in India.
>>
>> More specifically, my questions are:
>> 1) Is it possible in some way to extract the information on pageviews
>> by country and subject area from your publicly available data? The amount
>> of data currently available is already vast, and I could miss it.
>> 2) If it is not possible, then how can I persuade you into making this
>> data available? I'm going to argue that the data can be made available
>> without losing confidentiality by using either first IP numbers or by
>> publishing only the country of the user, as well as aggregating by the
>> category.
>>
>> I'm looking forward to hearing from you. I'm sure that many social
>> scientists will also be glad to use the opportunity to produce more
>> interesting and policy-relevant research.
>>
>> Best regards,
>> Alexander Ugarov,
>> Ph.D. Candidate
>> Sam M. Walton College of Business
>> Department of Economics
>> University of Arkansas
>>
>>
>>


Re: [Analytics] ensuring reader anonymity

2016-11-11 Thread Pine W
Realistically, if a government in a country that hosts one of the WMF data
centers decides that they want unfiltered access to the data, I'm not sure
how much WMF could do about it. I won't speculate on what kind of defenses
WMF might have against that scenario, but I would encourage Analytics,
Legal, and Security to have that conversation if they have not already done
so. (The US government is not the only government that might engage in this
kind of mass surveillance, and such a government may or may not use legal
means to accomplish their objectives; other options include various kinds
of phishing and social engineering attacks.)

Returning to previous discussions about limiting the number of people who
have access to raw IPs and related data, I'm thinking that I like the idea
of hashing the data and/or geolocating the data and then giving that
processed data to researchers, rather than letting researchers have the raw
data. I would be more comfortable with people who are not WMF employees and
not community checkusers having access to the processed data than to true
IP addresses, UAs, and other similar kinds of data.
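
As a rough illustration of the "hash and/or geolocate, then share" idea, the sketch below replaces the raw IP with a keyed hash and a coarse location, and drops the user agent. It uses HMAC-SHA-256 with a secret key rather than the plain SHA-512 raised elsewhere in this thread, because the IPv4 space is small enough (about 4.3 billion addresses) that an unkeyed hash can be reversed by enumeration. The record fields, the `geolocate` callback, and the key handling are all assumptions for illustration, not a description of WMF's pipeline.

```python
import hashlib
import hmac


def pseudonymize_ip(ip, secret_key):
    """Keyed hash of an IP address; secret_key is bytes and must stay private."""
    return hmac.new(secret_key, ip.encode("utf-8"), hashlib.sha256).hexdigest()


def process_record(record, secret_key, geolocate):
    """Build the record handed to researchers.

    record is a dict with hypothetical keys "ip", "user_agent" and "path".
    geolocate is a caller-supplied function mapping an IP to a country
    code (hypothetical; no specific geolocation library is assumed).
    The raw IP and the user agent are deliberately not copied over.
    """
    ip = record["ip"]
    return {
        "ip_hash": pseudonymize_ip(ip, secret_key),
        "country": geolocate(ip),
        "path": record["path"],
    }


# Illustrative usage with a documentation-range IP and a stub geolocator.
key = b"example-key-rotate-regularly"
raw = {"ip": "192.0.2.1", "user_agent": "ExampleBrowser/1.0", "path": "/wiki/Main_Page"}
print(process_record(raw, key, lambda ip: "ZZ"))
```

Rotating or discarding the key after a retention period would also make old hashes unlinkable to new ones, which may or may not be acceptable for a given research use.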

Pine


On Fri, Nov 11, 2016 at 1:58 PM, C. Scott Ananian 
wrote:

> On Fri, Nov 11, 2016 at 2:16 PM, Leila Zia  wrote:
>
>> * Subpoena related concerns: the best way to handle this from the data
>> storage perspective is to not have the data at all. That is why very
>> sensitive data is purged after 60 days at the moment in webrequest logs. As
>> Nuria said, this length of time may be shortened by a little, but at least
>> because of operational constraints, we won't be able to not store this data
>> at all.
>>
>
> It is worth considering this in context of
> https://twitter.com/Pinboard/status/797167026481442816
>
> That is, not storing the data is nice, but do we have any plans in place
> in case a government decides to place a recording device in our data center
> beside our servers?  We may have the best of intentions, but "we don't
> store it" could in fact be misleading comfort if there is a third-party who
> *is* storing it.
>
> This is perhaps a broader question (and more in line with James' initial
> inquiry?), as it suggests that we reconsider what sort of protections we
> can actually provide to our editors, and make sure they know if we can't
> protect them from state-level monitoring.
>  --scott
> --
> (http://cscott.net)
>


Re: [Analytics] ensuring reader anonymity

2016-11-11 Thread Pine W
>On Fri, Nov 11, 2016 at 7:36 AM, Marcel Ruiz Forns wrote:
>Hi Pine,


> > I thought that was specified in either the Privacy Policy or Terms of
> > Use but I can't find the specific reference, and that bothers me.


>This is specified in the data retention guidelines:
>https://meta.wikimedia.org/wiki/Data_retention_guidelines


>Cheers!

Thanks. Why is that info specified in the Data retention guidelines rather
than in the Terms of Use or Privacy Policy? I worry that the retention
guidelines require a lower threshold of notice for change than the ToU or
PP, and may not have the same degree of legal assurance as the ToU and PP
that WMF will abide by the guideline. Could the Data retention guidelines
be fully incorporated into the PP and/or ToU?


On Fri, Nov 11, 2016 at 9:25 AM, Leila Zia  wrote:

> Nuria, regarding the IP addresses specifically (not the proxy, for which,
> I'll need more time to go through the use-cases we've had and see if we can
> find work-arounds if we hash proxy information):
>
> Have we ever considered creating at least two levels of access
> when it comes to the IP addresses? From what you describe, it is clear to
> me that your team will need to have access to raw IPs for a certain period
> of time. It may be the case that no one else uses that information (for all
> of the use-cases of the research I've been involved in, hashed IP works as
> well, as long as we have geolocation available to us). By creating two
> layers of access, we can make sure that your team has access to raw IP
> while everyone else doesn't. Is this an option?
>
> And one suggestion: if we want to reconsider the way we provide access to
> IP address, I'd like to suggest that we step back and reconsider the way we
> give access to other fields in the webrequest logs as well. This will be a
> longer process, but it may be worthwhile. For example, if we decide that
> access to raw IP should be limited even further, do we want to have the
> same restrictions applied to access to UAs? It's not obvious to me that the
> answer should be no.
>
> Best,
> Leila
>
>
I'd be happy to have Legal and Analytics take a look at what could be done
to tighten the screws a bit on who has access to other data in the logs
such as UAs. (To follow up on a comment from Wikimedia-l: I'm also very
wary of letting people outside of WMF and the community have access to this
kind of information, even with a signed NDA.)

Pine


Re: [Analytics] DizzyLogic Wiki Parser

2016-11-10 Thread Pine W
Was this something on Labs? If so, it might have been purged during one of
the Labs cleanups.

Pine


On Tue, Nov 8, 2016 at 2:33 PM, Reem Al-Kashif 
wrote:

> Hi,
>
> I'm just wondering if anybody knows what happened to DizzyLogic wiki
> parser? The website and program vanished. I used it in January 2016 so I
> know it was there at that time.
>
> Best,
> Reem
>
> --
>
> *Kind regards, Reem Al-Kashif*
>


Re: [Analytics] ensuring reader anonymity

2016-11-10 Thread Pine W
By the way, to the best of my knowledge, all recordings to "permanent
media" are overwritten or destroyed after 60 days. I thought that was
specified in either the Privacy Policy or Terms of Use but I can't find the
specific reference, and that bothers me. Can someone at Legal explain why
this isn't specified in either the PP or ToU? (Feel free to fork this
question if it becomes a distraction to the original thread.)

Thanks,

Pine


On Tue, Nov 8, 2016 at 1:26 PM, James Salsman  wrote:

> Are there any reasons to not replace HTTP GET request IP addresses and
> proxy information with their SHA-512 secure hash prior to writing them
> to permanent media?
>


Re: [Analytics] [Wikimedia Cascadia] Page views on Cascadia article

2016-11-10 Thread Pine W
Hi WMF Analytics,

It would be interesting to know where all this traffic is originating. Do
we have a way of tracking back the origin of Wikipedia pageviews to
particular sites or pages?

It would also be interesting to get a geographic picture of where this
traffic is originating.

Pine



On Thu, Nov 10, 2016 at 10:45 PM, SounderBruce 
wrote:

> Since Election Day, they've increased from 900/day to over 273,000
>  wikipedia.org=all-access=user=
> latest-10=Cascadia_(independence_movement)>
> .
>
> --
> *Bruce Englehardt / SounderBruce*
> sounderbr...@gmail.com
> ___
> Wikimedia-Cascadia mailing list
> wikimedia-casca...@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikimedia-cascadia
>


Re: [Analytics] Tool to identify the articles versions across Wikipedia language editions?

2016-10-31 Thread Pine W
Hi Reem,

There was a presentation at WikiConference North America about a research
project that discussed a very similar topic [1] by an OCLC researcher. If
no one else on the Analytics list has suggestions, I would suggest that you
contact the Wikidata list. Many interwiki article links are now going
through Wikidata, and I would guess that the Wikidata folks have some
thoughts about this. You might also contact Diane Vizine-Goetz (her email
is shown on the WikiConference session page) to see if she has suggestions.
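
For a quick check of whether an article has a counterpart in another edition, the MediaWiki action API's `prop=langlinks` query is one possible route. The sketch below only builds the request URL and parses a response, so it runs without network access. The helper names are hypothetical, and the sample response is hand-written to illustrate the expected shape (it assumes `formatversion=2`), so please verify against a live response before relying on it.

```python
import json
from urllib.parse import urlencode


def langlinks_url(source_lang, title, target_lang):
    """URL asking one Wikipedia edition for a page's link to another edition."""
    params = {
        "action": "query",
        "prop": "langlinks",
        "titles": title,
        "lllang": target_lang,  # restrict results to one target language
        "format": "json",
        "formatversion": "2",
    }
    return f"https://{source_lang}.wikipedia.org/w/api.php?" + urlencode(params)


def counterpart_titles(response_text):
    """Map each source title to its counterpart title, or None if there is none."""
    data = json.loads(response_text)
    result = {}
    for page in data.get("query", {}).get("pages", []):
        links = page.get("langlinks", [])
        # Older response formats use "*" instead of "title" for the link text.
        result[page["title"]] = (
            links[0].get("title", links[0].get("*")) if links else None
        )
    return result


# Hand-written sample response, for illustration only.
sample = json.dumps({
    "query": {"pages": [
        {"title": "Cairo", "langlinks": [{"lang": "ar", "title": "القاهرة"}]},
        {"title": "Some local topic"},
    ]}
})
print(langlinks_url("en", "Cairo", "ar"))
print(counterpart_titles(sample))
```

Listing the titles for a whole category would need an additional query for the category members, which is not shown here.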

Pine

[1]
https://wikiconference.org/wiki/Submissions:2016/Linking_a_controlled_subject_vocabulary_to_Wikipedia



On Mon, Oct 31, 2016 at 11:09 AM, Reem Al-Kashif 
wrote:

> Hi,
>
> Hope this finds you all well. I'm wondering if there is a way/tool to
> identify the articles that exist in one edition of Wikipedia and have
> counterparts in another. I'm also wondering if there is a way to generate a
> list of these articles' titles for certain categories.
>
> Best,
> Reem
> --
>
> *Kind regards, Reem Al-Kashif*
>


[Analytics] Fwd: [Wikimedia-l] We appear have been partially blocked in France (probably accidentally)

2016-10-17 Thread Pine W
In case you recently observed unexpected drops in Wikimedia site traffic
from France, see below.

Pine
-- Forwarded message --
From: "geni" 
Date: Oct 17, 2016 1:55 PM
Subject: [Wikimedia-l] We appear have been partially blocked in France
(probably accidentally)
To: "Wikimedia Mailing List" 
Cc:

Apparently, on the orders of the French government, Orange added us to
their list of blocked terrorist sites. This apparently had the fun side
effect of DoSing the government page that people were redirected to. Source
(among others):

http://www.lemonde.fr/pixels/article/2016/10/17/une-erreur-bloque-l-acces-a-google-pour-les-clients-d-orange_5014900_4408996.html



--
geni

___
Wikimedia-l mailing list, guidelines at:
https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines
New messages to: wikimedi...@lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l



Re: [Analytics] Ranking Wikimedia projects by sizes or activity levels

2016-07-12 Thread Pine W
Thanks Nemo. It looks like I can get most of what I had in mind from
Wikistats.

I'm confused about https://www.wikimedia.org/ though. If that list is
sorted by pageviews, I would expect Commons to outrank Wikinews by a large
margin. Are the viewership stats shown by project somewhere? I checked
https://reportcard.wmflabs.org but that mostly focuses on Wikipedia.

Pine

On Mon, Jul 11, 2016 at 11:34 PM, Federico Leva (Nemo) 
wrote:

> http://wikistats.wmflabs.org/ has this on the main page.
>
> https://www.wikimedia.org/ is the easy way to remember the rank by
> "size", which as always is determined by how used they are.
>
> Nemo
>


[Analytics] Ranking Wikimedia projects by sizes or activity levels

2016-07-12 Thread Pine W
Hi Analytics,

Is there an easy way to rank our projects, with languages being
consolidated, by (1) size in GB, or (2) number of content pages, or (3)
number of active users in the previous month? I imagine that the ordered
list would look something like this: Commons, all Wikipedias, all
Wiktionaries, all Wikisources, Wikispecies, Wikidata, etc.

If there's an easy way to get an ordered list I'd like to include that info
in an introductory portion of my LearnWiki video tutorials, but if this
question would consume more than a few moments of staff time to research
then I can skip it. I'm thinking that the sizes of the databases would be
an easy way to measure sizes in GB, but I don't know with certainty if the
databases are consolidated or if each language has its own database.
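
For options (2) and (3), one possible route is the per-wiki statistics exposed by the MediaWiki action API (`meta=siteinfo` with `siprop=statistics`, which reports fields such as `articles` and `activeusers`). The sketch below builds that request URL, then sums a chosen field across languages and ranks the project families. The helper names are hypothetical and the numbers are made up, so this shows only the consolidation step, not real rankings.

```python
from collections import defaultdict
from urllib.parse import urlencode


def siteinfo_url(host):
    """URL for one wiki's statistics, e.g. host="en.wikipedia.org"."""
    params = {
        "action": "query",
        "meta": "siteinfo",
        "siprop": "statistics",
        "format": "json",
    }
    return f"https://{host}/w/api.php?" + urlencode(params)


def consolidate(per_wiki_stats, key):
    """Sum one statistic across all languages of each project family.

    per_wiki_stats maps (family, language) to a statistics dict
    (hypothetical structure chosen for this sketch).
    """
    totals = defaultdict(int)
    for (family, _lang), stats in per_wiki_stats.items():
        totals[family] += stats[key]
    return dict(totals)


def rank(totals):
    """Families ordered from largest to smallest total."""
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Made-up numbers, for illustration only.
sample = {
    ("wikipedia", "en"): {"articles": 20, "activeusers": 9},
    ("wikipedia", "de"): {"articles": 10, "activeusers": 4},
    ("wiktionary", "en"): {"articles": 12, "activeusers": 2},
    ("commons", "-"): {"articles": 5, "activeusers": 7},
}
print(rank(consolidate(sample, "articles")))
```

Size in GB is not part of these statistics, so option (1) would need a different source.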

Thanks!

Pine


Re: [Analytics] Pageview analysis graphs not loading

2016-06-23 Thread Pine W
:) I owe him a barnstar.

Pine

On Thu, Jun 23, 2016 at 12:38 AM, Ryan Kaldari <rkald...@wikimedia.org>
wrote:

> Musikanimal fixed it.
>
>
> On Jun 23, 2016, at 1:09 AM, Pine W <wiki.p...@gmail.com> wrote:
>
> Thanks Toby.
>
> Pine
> On Jun 22, 2016 16:02, "Toby Negrin" <tneg...@wikimedia.org> wrote:
>
>> ok -- I'm getting a spinning wheel of doom where the graphs used to be. I
>> suspect there's something amiss with the underlying service.
>>
>> https://phabricator.wikimedia.org/T138448
>>
>> -Toby
>>
>> On Wed, Jun 22, 2016 at 5:56 PM, Pine W <wiki.p...@gmail.com> wrote:
>>
>>> Hi Erik,
>>>
>>> Examples:
>>>
>>>
>>> https://tools.wmflabs.org/pageviews/?project=en.wikipedia.org&platform=all-access&agent=user&range=latest-20&pages=Album
>>>
>>>
>>> https://tools.wmflabs.org/pageviews/?project=meta.wikimedia.org&platform=all-access&agent=user&range=latest-20&pages=Main_Page
>>>
>>> Toby, thanks for the suggestion, but I tried multiple browsers with no
>>> ad blockers. The graphs still don't display.
>>>
>>> Pine
>>>


Re: [Analytics] Pageview analysis graphs not loading

2016-06-22 Thread Pine W
Thanks Toby.

Pine
On Jun 22, 2016 16:02, "Toby Negrin" <tneg...@wikimedia.org> wrote:

> ok -- I'm getting a spinning wheel of doom where the graphs used to be. I
> suspect there's something amiss with the underlying service.
>
> https://phabricator.wikimedia.org/T138448
>
> -Toby
>
> On Wed, Jun 22, 2016 at 5:56 PM, Pine W <wiki.p...@gmail.com> wrote:
>
>> Hi Erik,
>>
>> Examples:
>>
>>
>> https://tools.wmflabs.org/pageviews/?project=en.wikipedia.org&platform=all-access&agent=user&range=latest-20&pages=Album
>>
>>
>> https://tools.wmflabs.org/pageviews/?project=meta.wikimedia.org&platform=all-access&agent=user&range=latest-20&pages=Main_Page
>>
>> Toby, thanks for the suggestion, but I tried multiple browsers with no ad
>> blockers. The graphs still don't display.
>>
>> Pine
>>
___
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics


Re: [Analytics] Pageview analysis graphs not loading

2016-06-22 Thread Pine W
Hi Erik,

Examples:

https://tools.wmflabs.org/pageviews/?project=en.wikipedia.org=all-access=user=latest-20=Album

https://tools.wmflabs.org/pageviews/?project=meta.wikimedia.org=all-access=user=latest-20=Main_Page

Toby, thanks for the suggestion, but I tried multiple browsers with no ad
blockers. The graphs still don't display.

Pine
___
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics


[Analytics] Pageview analysis graphs not loading

2016-06-22 Thread Pine W
Hi folks,

I can't get pageview analysis graphs to load on 2 wikis that I've tested,
and I've tried desktop and mobile on multiple browsers. Can someone take a
look at what might need fixing?

Thanks!

Pine
___
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics


[Analytics] Fwd: [Wiki-research-l] [ANN] Wikipedia Tools for Google Spreadsheets

2016-05-08 Thread Pine W
Forwarding because this may be of interest to Analytics subscribers as well.

Pine
-- Forwarded message --
From: "Thomas Steiner" 
Date: May 2, 2016 01:18
Subject: [Wiki-research-l] [ANN] Wikipedia Tools for Google Spreadsheets
To: "Thomas Steiner" 
Cc: "public-...@w3.org" , "Semantic Web" <
semantic-...@w3.org>, "Discussion list for the Wikidata project." <
wikid...@lists.wikimedia.org>, "Research into Wikimedia content and
communities" 

Esteemed Wikipedia, Wikidata, Linked Data, and Semantic Web communities[*],

===
tl;dr: Released a Google Spreadsheets add-on called Wikipedia Tools
[1] that makes working with data from Wikipedia and Wikidata a breeze.
===

I am happy to release a Google Spreadsheets add-on called Wikipedia
Tools [1]. This add-on allows you to work with data from Wikipedia and
Wikidata from within a spreadsheet context using custom formulas. Let
me motivate the tools with a short example:

You may have heard of Volkswagen's #DieselGate scandal. Is this still
a problem for Volkswagen—and if so, where? Google Trends to the
rescue? Maybe [2]. But what about global impact? How do people in
Korea, an important Volkswagen export market [citation needed],
refer to the scandal? Turns out they call it 폭스바겐 배기가스 조작 (among
probably other options).

With a custom function from Wikipedia Tools, we can safely "translate"
from one English (a language that, for the sake of this example, we
assume we dominate well enough) Wikipedia article to many other
languages (that we do not necessarily dominate):

=WIKITRANSLATE("en:Volkswagen_emissions_scandal")
  bg Афера на Фолксваген
  cs Dieselgate
  de VW-Abgasskandal
  […]
  zh 福斯集團汽車舞弊事件

Then, using Wikipedia page views as one (among others) reasonable
popularity indicator, for each of these language results, for example
for Korean, we can get =WIKIPAGEVIEWS("ko:폭스바겐 배기가스 조작") for the last
n days, and plot the results [3] (in practice, you would probably
still normalize by size and/or total views of the particular
Wikipedia[**]).
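
For readers who want to reproduce the page view lookup outside a
spreadsheet, here is a minimal Python sketch. It assumes the public
Wikimedia REST pageviews endpoint and its response shape (a list of items,
each with a `views` field); it is not the add-on's own code, and the helper
names `pageviews_url` and `normalized_views` are made up for illustration.

```python
from urllib.parse import quote

# Assumption: the public Wikimedia REST pageviews endpoint. The add-on's own
# implementation may work differently.
BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def pageviews_url(lang_title, start, end):
    """Build a daily per-article URL from 'lang:Title' input (hypothetical helper).

    start and end are dates in YYYYMMDD form.
    """
    lang, title = lang_title.split(":", 1)
    # Titles use underscores instead of spaces and must be percent-encoded.
    article = quote(title.replace(" ", "_"), safe="")
    return f"{BASE}/{lang}.wikipedia.org/all-access/user/{article}/daily/{start}/{end}"


def normalized_views(items, project_total):
    """Sum the 'views' fields and express them as a share of a project total."""
    total = sum(item["views"] for item in items)
    return total / project_total if project_total else 0.0
```

Dividing by a per-project total is only one possible normalization; it
mirrors the caveat above that raw counts are not comparable across
Wikipedias of different sizes.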

There are a lot more custom functions implemented than I could cover
in this short example. I have put together a slide deck [4] and paper
[5] that go into more detail if you are interested; a demo with all
functions is available at [6]. The add-on also has a built-in manual
(in Google Sheets, click Add-ons→Wikipedia Tools→Show documentation)
and its underlying code is open-source [7].

Please let me know in case of any open question, feature request, or
bug. Thanks!

Cheers,
Tom

--
[1] http://bit.ly/wikipedia-tools-add-on
[2]
http://www.google.com/trends/explore?hl=en-US=volkswagen+emissions+scandal,+dieselgate=today+12-m
[3]
https://docs.google.com/spreadsheets/d/1PyFq59iEeLWpPQrWDUyU8mlmQrb4GDv2QElmEU9aFec/edit?usp=sharing
[4] bit.ly/wikipedia-tools-slides
[5] bit.ly/wikipedia-tools-paper (PDF)
[6]
https://docs.google.com/spreadsheets/d/1sVduZul787O-bRzuy0UKpRl7bkouxwaIOsxXuJGm6yg/edit?usp=sharing
[7] https://github.com/tomayac/wikipedia-tools-for-google-spreadsheets/
[*] Cross-posted on purpose
(http://ruben.verborgh.org/blog/2014/01/31/apologies-for-cross-posting/),
please choose your reply options accordingly.
[**] This is a simple example for illustrative purposes, I do _not_
claim it is an accurate popularity prediction, nor do I mean to bash
Volkswagen.

--
Dr. Thomas Steiner, Employee (http://blog.tomayac.com,
https://twitter.com/tomayac)

Google Germany GmbH, ABC-Str. 19, 20354 Hamburg, Germany
Managing Directors: Matthew Scott Sucherman, Paul Terence Manicle
Registration office and registration number: Hamburg, HRB 86891

-BEGIN PGP SIGNATURE-
Version: GnuPG v2.0.29 (GNU/Linux)

iFy0uwAntT0bE3xtRa5AfeCheCkthAtTh3reSabiGbl0ck0fjumBl3DCharaCTersAttH3b0ttom
hTtPs://xKcd.cOm/1181/
-END PGP SIGNATURE-

___
Wiki-research-l mailing list
wiki-researc...@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
___
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics


Re: [Analytics] Sections of Code of Conduct resolved and Code of Conduct approval process

2016-03-31 Thread Pine W
Hi Matt,
Thanks for working on this. Has the WMF Board approved this procedure for
ratifying the Code of Conduct, or is the Board planning to approve the
final document? It sounds to me like this is effectively a Terms of Use
amendment, so I would expect that it would need a similar level of legal
review and Board approval. Also, because this policy appears also to apply
to WMF staff, I hope that HR and/or Katherine are in the loop on this.

Pine

On Thu, Mar 31, 2016 at 3:16 PM, Matthew Flaschen 
wrote:

> We’ve gotten good participation as we’ve worked on sections of the Code of
> Conduct over the past few months, and have made considerable improvements
> to the draft based on your feedback.
>
> Given that, and the community approval through the discussions on each
> section, the best approach is to proceed by approving section-by-section
> until the last section is done.
>
> So, please continue to improve the Code of Conduct by participating now
> and as future sections are discussed.  When the last section is completed
> and approved on the talk page, the Code of Conduct will become policy and
> no longer be marked as a draft.
>
> Also, two more discussions regarding the Code of Conduct have been
> resolved and incorporated into the draft.
>
> * "Enforcement issues" addressed the reporting process and clarified that
> Committee decisions could not be circumvented
> * "Marginalized and underrepresented groups" forbids discrimination
>
> Thanks,
>
> Matt Flaschen
>
___
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics


Re: [Analytics] [wmf.webrequest data] one-time access

2016-03-22 Thread Pine W
Hi Dan,

Agreed, I think it makes sense to consider a subject-specific request for
pages that are within the scope of epidemiology, such as influenza, where
we have reason to think that there could be public health benefits in
analyzing the data and there are reasonable safeguards to protect user
anonymity.

A request for 1 month of the private data requested here, which appears to
be for all pages on all projects, is far too broadly scoped. Also, in
general, my instinct would be to deny external requests for WMF private
data for purposes of performance testing. It seems to me that the risks far
outweigh the benefits to Wikimedia, and that processing requests like these
would be a suboptimal use of WMF staff time.

Pine

On Tue, Mar 22, 2016 at 12:44 PM, Dan Andreescu <dandree...@wikimedia.org>
wrote:

> Pine, there are actually two separate requests and they shouldn't be
> mixed.  The performance-related one is research as far as I understand, and
> the other one we have no details yet.  I welcome a public discussion of
> either, and of course would respect any opinions held by the analytics
> community at large.  We have every intention to be good stewards of this
> data and for what it's worth, I'm very skeptical of allowing access to
> private data, unless for obviously beneficial purposes like flu
> forecasting, etc.
>
> On Tue, Mar 22, 2016 at 1:37 PM, Pine W <wiki.p...@gmail.com> wrote:
>
>> I'd appreciate a clarification about the purpose of this request if
>> Wikimedia private data is involved. If I am understanding correctly, the
>> purpose of this request is for access to Wikimedia private data for
>> assistance with 3rd party performance testing. If that is the case, I
>> believe that the access request for private data should simply be denied.
>>
>> Pine
>>
___
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics


[Analytics] Fwd: [Wikitech-l] Fwd: [Wmfall] February 2016 Lightning Talks

2016-02-16 Thread Pine W
Lightning talks start in about 5 minutes. Public link at
http://www.youtube.com/watch?v=D3fyCgBWvFc

Optional IRC participation in #wikimedia-tech. (Note, not #wikimedia-office)

Cheers,

Pine

--



Hi everyone,

Just a reminder that the February Lightning Talks start in
*25 minutes.*

Come join us in the 5th Floor Collab Space or follow along here:
http://www.youtube.com/watch?v=D3fyCgBWvFc

IRC: #wikimedia-tech

Hope to see you there!
Megan


On Tue, Feb 2, 2016 at 4:22 PM, Kevin Leduc  wrote:

> Hi All,
>
>
> The next Lightning Talks are scheduled for February 16th (two weeks from
> today).  We hope at least 4 people will sign up for the talks by Friday
> February 12th otherwise we will postpone them another month.  Lightning
> Talks are an opportunity for teams @ WMF & in the Community to showcase
> something they have achieved:  a quarterly goal, milestone, release, or
> anything of significance to the rest of the foundation and the movement as
> a whole.
>
>
> Each presentation will be 10 minutes or less including time for questions.
>
> Sign up here: https://www.mediawiki.org/wiki/Lightning_Talks#February_2016
>
>
> Next round of Lightning Talks:
>
> When: Tuesday February 16, 1900 UTC
> <
http://www.timeanddate.com/worldclock/fixedtime.html?msg=Lightning+Talks=20160216T19=1440=1
>,
> 11am PST (We have added this Lightning Talk to the WMF Engineering, Fun &
> Learning, and Staff calendars)
>
> Where: 5th Floor
>
> Remotees: On-Air google hangout will be provided just before the meeting
>
> IRC: #wikimedia-tech
>
> YouTube stream: http://www.youtube.com/watch?v=D3fyCgBWvFc
>
>
> Thanks!
>
> Kevin Leduc, Megan Neisler, Brendan Campbell
>
>
> ___
> Wmfall mailing list
> wmf...@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wmfall
>
>


--
Megan Neisler
Project Coordinator- Engineering
Wikimedia Foundation
mneis...@wikimedia.org 



___
Wikitech-l mailing list
wikitec...@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
___
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics


[Analytics] Fwd: [Wikitech-l] Fwd: February 2016 Lightning Talks

2016-02-15 Thread Pine W
(cross-posting)

Reminder that these lightning talks are happening tomorrow, Tuesday
February 16, at 1900 UTC / 11:00 AM Pacific. Because there are 3 presenters
and a 1-hour block of time, each presenter has about 15 minutes including
time for questions. We might finish early.

On the agenda:

* Pine: "LearnWiki" Instructional video series on Wikipedia mechanics
(Including VE and citoid) and community practices
<https://meta.wikimedia.org/wiki/Grants:IEG/Motivational_and_educational_video_to_introduce_Wikimedia/January_message_to_village_pumps>

* Madhu Viswanathan: "Counting unique devices accessing Wikipedia projects
using Last access method"

* Rosemary Rein: "Program Capacity and Learning-Building a Roadmap Together"
<https://commons.wikimedia.org/wiki/File:Program_Capacity_and_Learning_Roadmap_-_Office_Hours_Feb_16.pdf>

Hope to see you there!

Pine




On Tue, Feb 2, 2016 at 5:47 PM, Kevin Leduc <ke...@wikimedia.org> wrote:

> Thanks for forwarding Pine!  I welcome any 10 minute talks from GLAM and
> Education as well.  If you add your name to the list [1], email me as well
> so I can contact you and forward notes for Lightning Talk speakers.
>
> [1] https://www.mediawiki.org/wiki/Lightning_Talks#February_2016
>
> On Tue, Feb 2, 2016 at 4:59 PM, Pine W <wiki.p...@gmail.com> wrote:
>
>> Boldly forwarding* in case others would like to view or present a
>> lightning talk. I plan to give a lightning talk about the video series
>> <https://meta.wikimedia.org/wiki/Grants:IEG/Motivational_and_educational_video_to_introduce_Wikimedia/January_message_to_village_pumps>
>> which I'm in the process of producing with the support of an individual
>> engagement grant.
>>
>> Although these talks can be about technical topics like video formats, I
>> think that there are education and GLAM activities that could fit under the
>> umbrella as well, especially if they have technical or research aspects.
>> For example, I'll probably focus much of my presentation on my background
>> research and project design process.
>>
>> Hope to see you there!
>> Pine
>>
>> * To boldly forward where no one has forwarded before
>>
>> -- Forwarded message --
>> From: Kevin Leduc <ke...@wikimedia.org>
>> Date: Tue, Feb 2, 2016 at 4:23 PM
>> Subject: [Wikitech-l] Fwd: February 2016 Lightning Talks
>> To: Wikimedia developers <wikitec...@lists.wikimedia.org>
>>
>>
>> -- Forwarded message --
>> From: Kevin Leduc <ke...@wikimedia.org>
>> Date: Tue, Feb 2, 2016 at 4:22 PM
>> Subject: February 2016 Lightning Talks
>> To: "Staff (All)" <wmf...@lists.wikimedia.org>
>>
>>
>> Hi All,
>>
>>
>> The next Lightning Talks are scheduled for February 16th (two weeks from
>> today).  We hope at least 4 people will sign up for the talks by Friday
>> February 12th otherwise we will postpone them another month.  Lightning
>> Talks are an opportunity for teams @ WMF & in the Community to showcase
>> something they have achieved:  a quarterly goal, milestone, release, or
>> anything of significance to the rest of the foundation and the movement as
>> a whole.
>>
>>
>> Each presentation will be 10 minutes or less including time for questions.
>>
>> Sign up here:
>> https://www.mediawiki.org/wiki/Lightning_Talks#February_2016
>>
>>
>> Next round of Lightning Talks:
>>
>> When: Tuesday February 16, 1900 UTC
>> <
>> http://www.timeanddate.com/worldclock/fixedtime.html?msg=Lightning+Talks=20160216T19=1440=1
>> >,
>> 11am PST (We have added this Lightning Talk to the WMF Engineering, Fun &
>> Learning, and Staff calendars)
>>
>> Where: 5th Floor
>>
>> Remotees: On-Air google hangout will be provided just before the meeting
>>
>> IRC: #wikimedia-tech
>>
>> YouTube stream: http://www.youtube.com/watch?v=D3fyCgBWvFc
>>
>>
>> Thanks!
>>
>> Kevin Leduc, Megan Neisler, Brendan Campbell
>> ___
>> Wikitech-l mailing list
>> wikitec...@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>>
>>
>
___
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics


[Analytics] Fwd: [Wikitech-l] Tech Talk: A Hands-on Estimation Exercise, With Discussion: Feb 8th

2016-02-06 Thread Pine W
Forwarding.

Pine

-- Forwarded message --
From: Rachel Farrand 
Date: Fri, Feb 5, 2016 at 4:59 PM
Subject: [Wikitech-l] Tech Talk: A Hands-on Estimation Exercise, With
Discussion: Feb 8th
To: Wikimedia developers 


Please join for the following tech talk:

*Tech Talk**:* A Hands-on Estimation Exercise, With Discussion
*Presenter:* Joel Aufrecht
*Date:* February 8th, 2016
*Time: *18:30 UTC
<
http://www.timeanddate.com/worldclock/fixedtime.html?msg=Tech+Talk%3A+A+Hands-on+Estimation+Exercise%2C+With+Discussion=20160208T1830=%3A=1
>
Link to live YouTube stream 
*IRC channel for questions/discussion:* #wikimedia-office

*Summary: *Estimation is an unnatural activity for human brains, which tend
to hide our own ignorance from us.  This brown-bag begins with an exercise,
adapted from Steve McConnell's software estimation training, in balancing
accuracy with precision.  The exercise is fully available to remotees.
Facilitated discussion follows, on what we can learn from the exercise and
on general estimation and forecasting topics as raised.
___
Wikitech-l mailing list
wikitec...@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
___
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics


Re: [Analytics] Pageview stats tools

2016-02-03 Thread Pine W
Thanks for the update! Would it be possible for the students to publish
their PPD on Commons under a CC-BY or CC-BY-SA license?

Looking forward to seeing the mockups,

Pine

On Wed, Feb 3, 2016 at 12:54 AM, Jan Ainali <j...@aina.li> wrote:

> A brief update on the students' work. They have just finished a Project
> Planning Document (PPD), which is a formal part of their course (20 pages
> long). This week they are working on mockups, which hopefully will be ready
> on Friday and shared for feedback.
>
> Med vänliga hälsningar
> Jan Ainali
> http://ainali.com
>
> 2016-02-03 8:57 GMT+01:00 Pine W <wiki.p...@gmail.com>:
>
>> https://phabricator.wikimedia.org/T120497. It's great to see the amount
>> of interest in this!
>>
>> Pine
>>
>> On Tue, Feb 2, 2016 at 11:43 PM, Quim Gil <q...@wikimedia.org> wrote:
>>
>>>
>>>
>>> On Tue, Feb 2, 2016 at 11:14 PM, Dan Andreescu <dandree...@wikimedia.org
>>> > wrote:
>>>>
>>>> Yes there is, a group of students from Sweden are working on the first
>>>> attempt.
>>>>
>>>
>>> Is there a URL to learn more (i.e. a Phabricator task)? This is
>>> interesting news, and we might want to advertize.
>>>
>>> --
>>> Quim Gil
>>> Engineering Community Manager @ Wikimedia Foundation
>>> http://www.mediawiki.org/wiki/User:Qgil
>>>
___
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics


Re: [Analytics] Pageview stats tools

2016-02-02 Thread Pine W
https://phabricator.wikimedia.org/T120497. It's great to see the amount of
interest in this!

Pine

On Tue, Feb 2, 2016 at 11:43 PM, Quim Gil  wrote:

>
>
> On Tue, Feb 2, 2016 at 11:14 PM, Dan Andreescu 
> wrote:
>>
>> Yes there is, a group of students from Sweden are working on the first
>> attempt.
>>
>
> Is there a URL to learn more (i.e. a Phabricator task)? This is
> interesting news, and we might want to advertize.
>
> --
> Quim Gil
> Engineering Community Manager @ Wikimedia Foundation
> http://www.mediawiki.org/wiki/User:Qgil
>
___
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics


Re: [Analytics] Pageview stats tools

2016-02-01 Thread Pine W
Cool, thank you Nemo.

Pine

On Sun, Jan 31, 2016 at 12:50 AM, Federico Leva (Nemo) <nemow...@gmail.com>
wrote:

> Pine W, 31/01/2016 09:07:
>
>> Apologies if this information was already published and I missed it.
>>
>
> https://phabricator.wikimedia.org/T120497
> https://phabricator.wikimedia.org/T43327
>
> Nemo
>
___
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics


[Analytics] Pageview stats tools

2016-01-31 Thread Pine W
Hi Analytics folks,

My understanding is that the new pageview definition, which excludes
automata to a certain extent, is now published. I have a few questions:

1. Is stats.grok.se already transitioned to the new definition, or will it?

2. Is there a replacement for stats.grok.se planned or already available? A
reliable substitute would be great, and it would be nice if we could either
replace the existing on-wiki "page view statistics" link or add a
supplemental link to the new resource.

Apologies if this information was already published and I missed it.

Thanks,

Pine
___
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics


Re: [Analytics] How many times has a video been played?

2016-01-14 Thread Pine W
Making sure that I'm understanding this correctly: if I use
https://github.com/hay/wiki-tools/blob/master/etc/mediacounts-stats.py:

1. Does the data reflect views through the media players on Wikipedias and
other non-Commons sites?
2. Does the data reflect the number of views *and* downloads in all image
sizes and formats?
3. Is the transfer count information available indefinitely, or only for 90
days?
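
For anyone crunching the daily mediacounts dumps by hand rather than with
the script above, a minimal sketch might look like the following. It
assumes each dump line is tab-separated, with the file's upload path in the
first column and the total transfer count in the third; that layout is an
assumption to check against the dump documentation, and the function names
are hypothetical.

```python
import bz2

# Assumption: tab-separated lines with the upload path in column 1 and the
# total transfer count in column 3. Verify against the dump's documentation.
PATH_COL, TRANSFERS_COL = 0, 2


def count_transfers(lines, file_name):
    """Sum transfers over every line whose path ends with '/file_name'."""
    total = 0
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= TRANSFERS_COL:
            continue  # skip malformed or truncated lines
        value = fields[TRANSFERS_COL]
        if fields[PATH_COL].endswith("/" + file_name) and value.isdigit():
            total += int(value)
    return total


def count_transfers_in_dump(path, file_name):
    """Convenience wrapper for one bz2-compressed daily dump file."""
    with bz2.open(path, "rt", encoding="utf-8") as handle:
        return count_transfers(handle, file_name)
```

Summing `count_transfers_in_dump` over 90 daily files would give a 90-day
figure, though a transfer is not necessarily the same thing as a play.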

Pine
___
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics


[Analytics] https://reportcard.wmflabs.org questions

2015-12-27 Thread Pine W
Hi Analytics,

The report card site's most recent data for the unique visitors stats is
from May 2015. Will this be updated in the future?

Also, the information shown on the "New Editors Per Month for All Wikimedia
Projects" chart goes back only to late 2012. Is there a way to get the data
for that chart all the way back to 2001? I can pull the tables for all
Wikipedias back to 2001 from the report cards site, but I can't pull the
tables for all Wikimedia projects back to 2001 AFAIK.

Thanks!

Pine
___
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics


[Analytics] How many times has a video been played?

2015-12-14 Thread Pine W
Hi Analytics,

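How do I determine how many times this video
<https://commons.wikimedia.org/wiki/File:Wikipedia_5_million_articles_milestone_video_November_2015.ogv>
has been played in the last 90 days?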

Thanks,

Pine
___
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics


Re: [Analytics] How many times has a video been played?

2015-12-14 Thread Pine W
Hi Dan,

I have a Labs account which I've barely used. Is access to the cluster a
separate step from having access to Labs?

Also, is there a "how to" guide somewhere for how to query the cluster?

Thanks,
Pine

On Mon, Dec 14, 2015 at 2:11 PM, Dan Andreescu <dandree...@wikimedia.org>
wrote:

> Pine, right now you can either query Hive if you have access to the
> cluster, or you can download the days you're interested from here:
> http://dumps.wikimedia.org/other/mediacounts/daily/2015/ and crunch the
> numbers for the articles you're interested in (not too bad)
>
> On Mon, Dec 14, 2015 at 5:01 PM, Pine W <wiki.p...@gmail.com> wrote:
>
>> Hi Analytics,
>>
>> How do I determine how many times this video
>> <https://commons.wikimedia.org/wiki/File:Wikipedia_5_million_articles_milestone_video_November_2015.ogv>
>> has been played in the last 90 days?
>>
>> Thanks,
>>
>> Pine
>>
___
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics


Re: [Analytics] On toxic communities

2015-11-13 Thread Pine W
We're discussing this on the Research mailing list, among others. (:
Pine

On Fri, Nov 13, 2015 at 2:12 PM, Denny Vrandečić 
wrote:

> Very interesting read (via Brandon Harris):
>
>
> http://recode.net/2015/07/07/doing-something-about-the-impossible-problem-of-abuse-in-online-games/
>
> "the vast majority of negative behavior ... did not originate from the
> persistently negative online citizens; in fact, 87 percent of online
> toxicity came from the neutral and positive citizens just having a bad day
> here or there."
>
> "... incidences of homophobia, sexism and racism ... have fallen to a
> combined 2 percent of all games. Verbal abuse has dropped by more than 40
> percent, and 91.6 percent of negative players change their act and never
> commit another offense after just one reported penalty."
>
> I have plenty of ideas how to apply this to Wikipedia, but I am sure Dario
> and his team as well :) - and some opportunity for the communities to use
> such results.
>
> Cheers,
> Denny
>
>
___
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics


Re: [Analytics] Special:Log/move and Special:NewPages

2015-10-30 Thread Pine W
Recent conversations on this mailing list are leading me to a new
definition of "fuzzy math". (: Thanks Nemo.

Pine

On Fri, Oct 30, 2015 at 12:49 AM, Federico Leva (Nemo) 
wrote:

> https://meta.wikimedia.org/wiki/Article_counts_revisited
>
___
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics


Re: [Analytics] stats.wikimedia.org typo

2015-10-29 Thread Pine W
TY (:

On Wed, Oct 28, 2015 at 5:05 PM, Erik Zachte <ezac...@wikimedia.org> wrote:

> Fixed
>
>
>
> Erik
>
>
>
> *From:* Analytics [mailto:analytics-boun...@lists.wikimedia.org] *On
> Behalf Of *Pine W
> *Sent:* Wednesday, October 28, 2015 23:27
> *To:* A mailing list for the Analytics Team at WMF and everybody who has
> an interest in Wikipedia and analytics.
> *Subject:* [Analytics] stats.wikimedia.org typo
>
>
>
> Is it possible to fix the "1000,000" here that should be "1,000,000"?
> https://stats.wikimedia.org/EN/PlotPageviewsEN.png
>
___
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics


[Analytics] Special:Log/move and Special:NewPages

2015-10-29 Thread Pine W
Antony–22 raised a question about accounting for "new articles" that are
moved from other namespaces to article space. For the purposes of counting
total articles, I'm guessing that these are properly accounted for as
deltas to the total, even if they're not considered new articles for the
purpose of NPP under Special:NewPages. Is that correct?

Thanks,
Pine
___
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics


[Analytics] How is "article" defined in Special:Statistics?

2015-10-28 Thread Pine W
When I go to https://en.wikipedia.org/wiki/Special:Statistics, clicking on
"content pages" takes me to a list of pages that are included in
"(Article)" namespace. But this includes redirects, which is surprising to
me because the Special:Statistics page implies to me that redirects are
included in "Pages (All pages in the wiki, including talk pages, redirects,
etc.)" which is a separate link. So, is there a way to verify that what
Special:Statistics is showing for "content pages" actually excludes the
redirects that are shown in "(Article)" namespace?

I also have a question about disambiguation pages. When I go to
https://en.wikipedia.org/wiki/Special:AllPages, select "(Article)", and
select "hide redirects", disambiguation pages like
https://en.wikipedia.org/wiki/!!,
https://en.wikipedia.org/wiki/Panda_(disambiguation),
https://en.wikipedia.org/wiki/Teel and https://en.wikipedia.org/wiki/Parsons
are all still appearing in that list of pages. Should we be counting
disambiguation pages as "articles"? I suppose it makes sense to think of
editing disambiguation pages as editing in content space, but I'm a little
hesitant to count them as articles for the purposes of the 5,000,000
milestone. Curious to hear what others think about whether disambiguation
pages should be counted as articles for this purpose.
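
One way to compare the two counters programmatically is the MediaWiki
API's siteinfo statistics query. The sketch below only builds the request
URL and summarizes a response; it assumes the `articles` and `pages` fields
of that response, and the helper names are hypothetical. It does not by
itself show whether redirects or disambiguation pages are counted.

```python
from urllib.parse import urlencode

API = "https://en.wikipedia.org/w/api.php"


def statistics_url():
    """Build the siteinfo statistics request URL."""
    params = {
        "action": "query",
        "meta": "siteinfo",
        "siprop": "statistics",
        "format": "json",
    }
    return API + "?" + urlencode(params)


def summarize(response):
    """Compare the 'articles' counter with the overall 'pages' counter."""
    stats = response["query"]["statistics"]
    articles, pages = stats["articles"], stats["pages"]
    return {
        "articles": articles,
        "other_pages": pages - articles,  # talk pages, redirects, etc.
        "article_share": articles / pages if pages else 0.0,
    }
```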

Thanks,

Pine
___
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics


Re: [Analytics] Pageview_hourly dataset. Preventing Identity reconstruction

2015-09-28 Thread Pine W
Hi Nuria,

OK, so the useragent data for edits is stored in a different database, is
heavily sampled when used for research, and will still be accessible for CU
use if user_agent_map  is removed from the pageview_hourly data, right?

On Mon, Sep 28, 2015 at 10:48 AM, Nuria Ruiz <nu...@wikimedia.org> wrote:

> Pine:
>
> The pageview_hourly dataset on hive contains pageviews, not edits.
>
> The majority of data for edits is not associated with a user agent, as it
> is stored in the MediaWiki database. Some of it comes via EventLogging as
> experiments are run in, for example, VisualEditor. This second source of
> data is of a very different nature from the one we just ran this test on:
> it is heavily sampled, not public, and will be purged every 90 days.
>
> https://wikitech.wikimedia.org/wiki/Analytics/EventLogging#Data_retention_and_auto-purging
>
>
> Thanks,
>
> Nuria
>
>
> On Mon, Sep 28, 2015 at 7:23 AM, Pine W <wiki.p...@gmail.com> wrote:
>
>> Hi Nuria,
>>
>> Thanks for working on this.
>>
>> Removing user_agent_map would be only for readership data, correct? Would
>> this data still be stored for edits, and if so, for how long?
>>
>> Pine
>> On Sep 28, 2015 7:16 AM, "Nuria Ruiz" <nu...@wikimedia.org> wrote:
>>
>>> Hello,
>>>
>>> We have been working on the exercise of reconstructing an identity using
>>> the (still private) pageview_hourly dataset (
>>> https://wikitech.wikimedia.org/wiki/Analytics/Data/Pageview_hourly)
>>>
>>> TL;DR
>>> It is possible (and easy) to do that with the fields the dataset has
>>> now; before releasing it publicly, we need to further anonymize it.
>>>
>>> More info here:
>>>
>>> https://wikitech.wikimedia.org/wiki/Analytics/Data/PreventingIdentityReconstruction
>>>
>>> Thanks,
>>>
>>> Nuria
>>>
___
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics


Re: [Analytics] Pageview_hourly dataset. Preventing Identity reconstruction

2015-09-28 Thread Pine W
Hi Nuria,

Thanks for working on this.

Removing user_agent_map would be only for readership data, correct? Would
this data still be stored for edits, and if so, for how long?

Pine
On Sep 28, 2015 7:16 AM, "Nuria Ruiz"  wrote:

> Hello,
>
> We have been working on the exercise of reconstructing an identity using
> the (still private) pageview_hourly dataset (
> https://wikitech.wikimedia.org/wiki/Analytics/Data/Pageview_hourly)
>
> TL;DR
> It is possible (and easy) to do that with the fields the dataset has now,
> before releasing it publicly we need to further anonymize it.
>
> More info here:
>
> https://wikitech.wikimedia.org/wiki/Analytics/Data/PreventingIdentityReconstruction
>
> Thanks,
>
> Nuria
>
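
To illustrate why a dataset with several descriptive fields can allow an
identity to be reconstructed, here is a minimal sketch that counts how many
rows share each combination of field values. Combinations that appear in
fewer than k rows single out very few readers. The field names (city,
user_agent, page), the sample rows, and the threshold k are hypothetical;
they are not taken from the pageview_hourly schema or from the analysis on
the linked wiki page.

```python
from collections import Counter

def rare_combinations(rows, fields, k=2):
    """Return field-value combinations that occur in fewer than k rows."""
    counts = Counter(tuple(row[f] for f in fields) for row in rows)
    return {combo: n for combo, n in counts.items() if n < k}

# Hypothetical sample rows; real field names and values will differ.
rows = [
    {"city": "Seattle", "user_agent": "Firefox 40", "page": "Main_Page"},
    {"city": "Seattle", "user_agent": "Firefox 40", "page": "Main_Page"},
    {"city": "Smalltown", "user_agent": "Lynx 2.8", "page": "Snappy"},
]

# Combinations of (city, user_agent) seen fewer than twice are flagged.
risky = rare_combinations(rows, ["city", "user_agent"], k=2)
print(risky)
```

Dropping or coarsening a field and re-running the count is one simple way to
see whether the flagged combinations disappear.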


Re: [Analytics] User statistics for video marking ENWP 5m article milestone

2015-09-17 Thread Pine W
Aha, that is important for me to know. Thanks Andrew.

Pine


On Thu, Sep 17, 2015 at 11:07 AM, Andrew Gray 
wrote:

> On 11 September 2015 at 19:19, James Forrester 
> wrote:
>
> >> Does it include editors on all Wikimedia projects
> >
> > No.
> >
> >> or just those who have registered and/or edited on ENWP?
> >
> > Registered, regardless of having edited.
>
> James is of course correct, but one small caveat worth adding: because
> of SUL, a substantial proportion of these will be "autocreated"
> accounts from other projects - so even 'registration' may not mean
> what it seems.
>
> --
> - Andrew Gray
>   andrew.g...@dunelm.org.uk
>


Re: [Analytics] User statistics for video marking ENWP 5m article milestone

2015-09-12 Thread Pine W
Aha, I just figured it out. The two pages are using very different
definitions for "active editors".
https://en.wikipedia.org/wiki/Special:Statistics refers to anyone who has
made a *single* edit in the last 30 days as an "active editor", while
https://stats.wikimedia.org/EN/TablesWikipediaEN.htm refers to editors who
have made *5 or more* edits in the past month as active. This mix of
terminology is confusing. I think that the definition on Special:Statistics
makes more sense for "active editors" than the >=5 definition that is
commonly used in discussions on mailing lists. Can anyone suggest a better
set of terminology to distinguish the >=1 "active editors" from the >=5
"active editors"?

Pine


On Sat, Sep 12, 2015 at 2:21 PM, Pine W <wiki.p...@gmail.com> wrote:

> Next question: https://en.wikipedia.org/wiki/Special:Statistics shows
> that ENWP alone has had 123,512 active editors (5 or more actions) in the
> last 30 days. But https://reportcard.wmflabs.org/ shows that for June
> 2015 (the latest data available there), there were only 31k active editors
> on ENWP and 77k active editors for all projects combined. 
> https://stats.wikimedia.org/EN/TablesWikipediaEN.htm
> seems consistent with the latter, showing that for August 2015 there were
> 30,789 active editors. Is there an explanation for the large difference
> between the 123,512 active editors shown on
> https://en.wikipedia.org/wiki/Special:Statistics, and the 30,789 active
> editors shown on https://stats.wikimedia.org/EN/TablesWikipediaEN.htm?
>
> Thanks,
>
> Pine
>
>
> On Fri, Sep 11, 2015 at 11:29 AM, Pine W <wiki.p...@gmail.com> wrote:
>
>> Thanks!
>> Pine
>> On Sep 11, 2015 11:20 AM, "James Forrester" <jforres...@wikimedia.org>
>> wrote:
>>
>>> On 11 September 2015 at 11:13, Pine W <wiki.p...@gmail.com> wrote:
>>>
>>>> Hi Analytics,
>>>>
>>>> On ENWP, does the number of 26,163,773 users
>>>>
>>> You mean https://en.wikipedia.org/wiki/Special:Statistics "Registered
>>> users"? Assuming yes…
>>>
>>>> include IPs who have made edits?
>>>>
>>> No.
>>>
>>>> Does it include editors on all Wikimedia projects
>>>>
>>> No.
>>>
>>>
>>>> or just those who have registered and/or edited on ENWP?
>>>>
>>> Registered, regardless of having edited.
>>>
>>> J.
>>> --
>>> James D. Forrester
>>> Lead Product Manager, Editing
>>> Wikimedia Foundation, Inc.
>>>
>>> jforres...@wikimedia.org | @jdforrester
>>>


Re: [Analytics] User statistics for video marking ENWP 5m article milestone

2015-09-12 Thread Pine W
Hi Erik,

How about dropping the terms "active editor" and "very active editor"
entirely, and instead using C1, C5, C100, etc with C(X) being the number of
contribs in a given period of time (last 30 days or last month, most
likely)?
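
As a rough illustration of the C(X) idea, the sketch below counts how many
editors made at least X contributions in a period. The function name, the
input format (one editor name per edit), and the sample data are assumptions
made for this example only; they do not describe how Special:Statistics or
stats.wikimedia.org compute their figures.

```python
from collections import Counter

def editors_at_threshold(edit_log, thresholds=(1, 5, 100)):
    """Count editors with at least X edits for each X, as {"C1": n, ...}.

    edit_log is an iterable of editor names, one entry per edit made
    during the period of interest (for example the last 30 days).
    """
    edits_per_editor = Counter(edit_log)
    return {
        f"C{x}": sum(1 for n in edits_per_editor.values() if n >= x)
        for x in thresholds
    }

# Hypothetical period: Alice made 6 edits, Bob 1, Carol 5.
log = ["Alice"] * 6 + ["Bob"] + ["Carol"] * 5
result = editors_at_threshold(log)
print(result)
```

With labels like these, the >=1 and >=5 counts can be reported side by side
without either one being called "active editors".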

Another alternative is to change the terminology or measures on
Special:Statistics to align with stats.wikimedia.org.

Thoughts?

Pine
On Sep 12, 2015 4:25 PM, "Erik Zachte" <ezac...@wikimedia.org> wrote:

> Hi Pine,
>
>
>
> > I think that the definition on Special:Statistics makes more sense for
> "active editors" than the >=5 definition that is commonly used in
> discussions on mailing lists.
>
>
>
> tl;dr 'active editor'  is a term with a long history. If we recoin that
> term and keep informing the public how many active editors we counted we
> will make our public stats more vain and empty.
>
>
>
> Long version:
>
>
>
> This is a recurring discussion, with minor variations.
>
>
>
> In my personal opinion our movement has a tendency to publish too extreme
> numbers already, however bloated, as if our more substantial achievements
> aren't awe-inspiring enough.
>
> (examples are 'Wikipedias in 280 languages', '800 wikis', not to mention
> our extreme 'article' counts)
>
> As long as we keep these extreme counts with little substance to
> ourselves I wouldn't care much about terminology, but we tend not to keep
> these to ourselves.
>
>
>
> Can I illustrate my point by reductio ad absurdum (sort of)?
>
> Would you call a person who jots his name on a paycheck once a month and
> writes nothing else a writer?
>
> Would you call a person who climbs three steps to enter a bus a climber?
>
> Are you a reader if you glance at a glossy's cover once at your local
> barber?
>
>
>
> A person with one edit in one particular month and maybe none in the rest
> of the year to me is not much of an editor really.
>
> It's one more person who knows of Wikipedia (we have 500+ million of
> those) and found the edit and submit buttons and tried those, to see what
> happens.
>
> Now if that person likes what happened and wants to do it again we are on
> to something.
>
> The threshold of edits a person should reach before we can infer intention
> and motivation is of course arbitrary, but clearly more than one in my view.
>
>
>
> I'm not saying we shouldn’t count one-offs. If people get deterred by one
> problematic edit that is hugely relevant. And the enormous gap between 1+
> and 3+ edits is of course a major concern.
>
> I would just prefer a different term rather than 'active editor', which is
> what you suggest to adopt.
>
>
>
> Cheers,
>
>
>
> Erik
>
>
>
> *From:* analytics-boun...@lists.wikimedia.org [mailto:
> analytics-boun...@lists.wikimedia.org] *On Behalf Of *Pine W
> *Sent:* Saturday, September 12, 2015 23:29
> *To:* A mailing list for the Analytics Team at WMF and everybody who has
> an interest in Wikipedia and analytics.
> *Subject:* Re: [Analytics] User statistics for video marking ENWP 5m
> article milestone
>
>
>
> Aha, I just figured it out. The two pages are using very different
> definitions for "active editors".
> https://en.wikipedia.org/wiki/Special:Statistics refers to anyone who has
> made a *single* edit in the last 30 days as an "active editor", while
> https://stats.wikimedia.org/EN/TablesWikipediaEN.htm refers to editors who
> have made *5 or more* edits in the past month as active. This mix of
> terminology is confusing. I think that the definition on Special:Statistics
> makes more sense for "active editors" than the >=5 definition that is
> commonly used in discussions on mailing lists. Can anyone suggest a better
> set of terminology to distinguish the >=1 "active editors" from the >=5
> "active editors"?
>
>
> Pine
>
>
>
>
>
> On Sat, Sep 12, 2015 at 2:21 PM, Pine W <wiki.p...@gmail.com> wrote:
>
> Next question: https://en.wikipedia.org/wiki/Special:Statistics shows
> that ENWP alone has had 123,512 active editors (5 or more actions) in the
> last 30 days. But https://reportcard.wmflabs.org/ shows that for June
> 2015 (the latest data available there), there were only 31k active editors
> on ENWP and 77k active editors for all projects combined. 
> https://stats.wikimedia.org/EN/TablesWikipediaEN.htm
> seems consistent with the latter, showing that for August 2015 there were
> 30,789 active editors. Is there an explanation for the large difference
> between the 123,512 active editors shown on
> https://en.wikipedia.org/wiki/Special:Statistics, and the 30,789 active
> editors s

Re: [Analytics] User statistics for video marking ENWP 5m article milestone

2015-09-11 Thread Pine W
Thanks!
Pine
On Sep 11, 2015 11:20 AM, "James Forrester" <jforres...@wikimedia.org>
wrote:

> On 11 September 2015 at 11:13, Pine W <wiki.p...@gmail.com> wrote:
>
>> Hi Analytics,
>>
>> On ENWP, does the number of 26,163,773 users
>>
> You mean https://en.wikipedia.org/wiki/Special:Statistics "Registered
> users"? Assuming yes…
>
>> include IPs who have made edits?
>>
> No.
>
>> Does it include editors on all Wikimedia projects
>>
> No.
>
>
>> or just those who have registered and/or edited on ENWP?
>>
> Registered, regardless of having edited.
>
> J.
> --
> James D. Forrester
> Lead Product Manager, Editing
> Wikimedia Foundation, Inc.
>
> jforres...@wikimedia.org | @jdforrester
>


Re: [Analytics] Editor population stats for August

2015-09-08 Thread Pine W
Follow up question: are the stats for earlier months revised downward as
articles are deleted? If so, have the July and earlier stats been updated?

Pine
On Sep 8, 2015 9:35 AM, "Pine W" <wiki.p...@gmail.com> wrote:

> Great. Thanks for getting this done so fast. WSC and Ed are likely
> interested in this data also. I will update my graphs that I posted to
> Research-l with the new data.
>
> Pine
> On Sep 8, 2015 6:53 AM, "Erik Zachte" <ezac...@wikimedia.org> wrote:
>
>> Dump based Wikistats reports for August are now online, even earlier than
>> expected.
>>
>>
>>
>> Observations regarding Wikipedia editor stats:
>>
>>
>>
>> Number of active editors for all Wikipedias combined for August 2015,
>> deduplicated [1]
>>
>> 5+ edits:    74884, MoM -0.7%, YoY -1.8%, PoM 78.5% (max April 2007)
>>
>> 100+ edits:  12775, MoM +2.8%, YoY +9.2%, PoM 99.3% (max Feb 2010)
>>
>>
>>
>> MoM is month over month
>>
>> YoY is year over year
>>
>> PoM is percentage of max value ever
>>
>>
>>
>> [1]
>> http://stats.wikimedia.org/EN/TablesWikimediaAllProjects_AllMonths.htm
>>
>> [2] for PoM only first 28 days per month are taken into account, as a
>> rough 'normalization' (in other words to give February an equal opportunity
>> to rise to the top, as it did for 100+)
>>
>>
>>
>> See also
>>
>> https://stats.wikimedia.org/EN/TablesWikipediansEditsGt5.htm
>>
>> https://stats.wikimedia.org/EN/TablesWikipediansEditsGt100.htm
>>
>> (note here first column is not deduplicated like above, pls ignore)
>>
>>
>>
>> for charts see
>>
>> http://stats.wikimedia.org/EN/ReportCardTopWikis.htm
>>
>>
>>
>> Cheers,
>>
>> Erik
>>
>>
>>
>>
>>
>> *From:* Erik Zachte [mailto:ezac...@wikimedia.org]
>> *Sent:* Tuesday, September 01, 2015 13:59
>> *To:* 'A mailing list for the Analytics Team at WMF and everybody who
>> has an interest in Wikipedia and analytics.'
>> *Subject:* RE: [Analytics] Editor population stats for August
>>
>>
>>
>> Hi Pine,
>>
>>
>>
>> Expect Wikistats reports mid September.
>>
>>
>>
>> For a few months now the stub dumps have been produced separately, which
>> has sped up the process considerably.
>>
>> I expect all stub dumps to be done around the 8th/9th of the month.
>>
>> It takes up to a week after that to produce the counts and reports.
>>
>>
>>
>> BTW last month it took longer, as part of the process had to be rerun.
>>
>>
>>
>> Cheers,
>>
>> Erik
>>
>>
>>
>> *From:* analytics-boun...@lists.wikimedia.org [mailto:
>> analytics-boun...@lists.wikimedia.org] *On Behalf Of *Pine W
>> *Sent:* Monday, August 31, 2015 19:41
>> *To:* A mailing list for the Analytics Team at WMF and everybody who has
>> an interest in Wikipedia and analytics.
>> *Subject:* [Analytics] Editor population stats for August
>>
>>
>>
>> A number of us are discussing the year to date editor population stats.
>> When can we anticipate seeing the August stats? It would be helpful to have
>> them be published at least a week before the publication of the monthly
>> Recent Research report for September.
>>
>> Thanks,
>> Pine
>>
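
The three abbreviations defined in the quoted report (MoM, YoY, PoM) can be
expressed as a short calculation. This is a minimal sketch with made-up
monthly counts, not Wikistats data; the function names are illustrative, and
it does not apply the 28-day normalization that the report mentions for PoM.

```python
def pct_change(current, previous):
    """Percentage change from previous to current."""
    return 100.0 * (current - previous) / previous

def summarize(series):
    """series: monthly counts, oldest first, at least 13 months long."""
    latest = series[-1]
    return {
        "MoM": pct_change(latest, series[-2]),   # vs. the previous month
        "YoY": pct_change(latest, series[-13]),  # vs. same month a year ago
        "PoM": 100.0 * latest / max(series),     # share of the maximum ever
    }

# Made-up counts for 13 consecutive months (not Wikistats data).
counts = [200, 190, 180, 170, 160, 150, 140, 130, 120, 110, 100, 100, 110]
stats = summarize(counts)
print(stats)
```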


[Analytics] Editor population stats for August

2015-08-31 Thread Pine W
A number of us are discussing the year to date editor population stats.
When can we anticipate seeing the August stats? It would be helpful to have
them be published at least a week before the publication of the monthly
Recent Research report for September.

Thanks,
Pine


Re: [Analytics] Webrequest loss on 08-03 and 08-10

2015-08-26 Thread Pine W
Thanks for reporting this.

Pine
On Aug 26, 2015 1:27 PM, Andrew Otto ao...@wikimedia.org wrote:

 Hi all,

 Now that we’ve had a little space to analyze the problem, I wanted to call
 out a recent webrequest data loss issue that we experienced on two separate
 occasions.

 We attempted to upgrade to Kafka 0.8.2.1, and it wasn’t until the second
 attempt that we actually found the problem.  Kafka 0.8.2.1 ships with a
 buggy version of Snappy[1] that causes messages to not be compressed
 properly.  This caused a ~4x increase in network and disk I/O around the
 cluster all at once.

 We’ve documented the incidents and the occasions of significant data loss
 here:

 https://wikitech.wikimedia.org/wiki/Incident_documentation/20150803-Kafka


 https://wikitech.wikimedia.org/wiki/Incident_documentation/20150810-Kafka#Conclusions

 https://wikitech.wikimedia.org/wiki/Analytics/Data/Webrequest

 This loss will affect the output of pagecount* and pageview datasets, as
 well as other webrequest generated statistics.  Please consider statistics
 that are generated from webrequest data using the following UTC hours
 unreliable:

   2015-08-03T18:00 - 2015-08-03T23:00
   2015-08-10T15:00 - 2015-08-10T21:00
   2015-08-11T17:00 - 2015-08-11T18:00
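
For anyone filtering their own webrequest-derived statistics, a small helper
along these lines could flag timestamps that fall in the windows listed
above. It is only a sketch: the function and constant names are made up for
this example, and it assumes each window's end time is exclusive. If the
ending hour itself is also affected, extend each end by one hour.

```python
from datetime import datetime, timezone

# Windows copied from the list above; all times are UTC.
UNRELIABLE_WINDOWS = [
    ("2015-08-03T18:00", "2015-08-03T23:00"),
    ("2015-08-10T15:00", "2015-08-10T21:00"),
    ("2015-08-11T17:00", "2015-08-11T18:00"),
]

def _parse(stamp):
    """Parse a 'YYYY-MM-DDTHH:MM' string as a UTC-aware datetime."""
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M").replace(tzinfo=timezone.utc)

def is_unreliable(moment):
    """True if an aware datetime falls in a listed window (end exclusive)."""
    return any(_parse(start) <= moment < _parse(end)
               for start, end in UNRELIABLE_WINDOWS)

print(is_unreliable(datetime(2015, 8, 3, 19, 30, tzinfo=timezone.utc)))
print(is_unreliable(datetime(2015, 8, 4, 0, 0, tzinfo=timezone.utc)))
```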

 Many apologies for any inconvenience this causes.  We’ve learned a lot
 during this turmoil, and have a lot of ideas on how to hopefully prevent
 this from happening in the future, and also how to reduce loss and
 complexity if and when it does.  The analytics engineering team will be
 doing a post mortem on this soon, in which we will document these ideas.

 Thanks,
 -Andrew Otto

 [1] https://issues.apache.org/jira/browse/KAFKA-2189




[Analytics] Wikipedia live monitor for identifying breaking news on Wikipedia

2015-07-19 Thread Pine W
The Wikipedia live monitor tool was designed by Thomas Steiner
http://research.google.com/pubs/author39477.html, an engineer of Google
Germany, with the intention of identifying breaking news stories. This tool
was mentioned in the *Signpost *in 2013
https://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2013-04-22/In_the_media
but this is the first that I can recall seeing it, and it's not in my
mailing list archives so I'm forwarding the links in case list subscribers
are interested.

https://wikipedia-live-monitor.herokuapp.com/

https://twitter.com/wikilivemon

Pine


Re: [Analytics] Request for three viewership statistics

2015-07-07 Thread Pine W
Interesting, thanks Nemo.

Any ideas about how to check how many people saw the watchlist geonotice?

Pine
On Jul 7, 2015 12:52 AM, Federico Leva (Nemo) nemow...@gmail.com wrote:

 Pine W, 07/07/2015 02:29:

 (2) During the past 90 days or so, how many unique users have viewed

 https://en.wikipedia.org/wiki/File:Cascadiawikimedians_transparent_Gill_Sans_155px_high.png
 on the various Wikimedia pages where it's included?

 (3) During the past 90 days or so, how many times has

 https://en.wikipedia.org/wiki/File:Cascadiawikimedians_transparent_Gill_Sans_155px_high.png
 been viewed on the various Wikimedia pages where it's included?


 https://github.com/hay/wiki-tools/blob/master/etc/mediacounts-stats.py

 Nemo



[Analytics] Request for three viewership statistics

2015-07-06 Thread Pine W
Hi WMF Analytics,

We have a request at Cascadia Wikimedians User Group. Can you determine:

(1) How many unique users saw this geonotice:
https://en.wikipedia.org/wiki/Wikipedia:Geonotice#Seattle_Wiki-picnic_2015?

(2) During the past 90 days or so, how many unique users have viewed
https://en.wikipedia.org/wiki/File:Cascadiawikimedians_transparent_Gill_Sans_155px_high.png
on the various Wikimedia pages where it's included?

(3) During the past 90 days or so, how many times has
https://en.wikipedia.org/wiki/File:Cascadiawikimedians_transparent_Gill_Sans_155px_high.png
been viewed on the various Wikimedia pages where it's included?

Thanks,

Pine


[Analytics] Upcoming in October: Seattle GNU/Linux conference

2015-07-02 Thread Pine W
The 2015 Seattle GNU/Linux conference (SeaGL) call for proposals is now
open. I've submitted a proposal for a 50-minute workshop regarding how to
edit Wikipedia, and may submit more proposals in collaboration with my
colleagues in Cascadia Wikimedians. I'm sure that SeaGL would be happy to
consider technical presentations about MediaWiki, Wikimedia analytics,
Wikimedia Labs, our upstream tools, etc. if you're interested. Please
consider submitting a proposal and coming to Seattle for the conference.

http://seagl.org/

Regards,

Pine


Re: [Analytics] If it didn't happen in HDFS, it didn't happen

2015-06-10 Thread Pine W
Question about "the budget this year has ensured, at least for Discovery,
that ops and hardware support are slashed to the bone": I'm trying to
figure out the paradox of hiring more people for Discovery at the same time
that ops and hardware support are reduced. Can someone explain?

Thanks,
Pine


Re: [Analytics] clicks on red links

2015-05-22 Thread Pine W
It would be useful to the community, to readers, and perhaps to the WMF
search and readership teams to have a list of pages that are most visited
but have no content and aren't redirects.

Pine
On May 22, 2015 11:50 AM, Kevin Leduc ke...@wikimedia.org wrote:

 We do not have such statistics.

 I wonder if it would be possible to set up an EventLogging schema to log
 hits to redlinks and what happens after.

 On Wed, May 20, 2015 at 10:37 PM, Amir E. Aharoni 
 amir.ahar...@mail.huji.ac.il wrote:

 Hi,

 Are there statistics about the number of people who click on red links in
 Wikimedia projects?

 And about what they do as the next step - go back, close the page, create
 an article, something else?

 --
 Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי
 http://aharoni.wordpress.com
 ‪“We're living in pieces,
 I want to live in peace.” – T. Moore‬



Re: [Analytics] [EE] Facebook edit button for Wikipedia

2015-05-11 Thread Pine W
Sounds good. Perhaps Kourosh Karimkhany could help with a conversation with
Facebook.

Pine
On May 11, 2015 7:47 AM, Dan Andreescu dandree...@wikimedia.org wrote:

On Fri, May 8, 2015 at 10:00 PM, Jeremy Baron jer...@tuxmachine.com wrote:

 On Sat, May 9, 2015 at 1:58 AM, Oliver Keyes oke...@wikimedia.org wrote:
  Facebook sanitises their users' referers. There's no research and
  engagement work to perform there.

 well facebook surely has this data. (both views of Wikipedia content
 and probably also clicks of edit buttons) we could ask them about
 sharing it.


We could even create a prominent "who brought editors to Wikipedia"
dashboard.  And if Facebook shares verifiable data with us, we can use it
as an incentive for Google to add an Edit button in their Knowledge Graph
(because then they'd get a spot on our spiffy dashboard).  Of course, we'd
put a note saying "this is all self-reported data, etc." as a disclaimer at
the bottom.



Re: [Analytics] Page views on a more frequent than hourly basis

2015-04-13 Thread Pine W
Hi Oliver, re ccing people who are on list, this is the protocol we
followed in IEGCom to ping people who are subscribed and mentioned in
certain emails but, like many of us, may automatically move emails from
lists directly to folders where they may be unread for days. So there is a
reason to do this.

Thanks,

Pine


Re: [Analytics] Page views on a more frequent than hourly basis

2015-04-13 Thread Pine W
Hi,

This issue of pageview data granularity has been discussed before, and the
answer has been that hourly is the smallest increment allowed to be
revealed publicly, for privacy reasons.
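
To make the granularity point concrete, here is a sketch of how per-request
timestamps could be rolled up into hourly buckets so that nothing finer than
an hour is exposed. The function name and sample timestamps are
hypothetical; this is an illustration, not a description of how the
Wikimedia pageview pipeline is implemented.

```python
from collections import Counter
from datetime import datetime

def hourly_counts(timestamps):
    """Aggregate request datetimes into counts per whole hour."""
    return Counter(
        ts.replace(minute=0, second=0, microsecond=0) for ts in timestamps
    )

# Hypothetical request times; not real webrequest data.
requests = [
    datetime(2015, 4, 13, 0, 5, 12),
    datetime(2015, 4, 13, 0, 59, 59),
    datetime(2015, 4, 13, 1, 0, 0),
]
buckets = hourly_counts(requests)
print(buckets)
```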

I believe that the person you will want to discuss your request with is
Toby, who I have cc'd here.

Pine
On Apr 13, 2015 12:11 AM, Hirav Gandhi hirav.gan...@gmail.com wrote:

 Hi Wikimedia Analytics Team,

 My colleague Bharath and I are doing research on dynamic server allocation
 algorithms and we were looking for suitable datasets to test our
 predictive algorithm on. We noticed that Wikimedia has an amazing data set
 of hourly page views, but we were looking for something a bit more
 granular, such as aggregated page requests to English Wikipedia on a minute
 by minute basis or second by second basis if possible.

 We are more than happy to pore through any raw data you might have that
 would help us calculate page requests at this granular level. Please let us
 know if it would be possible to get such data and if so how. Thank you in
 advance for your help.

 Best,

 Hirav Gandhi


Re: [Analytics] [Cluster] Monitoring the impact Hive jobs have on the Analytics cluster

2015-03-07 Thread Pine W
Chris, may I quote your email on BASH?

Pine
On Mar 7, 2015 6:14 AM, Christian Aistleitner christ...@quelltextlich.at
wrote:

 Hi,

 around running jobs on the Analytics cluster, I've sometimes seen
 people say in IRC: “Let's run this heavy job. I'll keep an eye on it”.

 But more often than not, this seems to have meant:
 “Let's just run this heavy job and wait. If QChris joins IRC, let's
 hope he doesn't ping us about having overloaded the cluster.”

 That's not nice^Wscalable ;-)

 So just in case someone is vague on how to “keep an eye on it”, I did
 a short write-up at:

   https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Hadoop/Load

 which explains how to detect how the cluster is doing at a very high
 level.
 In particular, it allows you to detect whether the cluster has stalled
 and, if it has, tells you what to do.

 Have fun,
 Christian

 P.S.: The above URL has diagrams! Click the URL!

 --
 quelltextlich e.U.
 Christian Aistleitner
 Companies' registry: 360296y in Linz
 Kefermarkterstrasze 6a/3, 4293 Gutau, Austria
 Email:    christ...@quelltextlich.at
 Phone:    +43 7946 / 20 5 81
 Fax:      +43 7946 / 20 5 81
 Homepage: http://quelltextlich.at/



Re: [Analytics] [Wikimedia-l] Ciritical level for addiction to Wikipedia

2015-03-04 Thread Pine W
Pinging Analytics to ask about editor longevity data (:

My understanding is that newbies (<= 10 edits) are more likely to disappear
early in their careers than they were 5 years ago, but that editors that
have been active for years are likely to remain active for years.

It would be interesting, as part of the strategic plan process, to work on
improving editor retention. I believe that this may be related to our
treatment and training of newcomers (onboarding, civility, NPP, Teahouse,
etc.) in addition to external changes in our environment (e.g. the rise of
Facebook).


Pine

*This is an Encyclopedia* https://www.wikipedia.org/

*One gateway to the wide garden of knowledge, where lies
The deep rock of our past, in which we must delve
The well of our future,
The clear water we must leave untainted for those who come after us,
The fertile earth, in which truth may grow in bright places, tended by many hands,
And the broad fall of sunshine, warming our first steps toward knowing how much we do not know.*

*—Catherine Munro*

On Wed, Mar 4, 2015 at 8:44 AM, Anders Wennersten m...@anderswennersten.se
wrote:

 On svwp there have over the years been 45 individuals who have each made
 more than 38,000 edits.
 Of these 45, 44 are still active; only one has left (in 2009), so 97.7%
 are still around. Of the users with fewer than 38,000 edits, only about 6
 out of 10 are still active.

 Is this a globally valid number, that when you have made 38,000 edits you
 are fully addicted to Wikipedia (until death do us part)?


 Anders





 ___
 Wikimedia-l mailing list, guidelines at: https://meta.wikimedia.org/
 wiki/Mailing_lists/Guidelines
 wikimedi...@lists.wikimedia.org
 Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l,
 mailto:wikimedia-l-requ...@lists.wikimedia.org?subject=unsubscribe


Re: [Analytics] [Release]

2015-02-25 Thread Pine W
Excellent!

Pine
On Feb 25, 2015 1:26 PM, Oliver Keyes oke...@wikimedia.org wrote:

 Totally! I'm also going to get together with some NEU hackers tomorrow
 and work on actually visualising the data on *drumroll* maps, which'd
 probably be more interesting eye candy than infinite bar plots :)

 On 25 February 2015 at 16:19, Pine W wiki.p...@gmail.com wrote:
  Very nice. Do you think that you could pick out a few of your favorite
  graphs and add them to this week's Recent Research report in a gallery?
 
  Thanks!
  Pine
 
  Hey all!
 
  We've released a highly-aggregated dataset of readership data -
  specifically, data about where, geographically, traffic to each of our
  projects (and all of our projects) comes from. The data can be found
  at http://dx.doi.org/10.6084/m9.figshare.1317408 - additionally, I've
  put together an exploration tool for it at
  https://ironholds.shinyapps.io/WhereInTheWorldIsWikipedia/
 
  Hope it's useful to people!
 
  --
  Oliver Keyes
  Research Analyst
  Wikimedia Foundation
 
 



 --
 Oliver Keyes
 Research Analyst
 Wikimedia Foundation



Re: [Analytics] Welcome Joseph

2015-02-19 Thread Pine W
Welcome, Joseph.

Do you know what your early work assignments will be?

Cheers,
Pine
On Feb 19, 2015 8:13 AM, Dan Andreescu dandree...@wikimedia.org wrote:

 Welcome!

 On Thu, Feb 19, 2015 at 10:58 AM, Kevin Leduc ke...@wikimedia.org wrote:

 Welcome Joseph!

 On Wed, Feb 18, 2015 at 9:40 PM, Leila Zia le...@wikimedia.org wrote:

 Welcome to the team, Joseph!

 b.t.w., I didn't know you have a background in NLP. That skill may
 become handy soon. ;-)

 On Wed, Feb 18, 2015 at 6:37 PM, Toby Negrin tneg...@wikimedia.org
 wrote:

 Hi Everyone,

 I'd like to welcome Joseph Allemendou to the Analytics team! We are
 really excited to get someone of Joseph's calibre to help take our analytics
 work to the next level.

 In his own words:

 Joseph's experiences were mostly with private companies and almost
 always involved open source software. After a M.S. in Computer Science
 with a specialization in programming languages theory and a PhD in the
 Natural Language Processing and Dialog Systems fields, Joseph worked
 four years in Ireland. He spent two years at IBM learning and applying
 project management and process improvement methodologies, and two other
 years building a start-up to help English as a foreign language teachers
 find up-to-date teaching material. Then he moved back to France and worked
 for Criteo as a specialist in scalability for one year, and as a manager for
 another year. Lastly Joseph worked with Fotolia, where he built the
 analytics architecture and team. Working with the Wikimedia Foundation
 allows him to really apply his energy and skills in the direction he wishes
 the world to move in.

 Joseph is based in Brittany, France. Welcome Joseph!

 -Toby



Re: [Analytics] [Wikimedia-l] statistics of usage of wikimedia project per language used in the interface

2015-02-05 Thread Pine W
I think that aggregated geodata like this might be available. I am adding
Analytics to this email thread.

Pine
On Feb 5, 2015 8:37 AM, Romaine Wiki romaine.w...@gmail.com wrote:

 For Belgium I would like to know something different. Belgium doesn't have
 a single primary language; it has Dutch, French, German and English. Each
 of these Wikipedias also serves other countries with larger populations of
 speakers. What would be interesting for us is to know which subjects are
 visited most from Belgium. This would be interesting per language, but also
 for the languages combined (through interwiki links/Wikidata).

 Romaine


 2015-02-04 19:01 GMT+01:00 Federico Leva (Nemo) nemow...@gmail.com:

  charles andrès (WMCH), 04/02/2015 14:25:
 
  Is there a way to know how many people use Wikipedia per interface
  language?
 
 
  No.
 
   Said in other words, I want to know how many people display the Wikimedia
  project interface in the different versions of German and Alemannisch.
 
 
  Until https://phabricator.wikimedia.org/T58464 is fixed (hopefully in
  this decade), the requests with non-default language are negligible.*
  What makes you think that you need such a level of precision and
  https://stats.wikimedia.org/EN/TablesPageViewsMonthlyCombined.htm is not
  enough?

  Once the use case for such precise numbers is clarified, probably we can
  extract exact data with a method similar to
  https://phabricator.wikimedia.org/T65416 after it's fixed (hopefully this
  year; the bug has made localisation and new subdomain requests practically
  impossible or unfeasible in dozens of languages, for many months now).
 
  Nemo
 
  (*) Even considering the sum of requests with uselang parameter** and of
  registered users with a non-default language choice in preferences.
  (**) Even in Commons, despite all the uselang-specific incoming links and
  the language selection gadget.
 
 
 ___
 Wikimedia-l mailing list, guidelines at:
 https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines
 wikimedi...@lists.wikimedia.org
 Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l,
 mailto:wikimedia-l-requ...@lists.wikimedia.org?subject=unsubscribe


Re: [Analytics] Relevant Content Availability

2015-01-21 Thread Pine W
Hi Rawia,

In response to your first two questions, these links might help you:

* https://stats.wikimedia.org/EN/Sitemap.htm
* https://stats.wikimedia.org/EN/TablesArticlesTotal.htm
* https://stats.wikimedia.org/EN/PlotsPngArticlesTotal.htm

Also, yes, article count includes stubs. Article count probably does not
include pages that are in the Draft or Articles for Creation processes, or
deleted articles; it would be good if someone can confirm this.

I'm not sure there have been major step changes in the number of articles
on the major language Wikipedias. Again, see
https://stats.wikimedia.org/EN/PlotsPngArticlesTotal.htm. (There has been a
great deal of analysis and theorizing done about the changes in *active
editor* statistics over time.) However, there may be step changes in
article count on smaller Wikipedias; someone from Analytics, the Small Wiki
Monitoring Team, or the Incubator project might be able to help with that
question.

Thank you for your interest,

Pine (writing in an unofficial personal capacity only)

*This is an Encyclopedia* https://www.wikipedia.org/

*One gateway to the wide garden of knowledge, where lies The deep rock of
our past, in which we must delve The well of our future, The clear water we
must leave untainted for those who come after us, The fertile earth, in
which truth may grow in bright places, tended by many hands, And the broad
fall of sunshine, warming our first steps toward knowing how much we do not
know.*

*—Catherine Munro*

On Wed, Jan 21, 2015 at 12:47 AM, Abdel Samad, Rawia 
rawia.abdelsa...@strategyand.pwc.com wrote:

  Hello,



 I work for a consulting firm called Strategy&. We have been engaged by
 Facebook on behalf of Internet.org to conduct a study on assessing the
 state of connectivity globally. One key area of focus is the availability
 of relevant online content. We are using the availability of encyclopedic
 knowledge in one’s primary language as a proxy for relevant content. We
 define this as 100K+ Wikipedia articles in one’s primary language. We have
 a few questions related to this analysis prior to publishing it:

 · We are currently using the article count by language from the
 Wikimedia Foundation's public list:
 http://meta.wikimedia.org/wiki/List_of_Wikipedias. Is this a reliable
 source for article count, and does it include stubs?

 · Is it possible to get historic data for article count? It would
 be great to monitor the evolution of the metric we have defined over time.

 · What are the biggest drivers you’ve seen for step changes in the
 number of articles (e.g., number of active admins, machine translation,
 etc.)?

 · We had to map Wikipedia language codes to ISO 639-3 language
 codes in Ethnologue (the source we are using for primary language data).
 The two-letter language code for a Wikipedia language in the “List of
 Wikipedias” sometimes, but not always, matches the ISO 639-1 code. Is
 there an easy way to do the mapping?
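
On the language-code question, a lookup table plus a short list of exceptions may be enough for a first pass. The sketch below is illustrative only: the function name and both tables are made up for this example, they cover only a handful of codes, and every entry should be checked against the official ISO 639-3 code tables before use. It also assumes that a three-letter Wikipedia code not listed as an exception is already an ISO 639-3 code, which does not always hold.

```python
# Hypothetical, hand-written tables for illustration; not an authoritative dataset.
ISO_639_1_TO_3 = {
    "en": "eng",
    "fr": "fra",
    "de": "deu",
    "nl": "nld",
    "ar": "ara",
}

# Wikipedia subdomain codes that are not ISO 639-1 codes, or that differ
# from the ISO 639-3 code of the language they are used for.
WIKI_EXCEPTIONS = {
    "simple": "eng",      # Simple English Wikipedia
    "als": "gsw",         # Alemannic Wikipedia; ISO 639-3 "als" is Tosk Albanian
    "zh-yue": "yue",      # Cantonese
    "zh-min-nan": "nan",  # Min Nan
    "bat-smg": "sgs",     # Samogitian
}


def wiki_code_to_iso639_3(code):
    """Return an ISO 639-3 code for a Wikipedia language code, or None."""
    code = code.strip().lower()
    if code in WIKI_EXCEPTIONS:
        return WIKI_EXCEPTIONS[code]
    if code in ISO_639_1_TO_3:
        return ISO_639_1_TO_3[code]
    # Assumption: remaining three-letter codes are already ISO 639-3 codes.
    if len(code) == 3 and code.isalpha():
        return code
    return None  # unknown: needs manual review
```

Codes that come back as None are the ones worth reviewing by hand.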



 Many Thanks,

 Rawia




 *Formerly Booz & Company*



 *Rawia Abdel Samad*

 Direct: +9611985655 | Mobile: +97455153807

 Email: rawia.abdelsa...@strategyand.pwc.com

 www.strategyand.com





[Analytics] Fwd: Re: [Wikimedia-l] Chapters and GLAM tooling

2014-10-26 Thread Pine W
Forwarding comments from Wikimedia-l that may be of interest to a number of
subscribers on other lists.

Pine
-- Forwarded message --
From: Erik Moeller e...@wikimedia.org
Date: Oct 25, 2014 5:59 PM
Subject: Re: [Wikimedia-l] Chapters and GLAM tooling
To: Wikimedia Mailing List wikimedi...@lists.wikimedia.org
Cc:

On Sat, Oct 25, 2014 at 7:16 AM, MZMcBride z...@mzmcbride.com wrote:

 Labs is a playground and Galleries, Libraries, Archives, and Museums are
 serious enough to warrant a proper investment of resources, in my view.
 Magnus and many others develop magnificent tools, but my sense is that
 they're largely proofs of concept, not final implementations.

Far from being treated as mere proofs of concept, Magnus' GLAM tools
[1] have been used to measure and report success in the context of
project grant and annual plan proposals and reports, ongoing project
performance measurements, blog posts and press releases, etc. Daniel
Mietchen has, to my knowledge, been the main person doing any
systematic auditing or verification of the reports generated by these
tools, and results can be found in his tool testing reports, the last
one of which is unfortunately more than a year old. [2]

Integration with MediaWiki should IMO not be viewed as a runway that
all useful developments must be pushed towards. Rather, we should seek
to establish clearer criteria by which to decide that functionality
benefits from this level of integration, to such an extent that it
justifies the cost. Functionality that is not integrated in this
manner should, then, not be dismissed as proofs of concept but
rather judged on its own merits.

GWToolset [3] is a good example. It was built as a MediaWiki extension
to manage GLAM batch uploads, but we should not regard this decision
as sacrosanct, or the only correct way to develop this kind of
functionality. The functionality it provides is of highly specialized
interest, and indeed, the number of potential users to-date is 47
according to [4], most of whom have not performed significant uploads
yet.  Its user interface is highly specialized and special permissions
+ detailed instructions are required to use it. At the same time, it
has been used to upload 322,911 files overall, an amazing number even
without going into the quality and value of the individual
collections.

So, why does it need to be a MediaWiki extension at all? When
development began in 2012, OAuth support in MediaWiki did not exist,
so it was impossible for an external tool (then running on toolserver)
to manage an upload on the user's behalf without asking for the user's
password, which would have been in violation of policy. But today, we
have other options. It's possible that storage requirements or other
specific desired integration points would make it impossible to create
this as a Tool Labs tool -- but if we created the same tool today, we
should carefully consider that.

Indeed, highly specialized tools for the cultural and education sector
_are_ being developed and hosted inside Tool Labs or externally.
Looking at the current OAuth consumer requests [5], there are
submissions for a metadata editor developed by librarians at the
University of Miami Libraries in Coral Gables, Florida, and an
assignment creation wizard developed by the Wiki Education Foundation.
There's nothing improper about that, as Marc-André pointed out.

As noted before, for tools like the ones used for GLAM reporting to
get better, WMF has its role to play in providing more datasets and
improved infrastructure. But there's nothing inherent in the
development of those tools that forces them to live in production
land, or that requires large development teams to move them forward.
Auditing of numbers, improved scheduling/queuing of database requests,
optimization of API calls and DB queries; all of this can be done by
individual contributors, making this suitable work for even chapters
with limited experience managing technical projects to take on.

On the analytics side, we're well aware that many users have asked for
better access to the pageview data, either through MariaDB, or through
a dedicated API. We have now said for some time that our focus is on
modernizing the infrastructure for log analysis and collection,
because the numbers collected by the old webstatscollector code were
incomplete, and the infrastructure subject to frequent packet loss
issues. In addition, our ability to meet additional requirements on
the basis of simple pageview aggregation code was inherently
constrained.

To this end, we have put into production use infrastructure to collect
and analyze site traffic using Kafka/Hadoop/Hive. At our scale, this
has been a tremendously complex infrastructure project which has
included custom development such as varnishkafka [6]. While it's taken
longer than we've wanted, this new infrastructure is being used to
generate a public page count dataset as of this month, including
article-level mobile traffic for 

[Analytics] Research discussion: Visions for Wikipedia

2014-10-20 Thread Pine W
Both of the presentations at the October Wikimedia Research Showcase were
fascinating and I encourage everyone to watch them [1]. I would like to
continue to discuss the themes from the showcase about Wikipedia's
adaptability, viability, and diversity.

Aaron's discussion about Wikipedia's ongoing internal adaptations, and
the slowing of those adaptations, reminded me of this statement from MIT
Technology Review in 2013 (and I recommend reading the whole article [2]):

The main source of those problems (with Wikipedia) is not mysterious. The
loose collective running the site today, estimated to be 90 percent male,
operates a crushing bureaucracy with an often abrasive atmosphere that
deters newcomers who might increase participation in Wikipedia and broaden
its coverage.

I would like to contrast that vision of Wikipedia with the vision presented
by User:CatherineMunro (formatting tweaks by me), which I re-read when I
need encouragement:

THIS IS AN ENCYCLOPEDIA
One gateway
to the wide garden of knowledge,
where lies
The deep rock of our past,
in which we must delve
The well of our future,
The clear water
we must leave untainted
for those who come after us,
The fertile earth,
in which truth may grow
in bright places,
tended by many hands,
And the broad fall of sunshine,
warming our first steps
toward knowing
how much we do not know.

How can we align ourselves less with the former vision and more with the
latter? [3]

I hope that we can continue to discuss these themes on the Research mailing
list. Please contribute your thoughts and questions there.

Regards,

Pine

[1] youtube.com/watch?v=-We4GZbH3Iw

[2]
http://www.technologyreview.com/featuredstory/520446/the-decline-of-wikipedia/

[3] Lest this at first seem to be impossible, I will borrow and tweak a
quote from George Bernard Shaw, later used by John F. Kennedy:
"Some people see things as they are and say, 'Why?' Let us dream things
that never were and say, 'Why not?'"


Re: [Analytics] Records of article access

2014-10-20 Thread Pine W
Sorry, let's see if I can rephrase.

The issue is not really about the Signpost (I didn't know which person was
involved, and I wasn't planning to ask for a name; as I said up front, I am
not interested in stirring up trouble). My questions were about how to
reconcile that access with the recent discussions about instrumenting
Wikipedia to track unique readers (which I was under the impression is
still in the planning stages, so I was surprised to find out that this
already happens), how to reconcile that access with the Privacy Policy, and
how to make sure that in the general case of anyone accessing raw access
logs that the access itself is logged. On the last point, to use an
analogy, it's like someone accessing patient charts in a medical facility;
there might be a good reason for a technician to view 100s of records of
patients that he/she wasn't directly involved in treating, such as if the
technician is helping to conduct a study of which doctors prescribe which
treatments most often; on the other hand, if a technician is able to access
those records at will and without that access being logged, then this
creates worrisome potential for large-scale data harvesting for
unauthorized uses without that access being noticed, including uses by
someone whose account is compromised by a third party. If I were accessing
the Wikipedia raw logs, I would expect that my access would be logged and
monitored in the same way that I'm suggesting should be happening here, and
it's not because I'm any more or less trustworthy than anyone else.

Does that make sense? I am less worried about the specific case of the
Signpost and more worried about the general case of how the raw logs are
accessed and making sure that there are good controls and logs for that
access.
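
To make the idea of "logging the access itself" concrete, here is a minimal sketch. It is not a description of how WMF's systems work; the class name, the fields in the audit entry, and the in-memory list used as the audit sink are all hypothetical. The point it illustrates is that the audit entry is written before any records are returned, so a query cannot be run without leaving a trace.

```python
import json
import time


class AuditedLogReader:
    """Hypothetical wrapper: every query against raw records is itself recorded."""

    def __init__(self, records, audit_sink):
        self._records = records        # raw log records (dicts), held privately
        self._audit_sink = audit_sink  # append-only destination, e.g. a list

    def query(self, user, reason, predicate):
        # Record who asked, why, and when, before touching the raw data.
        entry = {"user": user, "reason": reason, "timestamp": time.time()}
        self._audit_sink.append(json.dumps(entry))
        return [record for record in self._records if predicate(record)]
```

In a real deployment the audit sink would need to live somewhere the person running the query cannot edit, which this sketch does not attempt to show.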

Thanks,
Pine


On Mon, Oct 20, 2014 at 7:15 AM, Oliver Keyes oke...@wikimedia.org wrote:

 Sorry, but no; what additional conditions attached? We're *not giving
 them any information* except for a boolean ("this looks like illegitimate
 traffic", "this one is legitimate", or "we can't tell") and a wild stab at
 what kind of illegitimate traffic it might be.

 Please bear in mind that what you're essentially saying - or, how it's
 coming off - is that there is some shady, undocumented,
 privacy-policy-thorny thing going on here. That's a pretty big statement to
 make about the activities of a researcher. If you think you can
 substantiate it: tell me what conditions you might attach to the
 aforementioned information? Better yet, what information do you think is
 being transmitted? If you don't think you can substantiate it, don't say it.

 Again, I'm sorry to be blunt. But to me this is kind of a big deal. If
 I've screwed up in some way I'd like you to stop talking in subtext and
 tell me how you think I have. Because at the moment I'm not entirely sure
 what I'm meant to be clarifying. But if I haven't, this sort of discussion
 can have a big impact on someone's reputation, and I'd like to clear it up.

 On 19 October 2014 03:24, Pine W wiki.p...@gmail.com wrote:

 Thanks very much, Toby and everyone.

 Ironholds, I appreciate your doing traffic research on a volunteer basis
 for the benefit of the Signpost and the community. I'm concerned that the
 system as a whole may need a closer look, and I'm glad that Toby will be
 doing this with input from Legal.

 Toby: I hope we can continue to get some Ironholds-sponsored filtering
 for the Traffic Report, although we may need to get it with some additional
 conditions attached.

 Thanks and regards,

 Pine

 On Fri, Oct 17, 2014 at 3:20 PM, Toby Negrin tneg...@wikimedia.org
 wrote:

 Folks --

 While I'm pleased that this validation was being done by a team member
 with full knowledge of our privacy and data retention policies, I think
 some good points have been raised that we're going to need to discuss as a
 team. I've reached out to legal for their assistance in figuring out the
 path forward.

 -Toby

 On Fri, Oct 17, 2014 at 3:16 PM, Dan Andreescu dandree...@wikimedia.org
  wrote:

 I see - Oliver's batman.  Nothing to see here, moving on.

 On Fri, Oct 17, 2014 at 4:58 PM, Oliver Keyes oke...@wikimedia.org
 wrote:

 I should also point out that Toby not knowing who the staffer doing
 this one, highly specific, very minor piece of data-dogging is does not
 equate to analytics not knowing who it is. I don't know what you do for a
 living but do you tend to give your boss's boss a constant play-by-play,
 or? ;p. It's documented in Trello just like everything

Re: [Analytics] Records of article access

2014-10-20 Thread Pine W
I think we are now all getting on the same wavelength.

The one piece of this puzzle that I am still missing is why this traffic
research for the Signpost seemed to come as a surprise to Toby, and why he
thought it would benefit from Legal's input. If the queries were being
logged, I would have expected Toby to be aware of them from the logs, and I
would expect that he and others regularly check the logs to make sure that
all accesses look normal. Toby, can you comment on that, and also clarify
which part of this you think will benefit from Legal's input?

Thanks,

Pine


On Mon, Oct 20, 2014 at 10:53 AM, Oliver Keyes oke...@wikimedia.org wrote:

 Makes sense. Yeah, I had an "assuming everyone knows what you know" moment;
 I appreciate the automated query logging may not be a known thing (for the
 reasons Jeremy sets out, it's currently accessible only via an internal
 proxy, which makes it a wee bit difficult for people to know that it exists
 ;p). Sorry about that.

 We could probably do it via Hadoop (it'd be a lot easier to automate!) if
 we come up with some useful heuristics for what automated activity looks
 like. I'm hoping that the spider/bot/automation identification as part of
 the pageviews definition will give us some of that.
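
As a rough illustration of what such heuristics might look like, the sketch below labels the traffic for one page as "illegitimate", "legitimate" or "unknown", mirroring the three answers described earlier in this thread. Everything in it is an assumption made for the example: the function name, the user-agent pattern, and the thresholds are arbitrary, and it is not the method actually used for the Traffic Report. The client identifier is assumed to be an opaque token rather than a raw IP.

```python
import re
from collections import Counter

# Hypothetical pattern for self-identified crawlers.
BOT_UA = re.compile(r"bot|spider|crawl", re.IGNORECASE)


def classify_page_traffic(requests, max_share=0.5, min_requests=20):
    """Classify one page's requests.

    requests: list of (client_id, user_agent) tuples for a single page.
    Returns "illegitimate", "legitimate" or "unknown".
    """
    total = len(requests)
    if total < min_requests:
        return "unknown"  # too little data to say anything
    per_client = Counter(client for client, _ in requests)
    # Share of requests coming from the single busiest client.
    top_share = per_client.most_common(1)[0][1] / total
    # Share of requests whose user agent looks like a crawler.
    bot_share = sum(1 for _, ua in requests if BOT_UA.search(ua)) / total
    if top_share > max_share or bot_share > max_share:
        return "illegitimate"
    return "legitimate"
```

A page whose views mostly come from one client, or mostly from self-identified crawlers, is flagged; everything else passes. Real automated traffic is often less obvious than this.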

 On 20 October 2014 13:50, Jeremy Baron jer...@tuxmachine.com wrote:

 On Oct 20, 2014 1:36 PM, Oliver Keyes oke...@wikimedia.org wrote:
  I guess mostly I'm just confused as to what you'd add on top of SSH
 keys, automated logging and transparent documentation.

 I *think* Pine was asking for automatic query logging similar to what
 you've just said is already happening.

 Eventually maybe we'll get these types of queries mostly running on
 hadoop+M/R. (vs. processing a local file on disk) We could publish public
 logs of M/R jobs and for some of them allow public download of the output.
 (but this particular query would not allow public downloading of the output
 because IP/UA string/etc.)

 -Jeremy





 --
 Oliver Keyes
 Research Analyst
 Wikimedia Foundation



[Analytics] Editor interaction data sets and visualizations

2014-10-17 Thread Pine W
Dear colleagues,

As you may have heard,
https://meta.wikimedia.org/wiki/Grants:IEG/Editor_Interaction_Data_Extraction_and_Visualization
is an individual engagement grant proposal. I am working on this proposal
with volunteer assistance and advice from Aaron Halfaker (WMF), Haitham
Shammaa (WMF), and Fabian Flöck (Karlsruhe Institute of Technology
https://en.wikipedia.org/wiki/Karlsruhe_Institute_of_Technology).

We are still developing this proposal, and plan to have it finalized in the
next few days.

We would greatly appreciate your comments on whether you support or oppose
the general concept of this project, and any suggestions about how to
refine the proposal.

Additionally, we would like to hear from you about which sets of editor
interaction data, and what visualizations of editor interaction data, would
be most relevant to your interests. We intend to prioritize our outputs
with your comments in mind.

Please comment on the proposal talk page. Questions and feedback, both
positive and critical, are helpful to us as the proposers, and also help
the Individual Engagement Grants Committee [1] to assess the proposal.

Regards,

Pine

[1] I am a member of the Individual Engagement Grants Committee. I am
recusing from reviewing proposals in this funding round.


Re: [Analytics] Analytics dev points

2014-10-16 Thread Pine W
Thanks all.

Pine

On Thu, Oct 16, 2014 at 6:08 PM, Kevin Leduc ke...@wikimedia.org wrote:

 Hi Pine,

 Here's some documentation on the Analytics Team's methodology, and
 particularly the point scale:
 https://www.mediawiki.org/wiki/Analytics/Development_Process#Planning_Poker

 This morning the team tasked out some high priority features we need to
 build and then voted on how many points to assign to each story.  At our
 sprint planning meeting, we used the points to inform us on how much work
 we can commit to accomplishing in the next Sprint based on past Sprint
 velocity: http://sb.wmflabs.org/t/analytics-developers/
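
The arithmetic behind "past Sprint velocity" can be sketched in a few lines. This is a generic illustration, not the Analytics team's tooling: the function names, the three-sprint window, and the example numbers are all assumptions.

```python
def average_velocity(completed_points, window=3):
    """Mean story points completed over the most recent `window` sprints."""
    if not completed_points:
        raise ValueError("need at least one completed sprint")
    recent = completed_points[-window:]
    return sum(recent) / len(recent)


def plan_sprint(backlog, capacity):
    """Pick stories in priority order until the point capacity is used up.

    backlog: list of (story_name, points) tuples, highest priority first.
    A story that does not fit is skipped, and smaller ones after it may
    still be committed.
    """
    committed = []
    remaining = capacity
    for story, points in backlog:
        if points <= remaining:
            committed.append(story)
            remaining -= points
    return committed
```

For example, if the last three sprints completed 30, 25 and 35 points, the average velocity is 30, and a prioritized backlog of 13, 8, 13 and 5 points would commit the first, second and fourth stories.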



 On Thu, Oct 16, 2014 at 5:22 PM, Dan Garry dga...@wikimedia.org wrote:

 In Agile methodologies, story points are an arbitrary unit [1] of
 measurement for the difficulty of completing a story. The number of points
 a story has corresponds, roughly, to the amount of time the story will take
 to complete. Story points are decided by the team of engineers implementing
 the story.

 You might find this enlightening:
 http://programmers.stackexchange.com/questions/182057/why-do-we-use-story-points-instead-of-man-days-when-estimating-user-stories

 Dan

 [1]: https://en.wikipedia.org/wiki/Arbitrary_unit

 On 16 October 2014 17:15, Pine W wiki.p...@gmail.com wrote:

 I apologize if this is an elementary question, but what are points used
 to quantify when doing analytics development and how are points assigned?

 Pine





 --
 Dan Garry
 Associate Product Manager, Mobile Apps
 Wikimedia Foundation



Re: [Analytics] [Wikitech-l] Tech Talk: Design Research in Product Development: Oct 22

2014-10-16 Thread Pine W
Thanks Rachel. I'm forwarding this invite to other lists.

Pine

On Wed, Oct 8, 2014 at 9:17 AM, Rachel Farrand rfarr...@wikimedia.org
wrote:

 Hi Pine,

 The streaming youtube link is public. Feel free to distribute it however
 you like. People can also ask questions on the google+ page during the talk
 if they don't have access to IRC. I will not be monitoring it quite as
 closely, but I still will check it during the talk.

 As you probably already know, the youtube video will also be public after
 the talk so anyone can watch or rewatch the talk at any point after it is
 over as well.

 I hope this answers your question,

 Rachel



 On Tue, Oct 7, 2014 at 11:08 PM, Pine W wiki.p...@gmail.com wrote:

 Hi Rachel,

 Would it be appropriate to invite people who are outside of the Wikimedia
 universe to watch on Youtube and participate on IRC? This talk in
 particular may interest outsiders who are designers, PMs,  researchers, or
 coders.

 Pine
 On Oct 7, 2014 3:40 PM, Rachel Farrand rfarr...@wikimedia.org wrote:

 Please join us for the following tech talk:

 *Tech Talk:* Design Research in Product Development
 *Presenter:* Abbey Ripstra, Design & Usability Research Analyst on the UX
 team at the Wikimedia Foundation
 *Date:* October 22
 *Time:* 1900 UTC
 http://www.timeanddate.com/worldclock/fixedtime.html?msg=Tech+Talk%3A+Design+Research+in+Product+Development&iso=20141022T19&p1=1440&ah=1
 Link to live YouTube stream: http://www.youtube.com/watch?v=jYMTzzosUIw
 *IRC channel for questions:* #wikimedia-office
 Google+ page (another place for questions):
 https://plus.google.com/u/0/b/103470172168784626509/events/caiiagf75bvddr09nf4jbgccn30

 Talk description: The value of design research in product development is
 being recognized more frequently these days. This talk will quickly
 describe the innovation process, and how, when and why design research
 fits into the different parts of the innovation process. For most of the
 talk, Abbey will focus in on the product development part of innovation and
 describe how, when and why to best utilize the various methodologies of
 design research toward building intuitive, easy to use products that meet
 the needs of users. Abbey will also talk about, and want to collaborate
 on, the best ways to integrate design research, specifically, into product
 development at Wikimedia Foundation.
 ___
 Wikitech-l mailing list
 wikitec...@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l





[Analytics] Records of article access

2014-10-16 Thread Pine W
Hi again Analytics,

I was under the impression that no records are kept of which IPs access
which articles on Wikipedia when no edits are made, but it appears that
such records are in fact kept [1].

Is this proper? This practice appears to be permissible under the Privacy
Policy, which states that "We use IP addresses for research and analytics;
to better personalize content, notices, and settings for you; to fight
spam, identity theft, malware, and other kinds of abuse; and to provide
better mobile and other applications."

It is possible that this information is relevant for determining the number
of unique visitors that Wikipedia gets and that this information is always
properly filtered before it gets to the Signpost. However, given recent
discussions which I thought said that Wikipedia was not instrumented to
track unique visitors, I am surprised to learn that this already seems to
be happening and that the situation has been this way for some time, so I
would appreciate clarification.

I want to emphasize that this question is about clarifying the practice of
tracking likely unique visitors by IP. This question is not intended to
start flame wars, get people into trouble, or limit the Signpost's access
to properly filtered information if there has been a determination that
WMF's retention of the raw data is appropriate. There might be appropriate
secondary questions about making sure that access to the raw IP access data
is carefully contained and secured.

Thank you very much,

Pine

[1]
https://en.wikipedia.org/w/index.php?title=User_talk%3ASerendipodous&diff=629934257&oldid=629932288


Re: [Analytics] Records of article access

2014-10-16 Thread Pine W
Thanks Toby.

I understand that IPs are not an especially accurate way to look at unique
visitors, but for the purposes of the Signpost's traffic report and the Top
25 I feel that they are reasonable approximations of ways to filter out
what appear to be automated requests.

I am ok with holding those logs for 30 days, although I am a little
surprised to hear that this is happening. However, what worries me a bit
more is the idea that a staff member can be accessing those logs without
that access being recorded. This might be something that you wish to
investigate further.

I am not interested in getting this staff person into trouble. The
information that they are providing is useful to the Signpost and certainly
seems to be sanitized to a reasonable degree. However, it does concern me
that they can access these logs without someone knowing about it. It seems
to me that this sort of activity should be proactively disclosed to people
in WMF who conduct legal and security reviews, and I hope you will consider
what sort of security features are appropriate to make sure that occasions
when anyone accesses the raw logs are recorded in a robust manner. I worry
that if this one staffer can access logs without the higher-ups knowing
about it, it is possible that someone who intends to do unethical
activities with WMF's data could also access the logs without being noticed.

Thanks,

Pine


On Thu, Oct 16, 2014 at 9:31 PM, Toby Negrin tneg...@wikimedia.org wrote:

 Hi Pine --

 Thanks for this -- it's a challenging topic but one that the Analytics
 team takes very seriously.

 I'm not familiar with the IP address review that's referenced in the link.
 I don't know who the staffer might be. We don't currently calculate unique
 visitors to anything in Analytics and IP address is not a particularly
 accurate way to assess unique visitors regardless (due to proxies/NATs/etc).

 We do store IPs as part of page requests in our raw logs which are deleted
 every 30 days. This data is kept on a system where access is limited and
 controlled by the operations team. We're in line with the privacy policy on
 this.

 To be clear, we are currently considering mechanisms to count unique
 requests -- we rely on Comscore for this data and for several reasons,
 primarily related to mobile usage, it's not sufficient to understand our
 usage patterns. We are putting together some proposals to do this in as
 limited a way as possible, and in a way that's respectful to our users. We'll share
 this with the community when we feel we understand the use cases and
 trade-offs well enough to discuss in an informed manner.

 -Toby



 We do store the IP address associated with varnish requests as part of the
 log. This data is



 On Thu, Oct 16, 2014 at 8:50 PM, Pine W wiki.p...@gmail.com wrote:

 Hi again Analytics,

 I was under the impression that no records are kept of which IPs access
 which articles on Wikipedia when no edits are made, but it appears that
 such records are in fact kept [1].

 Is this proper? This practice appears to be permissible under the Privacy
 Policy which states that "We use IP addresses for research and analytics;
 to better personalize content, notices, and settings for you; to fight
 spam, identity theft, malware, and other kinds of abuse; and to provide
 better mobile and other applications."

 It is possible that this information is relevant for determining the
 number of unique visitors that Wikipedia gets and that this information is
 always properly filtered before it gets to the Signpost. However, given
 recent discussions which I thought said that Wikipedia was not instrumented
 to track unique visitors, I am surprised to learn that this already seems
 to be happening and that the situation has been this way for some time, so
 I would appreciate clarification.

 I want to emphasize that this question is about clarifying the practice
 of tracking likely unique visitors by IP. This question is not intended to
 start flame wars, get people into trouble, or limit the Signpost's access
 to properly filtered information if there has been a determination that
 WMF's retention of the raw data is appropriate. There might be appropriate
 secondary questions about making sure that access to the raw IP access data
 is carefully contained and secured.

 Thank you very much,

 Pine

 [1]
 https://en.wikipedia.org/w/index.php?title=User_talk%3ASerendipodous&diff=629934257&oldid=629932288




___
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics


[Analytics] Happy Ada Lovelace Day

2014-10-15 Thread Pine W
Ada Lovelace Day is celebrated on October 14 this year.

Augusta Ada King, Countess of Lovelace (born in the year 1815) was a
mathematician and computer programmer who worked on Charles Babbage's
Analytical Engine. She foresaw how computers could evolve into devices that
perform tasks more sophisticated than simple calculations. She is
controversially credited with authoring the world's first computer
program, and certainly worked extensively with Babbage. [1]

Ada Lovelace Day celebrates women's contributions to science, technology,
engineering, and mathematics.

Wikimedia Commons, English Wikipedia, and Persian Wikipedia have designated
a watercolor portrait of Lovelace as a featured picture. [2]

Happy Ada Lovelace Day,

Pine

[1] https://en.wikipedia.org/wiki/Ada_Lovelace

[2] https://commons.m.wikimedia.org/wiki/File:Ada_Lovelace_portrait.jpg
___
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics


[Analytics] Fwd: [Wikitech-l] Tech Talk: The Dashboarding Problem: October 6

2014-10-11 Thread Pine W
Thanks for this. I'm forwarding to the Analytics and Research lists.

Pine


-- Forwarded message --
From: Rachel Farrand rfarr...@wikimedia.org
Date: Mon, Oct 6, 2014 at 1:12 PM
Subject: Re: [Wikitech-l] Tech Talk: The Dashboarding Problem: October 6
To: Wikimedia developers wikitec...@lists.wikimedia.org


Thank you for the great turnout today!

If you would like to view the recording of the talk, here is the link:
http://www.youtube.com/watch?v=hzMwwLfvh5g

If you have any questions about today's talk please feel free to get in
touch with Dan Andreescu dandree...@wikimedia.org and Nuria Ruiz 
nu...@wikimedia.org

You can check out past tech talk recordings at the MediaWiki YouTube page
here: http://www.youtube.com/channel/UCg4wlhlN8RjP6_e_vMC4CTA

If you would like to nominate future tech talks or see what we have coming
up, go here:
https://www.mediawiki.org/wiki/Project:Calendar/How_to_schedule_an_event/TechTalks

Thanks!

On Mon, Oct 6, 2014 at 11:03 AM, Rachel Farrand rfarr...@wikimedia.org
wrote:

 Reminder: This tech talk starts in 1 hour

 On Wed, Oct 1, 2014 at 12:01 PM, Rachel Farrand rfarr...@wikimedia.org
 wrote:

 Please join us for the following tech talk:

 Tech Talk: *The Dashboarding Problem*
 Date: October 6
 Time: 1900 UTC
 
http://www.timeanddate.com/worldclock/fixedtime.html?msg=Tech+Talk%3A+The+Dashboarding+Problem&iso=20141006T19&p1=1440&ah=1

 Link to live YouTube stream http://www.youtube.com/watch?v=hzMwwLfvh5g
 IRC channel for questions: #wikimedia-office
 Google+ page
 https://plus.google.com/u/0/b/103470172168784626509/events/ch8uuivq05nqejqlivrqni6v1n0,
 another place for questions

 Talk description:
 The Analytics team has been busy exploring dashboarding and visualizing
 editor engagement data. We found that while most people focus on
 visualization, data access and information architecture are just as
 important and separate problems.
 Mike Bostock solved visualization and the design team took care of
 information architecture, so we built a dashboard around their work.
 In this talk we share our learnings from developing dashiki, our new
 dashboard stack. We will talk about why we believe a server-less
 javascript app was the right architecture for the problem, how with
 about 900 lines of javascript we transform data into Vega grammar, and
 how knockout components helped us stay modular.

 While we'll look at some javascript, the talk is high level, about 30
 minutes long, and everyone that is interested in dashboarding,
 visualization, and modularity is welcome to attend.

 Dashiki Code: https://github.com/wikimedia/analytics-dashiki

 Editor Dashboard: https://metrics.wmflabs.org/static/public/dash/



___
Wikitech-l mailing list
wikitec...@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
___
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics


Re: [Analytics] Welcome Marcel Ruiz Forns to the Analytics Development team

2014-10-08 Thread Pine W
Excellent. May I ask what features he is implementing for Grantmaking?

Pine
On Oct 7, 2014 4:44 PM, Toby Negrin tneg...@wikimedia.org wrote:

 Hi Everyone,

 I'd like to welcome Marcel to the Analytics team. We're super excited to
 have someone with Marcel's skills and experience on the team.

 In his own words:

 Marcel is a Spanish computer science engineer, currently living in Brazil.
 He has worked lately with recommender systems and e-commerce analytics at
 Chaordic in Brazil. Before that, he worked with natural language processing
 at the Autonomous University of Barcelona (UAB) and also with serious games
 development at the Poly-technical University of Catalonia (UPC), the same
 place he studied his B.S.

 He'll be working on Wikimetrics initially, implementing some feature
 requests for Grantmaking.

 Please join me in welcoming Marcel! He'll be remote, but he's in San
 Francisco at the moment, so please stop by Chambers to say hello if you are
 in the office.

 -Toby





___
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics


Re: [Analytics] stats.wikimedia.org and datasets.wikimedia.org unavailable

2014-10-05 Thread Pine W
Thanks. It would also be interesting to know why the Icinga alarm was muted.

Pine
On Oct 5, 2014 4:48 PM, Christian Aistleitner christ...@quelltextlich.at
wrote:

 Hi,

 the machine that hosts

   stats.wikimedia.org
   datasets.wikimedia.org

 is experiencing problems, and hence the above sites are currently
 unavailable.

 Investigation is still going on.
 We're tracking the issue at
   https://bugzilla.wikimedia.org/show_bug.cgi?id=71686

 Sorry for the inconvenience,
 Christian


 --
 quelltextlich e.U. \\ Christian Aistleitner
 Companies' registry: 360296y in Linz
 Christian Aistleitner
 Kefermarkterstrasze 6a/3, 4293 Gutau, Austria
 Email:    christ...@quelltextlich.at
 Phone:    +43 7946 / 20 5 81
 Fax:      +43 7946 / 20 5 81
 Homepage: http://quelltextlich.at/
 ---



___
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics


Re: [Analytics] Individual Engagement Grant funding available

2014-09-29 Thread Pine W
En español:

A todos,

Este es un recordatorio de que solicitudes serán aceptados hasta el 30 de
septiembre.


In English:

All,

This is a reminder that applications will be accepted until September 30.

Other announcement translations may be found by clicking the links below:
বাংলা https://meta.wikimedia.org/wiki/Grants:IEG/IEG_Round_Two_-_2014/bn •
Deutsch https://meta.wikimedia.org/wiki/Grants:IEG/IEG_Round_Two_-_2014/de •
italiano https://meta.wikimedia.org/wiki/Grants:IEG/IEG_Round_Two_-_2014/it •
日本語 https://meta.wikimedia.org/wiki/Grants:IEG/IEG_Round_Two_-_2014/ja •
русский https://meta.wikimedia.org/wiki/Grants:IEG/IEG_Round_Two_-_2014/ru


Pine


On Tue, Sep 2, 2014 at 12:16 PM, Pine W wiki.p...@gmail.com wrote:

 Forwarding from Siko Bouterse:

 Greetings! The Wikimedia Foundation Individual Engagement Grants program
 is accepting proposals for funding new experiments from September 1st to
 30th. https://meta.wikimedia.org/wiki/Grants:IEG

 Your idea can improve Wikimedia projects by building a new tool or gadget,
 organizing a better process on your wiki, conducting research on an
 important issue, or providing other support for community-building. Whether
 you need $200 or $30,000 USD, Individual Engagement Grants can cover your
 own project development time in addition to funding for a team to help you.
 The program has a flexible schedule and reporting structure, and
 Grantmaking staff are there to support you through all stages of the
 process.

 Do you have a good idea, but you are worried that it isn’t developed
 enough for a grant?  Put it into the IdeaLab, where volunteers and staff
 can give you advice and guidance on how to bring it to life. 
 https://meta.wikimedia.org/wiki/Grants:IdeaLab  Also, IEG will be
 hosting three Hangout Sessions for real-time discussions to help you make
 your proposal better - the first will happen on September 16th. 
 https://meta.wikimedia.org/wiki/Grants:IdeaLab/Events#Upcoming_events

 For inspiration, you can read more about past projects 
 https://blog.wikimedia.org/tag/individual-engagement-grants/ that
 received funding or review open proposals 
 https://meta.wikimedia.org/wiki/Grants:IEG#ieg-reviewing. We are excited
 to see some of the new ways your grant ideas can support our community and
 make an impact on the future of Wikimedia projects.

 Submit your proposal in September! 
 https://meta.wikimedia.org/wiki/Grants:IEG#ieg-apply



___
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics


Re: [Analytics] Errors on stats.grok.se

2014-09-26 Thread Pine W
Thanks, Christian, Henrik and Alex.

Pine
___
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics


Re: [Analytics] Editor engagement analytics

2014-08-15 Thread Pine W
I was thinking of tools in the sense of software developed by Analytics for
use by the EE teams when they measure the effectiveness of their work or
look for new opportunities. EEVS appears to be a good example of what I had
in mind as a tool. Your Snuggle project is something I would consider to be
more like a feature because it is intended for use by end users.

In other words, tools are used by devs and PMs to develop and evaluate
their products and opportunities for end-user engagement, and features
are used by end users. That is my arbitrary way of differentiating types of
software. Can you think of better terminology?

Pine
On Aug 15, 2014 2:42 AM, Aaron Halfaker ahalfa...@wikimedia.org wrote:

 Pine, can you help me understand the difference between tools and features?

 Could you be referring to things like Snuggle[1], my academic/volunteer
 work to improve editor engagement on-wiki?  If so, I wouldn't refer to that
 as something that's been developed by Analytics.

 1. https://snuggle-en.wmflabs.org/

 -Aaron


 On Thu, Aug 14, 2014 at 7:45 PM, Pine W wiki.p...@gmail.com wrote:

 Thanks all. My question was more about tools than features which is why I
 asked here.

 Pine
 On Aug 14, 2014 7:19 AM, Aaron Halfaker ahalfa...@wikimedia.org
 wrote:

 bah!  I forgot about that list!


 On Thu, Aug 14, 2014 at 3:15 PM, Dario Taraborelli 
 dtarabore...@wikimedia.org wrote:

 Pine – in fact (as I am sure you know, as you post frequently there)
 you can reach most Product people involved in the design of editor
 engagement features/experiments via e...@lists.wikimedia.org.

 On Aug 14, 2014, at 7:10 AM, Toby Negrin tneg...@wikimedia.org wrote:

 Thanks Aaron -- well said.

 We are collaborating with the growth team on task suggestions which is
 one of the first areas where we see our data being used to drive feature
 development. We have some ideas in this area but our activities have been
 focused on measurement and comprehension.

 -Toby


 On Thu, Aug 14, 2014 at 7:07 AM, Aaron Halfaker 
 ahalfa...@wikimedia.org wrote:

 Hey Pine,

 We don't deploy software that affects the user experience on Wikimedia
 projects, so it is hard to identify any direct effect on editor engagement
 that we've had.  The Product teams[1] develop user-facing features.  It
 doesn't look like they have a public facing mailing list, but the
 community engagement team (for product)[2] does.  You can contact them at
 c...@lists.wikimedia.org.

 In analytics, we develop new measures of editor engagement (among
 other things)[3] and deploy those measures for public use.  For example,
 see WikiMetrics[4].  We also support the product teams by helping them
 identify which features are likely to have a positive impact with
 background analysis (e.g. [5]) and by running experiments to help product
 teams iterate toward feature designs that maximize positive impact (e.g.
 [6]).  Right now, we provide direct support of the Growth[7] and Mobile[8]
 product teams, but we also consult with other teams at the WMF and engage
 with community outreach efforts (e.g. [9]) in our (not so copious) free
 time.

 1. https://www.mediawiki.org/wiki/Product
 2. https://www.mediawiki.org/wiki/Community_Engagement_(Product)
 3.
 https://www.mediawiki.org/wiki/Analytics/Epics/Editor_Engagement_Vital_Signs
 4. https://metrics.wmflabs.org/
 5. https://meta.wikimedia.org/wiki/Research:Wikipedia_article_creation
 6.
 https://meta.wikimedia.org/wiki/Research:Asking_anonymous_editors_to_register
 7. https://meta.wikimedia.org/wiki/Growth
 8. https://www.mediawiki.org/wiki/Mobile_web_projects
 9.
 https://meta.wikimedia.org/wiki/Research:Labs2/Hackathons/August_6-7th,_2014

 -Aaron


 On Thu, Aug 14, 2014 at 2:11 AM, Pine W wiki.p...@gmail.com wrote:

 Hi Analytics team,

 I'm curious, which tools developed by Analytics have contributed
 notably to editor engagement successes?

 Pine

 ___
 Analytics mailing list
 Analytics@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/analytics




[Analytics] Fw: [Wikimedia-l] Great editor for Wikipedia infoboxes/templates connected to Wikidata

2014-08-09 Thread Pine W
Forwarding.

Pine
On Aug 9, 2014 1:58 AM, David Cuenca dacu...@gmail.com wrote:

 User:Vlsergey from ruwiki presented yesterday at the Wikidata Meetup a new
 wonderful infobox editor for Wikipedia infoboxes:

 https://www.wikidata.org/wiki/Wikidata:Project_chat#Edit_directly_from_infocard_.2F_infobox

 And also an improved authority template which you can see in action at
 Obama

 https://ru.wikipedia.org/wiki/%D0%9E%D0%B1%D0%B0%D0%BC%D0%B0,_%D0%91%D0%B0%D1%80%D0%B0%D0%BA#.D0.A1.D1.81.D1.8B.D0.BB.D0.BA.D0.B8

 Plus some interfaces for editing person info, taxons, and work/edition
 source info:

 https://www.wikidata.org/wiki/Wikidata:Project_chat/Archive/2014/07#WEF_gadgets_update

 I think these are great improvements for editing wikidata from wikipedia
 and I hope you can spread the word in your local wikis about these
 wonderful tools.

 Thanks!
 Micru
 ___
 Wikimedia-l mailing list, guidelines at:
 https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines
 wikimedi...@lists.wikimedia.org
 Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l,
 mailto:wikimedia-l-requ...@lists.wikimedia.org?subject=unsubscribe
___
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics


Re: [Analytics] Data inconsistency with displayMobile in ServerSideAccountCreation

2014-07-24 Thread Pine W
I believe that Android will run on desktops. Would Android desktops account
for the nonzero number?

See
http://www.pcworld.com/article/2048220/hybrid-hijinks-how-to-install-android-on-your-pc.html

Pine
On Jul 24, 2014 5:37 PM, Dan Garry dga...@wikimedia.org wrote:

 Hi!

 So I've been rooting around in ServerSideAccountCreation and I've noticed
 some inconsistencies in the data. The final two clauses in the WHERE in the
 following query should be mutually exclusive (registered on Android app,
 and registered not on mobile), but the number returned is nonzero.

 SELECT count(*)
 FROM ServerSideAccountCreation_5487345
 WHERE timestamp >= 2014072200
 AND timestamp <= 2014072300
 AND userAgent like 'WikipediaApp%'
 AND event_displayMobile = 0

 I'm sure you guys get data inconsistencies like this all the time, but I
 thought I should at least report it so you're aware.
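
One way to look at a contradiction like this is to break the app registrations down by the flag instead of only counting the unexpected bucket. The sketch below is only an illustration: it uses an in-memory SQLite table named `account_creation` (a hypothetical stand-in, not the real EventLogging table) and three made-up rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account_creation ("
    " timestamp TEXT, userAgent TEXT, event_displayMobile INTEGER)"
)
# Made-up rows for illustration only.
rows = [
    ("20140722010000", "WikipediaApp/2.0 (Android)", 1),  # consistent
    ("20140722020000", "WikipediaApp/2.0 (Android)", 0),  # contradictory
    ("20140722030000", "Mozilla/5.0 (X11; Linux)", 0),    # consistent
]
conn.executemany("INSERT INTO account_creation VALUES (?, ?, ?)", rows)

# If "registered in the app" and "displayMobile = 0" were truly exclusive,
# the 0 bucket in this breakdown would be missing.
breakdown = dict(
    conn.execute(
        "SELECT event_displayMobile, COUNT(*) FROM account_creation"
        " WHERE userAgent LIKE 'WikipediaApp%'"
        " GROUP BY event_displayMobile"
    ).fetchall()
)
print(breakdown)
```

Seeing both buckets side by side shows whether the contradictory rows are a small fraction or the bulk of app registrations, which helps narrow down where the flag is being set.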

 Thanks,
 Dan

 --
 Dan Garry
 Associate Product Manager for Platform and Mobile Apps
 Wikimedia Foundation



___
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics


Re: [Analytics] [wmfresearch] Want to examine editors cross-wiki activities, have a table.

2014-06-13 Thread Pine W
Thanks!

Pine


On Thu, Jun 12, 2014 at 10:16 PM, Federico Leva (Nemo) nemow...@gmail.com
wrote:

 Pine W, 13/06/2014 05:27:

 Interesting! Is there a way that I can use this with
 metrics.wikimedia.org to perform cross-wiki cohort analysis, or do I
 need access to analytics-store.eqiad.wmnet?


 https://meta.wikimedia.org/wiki/Grants_talk:Evaluation/Learning_modules/1Cross-project_Cohorts

 Nemo



___
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics


Re: [Analytics] [wmfresearch] Want to examine editors cross-wiki activities, have a table.

2014-06-12 Thread Pine W
Interesting! Is there a way that I can use this with metrics.wikimedia.org
to perform cross-wiki cohort analysis, or do I need access to
analytics-store.eqiad.wmnet?

Pine


On Thu, Jun 12, 2014 at 5:50 PM, Aaron Halfaker ahalfa...@wikimedia.org
wrote:

 The only reason I didn't break this down by namespace was because the
 queries would have taken an order of magnitude longer to join the revision
 and page tables.  The query I used didn't even need to read the revision or
 archive tables.  It only read an index on those tables.  That made it go
 pretty fast.  :)  I'd be interested in taking another pass if you guys
 don't mind dealing with a heavier server load.

 On Thu, Jun 12, 2014 at 7:05 PM, Dario Taraborelli 
 dtarabore...@wikimedia.org wrote:

 Aaron – this is fantastic.
 Two quick questions:

 - was the decision not to break down the data by namespace (matching Erik
 Zachte’s master editor data dump) intentional?
 - are we expecting to refresh the archived revision count field every
 month?

 Dario


 On Jun 12, 2014, at 2:33 PM, Aaron Halfaker ahalfa...@wikimedia.org
 wrote:

 +1

 For example, the last time I sent a similar email to the list, it was for
 the wiki_info table.  One of the tasks I have is to break the code for
 generating that table out of the analysis project it lives in and make it a
 separate repo so that Oliver can send pull requests to fix issues and/or
 maintain his own managed table.

 It would be great to work towards an architecture that allows us to keep
 these tables up-to-date without user-based cron jobs.

 -Aaron

 On Thu, Jun 12, 2014 at 4:24 PM, Dan Andreescu dandree...@wikimedia.org
 wrote:

 This is great.  I'd like to go on record saying that this is leaning
 towards a data warehouse kind of approach - basically pre-aggregating
 useful datasets.  So we might want to do this in a more organized way down
 the line.


 On Thu, Jun 12, 2014 at 2:57 PM, Oliver Keyes oke...@wikimedia.org
 wrote:

 This is fricking awesome!


  On 12 June 2014 10:58, Aaron Halfaker ahalfa...@wikimedia.org wrote:

 I created a new table on analytics-store.eqiad.wmnet.  It contains the
 monthly edit counts for all wikis.  See a brief overview below.

 Note that the revisions column contains a count of all revisions --
 archived or not.  The archived column contains a count of archived
 revisions.   So revisions - archived == non-archived revisions.

 analytics-store.eqiad.wmnet [staging] explain editor_month;
 +-------------------+----------------+------+-----+---------+-------+
 | Field             | Type           | Null | Key | Default | Extra |
 +-------------------+----------------+------+-----+---------+-------+
 | wiki              | varbinary(50)  | NO   | PRI |         |       |
 | month             | varbinary(7)   | NO   | PRI |         |       |
 | user_id           | int(11)        | NO   | PRI | 0       |       |
 | user_name         | varbinary(191) | YES  |     | NULL    |       |
 | user_registration | varbinary(14)  | YES  |     | NULL    |       |
 | archived          | int(11)        | YES  |     | NULL    |       |
 | revisions         | int(11)        | YES  |     | NULL    |       |
 +-------------------+----------------+------+-----+---------+-------+
 7 rows in set (0.01 sec)

 analytics-store.eqiad.wmnet [staging] select * from editor_month limit 3;
 +--------+---------+---------+------------+-------------------+----------+-----------+
 | wiki   | month   | user_id | user_name  | user_registration | archived | revisions |
 +--------+---------+---------+------------+-------------------+----------+-----------+
 | enwiki | 2001-01 |      34 | WojPob     | 20010129110725    |        0 |        13 |
 | enwiki | 2001-01 |      99 | RoseParks  | 20010121021221    |        0 |         7 |
 | enwiki | 2001-01 |     479 | JimboWales | 20010123223416    |        0 |        13 |
 +--------+---------+---------+------------+-------------------+----------+-----------+
 3 rows in set (0.03 sec)
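
As a rough sketch of how the `revisions - archived` relationship could be used, the snippet below loads the three sample rows shown above into an in-memory SQLite table and derives the non-archived count. SQLite and its column types are assumptions made so the example is self-contained (the real table lives on a MySQL-style host), and `live_revisions` is a name invented here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE editor_month ("
    " wiki TEXT, month TEXT, user_id INTEGER, user_name TEXT,"
    " user_registration TEXT, archived INTEGER, revisions INTEGER)"
)
# The three sample rows from the message above.
conn.executemany(
    "INSERT INTO editor_month VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("enwiki", "2001-01", 34, "WojPob", "20010129110725", 0, 13),
        ("enwiki", "2001-01", 99, "RoseParks", "20010121021221", 0, 7),
        ("enwiki", "2001-01", 479, "JimboWales", "20010123223416", 0, 13),
    ],
)

# revisions counts everything, archived counts only archived revisions,
# so the difference is the number of non-archived revisions.
live = conn.execute(
    "SELECT user_name, revisions - archived AS live_revisions"
    " FROM editor_month ORDER BY user_id"
).fetchall()
print(live)
```

Because none of the sample rows has archived revisions, the derived count equals `revisions` for each of them.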

 Feedback is welcome.   One of the next things I'd like to do is
 remove the "-" from the month column, as it ruins comparison with MW
 timestamps.
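
To show what the hyphen gets in the way of, here is a small sketch. It assumes the 14-digit `YYYYMMDDHHMMSS` timestamp form seen in the `user_registration` column; the helper names `month_key` and `registered_in_month` are invented for this example.

```python
def month_key(month: str) -> str:
    """Turn 'YYYY-MM' into 'YYYYMM' so it lines up with a timestamp prefix."""
    return month.replace("-", "")


def registered_in_month(user_registration: str, month: str) -> bool:
    """True if a 14-digit timestamp falls in the given 'YYYY-MM' month."""
    return user_registration.startswith(month_key(month))


# With the hyphen removed, the month is a plain prefix of the timestamp.
print(month_key("2001-01"))
print(registered_in_month("20010129110725", "2001-01"))
print(registered_in_month("20010129110725", "2001-02"))
```

Without the hyphen, the month value can be compared against a timestamp prefix directly, with no string surgery at query time.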

 -Aaron

 ___
 wmfresearch mailing list
 wmfresea...@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wmfresearch




 --
 Oliver Keyes
 Research Analyst
 Wikimedia Foundation

 ___
 Analytics mailing list
 Analytics@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/analytics






 ___
 Analytics mailing list
 Analytics@lists.wikimedia.org