Re: [Toolserver-l] [Toolserver-announce] Maintenance: JIRA, FishEye: Saturday 30th April

2011-04-29 Thread River Tarnell
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

River Tarnell:
> Tonight (UTC) we will upgrade JIRA and FishEye to the latest version.  
> This will involve around 15 minutes downtime for each service.

This is now finished.  Downtime for JIRA was longer than expected (about 
30 minutes) but within the maintenance window.

- river.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (SunOS)

iEYEARECAAYFAk27YJQACgkQIXd7fCuc5vIhPACeMA4ZquPHssQOdUc9vFUQFkDb
DjwAn0TU8hU5cVGNM8s1SWJsHgj5VvNF
=wC4e
-----END PGP SIGNATURE-----

___
Toolserver-l mailing list (Toolserver-l@lists.wikimedia.org)
https://lists.wikimedia.org/mailman/listinfo/toolserver-l
Posting guidelines for this list: 
https://wiki.toolserver.org/view/Mailing_list_etiquette


[Toolserver-l] Maintenance: JIRA, FishEye: Saturday 30th April

2011-04-29 Thread River Tarnell
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi,

Tonight (UTC) we will upgrade JIRA and FishEye to the latest version.  
This will involve around 15 minutes downtime for each service.

Start time: Saturday, 30 April 2011, 0000h UTC
http://time.tcx.org.uk/utc/2011-04-30/00:00

End time: Saturday, 30 April 2011, 0100h UTC
http://time.tcx.org.uk/utc/2011-04-30/01:00

- river.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (SunOS)

iEYEARECAAYFAk2616EACgkQIXd7fCuc5vICXQCfWdhs8DjcqZJ1a2exNODKdB5j
BnIAnjopZFef9aJyWzOdqccRrecRynZ0
=h0Vs
-----END PGP SIGNATURE-----



Re: [Toolserver-l] Query Service Inquiry

2011-04-29 Thread Platonides
Manish Goregaokar wrote:
>>  1. Select 200 random articles.
>>  2. Get the top contributors for each of them.
>>  3. Get the edit counts for those contributors.
>>
> 
> I think he has the list/s of 200 articles, and does not want random ones.
> Plus, he doesn't want the editcounts, he wants their top edited articles,
> with the editcount per article.
> 
> My personal opinion is that this HAS to be done via php (though I can't
> comment on server load).
> Use php-mysql to determine the list of top contributors per given article,
> then loop for each contributor, and give *his* top edited articles...
> Shouldn't be hard, though you might want to clarify what you mean by "top".
> (Top 3? More than X edits? More than X% edits per day/week/month/beginning
> of time? More than X% edits of the top editor?).
> 
> -Manishearth

It's also quite easy to do by processing the stub-pages-articles dump.

1. Read the dump; if the page title matches, record all editing users.
2. Order the author list per article and select which ones pass to the
next phase.
3. Read the dump again; if one of the selected users edited a page in
the main namespace, record that page name.
4. ???
5. Profit

You may be able to combine several steps into a single SQL query, but I'm
not convinced that would perform significantly better.
Working from an XML dump is a bit outdated, but more reproducible.
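
For what it's worth, the two passes over the dump could be sketched in
Python roughly as follows. This is only an illustration under some
assumptions: the function names (iter_pages, top_editors,
pages_edited_by) are made up for the example, the dump is assumed to
carry an <ns> element per page (if it is absent, the sketch treats the
page as main namespace), and anonymous edits recorded as <ip> are
ignored. The inline SAMPLE is a toy stand-in for a real dump file.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict


def local(tag):
    """Strip the XML namespace: '{uri}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(stream):
    """Yield (ns, title, [usernames]) for each <page> element."""
    for _, elem in ET.iterparse(stream, events=("end",)):
        if local(elem.tag) != "page":
            continue
        title, ns, users = None, "0", []  # assumption: no <ns> means main
        for node in elem.iter():
            name = local(node.tag)
            if name == "title":
                title = node.text
            elif name == "ns":
                ns = node.text
            elif name == "username":
                users.append(node.text)
        yield ns, title, users
        elem.clear()  # drop the finished page's children to save memory


def top_editors(stream, wanted_titles, top_n):
    """Pass 1: for each wanted page, its top_n most frequent editors."""
    result = {}
    for _, title, users in iter_pages(stream):
        if title in wanted_titles:
            result[title] = [u for u, _ in Counter(users).most_common(top_n)]
    return result


def pages_edited_by(stream, editors):
    """Pass 2: per selected editor, edit counts on main-namespace pages."""
    counts = defaultdict(Counter)
    for ns, title, users in iter_pages(stream):
        if ns != "0":
            continue
        for user in users:
            if user in editors:
                counts[user][title] += 1
    return counts


# Toy dump with two articles and one talk page (hypothetical data).
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.5/">
<page><title>A</title><ns>0</ns>
<revision><contributor><username>X</username></contributor></revision>
<revision><contributor><username>X</username></contributor></revision>
<revision><contributor><username>Y</username></contributor></revision>
</page>
<page><title>B</title><ns>0</ns>
<revision><contributor><username>X</username></contributor></revision>
</page>
<page><title>Talk:A</title><ns>1</ns>
<revision><contributor><username>X</username></contributor></revision>
</page>
</mediawiki>"""

editors = top_editors(io.BytesIO(SAMPLE), {"A"}, top_n=1)
chosen = {user for users in editors.values() for user in users}
other_pages = pages_edited_by(io.BytesIO(SAMPLE), chosen)
```

With a real dump you would open the (decompressed) file twice instead
of wrapping SAMPLE in io.BytesIO; what "top" means (top_n here) is
still up to the requester.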



Re: [Toolserver-l] Query Service Inquiry

2011-04-29 Thread Manish Goregaokar
>  1. Select 200 random articles.
>  2. Get the top contributors for each of them.
>  3. Get the edit counts for those contributors.
>

I think he has the list/s of 200 articles, and does not want random ones.
Plus, he doesn't want the editcounts, he wants their top edited articles,
with the editcount per article.

My personal opinion is that this HAS to be done via php (though I can't
comment on server load).
Use php-mysql to determine the list of top contributors per given article,
then loop for each contributor, and give *his* top edited articles...
Shouldn't be hard, though you might want to clarify what you mean by "top".
(Top 3? More than X edits? More than X% edits per day/week/month/beginning
of time? More than X% edits of the top editor?).

-Manishearth

Re: [Toolserver-l] Query Service Inquiry

2011-04-29 Thread Ilmari Karonen
On Thu, 2011-04-28 at 21:49 -0600, Jim Hutchinson wrote:
> Thank you for the information and feedback. What I need is somewhat
> more complicated than a list of contributors to a single page. In
> fact, there is already a tool in Wikipedia (just called contributors,
> I think) that lists all the contributors to an article and their
> number of edits. What I need to do is, using that list of
> contributors, select the top 20 or so (excluding bots) for each of the
> hundred selected articles and get a list of all of the other articles
> to which each of them contributed with a frequency count of edits.
> Ideally, this data would be in a table of sorts for each article
> selected (so 100 tables).
> 
> This could, of course, be done manually by searching for contributions
> by username. However, this will be time consuming and possibly error
> prone. My hope was that a query could grab this information fairly
> quickly as well as automatically count frequencies of edits per
> article, etc.
> 
> I don't have the expertise to do this myself, but I do know someone
> who can and has requested an account. However, he is afraid he will
> not be granted an account for what will likely be a one time project.
> 
> Is there likely an API that can do what I described or would a query
> be an easier or more efficient way to go?

Technically, most of this shouldn't be too hard to do using SQL queries
on the toolserver.  One disadvantage, though, is that the toolserver
does not have (direct) access to page text.  This could be a problem if
you, say, wanted to exclude reverts from the edit count, weigh edits by
the amount of text added or do some other kind of fine-grained
processing.

Basically, you have three steps you want to do:

 1. Select 200 random articles.
 2. Get the top contributors for each of them.
 3. Get the edit counts for those contributors.

The first step is easy, as long as the (not quite uniform) random page
selection algorithm built into MediaWiki is good enough for you.  You
could do it using a Toolserver SQL query, or just by clicking the
"random page" link 200 times (by hand or by bot), but the simplest way
would probably be to use the API:
http://www.mediawiki.org/wiki/API:Random
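
As a rough sketch of the API route: the request is an ordinary
action=query call with list=random. The helper names below and the
choice of en.wikipedia.org as the target wiki are assumptions made for
the example, and the actual HTTP fetch is left to a caller-supplied
function so that the sketch makes no network calls itself. The server
caps how many titles one request returns, so reaching 200 means
repeating the request and skipping duplicates.

```python
from urllib.parse import urlencode

API_URL = "https://en.wikipedia.org/w/api.php"  # assumed target wiki


def random_pages_url(limit=10):
    """Build a list=random query restricted to the main namespace."""
    params = {
        "action": "query",
        "list": "random",
        "rnnamespace": 0,
        "rnlimit": limit,
        "format": "json",
    }
    return API_URL + "?" + urlencode(params)


def titles_from_response(data):
    """Extract page titles from a decoded list=random JSON response."""
    return [page["title"] for page in data["query"]["random"]]


def collect_titles(fetch, wanted=200, per_request=10):
    """Repeat the query until `wanted` distinct titles have been seen.

    `fetch` is a caller-supplied function: URL in, decoded JSON out.
    """
    url = random_pages_url(per_request)
    titles = []
    while len(titles) < wanted:
        for title in titles_from_response(fetch(url)):
            if title not in titles:
                titles.append(title)
    return titles[:wanted]
```

A real `fetch` would issue the HTTP GET (with a descriptive User-Agent)
and decode the JSON body; any HTTP client will do.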

If you wanted a more uniform sample, you could download the page table
SQL dump (page.sql.gz), extract the page titles from it (with
appropriate filtering, e.g. to exclude redirects) and randomly select
200 of them.
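
Once the titles have been extracted from the dump, the uniform
selection itself is short. The sketch below assumes the rows have
already been parsed into (namespace, title, is_redirect) tuples;
parsing page.sql.gz is not shown, and the function name is made up for
the example.

```python
import random


def sample_articles(rows, k=200, seed=None):
    """Pick up to k titles uniformly at random, without replacement.

    rows: iterable of (namespace, title, is_redirect) tuples.
    Only main-namespace, non-redirect pages are considered.
    """
    candidates = [title for ns, title, is_redirect in rows
                  if ns == 0 and not is_redirect]
    rng = random.Random(seed)  # pass a seed to make the sample repeatable
    return rng.sample(candidates, min(k, len(candidates)))
```

Passing a fixed seed makes it possible to reproduce the same sample
later from the same dump.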

The second step could be easily done on the Toolserver, as long as you
only wanted to count edits.  For more fine-grained filtering based on
page text, you could use Special:Export to obtain a "mini-dump" of the
pages in your sample, including their full history, in XML format.
Alternatively, the same information is also available using the API.
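
For the plain edit-count version, the query could look something like
the sketch below. To keep it runnable anywhere it uses an in-memory
SQLite database with a heavily simplified stand-in for the page and
revision tables (only the columns the query touches); the demo data and
function names are invented. On the Toolserver the same SELECT would
run against the replicated MySQL database, where the driver's
placeholder syntax differs from SQLite's "?".

```python
import sqlite3

TOP_CONTRIBUTORS_SQL = """
SELECT rev_user_text, COUNT(*) AS edits
FROM revision
JOIN page ON page_id = rev_page
WHERE page_namespace = 0 AND page_title = ?
GROUP BY rev_user_text
ORDER BY edits DESC, rev_user_text
LIMIT ?
"""


def make_demo_db():
    """Tiny in-memory stand-in for the replicated wiki database."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE page (page_id INTEGER, page_namespace INTEGER,
                           page_title TEXT);
        CREATE TABLE revision (rev_page INTEGER, rev_user_text TEXT);
        INSERT INTO page VALUES (1, 0, 'Alpha');
        INSERT INTO page VALUES (2, 0, 'Beta');
        INSERT INTO revision VALUES (1, 'Ann');
        INSERT INTO revision VALUES (1, 'Ann');
        INSERT INTO revision VALUES (1, 'Bob');
        INSERT INTO revision VALUES (1, 'Cy');
        INSERT INTO revision VALUES (1, 'Bob');
        INSERT INTO revision VALUES (1, 'Ann');
        INSERT INTO revision VALUES (2, 'Bob');
    """)
    return db


def top_contributors(db, title, limit=20):
    """Return [(user_name, edit_count), ...] for one article."""
    return db.execute(TOP_CONTRIBUTORS_SQL, (title, limit)).fetchall()
```

This counts every revision equally, reverts included, which is exactly
the limitation mentioned above.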

(The detail about excluding bots comes down to determining what is a
bot.  MediaWiki does feature a "bot flag", which can be used to filter
out users having it.  Unfortunately, for various reasons, not all bot
accounts necessarily have the flag set.  You might be able to filter out
more bots by looking at, say, the categories on their user page, but
ultimately you may still end up having to do some manual filtering.)

The last step could, again, be fairly easily done on the Toolserver as
long as you only wanted the raw edit counts.  In fact, it would probably
be best to start with that data anyway, and then refine it by looking at
the relevant page histories if necessary.
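
A sketch of that last step, again against a simplified in-memory SQLite
stand-in rather than the real replicated tables (the demo rows and the
articles_by_user name are invented for the example): for each selected
contributor, list the main-namespace articles they edited together with
a raw edit count.

```python
import sqlite3

ARTICLES_BY_USER_SQL = """
SELECT page_title, COUNT(*) AS edits
FROM revision
JOIN page ON page_id = rev_page
WHERE page_namespace = 0 AND rev_user_text = ?
GROUP BY page_title
ORDER BY edits DESC, page_title
"""


def articles_by_user(db, users):
    """Map each user name to a list of (page_title, edit_count) rows."""
    return {user: db.execute(ARTICLES_BY_USER_SQL, (user,)).fetchall()
            for user in users}


# Hypothetical demo data: two articles and one talk page.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE page (page_id INTEGER, page_namespace INTEGER,
                       page_title TEXT);
    CREATE TABLE revision (rev_page INTEGER, rev_user_text TEXT);
    INSERT INTO page VALUES (1, 0, 'Alpha');
    INSERT INTO page VALUES (2, 0, 'Beta');
    INSERT INTO page VALUES (3, 1, 'Alpha');
    INSERT INTO revision VALUES (1, 'Ann');
    INSERT INTO revision VALUES (1, 'Ann');
    INSERT INTO revision VALUES (2, 'Ann');
    INSERT INTO revision VALUES (3, 'Ann');
    INSERT INTO revision VALUES (2, 'Bob');
""")
tables = articles_by_user(db, ["Ann", "Bob"])
```

Each value in the resulting dictionary is one per-contributor table;
grouping those by the article the contributor was selected from gives
the per-article tables that were asked for.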

-- 
Ilmari Karonen

