Re: [Foundation-l] Where do our readers come from? QA

2010-01-16 Thread Ziko van Dijk
Dear Erik,

Maybe there is a dirty Polish word looked up by many Polish pupils,
and when they Google it they come to eu.WP because a Basque word
accidentally is alike? :-)

I am now looking at the interest in the native versus the English
Wikipedia in specific countries. How far software in general is
localized might be important. If you live in, say, Kenya, and your
computer has Windows in English, with Internet Explorer and everything
else oriented to English, and you google your home town in an
English-language Google, you will probably get the Wikipedia
article in English and not in Swahili.

Kind regards
Ziko


2010/1/16 Mark Williamson node...@gmail.com:
 I notice in that list both Belarusian Wikipedias are listed just as
 Belarusian Wikipedia. It would be very informative to know which is which
 and to have visitor statistics on both :-)

 skype: node.ue


 On Fri, Jan 15, 2010 at 3:39 PM, Erik Zachte erikzac...@infodisiac.com wrote:

 Here is a Q&A on all issues raised:
 Q = question, R = remark, A = answer

 I put the more general questions on top.

 Cheers, Erik Zachte

 --

 Q: Nikola Smolenski
 Is this the first time these reports are published?

 A:
 Yes. Expect the trend report to grow by accretion over time.
 Other reports will be built from data for the most recent 6 months only.

 --

 R: Andrew Gray
 Andrew explains why distribution of page requests over countries favors
 Spanish and Portuguese speaking countries:
 'Some Wikipedias - the ones which insist on only-free-images - do not use
 local uploads at all.'

 A:
 Thanks for explaining this unexpected distribution of page views on
 Commons,
 I had no idea.

 Spain        30.0%
 USA          29.2%
 Brazil        8.5%
 Argentina     4.8%
 Mexico        3.9%
 Germany       3.3%
 France        2.1%
 Venezuela     1.9%
 Chile         1.4%
 Costa Rica    1.4%
 Italy         1.4%
 Uruguay       1.2%
 Colombia      1.2%
 Portugal      1.1%

 --

 R: Mark Williamson

 Two main factors influencing choice of Wikipedia language:
 # Fluency of the Internet-using population of a country in English.
 # Quality of the native Wikipedia.

 A:
 As you say. Many Scandinavians (and Dutch people, I might add) probably
 switch between English and local content all the time.
 Personally I tend to look at English Wp first in many instances, because of
 its obviously richer content and greater depth.

 --

 Q: Ziko van Dijk
 Why are 40 % of the visitors of ksh.WP (the dialect of Cologne) from Japan?
 Why are 25 % of the visitors of eu.WP (Basque) from Poland?

 Q: Andre Engels
 I think bots are a likely explanation in the eu case
 (unless Erik is using an algorithm that filters out bots)

 A:
 KSH used to be the code for Kashmir. Still not Japan, but much closer than
 Cologne.
 Maybe Japanese mountaineers caused this spike? (only half kidding)

 As for eu.wp: would Poles presume there is also a European Wikipedia? Just
 a guess.

 I do filter bots.

 --

 R: Teun Spaans
 For trends, I would expect a bar indicating upward or downward trend, not a
 percentage bar.

 A:
 We can have both, a notion of importance and of change: I might color code
 cells as I do already in e.g. [1]
 This way large fluctuations really stand out. Let's first collect more
 history.

 [1] http://stats.wikimedia.org/EN/TablesPageViewsMonthly.htm


 --

 Q: Nikola Smolenski
 Could we get this for other projects?

 A:
 This question is of course not unexpected.
 One consideration is we need a certain sample size to make numbers
 significant.
 For other projects, with far less traffic, few country/language pairs would
 be backed by sufficient data.
 See also below on extending the current reports with more table rows.
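To make the sample-size point concrete, here is a rough sketch (not the method used for the actual reports) of how the uncertainty of a percentage grows as the number of logged records shrinks. It assumes the records behave like a simple random sample and uses the textbook normal approximation for a proportion. The helper name `margin_of_error` and the record counts are made up for illustration.

```python
import math

def margin_of_error(share, n_records, z=1.96):
    """Approximate 95% margin of error for a proportion (normal approximation).

    share     -- observed fraction, e.g. 0.25 for 25 %
    n_records -- number of sampled log records the fraction is based on
    """
    if n_records <= 0:
        raise ValueError("n_records must be positive")
    return z * math.sqrt(share * (1.0 - share) / n_records)

# Hypothetical record counts, not taken from the real reports.
for n in (20, 200, 20000):
    moe = margin_of_error(0.25, n)
    print(f"{n:>6} records: 25% +/- {100 * moe:.1f} percentage points")
```

Under these assumptions a 25 % share backed by only 20 records is uncertain by roughly 19 percentage points, which is why rows for low-traffic country/language pairs say very little.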

 --

 Q: Nikola Smolenski:
 Please include at Wikipedia Page Views Per Country - Overview [1] number of
 Internet users from [2], and number of views per Internet user?

 [1] http://tinyurl.com/yk43aq6
 [2] http://tinyurl.com/yfv5bwn

 A:
 Done

 --

 R: Nikola Smolenski
 It is obvious why Slovene Wikipedia is highly visited in Sierra Leone, and
 Serbian in Suriname; URLs do matter :)
 Although, I don't understand why so much. I would expect this distribution
 by visitors, perhaps, but not by visits.

 A:
 Very interesting observation! So people from Sierra Leone try
 'sl.wikipedia.org'.
 Why people from Surinam go to 'sr.wikimedia.org' is only slightly less
 obvious to me, but apparently it happens.

 For countries with just a few hits in the sampled log the distinction
 between visitors and visits gets blurred.

 --

 R: Andre Engels
 Ukrainian is not a small language by any means, yet Wikipedia visitors tend
 to be drawn to the Russian Wikipedia instead.

 

Re: [Foundation-l] Where do our readers come from? QA

2010-01-16 Thread Mark Williamson
I think sociolinguistic situations around the world are very complex.
Especially in former European colonies, of which Kenya is but one example, the
language of the former colonial power often has a unique position in
society.

It is not surprising to me that the English Wikipedia is so popular compared
to any other in Kenya, but it is quite a bit more surprising that Korean,
Romanian, Bulgarian, Lithuanian, Iranian, etc. users prefer the English
Wikipedia.

Mark

On Sat, Jan 16, 2010 at 2:25 AM, Ziko van Dijk zvand...@googlemail.com wrote:

 Dear Erik,

 Maybe there is a dirty Polish word looked up by many Polish pupils,
 and when they Google it they come to eu.WP because a Basque word
 accidentally is alike? :-)

 I am now looking at the interest in the native versus the English
 Wikipedia in specific countries. How far software in general is
 localized might be important. If you live in, say, Kenya, and your
 computer has Windows in English, with Internet Explorer and everything
 else oriented to English, and you google your home town in an
 English-language Google, you will probably get the Wikipedia
 article in English and not in Swahili.

 Kind regards
 Ziko



Re: [Foundation-l] Statistics and chapters: searching for chapters

2010-01-16 Thread Tomasz Ganicz
2010/1/15 Milos Rancic mill...@gmail.com:
 Based on Erik's statistics [1] and Nikola's addition of Internet users
 [2] and the list of Wikimedia chapters [3], here is the first set of
 conclusions.


 * United Arab Emirates, Bulgaria, Uganda, Uzbekistan, Kazakhstan:

I have good contacts with Bulgarian Wikipedians, but they decided not
to create a chapter. Recently they had problems with the wikipedia.bg
domain. However, they meet informally from time to time in
Sofia. As I remember, Jimbo was in Bulgaria last year, but he was
invited by a governmental organization, not by local Wikipedians.

Cheers,


-- 
Tomek Polimerek Ganicz
http://pl.wikimedia.org/wiki/User:Polimerek
http://www.ganicz.pl/poli/
http://www.ptchem.lodz.pl/en/TomaszGanicz.html

___
foundation-l mailing list
foundation-l@lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l


Re: [Foundation-l] Where do our readers come from? QA

2010-01-16 Thread Nikola Smolenski
On Saturday 16 January 2010 10:40:06 Mark Williamson wrote:
 It is not surprising to me that the English Wikipedia is so popular
 compared to any other in Kenya, but it is quite a bit more surprising that
 Korean, Romanian, Bulgarian, Lithuanian, Iranian, etc. users prefer the
 English Wikipedia.

I don't think that they prefer it as such; it's just that it covers many more
topics, and generally covers them in much more depth.

I believe that I am fairly fluent in English, and yet I prefer to read Serbian 
Wikipedia, if I know that the topic is covered there and the article is 
better than the English one.

Next thing to do: Wikipedia Page Views By Country - Breakdown Adjusted by 
Wikipedia Size. Erik, are you planning to do this one as well? :D


Re: [Foundation-l] Where do our readers come from? QA

2010-01-16 Thread Milos Rancic
On Fri, Jan 15, 2010 at 11:39 PM, Erik Zachte erikzac...@infodisiac.com wrote:
 Q: Nikola Smolenski / Milos Rancic
 At Wikipedia Page Views By Country - Breakdown [1] and Wikipedia Page Views
 By Country - Trends [2] could you include more languages (ideally all
 languages)?
 Some of the numbers go below 0.1% of population, but some others are
 not mentioned even though they are larger than 0.5% of population.

 [1] http://tinyurl.com/yhp3an7
 [2] http://tinyurl.com/yzga2hm

 A:
 Yes, on some reports I do include smaller percentages for the largest
 Wikipedias, as those represent significant numbers of page views.
 I used different (and arbitrary) thresholds per report. The arbitrariness
 could change, but I want to plead for a notoriety threshold:

 Here is a much more extended version of the breakdown report [1] (for this
 discussion only)
 It shows per country up to 50 Wikipedias.
 An extra column shows the total number of records for this country/language
 (for the 6 month period) on which the percentage is based.
 As you can see for the smallest countries that number is so low that it is
 no longer significant.

 Let us say we cut off not at 1%, but at an (arbitrary) absolute threshold of
 x logged records per country/language pair (per row).
 Let us say we cut off at average 5 records per month. Everything below that
 threshold in the test report is in dark red.
 Personally I think this is still way too much detail for a general report.
 Not because of the kilobytes but because of information overload.

 [1] http://tinyurl.com/yjwoyre
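The absolute cut-off described above can be sketched as follows. This is only an illustration: the tuple layout, the helper name `split_by_threshold` and the sample counts are assumptions, while the threshold of an average of 5 records per month over the 6-month period is the example given in the text.

```python
MONTHS = 6           # length of the reporting period
MIN_PER_MONTH = 5    # example threshold: average records per month

def split_by_threshold(rows, months=MONTHS, min_per_month=MIN_PER_MONTH):
    """Split (country, wiki, records) rows into kept and dropped lists.

    A row is kept when its total record count over the period reaches
    months * min_per_month; otherwise it is dropped.
    """
    minimum = months * min_per_month
    kept = [row for row in rows if row[2] >= minimum]
    dropped = [row for row in rows if row[2] < minimum]
    return kept, dropped

# Made-up sample rows: (country, Wikipedia language code, records in period)
sample = [
    ("Serbia", "sr", 4200),
    ("Serbia", "en", 3900),
    ("Suriname", "sr", 12),
    ("Sierra Leone", "sl", 29),
]

kept, dropped = split_by_threshold(sample)
print("kept:   ", kept)
print("dropped:", dropped)
```

With these defaults a country/language pair needs at least 30 records in the period to be shown; the dropped rows correspond to what the test report marks in dark red.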

Detailed statistics are valuable in two very important ways:
* The first is chapter-related. I want to know more details about
tendencies in Serbia, so that I can (1) analyze what is going
on and what WM RS did, and (2) make a media event based on statistics.
* The other is of general sociolinguistic value. I may trace, up
to some extent, where speakers of a language live and what the
percentage of Internet adoption (actually, Wikipedia adoption) is, all of
that in comparison with, let's say, GDP, number of inhabitants and so
on.

It would be great if you set up a periodic job that creates such
statistics at the end of every month. For example, I would really like
to know about the trends in the past 6 months.

I noticed in your quarterly report that the share of the Serbian language in
Serbia is rising. That is very important because it shows one (or both)
of two things: the quality of Serbian Wikipedia is rising, and/or Internet
adoption among those who don't know English well enough is rising. If
the number of visits to English Wikipedia is stable, it is the
second; if the number of visits is lower than before, it is the
first; and so on.

Also, I would like to know whether it is seasonal: which numbers are about
tourists, and which are about the behavior of the general population.

So, while such statistics are truly an information overload for
creation of a general report, they are very valuable for particular
reports.


Re: [Foundation-l] Where do our readers come from? QA

2010-01-16 Thread Nikola Smolenski
On Friday 15 January 2010 23:39:38 Erik Zachte wrote:
 R: Nikola Smolenski
 It is obvious why Slovene Wikipedia is highly visited in Sierra Leone, and
 Serbian in Suriname; URLs do matter :)
 Although, I don't understand why so much. I would expect this distribution
 by visitors, perhaps, but not by visits.

 A:
 Very interesting observation! So people from Sierra Leone try
 'sl.wikipedia.org'.
 Why people from Surinam go to 'sr.wikimedia.org' is only slightly less
 obvious to me, but apparently it happens.

The ISO 3166-1 code for Surinam is 'sr'.


Re: [Foundation-l] Where do our readers come from? QA

2010-01-16 Thread Ronald Beelaard
I read all kinds of confusion about funny correlations between language
versions and the countries visitors are coming from.

As I (privately) communicated with Erik, the following flaws are in the
current analysis:

* The country code AU is often used (by APNIC in this case) as a placeholder
for ranges that are pre-reserved, for instance so that parts of that
very big range can be allocated in bits and pieces to countries in the area
(e.g. JP).
* Similarly, RIPE does that with the country code EU (not to be confused
with the language code eu).

Other misinterpretations may occur because there are some conflicts between
country and language codes. An example is SL (Sierra Leone) and sl
(Slovenian), and I guess UA (Ukraine) and uk (Ukrainian?) is a
similar case. But there are certainly more.
See also: http://meta.wikimedia.org/wiki/Language_codes/Conflicts, although
imo this list is not comprehensive.
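A quick way to spot such conflicts is to intersect the two code lists. The sketch below uses small hand-picked samples rather than the full ISO 3166-1 and Wikipedia language code lists, and the helper name `find_collisions` is made up for illustration.

```python
# Small hand-picked samples; the real ISO 3166-1 and Wikipedia code lists
# are much longer.
COUNTRY_CODES = {
    "SL": "Sierra Leone",
    "SR": "Suriname",
    "KE": "Kenya",
    "UA": "Ukraine",
}
LANGUAGE_CODES = {
    "sl": "Slovenian",
    "sr": "Serbian",
    "sw": "Swahili",
    "uk": "Ukrainian",
}

def find_collisions(countries, languages):
    """Return {code: (country, language)} for codes used by both lists."""
    lowered = {code.lower(): name for code, name in countries.items()}
    return {
        code: (lowered[code], languages[code])
        for code in sorted(lowered.keys() & languages.keys())
    }

collisions = find_collisions(COUNTRY_CODES, LANGUAGE_CODES)
for code, (country, language) in collisions.items():
    print(f"{code}: country {country} / language {language}")
```

Note that UA and uk are not caught by a literal comparison like this one, because the two codes differ; a case like that needs a hand-maintained list of look-alike pairs.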

Another cause of problems might be the fact that the assignments of IP
ranges change continuously. That happens on a small scale (e.g. re-assigning
a block of 65536 addresses or much smaller), but also on a larger scale. The
result is that you can't fully trust a so-called geo-IP database (like
MaxMind). I don't know how quickly such a database becomes outdated, but I
have noticed major shifts of ranges of more than 16 million addresses within
half a year (concerning the AU - JP confusion).
Structured lists do not exist, so the only way is to keep checking the
data in such a database against the Regional Internet Registries. That is a
complicated and also very time-consuming process.
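For a sense of scale, the block sizes mentioned above correspond to a /16 and a /8 in CIDR notation. The sketch below uses Python's standard `ipaddress` module; the addresses come from the private 10.0.0.0/8 range and stand in for real allocations, and the two snapshot tables and the `lookup` helper are hypothetical, not the format or API of MaxMind or any other geo-IP product.

```python
import ipaddress

# Block sizes mentioned above, expressed as CIDR prefixes.
small_block = ipaddress.ip_network("10.1.0.0/16")   # 65536 addresses
large_block = ipaddress.ip_network("10.0.0.0/8")    # more than 16 million
print(small_block.num_addresses, large_block.num_addresses)

# Hypothetical snapshots of a geo-IP table: (network, country code).
# A placeholder code such as "AU" may later be replaced by e.g. "JP".
SNAPSHOT_OLD = [(ipaddress.ip_network("10.0.0.0/8"), "AU")]
SNAPSHOT_NEW = [(ipaddress.ip_network("10.0.0.0/8"), "JP")]

def lookup(table, address):
    """Return the country code of the most specific matching network, or None."""
    ip = ipaddress.ip_address(address)
    matches = [(net, cc) for net, cc in table if ip in net]
    if not matches:
        return None
    return max(matches, key=lambda item: item[0].prefixlen)[1]

# The same visitor address is attributed to different countries depending
# on how old the table is.
print(lookup(SNAPSHOT_OLD, "10.20.30.40"), lookup(SNAPSHOT_NEW, "10.20.30.40"))
```

A log analysed with the older snapshot would count this visitor under AU, while the newer one would count the same address under JP.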

So don't draw conclusions in the case of small countries and/or languages.

Rgds Ronald




Re: [Foundation-l] Where do our readers come from? QA

2010-01-16 Thread Nikola Smolenski
On Friday 15 January 2010 23:39:38 Erik Zachte wrote:
 Here is a much more extended version of the breakdown report [1] (for this
 discussion only)
 It shows per country up to 50 Wikipedias.
 An extra column shows the total number of records for this country/language
 (for the 6 month period) on which the percentage is based.

What exactly is this number of records? Thousands of visits?
