Re: [Gendergap] Sex Ratios in Wikidata Part III

2014-06-10 Thread Andrew Gray
On 9 June 2014 23:34, Lennart Guldbrandsson l_guldbrands...@hotmail.com wrote:
 Some language versions of Wikipedia do have gender categorization, such as
 Swedish and German Wikipedia. (The English categories exist but are not used
 very much.) Here's a link to the Swedish ones:

 https://sv.wikipedia.org/wiki/Kategori:M%C3%A4n (men)
 presently 132 211 articles

 https://sv.wikipedia.org/wiki/Kategori:Kvinnor (women)
 presently 32 693 articles

 This gives a rough proportion of 1 female for every 4 male article subjects.
 If my memory serves me, the German Wikipedia ratio is a bit higher
 (perhaps 1 female for every 6 male).

 On Swedish Wikipedia, the categorization was a conscious decision to try to
 find out where we stood.

Thanks - I knew about the German categories but not the Swedish ones.

Interestingly, Wikidata reports:

32661 female on svwiki:
http://tools.wmflabs.org/wikidata-todo/autolist.html?q=claim%5B31%3A5%5D%20and%20claim%5B21%3A6581072%5D%20and%20link%5Bsvwiki%5D

130801 male on svwiki:
http://tools.wmflabs.org/wikidata-todo/autolist.html?q=claim%5B31%3A5%5D%20and%20claim%5B21%3A6581097%5D%20and%20link%5Bsvwiki%5D
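The two autolist links above carry their queries as a percent-encoded `q` parameter. The Python sketch below decodes them and spells out the Wikidata IDs involved: P31 (instance of), Q5 (human), P21 (sex or gender), Q6581072 (female) and Q6581097 (male). The decoded `claim[property:item]` / `link[site]` form appears to be the WDQ-style syntax the tool accepted. The `explain` helper is hypothetical, written only for this illustration, and nothing here contacts the tool itself.

```python
import re
from urllib.parse import parse_qs, urlsplit

# The two autolist URLs from the message above; the query lives in the
# percent-encoded "q" parameter.
URLS = {
    "female": "http://tools.wmflabs.org/wikidata-todo/autolist.html"
              "?q=claim%5B31%3A5%5D%20and%20claim%5B21%3A6581072%5D%20and%20link%5Bsvwiki%5D",
    "male": "http://tools.wmflabs.org/wikidata-todo/autolist.html"
            "?q=claim%5B31%3A5%5D%20and%20claim%5B21%3A6581097%5D%20and%20link%5Bsvwiki%5D",
}

# English labels of the Wikidata properties (P) and items (Q) used in the queries.
LABELS = {
    "P31": "instance of", "Q5": "human",
    "P21": "sex or gender", "Q6581072": "female", "Q6581097": "male",
}

def decode_query(url: str) -> str:
    """Extract and percent-decode the 'q' parameter of an autolist URL."""
    return parse_qs(urlsplit(url).query)["q"][0]

def explain(query: str) -> str:
    """Hypothetical helper: spell out claim[P:Q] and link[site] terms."""
    parts = []
    for prop, value in re.findall(r"claim\[(\d+):(\d+)\]", query):
        p, q = f"P{prop}", f"Q{value}"
        parts.append(f"{p} ({LABELS.get(p, '?')}) = {q} ({LABELS.get(q, '?')})")
    for site in re.findall(r"link\[(\w+)\]", query):
        parts.append(f"has a sitelink to {site}")
    return "; ".join(parts)

for sex, url in URLS.items():
    query = decode_query(url)
    print(f"{sex}: {query}")
    print(f"  {explain(query)}")
```

The female query decodes to `claim[31:5] and claim[21:6581072] and link[svwiki]`, i.e. "humans whose sex or gender is female and who have a Swedish Wikipedia article".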

Wikidata gives 20.0% female and the Wikipedia categories give 19.8%, so
they're in reasonably good alignment: almost perfectly matching for
women (32 apart), and about 1,400 more men in the categories than in
Wikidata. I'll have a look at getting these mapped across tonight :-)
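To check the percentages, here is a minimal Python sketch that recomputes the female share and the men-per-woman ratio from the counts quoted in this thread. It assumes male + female is the whole population and ignores any other gender values.

```python
# Counts quoted in this thread (svwiki, June 2014).
CATEGORY_COUNTS = {"female": 32_693, "male": 132_211}   # Kategori:Kvinnor / Kategori:Män
WIKIDATA_COUNTS = {"female": 32_661, "male": 130_801}   # autolist queries above

def female_share(counts: dict) -> float:
    """Female share of male + female subjects (other values ignored)."""
    return counts["female"] / (counts["female"] + counts["male"])

def men_per_woman(counts: dict) -> float:
    """How many male subjects there are for each female subject."""
    return counts["male"] / counts["female"]

for name, counts in [("categories", CATEGORY_COUNTS), ("Wikidata", WIKIDATA_COUNTS)]:
    print(f"{name}: {female_share(counts):.1%} female, "
          f"1 woman per {men_per_woman(counts):.1f} men")

# Net difference between the two sources, per sex.
gap = {sex: CATEGORY_COUNTS[sex] - WIKIDATA_COUNTS[sex] for sex in CATEGORY_COUNTS}
print("categories minus Wikidata:", gap)
```

Run as-is, this prints 19.8% female for the categories and 20.0% for Wikidata (both roughly 1 woman per 4 men), with a net gap of 32 women and 1,410 men. The gap is a difference of totals, not a list of specific unmatched articles.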

-- 
- Andrew Gray
  andrew.g...@dunelm.org.uk

___
Gendergap mailing list
Gendergap@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/gendergap


Re: [Gendergap] Sex Ratios in Wikidata Part III

2014-06-09 Thread Andrew Gray
Hi all,

I ran a few quick updates on Max's numbers today. As of 9 June 2014:

* Wikidata has ~2080k items marked as people
* Of these, ~1893k have a gender property (91%)

(Magnus's games are doing an amazing job at filling out these numbers,
by the way - http://magnusmanske.de/wordpress/?p=213 )

Very quick and dirty statistics follow - note that since we have 9%
undefined, the stats may change a bit as time goes on :-)
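To put a number on "may change a bit", the sketch below computes hard worst-case bounds on the overall female share once every person item has a gender. It uses the approximate figures above, in thousands of items, and assumes every remaining item ends up either male or female. The real outcome should land well inside these bounds; they only show how much room the 9% leaves.

```python
# Approximate figures from the message above, in thousands of items.
PEOPLE = 2080          # items marked as people
WITH_GENDER = 1893     # of which have a gender property
FEMALE = 290           # of which are female

undefined = PEOPLE - WITH_GENDER           # items with no gender yet
current = FEMALE / WITH_GENDER             # share among items that have a gender

# Extreme cases: every undefined item turns out male, or every one female.
low = FEMALE / PEOPLE
high = (FEMALE + undefined) / PEOPLE

print(f"coverage {WITH_GENDER / PEOPLE:.0%}, current female share {current:.1%}")
print(f"bounds once all items have a gender: {low:.1%} to {high:.1%}")
```

With these inputs the current share is 15.3%, and the bounds come out at 13.9% and 22.9%.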

* The gender breakdown across all these people is approximately 1603k
male, 290k female - 84.7% male and 15.3% female.

* Female share by wiki: enwiki 15.5%; arwiki 14.2%; dewiki 14.9%;
frwiki 15.2%; eswiki 15.9%; jawiki 18.2%; hiwiki 18.7%; zhwiki 20.1%

* It's interesting to note that these numbers mostly seem a point or
two better than the numbers Max got a month ago, which probably
reflects better data-logging rather than a change in the underlying
content.

* There are still very few items with a gender property other than
male or female - perhaps 100-200 overall - but I suspect this
number will significantly increase as we deal with the remaining
items.
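Percentages like these are easier to compare with the "1 in N" figures quoted elsewhere in this thread once they are converted. A minimal Python sketch, assuming male + female = 100% (i.e. ignoring the 100-200 items with other values):

```python
# Female share per wiki (percent), as reported in the message above.
FEMALE_SHARE = {
    "enwiki": 15.5, "arwiki": 14.2, "dewiki": 14.9, "frwiki": 15.2,
    "eswiki": 15.9, "jawiki": 18.2, "hiwiki": 18.7, "zhwiki": 20.1,
}

def men_per_woman(pct_female: float) -> float:
    """Turn a female percentage into 'one woman per N men'.

    Assumes male + female = 100%, ignoring other gender values.
    """
    return (100 - pct_female) / pct_female

# Overall figure, from the approximate counts (thousands of items).
overall = 290 / (1603 + 290) * 100

for wiki, pct in sorted(FEMALE_SHARE.items(), key=lambda kv: kv[1]):
    print(f"{wiki}: {pct:.1f}% female, about 1 woman per {men_per_woman(pct):.1f} men")
print(f"all items: {overall:.1f}% female, about 1 woman per {men_per_woman(overall):.1f} men")
```

For example, dewiki's 14.9% works out to about 1 woman per 5.7 men, close to the "perhaps 1 in 6" recalled for German Wikipedia's categories.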

Andrew.

On 22 May 2014 18:16, Maximilian Klein isa...@gmail.com wrote:
 Hi Everyone,

 I just conducted some new research I thought you might be intrigued by.

 It compares the sex or gender labels in use by Wikidata today (13 in
 total) and the percentage of articles about females by language.

 The best are Serbian Wikipedia or Urdu Wikipedia, depending on how you
 account for wiki size.

 The wiki that has become most sexist in 2014 is English Wikipedia.
 It also covers data richness per sex value: 6.2 Wikidata statements
 per male, 6.0 per female.


 See the full blog post here, and please send me questions and suggestions:

 http://notconfusing.com/sex-ratios-in-wikidata-part-iii/

 Max Klein
 ‽ http://notconfusing.com/
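Max's "data richness" figure is presumably the mean number of statements per item for each sex value. Assuming that definition, here is a minimal Python sketch of the grouping step. The sample pairs are made-up toy data, not the figures behind the 6.2 / 6.0 result, and extracting statement counts from Wikidata is left out.

```python
from collections import defaultdict
from statistics import mean

# Toy data: (P21 value, number of statements on the item). Made up for
# illustration only.
SAMPLE = [("Q6581097", 7), ("Q6581097", 5), ("Q6581072", 6), ("Q6581072", 4)]

def statements_per_item_by_gender(pairs):
    """Mean statement count per item, grouped by sex or gender value."""
    by_gender = defaultdict(list)
    for gender, n_statements in pairs:
        by_gender[gender].append(n_statements)
    return {gender: mean(counts) for gender, counts in by_gender.items()}

print(statements_per_item_by_gender(SAMPLE))
```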





-- 
- Andrew Gray
  andrew.g...@dunelm.org.uk



Re: [Gendergap] Sex Ratios in Wikidata Part III

2014-06-09 Thread Nathan
On Mon, Jun 9, 2014 at 3:17 PM, Andrew Gray andrew.g...@dunelm.org.uk
wrote:

 * Wikidata has ~2080k items marked as people
 * Of these, ~1893k have a gender property (91%)


Can you define "item" in this context?

Do we have any comparable data points by which to evaluate our progress?
Perhaps a similar breakdown for other reference works, or some sort of
summary data on biographies written (using LOC data?).


Re: [Gendergap] Sex Ratios in Wikidata Part III

2014-06-09 Thread Andrew Gray
On 9 June 2014 20:21, Nathan nawr...@gmail.com wrote:

 * Wikidata has ~2080k items marked as people
 * Of these, ~1893k have a gender property (91%)

 Can you define item in this context?

An item here is a single Wikidata entry, for example:

http://www.wikidata.org/wiki/Q320

which may correspond to one Wikipedia article, one hundred Wikipedia
articles, etc., but all on the same topic. (Potentially it may
correspond to *no* Wikipedia articles: a linked article isn't strictly
required, and in any case the source article may have been deleted. But
there are unlikely to be enough of these to matter statistically just
now.)
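To make the item/article distinction concrete, here is a hypothetical and much-simplified Python model of an item: a set of claims (property to values) plus sitelinks (wiki to article title). It is not the real Wikidata data model or API, and the Q-ids in the example are placeholders. The counts it produces mirror the figures discussed in this thread: people (P31 = Q5), people with a gender (P21), and people with no Wikipedia article at all.

```python
from dataclasses import dataclass, field

@dataclass
class Item:
    """Hypothetical, much-simplified model of a Wikidata item.

    Real items also carry labels, descriptions, aliases, qualifiers and
    references; only the parts relevant to this thread are kept here.
    """
    qid: str
    claims: dict[str, list[str]] = field(default_factory=dict)  # property -> values
    sitelinks: dict[str, str] = field(default_factory=dict)     # wiki -> article title

    def is_person(self) -> bool:
        return "Q5" in self.claims.get("P31", [])

    def genders(self) -> list[str]:
        return self.claims.get("P21", [])

def count_people(items):
    """Return (people, people with a gender, people with no articles)."""
    people = [i for i in items if i.is_person()]
    with_gender = [i for i in people if i.genders()]
    no_articles = [i for i in people if not i.sitelinks]
    return len(people), len(with_gender), len(no_articles)

# Placeholder items: one item may link to many wikis, or to none.
items = [
    Item("Q_example1", {"P31": ["Q5"], "P21": ["Q6581072"]},
         {"enwiki": "Example", "svwiki": "Exempel"}),
    Item("Q_example2", {"P31": ["Q5"]}, {"dewiki": "Beispiel"}),
    Item("Q_example3", {"P31": ["Q5"], "P21": ["Q6581097"]}),  # no articles
]
print(count_people(items))
```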

 Do we have any comparable data points by which to evaluate our progress?
 Perhaps a similar breakdown of other reference works, or if there is some
 sort of summary data available about biographies written (using LOC data?),
 etc.

The new Oxford Dictionary of National Biography was about 10% female
when published in 2004, though this was skewed by the requirement to
include all entries from the original edition, including a lot of men
who are, to modern eyes, very non-notable.
http://oed.hertford.ox.ac.uk/main/images/stories/articles/baigent2005.pdf
(It's since crept up to ~11%.)

Max has done some numbers based on the gender assigned in VIAF entries, I
think, but I can't immediately find them. Ben Schmidt did something
similar based on the first names of authors:
http://sappingattention.blogspot.co.uk/2012/05/women-in-libraries.html

-- 
- Andrew Gray
  andrew.g...@dunelm.org.uk



Re: [Gendergap] Sex Ratios in Wikidata Part III

2014-06-09 Thread Lennart Guldbrandsson
Some language versions of Wikipedia do have gender categorization, such as 
Swedish and German Wikipedia. (The English categories exist but are not used 
very much.) Here's a link to the Swedish ones:

https://sv.wikipedia.org/wiki/Kategori:M%C3%A4n (men)
presently 132 211 articles

https://sv.wikipedia.org/wiki/Kategori:Kvinnor (women)
presently 32 693 articles

This gives a rough proportion of 1 female for every 4 male article subjects. If
my memory serves me, the German Wikipedia ratio is a bit higher (perhaps 1 female
for every 6 male).

On Swedish Wikipedia, the categorization was a conscious decision to try to
find out where we stood.


Best wishes,

Lennart Guldbrandsson

070 - 207 80 05
http://www.elementx.se - work
http://www.mrchapel.wordpress.com - personal blog


Presentation
@aliasHannibal - on Twitter

Imagine a world where every person on this planet has free access to the
world's collected knowledge. That is our goal.


Jimmy Wales

 From: andrew.g...@dunelm.org.uk
 Date: Mon, 9 Jun 2014 20:44:17 +0100
 To: gendergap@lists.wikimedia.org
 Subject: Re: [Gendergap] Sex Ratios in Wikidata Part III
 