On 04/12/2015 03:38 PM, Emilio J. Rodríguez-Posada wrote:
> 2015-04-12 21:18 GMT+02:00 WereSpielChequers
> <werespielchequ...@gmail.com>:
> 
>     Firstly looking at gender ratios of deleted and undeleted bios to
>     see if there is an overall gender skew.
> 
> I share here this page of deleted and recreated pages
> https://en.wikipedia.org/wiki/User:Emijrp/Deletionism/2011 just in case
> someone wants to explore that.

I have Python code that's pretty good at guessing the gender of
biographical subjects, but it was originally written to scrape HTML
given a list of names. If someone had code for retrieving the wikitext
and determining whether an article is a biography (neither of which
would be hard), it would be very easy to work out the gender ratio.
Here's some pseudocode:

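```
#!/usr/bin/python2.7
# get_wiki() and guess_gender() are placeholders for the wikitext
# retrieval and the existing gender-guessing code.

titles = []  # article titles to check, e.g. from the page linked above

def is_bio(article):
    '''True if article is not '{{hsdis}}' and has '{{infobox person}}'.'''
    text = article.lower()  # template names are case-insensitive here
    return '{{hsdis}}' not in text and '{{infobox person' in text

def get_wiki(title):
    '''Placeholder: return the wikitext of title, or None if it doesn't exist.'''
    raise NotImplementedError

def guess_gender(article):
    '''Placeholder: return 'male', 'female' or 'unknown'.'''
    raise NotImplementedError

males = females = unknowns = 0  # initialise once, not once per title
for title in titles:
    article = get_wiki(title)
    if article is None:  # title doesn't exist (e.g. still deleted)
        continue
    if is_bio(article):
        gender = guess_gender(article)
        print('%s: %s' % (title, gender))
        if gender == 'male':
            males += 1
        elif gender == 'female':
            females += 1
        else:
            unknowns += 1
print('males = %s; females = %s; unknowns = %s'
      % (males, females, unknowns))
```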
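For the "retrieving the wikitext" part, here is a minimal sketch of a get_wiki() built on the MediaWiki Action API (action=query, prop=revisions), assuming Python 3 and only the standard library. The User-Agent string is a hypothetical placeholder (Wikimedia asks API clients to identify themselves). It returns None when the page is missing, which is how a still-deleted title would show up:

```python
import json
import urllib.parse
import urllib.request

API_URL = 'https://en.wikipedia.org/w/api.php'

def build_query_url(title):
    '''Build an Action API URL asking for the current wikitext of title.'''
    params = {
        'action': 'query',
        'prop': 'revisions',
        'rvprop': 'content',
        'rvslots': 'main',
        'titles': title,
        'format': 'json',
        'formatversion': '2',  # pages come back as a list, flags as booleans
    }
    return API_URL + '?' + urllib.parse.urlencode(params)

def extract_wikitext(response):
    '''Return the wikitext from a parsed API response, or None if absent.'''
    pages = response.get('query', {}).get('pages', [])
    if not pages or pages[0].get('missing') or pages[0].get('invalid'):
        return None
    revisions = pages[0].get('revisions', [])
    if not revisions:
        return None
    return revisions[0]['slots']['main']['content']

def get_wiki(title):
    '''Fetch the current wikitext of title; None if the page doesn't exist.'''
    # Hypothetical User-Agent; replace with your own tool name and contact.
    req = urllib.request.Request(
        build_query_url(title),
        headers={'User-Agent': 'gendergap-bio-check/0.1 (example)'})
    with urllib.request.urlopen(req) as resp:
        return extract_wikitext(json.load(resp))
```

Keeping the response parsing separate from the HTTP call makes it easy to check against saved responses without hitting the live site.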


_______________________________________________
Gendergap mailing list
Gendergap@lists.wikimedia.org
To manage your subscription preferences, including unsubscribing, please visit:
https://lists.wikimedia.org/mailman/listinfo/gendergap
