Jeffrey Barish wrote:
> I am writing a PyGTK application.  I would like to be able to download text 
> only (with formatting) from Wikipedia and display it in my application.  I 
> think that I am close to a solution, but I have reached an impasse due to my 
> ignorance of most of the mediawiki API.
> 
> My plan has been to use GtkMozembed in my application to render the page, so I
> need to retrieve html.  What is close to working is to use the index.php API
> with action=render and title=<search string for the Wikipedia page>.  The 
> data that I retrieve does display in my browser, but it has the following 
> undesired characteristics:
> 

> 2. There are sections at the end that I don't want (Further reading, External 
> links, Notes, See also, References).
Those sections are part of the article content. The API has no parameter
to include or exclude them, so you'd have to strip them yourself.
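A rough post-processing sketch, assuming (this is not guaranteed by the API) that each section starts with an <h2> element whose text is the section title, possibly with an "[edit]" link inside it, and that the unwanted sections all come at the end of the page. The function name and title list are hypothetical:

```python
import re

# Hypothetical list of trailing section titles to drop (taken from the question).
UNWANTED = ("Further reading", "External links", "Notes", "See also", "References")

H2 = re.compile(r"<h2\b[^>]*>(.*?)</h2>", re.IGNORECASE | re.DOTALL)


def strip_trailing_sections(html, titles=UNWANTED):
    """Cut the HTML at the first <h2> whose text is one of `titles`.

    Assumes the unwanted sections are the last ones on the page, so
    everything from the first matching heading onward is discarded.
    """
    for m in H2.finditer(html):
        # Drop nested tags (e.g. <span class="mw-headline">) to get the bare text.
        text = re.sub(r"<[^>]+>", "", m.group(1))
        # Assumed: an edit link may render as "[edit]" inside the heading.
        text = re.sub(r"\[\s*edit\s*\]", "", text).strip()
        if text in titles:
            return html[:m.start()]
    return html  # no unwanted heading found
```

If the order of sections varies between articles, cutting at the first match would also drop any wanted sections that follow it, so this is only a starting point.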

> 1. All images appear (I want none).
Same issue, although images are easier to strip: remove every match of /<img.*?>/
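In Python that regex can be applied as below. This is a minimal sketch: the case-insensitive and DOTALL flags are additions so that uppercase tags or tags split across lines are also caught, and like any regex it will misfire on an attribute value that contains ">". Thumbnail wrappers and captions around the images are left in place.

```python
import re

# The pattern suggested above; DOTALL lets .*? cross line breaks inside a tag.
IMG_TAG = re.compile(r"<img.*?>", re.IGNORECASE | re.DOTALL)


def strip_images(html):
    """Remove every <img ...> tag from an HTML fragment."""
    return IMG_TAG.sub("", html)
```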

> 3. Some characters are not rendered correctly (e.g., IPA: [ˈvɔlfgaŋ 
> amaˈdeus ˈmoːtsart]).

You're displaying the text as windows-1252, but it is UTF-8; decode it (or tell the renderer it is) as UTF-8.
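A sketch of fetching the action=render output and keeping it as UTF-8 end to end. It uses Python 3's urllib (a Python 2 / PyGTK program would use urllib.urlencode and urllib2.urlopen instead), and the English Wikipedia endpoint is an assumption. Since the render output is an HTML fragment without a charset declaration, the hypothetical helper as_html_document wraps it in a page with a meta tag so the embedded browser does not fall back to windows-1252:

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint (English Wikipedia); adjust for other wikis.
INDEX_PHP = "https://en.wikipedia.org/w/index.php"


def render_url(title):
    """Build the index.php?action=render URL described in the question."""
    return INDEX_PHP + "?" + urlencode({"action": "render", "title": title})


def fetch_rendered(title):
    """Download the rendered HTML and decode it as UTF-8, not windows-1252."""
    with urlopen(render_url(title)) as resp:
        return resp.read().decode("utf-8")


def as_html_document(fragment):
    """Wrap the fragment in a page that declares its encoding as UTF-8."""
    return ('<html><head><meta http-equiv="Content-Type" '
            'content="text/html; charset=utf-8"></head><body>'
            + fragment + "</body></html>")
```

When handing the result to the widget, encode it with .encode("utf-8") so the bytes match the declared charset.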

_______________________________________________
Mediawiki-api mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/mediawiki-api
