Jeffrey Barish wrote:
> I am writing a PyGTK application. I would like to be able to download text
> only (with formatting) from Wikipedia and display it in my application. I
> think that I am close to a solution, but I have reached an impasse due to my
> ignorance of most of the mediawiki API.
>
> My plan has been to use GtkMozembed in my application to render the page, so
> I need to retrieve html. What is close to working is to use the index.php API
> with action=render and title=<search string for the Wikipedia page>. The
> data that I retrieve does display in my browser, but it has the following
> undesired characteristics:
>
> 2. There are sections at the end that I don't want (Further reading, External
> links, Notes, See also, References).

Those sections are part of the page content. The API doesn't have any
parameter to include or exclude them, so they would have to be removed from
the HTML after you retrieve it.

> 1. All images appear (I want none).

Same issue, although images are easier to deal with: remove everything
matching /<img.*?>/ from the HTML.

> 3. Some characters are not rendered correctly (e.g., IPA: [ˈvÉ”lfgaÅ‹
> amaˈdeus ˈmoËtsart]).

The text is UTF-8, but you're displaying it as windows-1252. Decode it as
UTF-8 instead.

_______________________________________________
Mediawiki-api mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/mediawiki-api
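
For illustration, here is a minimal Python 3 sketch of the two fixes suggested
in this reply: decoding the response as UTF-8 and stripping the <img> tags. It
is not taken from the original thread. The endpoint URL (English Wikipedia),
the User-Agent string, and the function names are all assumptions made for the
example, and the pattern is a slightly stricter variant of /<img.*?>/ (word
boundary, case-insensitive, allowed to span lines).

```python
import re
import urllib.parse
import urllib.request

# Assumption: English Wikipedia's index.php endpoint; other wikis differ.
INDEX_URL = "https://en.wikipedia.org/w/index.php"

# Hypothetical client identification; replace with your own details.
USER_AGENT = "ExampleWikiViewer/0.1 (contact: you@example.org)"

# Variant of /<img.*?>/: non-greedy, so it stops at the first ">" after "<img".
IMG_TAG = re.compile(r"<img\b.*?>", re.IGNORECASE | re.DOTALL)


def build_render_url(title):
    """Return the index.php URL requesting the rendered HTML of `title`."""
    query = urllib.parse.urlencode({"action": "render", "title": title})
    return INDEX_URL + "?" + query


def decode_page(raw):
    """Decode the response bytes as UTF-8 (not windows-1252)."""
    return raw.decode("utf-8")


def strip_images(html):
    """Remove every <img ...> tag from the HTML string."""
    return IMG_TAG.sub("", html)


def fetch_rendered(title):
    """Download the rendered page, decode it, and drop its images."""
    request = urllib.request.Request(
        build_render_url(title), headers={"User-Agent": USER_AGENT}
    )
    with urllib.request.urlopen(request) as response:
        raw = response.read()
    return strip_images(decode_page(raw))
```

Calling fetch_rendered("Wolfgang Amadeus Mozart") should return a str that can
be handed to the rendering widget. A regular expression is a rough tool for
HTML: an attribute value that itself contains ">" would end the match early,
so an HTML parser may be the safer choice if that turns out to matter.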
