Dear Pywikipedia team,

I have pushed a few of my coding projects that use pywikipedia (the compat version) to GitHub, and I thought that some of you might be interested in the code. I recently had some time to clean up the code and bring it into a (hopefully) usable format, and I would be willing to make further adjustments if you think a few changes would make the code more useful to you.
Ultimately, my hope is that the code will find its home in the pywikipedia repository. Some of the code that I wrote might duplicate functionality already present in the core; if so, I apologize and you can happily ignore it.

== Template parser ==

https://github.com/hroest/pywikibot-compat/tree/feature/template_parser

For one bot project on the German Wikipedia I had to parse rather complex templates and replace specific fields. The templates contained nested templates, math formulas and references. I therefore wrote a template parser that parses these templates and returns them as key-value pairs, which makes it easy to query specific keys and replace their values. The code worked well on several thousand templates of the German chemistry project and should be rather straightforward to use. This is library code, so there is no bot associated with it; see templateparser.py and tests/test_templateparser.py.

In order to handle nesting correctly and to distinguish equal signs that belong to key-value pairs from those in mathematical formulas etc., I also had to write a partial MediaWiki syntax parser that recognizes such syntax in wikitext. This code is in textrange_parser.py and allows extracting specific parts of a text (e.g. wikitables, templates, wikilinks, weblinks); tests are in tests/test_textrange_parser.py.

== Spellchecking ==

https://github.com/hroest/pywikibot-compat/tree/feature/spellcheck

I added two new spellchecking bots: one based on hunspell, the same spellchecker that LibreOffice uses (spellcheck_hunspell.py), and another one based on a negative list (spellcheck_blacklist.py). Both run from the command line, parse the given wiki text, skip text ranges that usually contain more than plain human-readable text (templates, tables etc.) and check each word against a spellchecking engine (again, either a simple blacklist or a full-blown spellchecker with stemming and morphological analysis like hunspell).
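To illustrate the idea of skipping markup ranges before checking words, here is a minimal sketch. It is not the code from spellcheck_blacklist.py or textrange_parser.py; the names template_ranges and find_blacklisted are made up for this example, and I assume the negative list is a dict that maps a lowercase misspelling to its correction. Only {{...}} templates are skipped here, and triple-brace parameters are not handled.

```python
import re

# Runs of letters only (no digits or underscores), Unicode-aware.
WORD_RE = re.compile(r"[^\W\d_]+", re.UNICODE)


def template_ranges(text):
    """Return (start, end) spans of top-level {{...}} templates.

    Nested templates are covered by the span of their outermost
    template, so a depth counter is enough for this sketch.
    """
    ranges = []
    depth = 0
    start = None
    i = 0
    while i < len(text) - 1:
        pair = text[i:i + 2]
        if pair == "{{":
            if depth == 0:
                start = i
            depth += 1
            i += 2
        elif pair == "}}" and depth > 0:
            depth -= 1
            if depth == 0:
                ranges.append((start, i + 2))
            i += 2
        else:
            i += 1
    return ranges


def find_blacklisted(text, blacklist):
    """Return (offset, word, suggestion) for blacklisted words outside templates."""
    skip = template_ranges(text)
    hits = []
    for match in WORD_RE.finditer(text):
        pos = match.start()
        # Ignore words that start inside a template span.
        if any(begin <= pos < end for begin, end in skip):
            continue
        word = match.group(0)
        suggestion = blacklist.get(word.lower())
        if suggestion is not None:
            hits.append((pos, word, suggestion))
    return hits


text = "Das ist ein Standart {{Infobox|name=Standart {{nested|x=Standart}}}} Text."
hits = find_blacklisted(text, {"standart": "Standard"})
for offset, word, suggestion in hits:
    print("%d: %s -> %s" % (offset, word, suggestion))
```

Only the first occurrence of the misspelling is reported in this example, because the other two sit inside the (nested) template. A real run would also need to skip tables, references, math and similar ranges.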
These spellcheckers may turn out to be useful since they understand part of the wiki markup and know which parts of a text to spellcheck and which to skip. The misspelled words can be processed interactively: each word can be confirmed individually and the correction is then sent to Wikipedia. I have a bot with which I do this semi-automatically, and I have so far corrected 3000+ spelling mistakes on the German Wikipedia:
https://de.wikipedia.org/wiki/Spezial:Beitr%C3%A4ge/HRoestTypo

For large-scale processing, one can process a complete Wikipedia XML dump; for small-scale processing, one can use the Wikipedia web-search functionality to search for articles with a specific spelling error and then process only these pages.

== Review edits ==

https://github.com/hroest/pywikibot-compat/tree/feature/review_pages

On the German Wikipedia, considerable work goes into reviewing individual edits and marking them as reviewed. The feature/review_pages branch above contains a script called review_pages, which allows reviewing revisions semi-automatically. For a given page, it fetches the revision history up to the last reviewed change and displays the changes between the current and the last reviewed version of the article on the command line. The user can then interactively decide to accept the review, undo the change or go to the next unreviewed change.

This bot uses the MediaWiki API and thus may not actually be suitable for the compat version of pywikipedia. Reviewing, undoing and retrieving full version histories are done through the API and can be performed fully asynchronously. This allows a relatively fast interactive response while the bot fetches the revision histories and performs the review/undo actions requested by the user in the background.
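The background-fetching pattern described above could look roughly like the following sketch. It is not the review_pages script: fetch_history and submit_action are hypothetical stand-ins for the actual API requests (they make no network calls here), and review_session and its decide callback are names invented for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_history(title):
    """Hypothetical stand-in for an API request that returns the
    unreviewed revisions of a page (newest first)."""
    return ["%s@rev%d" % (title, n) for n in (3, 2, 1)]


def submit_action(title, action):
    """Hypothetical stand-in for the API call that reviews or undoes."""
    return (title, action)


def review_session(titles, decide, workers=4):
    """Prefetch histories in the background and ask `decide` per page.

    `decide(title, history)` returns "review", "undo" or "skip".
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Start fetching every history right away, before the user
        # has looked at the first page.
        pending = [(title, pool.submit(fetch_history, title)) for title in titles]
        actions = []
        for title, future in pending:
            # Blocks only if this history has not arrived yet.
            history = future.result()
            choice = decide(title, history)
            if choice != "skip":
                # The action runs in the background while the next
                # page is presented to the user.
                actions.append(pool.submit(submit_action, title, choice))
        return [action.result() for action in actions]


done = review_session(
    ["A", "B", "C"],
    lambda title, history: "skip" if title == "B" else "review",
)
print(done)
```

In an interactive bot, decide would show the diff and read the user's answer from the command line; a thread pool is only one way to get this overlap, chosen here because it needs nothing beyond the standard library.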
== Summary ==

I provide this code in the hope that it is useful to people. If somebody thinks that the described functionality could be provided by the pywikibot project, I would be willing to make the adjustments necessary for the code to be merged.

Best regards

Hannes

_______________________________________________
Pywikipedia-l mailing list
Pywikipedia-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/pywikipedia-l