Package: wnpp Severity: wishlist Owner: Ben Finney <bign...@debian.org>
* Package name : wikiextractor Version : 2.75 Upstream Author : Giuseppe Attardi <atta...@di.unipi.it> * URL : http://medialab.di.unipi.it/wiki/Wikipedia_Extractor * License : GPL-3 Programming Lang: Python Description : tool to extract plain text from a Wikipedia dump The Wikipedia maintainers provide, each month, an XML dump of all documents in the database: a single XML file containing the whole encyclopedia, that can be used for various kinds of analysis, such as statistics, service lists, etc. This Wikipedia extractor tool generates plain text from a Wikipedia database dump. It discards any other information or annotation present in Wikipedia pages, such as images, tables, references and lists. Some works use Wikipedia data as part of their complete source. This package will be useful for build chains that require processing that data as source. -- \ “I put instant coffee in a microwave oven and almost went back | `\ in time.” —Steven Wright | _o__) | Ben Finney <bign...@debian.org>
signature.asc
Description: PGP signature