Hi Marco,

Wikipedia has been offering these files for years, but the quality of the abstracts is very low. I extracted 12 examples from the first few dozen lines of the file; see below. I'd say three are fine, another three are OK, and the rest are useless. The abstracts produced by DBpedia are much better.
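
For anyone who wants to repeat this kind of spot check, here is a rough Python sketch. It is not the procedure used for the examples below, just one way to stream the dump and flag suspicious abstracts. It assumes only that each <title> element is followed by its <abstract>, as in the excerpts below. The names iter_abstracts and looks_broken, the 40-character threshold, and the wrapper elements in SAMPLE are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET


def iter_abstracts(source):
    """Yield (title, abstract) pairs from an abstract dump, streaming.

    `source` is a file name or a binary file object. Only the element
    names <title> and <abstract> are assumed; the enclosing elements
    are ignored.
    """
    title = None
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "title":
            title = (elem.text or "").strip()
        elif elem.tag == "abstract":
            yield title, (elem.text or "").strip()
            elem.clear()  # drop the abstract text we no longer need


def looks_broken(abstract, min_length=40):
    """Crude, hypothetical heuristic for leftover infobox/template markup."""
    if len(abstract) < min_length:
        return True
    if abstract.startswith(("|", ".", "}}")):
        return True
    return "{{" in abstract or "}}" in abstract


# Two abstracts from the examples below; the <feed>/<doc> wrapper
# elements are an assumption made only to get well-formed XML.
SAMPLE = b"""<feed>
<doc><title>Wikipedia: Autism</title><abstract>| ICD9 = 299.00</abstract></doc>
<doc><title>Wikipedia: An American in Paris</title><abstract>An American in Paris is a jazz-influenced symphonic poem by the American composer George Gershwin, written in 1928.</abstract></doc>
</feed>"""

if __name__ == "__main__":
    for title, abstract in iter_abstracts(io.BytesIO(SAMPLE)):
        status = "BROKEN" if looks_broken(abstract) else "ok"
        print(f"{status:6} {title}: {abstract[:60]}")
```

A heuristic like this only catches the obvious markup residue. It would not flag an abstract such as the Alabama one below, which is a clean sentence that simply does not describe the article, so a manual look at a sample is still needed.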
Regards,
JC

<title>Wikipedia: Anarchism</title>
<abstract>Anarchism is a political philosophy that advocates stateless societies often defined as self-governed voluntary institutions,"ANARCHISM, a social philosophy that rejects authoritarian government and maintains that voluntary institutions are best suited to express man's natural social tendencies." George Woodcock.</abstract>

<title>Wikipedia: Autism</title>
<abstract>| ICD9 = 299.00</abstract>

<title>Wikipedia: Albedo</title>
<abstract>Albedo (), or reflection coefficient, derived from Latin albedo "whiteness" (or reflected sunlight) in turn from albus "white", is the diffuse reflectivity or reflecting power of a surface. It is the ratio of reflected radiation from the surface to incident radiation upon it.</abstract>

<title>Wikipedia: A</title>
<abstract>English articles}}</abstract>

<title>Wikipedia: Alabama</title>
<abstract>Elevation adjusted to North American Vertical Datum of 1988.</abstract>

<title>Wikipedia: Achilles</title>
<abstract>In Greek mythology, Achilles (; , Akhilleus, ) was a Greek hero of the Trojan War and the central character and greatest warrior of Homer's Iliad. His mother was the nymph Thetis, and his father, Peleus, was the king of the Myrmidons.</abstract>

<title>Wikipedia: Abraham Lincoln</title>
<abstract>border</abstract>

<title>Wikipedia: Aristotle</title>
<abstract>.The alabaster mantle is modern.</abstract>

<title>Wikipedia: An American in Paris</title>
<abstract>An American in Paris is a jazz-influenced symphonic poem by the American composer George Gershwin, written in 1928. Inspired by the time Gershwin had spent in Paris, it evokes the sights and energy of the French capital in the 1920s and is one of his best-known compositions.</abstract>

<title>Wikipedia: Academy Award for Best Production Design</title>
<abstract>The Academy Awards are the oldest awards ceremony for achievements in motion pictures. The Academy Award for Best Production Design recognizes achievement in art direction on a film.</abstract>

<title>Wikipedia: Academy Awards</title>
<abstract>}}</abstract>

<title>Wikipedia: Actrius</title>
<abstract>| runtime = 90 minutes</abstract>

On Fri, Feb 13, 2015 at 4:24 PM, Marco Fossati <[email protected]> wrote:
> Extracted page abstracts dumps directly from Wikimedia?
>
> For instance:
> http://dumps.wikimedia.org/enwiki/20150112/enwiki-20150112-abstract.xml
>
> Does this mean no need for the (slightly painful) abstract extraction
> procedure anymore?
> --
> Marco Fossati
> http://about.me/marco.fossati
> Twitter: @hjfocs
> Skype: hell_j
>
> ------------------------------------------------------------------------------
> Dive into the World of Parallel Programming. The Go Parallel Website,
> sponsored by Intel and developed in partnership with Slashdot Media, is your
> hub for all things parallel software development, from weekly thought
> leadership blogs to news, videos, case studies, tutorials and more. Take a
> look and join the conversation now. http://goparallel.sourceforge.net/
> _______________________________________________
> Dbpedia-developers mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/dbpedia-developers
