There are papers on “list” extraction from text and tables…
Techniques usually involve:
* Natural Language Processing to identify patterns characteristic of lists
in unstructured text
* Wrapper Induction to extract similar items from tables or web pages with a
consistent design, including across multiple pages
* Large-scale statistical mining based on content redundancy across
tables/pages
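The first technique (spotting the surface patterns that mark list items) can be illustrated with a minimal sketch. It assumes the input is MediaWiki-style wikitext or plain text in which items start with a bullet (`*`, `#`, `-`) or an enumeration (`1.`, `2)`), and that entities appear as `[[Target]]` or `[[Target|label]]` links. The regexes and function names are hypothetical. This is a simple regex heuristic, not the method of any of the papers mentioned. Real "List of ..." pages also use tables, nested lists, and templates, which this sketch ignores.

```python
import re

# Hypothetical heuristic: a line is a list item if it starts with a bullet
# ("*", "#", "-", "•") or an enumeration ("1." / "2)") followed by whitespace.
ITEM_PATTERN = re.compile(r"^\s*(?:[*#\-•]+|\d+[.)])\s+(.*\S)\s*$")

# Wikitext internal links: [[Target]], [[Target#Section]] or [[Target|label]].
LINK_PATTERN = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")


def extract_list_items(text: str) -> list[str]:
    """Return the body of every line that matches the list-item pattern."""
    items = []
    for line in text.splitlines():
        match = ITEM_PATTERN.match(line)
        if match:
            items.append(match.group(1))
    return items


def linked_entities(item: str) -> list[str]:
    """Return the link targets (candidate entities) mentioned in one item."""
    return [target.strip() for target in LINK_PATTERN.findall(item)]
```

For example, `extract_list_items("* [[Country A]]: [[Leader A|A.]]")` yields one item whose linked entities are `Country A` and `Leader A` (placeholder names). Prose lines and numbers such as "3.5 million" are not treated as items, because an enumeration marker must be followed by whitespace.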
Also note that DBpedia has had some initiatives around table/list extraction in
the past; see http://mappings.dbpedia.org/index.php/How_to_edit_DBpedia_Mappings
—
Nicolas Torzec
Yahoo Labs.
From: Heiko Paulheim <he...@informatik.uni-mannheim.de>
Date: Tuesday, March 4, 2014 at 3:22 AM
To: Bernard Vatant <bernard.vat...@mondeca.com>
Cc: DBpedia Discussions <dbpedia-discussion@lists.sourceforge.net>
Subject: Re: [Dbpedia-discussion] Wikipedia lists in DBpedia?
Hi Bernard,
Concerning the extraction of knowledge from list pages, which is not as
straightforward a problem as it may seem: we had a paper on this last year at
the DBpedia&NLP workshop [1].
As far as timeliness is concerned: the current build of DBpedia is based on a
dump from May 2013. However, the mappings to YAGO may be even older, since they
are not extracted freshly from Wikipedia when DBpedia is built, but taken from
the YAGO version that is current at the time of the extraction [2].
Hope that helps,
Heiko
[1] http://ceur-ws.org/Vol-1064/Paulheim_Extending_DBpedia.pdf
[2] http://wiki.dbpedia.org/Downloads39#yago-links
On 04.03.2014 10:52, Bernard Vatant wrote:
Hello all
There are a lot of "List of ..." pages in Wikipedia, and I thought they were
somehow represented in DBpedia, but unless I am missing something, I don't see
anything equivalent to, e.g.,
http://en.wikipedia.org/wiki/List_of_current_heads_of_state_and_government
This list is pretty much up to date, including, e.g., the recent changes in
Ukraine.
The closest equivalent I could find is
http://dbpedia.org/class/yago/CurrentNationalLeaders
... but I wonder what "Current" means here, since Nicolas Sarkozy is still
among the instances but not François Hollande (the latter replaced the
former in May 2012, for those who missed the event). Even if one does not
expect real-time data, this is quite a long delay for updating ...
Thanks for any clue on this.
_______________________________________________
Dbpedia-discussion mailing list
Dbpedia-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
--
Dr. Heiko Paulheim
Research Group Data and Web Science
University of Mannheim
Phone: +49 621 181 2646
B6, 26, Room C1.08
D-68159 Mannheim
Mail: he...@informatik.uni-mannheim.de
Web: www.heikopaulheim.com