Hi Pablo, Amit,
Although I didn't write the image extractor, I think this is more a matter
of semantics than a technical issue, and it was left this way intentionally.
The first picture is usually the most representative of the article, and
thus we use foaf:depiction for it. Other pictures might not be about the
article itself but about other, closely related articles (e.g. [1,2,3]), so
extracting them all might not be the best approach.
> 1. Wiki Downloader: We have two components.
> - Full Downloader: A basic bash script which polls the latest folder of
> the Wikipedia dumps, checks whether a new dump is available, and
> downloads it to a dated folder.
> - Incremental Downloader: Includes an IRC bot which keeps listening to
> the Wikipedia IRC channel and makes a list of pages that were updated. It
> de-duplicates and downloads those pages every few hours while respecting
> the Wikipedia QPS limits.
> 2. DEF Wrapper: A bash script which invokes the DEF on the data
> generated by the downloader.
>
>
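The Full Downloader described above could be sketched roughly as follows. This is a Python illustration of the polling logic only; the index URL, dump filename, and state file are assumptions, not the actual script (which is bash).

```python
import re
import urllib.request
from pathlib import Path

# Illustrative constants -- the real script's URLs and paths are assumptions.
DUMPS_INDEX = "https://dumps.wikimedia.org/enwiki/"
STATE_FILE = Path.home() / ".last-dump-date"

def is_new_dump(remote_date: str, last_date: str) -> bool:
    """True if the remote dump date (YYYYMMDD) is newer than the last fetched."""
    return remote_date > last_date

def latest_remote_date(index_html: str) -> str:
    """Pick the newest dated folder name out of the dumps index page."""
    dates = re.findall(r"\b(\d{8})\b", index_html)
    return max(dates) if dates else "00000000"

def poll_once():
    """One polling pass: check the index, and if a new dump appeared,
    download it into a dated folder and remember its date."""
    html = urllib.request.urlopen(DUMPS_INDEX).read().decode()
    remote = latest_remote_date(html)
    last = STATE_FILE.read_text().strip() if STATE_FILE.exists() else "00000000"
    if is_new_dump(remote, last):
        dated_dir = Path("dumps") / remote  # dated folder, as in the description
        dated_dir.mkdir(parents=True, exist_ok=True)
        name = f"enwiki-{remote}-pages-articles.xml.bz2"
        urllib.request.urlretrieve(f"{DUMPS_INDEX}{remote}/{name}", dated_dir / name)
        STATE_FILE.write_text(remote)

# In practice poll_once() would be run from cron every few hours.
```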
We do have download.scala for the Wikipedia dumps, but the DIEF Wrapper
would be great!
One script to invoke them all ;)
Anyway, I guess you have a reason to do it this way, but the DBpedia Live
approach could make more sense for continuous integration.
Cheers,
Dimitris
[1] http://en.wikipedia.org/wiki/Anton_Bacalba%C5%9Fa
[2] http://en.wikipedia.org/wiki/Branch_House
[3] http://en.wikipedia.org/wiki/Coptic_Orthodox_Church_of_Alexandria
_______________________________________________
Dbpedia-discussion mailing list
Dbpedia-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion