Hi Pablo, Amit,
Although I didn't write the image extractor, I think this is more a matter
of semantics than a technical issue, and it was left this way intentionally.
The first picture is usually the most representative of the article, and
thus we use foaf:depiction for it. Other pictures might not be about the
article itself but about other, closely related articles (e.g. [1,2,3]), so
extracting them all might not be the best approach.
> 1. Wiki Downloader: We have two components.
> - Full Downloader: A basic bash script which polls the latest folder of
> the Wikipedia dumps, checks whether a new dump is available, and
> downloads it to a dated folder.
> - Incremental Downloader: Includes an IRC bot which keeps listening to
> the Wikipedia IRC channel and makes a list of pages that were updated. It
> de-duplicates and downloads those pages every few hours while respecting
> the Wikipedia QPS limits.
> 2. DEF Wrapper: A bash script which invokes the DEF on the data
> generated by the downloader.
>
>
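The Full Downloader described above could be sketched roughly as follows. This is a Python illustration of the polling logic only; the index URL, dump filename, and state file are assumptions, not the actual script (which is bash).

```python
import re
import urllib.request
from pathlib import Path

# Illustrative constants -- the real script's URLs and paths are assumptions.
DUMPS_INDEX = "https://dumps.wikimedia.org/enwiki/"
STATE_FILE = Path.home() / ".last-dump-date"

def is_new_dump(remote_date: str, last_date: str) -> bool:
    """True if the remote dump date (YYYYMMDD) is newer than the last fetched."""
    return remote_date > last_date

def latest_remote_date(index_html: str) -> str:
    """Pick the newest dated folder name out of the dumps index page."""
    dates = re.findall(r"\b(\d{8})\b", index_html)
    return max(dates) if dates else "00000000"

def poll_once():
    """One polling pass: check the index, and if a new dump appeared,
    download it into a dated folder and remember its date."""
    html = urllib.request.urlopen(DUMPS_INDEX).read().decode()
    remote = latest_remote_date(html)
    last = STATE_FILE.read_text().strip() if STATE_FILE.exists() else "00000000"
    if is_new_dump(remote, last):
        dated_dir = Path("dumps") / remote  # dated folder, as in the description
        dated_dir.mkdir(parents=True, exist_ok=True)
        name = f"enwiki-{remote}-pages-articles.xml.bz2"
        urllib.request.urlretrieve(f"{DUMPS_INDEX}{remote}/{name}", dated_dir / name)
        STATE_FILE.write_text(remote)

# In practice poll_once() would be run from cron every few hours.
```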
We do have download.scala for the Wikipedia dumps, but the DIEF Wrapper
would be great!
One script to invoke them all ;)
Anyway, I guess you have a reason to do it this way, but the DBpedia Live
approach could make more sense for continuous integration.
Cheers,
Dimitris
[1] http://en.wikipedia.org/wiki/Anton_Bacalba%C5%9Fa
[2] http://en.wikipedia.org/wiki/Branch_House
[3] http://en.wikipedia.org/wiki/Coptic_Orthodox_Church_of_Alexandria
_______________________________________________
Dbpedia-discussion mailing list
Dbpedia-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion