Hi all,

We also wrote bash scripts that download the latest Wikipedia dumps
[1][2] and import them into a database [3]. I wasn't around when we
switched from bash to Scala, but I guess it was because we wanted code
that can also run on Windows.
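
For anyone reading along without the scripts at hand, the "is there a new
dump, and which dated folder does it go to" check can be sketched roughly as
below. This is a minimal Python sketch, not the code in [1] or [2]: the
function names, the YYYYMMDD folder naming and the idea of keying on the HTTP
Last-Modified header are all assumptions made for illustration.

```python
from email.utils import parsedate_to_datetime
from typing import Iterable


def dated_folder_name(last_modified: str) -> str:
    """Turn an HTTP Last-Modified value into a folder name such as 20120319.

    Hypothetical helper: the YYYYMMDD layout is an assumption.
    """
    return parsedate_to_datetime(last_modified).strftime("%Y%m%d")


def is_new_dump(last_modified: str, existing_folders: Iterable[str]) -> bool:
    """Treat a dump as new when no folder for its date exists yet."""
    return dated_folder_name(last_modified) not in existing_folders


if __name__ == "__main__":
    # The header value would normally come from a HEAD request on the dump URL.
    header = "Mon, 19 Mar 2012 10:38:00 GMT"
    print(dated_folder_name(header), is_new_dump(header, ["20120301"]))
```

Keeping the check free of network calls makes it easy to test; the actual
download can then be done by whatever tool is already in use.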

Regards,
JC

[1] http://dbpedia.svn.sourceforge.net/viewvc/dbpedia/mwdumper/download.sh?content-type=text%2Fplain
[2] http://dbpedia.svn.sourceforge.net/viewvc/dbpedia/mwdumper/download-all.sh?content-type=text%2Fplain
[3] http://dbpedia.svn.sourceforge.net/viewvc/dbpedia/mwdumper/

On Mon, Mar 19, 2012 at 11:45, Amit Kumar <amitk...@yahoo-inc.com> wrote:
> Hi Pablo,
> For the continuous extraction we are trying to set up a pipeline which polls
> and downloads the Wikipedia data, passes it through the DEF (DBpedia Extraction
> Framework) and then creates knowledge bases. Much of the plumbing is handled
> by Yahoo! internal tools and platforms, but there are some pieces which might
> be useful for the DBpedia community. I'm mentioning some below. Let me know
> if you think you can use any of them. If yes, I would contact our Open Source
> Working Group Manager to take it forward.
>
> Wiki Downloader: We have two components.
>
> Full Downloader: A basic bash script which polls the latest folder of
> Wikipedia dumps, checks whether a new dump is available and downloads it to a
> dated folder.
> Incremental Downloader: It includes an IRC bot which keeps listening to the
> Wikipedia IRC channel and makes a list of files which were updated. It
> de-duplicates that list and downloads those pages every few hours while
> respecting the Wikipedia QPS (queries per second) limit.
>
> DEF Wrapper: A bash script which invokes the DEF on the data generated by
> the downloader.
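
The de-duplicate-then-throttle step of the incremental downloader described
above can be sketched as follows. This is only an illustration in Python, not
the Yahoo! code: the names dedup and fetch_throttled, the fetch callback and
the max_qps parameter are hypothetical, and the IRC listening part is left
out entirely.

```python
import time
from typing import Callable, Iterable, List


def dedup(titles: Iterable[str]) -> List[str]:
    """Drop repeated titles, keeping the order of first appearance."""
    return list(dict.fromkeys(titles))


def fetch_throttled(
    titles: Iterable[str],
    fetch: Callable[[str], object],
    max_qps: float,
    sleep: Callable[[float], object] = time.sleep,
) -> None:
    """Call fetch(title) for each title, pausing 1/max_qps seconds between calls.

    The sleep function is injectable so the pacing can be tested without waiting.
    """
    interval = 1.0 / max_qps
    for i, title in enumerate(titles):
        if i > 0:
            sleep(interval)  # space requests out to stay under the limit
        fetch(title)


if __name__ == "__main__":
    updated = ["Lisbon", "Porto", "Lisbon", "Braga", "Porto"]
    fetch_throttled(dedup(updated), print, max_qps=2.0)
```

De-duplicating before throttling matters because a busy page can show up many
times in a few hours of change notifications, and each repeat would otherwise
cost a request.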
>
>
> Both of these have some basic notifications and error handling. There is some
> more processing after the DEF, but it is quite internal to Yahoo!.
>
> I think you already have a download.scala which downloads the DBpedia dumps.
> There were a few mails last week about the same. If you are facing any
> particular issue with DBpedia Portuguese, do let me know. If
> we have faced the same, we will let you know.
>
> Regards
> Amit
>
>
>
> On 3/19/12 3:45 PM, "Pablo Mendes" <pablomen...@gmail.com> wrote:
>
> Hi Amit,
>
>>"We have been trying to set up an instance of DBpedia to continuously extract
>> data from Wikipedia dumps/updates. While"
>
> We would like to do the same for the DBpedia Portuguese. If you can share
> any code, it would be much appreciated.
>
> Cheers
> Pablo
>
> On Mar 19, 2012 10:38 AM, "Amit Kumar" <amitk...@yahoo-inc.com> wrote:
>
> Hi,
> We have been trying to set up an instance of DBpedia to continuously extract
> data from Wikipedia dumps/updates. While going through the output we
> observed that the image extractor was only picking up the first image for
> any page.
>
> I can see commented-out code in the ImageExtractor which seems to
> pick all images. In place of that we have code which returns on the
> first image it encounters. My questions are:
>
> Does the commented-out code actually work? Does it really pick all the
> images on a particular page?
> Why was the change made in the code?
>
>
>
> Thanks and Regards
> Amit
>
>
> ------------------------------------------------------------------------------
> This SF email is sponsosred by:
> Try Windows Azure free for 90 days Click Here
> http://p.sf.net/sfu/sfd2d-msazure
> _______________________________________________
> Dbpedia-discussion mailing list
> Dbpedia-discussion@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
>
