Hi all,

We also wrote bash scripts that download the latest Wikipedia dumps [1][2] and import them into a database [3]. I wasn't around when we switched from bash to Scala, but I guess it was because we wanted code that can also run on Windows.
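For anyone who cannot reach the old scripts, here is a minimal sketch of the "find the latest dump, download only if it is new" step. It is an illustration only and is not taken from download.sh. The function names `latest_dump_date` and `needs_download` are hypothetical, and the sketch assumes the dump index page lists each dump as a link of the form `href="YYYYMMDD/"`.

```shell
#!/usr/bin/env bash

# Hypothetical helper: read an HTML directory listing on stdin and print the
# most recent dump date (an 8-digit YYYYMMDD directory name). Prints nothing
# if the listing contains no dated entries.
# Assumption: entries look like <a href="20120307/">20120307/</a>.
latest_dump_date() {
  grep -oE 'href="[0-9]{8}/"' | grep -oE '[0-9]{8}' | sort -u | tail -n 1
}

# Hypothetical helper: succeed only when a dump date was found and no dated
# folder for it exists yet under the target directory.
needs_download() {
  local target_dir="$1" dump_date="$2"
  [ -n "$dump_date" ] && [ ! -d "$target_dir/$dump_date" ]
}

# Example wiring (commented out because it needs network access; the URL and
# the ./dumps directory are assumptions):
#   dump_date=$(curl -s http://dumps.wikimedia.org/enwiki/ | latest_dump_date)
#   if needs_download ./dumps "$dump_date"; then
#     mkdir -p "./dumps/$dump_date"
#     # ... fetch the dump files into ./dumps/$dump_date here ...
#   fi
```

Keeping the date parsing separate from the download makes it possible to check the "is there anything new?" logic against a saved listing without touching the network.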
Regards,
JC

[1] http://dbpedia.svn.sourceforge.net/viewvc/dbpedia/mwdumper/download.sh?content-type=text%2Fplain
[2] http://dbpedia.svn.sourceforge.net/viewvc/dbpedia/mwdumper/download-all.sh?content-type=text%2Fplain
[3] http://dbpedia.svn.sourceforge.net/viewvc/dbpedia/mwdumper/

On Mon, Mar 19, 2012 at 11:45, Amit Kumar <amitk...@yahoo-inc.com> wrote:
> Hi Pablo,
>
> For the continuous extraction we are trying to set up a pipeline that polls
> for and downloads the Wikipedia data, passes it through the DEF (DBpedia
> Extraction Framework) and then creates knowledge bases. Much of the plumbing
> is handled by Yahoo! internal tools and platforms, but there are some pieces
> that might be useful for the DBpedia community. I’m mentioning some below.
> Let me know if you think you can use any of them. If yes, I will contact our
> Open Source Working Group Manager to take it forward.
>
> Wiki Downloader: We have two components.
>
>   Full Downloader: A basic bash script that polls the latest folder of
>   Wikipedia dumps, checks whether a new dump is available and downloads it
>   to a dated folder.
>   Incremental Downloader: It includes an IRC bot that keeps listening to
>   the Wikipedia IRC channel. It makes a list of files that were updated,
>   de-dups it and downloads those pages every few hours while respecting
>   the Wikipedia QPS limit.
>
> DEF Wrapper: A bash script that invokes the DEF on the data generated by
> the downloader.
>
> Both of these have some basic notifications and error handling. There are
> some more steps after the DEF, but they are quite internal to Yahoo!.
>
> I think you already have a download.scala which downloads the DBpedia dumps.
> There were a few mails about it in the last week. If you are facing any
> particular issue with DBpedia Portuguese, do let me know. If we have faced
> the same issue, we will let you know.
>
> Regards
> Amit
>
>
> On 3/19/12 3:45 PM, "Pablo Mendes" <pablomen...@gmail.com> wrote:
>
> Hi Amit,
>
>> "We have been trying to set up an instance of DBpedia to continuously extract
>> data from Wikipedia dumps/updates. While"
>
> We would like to do the same for the DBpedia Portuguese. If you can share
> any code, it would be much appreciated.
>
> Cheers
> Pablo
>
> On Mar 19, 2012 10:38 AM, "Amit Kumar" <amitk...@yahoo-inc.com> wrote:
>
> Hi,
> We have been trying to set up an instance of DBpedia to continuously extract
> data from Wikipedia dumps/updates. While going through the output we
> observed that the image extractor was only picking up the first image for
> any page.
>
> I can see commented-out code in the ImageExtractor which seems to
> pick all images. In place of that we have code which returns on the
> first image it encounters. My questions are:
>
> Does the commented-out code actually work? Does it really pick all the
> images on a particular page?
> Why was the change made in the code?
>
> Thanks and Regards
> Amit
>
> ------------------------------------------------------------------------------
> This SF email is sponsored by:
> Try Windows Azure free for 90 days Click Here
> http://p.sf.net/sfu/sfd2d-msazure
> _______________________________________________
> Dbpedia-discussion mailing list
> Dbpedia-discussion@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion