Hi Jo,

This is a good interdisciplinary task ;)

About the extraction script: DBpedia now uses a predefined folder structure
for locating dumps and extracting data, which follows the Wikipedia dumps
structure [1].
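
For illustration, the dumps on [1] are laid out as <wiki>/<date>/<wiki>-<date>-<file>,
for example enwiki/20130403/enwiki-20130403-pages-articles.xml.bz2. Below is a minimal
Python sketch of resolving a file in a local mirror of that layout. It assumes the local
base directory mirrors [1]. The function name dump_path and the /data/wikipedia path are
hypothetical and are not DBpedia's actual code.

```python
from pathlib import Path


def dump_path(base_dir: str, lang: str, date: str,
              suffix: str = "pages-articles.xml.bz2") -> str:
    """Hypothetical helper: locate a dump file in a local mirror of the
    dumps.wikimedia.org layout, <base>/<wiki>/<date>/<wiki>-<date>-<suffix>."""
    # Wikimedia dump dates are YYYYMMDD; aliases such as "latest" are not handled here.
    if not (len(date) == 8 and date.isdigit()):
        raise ValueError(f"expected a YYYYMMDD date, got {date!r}")
    # Wiki database names use underscores, e.g. zh-min-nan -> zh_min_nanwiki.
    wiki = lang.replace("-", "_") + "wiki"
    return (Path(base_dir) / wiki / date / f"{wiki}-{date}-{suffix}").as_posix()


print(dump_path("/data/wikipedia", "en", "20130403"))
# /data/wikipedia/enwiki/20130403/enwiki-20130403-pages-articles.xml.bz2
```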

There are two options here:
1) Spotlight adapts its configuration to accommodate that structure.
2) DBpedia makes the dump module easier to run with arbitrary MediaWiki dumps
and output folders.

Maybe (1) is a lot easier, but I'd vote for (2). ;)
For (2), we need to create two new scripts, one for download and one for
extraction, based on [3] and [2] respectively.
Once we have a volunteer, we can discuss this in detail.
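
If (1) were used as a stopgap, the Spotlight side (e.g. index_db.sh) could expose its
dump where the framework expects to find it. Below is a minimal Python sketch. It
assumes the framework looks for dumps under <base>/<wiki>/<date>/ as on [1], and it
assumes the file keeps its Wikimedia-style name. The helper stage_dump is hypothetical
and is not part of DBpedia or Spotlight.

```python
import re
import tempfile
from pathlib import Path

# Wikimedia dump file names look like <wiki>-<YYYYMMDD>-<suffix>,
# e.g. enwiki-20130403-pages-articles.xml.bz2.
DUMP_NAME = re.compile(r"^(?P<wiki>[a-z_]+wiki)-(?P<date>\d{8})-(?P<suffix>.+)$")


def stage_dump(dump_file: str, base_dir: str) -> Path:
    """Hypothetical helper: make an arbitrary dump file visible under the
    <base_dir>/<wiki>/<date>/ layout by symlinking it (no copy)."""
    src = Path(dump_file).resolve()
    m = DUMP_NAME.match(src.name)
    if m is None:
        raise ValueError(f"not a Wikimedia-style dump file name: {src.name}")
    target_dir = Path(base_dir) / m["wiki"] / m["date"]
    target_dir.mkdir(parents=True, exist_ok=True)
    link = target_dir / src.name
    # Reuse an existing link so the helper can be re-run safely.
    if not (link.exists() or link.is_symlink()):
        link.symlink_to(src)
    return link


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        dump = Path(tmp, "downloads", "enwiki-20130403-pages-articles.xml.bz2")
        dump.parent.mkdir()
        dump.touch()
        print(stage_dump(str(dump), str(Path(tmp, "base"))))
```

A symlink avoids copying a multi-gigabyte dump. Option (2) would make this step
unnecessary.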

Cheers,
Dimitris


[1] http://dumps.wikimedia.org/
[2] https://github.com/dbpedia/extraction-framework/blob/master/dump/src/main/scala/org/dbpedia/extraction/dump/extract/Extraction.scala
[3] https://github.com/dbpedia/extraction-framework/blob/master/dump/src/main/scala/org/dbpedia/extraction/dump/download/Download.scala


On Tue, Apr 16, 2013 at 1:29 PM, Joachim Daiber <daiber.joac...@gmail.com> wrote:

> Hey all,
>
> I added this task to the Spotlight ideas; it's smallish, so it's maybe
> more of a warm-up task:
>
> ----
>
> For creating Spotlight models, we need instance_types.nt, redirects.nt and
> disambiguations.nt. Since we want these to be from the same Wikipedia dump
> as the one from which we create the model, integrate the DBpedia extraction
> into the index_db.sh script in DBpedia Spotlight, so that the files are
> automatically produced during indexing.
>
> ----
>
> Maybe somebody who knows DEF (the DBpedia Extraction Framework) better
> than I could comment on how
> complicated this would be to do. We have the Wikipedia dump and we need
> redirects, disambiguation pages and instance types for this version of the
> dump.
>
> Best,
> Jo
>
>
> _______________________________________________
> Dbpedia-gsoc mailing list
> Dbpedia-gsoc@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/dbpedia-gsoc
>
>


-- 
Kontokostas Dimitris
