Hi Julien, Jona,

I just saw your discussion about externalised templates.
For information, the property prop-fr:population appears on
http://fr.dbpedia.org because the template
Données/Toulouse/évolution_population was not yet in use when I ran the last
extraction.

About the extractor you want to add, I am not sure I understand how you want to
proceed.
Will you store the data extracted from the template pages and then insert it
when you parse the article page?
If so, you need to run the extraction framework twice over the Wikipedia dump,
because the template page may appear after the article page in the dump file.
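
To make the two-pass concern concrete, here is a minimal Python sketch (not extraction-framework code). The function names, the regular expressions and the sample pages are assumptions made for illustration; the last year and population values are made up. The first pass collects the last "an"/"pop" values from the template pages, and the second pass emits triples for the articles that call the template.

```python
import re

# Assumed title pattern for the data templates (underscore or space before "population").
TITLE_RE = re.compile(r"^Modèle:Données/(?P<article>.+)/évolution[_ ]population$")
# Matches "|name=value" pairs inside the #switch body.
PARAM_RE = re.compile(r"\|\s*(\w+)\s*=\s*([^|}\n]*)")
CALL = "{{Dernière population commune de France}}"


def parse_switch(wikitext):
    """Return the name/value pairs of a #switch data template as a dict."""
    return {name: value.strip() for name, value in PARAM_RE.findall(wikitext)}


def first_pass(pages):
    """Collect (year, population) per article name from the template pages."""
    data = {}
    for title, text in pages:
        match = TITLE_RE.match(title)
        if match:
            params = parse_switch(text)
            if "an" in params and "pop" in params:
                data[match.group("article")] = (params["an"], params["pop"])
    return data


def second_pass(pages, data):
    """Emit triples for articles that call the template and have stored data."""
    triples = []
    for title, text in pages:
        if CALL in text and title in data:
            year, pop = data[title]
            triples.append((title, "population", pop))
            triples.append((title, "AnneePopulation", year))
    return triples


# Sample dump order: the article comes BEFORE its template page,
# which is why a single pass would not be enough.
SAMPLE_TEMPLATE = (
    "<includeonly>{{#switch: {{{1|}}}\n"
    "|an1=1793|pop1=52612\n"
    "|an=2000|pop=99999}}</includeonly>"  # last values are made up
)
PAGES = [
    ("Toulouse", "population = " + CALL),
    ("Modèle:Données/Toulouse/évolution_population", SAMPLE_TEMPLATE),
]

DATA = first_pass(PAGES)
TRIPLES = second_pass(PAGES, DATA)
```

Whether the stored data lives in memory or in a temporary file does not change the point: the article pages can only be resolved after every template page has been read.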

Wouldn't it be more generic to define some SPARQL insert/delete rules to handle
this once the extraction process is over?
Something like:

> insert {?s ?p ?v} where {?s dbo:wikiPageUsesTemplate ?t . ?t ?p ?v}

then 

> delete {?t ?p ?v} where {?s dbo:wikiPageUsesTemplate ?t . ?t ?p ?v}
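
The effect of these two rules can be checked on a toy graph. The sketch below uses plain Python sets instead of a SPARQL endpoint; the function names, the prefixed names such as "tpl:pop" and the population value are hypothetical and only serve the illustration.

```python
USES_TEMPLATE = "dbo:wikiPageUsesTemplate"


def insert_rule(graph):
    """insert {?s ?p ?v} where {?s dbo:wikiPageUsesTemplate ?t . ?t ?p ?v}"""
    extra = {
        (s, p, v)
        for (s, link, t) in graph if link == USES_TEMPLATE
        for (t2, p, v) in graph if t2 == t
    }
    return graph | extra


def delete_rule(graph):
    """delete {?t ?p ?v} where {?s dbo:wikiPageUsesTemplate ?t . ?t ?p ?v}"""
    templates = {t for (_, link, t) in graph if link == USES_TEMPLATE}
    return {(s, p, v) for (s, p, v) in graph if s not in templates}


# Toy graph: one article, one data template, one made-up value.
GRAPH = {
    ("fr:Toulouse", USES_TEMPLATE, "tpl:pop"),
    ("tpl:pop", "prop-fr:population", "99999"),
    ("fr:Toulouse", "rdfs:label", "Toulouse"),
}

# The rules run one after the other, as in the two updates above.
RESULT = delete_rule(insert_rule(GRAPH))
```

Note that the pattern ?t ?p ?v copies every triple of the template page, so in practice the rule would probably need a filter on ?p.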

Cheers,
Julien C. 

----- Mail original -----

> De: "Julien Plu" <julien....@redaction-developpez.com>
> À: "Jona Christopher Sahnwaldt" <j...@sahnwaldt.de>
> Cc: dbpedia-discussion@lists.sourceforge.net
> Envoyé: Lundi 22 Avril 2013 09:54:59
> Objet: Re: [Dbpedia-discussion] Problem with extracted data

> OK, I will try to code this in a new package "fr" this week. I just
> have to see how to write an extractor and learn Scala :-D

> Best.

> Julien.

> 2013/4/22 Jona Christopher Sahnwaldt < j...@sahnwaldt.de >

> > Good idea! It probably wouldn't be hard to write a specific extractor
> > for this. Maybe just a few dozen lines.
> >
> > Only problem is, we may soon have dozens or hundreds of such
> > specialized extractors. But we can deal with that. :-)
> >
> > If you want to write that extractor, we would be happy to include it
> > in the extraction framework. Here are some instructions on how you can
> > send a pull request on GitHub:
> >
> > https://github.com/dbpedia/extraction-framework/wiki/Contributing
> >
> > To keep things manageable and since this extractor is only applicable
> > for the French Wikipedia edition, I would suggest you create a new
> > package org.dbpedia.extraction.mappings.fr in
> > extraction-framework/core/src/main/scala. Like many other extractors,
> > this one doesn't really belong in the 'core' module, but the
> > extraction framework is not yet very well modularized, so there's no
> > better place.
> >
> > A minor addition: I guess we should change the syntax in the
> > extraction config files: currently, all extractor class names that *do
> > not contain a dot* are prefixed by "org.dbpedia.extraction.mappings.".
> > Example: "AbstractExtractor" becomes
> > "org.dbpedia.extraction.mappings.AbstractExtractor". If we change that
> > rule and prefix all extractor class names that *start with a dot* by
> > "org.dbpedia.extraction.mappings", then you could write
> > ".fr.PopulationExtractor" in your extraction config file. With the
> > current rule, you would have to write the whole class name
> > "org.dbpedia.extraction.mappings.fr.PopulationExtractor". (Of course,
> > with the new rule, we would have to add a dot to all extractor class
> > names in all config files, but that's no big deal.)
> >
> > Cheers,
> > JC
> >

> > On 21 April 2013 22:35, Julien Plu <julien....@redaction-developpez.com> wrote:
> > > I thought of the same implementation as you, Jona, but slightly
> > > different. Here are my steps:
> > >
> > > 1) Parse the XML file and retrieve all the data about these templates.
> > > For example, we see a "title" tag containing:
> > >
> > > Modèle:Données/Toulouse/évolution_population
> > >
> > > 2) Extract the last "an" and "pop" values
> > > 3) Write the triples to a file:
> > > <http://fr.dbpedia.org/resource/Toulouse>
> > > <http://fr.dbpedia.org/property/population> number pop^^xsd:integer .
> > > <http://fr.dbpedia.org/resource/Toulouse>
> > > <http://fr.dbpedia.org/property/AnneePopulation> year^^xsd:date .
> > >
> > > And so on, for all these templates. What do you think?
> > >
> > > I know it's not really generic, but it's a good starting point for
> > > thinking about a generic solution later.
> > >
> > > Best.
> > >
> > > Julien.
> 
> > >
> 
> > >
> 
> > > 2013/4/21 Jona Christopher Sahnwaldt <j...@sahnwaldt.de>
> > >>
> > >> Good question. Short answer: No, DBpedia can't handle these templates,
> > >> and it's hard to change that.
> > >>
> > >> It would be nice to do it in a generic way: design a system that
> > >> allows users of the mappings wiki to add rules for how such templates
> > >> should be handled in a certain language. Write Scala code that executes
> > >> these rules and parses the template definitions (e.g.
> > >> Modèle:Données/Toulouse/évolution_population) to extract the data and
> > >> store it in memory or in a temporary file. Then during the main
> > >> extraction, when you find a template call like {{Dernière population
> > >> commune de France}}, get the data from storage and generate the
> > >> appropriate triples.
> > >>
> > >> A major effort. Related to
> > >> http://wiki.dbpedia.org/gsoc2013/ideas/CrowdsourceTestsAndRules, but
> > >> even bigger.
> > >>
> > >> Maybe it would be easier to extend DBpedia such that the framework can
> > >> "execute" template definitions.
> > >>
> > >> Maybe all that is a waste of time because the data will soon move to
> > >> Wikidata. We just don't know how soon: Three months? Three years?
> > >> Never?
> > >>
> > >> JC
> 
> > >>
> 
> > >> On 21 April 2013 22:04, Julien Plu <julien....@redaction-developpez.com> wrote:
> > >> > Thanks Jona for these clarifications :-)
> > >> >
> > >> > Another thing: I would like to know if the extraction framework can
> > >> > use the "data templates". I mean that some property values (in the
> > >> > French Wikipedia, for French settlements) are now replaced by
> > >> > templates, for example:
> > >> >
> > >> > population = {{Dernière population commune de France}} <!-- {{Last population french Settlement}} -->
> > >> >
> > >> > And this data is stored in pages that follow this pattern:
> > >> >
> > >> > http://fr.wikipedia.org/wiki/Modèle:Données/Nom de l'article/évolution_population
> > >> >
> > >> > In English:
> > >> >
> > >> > Template:Data/article name/evolution_population
> > >> >
> > >> > For example:
> > >> >
> > >> > http://fr.wikipedia.org/wiki/Modèle:Données/Toulouse/évolution_population
> > >> >
> > >> > It's always the same address pattern. And these templates look like
> > >> > this:
> > >> >
> > >> > <includeonly>{{#switch: {{{1|}}}
> > >> > |an1=1793|pop1=52612
> > >> > |anX=year|popX=number
> > >> > |an=last_year|pop=last_known_number}}</includeonly>
> > >> >
> > >> > These templates are in the XML dump.
> > >> >
> > >> > So has this been added to the extraction framework? If not, which
> > >> > files do I have to modify to handle this kind of exception?
> > >> >
> > >> > Best.
> > >> >
> > >> > Julien.
> 
> > >> >
> 
> > >> >
> 
> > >> > 2013/4/21 Jona Christopher Sahnwaldt <j...@sahnwaldt.de>
> > >> >>
> > >> >> On 21 April 2013 19:38, Julien Plu <julien....@redaction-developpez.com> wrote:
> > >> >> > Hi,
> > >> >> >
> > >> >> > Any idea what I am doing wrong? (see my previous mail below)
> > >> >> >
> > >> >> > Best.
> > >> >> >
> > >> >> > Julien.
> > >> >> >
> > >> >> > From: Julien Plu <julien....@redaction-developpez.com>
> > >> >> > Date: 2013/4/20
> > >> >> > Subject: Problem with extracted data
> > >> >> > To: "dbpedia-discussion@lists.sourceforge.net" <dbpedia-discussion@lists.sourceforge.net>
> > >> >> >
> > >> >> > Hi,
> > >> >> >
> > >> >> > After importing the extracted data into my Virtuoso server, I saw
> > >> >> > that I had some strange data. For example, all my URIs start with
> > >> >> > "http://dbpedia.org" and not with "http://fr.dbpedia.org", and I
> > >> >> > don't have the "prop-fr" properties either, even though I put "fr"
> > >> >> > everywhere in the extraction properties file.
> > >> >> >
> > >> >> > I also noticed that the data from http://fr.dbpedia.org and mine
> > >> >> > are not the same. For example, compare these two SPARQL results:
> > >> >> >
> > >> >> > mine:
> > >> >> >
> > >> >> > http://data.lirmm.fr:8890/sparql?default-graph-uri=&query=select+distinct+*+where+%7B%3Chttp%3A%2F%2Fdbpedia.org%2Fresource%2FToulouse%3E+%3Fp+%3Fo%7D&should-sponge=&format=text%2Fhtml&timeout=0&debug=on
> > >> >> >
> > >> >> > fr.dbpedia.org:
> > >> >> >
> > >> >> > http://fr.dbpedia.org/sparql?default-graph-uri=&query=select+distinct+*+where+%7B%3Chttp%3A%2F%2Ffr.dbpedia.org%2Fresource%2FToulouse%3E+%3Fp+%3Fo%7D&format=text%2Fhtml&timeout=0&debug=on
> > >> >> >
> > >> >> > In mine, I don't have the "http://www.w3.org/2002/07/owl#sameAs" or
> > >> >>
> > >> >> Do you mean the triples like http://www.w3.org/2002/07/owl#sameAs
> > >> >> http://de.dbpedia.org/resource/Toulouse ? To get them, you would have
> > >> >> to download Wikipedia dumps for several other languages, run
> > >> >> InterlangueLinkExtractor on them, and then run
> > >> >>
> > >> >> https://github.com/dbpedia/extraction-framework/blob/master/scripts/src/main/scala/org/dbpedia/extraction/scripts/ProcessInterLanguageLinks.scala
> > >> >>
> > >> >> on all the result files.
> > >> >>
> > >> >> Or you could use the links in
> > >> >>
> > >> >> http://downloads.dbpedia.org/3.8/fr/interlanguage_links_same_as_chapters_fr.ttl.bz2
> > >> >>
> > >> >> or a similar file.
> > >> >>
> > >> >> > "http://fr.dbpedia.org/property/population" properties, among many
> > >> >> > others.
> > >> >> >
> > >> >> > My extraction properties file is attached.
> > >> >> >
> > >> >> > What did I do wrong?
> > >> >> >
> > >> >> > Best.
> > >> >> >
> > >> >> > Julien.
> 
> > >> >> >
> 
> > >> >> >
> 
> > >> >> >
> 
> > >> >> >
> 
> > >> >> > ------------------------------------------------------------------------------
> > >> >> > Precog is a next-generation analytics platform capable of advanced
> > >> >> > analytics on semi-structured data. The platform includes APIs for building
> > >> >> > apps and a phenomenal toolset for data science. Developers can use
> > >> >> > our toolset for easy data analysis & visualization. Get a free account!
> > >> >> > http://www2.precog.com/precogplatform/slashdotnewsletter
> > >> >> > _______________________________________________
> > >> >> > Dbpedia-discussion mailing list
> > >> >> > Dbpedia-discussion@lists.sourceforge.net
> > >> >> > https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
