Good idea! It probably wouldn't be hard to write a specific extractor for this. Maybe just a few dozen lines.
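To give a rough idea of the size, here is a minimal sketch of the parsing such an extractor would need. It is plain Python rather than Scala, and it is not framework code: the helper names (article_from_title, last_population, to_triples) are made up for illustration. It assumes the data page has the #switch layout quoted further down in this thread, with plain integer values. The property URIs are the ones proposed in the thread. The year is typed as xsd:gYear here, which is my own choice, because a bare year is not a valid xsd:date value.

```python
import re

# Hypothetical helpers for illustration; none of this is DBpedia framework code.

# Matches page titles such as "Modèle:Données/Toulouse/évolution_population".
TITLE = re.compile(r"^Modèle:Données/(.+)/évolution_population$")

# Matches "|an=2010", "|pop1=52612", ...: name, optional index, integer value.
PARAM = re.compile(r"\|\s*(an|pop)(\d*)\s*=\s*(\d+)")

XSD = "http://www.w3.org/2001/XMLSchema#"


def article_from_title(title):
    """Return the article name embedded in a data-template title, or None."""
    match = TITLE.match(title)
    return match.group(1) if match else None


def last_population(wikitext):
    """Return (year, population) from the unnumbered 'an' and 'pop'
    parameters, which hold the latest known values, or None if missing."""
    values = {(name, index): int(value)
              for name, index, value in PARAM.findall(wikitext)}
    year, pop = values.get(("an", "")), values.get(("pop", ""))
    if year is None or pop is None:
        return None
    return year, pop


def to_triples(article, year, pop):
    """Build the two N-Triples lines for one article."""
    # Simplification: real resource URIs need full Wikipedia-style encoding.
    subject = "<http://fr.dbpedia.org/resource/%s>" % article.replace(" ", "_")
    return [
        '%s <http://fr.dbpedia.org/property/population> "%d"^^<%sinteger> .'
        % (subject, pop, XSD),
        '%s <http://fr.dbpedia.org/property/AnneePopulation> "%d"^^<%sgYear> .'
        % (subject, year, XSD),
    ]


if __name__ == "__main__":
    # Sample input: an1/pop1 come from the thread, an/pop are invented figures.
    sample = """<includeonly>{{#switch: {{{1|}}}
|an1=1793|pop1=52612
|an=2010|pop=123456}}</includeonly>"""
    article = article_from_title("Modèle:Données/Toulouse/évolution_population")
    year, pop = last_population(sample)
    for line in to_triples(article, year, pop):
        print(line)
```

A real extractor would of course be a Scala class inside the framework and would read these pages from the XML dump, but the logic should not be much bigger than this.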
The only problem is that we may soon have dozens or hundreds of such specialized extractors. But we can deal with that. :-)

If you want to write that extractor, we would be happy to include it in the extraction framework. Here are some instructions on how to send a pull request on GitHub:

https://github.com/dbpedia/extraction-framework/wiki/Contributing

To keep things manageable, and since this extractor is only applicable to the French Wikipedia edition, I would suggest you create a new package org.dbpedia.extraction.mappings.fr in extraction-framework/core/src/main/scala. Like many other extractors, this one doesn't really belong in the 'core' module, but the extraction framework is not yet very well modularized, so there's no better place.

A minor addition: I guess we should change the syntax in the extraction config files. Currently, all extractor class names that *do not contain a dot* are prefixed by "org.dbpedia.extraction.mappings.". Example: "AbstractExtractor" becomes "org.dbpedia.extraction.mappings.AbstractExtractor". If we change that rule and prefix all extractor class names that *start with a dot* by "org.dbpedia.extraction.mappings", then you could write ".fr.PopulationExtractor" in your extraction config file. With the current rule, you would have to write the whole class name "org.dbpedia.extraction.mappings.fr.PopulationExtractor". (Of course, with the new rule, we would have to add a dot to all extractor class names in all config files, but that's no big deal.)

Cheers,
JC

On 21 April 2013 22:35, Julien Plu <julien....@redaction-developpez.com> wrote:
> I had the same implementation in mind as you, Jona, but slightly
> different. Here are my steps:
>
> 1) Parse the XML file and retrieve all the data about these templates.
> For example, we see a "title" tag containing:
>
> Modèle:Données/Toulouse/évolution_population
>
> 2) Extract the last "an" and "pop" values.
> 3) Write these triples to a file:
> <http://fr.dbpedia.org/resource/Toulouse>
> <http://fr.dbpedia.org/property/population> number pop^^xsd:integer .
> <http://fr.dbpedia.org/resource/Toulouse>
> <http://fr.dbpedia.org/property/AnneePopulation> year^^xsd:date .
>
> And so on, for all these templates. What do you think?
>
> I know it's not really generic, but it's a good starting point for
> working out a generic solution later.
>
> Best.
>
> Julien.
>
>
> 2013/4/21 Jona Christopher Sahnwaldt <j...@sahnwaldt.de>
>>
>> Good question. Short answer: No, DBpedia can't handle these templates,
>> and it's hard to change that.
>>
>> It would be nice to do it in a generic way: design a system that
>> allows users of the mappings wiki to add rules for how such templates
>> should be handled in a certain language. Write Scala code that executes
>> these rules and parses the template definitions (e.g.
>> Modèle:Données/Toulouse/évolution_population) to extract the data and
>> store it in memory or in a temporary file. Then during the main
>> extraction, when you find a template call like {{Dernière population
>> commune de France}}, get the data from storage and generate the
>> appropriate triples.
>>
>> A major effort. Related to
>> http://wiki.dbpedia.org/gsoc2013/ideas/CrowdsourceTestsAndRules , but
>> even bigger.
>>
>> Maybe it would be easier to extend DBpedia such that the framework can
>> "execute" template definitions.
>>
>> Maybe all that is a waste of time because the data will soon move to
>> Wikidata. We just don't know how soon: Three months? Three years?
>> Never?
>>
>> JC
>>
>> On 21 April 2013 22:04, Julien Plu <julien....@redaction-developpez.com>
>> wrote:
>> > Thanks Jona for these clarifications :-)
>> >
>> > Another thing: I would like to know whether the extraction framework
>> > can use the "data templates".
>> > I mean that some property values (in the French Wikipedia, for
>> > French settlements) are now replaced by templates, for example:
>> >
>> > population = {{Dernière population commune de France}} <!-- {{Last
>> > population french Settlement}} -->
>> >
>> > And this data is stored on pages that follow this pattern:
>> >
>> > http://fr.wikipedia.org/wiki/Modèle:Données/Nom de
>> > l'article/évolution_population
>> >
>> > In English:
>> >
>> > Template:Data/article name/evolution_population
>> >
>> > For example:
>> >
>> > http://fr.wikipedia.org/wiki/Modèle:Données/Toulouse/évolution_population
>> >
>> > It's always the same address pattern. These templates look like
>> > this:
>> >
>> > <includeonly>{{#switch: {{{1|}}}
>> > |an1=1793|pop1=52612
>> > |anX=year|popX=number
>> > |an=last_year|pop=last_known_number}}</includeonly>
>> >
>> > These templates are in the XML dump.
>> >
>> > So has this been added to the extraction framework? If not, which
>> > files do I have to modify to handle this kind of exception?
>> >
>> > Best.
>> >
>> > Julien.
>> >
>> >
>> > 2013/4/21 Jona Christopher Sahnwaldt <j...@sahnwaldt.de>
>> >>
>> >> On 21 April 2013 19:38, Julien Plu
>> >> <julien....@redaction-developpez.com>
>> >> wrote:
>> >> > Hi,
>> >> >
>> >> > Any idea what I am doing wrong? (See my previous mail below.)
>> >> >
>> >> > Best.
>> >> >
>> >> > Julien.
>> >> >
>> >> > From: Julien Plu <julien....@redaction-developpez.com>
>> >> > Date: 2013/4/20
>> >> > Subject: Problem with extracted data
>> >> > To: "dbpedia-discussion@lists.sourceforge.net"
>> >> > <dbpedia-discussion@lists.sourceforge.net>
>> >> >
>> >> >
>> >> > Hi,
>> >> >
>> >> > After importing the extracted data into my Virtuoso server, I
>> >> > noticed some strange data.
>> >> > For example, all my URIs start with
>> >> > "http://dbpedia.org" and not with "http://fr.dbpedia.org", and I
>> >> > don't have the "prop-fr" properties either, even though I put "fr"
>> >> > everywhere in the extraction properties file.
>> >> >
>> >> > I also see that the data from http://fr.dbpedia.org and mine are
>> >> > not the same. For example, compare these two SPARQL results:
>> >> >
>> >> > mine:
>> >> >
>> >> > http://data.lirmm.fr:8890/sparql?default-graph-uri=&query=select+distinct+*+where+%7B%3Chttp%3A%2F%2Fdbpedia.org%2Fresource%2FToulouse%3E+%3Fp+%3Fo%7D&should-sponge=&format=text%2Fhtml&timeout=0&debug=on
>> >> >
>> >> > fr.dbpedia.org:
>> >> >
>> >> > http://fr.dbpedia.org/sparql?default-graph-uri=&query=select+distinct+*+where+%7B%3Chttp%3A%2F%2Ffr.dbpedia.org%2Fresource%2FToulouse%3E+%3Fp+%3Fo%7D&format=text%2Fhtml&timeout=0&debug=on
>> >> >
>> >> > In mine, I don't have the "http://www.w3.org/2002/07/owl#sameAs" or
>> >>
>> >> Do you mean the triples like http://www.w3.org/2002/07/owl#sameAs
>> >> http://de.dbpedia.org/resource/Toulouse ? To get them, you would have
>> >> to download Wikipedia dumps for several other languages, run
>> >> InterLanguageLinksExtractor on them, and then run
>> >>
>> >> https://github.com/dbpedia/extraction-framework/blob/master/scripts/src/main/scala/org/dbpedia/extraction/scripts/ProcessInterLanguageLinks.scala
>> >> on all the result files.
>> >>
>> >> Or you could use the links in
>> >>
>> >> http://downloads.dbpedia.org/3.8/fr/interlanguage_links_same_as_chapters_fr.ttl.bz2
>> >> or a similar file.
>> >>
>> >> > "http://fr.dbpedia.org/property/population" properties among many
>> >> > others.
>> >> >
>> >> > My extraction property file is attached.
>> >> >
>> >> > What did I do wrong?
>> >> >
>> >> > Best.
>> >> >
>> >> > Julien.
_______________________________________________
Dbpedia-discussion mailing list
Dbpedia-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion