Hi Andrea,
On Fri, Jan 11, 2013 at 10:28 PM, Andrea Di Menna <ninn...@gmail.com> wrote:
> Hi Dimitris,
>
>
> 2013/1/10 Dimitris Kontokostas <jimk...@gmail.com>
>
>> Hi Andrea,
>>
>>
>> On Thu, Jan 10, 2013 at 7:08 PM, Andrea Di Menna <ninn...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I have some questions regarding groups of infoboxes/templates which are
>>> designed to be used together to provide additional data for a resource.
>>>
>>> Specifically, I have been checking the Infobox "Template:Starbox begin"
>>> [1] (and the other templates in the same "group").
>>> This template should be used in conjunction with other templates to add
>>> properties to articles which refer to stars.
>>>
>>> If I check which pages link to that template I see that there are
>>> hundreds [2], while in the mappings page it seems to occur only in one
>>> article [3] [3b]
>>>
>>> One of the articles which actually uses that template in Wikipedia is
>>> Algol (dbpedia: [4], wikipedia: [5]).
>>> As you can see it used the "Template:Starbox begin" when the dbpedia
>>> extraction was run (from what I can see in the old revision).
>>> But on dbpedia, Algol is not using "Template:Starbox begin"; moreover, it
>>> has a dbpedia-owl:Writer rdf:type since there is an Infobox_writer in the
>>> article.
>>>
>>> My questions are:
>>> 1) are the "Template:Starbox begin" occurrences correct in the mappings
>>> statistics page?
>>>
>>
>> There is a minPropertyCount limit in the statistics, set to '2', to filter
>> out general purpose templates. The Starbox_begin template has only one
>> label property, which is why it is miscounted.
>> There must be one wrong instance with two properties defined, which is why
>> you see only one.
>>
>>
>
> I think the wrong instance was http://dbpedia.org/resource/Kepler-33 which was
> missing closing brackets in the Starbox_begin template [1].
> The typo was corrected after the release of DBpedia 3.8 [2]; in fact,
> in live DBpedia there is no entity known to be using that template.
>
> My questions are then:
> 1) are the statistics calculated on dbpedia data or on wikipedia live data?
>
The statistics are generated per DBpedia release, so these numbers are from
~June 2012.
> 2) are the templates with numOfProperties < minPropertyCount only hidden
> from the statistics or also not processed during mappings extraction?
> Hiding them from the statistics makes sense to me, but if it is possible
> to create mappings for them I would like to understand how a DBpedia
> mapping contributor can be informed about the existence of such templates.
> I personally use the statistics page to decide what to work on first and
> in case of a missing template in that page, it would be difficult for me to
> get to know other possible candidates. Moreover, the Starbox_* templates
> are used in more than 2k articles, which could lead us to at least assign
> proper types to a big set of entities.
>
There are two infobox extractors in the framework and they are independent of
each other: the InfoboxExtractor and the mappingsExtractor.
InfoboxExtractor generates triples in the dbprop namespace but discards
some templates / properties according to [1], [2] to remove probably
unwanted triples.
In line 131 [3] we produce a variation of the output that serves as the
basis for the statistics.
If we lower these restrictions the problem will get worse and the
statistics table will be filled with a large number of formatting (or
other) templates, unless you have any suggestions on this.
As I said, the mappings extractor is independent, so whatever you map in the
mappings wiki will be mapped by the framework regardless of the statistics.
[1]
http://dbpedia.hg.sourceforge.net/hgweb/dbpedia/extraction_framework/raw-file/49af1ec3b4b5/core/src/main/scala/org/dbpedia/extraction/mappings/InfoboxExtractor.scala
[2]
http://dbpedia.hg.sourceforge.net/hgweb/dbpedia/extraction_framework/raw-file/49af1ec3b4b5/core/src/main/scala/org/dbpedia/extraction/config/mappings/InfoboxExtractorConfig.scala
[3]
http://dbpedia.hg.sourceforge.net/hgweb/dbpedia/extraction_framework/file/49af1ec3b4b5/core/src/main/scala/org/dbpedia/extraction/mappings/InfoboxExtractor.scala#l131
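
To illustrate why Starbox_begin shows up only once, here is a minimal sketch
in Python of a per-instance property count filter. This is not the
framework's Scala code: the function name, the data layout and the sample
instances are made up for illustration, and only the threshold of 2 comes
from the discussion above.

```python
from collections import Counter

MIN_PROPERTY_COUNT = 2  # threshold mentioned in this thread


def template_statistics(instances, min_property_count=MIN_PROPERTY_COUNT):
    """Count template occurrences, skipping instances with too few properties.

    `instances` is an iterable of (template_name, properties_dict) pairs,
    a hypothetical stand-in for what an extractor sees on each page.
    """
    counts = Counter()
    for template_name, properties in instances:
        if len(properties) < min_property_count:
            continue  # treated like a formatting / general purpose template
        counts[template_name] += 1
    return counts


# Illustrative data only: a well-formed Starbox_begin carries one property,
# while a broken instance may be parsed with an extra, spurious one.
instances = [
    ("Starbox_begin", {"name": "Algol"}),
    ("Starbox_begin", {"name": "Kepler-33", "spurious": "..."}),
    ("Infobox_writer", {"name": "x", "birth_date": "y"}),
]
stats = template_statistics(instances)
```

With this data only the broken Starbox_begin instance passes the filter, so
the template is counted once. Lowering the threshold to 1 would count both
instances, but would also let single-property formatting templates through.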
>
>
>>
>>
>>> 2) what is the approach used by the dbpedia extraction framework in case
>>> different infoboxes are present in a wiki article?
>>>
>>
>> This is a known problem but the framework doesn't handle this type of
>> information.
>> Whenever there is a mapping the framework expects a class definition and
>> it assigns it either to the article resource or to a new "intermediate"
>> resource (in the case of multiple defined mappings).
>>
>>
>>> 3) how to deal with groups of templates which provide properties for the
>>> same resource?
>>>
>>
>> To solve this I think we should add a "noMapToClass" property in the
>> templateMapping; whenever the framework reads that definition it just
>> adds the mapping output directly to the main resource without rdf:type info.
>> Would you like to help in this regard? We could of course help you and
>> provide you with repo access.
>>
>>
>
> I agree with this approach and I would love to contribute. I just need a
> bit of time to get used to the dbpedia extraction framework code and also
> to Scala :P
>
Take all the time you need :)
> Also, wouldn't it be possible to avoid creating a new intermediate
> resource when there are multiple templates which map to the same ontology
> class?
> If the resource has already been assigned a class, and another template in
> the article maps to the same class, then you do not create another "fake"
> resource but add the ontology properties to the original resource. Otherwise
> you create a new resource and add the properties to that.
> Does that make sense?
>
This approach looks good and it's probably easier too ;)
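
To make sure we mean the same thing, here is a rough sketch of that rule in
Python. The framework itself is Scala and none of these names exist in it:
the function, the dict keys, the "__N" naming of intermediate resources and
the property names are all invented for illustration, and the
`no_map_to_class` flag stands in for the "noMapToClass" idea quoted above.
It assumes every mapping without that flag names a class.

```python
def apply_mappings(resource_uri, template_mappings):
    """Return (subject, predicate, object) triples for one article.

    Each mapping is a dict with hypothetical keys:
      "ontology_class"  - class the template maps to
      "properties"      - dict of ontology property -> value
      "no_map_to_class" - if true, attach properties and emit no rdf:type
    """
    triples = []
    assigned_class = None      # class already given to the main resource
    intermediate_count = 0     # number of intermediate resources created
    for mapping in template_mappings:
        cls = mapping.get("ontology_class")
        if mapping.get("no_map_to_class"):
            subject = resource_uri                 # properties only, no type
        elif assigned_class is None:
            assigned_class = cls                   # first typed template
            subject = resource_uri
            triples.append((subject, "rdf:type", cls))
        elif cls == assigned_class:
            subject = resource_uri                 # same class: reuse resource
        else:
            intermediate_count += 1                # different class: new one
            subject = f"{resource_uri}__{intermediate_count}"
            triples.append((subject, "rdf:type", cls))
        for prop, value in mapping["properties"].items():
            triples.append((subject, prop, value))
    return triples


triples = apply_mappings("dbpedia:Algol", [
    {"ontology_class": "dbpedia-owl:Star", "properties": {"ex:p1": "v1"}},
    {"ontology_class": "dbpedia-owl:Star", "properties": {"ex:p2": "v2"}},
    {"ontology_class": "dbpedia-owl:Writer", "properties": {"ex:p3": "v3"}},
    {"properties": {"ex:p4": "v4"}, "no_map_to_class": True},
])
```

In this example the two Star templates and the flagged template all write to
dbpedia:Algol, and only the Writer template gets an intermediate resource.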
Best,
Dimitris
>
> Thanks
> Andrea
>
>
>>> Thanks
>>> Andrea
>>>
>>> [1] http://en.wikipedia.org/wiki/Template:Starbox_begin
>>> [2]
>>> http://en.wikipedia.org/wiki/Special:WhatLinksHere/Template:Starbox_begin
>>> [3] http://mappings.dbpedia.org/server/statistics/en/?show=100000
>>> [3b] http://dbpedia.org/resource/Kepler-33
>>> [4] http://dbpedia.org/page/Algol
>>> [5] http://en.wikipedia.org/wiki/Algol?oldid=495281281
>>>
>>>
>>> _______________________________________________
>>> Dbpedia-discussion mailing list
>>> Dbpedia-discussion@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
>>>
>>>
>>
>>
>> --
>> Kontokostas Dimitris
>>
>
>
> [1] http://en.wikipedia.org/wiki/Kepler-33?oldid=494925038
> [2] http://en.wikipedia.org/wiki/Kepler-33?oldid=498844429
>
>
--
Kontokostas Dimitris