On 13 February 2012 13:01, Gregor Trefs <gtr...@rumms.uni-mannheim.de> wrote:
> Hi DBPedia-Community,
>
> I'm currently writing my Master-Thesis in the field of DBPedia and SPARQL.
> One of my subgoals is to find out how many categories are present in both
> Wikipedia and DBPedia. Therefore, I wrote a little tool which identifies all
> categories having at least one resource in the unspecific mapping based part
> of DBPedia (If I refer to DBPedia in this mail, I usually mean this part of
> DBPedia not the whole one.). It searches the file
> mapping_based_properties_en.nt and looks whether or not the object and
> subject of each statement is linked to a category in the file
> article_categories_en.nt. If there is a link, the tool considers the
> corresponding category to be 'present' in DBPedia.
>
> On the other hand, the same tool searches the page_links_en.nt file to find
> all categories of Wikipedia. That is, all triples which relate a resource to
> a category or (if present at all) a category to any object. According to the
> description of the 'Page Links Extractor' it 'Extracts internal links
> between DBpedia instances from the internal pagelinks between Wikipedia
> articles.'. As Wikipedia pages normally link to their categories, I assumed
> that these links are also included and, thus, all categories in Wikipedia
> are captured.
>

Categories can also be added by templates.

> Unfortunately, this is only true for almost all categories. I found 127
> categories which are present in DBPedia but not in Wikipedia, compared to
> 59099 categories present in Wikipedia and not in DBPedia. This is strange,
> as the set of DBPedia categories must be a subset of Wikipedia categories.
> Otherwise, some magic added some new categories during extraction and I
> doubt that.

As Yury said, it's more likely that those articles have changed since
the last extraction.

> I made sure, it was not my fault and had a look on the data. One
> of the suddenly appeared categories is
> http://dbpedia.org/resource/Category:Alaska_elections,_1996. On the
> DBPediasian side, there is a triple
> (<http://dbpedia.org/resource/United_States_Senate_election_in_Alaska,_1996>
> <http://purl.org/dc/terms/subject>
> <http://dbpedia.org/resource/Category:Alaska_elections,_1996> .) which
> relates this category to the United states Senate election in Alaska in
> 1996. The resource itself is subject of two statements in
> mapping_based_properties_en.nt. On the Wikipediasian side,

As an aside, I originally thought that you were talking about some
Asia-specific version of Wikipedia; I now put it down to some sort
of interlanguage effect. If it's the latter: adjectives formed from
English nouns ending in -a typically take the ending -an (Wikipedia ->
Wikipedian), but it's generally preferable, especially with proper
nouns, to just use the noun as a modifier ('On the Wikipedia side').

> I did not find
> any triple in page_links_en.nt which contained the category. But I did find
> the United states senate election in Alaska in 1996 resource. The
> corresponding Wikipedia page also includes a link to the category. It is
> present since page creation.

page_links is meant to capture _normal_ wiki links found in the body
of the text; article_categories is specifically for categories, so
category membership has to be looked up there.

$ bzgrep 'http://dbpedia.org/resource/United_States_Senate_election_in_Alaska,_1996' \
    article_categories_en.nt.bz2 | grep 'http://purl.org/dc/terms/subject'
<http://dbpedia.org/resource/United_States_Senate_election_in_Alaska,_1996> <http://purl.org/dc/terms/subject> <http://dbpedia.org/resource/Category:United_States_Senate_elections,_1996> .
<http://dbpedia.org/resource/United_States_Senate_election_in_Alaska,_1996> <http://purl.org/dc/terms/subject> <http://dbpedia.org/resource/Category:United_States_Senate_elections_in_Alaska> .
<http://dbpedia.org/resource/United_States_Senate_election_in_Alaska,_1996> <http://purl.org/dc/terms/subject> <http://dbpedia.org/resource/Category:Alaska_elections,_1996> .

I believe these are the triples you're looking for.
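
If you would rather do the same lookup from inside your tool than with
grep, something along these lines should work. This is only a rough
sketch: the function names (categories_of, categories_from_dump) are
made up for illustration, and the regular expression assumes one
"<s> <p> <o> ." statement per line with an IRI as object, which is what
the lines above look like.

```python
import bz2
import re

DC_SUBJECT = "http://purl.org/dc/terms/subject"

# Matches "<subject> <predicate> <object-IRI> ." on a single line.
# Statements with literal objects are deliberately not matched.
TRIPLE = re.compile(r"^<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.\s*$")


def categories_of(lines, resource):
    """Return the set of dcterms:subject objects found for one resource."""
    found = set()
    for line in lines:
        m = TRIPLE.match(line)
        if m and m.group(1) == resource and m.group(2) == DC_SUBJECT:
            found.add(m.group(3))
    return found


def categories_from_dump(path, resource):
    """Same lookup, streaming a .nt.bz2 dump without unpacking it to disk."""
    with bz2.open(path, "rt", encoding="utf-8") as dump:
        return categories_of(dump, resource)
```

Calling categories_from_dump('article_categories_en.nt.bz2', ...) with
the resource URI from the grep above should give the same category URIs,
assuming the dump is in your working directory.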

If you find yourself wondering if you're looking in the right file,
bear in mind that you can always use the website:

$ curl http://dbpedia.org/data/United_States_Senate_election_in_Alaska,_1996.ntriples \
    | grep 'http://purl.org/dc/terms/subject'
<http://dbpedia.org/resource/United_States_Senate_election_in_Alaska,_1996> <http://purl.org/dc/terms/subject> <http://dbpedia.org/resource/Category:United_States_Senate_elections_in_Alaska> .
<http://dbpedia.org/resource/United_States_Senate_election_in_Alaska,_1996> <http://purl.org/dc/terms/subject> <http://dbpedia.org/resource/Category:Alaska_elections,_1996> .
<http://dbpedia.org/resource/United_States_Senate_election_in_Alaska,_1996> <http://purl.org/dc/terms/subject> <http://dbpedia.org/resource/Category:United_States_Senate_elections,_1996> .
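
Coming back to your counts: if you take the 'Wikipedia' set from
article_categories rather than page_links, the categories that have a
mapped resource are a subset of it by construction, so the 127 odd ones
should go away. Below is a rough sketch of that comparison as I
understand your description. It is not your tool: the names (parse,
compare_categories) are invented for illustration, and I am assuming a
category counts as 'present' when either the subject or the object of a
mapping-based statement is filed under it.

```python
import re

DC_SUBJECT = "http://purl.org/dc/terms/subject"

# "<s> <p> rest ." where rest is either an IRI or a literal.
STATEMENT = re.compile(r"^<([^>]*)>\s+<([^>]*)>\s+(.*?)\s*\.\s*$")
IRI = re.compile(r"^<([^>]*)>$")


def parse(lines):
    """Yield (subject, predicate, object IRI or None) per parsable line."""
    for line in lines:
        m = STATEMENT.match(line)
        if m:
            obj = IRI.match(m.group(3))
            yield m.group(1), m.group(2), obj.group(1) if obj else None


def compare_categories(mapping_lines, category_lines):
    """Return (present, missing) sets of category URIs."""
    # Every resource that occurs as subject or IRI object of a
    # mapping-based statement.
    mapped = set()
    for s, _p, o in parse(mapping_lines):
        mapped.add(s)
        if o is not None:
            mapped.add(o)

    all_categories, present = set(), set()
    for s, p, o in parse(category_lines):
        if p == DC_SUBJECT and o is not None:
            all_categories.add(o)
            if s in mapped:
                present.add(o)
    # present is built only from categories also added to all_categories,
    # so it cannot contain anything outside that set.
    return present, all_categories - present
```

Both arguments are plain iterables of lines, so you can pass open file
handles for mapping_based_properties_en.nt and article_categories_en.nt.
Note that this keeps every mapped resource in memory, which may or may
not be acceptable for the full dumps.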


-- 
<Sefam> Are any of the mentors around?
<jimregan> yes, they're the ones trolling you
