In addition to Andrea's reply: we also collect "red links", that is,
links to pages that do not exist (yet).


On Tue, Dec 3, 2013 at 11:32 AM, Andrea Di Menna <ninn...@gmail.com> wrote:

> Hi Dario,
>
> the dataset you are using is extracted by
> the org.dbpedia.extraction.mappings.PageLinksExtractor [1].
> This extractor collects internal wiki links [2] from Wikipedia content
> articles (that is, Wikipedia pages which belong to the Main namespace [3])
> to any other Wikipedia page. Note that the link targets are not restricted
> to content articles: links to pages in the File or Category namespaces,
> for example, are collected as well.
>
> Each row of the Pagelinks dataset is a triple <subject> <predicate> <object>
> that represents a directed link between two pages, e.g.:
>
>
> <http://dbpedia.org/resource/Albedo> 
> <http://dbpedia.org/ontology/wikiPageWikiLink> 
> <http://dbpedia.org/resource/Latin> .
>
> means that an internal link to http://en.wikipedia.org/wiki/Latin was found 
> in http://en.wikipedia.org/wiki/Albedo.
>
> You can verify that this link exists in the first sentence of the Albedo
> article [6].
>
> Basically, this can be modeled in a directed graph as an edge "Albedo -> Latin".
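A minimal Python sketch of that edge model, assuming the dump is in N-Triples format with one triple per line (the Albedo example above is only wrapped across three lines by the mail client). The helper names `title`, `parse_edges` and `build_graph` are hypothetical and not part of the DBpedia extraction framework; using a set per source page is a design choice that collapses duplicate links.

```python
import re
from collections import defaultdict

# Assumption: one N-Triples statement per line, "<s> <p> <o> ."
TRIPLE = re.compile(r'^<([^>]+)>\s+<([^>]+)>\s+<([^>]+)>\s*\.\s*$')
WIKILINK = "http://dbpedia.org/ontology/wikiPageWikiLink"
RESOURCE = "http://dbpedia.org/resource/"


def title(uri: str) -> str:
    """Strip the DBpedia resource prefix, e.g. .../resource/Latin -> Latin."""
    return uri[len(RESOURCE):] if uri.startswith(RESOURCE) else uri


def parse_edges(lines):
    """Yield one (source, target) pair per wikiPageWikiLink triple."""
    for line in lines:
        m = TRIPLE.match(line.strip())
        if m and m.group(2) == WIKILINK:
            yield title(m.group(1)), title(m.group(3))


def build_graph(lines):
    """Adjacency list: page title -> set of page titles it links to."""
    graph = defaultdict(set)
    for src, dst in parse_edges(lines):
        graph[src].add(dst)
    return graph


# Usage with the Albedo example written as a single N-Triples line.
sample = [
    "<http://dbpedia.org/resource/Albedo> "
    "<http://dbpedia.org/ontology/wikiPageWikiLink> "
    "<http://dbpedia.org/resource/Latin> ."
]
print(dict(build_graph(sample)))  # {'Albedo': {'Latin'}}
```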
>
>
> The reason you have 17M instances (I suppose you are counting the
> nodes in your graph) is that the object of each triple can lie outside the
> Main namespace.
> As far as I remember, the 4M articles are wiki pages which belong to the Main
> namespace and which are neither redirects [4] nor disambiguation pages [5].
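To illustrate why the node count exceeds the article count, a hypothetical Python sketch can compare the distinct link sources (the content articles) with all distinct nodes, and roughly bucket nodes by a title prefix such as `Category:` or `File:`. The prefix list, the helper names and the toy edges are assumptions for illustration only. A prefix check cannot tell a red link, a redirect or a disambiguation page from a real article, so the "Main" bucket can still be larger than the number of articles.

```python
from collections import Counter

# Hypothetical, non-exhaustive prefixes as they appear in page titles
# (assumed form of DBpedia resource names, e.g. "Category:Optics").
NON_MAIN_PREFIXES = ("Category:", "File:", "Template:",
                     "Wikipedia:", "Portal:", "Help:")


def namespace(title: str) -> str:
    """Rough namespace guess from the title prefix; 'Main' otherwise."""
    for prefix in NON_MAIN_PREFIXES:
        if title.startswith(prefix):
            return prefix.rstrip(":")
    return "Main"


def node_stats(edges):
    """Return (#distinct sources, #distinct nodes, node count per namespace)."""
    sources, nodes = set(), set()
    for src, dst in edges:
        sources.add(src)
        nodes.update((src, dst))
    by_ns = Counter(namespace(n) for n in nodes)
    return len(sources), len(nodes), by_ns


# Toy (source, target) title pairs, not real dataset content.
edges = [
    ("Albedo", "Latin"),
    ("Albedo", "Category:Optics"),
    ("Albedo", "File:Example.png"),
    ("Latin", "Some_missing_page"),  # stands in for a red link
]
print(node_stats(edges))
```

Here only 2 pages act as link sources, while the graph has 5 nodes, 2 of them outside the Main namespace and 1 pointing at a page that does not exist.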
>
> Hope this clarifies a bit :-)
>
> Cheers
> Andrea
>
> [1]
> https://github.com/dbpedia/extraction-framework/blob/master/core/src/main/scala/org/dbpedia/extraction/mappings/PageLinksExtractor.scala
> [2] https://en.wikipedia.org/wiki/Help:Link
> [3] https://en.wikipedia.org/wiki/Wikipedia:Main_namespace
> [4] https://en.wikipedia.org/wiki/Wikipedia:Redirect
> [5] https://en.wikipedia.org/wiki/Wikipedia:Disambiguation
> [6] https://en.wikipedia.org/wiki/Albedo
>
>
>
>
> 2013/12/2 Dario Garcia Gasulla <dar...@lsi.upc.edu>
>
>>  Hi,
>>
>> I'm Dario Garcia-Gasulla, an AI researcher at Barcelona Tech (UPC).
>>
>> I'm currently doing research on very large directed graphs and I am using
>> one of your datasets for testing. Specifically, I am using the "Wikipedia
>> Pagelinks" dataset as available on the DBpedia web site.
>>
>> Unfortunately, the description of the dataset is not very detailed:
>>  Wikipedia Pagelinks: *Dataset containing internal links between DBpedia
>> instances. The dataset was created from the internal links between
>> Wikipedia articles. The dataset might be useful for structural analysis,
>> data mining or for ranking DBpedia instances using Page Rank or similar
>> algorithms.*
>>
>> I wonder if you could give me more information on how the dataset was
>> built and what it consists of.
>> I understand Wikipedia has 4M articles and 31M pages, while this dataset
>> has 17M instances and 130M links (I couldn't find the number of links in
>> Wikipedia).
>>
>> What is the relation between the two? Could someone briefly explain the
>> nature of the Pagelinks dataset and how it differs from Wikipedia?
>>
>> Thank you for your time,
>> Dario.
>>
>>
>> ------------------------------------------------------------------------------
>> Rapidly troubleshoot problems before they affect your business. Most IT
>> organizations don't have a clear picture of how application performance
>> affects their revenue. With AppDynamics, you get 100% visibility into your
>> Java,.NET, & PHP application. Start your 15-day FREE TRIAL of AppDynamics
>> Pro!
>>
>> http://pubads.g.doubleclick.net/gampad/clk?id=84349351&iu=/4140/ostg.clktrk
>> _______________________________________________
>> Dbpedia-discussion mailing list
>> Dbpedia-discussion@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
>>
>>


-- 
Kontokostas Dimitris