Dear Kasun,
It's a very useful project, congratulations.
I would like to know whether your method can be applied to a local set of
documents (rather than all leaf categories).
Suppose that I have a set of text documents and I want to find the
relatedness/similarity between them using abstraction levels in the category
network.
In this situation, I think you would need a further step to rank parent
categories according to the initial leaf categories, or a modified concept of
prominent nodes that encompasses more leaf categories. Please take a look at
the following paper:
http://www.medelyan.com/files/medelyan-focused-taxonomies-eswc2013.pdf?attredirects=0
You can also see some examples of local taxonomies:
https://sites.google.com/site/focusedtaxonomies/home
It is closely related to my request. Please let me know whether your method
can be applied to a local set of documents (rather than all leaf categories).
Kind regards,
Amir
________________________________
From: kasun perera <kkasunper...@gmail.com>
To: Paul Houle <ontolo...@gmail.com>
Cc: Amir H. Jadidinejad <amir.jad...@yahoo.com>;
"dbpedia-discussion@lists.sourceforge.net"
<dbpedia-discussion@lists.sourceforge.net>
Sent: Friday, December 20, 2013 12:19 PM
Subject: Re: [Dbpedia-discussion] How to build a meaningful Taxonomy from
Wikipedia Categories?
Hi Amir
We did some work on Wikipedia category processing as a GSoC 2013 project.
We used Wikipedia leaf categories as the starting point. A leaf category is a
Wikipedia category page that has no links to any other category pages.
Next, we defined a concept called “Prominent Node”.
We use the following three factors to define a prominent node:
1) The initial candidates for the prominent nodes were the parents of leaf
categories. We used Wikipedia database dumps as our main data source,
specifically the tables “category”, “categorylinks”, “page” and
“Interlanguage”.
2) Then we keep the candidates whose category name has a plural head word
(e.g. Naturalized citizens of the United States: pre-modifier {Naturalized},
head {citizens} and post-modifier {of the United States}).
3) Then we count the interlanguage links of each candidate category and
require that a prominent node have at least 3 interlanguage links.
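
The three factors above can be sketched roughly as follows. This is only an
illustration, not the code used in the GSoC project: the Category record, the
head_word and looks_plural helpers, the preposition list and the sample link
counts are all assumptions made up for the example. In particular, the head
word is guessed as the last word before the first preposition, and plurality
is guessed from a trailing "s"; a real pipeline would more likely use a parser
or POS tagger and read the counts from the database dumps.

```python
from dataclasses import dataclass

# Hypothetical preposition list used to split off the post-modifier.
PREPOSITIONS = {"of", "in", "by", "from", "for", "at", "on", "with"}


@dataclass
class Category:
    name: str
    interlanguage_links: int   # invented counts, for illustration only
    is_parent_of_leaf: bool    # factor 1: parent of at least one leaf category


def head_word(category_name: str) -> str:
    """Guess the head: the last word before the first preposition."""
    words = category_name.split()
    for i, word in enumerate(words):
        if i > 0 and word.lower() in PREPOSITIONS:
            return words[i - 1]
    return words[-1]


def looks_plural(word: str) -> bool:
    """Crude plural check (assumption): ends in 's' but not in 'ss'."""
    w = word.lower()
    return w.endswith("s") and not w.endswith("ss")


def is_prominent(cat: Category, min_links: int = 3) -> bool:
    """Apply the three factors: leaf parent, plural head, enough links."""
    return (
        cat.is_parent_of_leaf
        and looks_plural(head_word(cat.name))
        and cat.interlanguage_links >= min_links
    )


candidates = [
    Category("Naturalized citizens of the United States", 5, True),
    Category("History of France", 40, True),   # head "History" is not plural
    Category("Ski areas", 2, True),            # too few interlanguage links
    Category("Mountain ranges", 10, False),    # not a parent of a leaf
]

prominent = [c for c in candidates if is_prominent(c)]
print([c.name for c in prominent])
```

With these made-up inputs only the first category passes all three checks.
The trailing-"s" test will misfire on names such as "Physics", which is one
reason a proper linguistic analysis of the head is preferable.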
Then we clustered the identified prominent category names and determined the
concept to which each prominent node belongs.
This produced the following type of Wikipedia hierarchy:
Concepts > Prominent nodes > Leaf nodes
Please see the following links [1], [2] for more details. If you are looking
for this kind of work, I'm happy to share my experience with you.
[1] https://github.com/dbpedia/extraction-framework/wiki/GSOC2013_Progress_Kasun
[2] http://blog.dbpedia.org/2013/11/29/making-sense-out-of-the-wikipedia-categories-gsoc2013/
Thanks
On Thu, Dec 19, 2013 at 8:22 PM, Paul Houle <ontolo...@gmail.com> wrote:
>The strength of the Wikipedia categories is that there are a lot of
>them and a lot of statements matching instances to categories.
>
>The weakness of categories is that they are completely disorganized.
>
>There are two good strategies for using the categories.
>
>One of them is to treat them abstractly and use them as inputs for
>numerical algorithms. For instance, you can use algorithms such as
>Kleinberg's Hubs and Authorities where categories are treated as hubs
>and instances are treated as authorities. Similarly you can create
>similarity scores based on the categories shared between items.
>
>I've used Wikipedia categories to create my own well-defined
>categories such as "things related to New York City" or "obscene
>things" or "things related to skiing". In all of these categories you
>have things that are easy to ontologize, such as ski areas, and
>other things such as
>
>http://en.wikipedia.org/wiki/Ski_manufacturing_techniques
>
>that are not easy to ontologize. Generally I've made these by doing
>waves of expansion and contraction, traversing the graph and adding
>inclusion and exclusion rules. In the past with half-baked tools I've
>been able to create good categories of 10,000 or so members in a day
>or so. With good tools it ought to be possible to work faster.
>
>On Thu, Dec 19, 2013 at 4:45 AM, Amir H. Jadidinejad
><amir.jad...@yahoo.com> wrote:
>> Hi,
>>
>> I’m trying to leverage the Wikipedia Category Network for a semantic
>> processing application. A set of Wikipedia articles is extracted from the
>> document and I want to build a meaningful hierarchical taxonomy using
>> Wikipedia categories. In my experiments, I found that the original category
>> network of Wikipedia is really messy. For example, a few articles mentioned
>> in a document can lead to the whole category network!
>>
>> I haven’t used DBpedia before; I am just really interested to know whether,
>> if I leverage DBpedia, it is possible to have a meaningful taxonomy of
>> categories with hyponym relations.
>>
>>
>> ------------------------------------------------------------------------------
>> Rapidly troubleshoot problems before they affect your business. Most IT
>> organizations don't have a clear picture of how application performance
>> affects their revenue. With AppDynamics, you get 100% visibility into your
>> Java,.NET, & PHP application. Start your 15-day FREE TRIAL of AppDynamics
>> Pro!
>> http://pubads.g.doubleclick.net/gampad/clk?id=84349831&iu=/4140/ostg.clktrk
>> _______________________________________________
>> Dbpedia-discussion mailing list
>> Dbpedia-discussion@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
>>
>
>
>
>--
>Paul Houle
>Expert on Freebase, DBpedia, Hadoop and RDF
>(607) 539 6254 paul.houle on Skype ontol...@gmail.com
>
>
>
--
Regards
Kasun Perera