Yup, I see that WordNet has also been "ported" to a Lucene index, and hence 
pulling the hyponyms works great.

Thanks

Paul




________________________________
From: Tommy Chheng <[email protected]>
To: [email protected]
Sent: Tuesday, 23 June, 2009 23:19:25
Subject: Re: mahout PLSI (with some lucene, thrown in)

Have you looked at WordNet to get the hyponyms?

Tommy

On Jun 23, 2009, at 3:09 PM, Paul Jones wrote:

> Okay, I see the difficulty (apart from the maths :-)).
> 
> I guess "similar" can mean many things, e.g. hyponyms, but words such as 
> hot...cold are also "related". So to solve my little problem I am 
> wondering if there is an easier way, e.g. using existing hyponym 
> relations (WordNet and the like) and, where those do not exist, 
> something similar to a "Google distance" measure to help with 
> "adding" new words to the system....
> 
> Paul
> 
> 
> 
> 
> ________________________________
> From: Ted Dunning <[email protected]>
> To: [email protected]
> Sent: Tuesday, 23 June, 2009 18:00:12
> Subject: Re: mahout PLSI (with some lucene, thrown in)
> 
> Yes.  This can be done.  It isn't necessarily very simple to do.
> 
> See http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.56.7275 for an
> old (but still pretty good) example.
> 
> On Tue, Jun 23, 2009 at 6:45 AM, Paul Jones <[email protected]> wrote:
> 
>> Imagine we have crawled 100K webpages: 100 pages mention "red", 100
>> mention "crimson", and another 100 mention both "red" and "crimson". Is
>> there a way to deduce that there may be a (albeit weak) relationship
>> between red AND crimson? Of course we can pre-seed this info, which then
>> gets weighted by actual results.
>> 
> 
> 
> 
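The "Google distance measure" mentioned above presumably refers to the Normalized Google Distance (NGD) of Cilibrasi and Vitányi, which scores how related two terms are purely from page counts. Below is a minimal sketch applying it to the red/crimson numbers from the original question, with pointwise mutual information (PMI) added as a second, commonly used co-occurrence score (my addition, not something proposed in the thread). It assumes the 100 "both" pages are in addition to the 100 red-only and 100 crimson-only pages, so f(red) = f(crimson) = 200, f(red, crimson) = 100 and N = 100,000. The function names are illustrative only and are not Mahout or Lucene APIs.

```python
import math


def normalized_google_distance(fx: int, fy: int, fxy: int, n: int) -> float:
    """NGD from page counts.

    fx, fy: pages containing each term; fxy: pages containing both;
    n: total pages indexed. 0 means the terms always co-occur; larger
    values mean less related. The log base cancels out.
    """
    if fxy == 0:
        return math.inf  # never seen together: treat as maximally distant
    lx, ly, lxy, ln = math.log(fx), math.log(fy), math.log(fxy), math.log(n)
    return (max(lx, ly) - lxy) / (ln - min(lx, ly))


def pmi(fx: int, fy: int, fxy: int, n: int) -> float:
    """Pointwise mutual information (natural log).

    > 0: the terms co-occur more often than independence would predict;
    0: consistent with independence; < 0: they co-occur less often.
    """
    if fxy == 0:
        return -math.inf
    return math.log((fxy / n) / ((fx / n) * (fy / n)))


# Assumed reading of the example: 100 red-only + 100 crimson-only + 100 both.
red, crimson, both, total = 200, 200, 100, 100_000
print(f"NGD = {normalized_google_distance(red, crimson, both, total):.3f}")
print(f"PMI = {pmi(red, crimson, both, total):.3f}")
```

Under these assumptions it prints NGD = 0.112 and PMI = 5.521: the two words appear together about 250 times more often than chance would predict, which supports the (weak) relationship asked about. Either score could then be combined with pre-seeded relations such as WordNet hyponyms.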

