Re: Top terms relevance from specific documents ?

Ahmet Arslan Wed, 27 Jan 2016 02:30:07 -0800

Hi Yannick,

The MoreLikeThis (MLT) feature already does this.
It extracts "interesting terms" from the top N documents.
I don't remember for certain, but this feature may require term vectors to be stored.
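
For the narrower goal in the original question (counting the most repeated
terms in one specific document, minus stop words), the idea can be sketched in
plain Java. This is only an illustration and not Lucene API: TopTermsSketch,
topTerms, the naive regex tokenization and the tiny stop-word set are all
assumptions made up for the example. In a real index the per-document counts
would presumably come from stored term vectors rather than from re-tokenizing
the text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Lucene: counts terms in one document's text.
final class TopTermsSketch {

    /** Returns the n most frequent non-stop-word terms, highest count first. */
    static List<Map.Entry<String, Integer>> topTerms(String text, Set<String> stopWords, int n) {
        Map<String, Integer> counts = new HashMap<>();
        // Naive tokenization (an assumption): lowercase, then split on anything
        // that is not a letter or a digit. A real Analyzer would do more.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (token.isEmpty() || stopWords.contains(token)) {
                continue;
            }
            counts.merge(token, 1, Integer::sum);
        }
        List<Map.Entry<String, Integer>> sorted = new ArrayList<>(counts.entrySet());
        // Sort by descending count; ties are broken alphabetically for a stable order.
        sorted.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return new ArrayList<>(sorted.subList(0, Math.min(n, sorted.size())));
    }

    public static void main(String[] args) {
        // Illustrative stop words and text only.
        Set<String> stopWords = new HashSet<>(Arrays.asList("the", "a", "of", "and", "is"));
        String doc = "The title of the document. A summary of the document and the description.";
        for (Map.Entry<String, Integer> e : topTerms(doc, stopWords, 3)) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

Calling topTerms(doc, stopWords, 3) returns the three most frequent remaining
terms with their counts, which is roughly the data a tag cloud needs.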


Ahmet



On Wednesday, January 27, 2016 10:41 AM, Yannick Martel <mar...@codelutin.com> 
wrote:
On Tue, 15 Dec 2015 17:56:05 +0100,
Yannick Martel <mar...@codelutin.com> wrote:

> Hi !
> 
> I am using (Java) Lucene for indexing data, and I want to produce
> a kind of tag cloud for specific data.
> 
> I've found HighFreqTerms to get a top list of terms from *all
> documents* (if I have understood correctly) (by the way, I overrode it
> to be able to filter on several fields instead of only one).
> 
> But it does not really match my need: I'd like to get the most
> repeated terms in a single (or several specific) document(s).
> For example, considering a document with Terms "Title", "Summary",
> "Description", I am trying to get the count of each term (excluding
> the Analyzer's stop words).
> 
> I cannot find a way to do that: I searched among TopFieldCollector
> and other collectors, but they seem to give only document scores :/
> 
> Finding documentation is not easy, I think, because a lot of
> questions/answers either do not match my need or concern old versions
> (3.x for example), and I'm feeling lost in all of this...
> 
> 
> Hoping someone can guide me.
> 
> Regards,
> 

Hello,

After more than one month with no response, should I conclude what I
want is not possible with Lucene ?


Regards,

-- 
Yannick Martel


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

