Hi Yannick,

The MoreLikeThis (MLT) feature does this already: it extracts "interesting terms" from the top N documents. I don't remember for certain, but this feature may require term vectors to be stored.
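
To illustrate the idea: a term vector is roughly a per-document map from each term to its frequency in that document, and a tag cloud is the top of that map. The sketch below is plain Java, not Lucene code. The names `TermCounts`, `count`, and `top` are hypothetical, and the regex split is only an assumed stand-in for a real Analyzer. In Lucene itself, I believe `MoreLikeThis.retrieveInterestingTerms(int docNum)` is the relevant entry point, but please check that against your version.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical helper for illustration only; not part of Lucene.
class TermCounts {

    // Counts how often each term occurs in one document's text,
    // lowercasing tokens and skipping the given stop words.
    static Map<String, Integer> count(String text, Set<String> stopWords) {
        Map<String, Integer> freq = new HashMap<>();
        // Assumption: split on anything that is not a letter or digit.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (token.isEmpty() || stopWords.contains(token)) {
                continue;
            }
            freq.merge(token, 1, Integer::sum);
        }
        return freq;
    }

    // Returns at most n entries, highest count first; ties are ordered by term.
    static List<Map.Entry<String, Integer>> top(Map<String, Integer> freq, int n) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(freq.entrySet());
        entries.sort(Map.Entry.<String, Integer>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
        return entries.subList(0, Math.min(n, entries.size()));
    }
}
```

To cover several fields of one document (title, summary, description), one option is to concatenate their text before calling `count`, or to merge the per-field maps afterwards.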

Ahmet

On Wednesday, January 27, 2016 10:41 AM, Yannick Martel <mar...@codelutin.com> wrote:

On Tue, 15 Dec 2015 17:56:05 +0100, Yannick Martel <mar...@codelutin.com> wrote:

> Hi!
>
> I am using (Java) Lucene for data indexing, and I want to produce
> a kind of tag cloud for specific data.
>
> I've found HighFreqTerms to get a top list of terms from *all
> documents* (if I have understood correctly). (By the way, I had to
> override it to be able to filter on several fields instead of only one.)
>
> But it does not really match my need: I'd like to get the most
> repeated terms in a single document (or several specific documents).
> For example, considering a document with Terms "Title", "Summary",
> "Description", I am trying to get the count of each term (excluding
> the Analyzer's stop words).
>
> I cannot find a way to do that: I searched among TopFieldCollector
> and other collectors, but it seems they just give document scores :/
>
> Finding documentation is not easy, I think, because a lot of
> questions/answers either do not correspond to my need or are for old
> versions (3.x for example), and I'm feeling lost in all of this...
>
> Hoping someone can guide me well.
>
> Regards,

Hello,

After more than one month with no response, should I conclude that what I want is not possible with Lucene?

Regards,
--
Yannick Martel

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org