Hello Sengly,
First of all, you have to make sure that the Fields you add to a
Document are created with the appropriate constructor. You have to
enable term vectors explicitly (Field.TermVector.YES):
new Field("text", "your text...", Field.Store.YES,
    Field.Index.TOKENIZED, Field.TermVector.YES);
Unless the term vectors are stored explicitly at index time, it is not
possible to retrieve them at search time.
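Once you have built the index, you can use the suggested method
getTermFreqVector() of the IndexReader for a single field, or
getTermFreqVectors() to get the vectors of all fields of a document.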
To get the top n keywords from the hits object you can iterate over the
first results.
Here is an example:
// ir is the IndexReader of the index that was searched
for (int i = 0; i < Math.min(10, hits.length()); i++) {
    int docNumber = hits.id(i);
    // returns an array of term frequency vectors (one per field)
    // for the specified document, or null if none were stored
    TermFreqVector[] termsV = ir.getTermFreqVectors(docNumber);
    if (termsV == null) continue;
    // loop over all term vectors in the current document
    for (int xy = 0; xy < termsV.length; xy++) {
        String[] terms = termsV[xy].getTerms();
        for (int termsInArray = 0; termsInArray < terms.length;
                termsInArray++) {
            // toDo: count the occurrence of the terms
        }
    }
}
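The counting step that is left as a to-do in the loop above could be
done with a small helper like the one below. This is only a sketch:
HitTermCounter and its methods are names I made up (they are not part of
Lucene), and it assumes you feed it the parallel arrays returned by
getTerms() and getTermFrequencies() of each TermFreqVector.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not a Lucene class): sums term frequencies over
// several documents and reports the most frequent terms.
class HitTermCounter {
    private final Map<String, Integer> totals = new HashMap<String, Integer>();

    // terms and freqs are parallel arrays: freqs[i] is the frequency
    // of terms[i] within one field of one document.
    void add(String[] terms, int[] freqs) {
        for (int i = 0; i < terms.length; i++) {
            totals.merge(terms[i], freqs[i], Integer::sum);
        }
    }

    // Total frequency of a term over everything added so far (0 if unseen).
    int total(String term) {
        Integer t = totals.get(term);
        return t == null ? 0 : t;
    }

    // Returns up to n terms, highest total frequency first.
    // Ties are broken alphabetically so the order is deterministic.
    List<String> top(int n) {
        List<String> terms = new ArrayList<String>(totals.keySet());
        terms.sort((a, b) -> {
            int byFreq = Integer.compare(totals.get(b), totals.get(a));
            return byFreq != 0 ? byFreq : a.compareTo(b);
        });
        return new ArrayList<String>(terms.subList(0, Math.min(n, terms.size())));
    }
}
```

Inside the loop over termsV you would then call
counter.add(termsV[xy].getTerms(), termsV[xy].getTermFrequencies())
instead of the innermost for loop, and counter.top(n) after the outer
loop gives the top n keywords of the hits you looked at. If you want the
total vector over all hits, loop up to hits.length() instead of 10.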
Hope this helps.
Thomas
Sengly Heng wrote:
Hello all,
I would like to extract the term freq vector from the hit results as a
total vector, not by document.
I have searched the mailing list and found that many have talked about
this issue, but I still could not find the right solution to this
matter. Everyone just suggested looking at getTermFreqVector and
TermEnum.
I wonder if someone has already done this before, and what your
solution was? Would you please share?
Also, how can I get a list of the top n keywords from the hit results?
I have also looked at HighFreqTerms (in the contribution repositories as
well as the one implemented by Luke), but this class is meant for
getting the top n keywords from an index and not from the hit results.
Thank you.
Best regards,
Sengly