Thanks so much Thomas for your prompt reply.


First of all you have to make sure, that you create new Fields, which
you add to a Document, with the appropriate constructor. You have to
specify the usage of term vectors (Field.TermVector.YES):

new Field("text", "your text...", Field.Store.YES,
Field.Index.TOKENIZED,Field.TermVector.YES));


I did set up like this.

Without the explicit storage of the term vectors it is not possible to
get the term vectors during searching.

Once you build the index, you can use the suggested method
getTermFreqVector().

To get the top n keywords from the hits object you can iterate over the
first results.
Here is an example:

           for (int i = 0; i < 10; i++) {
               int docNumber = hits.id(i);
               TermFreqVector[] termsV =
ir.getTermFreqVectors(docNumber); //return an array of term frequency
vectors for the specified document.
               for (int xy = 0; xy < termsV.length; xy++) { //loop over
all terms-vectors in the current document
                   String[] terms = termsV[xy].getTerms();
                   for (int termsInArray = 0;    termsInArray <
terms.length; termsInArray++) {
                           //toDo: count the occurrence of the terms
                   }

               }
           }


I wanted to do this way as well but I am a bit worrying about computational
time as I have many documents and each document is a bit large.

I am looking for more solutions.

Please do contribute if you have any. Your help is hightly appreciated.

Best,

Sengly

Sengly Heng wrote:
> Hello all,
>
> I would like to extract the term freq vector from the hit results as a
> total
> vector not by document.
>
> I have searched the mailing and I found many have talked about this
issue
> but I still could not find the right solution to this matter. Everyone
> just
> suggested to look at getTermFreqVector and TermEnum.
>
> I wonder if there someone has already done this before and what was your
> solution? Would you please share?
>
> Also how to get a list of top n keywords from that hit results. I have
> also
> looked at HighFreqTerms (in the contribution repositories as well as the
> one implemented by Luke) but still this class is rather for the usage
> when
> we want to get the top n keywords from an index and not from the hit
> results.
>
> Thank you.
>
> Best regards,
>
> Sengly
>


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Reply via email to