http://www.lucidimagination.com/blog/2009/05/26/accessing-words-around-a-positional-match-in-lucene/ has an example of implementing a TermVectorMapper. There are also several implementations included in the Lucene codebase.
All it really does is give you a callback as it is reading the term vector from the Directory, and then you can massage the data as you see fit.

On Oct 21, 2010, at 7:47 AM, app...@dsl.pipex.com wrote:

> Would you have an example of this or be able to point me in the direction of
> an example at all?
>
> Quoting Grant Ingersoll <gsing...@apache.org>:
>
>> On Oct 20, 2010, at 4:40 PM, Martin O'Shea wrote:
>>
>>> http://mail-archives.apache.org/mod_mbox/lucene-java-user/201010.mbox/%3c1287065863.4cb7110774...@netmail.pipex.net%3e
>>> will give you a better idea of what I'm moving towards.
>>>
>>> It's all a bit grey at the moment so further investigation is inevitable.
>>>
>>> I expect that a combination of MySQL database storage and Lucene indexing is
>>> going to be the end result.
>>
>> I'd likely take the TermVectorMapper approach, but otherwise, yeah, I think
>> you are on the right track.
>>
>>> -----Original Message-----
>>> From: Grant Ingersoll [mailto:gsing...@apache.org]
>>> Sent: 20 Oct 2010 21:20
>>> To: java-user@lucene.apache.org
>>> Subject: Re: Using a TermFreqVector to get counts of all words in a document
>>>
>>> On Oct 20, 2010, at 2:53 PM, Martin O'Shea wrote:
>>>
>>>> Uwe
>>>>
>>>> Thanks - I figured that bit out. I'm a Lucene 'newbie'.
>>>>
>>>> What I would like to know though is if it is practical to search a single
>>>> document of one field simply by doing this:
>>>>
>>>> IndexReader trd = IndexReader.open(index);
>>>> TermFreqVector tfv = trd.getTermFreqVector(docId, "title");
>>>> String[] terms = tfv.getTerms();
>>>> int[] freqs = tfv.getTermFrequencies();
>>>> for (int i = 0; i < tfv.getTerms().length; i++) {
>>>>     System.out.println("Term " + terms[i] + " Freq: " + freqs[i]);
>>>> }
>>>> trd.close();
>>>>
>>>> where docId is set to 0.
>>>>
>>>> The code works but can this be improved upon at all?
>>>>
>>>> My situation is where I don't want to calculate the number of documents with
>>>> a particular string. Rather I want to get counts of individual words in a
>>>> field in a document. So I can concatenate the strings before passing it to
>>>> Lucene.
>>>
>>> Can you describe the bigger problem you are trying to solve? This looks
>>> like a classic XY problem: http://people.apache.org/~hossman/#xyproblem
>>>
>>> What you are doing above will work OK for what you describe (up to the
>>> "passing it to Lucene" part), but you probably should explore the use of the
>>> TermVectorMapper which provides a callback mechanism (similar to a SAX
>>> parser) that will allow you to build your data structures on the fly instead
>>> of having to serialize them into two parallel arrays and then loop over
>>> those arrays to create some other structure.
>>>
>>> --------------------------
>>> Grant Ingersoll
>>> http://www.lucidimagination.com
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>
>> --------------------------
>> Grant Ingersoll
>> http://www.lucidimagination.com

--------------------------
Grant Ingersoll
http://www.lucidimagination.com
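The callback idea Grant describes, where the reader hands you each term and its frequency as it reads the stored vector so you can build your own structure instead of looping over the two parallel arrays from getTermFreqVector, can be sketched without Lucene on the classpath. This is an illustration under stated assumptions, not Lucene code: TermCallback, readTermVector, and WordCountMapper are hypothetical names, and the "term:freq" string merely stands in for a stored term vector.

```java
import java.util.Map;
import java.util.TreeMap;

/**
 * Lucene-free sketch of the callback style behind Lucene's TermVectorMapper.
 * TermCallback, readTermVector and WordCountMapper are hypothetical names for illustration only.
 */
public class TermVectorCallbackSketch {

    /** Hypothetical callback, loosely modeled on TermVectorMapper's setExpectations/map pair. */
    interface TermCallback {
        void setExpectations(String field, int numTerms);
        void map(String term, int frequency);
    }

    /**
     * Hypothetical reader: walks a stored "term:freq term:freq ..." string and calls back once
     * per entry, instead of returning parallel terms[] and freqs[] arrays.
     * Assumes well-formed input (every entry contains a ':').
     */
    static void readTermVector(String field, String stored, TermCallback callback) {
        String trimmed = stored.trim();
        String[] entries = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
        callback.setExpectations(field, entries.length);
        for (String entry : entries) {
            int sep = entry.lastIndexOf(':');
            String term = entry.substring(0, sep);
            int freq = Integer.parseInt(entry.substring(sep + 1));
            callback.map(term, freq);
        }
    }

    /** Builds the caller's own structure (term -> frequency) as the callbacks arrive. */
    static final class WordCountMapper implements TermCallback {
        private final int minFreq;
        private final Map<String, Integer> counts = new TreeMap<>();
        private String field;

        WordCountMapper(int minFreq) {
            this.minFreq = minFreq;
        }

        @Override
        public void setExpectations(String field, int numTerms) {
            // Called once before any map() call: remember the field and start fresh.
            this.field = field;
            counts.clear();
        }

        @Override
        public void map(String term, int frequency) {
            // "Massage the data as you see fit": here, terms below minFreq are dropped.
            if (frequency >= minFreq) {
                counts.put(term, frequency);
            }
        }

        String getField() { return field; }

        Map<String, Integer> getCounts() { return counts; }
    }

    public static void main(String[] args) {
        WordCountMapper mapper = new WordCountMapper(1);
        readTermVector("title", "lucene:2 term:1 vector:3", mapper);
        // TreeMap keeps keys sorted, so this prints: title {lucene=2, term=1, vector=3}
        System.out.println(mapper.getField() + " " + mapper.getCounts());
    }
}
```

With Lucene 2.9/3.x itself, the corresponding step would be to extend org.apache.lucene.index.TermVectorMapper, implement setExpectations(String, int, boolean, boolean) and map(String, int, TermVectorOffsetInfo[], int[]), and pass the instance to IndexReader.getTermFreqVector(docId, "title", mapper). In either form the field has to be indexed with term vectors stored (for example Field.TermVector.YES), otherwise there is nothing to read back.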