I won't go into much code-level detail, but I'd index the phrases both as tri-gram shingles and as uni-grams. I think that will give you the results you're looking for: a three-word phrase becomes a single indexed term (a tri-gram such as "blue_shorts_burough"), so you can quickly look up its count.
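
To make the idea a bit more concrete, here is a minimal sketch in plain Java with no Lucene dependency. It only illustrates the counting: every word is recorded as a uni-gram, and every run of three consecutive words is recorded as one underscore-joined tri-gram. In Lucene itself a shingle token filter such as ShingleFilter would probably be the place to do this at index time. The names ShingleCounter, countTerms and toTerm, the simple tokenizer, and the sample text are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class ShingleCounter {

    /** Lowercases the text and splits it on runs of non-alphanumeric characters. */
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Counts every uni-gram and every tri-gram shingle (three words joined by '_'). */
    public static Map<String, Integer> countTerms(String text) {
        List<String> tokens = tokenize(text);
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i < tokens.size(); i++) {
            // Uni-gram: the single word at position i.
            counts.merge(tokens.get(i), 1, Integer::sum);
            // Tri-gram shingle: words i, i+1 and i+2, if three words remain.
            if (i + 2 < tokens.size()) {
                String shingle = tokens.get(i) + "_" + tokens.get(i + 1) + "_" + tokens.get(i + 2);
                counts.merge(shingle, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Normalizes a phrase the same way as the text so it can be looked up as one term. */
    public static String toTerm(String phrase) {
        return String.join("_", tokenize(phrase));
    }

    public static void main(String[] args) {
        // Hypothetical abstract text, used only to exercise the sketch.
        String abstractText = "Brain natriuretic peptide was measured. "
                + "Elevated brain natriuretic peptide was observed.";
        Map<String, Integer> counts = countTerms(abstractText);

        // The gene name from the <Gene> tag, trailing space included.
        String geneTerm = toTerm("brain natriuretic peptide ");
        System.out.println(geneTerm + " = " + counts.getOrDefault(geneTerm, 0));
        System.out.println("peptide = " + counts.getOrDefault("peptide", 0));
    }
}
```

The important part is that the phrase you look up goes through the same normalization as the text, otherwise the term will not match and the count comes back as 0. Note that this simple version lets shingles run across sentence boundaries, which may or may not be what you want.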

On Fri, Jan 8, 2010 at 9:37 AM, hrishim <smarthr...@yahoo.co.in> wrote:
>
> @All : Elaborating the problem
>
> The phrase is being indexed as a single token ...
> I have a Gene tag in the xml document which is like
> <Gene>brain natriuretic peptide </Gene>
> This phrase is present in the abstract text for the given document .
>
> Code is as :
>
> doc.add(new Field("Gene", geneName, Field.Store.YES,
> Field.Index.ANALYZED,Field.TermVector.YES));
>
> doc.add(new Field("Token", abstractText.toString().toLowerCase(),
> Field.Store.YES, Field.Index.ANALYZED,Field.TermVector.YES));
>
> When I retrieve all tokens as well as genes for a given doc and calculate
> the tf for each of these ,
> a null exception is thrown . Term = brain natriuretic peptide
>
> TermDocs termDocs = indexReader.termDocs(term);
> termDocs.next();
> double tf = termDocs.freq();
>
> Regards,
> Hrishi
>
>
> Grant Ingersoll-6 wrote:
>>
>> When do you detect that they are phrases? During indexing or during
>> search?
>>
>> On Jan 8, 2010, at 5:16 AM, hrishim wrote:
>>
>>>
>>> Hi .
>>> I have phrases like brain natriuretic peptide indexed as a single token
>>> using Lucene.
>>> When I calculate the term frequency for the same the count is 0 since
>>> the tokens from the text are indexed separately i.e. brain , natriuretic ,
>>> peptide.
>>> Is there a way to solve this problem and get the term frequency for the
>>> entire phrase ?
>>>
>>> Regards,
>>> Hrishi
>>> --
>>> View this message in context:
>>> http://old.nabble.com/Term-Frequency-for-phrases-tp27073866p27073866.html
>>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>>
>>
>> --------------------------
>> Grant Ingersoll
>> http://www.lucidimagination.com/
>>
>> Search the Lucene ecosystem using Solr/Lucene:
>> http://www.lucidimagination.com/search
>>
>
> --
> View this message in context:
> http://old.nabble.com/Term-Frequency-for-phrases-tp27073866p27079648.html