On Friday 01 July 2005 21:13, Dave Kor wrote: > Quoting Peter Laurinc <[EMAIL PROTECTED]>: > > > Hi, > > > > I'm newbie to lucene. > > I wan to ask, how to implement search for phrase that must be in > > sentence/paragraph. > > I did see som examples, that uses term position changing, but I think > > that this is not the way, because it breaks classic proximity search. > > (if one word is on end and second of begining of next sentence) > > Most NLP toolkits have a sentence and paragraph boundary detector that can be > used to separate a single document into its constituent sentences and > paragraphs. The two NLP toolkits I am familiar with are OpenNLP (Open Source) > and Alias-i's LingPipe (Commercial) libraries. These toolkits can be used in > several ways to achieve what you want. > > If you are ONLY interested in searching for individual sentences, then you can > use the toolkits to create an index of sentences instead of an index of > documents. > > Alternatively, you can encode the sentence boundaries found by these toolkits > within the documents you are indexing, for example using special characters or > as a separate field in Lucene. After every search, do an extra check to ensure > that Lucene did not match across sentence boundaries.
Or try and use a SpanNotQuery to make sure that the sentence or paragraph border is contained in matches in the same field. Regards, Paul Elschot --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]