Quoting Peter Laurinc <[EMAIL PROTECTED]>: > Hi, > > I'm newbie to lucene. > I wan to ask, how to implement search for phrase that must be in > sentence/paragraph. > I did see som examples, that uses term position changing, but I think > that this is not the way, because it breaks classic proximity search. > (if one word is on end and second of begining of next sentence)
Most NLP toolkits have a sentence and paragraph boundary detector that can be used to separate a single document into its constituent sentences and paragraphs. The two NLP toolkits I am familiar with are OpenNLP (Open Source) and Alias-i's LingPipe (Commercial) libraries. These toolkits can be used in several ways to achieve what you want. If you are ONLY interested in searching for individual sentences, then you can use the toolkits to create an index of sentences instead of an index of documents. Alternatively, you can encode the sentence boundaries found by these toolkits within the documents you are indexing, for example using special characters or as a separate field in Lucene. After every search, do an extra check to ensure that Lucene did not match across sentence boundaries. --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]