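You might get some joy from WhitespaceAnalyzer, but beware of case and punctuation: it splits only on whitespace, so "System", "system," and "system" end up as three different terms. You could pre-process both the text you index and your query strings to lowercase them and remove non-alphanumerics.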
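A minimal plain-Java sketch of that pre-processing idea, assuming you run the same normalize() over each document's text before indexing it into a field analyzed with WhitespaceAnalyzer, and over each query string before building the phrase query. The class and method names are hypothetical and are not Lucene APIs. containsPhrase() is only a naive stand-in for Lucene's exact phrase matching, used here to check the three documents from the question quoted below.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helpers (not part of Lucene) illustrating the normalization to
// apply to both indexed text and query strings before WhitespaceAnalyzer sees them.
public class PhrasePreprocess {

    // Lowercase, then collapse every run of characters that is neither a letter
    // nor a digit into a single space. Digits are kept, so "11" survives.
    static String normalize(String text) {
        String lower = text.toLowerCase(Locale.ROOT);
        return lower.replaceAll("[^\\p{L}\\p{N}]+", " ").trim();
    }

    // Split normalized text on whitespace, mimicking WhitespaceAnalyzer's tokenizing.
    static List<String> tokens(String text) {
        List<String> result = new ArrayList<>();
        String normalized = normalize(text);
        if (normalized.isEmpty()) {
            return result;
        }
        for (String t : normalized.split("\\s+")) {
            result.add(t);
        }
        return result;
    }

    // Naive exact-phrase check: true if the query tokens occur as a contiguous
    // run in the document tokens. Illustration only, not how Lucene does it.
    static boolean containsPhrase(String doc, String query) {
        List<String> d = tokens(doc);
        List<String> q = tokens(query);
        if (q.isEmpty()) {
            return false;
        }
        for (int i = 0; i + q.size() <= d.size(); i++) {
            if (d.subList(i, i + q.size()).equals(q)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] docs = {
            "x 11 windowing system",
            "x windowing system",
            "the x 11 windowing system"
        };
        for (String query : new String[] {"x 11 windowing system", "the x 11"}) {
            StringBuilder line = new StringBuilder(query).append(" ->");
            for (int i = 0; i < docs.length; i++) {
                if (containsPhrase(docs[i], query)) {
                    line.append(" doc").append(i + 1);
                }
            }
            System.out.println(line);
        }
    }
}
```

Running main prints "x 11 windowing system -> doc1 doc3" and "the x 11 -> doc3", which is the behaviour asked for. The key point is that the number 11 is kept as a token, which is exactly what SimpleAnalyzer was losing.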
Or you could create your own analyzer: see SynonymAnalyzer in Lucene in Action, and there's another example here: http://mext.at/?p=26. The idea is to chain together some number of Filters, starting with a Tokenizer that "does the right thing", and wrap them in your own Analyzer. But as far as I know, there's nothing out of the box that does exactly what you want.

Best,
Erick

On Wed, Mar 17, 2010 at 4:25 PM, Joachim De Beule <joac...@arti.vub.ac.be> wrote:

> Hi All,
>
> I have a corpus of documents which I want to search for phrases. I only want
> to get those documents that exactly contain a phrase. For example if:
> doc1 = "x 11 windowing system"
> doc2 = "x windowing system"
> doc3 = "the x 11 windowing system"
>
> then I want the query "x 11 windowing system" to return only doc1 and doc3 and
> the query "the x 11" to return only doc3.
>
> I have tried to use SimpleAnalyzer together with using the query as a single
> phrase, but this still also gives doc2 for the first example query because this
> analyzer discards the number 11. There does not seem to be an alternative
> analyzer for this however, and I don't know how to write one myself.
>
> Is there a standard way of doing this?
>
> Thanks!
>
> Joachim.