Something significant that I've noticed about the default Lucene query parser is that if your user enters a two-word query (typed without quotation marks) like:

temperate climates

... it will get turned into an OR query:

temperate OR climates

This means that a document containing the exact phrase "temperate climates" gets no credit for term proximity over a document containing "temperate emotions may go a long way to keeping the peace as we continue to discuss climate change".

As far as I know, a typical search engine does not ignore the relative positions of terms.

So my question is: how do people typically deal with this when using Lucene? What I want is a query that prefers documents where the search terms are close together but, failing that, still matches documents where the terms simply occur somewhere.

To be clear, the goal isn't just to construct a Query object that does this. It is to hook things up so that a user can type a query into an input box and have the system turn that flat string into a query that, like modern search engines, favors terms that appear close to each other.
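
To make the goal concrete, here is a rough sketch of the kind of rewriting I have in mind. It assumes the classic query parser syntax, where a quoted phrase followed by ~N is a proximity (sloppy phrase) query and ^B is a boost. The class name ProximityQueryRewriter and the slop and boost values are hypothetical choices of mine for illustration, not something Lucene provides, and I don't know whether this is the recommended approach.

```java
public final class ProximityQueryRewriter {

    private ProximityQueryRewriter() {}

    /**
     * Appends a boosted proximity clause to a plain multi-word query.
     * With slop 5 and boost 2, the input
     *     temperate climates
     * becomes
     *     temperate climates "temperate climates"~5^2
     * The original terms still match on their own; the extra clause only
     * adds score when the terms occur within `slop` positions of each other.
     */
    public static String rewrite(String userInput, int slop, int boost) {
        String trimmed = userInput.trim();
        String[] terms = trimmed.split("\\s+");
        if (terms.length < 2) {
            return trimmed; // nothing to bring "close together"
        }
        for (String term : terms) {
            // Assumption: only rewrite plain words. Anything that looks like
            // query syntax (field:term, quotes, wildcards, AND/OR/NOT) is
            // passed through untouched rather than risk mangling it.
            boolean plainWord = term.matches("[\\p{L}\\p{N}]+");
            boolean operator = term.equals("AND") || term.equals("OR") || term.equals("NOT");
            if (!plainWord || operator) {
                return trimmed;
            }
        }
        String joined = String.join(" ", terms);
        return joined + " \"" + joined + "\"~" + slop + "^" + boost;
    }

    public static void main(String[] args) {
        System.out.println(rewrite("temperate climates", 5, 2));
    }
}
```

The rewritten string would then be passed to the query parser as usual, so with the default OR operator a document matching only one term is still returned, while documents with the terms near each other should score higher. The slop of 5 and boost of 2 are arbitrary starting points that would need tuning.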

This is such a "basic" use case for a search system that I'm tempted to think there must be well-worn paths for doing this in Lucene.

Thanks,
Daniel
