I just ran into an interesting problem today, and wanted to know if it was my understanding or Lucene that was out of whack -- right now I'm leaning toward a fault between the chair and the keyboard.
I attempted a simple phrase query using the StandardAnalyzer:

    "United States"

Against my corpus of test documents, I got no results, which surprised me, because I know the phrase is in there. So I ran the same query in Luke, and it also returned no results. Luke explains:

    PhraseQuery: boost=1.0000, slop=0
    pos[0,1]
    Term 0: field='contents' text='united'
    Term 1: field='contents' text='states'

Now I know Lucene handles phrases, so I tried manually setting the slop to 1, given that there were two terms:

    "United States"~1

...and suddenly I got the results I was expecting! In fact, after a little trial and error with larger phrases, I always get no results unless I *manually* specify a slop value of at least the number of terms minus one.

Isn't this supposed to be the default behavior if no slop is specified? Lucene's standard analyzer, which clearly knows the number of terms, should be able to deduce the minimum slop amount. Why must it be manually specified?

Could I be missing some configuration setting, have a bad understanding of the query syntax, or is there a clever reason I'm not seeing (like searching for encoding synonyms) why a different default slop makes more sense?

Many thanks to all who unravel my confusion.

-wls