I just ran into an interesting problem today, and wanted to know if it
was my understanding or Lucene that was out of whack -- right now I'm
leaning toward a fault between the chair and the keyboard.

I attempted to do a simple phrase query using the StandardAnalyzer:
"United States"

Against my corpus of test documents, the query returned no results, which
surprised me. I know the phrase is in there.

So, I ran this same query in Luke, and it also returned no results.

Luke explains:
 PhraseQuery: boost=1.0000, slop=0
 pos[0,1]
 Term 0: field='contents' text='united'
 Term 1: field='contents' text='states'
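Luke's output appears to list the two query terms at relative positions 0 and 1 with slop 0. In other words, a document matches only if "states" sits exactly one position after "united". The sketch below is a hypothetical toy model of that rule, not Lucene's own code. The class name `ExactPhraseSketch` and the term-to-positions map are illustrative assumptions.

```java
import java.util.List;
import java.util.Map;

public class ExactPhraseSketch {
    // Toy model of a slop-0 phrase match: for some start position p of the first
    // term, every term i of the phrase must occur at exactly p + i.
    // 'positions' is a hypothetical per-document map from term to its token positions.
    static boolean exactPhraseMatch(List<String> phrase, Map<String, List<Integer>> positions) {
        if (phrase.isEmpty()) {
            return false;
        }
        for (int start : positions.getOrDefault(phrase.get(0), List.of())) {
            boolean allAdjacent = true;
            for (int i = 1; i < phrase.size(); i++) {
                List<Integer> termPositions = positions.getOrDefault(phrase.get(i), List.of());
                if (!termPositions.contains(start + i)) {
                    allAdjacent = false;
                    break;
                }
            }
            if (allAdjacent) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> phrase = List.of("united", "states");
        // Adjacent positions, as one would normally expect after analysis.
        Map<String, List<Integer>> adjacent = Map.of("united", List.of(0), "states", List.of(1));
        // Hypothetical index where one empty position separates the two tokens.
        Map<String, List<Integer>> gapped = Map.of("united", List.of(0), "states", List.of(2));
        System.out.println(exactPhraseMatch(phrase, adjacent)); // true
        System.out.println(exactPhraseMatch(phrase, gapped));   // false
    }
}
```

Under this model, slop 0 means "strictly adjacent". Any gap between the indexed positions makes the exact phrase fail, even though both terms are present.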

Now I know Lucene handles phrases, so I tried manually setting the
slop to 1, given that there were two terms:  "United States"~1

...and suddenly I got the results I was expecting!

In fact, after a little trial and error with longer phrases, I always
get no results unless I *manually* specify a slop of at least the
number of terms minus one.
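The following is a simplified model of sloppy phrase matching, offered only as an illustration. It is an assumption and not Lucene's actual SloppyPhraseScorer, which handles more cases such as repeated terms. Each term's document position is shifted by its index in the query. The match length is the spread between the largest and smallest shifted positions, and a document matches when that spread is at most the slop. The class `SloppyPhraseSketch` and its inputs are hypothetical. Under this model, if every token had been indexed two positions after the previous one, a phrase of n terms would need a slop of n - 1, which is the same pattern as the one observed here.

```java
import java.util.List;

public class SloppyPhraseSketch {
    // Smallest spread (max - min) of shifted positions over all ways of picking one
    // document position per query term. A term's shifted position is its document
    // position minus its index in the query, so an adjacent phrase has spread 0.
    static int minMatchLength(List<List<Integer>> docPositionsPerTerm) {
        if (docPositionsPerTerm.isEmpty()) {
            return Integer.MAX_VALUE; // no terms: treat as never matching
        }
        return search(docPositionsPerTerm, 0, Integer.MAX_VALUE, Integer.MIN_VALUE);
    }

    // Brute-force recursion over position choices; fine for a toy example only.
    private static int search(List<List<Integer>> terms, int i, int min, int max) {
        if (i == terms.size()) {
            return max - min;
        }
        int best = Integer.MAX_VALUE; // stays MAX_VALUE if this term never occurs
        for (int pos : terms.get(i)) {
            int shifted = pos - i;
            best = Math.min(best, search(terms, i + 1, Math.min(min, shifted), Math.max(max, shifted)));
        }
        return best;
    }

    // A document matches when its tightest arrangement fits within the slop.
    static boolean matches(List<List<Integer>> docPositionsPerTerm, int slop) {
        return minMatchLength(docPositionsPerTerm) <= slop;
    }

    public static void main(String[] args) {
        // Hypothetical positions for a three-term phrase whose tokens were indexed
        // two positions apart (0, 2, 4) instead of adjacently (0, 1, 2).
        List<List<Integer>> spacedByTwo = List.of(List.of(0), List.of(2), List.of(4));
        for (int slop = 0; slop <= 3; slop++) {
            System.out.println("slop=" + slop + " matches=" + matches(spacedByTwo, slop));
        }
    }
}
```

Running `main` prints `false` for slop 0 and 1 and `true` from slop 2 onward. The shifted positions are 0, 1 and 2, so the spread is 2, which is the number of terms minus one.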

Isn't this supposed to be the default behavior if no slop is specified?

Lucene's StandardAnalyzer, which clearly knows the number of terms,
should be able to deduce the minimum slop. Why must it be specified
manually?

Am I missing some configuration setting, do I misunderstand the query
syntax, or is there a clever reason (like searching for encoded
synonyms) why the current default slop makes more sense, one that I'm
not seeing?

Many thanks to all that unravel my confusion.

-wls
