No, I really don't have a good solution off the top of my head, perhaps others can chime in...
Although I suppose you could fire multiple queries dropping out some number of search terms, but I don't know whether that satisfies your requirements. E.g. search the following phrases "this is a formal test" "this is a formal" "this is a test" "this is formal test" "this a formal test" "is a formal test" and so on... Best Erick On Tue, Jun 29, 2010 at 8:30 PM, a peng <zhoudengp...@gmail.com> wrote: > Hi Erick, > > Any comments about this requirement? > > 2010/6/29 a peng <zhoudengp...@gmail.com> > > > Hi Erick, > > > > Thanks for you reply, now I get the point why I can not get the search > > result. But can you guide me how can I use Lucene to implement the > following > > search feature: > > Basically we can call this feature "fuzzy phrase search", which means the > > search phrase may contains more words or less words compared to the > > sentences indexed in the document. Again I want to use the sample I > posted > > to demo it: > > > > The document contains a sentence "This is a test", and the search phrase > is > > "This is a formal test", as there is only one word difference between the > > two sentences, how can I get the "This is a test" phrase as the search > > result? > > > > Thanks. > > > > > > > > 2010/6/29 Erick Erickson <erickerick...@gmail.com> > > > > No, I don't think so. The critical bit is that the indexed text > >> does NOT contain the word "formal". So searching for > >> any phrase that DOES contain "formal" should fail no matter > >> what the slop. > >> > >> Phrase queries are something like "find all the words in this > >> search string, ignoring some number of intervening tokens not in the > >> search string". There's nothing in there about "find only some of the > >> words in the search string".... > >> > >> I'm guessing that the original post had a typo in the success case, > >> because it's contradicted by "a peng's" second post. It's always > >> possible that I'm experiencing a brain short...... > >> > >> Best > >> Erick > >> > >> On Mon, Jun 28, 2010 at 11:32 AM, tarun sapra <t.sapr...@gmail.com> > >> wrote: > >> > >> > Hey Erick > >> > > >> > Thanks mate! > >> > > >> > So I guess my explanation in the mail chain above was correct! > >> > > >> > On Mon, Jun 28, 2010 at 6:20 AM, Erick Erickson < > >> erickerick...@gmail.com > >> > >wrote: > >> > > >> > > I think you're misunderstanding the intent of PhraseQueries and > slop. > >> > Slop > >> > > is the number of intervening tokens that may exist between the words > >> > > you're looking for. However, all the words you're looking for MUST > >> exist. > >> > > So, > >> > > > >> > > <<< whenever the search phrase contains a word that don't > >> > > exist in the document, the search result will be empty >>> > >> > > > >> > > is exactly how this is intended to work. > >> > > > >> > > HTH > >> > > Erick > >> > > > >> > > > >> > > On Mon, Jun 28, 2010 at 9:09 AM, a peng <zhoudengp...@gmail.com> > >> wrote: > >> > > > >> > > > Hi, > >> > > > > >> > > > My test result is that whenever the search phrase contains a word > >> that > >> > > > don't > >> > > > exist in the document, the search result will be empty no matter > how > >> > big > >> > > > the > >> > > > slop factor I set, seems this is a bug of Lucene, or it is work as > >> > > design? > >> > > > > >> > > > 2010/6/28 tarun sapra <t.sapr...@gmail.com> > >> > > > > >> > > > > Hi , > >> > > > > > >> > > > > I think I have been able to understand whats happening here... > >> > > > > > >> > > > > Indexed Content : "This is a test". > >> > > > > your search phrase : "This is a formal test" > >> > > > > your setting the slop factor 2 , now if your slop factor is 3 it > >> > should > >> > > > > work > >> > > > > because "is" and "a" are stop words thus the words "This" and > >> "test" > >> > > are > >> > > > 2 > >> > > > > slop factor apart but in your search phrase "This is a formal > >> test" > >> > the > >> > > > > words "This" and "test" are 3 slop factor thats why it's nor > >> working > >> > > > > now in search phrase "This is formal test" the words "This" and > >> > "test" > >> > > > are > >> > > > > 2 > >> > > > > slop factor apart thats why this phrase is working. > >> > > > > > >> > > > > > >> > > > > > >> > > > > On Mon, Jun 28, 2010 at 11:37 AM, a peng < > zhoudengp...@gmail.com> > >> > > wrote: > >> > > > > > >> > > > > > Hi, > >> > > > > > > >> > > > > > I am using StandardAnalyzer(Version.LUCENE_30); > >> > > > > > > >> > > > > > 2010/6/27 tarun sapra <t.sapr...@gmail.com> > >> > > > > > > >> > > > > > > which analyzer are you usin'? > >> > > > > > > > >> > > > > > > > >> > > > > > > On Sun, Jun 27, 2010 at 7:12 AM, a peng < > >> zhoudengp...@gmail.com> > >> > > > > wrote: > >> > > > > > > > >> > > > > > > > Hi, > >> > > > > > > > > >> > > > > > > > I know the indexed content contains the following text: > >> "This > >> > is > >> > > a > >> > > > > > test". > >> > > > > > > > And the search phrase I used is "This is a formal test", > and > >> > then > >> > > I > >> > > > > set > >> > > > > > > the > >> > > > > > > > slop of the PhraseQuery as 2 with setSlop(2), but I found > >> that > >> > I > >> > > > can > >> > > > > > not > >> > > > > > > > get > >> > > > > > > > a search result. If I set the search phrase as "This is > >> formal > >> > > > test", > >> > > > > > > then > >> > > > > > > > I > >> > > > > > > > can get the search result. > >> > > > > > > > > >> > > > > > > > So what is the problem here, thanks in advance. > >> > > > > > > > > >> > > > > > > > > >> > > > > > > > Attached is the Java doc for the setSlop method: > >> > > > > > > > > >> > > > > > > > public void *setSlop*(int s) > >> > > > > > > > > >> > > > > > > > Sets the number of other words permitted between words in > >> query > >> > > > > phrase. > >> > > > > > > If > >> > > > > > > > zero, then this is an exact phrase search. For larger > values > >> > this > >> > > > > works > >> > > > > > > > like > >> > > > > > > > a WITHIN or NEAR operator. > >> > > > > > > > > >> > > > > > > > The slop is in fact an edit-distance, where the units > >> > correspond > >> > > to > >> > > > > > moves > >> > > > > > > > of > >> > > > > > > > terms in the query phrase out of position. For example, to > >> > switch > >> > > > the > >> > > > > > > order > >> > > > > > > > of two words requires two moves (the first move places the > >> > words > >> > > > atop > >> > > > > > one > >> > > > > > > > another), so to permit re-orderings of phrases, the slop > >> must > >> > be > >> > > at > >> > > > > > least > >> > > > > > > > two. > >> > > > > > > > > >> > > > > > > > More exact matches are scored higher than sloppier > matches, > >> > thus > >> > > > > search > >> > > > > > > > results are sorted by exactness. > >> > > > > > > > > >> > > > > > > > The slop is zero by default, requiring exact matches. > >> > > > > > > > > >> > > > > > > > >> > > > > > > > >> > > > > > > > >> > > > > > > -- > >> > > > > > > Thanks & Regards > >> > > > > > > Tarun Sapra > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > > >> > > > > > >> > > > > -- > >> > > > > Thanks & Regards > >> > > > > Tarun Sapra > >> > > > > > >> > > > > >> > > > >> > > >> > > >> > > >> > -- > >> > Thanks & Regards > >> > Tarun Sapra > >> > > >> > > > > >