Hi Erick,

Thanks for your reply; now I understand why I cannot get the search
result. But can you guide me on how to use Lucene to implement the following
search feature?
Basically we can call this feature "fuzzy phrase search", which means the
search phrase may contain more or fewer words than the sentence indexed in
the document. Again I want to use the sample I posted to demonstrate it:

The document contains a sentence "This is a test", and the search phrase is
"This is a formal test", as there is only one word difference between the
two sentences, how can I get the "This is a test" phrase as the search
result?
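
To make the desired behaviour concrete, here is a minimal plain-Java sketch
of the difference between "every query word must be present" (which is how
the phrase query behaves, per Erick's explanation) and "at most N query words
may be missing" (the fuzzy behaviour asked about). This is not Lucene code:
the class FuzzyPhraseSketch and its methods are hypothetical names made up
for illustration, the tokenizer is a naive whitespace split with no stop-word
removal, and word order is ignored. Inside Lucene, one option that may come
close is a BooleanQuery of optional TermQuery clauses combined with
setMinimumNumberShouldMatch, though that also ignores word order.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration only; none of these names come from Lucene.
public class FuzzyPhraseSketch {

    // Naive tokenizer: lowercase, then split on whitespace.
    // A real analyzer would typically also remove stop words.
    static List<String> tokenize(String text) {
        return Arrays.asList(text.toLowerCase(Locale.ROOT).trim().split("\\s+"));
    }

    // Count how many query tokens occur anywhere in the document.
    static int matchedTerms(String document, String query) {
        Set<String> docTerms = new HashSet<>(tokenize(document));
        int matched = 0;
        for (String term : tokenize(query)) {
            if (docTerms.contains(term)) {
                matched++;
            }
        }
        return matched;
    }

    // Strict rule: every query term must exist in the document.
    static boolean allTermsMatch(String document, String query) {
        return matchedTerms(document, query) == tokenize(query).size();
    }

    // Relaxed rule: up to maxMissing query terms may be absent.
    static boolean fuzzyMatch(String document, String query, int maxMissing) {
        int missing = tokenize(query).size() - matchedTerms(document, query);
        return missing <= maxMissing;
    }

    public static void main(String[] args) {
        String doc = "This is a test";
        String query = "This is a formal test";
        System.out.println("strict match: " + allTermsMatch(doc, query));
        System.out.println("fuzzy match (1 missing allowed): " + fuzzyMatch(doc, query, 1));
    }
}
```

With the sample above, the strict rule rejects the document because "formal"
is absent, while the relaxed rule accepts it when one missing word is allowed.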

Thanks.



2010/6/29 Erick Erickson <erickerick...@gmail.com>

> No, I don't think so. The critical bit is that the indexed text
> does NOT contain the word "formal".  So searching for
> any phrase that DOES contain "formal" should fail no matter
> what the slop.
>
> Phrase queries are something like "find all the words in this
> search string, ignoring some number of intervening tokens not in the
> search string". There's nothing in there about "find only some of the
> words in the search string"....
>
> I'm guessing that the original post had a typo in the success case,
> because it's contradicted by "a peng's" second post. It's always
> possible that I'm experiencing a brain short......
>
> Best
> Erick
>
> On Mon, Jun 28, 2010 at 11:32 AM, tarun sapra <t.sapr...@gmail.com> wrote:
>
> > Hey Erick
> >
> > Thanks mate!
> >
> > So I guess my explanation in the mail chain above was correct!
> >
> > On Mon, Jun 28, 2010 at 6:20 AM, Erick Erickson <erickerick...@gmail.com> wrote:
> >
> > > I think you're misunderstanding the intent of PhraseQueries and slop.
> > > Slop is the number of intervening tokens that may exist between the words
> > > you're looking for. However, all the words you're looking for MUST exist.
> > > So,
> > >
> > > <<< whenever the search phrase contains a word that don't
> > > exist in the document, the search result will be empty >>>
> > >
> > > is exactly how this is intended to work.
> > >
> > > HTH
> > > Erick
> > >
> > >
> > > On Mon, Jun 28, 2010 at 9:09 AM, a peng <zhoudengp...@gmail.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > My test result is that whenever the search phrase contains a word that
> > > > doesn't exist in the document, the search result is empty no matter how
> > > > large a slop factor I set. Is this a bug in Lucene, or is it working as
> > > > designed?
> > > >
> > > > 2010/6/28 tarun sapra <t.sapr...@gmail.com>
> > > >
> > > > > Hi,
> > > > >
> > > > > I think I have been able to understand what's happening here...
> > > > >
> > > > > Indexed content: "This is a test".
> > > > > Your search phrase: "This is a formal test"
> > > > > You're setting the slop factor to 2; if your slop factor is 3 it
> > > > > should work. Because "is" and "a" are stop words, the words "This" and
> > > > > "test" are a slop factor of 2 apart in the indexed content, but in your
> > > > > search phrase "This is a formal test" the words "This" and "test" are a
> > > > > slop factor of 3 apart; that's why it's not working.
> > > > > In the search phrase "This is formal test" the words "This" and "test"
> > > > > are a slop factor of 2 apart; that's why this phrase is working.
> > > > >
> > > > >
> > > > >
> > > > > On Mon, Jun 28, 2010 at 11:37 AM, a peng <zhoudengp...@gmail.com> wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > I am using StandardAnalyzer(Version.LUCENE_30);
> > > > > >
> > > > > > 2010/6/27 tarun sapra <t.sapr...@gmail.com>
> > > > > >
> > > > > > > Which analyzer are you using?
> > > > > > >
> > > > > > >
> > > > > > > On Sun, Jun 27, 2010 at 7:12 AM, a peng <zhoudengp...@gmail.com> wrote:
> > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > I know the indexed content contains the following text: "This is a
> > > > > > > > test". The search phrase I used is "This is a formal test", and I
> > > > > > > > set the slop of the PhraseQuery to 2 with setSlop(2), but I found
> > > > > > > > that I cannot get a search result. If I set the search phrase to
> > > > > > > > "This is formal test", then I can get the search result.
> > > > > > > >
> > > > > > > > So what is the problem here? Thanks in advance.
> > > > > > > >
> > > > > > > >
> > > > > > > > Attached is the Javadoc for the setSlop method:
> > > > > > > >
> > > > > > > > public void setSlop(int s)
> > > > > > > >
> > > > > > > > Sets the number of other words permitted between words in query
> > > > > > > > phrase. If zero, then this is an exact phrase search. For larger
> > > > > > > > values this works like a WITHIN or NEAR operator.
> > > > > > > >
> > > > > > > > The slop is in fact an edit-distance, where the units correspond to
> > > > > > > > moves of terms in the query phrase out of position. For example, to
> > > > > > > > switch the order of two words requires two moves (the first move
> > > > > > > > places the words atop one another), so to permit re-orderings of
> > > > > > > > phrases, the slop must be at least two.
> > > > > > > >
> > > > > > > > More exact matches are scored higher than sloppier matches, thus
> > > > > > > > search results are sorted by exactness.
> > > > > > > >
> > > > > > > > The slop is zero by default, requiring exact matches.
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Thanks & Regards
> > > > > > > Tarun Sapra
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Thanks & Regards
> > > > > Tarun Sapra
> > > > >
> > > >
> > >
> >
> >
> >
> > --
> > Thanks & Regards
> > Tarun Sapra
> >
>
