phrase, individual term, prefix, fuzzy and stemming search
My current project requires supporting search when a user enters any number of terms across a few indexed fields (movie title, actor, director).

To maximize results, I plan to support all of the search types listed in the subject: phrase, individual term, prefix, fuzzy, and stemming. Of course, getting relevance scores in the right order is also important.

I have considered using the dismax query. However, it does not support prefix queries. I am not sure whether it supports fuzzy queries; my guess is that it does not.

Therefore, I still need to use the standard query. For example, if someone searches for "deim moer" (a typo for "demi moore"), I compare the phrase and each term against each searchable field (title, actor, director):

    title_display: "deim moer"~30  actors: "deim moer"~30  directors: "deim moer"~30   <-- OR

    title_display: deim        <-- OR
    actors: deim
    directors: deim

    title_display: deim*       <-- OR
    actors: deim*
    directors: deim*

    title_display: deim~0.6    <-- OR
    actors: deim~0.6
    directors: deim~0.6

    title_display: moer        <-- OR
    actors: moer
    directors: moer

    title_display: moer*       <-- OR
    actors: moer*
    directors: moer*

    title_display: moer~0.6    <-- OR
    actors: moer~0.6
    directors: moer~0.6

The Solr relevance score is the sum over all of those OR clauses. That way, I can make sure the relevance scores come out in the right order. For example, a document that matches the input exactly ("deim moer") will match the phrase, term, prefix, and fuzzy queries all at the same time. Therefore, it will score higher than a document that matches only the term, prefix, or fuzzy query. At the same time, I can apply a boost to a particular search field if the requirements call for it.

Does this sound right to you? Is there a better way to achieve the same thing? My concern is that the query will not perform well, since it tries to do too much. But isn't that what people want (maximum results) when they just type in a few search words?

Another question: can I combine the results of two queries? For example, first I query for phrase and term matches, and next I query for prefix matches. Can I just append the results of the prefix query to those of the phrase/term query? I thought the two queries have different queryNorm values, so their scores are not comparable and cannot be merged into a single ranking. Is that correct?

Thanks. I would love to hear your thoughts.

--
View this message in context: http://lucene.472066.n3.nabble.com/phrase-inidividual-term-prefix-fuzzy-and-stemming-search-tp239p239.html
Sent from the Solr - User mailing list archive at Nabble.com.
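For readers following along, the query construction described in the message above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names (`escape_term`, `build_query`) are hypothetical, the field names, slop of 30, and fuzzy similarity of 0.6 are taken from the example in the message, and the `~0.6` form is the fractional fuzzy syntax used in the thread (later Lucene/Solr versions express fuzziness as an edit distance instead). The sketch only builds the query string; it does not contact Solr.

```python
import re

# Field names as used in the message above.
FIELDS = ("title_display", "actors", "directors")

# Characters with special meaning in the standard (Lucene) query syntax.
_SPECIAL = re.compile(r'([+\-&|!(){}\[\]^"~*?:\\/])')


def escape_term(term):
    """Backslash-escape query-syntax characters in a single user term."""
    return _SPECIAL.sub(r"\\\1", term)


def build_query(user_input, fields=FIELDS, slop=30, fuzzy=0.6):
    """Build one OR query: phrase, term, prefix and fuzzy clauses per field.

    Hypothetical helper: mirrors the clause list from the message, nothing more.
    """
    terms = [escape_term(t) for t in user_input.split()]
    clauses = []
    if len(terms) > 1:
        # Sloppy phrase match over the whole input, once per field.
        phrase = " ".join(terms)
        clauses += [f'{f}:"{phrase}"~{slop}' for f in fields]
    for t in terms:
        for f in fields:
            clauses.append(f"{f}:{t}")          # exact term
            clauses.append(f"{f}:{t}*")         # prefix
            clauses.append(f"{f}:{t}~{fuzzy}")  # fuzzy
    return " OR ".join(clauses)


if __name__ == "__main__":
    print(build_query("deim moer"))
```

For two input terms and three fields this produces 21 clauses (3 phrase clauses plus 18 per-term clauses), which makes the performance concern raised in the message concrete: the clause count grows with the number of terms times the number of fields.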
Re: phrase, individual term, prefix, fuzzy and stemming search
Hi,

I'll admit I didn't read your email closely, but the first part makes me think that ngrams, which I don't think you mentioned, might be handy for you here, allowing for misspellings without the implementation complexity.

Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/

----- Original Message -----
> From: cyang2010
> To: solr-user@lucene.apache.org
> Sent: Mon, January 31, 2011 5:22:19 PM
> Subject: phrase, individual term, prefix, fuzzy and stemming search
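To make the n-gram suggestion concrete, here is a minimal sketch of how character n-grams let a misspelled term partially match the intended one. It is plain Python and does not reproduce Solr's analysis chain; in Solr the grams would normally be produced at index and query time by an n-gram filter in the field type. The function names are hypothetical and the gram size of 2 is an assumption chosen for illustration.

```python
def char_ngrams(term, n=2):
    """Return the character n-grams of `term`, in order."""
    return [term[i:i + n] for i in range(len(term) - n + 1)]


def shared_ngrams(a, b, n=2):
    """Return the set of n-grams that both strings contain."""
    return set(char_ngrams(a, n)) & set(char_ngrams(b, n))


def ngram_similarity(a, b, n=2):
    """Jaccard overlap of the two n-gram sets (0.0 when both are empty)."""
    grams_a, grams_b = set(char_ngrams(a, n)), set(char_ngrams(b, n))
    if not grams_a and not grams_b:
        return 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


if __name__ == "__main__":
    for typo, intended in [("deim", "demi"), ("moer", "moore")]:
        print(typo, intended,
              sorted(shared_ngrams(typo, intended)),
              round(ngram_similarity(typo, intended), 2))
```

With bigrams, the typos from the thread share only one gram with the intended words ("de" and "mo" respectively), so they still match but weakly. Transposition-style typos are a hard case for n-grams, and the gram sizes would need tuning against real data.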
Re: phrase, individual term, prefix, fuzzy and stemming search
You mentioned that dismax does not support wildcards, but edismax does. I'm not sure whether dismax would have solved your other problems, or whether you had to shift gears only because of the wildcard issue, but you might want to have a look at edismax.

-Jay
http://www.lucidimagination.com

(In reply to cyang2010's message of Mon, Jan 31, 2011 at 2:22 PM.)
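As a rough illustration of the edismax suggestion, the sketch below builds request parameters that spread the user's terms across the three fields with `qf`, add a sloppy phrase boost with `pf` and `ps`, and keep the prefix and fuzzy forms in `q`. Several things here are assumptions rather than anything stated in the thread: the Solr URL is a placeholder, the `^2` boost on `title_display` is illustrative, and the function name is hypothetical. The sketch only builds the URL and has not been run against a live Solr, so how `pf` treats wildcard and fuzzy clauses should be verified before relying on it.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Placeholder endpoint (assumption); adjust to the actual Solr instance.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def edismax_url(user_input, base_url=SOLR_SELECT_URL):
    """Build a select URL that uses the edismax parser (hypothetical helper)."""
    terms = user_input.split()
    # For each term: exact OR prefix OR fuzzy, using the thread's syntax.
    q = " ".join(f"({t} OR {t}* OR {t}~0.6)" for t in terms)
    params = {
        "defType": "edismax",
        # Fields to search; the ^2 boost is an illustrative value.
        "qf": "title_display^2 actors directors",
        # Fields used for phrase boosting, with the slop from the thread.
        "pf": "title_display actors directors",
        "ps": "30",
        "q": q,
    }
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    url = edismax_url("deim moer")
    print(url)
    # Decode the query string again to show the parameters that were sent.
    print(parse_qs(urlparse(url).query))
```

Compared with the hand-built standard query, the per-field expansion moves out of the query string and into `qf`, so the client only has to expand each term once.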