Re: AW: Edismax query parser and phrase queries
It _seems_ like just adding "phrase fields" (pf) to your edismax defaults gets you close. It would have the problem of still matching when the field is longer than the phrase... but it might be "close enough". Otherwise, why not just add in fq clauses on your exact fields?

Because one problem you'll have is that you need to get the parameters past the parser to the field, which will be...er...interesting.

And one note: rather than String fields (which are case sensitive), consider KeywordTokenizerFactory and LowerCaseFilterFactory or some such.

But I'd _really_ prove that you can't get close enough with current functionality before I went down the custom route. Often things like this seem like a good idea but then don't improve results enough to be worth the complexity.

Best
Erick
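The "current functionality" route suggested in this reply (phrase fields plus fq clauses) could look roughly like the request built below. This is only a sketch under assumptions: the helper name build_edismax_params is hypothetical, pf is simply set to the same fields as qf, and the exact-match fields titleExact and textExact are taken from the example elsewhere in this thread. Note that pf only influences scoring, while fq restricts the result set.

```python
from urllib.parse import urlencode


def build_edismax_params(user_query, exact_phrase=None):
    """Build request parameters for an edismax query (hypothetical helper).

    Field names follow the thread's example and are assumptions.
    """
    params = [
        ("defType", "edismax"),
        ("q", user_query),
        ("qf", "title text"),  # fields searched for every term
        ("pf", "title text"),  # phrase fields: boost docs where the terms occur together
    ]
    if exact_phrase is not None:
        # Escape backslashes and quotes so the phrase stays one quoted unit.
        escaped = exact_phrase.replace("\\", "\\\\").replace('"', '\\"')
        # Filter query: keep only documents matching the phrase on the exact fields.
        params.append(
            ("fq", f'titleExact:"{escaped}" OR textExact:"{escaped}"')
        )
    return params


params = build_edismax_params(
    "ran away from home Cat Dog", exact_phrase="ran away from home"
)
query_string = urlencode(params)  # ready to append to a /select URL
```

Whether this is "close enough" depends on the data, which is the point of trying it before writing custom code.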
Re: AW: Edismax query parser and phrase queries
Okay, so the bottom line here is that you wish to change the semantics of quoted phrases. Fine, that's your prerogative, but a change in semantics would require a change to the query parser or, as you originally indicated, a pre-processor. It does sound as if a pre-processor is the way to go here.

You still have a choice: an application-level pre-processor that generates an edismax query, or a Solr SearchComponent that pre-processes the query after Solr receives it but before edismax sees it. The former is probably easier. The only question is whether multiple applications access the same Solr node, in which case centralizing the pre-processing in Solr might be warranted.

-- Jack Krupansky
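An application-level pre-processor of the kind discussed in this thread can be sketched in a few lines. The function name expand_phrases is hypothetical and the field names come from the thread's own example. The sketch deliberately ignores boolean operators, phrases that already carry a field prefix, and escaped quotes.

```python
import re

# Hypothetical exact-match fields, taken from the example in this thread.
EXACT_FIELDS = ("titleExact", "textExact")

# A double-quoted phrase with at least one character inside.
PHRASE = re.compile(r'"([^"]+)"')


def expand_phrases(query, fields=EXACT_FIELDS):
    """Replace each quoted phrase with an OR over the exact-match fields."""

    def repl(match):
        phrase = match.group(1)
        clauses = " OR ".join(f'{field}:"{phrase}"' for field in fields)
        return f"( {clauses} )"

    return PHRASE.sub(repl, query)


print(expand_phrases('"ran away from home" Cat Dog'))
# ( titleExact:"ran away from home" OR textExact:"ran away from home" ) Cat Dog
```

The rewritten string would then be sent as q to edismax, with qf still covering the unquoted terms.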
AW: Edismax query parser and phrase queries
Hi,

the use case we have in mind is that we would like to achieve exact matches for explicit phrases. Our users expect that an explicit phrase not only considers the order of terms, but also the exact wording. Therefore, if we search on fields whose data type is not meant for performing exact matches, we need to change that for explicit phrases. In a usual query, our qf default fields use advanced tokenization (for query processing and indexing), for example stemming via SnowballPorterFilterFactory. So our idea was to change the default search fields for explicit phrases to achieve exact matches by using a simple data type such as "string" (StrField, without advanced options).

Extending our example from the last mail:

qf="title text"

Data type of title, text, something like "text_advanced":

...

Data type of the additional fields titleExact, textExact:

... omitNorms="true"/>

q="ran away from home" Cat Dog

-transformTo->

q=( titleExact:"ran away from home" OR textExact:"ran away from home" ) Cat Dog

Regards,
Richard.

BINSERV
Gesellschaft für interaktive Konzepte und neue Medien mbH
Software Engineer

Gotenstr. 7-9
53175 Bonn
Tel.: +49 (0)228 / 4 22 86 - 38
Fax.: +49 (0)228 / 4 22 86 - 538
E-Mail: r.tant...@binserv.de
Web: www.binserv.de
www.binforcepro.de

Managing Director: Rüdiger Jakob
District Court: Siegburg HRB 6765
Registered office: Pfarrer-Wichert-Str. 35, 53639 Königswinter

This e-mail, including any attached files, contains confidential and/or legally protected information. If you are not the intended recipient and have received this e-mail in error, you may neither use its contents nor open any attached files, nor copy or forward/distribute anything. Please notify the sender and delete this e-mail and any attached files immediately. Thank you!
- Original message -
From: Jack Krupansky [mailto:j...@basetechnology.com]
Sent: Friday, November 30, 2012 23:04
To: solr-user@lucene.apache.org
Subject: Re: Edismax query parser and phrase queries

I don’t have a simple answer for your stated issue, but maybe part of that is because I’m not so sure what the exact problem/goal is. I mean, what’s so special about phrase queries for your app that they need distinct processing from individual terms?

And, ultimately, what goal are you trying to achieve? For example, how will the outcome of the query affect what users see and do?

-- Jack Krupansky

From: Tantius, Richard
Sent: Friday, November 30, 2012 8:44 AM
To: solr-user@lucene.apache.org
Subject: Edismax query parser and phrase queries

Hi,

we are using the edismax query parser and execute queries on specific fields by using the qf option. Like others, we are facing the problem that we do not want explicit phrase queries to be performed on some of the qf fields, and we also require additional search fields for those kinds of queries. We tried to expand explicit phrases in a query by implementing some pre-processing logic, which did not seem very convenient. For example (let's assume qf="title text", and we want phrase queries to be performed on the additional fields "titleAlt textAlt"):

q="ran away from home" Cat Dog

-transformTo->

q=( titleAlt:"ran away from home" OR textAlt:"ran away from home" ) Cat Dog

Unfortunately, this gets rather complicated if logical operators are involved within the query. Is there some kind of best practice? Should we, for example, extend the query parser, or stick to our pre-processing approach?

Regards,
Richard.
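Part of the complication with logical operators can be handled by tokenizing the query instead of doing a blind search-and-replace. The sketch below is one possible approach, not a full query parser: the name rewrite and the field names are assumptions, and it assumes a prohibited or required group such as -( ... ) is accepted by the parser in use. It leaves operators, parentheses, plain terms, and already-fielded phrases untouched, and keeps a leading + or - in front of the expanded group. Phrase slop (for example "a b"~2) and field names containing non-word characters are not handled.

```python
import re

# Hypothetical exact-match fields, taken from the example in this thread.
EXACT_FIELDS = ("titleExact", "textExact")

TOKEN = re.compile(r'''
    (?P<prefix>[+-]?)                 # optional required/prohibited operator
    (?P<field>\w+:)?                  # optional explicit field, e.g. title:
    "(?P<phrase>(?:\\.|[^"\\])+)"     # quoted phrase, backslash escapes allowed
  | (?P<other>[^\s"]+|")              # any other token, or a stray quote
''', re.VERBOSE)


def rewrite(query, fields=EXACT_FIELDS):
    """Expand unfielded quoted phrases; copy everything else unchanged."""

    def repl(match):
        # Non-phrase tokens and phrases with an explicit field stay as they are.
        if match.group("phrase") is None or match.group("field"):
            return match.group(0)
        clauses = " OR ".join(
            f'{field}:"{match.group("phrase")}"' for field in fields
        )
        return f'{match.group("prefix")}({clauses})'

    return TOKEN.sub(repl, query)


print(rewrite('title:"old yeller" AND -"lost dog"'))
print(rewrite('("ran away" OR cat) Dog'))
```

Because whitespace between tokens is never matched, the original spacing and operator order of the query are preserved.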