AW: Edismax query parser and phrase queries

2012-12-03 Thread Tantius, Richard
Hi,
the use case we have in mind is exact matching for explicit phrases. Our users 
expect an explicit phrase to respect not only the order of the terms but also 
the exact wording. So if we search on fields whose data type is not meant for 
exact matching, we need to change that for explicit phrases. In a normal query 
our default qf fields use advanced tokenization (at both query and index 
time), for example stemming via SnowballPorterFilterFactory. Our idea was to 
switch the default search fields for explicit phrases to fields with a simple 
data type such as "string" (StrField, without advanced options), so that 
phrases match exactly.

Extending our example from the last mail: 

qf=title text

Data type of title and text, something like "text_advanced":

<fieldtype ...>
 <analyzer type="index"> <!-- (and also analyzer type="query") -->
  <filter class="solr.WordDelimiterFilterFactory" ... />
  <filter class="solr.LowerCaseFilterFactory" />
  <filter class="solr.SnowballPorterFilterFactory" language="German2" />
...

Data type of the additional fields titleExact and textExact:

<fieldType name="string" class="solr.StrField" sortMissingLast="true" 
omitNorms="true"/>

q="ran away from home" Cat Dog

-transformTo-

q=( titleExact:"ran away from home" OR textExact:"ran away from home" ) Cat Dog
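
For illustration, the transformation above could be sketched as a small application-level pre-processing step. This is only a minimal sketch, not the code we actually run: the function name rewrite_phrases and the regular expression are assumptions, and it deliberately ignores field prefixes, escaped quotes and boolean operators, which is exactly where things get complicated.

```python
import re

# Matches an explicit phrase: text between a pair of double quotes.
# Assumption: no escaped quotes inside the phrase.
PHRASE = re.compile(r'"([^"]+)"')


def rewrite_phrases(query, exact_fields=("titleExact", "textExact")):
    """Replace each quoted phrase by an OR over the exact-match fields."""

    def expand(match):
        phrase = match.group(1)
        clauses = ['{}:"{}"'.format(field, phrase) for field in exact_fields]
        return "( " + " OR ".join(clauses) + " )"

    return PHRASE.sub(expand, query)


print(rewrite_phrases('"ran away from home" Cat Dog'))
# ( titleExact:"ran away from home" OR textExact:"ran away from home" ) Cat Dog
```

A query without explicit phrases passes through unchanged, so the normal qf fields keep handling the remaining terms.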

Regards,
Richard.

BINSERV
Gesellschaft für interaktive Konzepte und neue Medien mbH
Software Engineer

Gotenstr. 7-9
53175 Bonn
Tel.: +49 (0)228 / 4 22 86 - 38 
Fax.: +49 (0)228 / 4 22 86 - 538
E-Mail:   r.tant...@binserv.de  
Web:  www.binserv.de
  www.binforcepro.de

Managing Director: Rüdiger Jakob
District Court: Siegburg HRB 6765
Registered office: Pfarrer-Wichert-Str. 35, 53639 Königswinter
This e-mail, including any attached files, contains confidential and/or 
legally protected information. If you are not the intended recipient and have 
received this e-mail in error, you may neither use its contents nor open any 
attached files, nor copy or forward/distribute anything. Please notify the 
sender and delete this e-mail and any attached files immediately. Thank you!






Re: AW: Edismax query parser and phrase queries

2012-12-03 Thread Jack Krupansky
Okay, so the bottom line here is that you wish to change the semantics of 
quoted phrases. Fine, that's your prerogative, but a change in semantics 
would require a change to the query parser, or as you originally indicated, 
a pre-processor. It does sound as if a pre-processor is the way to go here.


You still have a choice: write an application-level preprocessor that 
generates an edismax query, or implement a Solr SearchComponent that 
pre-processes the query after Solr receives it but before edismax sees it. 
The former is probably easier. The only question is whether multiple 
applications access the same Solr node, in which case centralizing the 
pre-processing in Solr might be warranted.
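
As a rough sketch of the application-level option: once the application has rewritten the user's query, it only needs to hand the result to edismax as usual. The base URL, handler path and the function name build_select_url below are assumptions for illustration; adjust them to your own setup.

```python
from urllib.parse import urlencode


def build_select_url(rewritten_query,
                     base_url="http://localhost:8983/solr/select"):
    """Build a select URL that sends an already rewritten query to edismax.

    base_url is a placeholder, not a value taken from this thread.
    """
    params = {
        "defType": "edismax",   # use the extended dismax parser
        "qf": "title text",     # default fields for the non-phrase terms
        "q": rewritten_query,   # output of the application's preprocessor
    }
    # urlencode percent-encodes quotes, colons and parentheses in the query.
    return base_url + "?" + urlencode(params)


url = build_select_url(
    '( titleExact:"ran away from home" OR textExact:"ran away from home" ) Cat Dog'
)
```

With a SearchComponent the same rewriting would happen inside Solr instead, so every client would get it without changes on the application side.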


-- Jack Krupansky





Re: AW: Edismax query parser and phrase queries

2012-12-03 Thread Erick Erickson
It _seems_ like just adding phrase fields (pf) to your edismax defaults
gets you close. It would still match when the field contains more text than
the phrase... but it might be close enough.

Otherwise, why not just add in fq clauses on your exact fields? Because one
problem you'll have is that you need to get the parameters past the parser
to the field, which will be...er...interesting.
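
A minimal sketch of the fq idea, assuming the application pulls the quoted phrases out of the user's query itself and that the exact fields are called titleExact and textExact as in Richard's mail. The helper name exact_phrase_filters is made up for this example, and escaped quotes are not handled.

```python
import re

# Assumption: explicit phrases are delimited by plain double quotes.
PHRASE = re.compile(r'"([^"]+)"')


def exact_phrase_filters(query, exact_fields=("titleExact", "textExact")):
    """Return one fq clause per quoted phrase, ORed over the exact fields."""
    filters = []
    for phrase in PHRASE.findall(query):
        clauses = ['{}:"{}"'.format(field, phrase) for field in exact_fields]
        filters.append(" OR ".join(clauses))
    return filters


user_query = '"ran away from home" Cat Dog'
params = {
    "defType": "edismax",
    "qf": "title text",
    "q": user_query,                          # left untouched for edismax
    "fq": exact_phrase_filters(user_query),   # one filter per explicit phrase
}
```

Each fq narrows the result set without affecting scoring, and the main query stays exactly as the user typed it. Note that against a plain StrField the filter only matches documents whose whole field value equals the phrase.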

And one note: rather than String fields (which are case sensitive),
consider KeywordTokenizer and LowerCaseFilter or some such.
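
To make the difference concrete, here is a toy illustration in plain Python (not Solr code) of what that analysis chain does conceptually: the whole field value stays a single token, and that token is lowercased. The function names are invented for the example, and Python's lower() is only an approximation of Lucene's lowercasing.

```python
def keyword_lowercase(value):
    """Mimic keyword tokenization plus lowercasing: one token per value."""
    return [value.lower()]


def matches_string_field(field_value, phrase):
    """Plain string field: byte-for-byte, case-sensitive comparison."""
    return field_value == phrase


def matches_keyword_lowercase(field_value, phrase):
    """Whole-value comparison after both sides are lowercased."""
    return keyword_lowercase(field_value) == keyword_lowercase(phrase)


# Case differences break the plain string match but not the lowercased one.
print(matches_string_field("Ran Away From Home", "ran away from home"))
print(matches_keyword_lowercase("Ran Away From Home", "ran away from home"))
```

Either way the match is still against the entire field value, so a longer title containing the phrase would not match.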

But I'd _really_ prove that you can't get close enough with current
functionality before I went down the custom route. Often things like this
seem like a good idea but then don't improve results enough to be worth the
complexity.

Best
Erick



Edismax query parser and phrase queries

2012-11-30 Thread Tantius, Richard
Hi,
we are using the edismax query parser and execute queries on specific fields by 
using the qf option. Like others, we face the problem that we do not want 
explicit phrase queries to be performed on some of the qf fields, and we also 
require additional search fields for that kind of query.
We tried to expand explicit phrases in a query by implementing some 
pre-processing logic, which did not seem very convenient.
So for example (let's assume qf=title text, and we want phrase queries to be 
performed on the additional fields titleAlt and textAlt): 
q="ran away from home" Cat Dog -transformTo- 
q=( titleAlt:"ran away from home" OR textAlt:"ran away from home" ) Cat Dog. 
Unfortunately this gets rather complicated if logical operators are involved 
within the query. Is there some kind of best practice? Should we, for example, 
extend the query parser, or stick to our pre-processing approach?

Regards,
Richard.





Re: Edismax query parser and phrase queries

2012-11-30 Thread Jack Krupansky
I don’t have a simple answer for your stated issue, but maybe part of that is 
because I’m not so sure what the exact problem/goal is. I mean, what’s so 
special about phrase queries for your app that they need distinct processing 
from individual terms?

And, ultimately, what goal are you trying to achieve? For instance, how will 
the outcome of the query affect what users see and do?

-- Jack Krupansky
