Re: Exact matching without using new fields

2021-01-21 Thread Alexandre Rafalovitch
If, at index time, your "information" and "informed" are tokenized
into the same root (inform?), then you will not be able to distinguish
them without storing the original forms somewhere, usually with copyField.
The same goes for information vs INFORMATION. The search happens against
the indexed tokens, which you can check on the Analysis screen of the
Admin UI by seeing how your text is indexed.

But if you do store the form you want to find, you have several
options with eDisMax pf2/pf3, or with Surround Query Parser.

Regards,
   Alex.
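
A minimal sketch of the eDisMax option mentioned above, assuming the original
forms are kept in a separate, lightly analyzed field. The field names title
and title_exact, the boost values, and the helper name are hypothetical;
defType, q, qf, pf, pf2 and pf3 are standard eDisMax request parameters. The
code only builds a query string and does not contact a Solr server.

```python
from urllib.parse import urlencode, parse_qs


def build_edismax_params(user_query, analyzed_field="title", exact_field="title_exact"):
    """Build eDisMax parameters that boost documents matching the verbatim phrase.

    Field names and boost values are illustrative assumptions, not taken
    from the thread.
    """
    return {
        "defType": "edismax",
        "q": user_query,
        # Search both the analyzed field and the field holding original forms.
        "qf": f"{analyzed_field} {exact_field}^2",
        # pf boosts documents where all query terms appear close together.
        "pf": f"{exact_field}^10",
        # pf2 / pf3 boost on word pairs / word triples taken from the query.
        "pf2": f"{exact_field}^5",
        "pf3": f"{exact_field}^3",
    }


params = build_edismax_params("information retrieval")
query_string = urlencode(params)
print(query_string)
```

Because pf, pf2 and pf3 only add to the score, documents that match just the
analyzed field are still returned, but they should rank below documents that
contain the phrase in its original form.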


Re: Exact matching without using new fields

2021-01-21 Thread Doss
Hi,

You can try search query -> "+information +retrieval"

This means the document must contain both keywords. Doc 5 will also be in
the results.

https://lucene.apache.org/solr/guide/8_7/the-standard-query-parser.html#the-boolean-operator

- Mohandoss.
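
A small pure-Python illustration of what required terms do, assuming a field
that is only lowercased at index time (no stemming, no synonyms). The analyze
function is a stand-in for a real analysis chain, and doc 6 is a hypothetical
extra document added to show that word order is not enforced.

```python
def analyze(text):
    """Stand-in for an analysis chain: whitespace split plus lowercasing."""
    return text.lower().split()


def matches_all_required(query, doc_text):
    """True if every '+term' in the query occurs somewhere in the document."""
    required = [term[1:] for term in query.split() if term.startswith("+")]
    doc_tokens = set(analyze(doc_text))
    return all(analyze(term)[0] in doc_tokens for term in required)


docs = {
    1: "information retrieval",
    2: "Advanced information retrieval with Solr",
    3: "informed retrieval",
    5: "INFORMATION RETRIEVAL",
    6: "retrieval of information",  # hypothetical, not from the thread
}

query = "+information +retrieval"
hits = [doc_id for doc_id, text in docs.items() if matches_all_required(query, text)]
print(hits)
```

Under these assumptions docs 1, 2, 5 and 6 match: both terms are required,
but neither case nor adjacency is checked. On a field that also stems, doc 3
would presumably match as well.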


Re: Exact matching without using new fields

2021-01-19 Thread gnandre
Thanks for replying, Dave.

I am afraid that I am looking for a query-time solution, not an index-time
one.

Actually, in my case I expect both documents from your example to be
returned. I am just trying to avoid returning documents that match only a
tokenized (analyzed) variant of the search query when the query is enclosed
in double quotes to indicate that an exact match is expected.

e.g.
search query -> "information retrieval"

This should match documents like the following:
doc 1: "information retrieval"
doc 2: "Advanced information retrieval with Solr"

but should NOT match documents like
doc 3: "informed retrieval"
doc 4: "information extraction"  (considering 'extraction' was a specified
synonym of 'retrieval' )
doc 5: "INFORMATION RETRIEVAL"

etc

I am also OK with these documents showing up, as long as they show up at
the bottom. Also, a query-time solution is a must.
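
The five example documents can be run through a toy simulation to show why
they all match on an analyzed field. This is plain Python, not Solr code:
the STEMS and SYNONYMS tables are hypothetical stand-ins for a stemmer and a
synonym filter, and phrase matching is reduced to a contiguous token
comparison.

```python
# Hypothetical stand-ins for a stemming filter and a synonym filter.
STEMS = {"information": "inform", "informed": "inform"}
SYNONYMS = {"extraction": "retrieval"}

DOCS = {
    1: "information retrieval",
    2: "Advanced information retrieval with Solr",
    3: "informed retrieval",
    4: "information extraction",
    5: "INFORMATION RETRIEVAL",
}


def analyze(text):
    """Lowercase, map synonyms, then stem: a typical text-field chain."""
    tokens = []
    for word in text.lower().split():
        word = SYNONYMS.get(word, word)
        tokens.append(STEMS.get(word, word))
    return tokens


def verbatim(text):
    """Split on whitespace only, keeping the original forms."""
    return text.split()


def contains_phrase(doc_tokens, phrase_tokens):
    """True if phrase_tokens occurs as a contiguous run in doc_tokens."""
    n = len(phrase_tokens)
    return any(doc_tokens[i:i + n] == phrase_tokens
               for i in range(len(doc_tokens) - n + 1))


query = "information retrieval"
analyzed_hits = [d for d, t in DOCS.items()
                 if contains_phrase(analyze(t), analyze(query))]
exact_hits = [d for d, t in DOCS.items()
              if contains_phrase(verbatim(t), verbatim(query))]
# Exact matches first, analyzed-only matches at the bottom.
ranked = exact_hits + [d for d in analyzed_hits if d not in exact_hits]
print(analyzed_hits, exact_hits, ranked)
```

In this simulation the analyzed tokens alone cannot separate docs 1 and 2
from docs 3 to 5; the distinction only appears once the original forms are
available to compare against.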


Re: Exact matching without using new fields

2021-01-19 Thread David R
We had the same requirement. Just to echo back your requirements, I understand 
your case to be this. Given these 2 doc titles:

doc 1: "information retrieval"
doc 2: "Advanced information retrieval with Solr"

You want a phrase search for "information retrieval" to find both documents, 
but an EXACT phrase search for "information retrieval" to find doc #1 only.

If that's true, and case-sensitive search isn't a requirement, I indexed the
following in the token stream, with adjacent positions of course:

START information retrieval END
START advanced information retrieval with solr END

And with our custom query parser, when an EXACT operator is found, I tokenize
the query to match the first case. Otherwise I pass it through.

This needs custom analyzers on the query and index sides to generate the
correct token sequences.

It's worked out well for our case.

Dave
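
A rough simulation of the START/END marker idea described above. The custom
query parser and analyzers are not shown in the thread, so everything here is
an assumption: the function names are hypothetical, the EXACT operator is
modeled as a boolean flag, and phrase matching is a contiguous token
comparison in plain Python.

```python
# Marker tokens; a real setup would need values that cannot occur as terms.
START, END = "START", "END"


def index_tokens(title):
    """Index-side analysis: lowercase tokens wrapped in START/END markers."""
    return [START] + title.lower().split() + [END]


def query_tokens(phrase, exact=False):
    """Query-side analysis: add the markers only for an EXACT phrase search."""
    tokens = phrase.lower().split()
    return [START] + tokens + [END] if exact else tokens


def contains_phrase(doc_tokens, phrase_tokens):
    """True if phrase_tokens occurs as a contiguous run in doc_tokens."""
    n = len(phrase_tokens)
    return any(doc_tokens[i:i + n] == phrase_tokens
               for i in range(len(doc_tokens) - n + 1))


titles = {
    1: "information retrieval",
    2: "Advanced information retrieval with Solr",
}
index = {doc_id: index_tokens(title) for doc_id, title in titles.items()}

phrase_hits = [d for d, toks in index.items()
               if contains_phrase(toks, query_tokens("information retrieval"))]
exact_hits = [d for d, toks in index.items()
              if contains_phrase(toks, query_tokens("information retrieval", exact=True))]
print(phrase_hits, exact_hits)
```

The ordinary phrase finds both titles, while the marker-wrapped phrase can
only line up with a title that consists of exactly those terms, which is the
behavior described for doc #1.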





Exact matching without using new fields

2021-01-19 Thread gnandre
Hi,

I am aware that to do exact matching (only whatever is provided inside
double quotes should be matched) in Solr, we can copy existing fields with
the help of copyField into new fields that have very minimal tokenization
or no tokenization (e.g. using KeywordTokenizer or the string field type).

However this solution is expensive in terms of index size because it might
almost double the size of the existing index.

Is there any inexpensive way of achieving exact matches from the query
side, e.g. by boosting the original tokens more at query time compared to
their analyzed forms?
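
For reference, the copyField approach described above could be set up roughly
as follows through the Schema API. This is a sketch under assumptions: the
field names title and title_exact and the helper name are hypothetical, and
the code only builds the JSON body; it does not send it to a server.

```python
import json


def exact_copy_field_commands(source="title", dest="title_exact", field_type="string"):
    """Build Schema API commands that add an untokenized copy of a field.

    The field names are illustrative. "add-field" and "add-copy-field" are
    Schema API commands; the body would be POSTed to the collection's
    /schema endpoint.
    """
    return {
        "add-field": {
            "name": dest,
            "type": field_type,
            "indexed": True,
            # Not storing the copy avoids duplicating the stored value,
            # although the extra indexed terms still add to index size.
            "stored": False,
        },
        "add-copy-field": {"source": source, "dest": dest},
    }


payload = json.dumps(exact_copy_field_commands(), indent=2)
print(payload)
```

Note that a string field matches only the complete field value, so a phrase
inside a longer title would need a text type with minimal analysis instead.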