[
https://issues.apache.org/jira/browse/SOLR-1980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13043399#comment-13043399
]
Jan Høydahl commented on SOLR-1980:
-----------------------------------
I'm sure I can get it working the way I started, using a CharFilter. However,
it may be possible to implement this with a more generic, Lucene-like query
syntax that uses position info from the index:
{code}
title:"quick fox"@N:M
{code}
This would mean that the phrase must be anchored between the N'th and M'th
token positions in the field. Negative values for N/M would be relative to the
end of the field. Thus "^quick fox$" could be written as:
{code}
title:"quick fox"@0:-0
{code}
Or, if you require the phrase to be within the first 10 words OR the last 10 words:
{code}
title:("quick fox"@0:10 OR "quick fox"@-10:-0)
{code}
Requiring a term to be exactly at position 3 would be:
{code}
title:fox@3:3
{code}
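To make the window semantics concrete, here is a minimal Python sketch of one possible reading of the proposed @N:M suffix. It is not Lucene code; the names resolve_bound and phrase_in_window are hypothetical. It assumes that positions are 0-based, that the window is inclusive, that "-K" counts back from the last token (so "-0" is the last position), and that a phrase matches when it lies wholly inside the window. Under that containment reading, @0:-0 alone would not force a whole-field match; that would also require the phrase to start at N and end at M, which the proposal above does not spell out.

```python
def resolve_bound(bound: str, length: int) -> int:
    """Turn a bound such as "3", "-10" or "-0" into an absolute token position.

    Assumption: a leading "-" counts back from the last token, so "-0" is the
    last position (length - 1). The bound stays a string because the integer
    -0 cannot be told apart from 0.
    """
    if bound.startswith("-"):
        return (length - 1) - int(bound[1:])
    return int(bound)


def phrase_in_window(tokens, phrase, n, m):
    """Return True if `phrase` occurs wholly inside positions n..m (inclusive)."""
    # Clamp the window to the positions that actually exist in the field.
    lo = max(resolve_bound(n, len(tokens)), 0)
    hi = min(resolve_bound(m, len(tokens)), len(tokens) - 1)
    width = len(phrase)
    # The phrase occupies start..start+width-1, which must not exceed hi.
    for start in range(lo, hi - width + 2):
        if tokens[start:start + width] == phrase:
            return True
    return False


tokens = "a quick fox is brown".split()
print(phrase_in_window(tokens, ["quick", "fox"], "0", "10"))  # first words
print(phrase_in_window(tokens, ["fox"], "3", "3"))            # exactly position 3
print(phrase_in_window(tokens, ["brown"], "-0", "-0"))        # last position
```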
If this syntax is feasible, we could use the same syntax in eDisMax's pf param
to tell it to add a position constraint when forming the pf part of the
query:
{code}
pf=title@0:-0
{code}
This would only generate a phrase match on title if the phrase is an exact
match of the whole field.
Are there potential issues with multi-valued fields? Is the delimiter between
values clearly marked, or is it only a position increment gap?
Would it be easy to parse such a syntax and generate a Lucene query with the
position constraints?
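As a rough indication of the parsing effort, a single clause of the proposed form can be recognised with one regular expression. The Python sketch below is only an illustration. PositionClause and parse_clause are hypothetical names, whitespace tokenisation stands in for a real analyzer, and the grouped form title:(... OR ...) is not handled. The bounds are kept as strings so that "-0" is not collapsed to 0. Turning the parsed clause into an actual Lucene query is left out.

```python
import re
from typing import NamedTuple


class PositionClause(NamedTuple):
    """Hypothetical parse result for field:"phrase"@N:M or field:term@N:M."""
    field: str
    terms: list
    start: str  # kept as text so that "-0" stays distinct from "0"
    end: str


CLAUSE = re.compile(
    r'(?P<field>\w+):'                          # field name, e.g. title
    r'(?:"(?P<phrase>[^"]+)"|(?P<term>\w+))'    # quoted phrase or single term
    r'@(?P<start>-?\d+):(?P<end>-?\d+)$'        # position window N:M
)


def parse_clause(text):
    """Parse one clause, raising ValueError if it does not fit the sketch."""
    match = CLAUSE.match(text)
    if match is None:
        raise ValueError("not a position-constrained clause: %r" % text)
    words = (match.group("phrase") or match.group("term")).split()
    return PositionClause(match.group("field"), words,
                          match.group("start"), match.group("end"))


print(parse_clause('title:"quick fox"@0:-0'))
print(parse_clause('title:fox@3:3'))
```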
> Implement boundary match support
> --------------------------------
>
> Key: SOLR-1980
> URL: https://issues.apache.org/jira/browse/SOLR-1980
> Project: Solr
> Issue Type: New Feature
> Components: Schema and Analysis
> Reporter: Jan Høydahl
>
> Sometimes you need to specify that a query should match only at the start or
> end of a field, or be an exact match.
> Example content:
> 1) a quick fox is brown
> 2) quick fox is brown
> Example queries:
> "^quick fox" -> should only match 2)
> "brown$" -> should match 1) and 2)
> "^quick fox is brown$" -> should only match 2)
> The proposed implementation is a new BoundaryMatchTokenFilter
> which behaves like this:
> On the index side, it inserts special unique tokens at the beginning and end
> of the field. These could be some weird Unicode sequence.
> On the query side, it looks for the first character matching "^" or the last
> character matching "$" and replaces them with the special tokens.
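
The behaviour described in the issue can be simulated outside Lucene with a short Python sketch. This is not the BoundaryMatchTokenFilter itself. The sentinel values START and END, the helper names, and the whitespace tokenisation are all assumptions made for illustration; the issue only asks for "some weird unicode sequence".

```python
# Hypothetical sentinel tokens taken from the Unicode private use area.
START = "\ue000"
END = "\ue001"


def index_tokens(text):
    """Index side: wrap the field's tokens in the two sentinel tokens."""
    return [START] + text.split() + [END]


def query_tokens(query):
    """Query side: replace a leading "^" and a trailing "$" with sentinels."""
    anchored_start = query.startswith("^")
    anchored_end = query.endswith("$")
    core = query[1:] if anchored_start else query
    core = core[:-1] if anchored_end else core
    tokens = core.split()
    if anchored_start:
        tokens.insert(0, START)
    if anchored_end:
        tokens.append(END)
    return tokens


def phrase_matches(doc, query):
    """Plain phrase match of the query tokens against the indexed tokens."""
    haystack = index_tokens(doc)
    needle = query_tokens(query)
    width = len(needle)
    return any(haystack[i:i + width] == needle
               for i in range(len(haystack) - width + 1))


docs = ["a quick fox is brown", "quick fox is brown"]
for query in ["^quick fox", "brown$", "^quick fox is brown$"]:
    print(query, [phrase_matches(doc, query) for doc in docs])
```

With the two example documents from the issue, "^quick fox" and "^quick fox is brown$" match only the second document, while "brown$" matches both, because the sentinels turn each anchor into an ordinary phrase position.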