[
https://issues.apache.org/jira/browse/LUCENE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16597355#comment-16597355
]
Martin Hermann commented on LUCENE-8196:
----------------------------------------
[~romseygeek]
1) I agree that this might be a solution, but since it differs from the setting
in the paper, it should be done very carefully.
2) Internal slop seems like a great idea! You're right, my example wasn't very
good, and {{Intervals.phrase()}} already does that. Still, if you think of a
bigger query with, e.g., a slop of one (say, {{"a ("big bad" OR evil) wolf", one
additional token allowed somewhere}}), the problem remains. I don't really see
how 'internal slop' would differ from 'normal slop' (doesn't it measure the
exact same thing?), but it seems rather easy to implement, would be desirable
in its own right, and would solve this issue.
3) I'm not quite sure I understand that correctly. Do you mean using a gap
in the query and rewriting it to something like
{noformat}
"bad wolf" (slop 1) contained by "big GAP wolf" (slop 2)
{noformat}
or adding the gap automatically somewhere further down the line? I think that
in the first case it'd still be possible to construct some (maybe slightly more
complicated) examples that can't be solved like that and where the minimal
intervals behaviour does not match intuition.
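To make the mismatch concrete, here is a minimal sketch in plain Java. It does not use the Lucene API: the class {{MinimalIntervalSketch}} and its methods are made up for illustration, and the query {{a ("big bad" OR bad) wolf}} is a hypothetical variant of the example in point 2. It assumes that a disjunction keeps only the minimal intervals of its alternatives, i.e. it drops any interval that contains another one, as in the setting of the paper.

```java
import java.util.ArrayList;
import java.util.List;

public class MinimalIntervalSketch {

    /** Keeps only the intervals that do not contain another, different interval. */
    public static List<int[]> minimal(List<int[]> intervals) {
        List<int[]> result = new ArrayList<>();
        for (int[] candidate : intervals) {
            boolean containsOther = false;
            for (int[] other : intervals) {
                boolean same = other[0] == candidate[0] && other[1] == candidate[1];
                boolean inside = candidate[0] <= other[0] && other[1] <= candidate[1];
                if (!same && inside) {
                    containsOther = true;
                    break;
                }
            }
            if (!containsOther) {
                result.add(candidate);
            }
        }
        return result;
    }

    /** Counts the positions left uncovered between consecutive ordered intervals. */
    public static int gaps(List<int[]> ordered) {
        int total = 0;
        for (int i = 1; i < ordered.size(); i++) {
            total += ordered.get(i)[0] - ordered.get(i - 1)[1] - 1;
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical document "a big bad wolf": a=0, big=1, bad=2, wolf=3.
        List<int[]> disjunction = new ArrayList<>();
        disjunction.add(new int[] {1, 2}); // "big bad"
        disjunction.add(new int[] {2, 2}); // bad
        List<int[]> min = minimal(disjunction);

        // Ordered sequence: a, (surviving alternative), wolf.
        List<int[]> sequence = new ArrayList<>();
        sequence.add(new int[] {0, 0});
        sequence.add(min.get(0));
        sequence.add(new int[] {3, 3});

        System.out.println("alternatives kept: " + min.size());
        System.out.println("gaps in sequence: " + gaps(sequence));
    }
}
```

Under these assumptions only {{bad}} at position 2 survives the disjunction, so the sequence has one gap (at {{big}}) and an exact-phrase reading with slop 0 would reject a document that literally contains "a big bad wolf".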
Again, while a lot of these queries may seem quite exotic, I think that
intervals will get used a lot in various programmatically generated queries (as
spans do now), and there pretty much anything can happen.
> Add IntervalQuery and IntervalsSource to expose minimum interval semantics
> across term fields
> ---------------------------------------------------------------------------------------------
>
> Key: LUCENE-8196
> URL: https://issues.apache.org/jira/browse/LUCENE-8196
> Project: Lucene - Core
> Issue Type: New Feature
> Reporter: Alan Woodward
> Assignee: Alan Woodward
> Priority: Major
> Fix For: 7.4
>
> Attachments: LUCENE-8196-debug.patch, LUCENE-8196.patch,
> LUCENE-8196.patch, LUCENE-8196.patch, LUCENE-8196.patch, LUCENE-8196.patch
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> This ticket proposes an alternative implementation of the SpanQuery family
> that uses minimum-interval semantics from
> [http://vigna.di.unimi.it/ftp/papers/EfficientAlgorithmsMinimalIntervalSemantics.pdf]
> to implement positional queries across term-based fields. Rather than using
> TermQueries to construct the interval operators, as in LUCENE-2878 or the
> current Spans implementation, we instead use a new IntervalsSource object,
> which will produce IntervalIterators over a particular segment and field.
> These are constructed using various static helper methods, and can then be
> passed to a new IntervalQuery which will return documents that contain one or
> more intervals so defined.