[ 
https://issues.apache.org/jira/browse/LUCENE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16391478#comment-16391478
 ] 

Adrien Grand commented on LUCENE-8196:
--------------------------------------

Thanks Alan. I agree that growing a separate hierarchy of objects might help 
land this feature. We might even want to put first iterations of this work in 
sandbox to give time for the API to stabilize before we move it to core or misc.

I have some questions/comments:
 - Do we need {{IntervalIterator.score()}}? It seems to be the same value on 
all implementations.
 - Do we need {{advanceTo}}? It seems to me that things would be simpler and just 
as efficient if you documented that {{nextPosition()}} may only be called when the 
approximation is positioned; {{advanceTo}} would then be equivalent to 
checking the return value of {{nextInterval}}.
 - Let's make the {{IntervalFunction}} API an implementation detail?
 - The documentation of {{cost()}} says it is the cost of finding the next 
interval but given how you use it in the query it looks like it is actually 
more about the average cost of iterating over _all_ intervals.
 - In terms of testing I would like some form of AssertingIntervalsSource to 
make sure that intervals are always consumed in legal ways and behave correctly.
 - More docs would make the code easier to read. For instance 
{{IntervalsSource.intervals}} has no docs. By the way, we might want to mention 
there that the same instance might be reused across calls.
 - TermIntervalsSource should check whether positions were indexed.
 - I was a bit annoyed to see the field masking hack, but those intervals 
sources do not actually need term statistics, which makes the hack less 
horrible. Could you still document it to make sure users are aware it is a hack, 
and explain in which circumstances it might be ok?
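
To illustrate the AssertingIntervalsSource point above, here is a rough sketch of the kind of wrapper I have in mind. It is not based on the interfaces in the patch: {{SimpleIntervalIterator}}, {{fixed}} and {{Asserting}} are hypothetical, simplified stand-ins, and the rules it enforces (reset before iterating, no calls after exhaustion, starts never going backwards) are assumptions about what "legal" consumption should mean.

```java
public class AssertingIntervalsSketch {

  /** Hypothetical, simplified iterator; NOT the interface from the patch. */
  public interface SimpleIntervalIterator {
    int NO_MORE_INTERVALS = Integer.MAX_VALUE;
    /** Positions the iterator on a document. */
    void reset(int doc);
    /** Moves to the next interval; returns its start or NO_MORE_INTERVALS. */
    int nextInterval();
    int start();
    int end();
  }

  /** Toy iterator over fixed {start, end} pairs, identical for every doc. */
  public static SimpleIntervalIterator fixed(int[][] intervals) {
    return new SimpleIntervalIterator() {
      private int idx = -1;
      @Override public void reset(int doc) { idx = -1; }
      @Override public int nextInterval() {
        idx++;
        return idx < intervals.length ? intervals[idx][0] : NO_MORE_INTERVALS;
      }
      @Override public int start() { return intervals[idx][0]; }
      @Override public int end() { return intervals[idx][1]; }
    };
  }

  /** Wrapper that fails fast when the iterator is consumed illegally. */
  public static final class Asserting implements SimpleIntervalIterator {
    private enum State { UNPOSITIONED, BEFORE_FIRST, ON_INTERVAL, EXHAUSTED }

    private final SimpleIntervalIterator in;
    private State state = State.UNPOSITIONED;
    private int lastStart = -1;

    public Asserting(SimpleIntervalIterator in) { this.in = in; }

    private static void check(boolean ok, String msg) {
      if (!ok) throw new IllegalStateException(msg);
    }

    @Override public void reset(int doc) {
      in.reset(doc);
      state = State.BEFORE_FIRST;
      lastStart = -1;
    }

    @Override public int nextInterval() {
      check(state != State.UNPOSITIONED, "nextInterval() called before reset()");
      check(state != State.EXHAUSTED, "nextInterval() called after exhaustion");
      int start = in.nextInterval();
      if (start == NO_MORE_INTERVALS) {
        state = State.EXHAUSTED;
        return start;
      }
      state = State.ON_INTERVAL;
      // Sanity checks on the wrapped iterator's behaviour (assumed contract).
      check(start == in.start(), "nextInterval() must return start()");
      check(in.start() <= in.end(), "start() is greater than end()");
      check(start >= lastStart, "interval starts went backwards");
      lastStart = start;
      return start;
    }

    @Override public int start() {
      check(state == State.ON_INTERVAL, "start() called while not on an interval");
      return in.start();
    }

    @Override public int end() {
      check(state == State.ON_INTERVAL, "end() called while not on an interval");
      return in.end();
    }
  }
}
```

A test would wrap whatever iterator the source under test returns and then run the usual query tests, so that any out-of-order call surfaces as an {{IllegalStateException}} rather than as a silently wrong result.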

> Add IntervalQuery and IntervalsSource to expose minimum interval semantics 
> across term fields
> ---------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-8196
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8196
>             Project: Lucene - Core
>          Issue Type: New Feature
>            Reporter: Alan Woodward
>            Assignee: Alan Woodward
>            Priority: Major
>         Attachments: LUCENE-8196.patch
>
>
> This ticket proposes an alternative implementation of the SpanQuery family 
> that uses minimum-interval semantics from 
> [http://vigna.di.unimi.it/ftp/papers/EfficientAlgorithmsMinimalIntervalSemantics.pdf]
>  to implement positional queries across term-based fields.  Rather than using 
> TermQueries to construct the interval operators, as in LUCENE-2878 or the 
> current Spans implementation, we instead use a new IntervalsSource object, 
> which will produce IntervalIterators over a particular segment and field.  
> These are constructed using various static helper methods, and can then be 
> passed to a new IntervalQuery which will return documents that contain one or 
> more intervals so defined.



