Allow Scorer to expose positions and payloads aka. nuke spans 
--------------------------------------------------------------

                 Key: LUCENE-2878
                 URL: https://issues.apache.org/jira/browse/LUCENE-2878
             Project: Lucene - Java
          Issue Type: Improvement
          Components: Search
    Affects Versions: Bulk Postings branch
            Reporter: Simon Willnauer
            Assignee: Simon Willnauer
         Attachments: LUCENE-2878.patch

Currently we have two somewhat separate types of queries: those that can make
use of positions (mainly spans) and payloads (spans), and those that cannot. Yet
Span*Query doesn't really do scoring comparable to what other queries do, and at
the end of the day the spans duplicate a lot of code all over Lucene.
Span*Queries are also limited to other Span*Query instances, so you cannot use a
TermQuery or a BooleanQuery with SpanNear or anything like that.
Besides the Span*Query limitation, other queries lack a quite interesting
feature: they cannot score based on term proximity, since scorers don't expose
any positional information. All those problems have bugged me for a while now,
so I started working on that using the bulkpostings API. I would have done that
first cut on trunk, but TermScorer there works on a BlockReader that does not
expose positions, while the one in this branch does. I started by adding a new
Positions class which users can pull from a scorer. To prevent unnecessary
positions enums I added ScorerContext#needsPositions and eventually
Scorer#needsPayloads to create the corresponding enum on demand. Yet, currently
only TermQuery / TermScorer implements this API; others simply return null
instead.
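
To make the idea concrete, here is a minimal, self-contained sketch of pulling
positions from a scorer on demand. It is not the code from the patch: the types
below are hypothetical stand-ins that only borrow the names used above, the
positions() accessor and the array-backed term positions are assumptions, and
Proximity.minDistance is just one illustrative way a caller could use the
positions for proximity scoring.

```java
// Hypothetical stand-ins for illustration; NOT the classes from the patch.

/** Iterates over the positions of a term within the current document. */
interface Positions {
    int NO_MORE_POSITIONS = Integer.MAX_VALUE;

    /** Returns the next position, or NO_MORE_POSITIONS when exhausted. */
    int nextPosition();
}

/** Tells a scorer what the caller needs, so enums are created only on demand. */
final class ScorerContext {
    final boolean needsPositions;

    ScorerContext(boolean needsPositions) {
        this.needsPositions = needsPositions;
    }
}

/** Base scorer: implementations without positional support return null. */
abstract class Scorer {
    Positions positions() {
        return null;
    }
}

/** Assumed term scorer backed by an in-memory array of positions. */
final class TermScorer extends Scorer {
    private final int[] termPositions;
    private final boolean needsPositions;

    TermScorer(int[] termPositions, ScorerContext context) {
        this.termPositions = termPositions;
        this.needsPositions = context.needsPositions;
    }

    @Override
    Positions positions() {
        if (!needsPositions) {
            return null; // the caller never asked for positions
        }
        return new Positions() {
            private int next = 0;

            @Override
            public int nextPosition() {
                return next < termPositions.length ? termPositions[next++] : NO_MORE_POSITIONS;
            }
        };
    }
}

/** A scorer that does not implement the positions API yet. */
final class OtherScorer extends Scorer {
}

final class Proximity {
    /** Smallest distance between any position of a and any position of b. */
    static int minDistance(Positions a, Positions b) {
        int pa = a.nextPosition();
        int pb = b.nextPosition();
        int best = Integer.MAX_VALUE;
        while (pa != Positions.NO_MORE_POSITIONS && pb != Positions.NO_MORE_POSITIONS) {
            best = Math.min(best, Math.abs(pa - pb));
            // Advance the stream that is behind; both are in increasing order.
            if (pa < pb) {
                pa = a.nextPosition();
            } else {
                pb = b.nextPosition();
            }
        }
        return best;
    }
}
```

A caller would build the scorer with new ScorerContext(true) when it wants
positions and has to handle a null result from scorers that do not support
them yet.
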
To show that the API really works and that our BulkPostings work fine with
positions too, I cut over SpanTermQuery to use a TermScorer under the hood and
nuked TermSpans entirely. A nice side effect of this was that the Position
BulkReading implementations got some exercise and now :) all work with
positions, while Payloads for bulkreading are kind of experimental in the patch
and only work with the Standard codec.
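
As a rough illustration of how a single-term span can sit on top of a plain
position stream, consider the sketch below. It is an assumption about the shape
of such an adapter, not the patch itself: PositionSource, ArrayPositions and
TermSpanView are hypothetical names, and the only behavior carried over from
spans is that a term match covers exactly one position, so end is start + 1.

```java
// Hypothetical stand-ins for illustration; NOT the classes from the patch.

/** Stream of increasing term positions for the current document. */
interface PositionSource {
    int NO_MORE_POSITIONS = Integer.MAX_VALUE;

    /** Returns the next position, or NO_MORE_POSITIONS when exhausted. */
    int nextPosition();
}

/** Array-backed position stream, standing in for what a term scorer exposes. */
final class ArrayPositions implements PositionSource {
    private final int[] positions;
    private int next = 0;

    ArrayPositions(int[] positions) {
        this.positions = positions;
    }

    @Override
    public int nextPosition() {
        return next < positions.length ? positions[next++] : NO_MORE_POSITIONS;
    }
}

/** Span-style view over term positions: each match covers [start, start + 1). */
final class TermSpanView {
    private final PositionSource positions;
    private int start = -1;

    TermSpanView(PositionSource positions) {
        this.positions = positions;
    }

    /** Moves to the next match; returns false once the positions are exhausted. */
    boolean next() {
        int p = positions.nextPosition();
        if (p == PositionSource.NO_MORE_POSITIONS) {
            return false;
        }
        start = p;
        return true;
    }

    int start() {
        return start;
    }

    int end() {
        return start + 1;
    }
}
```

With an adapter like this, the span side keeps no postings logic of its own and
only translates positions into start/end pairs.
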

So all spans now work on top of TermScorer (I truly hate spans since today),
including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother
to implement the other codecs yet since I want to get feedback on the API and
on this first cut before I go on with it. I will upload the corresponding
patch in a minute.

I also had to cut over SpanQuery.getSpans(IR) to
SpanQuery.getSpans(AtomicReaderContext), which I should probably do on trunk
first, but after that pain today I need a break :).

The patch passes all core tests
(org.apache.lucene.search.highlight.HighlighterTest still fails, but I haven't
looked into the MemoryIndex BulkPostings API yet).



