[jira] Commented: (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans

Simon Willnauer (JIRA) Sat, 22 Jan 2011 07:39:09 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12985138#action_12985138
 ]


Simon Willnauer commented on LUCENE-2878:
-----------------------------------------

First of all thanks for the reviews so far... seems like the overall feedback 
is positive though. I should have made my point more clear that the most of 
this patch is a PoC more than anything else. The changes to spans are rather to 
proof correctness and give the bulk api a little exercise.

bq. I don't like the MultiSpansWrapper... it only exists to support 
PayloadSpanUtil and tests? I don't like PayloadSpanUtil either, i think its a 
huge trap and it should move to src/test or be removed.

The reason for MultSpansWrapper is pretty simple. MultiBulkEnum is not entirely 
implemented yet so I had to move the Spans to per-segment using 
AtomicReaderContext. What I needed was an easy way to cut over all the tests 
and that seemed to be straight forward. Its really just a testing class and 
should go away over time.

bq. any ideas of the performance comparison of this patch as-is 
(SpanScorerWrapper over the Scorer/bulk enum) versus the existing Spans which 
doesn't use the bulk API?
not yet, once I get back to this I will run a benchmark

bq. A first impression, no time to read the whole patch yet.
my plans are to cut over spans to AtomicREaderContext on trunk so this patch 
will be way way smaller and easier to read maybe you should just skip all the 
span stuff for now. those are just ARC based changes

bq. PayloadSpanUtil

<dream> Spans as they exist today should go away anyway so this is not much of 
a deal for now.</dream>

bq. The name Positions is understated though. It has a begin and an end that 
can be iterating/iterated in place, so it is actually "a" spans.
I agree we might need to go away from this start end thing and have a wrapper 
on top of Position(s)

bq. I hope that the addition to TermScorer does not slow it down.
Those positions == null checks are well predictable and should be optimzed away 
- if hotspot plays tricks on us we can still specialize a scorer....




> Allow Scorer to expose positions and payloads aka. nuke spans 
> --------------------------------------------------------------
>
>                 Key: LUCENE-2878
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2878
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>    Affects Versions: Bulk Postings branch
>            Reporter: Simon Willnauer
>            Assignee: Simon Willnauer
>         Attachments: LUCENE-2878.patch
>
>
> Currently we have two somewhat separate types of queries, the one which can 
> make use of positions (mainly spans) and payloads (spans). Yet Span*Query 
> doesn't really do scoring comparable to what other queries do and at the end 
> of the day they are duplicating lot of code all over lucene. Span*Queries are 
> also limited to other Span*Query instances such that you can not use a 
> TermQuery or a BooleanQuery with SpanNear or anthing like that. 
> Beside of the Span*Query limitation other queries lacking a quiet interesting 
> feature since they can not score based on term proximity since scores doesn't 
> expose any positional information. All those problems bugged me for a while 
> now so I stared working on that using the bulkpostings API. I would have done 
> that first cut on trunk but TermScorer is working on BlockReader that do not 
> expose positions while the one in this branch does. I started adding a new 
> Positions class which users can pull from a scorer, to prevent unnecessary 
> positions enums I added ScorerContext#needsPositions and eventually 
> Scorere#needsPayloads to create the corresponding enum on demand. Yet, 
> currently only TermQuery / TermScorer implements this API and other simply 
> return null instead. 
> To show that the API really works and our BulkPostings work fine too with 
> positions I cut over TermSpanQuery to use a TermScorer under the hood and 
> nuked TermSpans entirely. A nice sideeffect of this was that the Position 
> BulkReading implementation got some exercise which now :) work all with 
> positions while Payloads for bulkreading are kind of experimental in the 
> patch and those only work with Standard codec. 
> So all spans now work on top of TermScorer ( I truly hate spans since today ) 
> including the ones that need Payloads (StandardCodec ONLY)!!  I didn't bother 
> to implement the other codecs yet since I want to get feedback on the API and 
> on this first cut before I go one with it. I will upload the corresponding 
> patch in a minute. 
> I also had to cut over SpanQuery.getSpans(IR) to 
> SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk 
> first but after that pain today I need a break first :).
> The patch passes all core tests 
> (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't 
> look into the MemoryIndex BulkPostings API yet)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans

Reply via email to