[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16433796#comment-16433796 ] Adrien Grand commented on LUCENE-2878: -- +1 > Allow Scorer to expose positions and payloads aka. nuke spans > -- > > Key: LUCENE-2878 > URL: https://issues.apache.org/jira/browse/LUCENE-2878 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: Positions Branch >Reporter: Simon Willnauer >Assignee: Robert Muir >Priority: Major > Labels: gsoc2014 > Fix For: Positions Branch > > Attachments: LUCENE-2878-OR.patch, LUCENE-2878-vs-trunk.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878_trunk.patch, > LUCENE-2878_trunk.patch, PosHighlighter.patch, PosHighlighter.patch > > > Currently we have two somewhat separate types of queries, the one which can > make use of positions (mainly spans) and payloads (spans). Yet Span*Query > doesn't really do scoring comparable to what other queries do and at the end > of the day they are duplicating lot of code all over lucene. Span*Queries are > also limited to other Span*Query instances such that you can not use a > TermQuery or a BooleanQuery with SpanNear or anthing like that. > Beside of the Span*Query limitation other queries lacking a quiet interesting > feature since they can not score based on term proximity since scores doesn't > expose any positional information. All those problems bugged me for a while > now so I stared working on that using the bulkpostings API. I would have done > that first cut on trunk but TermScorer is working on BlockReader that do not > expose positions while the one in this branch does. I started adding a new > Positions class which users can pull from a scorer, to prevent unnecessary > positions enums I added ScorerContext#needsPositions and eventually > Scorere#needsPayloads to create the corresponding enum on demand. Yet, > currently only TermQuery / TermScorer implements this API and other simply > return null instead. > To show that the API really works and our BulkPostings work fine too with > positions I cut over TermSpanQuery to use a TermScorer under the hood and > nuked TermSpans entirely. A nice sideeffect of this was that the Position > BulkReading implementation got some exercise which now :) work all with > positions while Payloads for bulkreading are kind of experimental in the > patch and those only work with Standard codec. > So all spans now work on top of TermScorer ( I truly hate spans since today ) > including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother > to implement the other codecs yet since I want to get feedback on the API and > on this first cut before I go one with it. I will upload the corresponding > patch in a minute. > I also had to cut over SpanQuery.getSpans(IR) to > SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk > first but after that pain today I need a break first :). > The patch passes all core tests > (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't > look into the MemoryIndex BulkPostings API yet) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16433777#comment-16433777 ] Alan Woodward commented on LUCENE-2878: --- This ticket has changed direction wildly a few times, but always had two basic requirements: * allow positional queries with a nicer API and a more concrete theoretic footing than Spans * make it possible to build accurate highlighters We now have IntervalQuery (LUCENE-8196) that does the first, and Matches (LUCENE-8229) that does the second. There's still work to be done on both (I don't think we're quite ready to nuke Spans, and phrase/span/interval queries don't report their matches yet), but I think we can close this issue out now? > Allow Scorer to expose positions and payloads aka. nuke spans > -- > > Key: LUCENE-2878 > URL: https://issues.apache.org/jira/browse/LUCENE-2878 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: Positions Branch >Reporter: Simon Willnauer >Assignee: Robert Muir >Priority: Major > Labels: gsoc2014 > Fix For: Positions Branch > > Attachments: LUCENE-2878-OR.patch, LUCENE-2878-vs-trunk.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878_trunk.patch, > LUCENE-2878_trunk.patch, PosHighlighter.patch, PosHighlighter.patch > > > Currently we have two somewhat separate types of queries, the one which can > make use of positions (mainly spans) and payloads (spans). Yet Span*Query > doesn't really do scoring comparable to what other queries do and at the end > of the day they are duplicating lot of code all over lucene. Span*Queries are > also limited to other Span*Query instances such that you can not use a > TermQuery or a BooleanQuery with SpanNear or anthing like that. > Beside of the Span*Query limitation other queries lacking a quiet interesting > feature since they can not score based on term proximity since scores doesn't > expose any positional information. All those problems bugged me for a while > now so I stared working on that using the bulkpostings API. I would have done > that first cut on trunk but TermScorer is working on BlockReader that do not > expose positions while the one in this branch does. I started adding a new > Positions class which users can pull from a scorer, to prevent unnecessary > positions enums I added ScorerContext#needsPositions and eventually > Scorere#needsPayloads to create the corresponding enum on demand. Yet, > currently only TermQuery / TermScorer implements this API and other simply > return null instead. > To show that the API really works and our BulkPostings work fine too with > positions I cut over TermSpanQuery to use a TermScorer under the hood and > nuked TermSpans entirely. A nice sideeffect of this was that the Position > BulkReading implementation got some exercise which now :) work all with > positions while Payloads for bulkreading are kind of experimental in the > patch and those only work with Standard codec. > So all spans now work on top of TermScorer ( I truly hate spans since today ) > including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother > to implement the other codecs yet since I want to get feedback on the API and > on this first cut before I go one with it. I will upload the corresponding > patch in a minute. > I also had to cut over SpanQuery.getSpans(IR) to > SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk > first but after that pain today I need a break first :). > The patch passes all core tests > (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't > look into the MemoryIndex BulkPostings API yet) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14967504#comment-14967504 ] Alan Woodward commented on LUCENE-2878: --- It's a bit disturbing quite how long it takes to scroll to the bottom of this JIRA now... Spans have received a lot of attention over the past year or so (many thanks to [~paul.elsc...@xs4all.nl]!), so they certainly aren't going to be removed. It's also become clear to me that there are many Scorers that it doesn't make sense to expose positions on. My rough plan of action from here is: * LUCENE-6845: make Spans *be* a Scorer * Move the basic SpanQueries out of the a.o.l.search.spans package and into a.o.l.search ** some of these queries can then move into the queries module, eg SpanFieldMaskingQuery, but I think it's worth having basic near/orderednear/nonoverlapping functionality in core * Turn TermQuery and PhraseQuery (and possibly others, eg MultiTermQuery) into SpanQueries ** this should be simple after LUCENE-6845, because they can still expose their efficient no-positions scorers, and additionally expose a Spans view Span scoring is still a bit odd (eg with a SpanOrQuery you can end up scoring terms that don't actually match in the current document), but that can be dealt with separately. We should probably close this as Won't Fix, but it's been open for so long it feels a bit wrong to do that :-) Maybe I'll wait until TermQuery and PhraseQuery can expose positions, and resolve as a duplicate instead. > Allow Scorer to expose positions and payloads aka. nuke spans > -- > > Key: LUCENE-2878 > URL: https://issues.apache.org/jira/browse/LUCENE-2878 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: Positions Branch >Reporter: Simon Willnauer >Assignee: Robert Muir > Labels: gsoc2014 > Fix For: Positions Branch > > Attachments: LUCENE-2878-OR.patch, LUCENE-2878-vs-trunk.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878_trunk.patch, > LUCENE-2878_trunk.patch, PosHighlighter.patch, PosHighlighter.patch > > > Currently we have two somewhat separate types of queries, the one which can > make use of positions (mainly spans) and payloads (spans). Yet Span*Query > doesn't really do scoring comparable to what other queries do and at the end > of the day they are duplicating lot of code all over lucene. Span*Queries are > also limited to other Span*Query instances such that you can not use a > TermQuery or a BooleanQuery with SpanNear or anthing like that. > Beside of the Span*Query limitation other queries lacking a quiet interesting > feature since they can not score based on term proximity since scores doesn't > expose any positional information. All those problems bugged me for a while > now so I stared working on that using the bulkpostings API. I would have done > that first cut on trunk but TermScorer is working on BlockReader that do not > expose positions while the one in this branch does. I started adding a new > Positions class which users can pull from a scorer, to prevent unnecessary > positions enums I added ScorerContext#needsPositions and eventually > Scorere#needsPayloads to create the corresponding enum on demand. Yet, > currently only TermQuery / TermScorer implements this API and other simply > return null instead. > To show that the API really works and our BulkPostings work fine too with > positions I cut over TermSpanQuery to use a TermScorer under the hood and > nuked TermSpans entirely. A nice sideeffect of this was that the Position > BulkReading implementation got some exercise which now :) work all with > positions while Payloads for bulkreading are kind of experimental in the > patch and those only work with Standard codec. > So all spans now work on top of TermScorer ( I truly hate spans since today ) > including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother > to implement the other codecs yet since I want to get feedback on the API and > on this first cut before I go one with it. I will upload the corresponding > patch in a minute. > I also had to cut over SpanQuery.getSpans(IR) to
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14967239#comment-14967239 ] David Smiley commented on LUCENE-2878: -- Alan, could you please comment on the status of this journey? I know you've made incremental steps but it's not clear if Spans will eventually get nuked or not. > Allow Scorer to expose positions and payloads aka. nuke spans > -- > > Key: LUCENE-2878 > URL: https://issues.apache.org/jira/browse/LUCENE-2878 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: Positions Branch >Reporter: Simon Willnauer >Assignee: Robert Muir > Labels: gsoc2014 > Fix For: Positions Branch > > Attachments: LUCENE-2878-OR.patch, LUCENE-2878-vs-trunk.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878_trunk.patch, > LUCENE-2878_trunk.patch, PosHighlighter.patch, PosHighlighter.patch > > > Currently we have two somewhat separate types of queries, the one which can > make use of positions (mainly spans) and payloads (spans). Yet Span*Query > doesn't really do scoring comparable to what other queries do and at the end > of the day they are duplicating lot of code all over lucene. Span*Queries are > also limited to other Span*Query instances such that you can not use a > TermQuery or a BooleanQuery with SpanNear or anthing like that. > Beside of the Span*Query limitation other queries lacking a quiet interesting > feature since they can not score based on term proximity since scores doesn't > expose any positional information. All those problems bugged me for a while > now so I stared working on that using the bulkpostings API. I would have done > that first cut on trunk but TermScorer is working on BlockReader that do not > expose positions while the one in this branch does. I started adding a new > Positions class which users can pull from a scorer, to prevent unnecessary > positions enums I added ScorerContext#needsPositions and eventually > Scorere#needsPayloads to create the corresponding enum on demand. Yet, > currently only TermQuery / TermScorer implements this API and other simply > return null instead. > To show that the API really works and our BulkPostings work fine too with > positions I cut over TermSpanQuery to use a TermScorer under the hood and > nuked TermSpans entirely. A nice sideeffect of this was that the Position > BulkReading implementation got some exercise which now :) work all with > positions while Payloads for bulkreading are kind of experimental in the > patch and those only work with Standard codec. > So all spans now work on top of TermScorer ( I truly hate spans since today ) > including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother > to implement the other codecs yet since I want to get feedback on the API and > on this first cut before I go one with it. I will upload the corresponding > patch in a minute. > I also had to cut over SpanQuery.getSpans(IR) to > SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk > first but after that pain today I need a break first :). > The patch passes all core tests > (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't > look into the MemoryIndex BulkPostings API yet) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14368851#comment-14368851 ] Paul Elschot commented on LUCENE-2878: -- Please have a look at LUCENE-6308, I think that is a good intermediate step for this. > Allow Scorer to expose positions and payloads aka. nuke spans > -- > > Key: LUCENE-2878 > URL: https://issues.apache.org/jira/browse/LUCENE-2878 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: Positions Branch >Reporter: Simon Willnauer >Assignee: Robert Muir > Labels: gsoc2014 > Fix For: Positions Branch > > Attachments: LUCENE-2878-OR.patch, LUCENE-2878-vs-trunk.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878_trunk.patch, > LUCENE-2878_trunk.patch, PosHighlighter.patch, PosHighlighter.patch > > > Currently we have two somewhat separate types of queries, the one which can > make use of positions (mainly spans) and payloads (spans). Yet Span*Query > doesn't really do scoring comparable to what other queries do and at the end > of the day they are duplicating lot of code all over lucene. Span*Queries are > also limited to other Span*Query instances such that you can not use a > TermQuery or a BooleanQuery with SpanNear or anthing like that. > Beside of the Span*Query limitation other queries lacking a quiet interesting > feature since they can not score based on term proximity since scores doesn't > expose any positional information. All those problems bugged me for a while > now so I stared working on that using the bulkpostings API. I would have done > that first cut on trunk but TermScorer is working on BlockReader that do not > expose positions while the one in this branch does. I started adding a new > Positions class which users can pull from a scorer, to prevent unnecessary > positions enums I added ScorerContext#needsPositions and eventually > Scorere#needsPayloads to create the corresponding enum on demand. Yet, > currently only TermQuery / TermScorer implements this API and other simply > return null instead. > To show that the API really works and our BulkPostings work fine too with > positions I cut over TermSpanQuery to use a TermScorer under the hood and > nuked TermSpans entirely. A nice sideeffect of this was that the Position > BulkReading implementation got some exercise which now :) work all with > positions while Payloads for bulkreading are kind of experimental in the > patch and those only work with Standard codec. > So all spans now work on top of TermScorer ( I truly hate spans since today ) > including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother > to implement the other codecs yet since I want to get feedback on the API and > on this first cut before I go one with it. I will upload the corresponding > patch in a minute. > I also had to cut over SpanQuery.getSpans(IR) to > SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk > first but after that pain today I need a break first :). > The patch passes all core tests > (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't > look into the MemoryIndex BulkPostings API yet) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14275229#comment-14275229 ] Alan Woodward commented on LUCENE-2878: --- To make this more digestible, is it worth breaking it out into LUCENE-4524 for the DocsEnum/DocsAndPositionsEnum changes and then keeping this one for the changes to the Weight API and the new query classes? > Allow Scorer to expose positions and payloads aka. nuke spans > -- > > Key: LUCENE-2878 > URL: https://issues.apache.org/jira/browse/LUCENE-2878 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: Positions Branch >Reporter: Simon Willnauer >Assignee: Robert Muir > Labels: gsoc2014 > Fix For: Positions Branch > > Attachments: LUCENE-2878-OR.patch, LUCENE-2878-vs-trunk.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878_trunk.patch, > LUCENE-2878_trunk.patch, PosHighlighter.patch, PosHighlighter.patch > > > Currently we have two somewhat separate types of queries, the one which can > make use of positions (mainly spans) and payloads (spans). Yet Span*Query > doesn't really do scoring comparable to what other queries do and at the end > of the day they are duplicating lot of code all over lucene. Span*Queries are > also limited to other Span*Query instances such that you can not use a > TermQuery or a BooleanQuery with SpanNear or anthing like that. > Beside of the Span*Query limitation other queries lacking a quiet interesting > feature since they can not score based on term proximity since scores doesn't > expose any positional information. All those problems bugged me for a while > now so I stared working on that using the bulkpostings API. I would have done > that first cut on trunk but TermScorer is working on BlockReader that do not > expose positions while the one in this branch does. I started adding a new > Positions class which users can pull from a scorer, to prevent unnecessary > positions enums I added ScorerContext#needsPositions and eventually > Scorere#needsPayloads to create the corresponding enum on demand. Yet, > currently only TermQuery / TermScorer implements this API and other simply > return null instead. > To show that the API really works and our BulkPostings work fine too with > positions I cut over TermSpanQuery to use a TermScorer under the hood and > nuked TermSpans entirely. A nice sideeffect of this was that the Position > BulkReading implementation got some exercise which now :) work all with > positions while Payloads for bulkreading are kind of experimental in the > patch and those only work with Standard codec. > So all spans now work on top of TermScorer ( I truly hate spans since today ) > including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother > to implement the other codecs yet since I want to get feedback on the API and > on this first cut before I go one with it. I will upload the corresponding > patch in a minute. > I also had to cut over SpanQuery.getSpans(IR) to > SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk > first but after that pain today I need a break first :). > The patch passes all core tests > (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't > look into the MemoryIndex BulkPostings API yet) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249924#comment-14249924 ] ASF subversion and git services commented on LUCENE-2878: - Commit 1646271 from [~romseygeek] in branch 'dev/branches/lucene2878' [ https://svn.apache.org/r1646271 ] LUCENE-2878: Remove dead code from TermScorer > Allow Scorer to expose positions and payloads aka. nuke spans > -- > > Key: LUCENE-2878 > URL: https://issues.apache.org/jira/browse/LUCENE-2878 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: Positions Branch >Reporter: Simon Willnauer >Assignee: Robert Muir > Labels: gsoc2014 > Fix For: Positions Branch > > Attachments: LUCENE-2878-OR.patch, LUCENE-2878-vs-trunk.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878_trunk.patch, > LUCENE-2878_trunk.patch, PosHighlighter.patch, PosHighlighter.patch > > > Currently we have two somewhat separate types of queries, the one which can > make use of positions (mainly spans) and payloads (spans). Yet Span*Query > doesn't really do scoring comparable to what other queries do and at the end > of the day they are duplicating lot of code all over lucene. Span*Queries are > also limited to other Span*Query instances such that you can not use a > TermQuery or a BooleanQuery with SpanNear or anthing like that. > Beside of the Span*Query limitation other queries lacking a quiet interesting > feature since they can not score based on term proximity since scores doesn't > expose any positional information. All those problems bugged me for a while > now so I stared working on that using the bulkpostings API. I would have done > that first cut on trunk but TermScorer is working on BlockReader that do not > expose positions while the one in this branch does. I started adding a new > Positions class which users can pull from a scorer, to prevent unnecessary > positions enums I added ScorerContext#needsPositions and eventually > Scorere#needsPayloads to create the corresponding enum on demand. Yet, > currently only TermQuery / TermScorer implements this API and other simply > return null instead. > To show that the API really works and our BulkPostings work fine too with > positions I cut over TermSpanQuery to use a TermScorer under the hood and > nuked TermSpans entirely. A nice sideeffect of this was that the Position > BulkReading implementation got some exercise which now :) work all with > positions while Payloads for bulkreading are kind of experimental in the > patch and those only work with Standard codec. > So all spans now work on top of TermScorer ( I truly hate spans since today ) > including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother > to implement the other codecs yet since I want to get feedback on the API and > on this first cut before I go one with it. I will upload the corresponding > patch in a minute. > I also had to cut over SpanQuery.getSpans(IR) to > SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk > first but after that pain today I need a break first :). > The patch passes all core tests > (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't > look into the MemoryIndex BulkPostings API yet) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249722#comment-14249722 ] Alan Woodward commented on LUCENE-2878: --- Hi Rob, Thanks for helping out here! The additional branches are there to allow for returning NO_MORE_POSITIONS once the positions are exhausted, but maybe I should move that logic up to TermScorer instead and put the assertions back in for the PostingsReaders. Merging TermsEnum.docs() and TermsEnum.docsAndPositions() might be tricky because their API is different - docs() never returns null, while docsAndPositions() will return null if the relevant postings data isn't indexed. Although having said that, I'm probably already breaking that contract by redirecting from one to the other with the flags check. I'll fix TermScorer. I haven't nuked Spans yet, mainly because I think we should probably keep them (as deprecated) in 5.0, and remove them only in trunk. It would also make the patch bigger :-) I changed existing test files rather than adding any new ones, apart from the tests exercising the PositionFilterQueries. Maybe a way to reduce the size of the patch would be to remove the PositionFilterQueries from this issue and create a new one for them? Then this one is just about changing the DocsEnum/TermsEnum API. > Allow Scorer to expose positions and payloads aka. nuke spans > -- > > Key: LUCENE-2878 > URL: https://issues.apache.org/jira/browse/LUCENE-2878 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: Positions Branch >Reporter: Simon Willnauer >Assignee: Robert Muir > Labels: gsoc2014 > Fix For: Positions Branch > > Attachments: LUCENE-2878-OR.patch, LUCENE-2878-vs-trunk.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878_trunk.patch, > LUCENE-2878_trunk.patch, PosHighlighter.patch, PosHighlighter.patch > > > Currently we have two somewhat separate types of queries, the one which can > make use of positions (mainly spans) and payloads (spans). Yet Span*Query > doesn't really do scoring comparable to what other queries do and at the end > of the day they are duplicating lot of code all over lucene. Span*Queries are > also limited to other Span*Query instances such that you can not use a > TermQuery or a BooleanQuery with SpanNear or anthing like that. > Beside of the Span*Query limitation other queries lacking a quiet interesting > feature since they can not score based on term proximity since scores doesn't > expose any positional information. All those problems bugged me for a while > now so I stared working on that using the bulkpostings API. I would have done > that first cut on trunk but TermScorer is working on BlockReader that do not > expose positions while the one in this branch does. I started adding a new > Positions class which users can pull from a scorer, to prevent unnecessary > positions enums I added ScorerContext#needsPositions and eventually > Scorere#needsPayloads to create the corresponding enum on demand. Yet, > currently only TermQuery / TermScorer implements this API and other simply > return null instead. > To show that the API really works and our BulkPostings work fine too with > positions I cut over TermSpanQuery to use a TermScorer under the hood and > nuked TermSpans entirely. A nice sideeffect of this was that the Position > BulkReading implementation got some exercise which now :) work all with > positions while Payloads for bulkreading are kind of experimental in the > patch and those only work with Standard codec. > So all spans now work on top of TermScorer ( I truly hate spans since today ) > including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother > to implement the other codecs yet since I want to get feedback on the API and > on this first cut before I go one with it. I will upload the corresponding > patch in a minute. > I also had to cut over SpanQuery.getSpans(IR) to > SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk > first but after that pain today I need a break first :). > The patch passes all core tests > (org.apache.lucene.sea
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249360#comment-14249360 ] Robert Muir commented on LUCENE-2878: - I ran the benchmark: {noformat} Task QPS trunk StdDev QPS patch StdDev Pct diff HighSloppyPhrase 13.40 (10.4%) 10.31 (5.5%) -23.1% ( -35% - -7%) HighPhrase 17.98 (5.6%) 13.92 (3.1%) -22.6% ( -29% - -14%) MedPhrase 257.86 (7.1%) 213.38 (3.6%) -17.2% ( -26% - -7%) LowPhrase 35.68 (1.8%) 33.32 (1.7%) -6.6% ( -9% - -3%) MedSloppyPhrase 15.79 (4.2%) 14.92 (3.6%) -5.5% ( -12% -2%) LowSloppyPhrase 118.09 (2.4%) 112.14 (2.0%) -5.0% ( -9% -0%) HighTerm 138.18 (10.2%) 136.72 (6.7%) -1.1% ( -16% - 17%) MedTerm 202.67 (9.6%) 200.94 (6.3%) -0.9% ( -15% - 16%) HighSpanNear 144.67 (4.3%) 144.35 (4.3%) -0.2% ( -8% -8%) MedSpanNear 143.52 (3.9%) 143.30 (4.0%) -0.2% ( -7% -8%) Respell 85.33 (1.8%) 85.32 (2.6%) -0.0% ( -4% -4%) LowTerm 1052.81 (8.5%) 1053.59 (5.3%) 0.1% ( -12% - 15%) LowSpanNear 27.81 (2.9%) 27.83 (2.9%) 0.1% ( -5% -6%) Prefix3 232.97 (4.6%) 233.55 (4.5%) 0.3% ( -8% -9%) AndHighHigh 90.67 (1.7%) 91.01 (1.1%) 0.4% ( -2% -3%) Fuzzy1 102.98 (2.1%) 103.38 (3.5%) 0.4% ( -5% -6%) AndHighLow 1121.50 (4.8%) 1126.02 (3.9%) 0.4% ( -7% -9%) AndHighMed 127.28 (2.0%) 127.88 (1.1%) 0.5% ( -2% -3%) Fuzzy2 68.39 (2.1%) 68.77 (3.1%) 0.5% ( -4% -5%) Wildcard 48.08 (2.4%) 48.43 (4.2%) 0.7% ( -5% -7%) IntNRQ9.69 (5.8%)9.79 (7.2%) 1.1% ( -11% - 15%) OrNotHighLow 67.55 (8.1%) 68.88 (7.8%) 2.0% ( -12% - 19%) OrNotHighMed 61.00 (8.3%) 62.38 (8.0%) 2.3% ( -12% - 20%) OrNotHighHigh 35.44 (9.5%) 36.50 (9.5%) 3.0% ( -14% - 24%) OrHighNotHigh 25.97 (9.6%) 26.80 (9.7%) 3.2% ( -14% - 24%) OrHighNotMed 82.14 (10.1%) 84.84 (10.2%) 3.3% ( -15% - 26%) OrHighHigh 29.25 (10.3%) 30.27 (10.5%) 3.5% ( -15% - 27%) OrHighNotLow 104.15 (10.3%) 107.82 (10.5%) 3.5% ( -15% - 27%) OrHighMed 65.67 (10.4%) 68.01 (10.7%) 3.6% ( -15% - 27%) OrHighLow 63.61 (10.6%) 65.91 (10.7%) 3.6% ( -16% - 27%) {noformat} We should look into the regressions for phrases. But first I need to work on LUCENE-6117, it is killing me :) > Allow Scorer to expose positions and payloads aka. nuke spans > -- > > Key: LUCENE-2878 > URL: https://issues.apache.org/jira/browse/LUCENE-2878 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: Positions Branch >Reporter: Simon Willnauer >Assignee: Robert Muir > Labels: gsoc2014 > Fix For: Positions Branch > > Attachments: LUCENE-2878-OR.patch, LUCENE-2878-vs-trunk.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878_trunk.patch, > LUCENE-2878_trunk.patch, PosHighlighter.patch, PosHighlighter.patch > > > Currently we have two somewhat separate types of queries, the one which can > make use of positions (mainly spans) and payloads (spans). Yet Span*Query > doesn't really do scoring comparable to what other queries do and at the end > of the day t
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14248188#comment-14248188 ] Robert Muir commented on LUCENE-2878: - Hi Alan, Why are there now additional branches in nextPosition() in posting readers? Can we avoid these? If it is to support calling nextPosition() after it has already been exhausted, I am unsure we should be doing this. nextDoc() does not support this, for example. Can we improve this logic in reuse code? Reuse code is notorious for sneaky bugs, and I don't like the comparison done here. Why can't we just check if the bit is set? {code} if ((flags & DocsEnum.FLAG_POSITIONS) >= DocsEnum.FLAG_POSITIONS) {code} This check also is confusing to me in the first place. Why do we have both docsEnum and docsAndPositionsEnum methods, both returning DocsEnum, with the former checking for a FLAG_POSITIONS and calling the other one in that case. I think there should just be one method instead if we want both to share the same api, thats ok. TermScorer introduces outdated dead code that is unused. What are the Span scoring classes still doing here? Do we have tests for any of this new code? I see no added tests files. What about benchmarks? We should try to compare against trunk, since there are a lot of change here. In general, I can try to make a patch for smaller things, to make things faster. Iterating this way may take a long time. > Allow Scorer to expose positions and payloads aka. nuke spans > -- > > Key: LUCENE-2878 > URL: https://issues.apache.org/jira/browse/LUCENE-2878 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: Positions Branch >Reporter: Simon Willnauer >Assignee: Robert Muir > Labels: gsoc2014 > Fix For: Positions Branch > > Attachments: LUCENE-2878-OR.patch, LUCENE-2878-vs-trunk.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878_trunk.patch, > LUCENE-2878_trunk.patch, PosHighlighter.patch, PosHighlighter.patch > > > Currently we have two somewhat separate types of queries, the one which can > make use of positions (mainly spans) and payloads (spans). Yet Span*Query > doesn't really do scoring comparable to what other queries do and at the end > of the day they are duplicating lot of code all over lucene. Span*Queries are > also limited to other Span*Query instances such that you can not use a > TermQuery or a BooleanQuery with SpanNear or anthing like that. > Beside of the Span*Query limitation other queries lacking a quiet interesting > feature since they can not score based on term proximity since scores doesn't > expose any positional information. All those problems bugged me for a while > now so I stared working on that using the bulkpostings API. I would have done > that first cut on trunk but TermScorer is working on BlockReader that do not > expose positions while the one in this branch does. I started adding a new > Positions class which users can pull from a scorer, to prevent unnecessary > positions enums I added ScorerContext#needsPositions and eventually > Scorere#needsPayloads to create the corresponding enum on demand. Yet, > currently only TermQuery / TermScorer implements this API and other simply > return null instead. > To show that the API really works and our BulkPostings work fine too with > positions I cut over TermSpanQuery to use a TermScorer under the hood and > nuked TermSpans entirely. A nice sideeffect of this was that the Position > BulkReading implementation got some exercise which now :) work all with > positions while Payloads for bulkreading are kind of experimental in the > patch and those only work with Standard codec. > So all spans now work on top of TermScorer ( I truly hate spans since today ) > including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother > to implement the other codecs yet since I want to get feedback on the API and > on this first cut before I go one with it. I will upload the corresponding > patch in a minute. > I also had to cut over SpanQuery.getSpans(IR) to > SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk > first
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14248162#comment-14248162 ] ASF subversion and git services commented on LUCENE-2878: - Commit 1645925 from [~romseygeek] in branch 'dev/branches/lucene2878' [ https://svn.apache.org/r1645925 ] LUCENE-2878: merge conflicts > Allow Scorer to expose positions and payloads aka. nuke spans > -- > > Key: LUCENE-2878 > URL: https://issues.apache.org/jira/browse/LUCENE-2878 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: Positions Branch >Reporter: Simon Willnauer >Assignee: Robert Muir > Labels: gsoc2014 > Fix For: Positions Branch > > Attachments: LUCENE-2878-OR.patch, LUCENE-2878-vs-trunk.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, > PosHighlighter.patch, PosHighlighter.patch > > > Currently we have two somewhat separate types of queries, the one which can > make use of positions (mainly spans) and payloads (spans). Yet Span*Query > doesn't really do scoring comparable to what other queries do and at the end > of the day they are duplicating lot of code all over lucene. Span*Queries are > also limited to other Span*Query instances such that you can not use a > TermQuery or a BooleanQuery with SpanNear or anthing like that. > Beside of the Span*Query limitation other queries lacking a quiet interesting > feature since they can not score based on term proximity since scores doesn't > expose any positional information. All those problems bugged me for a while > now so I stared working on that using the bulkpostings API. I would have done > that first cut on trunk but TermScorer is working on BlockReader that do not > expose positions while the one in this branch does. I started adding a new > Positions class which users can pull from a scorer, to prevent unnecessary > positions enums I added ScorerContext#needsPositions and eventually > Scorere#needsPayloads to create the corresponding enum on demand. Yet, > currently only TermQuery / TermScorer implements this API and other simply > return null instead. > To show that the API really works and our BulkPostings work fine too with > positions I cut over TermSpanQuery to use a TermScorer under the hood and > nuked TermSpans entirely. A nice sideeffect of this was that the Position > BulkReading implementation got some exercise which now :) work all with > positions while Payloads for bulkreading are kind of experimental in the > patch and those only work with Standard codec. > So all spans now work on top of TermScorer ( I truly hate spans since today ) > including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother > to implement the other codecs yet since I want to get feedback on the API and > on this first cut before I go one with it. I will upload the corresponding > patch in a minute. > I also had to cut over SpanQuery.getSpans(IR) to > SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk > first but after that pain today I need a break first :). > The patch passes all core tests > (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't > look into the MemoryIndex BulkPostings API yet) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14246176#comment-14246176 ] ASF subversion and git services commented on LUCENE-2878: - Commit 1645535 from [~romseygeek] in branch 'dev/branches/lucene2878' [ https://svn.apache.org/r1645535 ] LUCENE-2878: precommit cleanups > Allow Scorer to expose positions and payloads aka. nuke spans > -- > > Key: LUCENE-2878 > URL: https://issues.apache.org/jira/browse/LUCENE-2878 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: Positions Branch >Reporter: Simon Willnauer >Assignee: Robert Muir > Labels: gsoc2014 > Fix For: Positions Branch > > Attachments: LUCENE-2878-OR.patch, LUCENE-2878-vs-trunk.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, > PosHighlighter.patch, PosHighlighter.patch > > > Currently we have two somewhat separate types of queries, the one which can > make use of positions (mainly spans) and payloads (spans). Yet Span*Query > doesn't really do scoring comparable to what other queries do and at the end > of the day they are duplicating lot of code all over lucene. Span*Queries are > also limited to other Span*Query instances such that you can not use a > TermQuery or a BooleanQuery with SpanNear or anthing like that. > Beside of the Span*Query limitation other queries lacking a quiet interesting > feature since they can not score based on term proximity since scores doesn't > expose any positional information. All those problems bugged me for a while > now so I stared working on that using the bulkpostings API. I would have done > that first cut on trunk but TermScorer is working on BlockReader that do not > expose positions while the one in this branch does. I started adding a new > Positions class which users can pull from a scorer, to prevent unnecessary > positions enums I added ScorerContext#needsPositions and eventually > Scorere#needsPayloads to create the corresponding enum on demand. Yet, > currently only TermQuery / TermScorer implements this API and other simply > return null instead. > To show that the API really works and our BulkPostings work fine too with > positions I cut over TermSpanQuery to use a TermScorer under the hood and > nuked TermSpans entirely. A nice sideeffect of this was that the Position > BulkReading implementation got some exercise which now :) work all with > positions while Payloads for bulkreading are kind of experimental in the > patch and those only work with Standard codec. > So all spans now work on top of TermScorer ( I truly hate spans since today ) > including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother > to implement the other codecs yet since I want to get feedback on the API and > on this first cut before I go one with it. I will upload the corresponding > patch in a minute. > I also had to cut over SpanQuery.getSpans(IR) to > SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk > first but after that pain today I need a break first :). > The patch passes all core tests > (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't > look into the MemoryIndex BulkPostings API yet) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14246143#comment-14246143 ] ASF subversion and git services commented on LUCENE-2878: - Commit 1645528 from [~romseygeek] in branch 'dev/branches/lucene2878' [ https://svn.apache.org/r1645528 ] LUCENE-2878: last nocommits > Allow Scorer to expose positions and payloads aka. nuke spans > -- > > Key: LUCENE-2878 > URL: https://issues.apache.org/jira/browse/LUCENE-2878 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: Positions Branch >Reporter: Simon Willnauer >Assignee: Robert Muir > Labels: gsoc2014 > Fix For: Positions Branch > > Attachments: LUCENE-2878-OR.patch, LUCENE-2878-vs-trunk.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, > PosHighlighter.patch, PosHighlighter.patch > > > Currently we have two somewhat separate types of queries, the one which can > make use of positions (mainly spans) and payloads (spans). Yet Span*Query > doesn't really do scoring comparable to what other queries do and at the end > of the day they are duplicating lot of code all over lucene. Span*Queries are > also limited to other Span*Query instances such that you can not use a > TermQuery or a BooleanQuery with SpanNear or anthing like that. > Beside of the Span*Query limitation other queries lacking a quiet interesting > feature since they can not score based on term proximity since scores doesn't > expose any positional information. All those problems bugged me for a while > now so I stared working on that using the bulkpostings API. I would have done > that first cut on trunk but TermScorer is working on BlockReader that do not > expose positions while the one in this branch does. I started adding a new > Positions class which users can pull from a scorer, to prevent unnecessary > positions enums I added ScorerContext#needsPositions and eventually > Scorere#needsPayloads to create the corresponding enum on demand. Yet, > currently only TermQuery / TermScorer implements this API and other simply > return null instead. > To show that the API really works and our BulkPostings work fine too with > positions I cut over TermSpanQuery to use a TermScorer under the hood and > nuked TermSpans entirely. A nice sideeffect of this was that the Position > BulkReading implementation got some exercise which now :) work all with > positions while Payloads for bulkreading are kind of experimental in the > patch and those only work with Standard codec. > So all spans now work on top of TermScorer ( I truly hate spans since today ) > including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother > to implement the other codecs yet since I want to get feedback on the API and > on this first cut before I go one with it. I will upload the corresponding > patch in a minute. > I also had to cut over SpanQuery.getSpans(IR) to > SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk > first but after that pain today I need a break first :). > The patch passes all core tests > (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't > look into the MemoryIndex BulkPostings API yet) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14246133#comment-14246133 ] ASF subversion and git services commented on LUCENE-2878: - Commit 1645525 from [~romseygeek] in branch 'dev/branches/lucene2878' [ https://svn.apache.org/r1645525 ] LUCENE-2878: Scoring on positionfilterqueries > Allow Scorer to expose positions and payloads aka. nuke spans > -- > > Key: LUCENE-2878 > URL: https://issues.apache.org/jira/browse/LUCENE-2878 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: Positions Branch >Reporter: Simon Willnauer >Assignee: Robert Muir > Labels: gsoc2014 > Fix For: Positions Branch > > Attachments: LUCENE-2878-OR.patch, LUCENE-2878-vs-trunk.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, > PosHighlighter.patch, PosHighlighter.patch > > > Currently we have two somewhat separate types of queries, the one which can > make use of positions (mainly spans) and payloads (spans). Yet Span*Query > doesn't really do scoring comparable to what other queries do and at the end > of the day they are duplicating lot of code all over lucene. Span*Queries are > also limited to other Span*Query instances such that you can not use a > TermQuery or a BooleanQuery with SpanNear or anthing like that. > Beside of the Span*Query limitation other queries lacking a quiet interesting > feature since they can not score based on term proximity since scores doesn't > expose any positional information. All those problems bugged me for a while > now so I stared working on that using the bulkpostings API. I would have done > that first cut on trunk but TermScorer is working on BlockReader that do not > expose positions while the one in this branch does. I started adding a new > Positions class which users can pull from a scorer, to prevent unnecessary > positions enums I added ScorerContext#needsPositions and eventually > Scorere#needsPayloads to create the corresponding enum on demand. Yet, > currently only TermQuery / TermScorer implements this API and other simply > return null instead. > To show that the API really works and our BulkPostings work fine too with > positions I cut over TermSpanQuery to use a TermScorer under the hood and > nuked TermSpans entirely. A nice sideeffect of this was that the Position > BulkReading implementation got some exercise which now :) work all with > positions while Payloads for bulkreading are kind of experimental in the > patch and those only work with Standard codec. > So all spans now work on top of TermScorer ( I truly hate spans since today ) > including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother > to implement the other codecs yet since I want to get feedback on the API and > on this first cut before I go one with it. I will upload the corresponding > patch in a minute. > I also had to cut over SpanQuery.getSpans(IR) to > SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk > first but after that pain today I need a break first :). > The patch passes all core tests > (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't > look into the MemoryIndex BulkPostings API yet) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14244395#comment-14244395 ] Robert Muir commented on LUCENE-2878: - Why would checkindex be calling this on an interval scorer? I dont understand the problem. I dont think we should add complex rules to Scorer, just throw UOE here from the interval ones. > Allow Scorer to expose positions and payloads aka. nuke spans > -- > > Key: LUCENE-2878 > URL: https://issues.apache.org/jira/browse/LUCENE-2878 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: Positions Branch >Reporter: Simon Willnauer >Assignee: Robert Muir > Labels: gsoc2014 > Fix For: Positions Branch > > Attachments: LUCENE-2878-OR.patch, LUCENE-2878-vs-trunk.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, > PosHighlighter.patch, PosHighlighter.patch > > > Currently we have two somewhat separate types of queries, the one which can > make use of positions (mainly spans) and payloads (spans). Yet Span*Query > doesn't really do scoring comparable to what other queries do and at the end > of the day they are duplicating lot of code all over lucene. Span*Queries are > also limited to other Span*Query instances such that you can not use a > TermQuery or a BooleanQuery with SpanNear or anthing like that. > Beside of the Span*Query limitation other queries lacking a quiet interesting > feature since they can not score based on term proximity since scores doesn't > expose any positional information. All those problems bugged me for a while > now so I stared working on that using the bulkpostings API. I would have done > that first cut on trunk but TermScorer is working on BlockReader that do not > expose positions while the one in this branch does. I started adding a new > Positions class which users can pull from a scorer, to prevent unnecessary > positions enums I added ScorerContext#needsPositions and eventually > Scorere#needsPayloads to create the corresponding enum on demand. Yet, > currently only TermQuery / TermScorer implements this API and other simply > return null instead. > To show that the API really works and our BulkPostings work fine too with > positions I cut over TermSpanQuery to use a TermScorer under the hood and > nuked TermSpans entirely. A nice sideeffect of this was that the Position > BulkReading implementation got some exercise which now :) work all with > positions while Payloads for bulkreading are kind of experimental in the > patch and those only work with Standard codec. > So all spans now work on top of TermScorer ( I truly hate spans since today ) > including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother > to implement the other codecs yet since I want to get feedback on the API and > on this first cut before I go one with it. I will upload the corresponding > patch in a minute. > I also had to cut over SpanQuery.getSpans(IR) to > SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk > first but after that pain today I need a break first :). > The patch passes all core tests > (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't > look into the MemoryIndex BulkPostings API yet) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14244385#comment-14244385 ] Alan Woodward commented on LUCENE-2878: --- So this doesn't quite work, because things like CheckIndex want to validate frequencies and positions, and in fact the vast majority of DocsEnum implementations have no problem supporting this. I'm now leaning towards making all these 'interval' Scorers just return 1 from freq(), and instead enforcing the rule that you can't call score() and nextPosition(), moving the limitation up to Scorer itself. > Allow Scorer to expose positions and payloads aka. nuke spans > -- > > Key: LUCENE-2878 > URL: https://issues.apache.org/jira/browse/LUCENE-2878 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: Positions Branch >Reporter: Simon Willnauer >Assignee: Robert Muir > Labels: gsoc2014 > Fix For: Positions Branch > > Attachments: LUCENE-2878-OR.patch, LUCENE-2878-vs-trunk.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, > PosHighlighter.patch, PosHighlighter.patch > > > Currently we have two somewhat separate types of queries, the one which can > make use of positions (mainly spans) and payloads (spans). Yet Span*Query > doesn't really do scoring comparable to what other queries do and at the end > of the day they are duplicating lot of code all over lucene. Span*Queries are > also limited to other Span*Query instances such that you can not use a > TermQuery or a BooleanQuery with SpanNear or anthing like that. > Beside of the Span*Query limitation other queries lacking a quiet interesting > feature since they can not score based on term proximity since scores doesn't > expose any positional information. All those problems bugged me for a while > now so I stared working on that using the bulkpostings API. I would have done > that first cut on trunk but TermScorer is working on BlockReader that do not > expose positions while the one in this branch does. I started adding a new > Positions class which users can pull from a scorer, to prevent unnecessary > positions enums I added ScorerContext#needsPositions and eventually > Scorere#needsPayloads to create the corresponding enum on demand. Yet, > currently only TermQuery / TermScorer implements this API and other simply > return null instead. > To show that the API really works and our BulkPostings work fine too with > positions I cut over TermSpanQuery to use a TermScorer under the hood and > nuked TermSpans entirely. A nice sideeffect of this was that the Position > BulkReading implementation got some exercise which now :) work all with > positions while Payloads for bulkreading are kind of experimental in the > patch and those only work with Standard codec. > So all spans now work on top of TermScorer ( I truly hate spans since today ) > including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother > to implement the other codecs yet since I want to get feedback on the API and > on this first cut before I go one with it. I will upload the corresponding > patch in a minute. > I also had to cut over SpanQuery.getSpans(IR) to > SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk > first but after that pain today I need a break first :). > The patch passes all core tests > (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't > look into the MemoryIndex BulkPostings API yet) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14244233#comment-14244233 ] Robert Muir commented on LUCENE-2878: - I would rather us not add slow methods at all to our scoring api. if these position iterators do not support freq(), then they do not support freq(). > Allow Scorer to expose positions and payloads aka. nuke spans > -- > > Key: LUCENE-2878 > URL: https://issues.apache.org/jira/browse/LUCENE-2878 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: Positions Branch >Reporter: Simon Willnauer >Assignee: Robert Muir > Labels: gsoc2014 > Fix For: Positions Branch > > Attachments: LUCENE-2878-OR.patch, LUCENE-2878-vs-trunk.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, > PosHighlighter.patch, PosHighlighter.patch > > > Currently we have two somewhat separate types of queries, the one which can > make use of positions (mainly spans) and payloads (spans). Yet Span*Query > doesn't really do scoring comparable to what other queries do and at the end > of the day they are duplicating lot of code all over lucene. Span*Queries are > also limited to other Span*Query instances such that you can not use a > TermQuery or a BooleanQuery with SpanNear or anthing like that. > Beside of the Span*Query limitation other queries lacking a quiet interesting > feature since they can not score based on term proximity since scores doesn't > expose any positional information. All those problems bugged me for a while > now so I stared working on that using the bulkpostings API. I would have done > that first cut on trunk but TermScorer is working on BlockReader that do not > expose positions while the one in this branch does. I started adding a new > Positions class which users can pull from a scorer, to prevent unnecessary > positions enums I added ScorerContext#needsPositions and eventually > Scorere#needsPayloads to create the corresponding enum on demand. Yet, > currently only TermQuery / TermScorer implements this API and other simply > return null instead. > To show that the API really works and our BulkPostings work fine too with > positions I cut over TermSpanQuery to use a TermScorer under the hood and > nuked TermSpans entirely. A nice sideeffect of this was that the Position > BulkReading implementation got some exercise which now :) work all with > positions while Payloads for bulkreading are kind of experimental in the > patch and those only work with Standard codec. > So all spans now work on top of TermScorer ( I truly hate spans since today ) > including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother > to implement the other codecs yet since I want to get feedback on the API and > on this first cut before I go one with it. I will upload the corresponding > patch in a minute. > I also had to cut over SpanQuery.getSpans(IR) to > SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk > first but after that pain today I need a break first :). > The patch passes all core tests > (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't > look into the MemoryIndex BulkPostings API yet) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14244180#comment-14244180 ] Alan Woodward commented on LUCENE-2878: --- How about keeping it abstract, but having a concrete protected slowFreq() method? > Allow Scorer to expose positions and payloads aka. nuke spans > -- > > Key: LUCENE-2878 > URL: https://issues.apache.org/jira/browse/LUCENE-2878 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: Positions Branch >Reporter: Simon Willnauer >Assignee: Robert Muir > Labels: gsoc2014 > Fix For: Positions Branch > > Attachments: LUCENE-2878-OR.patch, LUCENE-2878-vs-trunk.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, > PosHighlighter.patch, PosHighlighter.patch > > > Currently we have two somewhat separate types of queries, the one which can > make use of positions (mainly spans) and payloads (spans). Yet Span*Query > doesn't really do scoring comparable to what other queries do and at the end > of the day they are duplicating lot of code all over lucene. Span*Queries are > also limited to other Span*Query instances such that you can not use a > TermQuery or a BooleanQuery with SpanNear or anthing like that. > Beside of the Span*Query limitation other queries lacking a quiet interesting > feature since they can not score based on term proximity since scores doesn't > expose any positional information. All those problems bugged me for a while > now so I stared working on that using the bulkpostings API. I would have done > that first cut on trunk but TermScorer is working on BlockReader that do not > expose positions while the one in this branch does. I started adding a new > Positions class which users can pull from a scorer, to prevent unnecessary > positions enums I added ScorerContext#needsPositions and eventually > Scorere#needsPayloads to create the corresponding enum on demand. Yet, > currently only TermQuery / TermScorer implements this API and other simply > return null instead. > To show that the API really works and our BulkPostings work fine too with > positions I cut over TermSpanQuery to use a TermScorer under the hood and > nuked TermSpans entirely. A nice sideeffect of this was that the Position > BulkReading implementation got some exercise which now :) work all with > positions while Payloads for bulkreading are kind of experimental in the > patch and those only work with Standard codec. > So all spans now work on top of TermScorer ( I truly hate spans since today ) > including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother > to implement the other codecs yet since I want to get feedback on the API and > on this first cut before I go one with it. I will upload the corresponding > patch in a minute. > I also had to cut over SpanQuery.getSpans(IR) to > SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk > first but after that pain today I need a break first :). > The patch passes all core tests > (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't > look into the MemoryIndex BulkPostings API yet) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14244145#comment-14244145 ] Robert Muir commented on LUCENE-2878: - {quote} We can then have a default Scorer implementation of freq() that calls nextPosition() repeatedly to count instances, with TermScorer etc having their own specialized overrides. {quote} Can we not do the slow default implementation and just keep it abstract? > Allow Scorer to expose positions and payloads aka. nuke spans > -- > > Key: LUCENE-2878 > URL: https://issues.apache.org/jira/browse/LUCENE-2878 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: Positions Branch >Reporter: Simon Willnauer >Assignee: Robert Muir > Labels: gsoc2014 > Fix For: Positions Branch > > Attachments: LUCENE-2878-OR.patch, LUCENE-2878-vs-trunk.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, > PosHighlighter.patch, PosHighlighter.patch > > > Currently we have two somewhat separate types of queries, the one which can > make use of positions (mainly spans) and payloads (spans). Yet Span*Query > doesn't really do scoring comparable to what other queries do and at the end > of the day they are duplicating lot of code all over lucene. Span*Queries are > also limited to other Span*Query instances such that you can not use a > TermQuery or a BooleanQuery with SpanNear or anthing like that. > Beside of the Span*Query limitation other queries lacking a quiet interesting > feature since they can not score based on term proximity since scores doesn't > expose any positional information. All those problems bugged me for a while > now so I stared working on that using the bulkpostings API. I would have done > that first cut on trunk but TermScorer is working on BlockReader that do not > expose positions while the one in this branch does. I started adding a new > Positions class which users can pull from a scorer, to prevent unnecessary > positions enums I added ScorerContext#needsPositions and eventually > Scorere#needsPayloads to create the corresponding enum on demand. Yet, > currently only TermQuery / TermScorer implements this API and other simply > return null instead. > To show that the API really works and our BulkPostings work fine too with > positions I cut over TermSpanQuery to use a TermScorer under the hood and > nuked TermSpans entirely. A nice sideeffect of this was that the Position > BulkReading implementation got some exercise which now :) work all with > positions while Payloads for bulkreading are kind of experimental in the > patch and those only work with Standard codec. > So all spans now work on top of TermScorer ( I truly hate spans since today ) > including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother > to implement the other codecs yet since I want to get feedback on the API and > on this first cut before I go one with it. I will upload the corresponding > patch in a minute. > I also had to cut over SpanQuery.getSpans(IR) to > SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk > first but after that pain today I need a break first :). > The patch passes all core tests > (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't > look into the MemoryIndex BulkPostings API yet) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14244141#comment-14244141 ] Alan Woodward commented on LUCENE-2878: --- Another thing to think about is the interaction of freq() and nextPosition() calls. At the moment the API contract is that you can call both, and indeed this was necessary to ensure that you didn't call nextPosition() too many times. However, now that you can just call nextPosition() until you get NO_MORE_POSITIONS this isn't needed, and seeing as many Scorers actually need to consume all their positions to return an accurate freq(), I'd like to change this to say that you *can't* call nextPosition() after you've called freq(). We can then have a default Scorer implementation of freq() that calls nextPosition() repeatedly to count instances, with TermScorer etc having their own specialized overrides. > Allow Scorer to expose positions and payloads aka. nuke spans > -- > > Key: LUCENE-2878 > URL: https://issues.apache.org/jira/browse/LUCENE-2878 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: Positions Branch >Reporter: Simon Willnauer >Assignee: Robert Muir > Labels: gsoc2014 > Fix For: Positions Branch > > Attachments: LUCENE-2878-OR.patch, LUCENE-2878-vs-trunk.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, > PosHighlighter.patch, PosHighlighter.patch > > > Currently we have two somewhat separate types of queries, the one which can > make use of positions (mainly spans) and payloads (spans). Yet Span*Query > doesn't really do scoring comparable to what other queries do and at the end > of the day they are duplicating lot of code all over lucene. Span*Queries are > also limited to other Span*Query instances such that you can not use a > TermQuery or a BooleanQuery with SpanNear or anthing like that. > Beside of the Span*Query limitation other queries lacking a quiet interesting > feature since they can not score based on term proximity since scores doesn't > expose any positional information. All those problems bugged me for a while > now so I stared working on that using the bulkpostings API. I would have done > that first cut on trunk but TermScorer is working on BlockReader that do not > expose positions while the one in this branch does. I started adding a new > Positions class which users can pull from a scorer, to prevent unnecessary > positions enums I added ScorerContext#needsPositions and eventually > Scorere#needsPayloads to create the corresponding enum on demand. Yet, > currently only TermQuery / TermScorer implements this API and other simply > return null instead. > To show that the API really works and our BulkPostings work fine too with > positions I cut over TermSpanQuery to use a TermScorer under the hood and > nuked TermSpans entirely. A nice sideeffect of this was that the Position > BulkReading implementation got some exercise which now :) work all with > positions while Payloads for bulkreading are kind of experimental in the > patch and those only work with Standard codec. > So all spans now work on top of TermScorer ( I truly hate spans since today ) > including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother > to implement the other codecs yet since I want to get feedback on the API and > on this first cut before I go one with it. I will upload the corresponding > patch in a minute. > I also had to cut over SpanQuery.getSpans(IR) to > SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk > first but after that pain today I need a break first :). > The patch passes all core tests > (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't > look into the MemoryIndex BulkPostings API yet) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14243949#comment-14243949 ] Alan Woodward commented on LUCENE-2878: --- The remaining nocommits are: * implement nextPosition() etc on TermAutomatonScorer ** this should be fairly simple, and I'm working on it now * implement nextPosition() etc on SpanScorer ** this is not so simple, as Spans.next() will move to the next document if it's exhausted the positions on the current doc, a completely different API. I'm tempted to just not implement these, seeing as they already don't work with 'normal' queries. A followup JIRA will remove Span queries entirely from trunk and deprecate them in 5.0 anyway. * should PositionFilteredScorer enforce that all its subqueries should be on the same field? ** again, I'm tempted not to. 1) this means adding getField() to all queries, which is a far more invasive API change, and 2) we already have FieldMaskingSpanQuery to get around this limitation in SpanQueries, which suggests that it's not necessarily a good thing anyway. On the other hand, most of the time you want phrases or proximity queries to be on the same field. * how to deal with Payloads on intervals. ** At the moment we just return the payload at the starting position. This is probably not what we want. Spans has a different API, returning a Collection from getPayload rather than a BytesRef, which may be closer to what should happen. I think this can be changed from a nocommit to a TODO, however, and dealt with in a separate issue. I also want to have a look at the MultiTermQuery rewrite API, as at the moment it defaults to CONSTANT_SCORE_FILTER_REWRITE, which of course doesn't support positions. It would be nice to have some way of telling the query whether positions are needed or not at rewrite time. But again, that's for another issue. So I think this is close. I'll get TermAutomatonScorer sorted today. > Allow Scorer to expose positions and payloads aka. nuke spans > -- > > Key: LUCENE-2878 > URL: https://issues.apache.org/jira/browse/LUCENE-2878 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: Positions Branch >Reporter: Simon Willnauer >Assignee: Robert Muir > Labels: gsoc2014 > Fix For: Positions Branch > > Attachments: LUCENE-2878-OR.patch, LUCENE-2878-vs-trunk.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, > PosHighlighter.patch, PosHighlighter.patch > > > Currently we have two somewhat separate types of queries, the one which can > make use of positions (mainly spans) and payloads (spans). Yet Span*Query > doesn't really do scoring comparable to what other queries do and at the end > of the day they are duplicating lot of code all over lucene. Span*Queries are > also limited to other Span*Query instances such that you can not use a > TermQuery or a BooleanQuery with SpanNear or anthing like that. > Beside of the Span*Query limitation other queries lacking a quiet interesting > feature since they can not score based on term proximity since scores doesn't > expose any positional information. All those problems bugged me for a while > now so I stared working on that using the bulkpostings API. I would have done > that first cut on trunk but TermScorer is working on BlockReader that do not > expose positions while the one in this branch does. I started adding a new > Positions class which users can pull from a scorer, to prevent unnecessary > positions enums I added ScorerContext#needsPositions and eventually > Scorere#needsPayloads to create the corresponding enum on demand. Yet, > currently only TermQuery / TermScorer implements this API and other simply > return null instead. > To show that the API really works and our BulkPostings work fine too with > positions I cut over TermSpanQuery to use a TermScorer under the hood and > nuked TermSpans entirely. A nice sideeffect of this was that the Position > BulkReading implementation got some exercise which now :) work all with > positions while Payloads for bulkreading are kind of experimental in the > patch and those only w
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14239418#comment-14239418 ] ASF subversion and git services commented on LUCENE-2878: - Commit 1644050 from [~romseygeek] in branch 'dev/branches/lucene2878' [ https://svn.apache.org/r1644050 ] LUCENE-2878: UnorderedNearQuery scoring > Allow Scorer to expose positions and payloads aka. nuke spans > -- > > Key: LUCENE-2878 > URL: https://issues.apache.org/jira/browse/LUCENE-2878 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: Positions Branch >Reporter: Simon Willnauer >Assignee: Robert Muir > Labels: gsoc2014 > Fix For: Positions Branch > > Attachments: LUCENE-2878-OR.patch, LUCENE-2878-vs-trunk.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, > PosHighlighter.patch, PosHighlighter.patch > > > Currently we have two somewhat separate types of queries, the one which can > make use of positions (mainly spans) and payloads (spans). Yet Span*Query > doesn't really do scoring comparable to what other queries do and at the end > of the day they are duplicating lot of code all over lucene. Span*Queries are > also limited to other Span*Query instances such that you can not use a > TermQuery or a BooleanQuery with SpanNear or anthing like that. > Beside of the Span*Query limitation other queries lacking a quiet interesting > feature since they can not score based on term proximity since scores doesn't > expose any positional information. All those problems bugged me for a while > now so I stared working on that using the bulkpostings API. I would have done > that first cut on trunk but TermScorer is working on BlockReader that do not > expose positions while the one in this branch does. I started adding a new > Positions class which users can pull from a scorer, to prevent unnecessary > positions enums I added ScorerContext#needsPositions and eventually > Scorere#needsPayloads to create the corresponding enum on demand. Yet, > currently only TermQuery / TermScorer implements this API and other simply > return null instead. > To show that the API really works and our BulkPostings work fine too with > positions I cut over TermSpanQuery to use a TermScorer under the hood and > nuked TermSpans entirely. A nice sideeffect of this was that the Position > BulkReading implementation got some exercise which now :) work all with > positions while Payloads for bulkreading are kind of experimental in the > patch and those only work with Standard codec. > So all spans now work on top of TermScorer ( I truly hate spans since today ) > including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother > to implement the other codecs yet since I want to get feedback on the API and > on this first cut before I go one with it. I will upload the corresponding > patch in a minute. > I also had to cut over SpanQuery.getSpans(IR) to > SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk > first but after that pain today I need a break first :). > The patch passes all core tests > (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't > look into the MemoryIndex BulkPostings API yet) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14238002#comment-14238002 ] ASF subversion and git services commented on LUCENE-2878: - Commit 1643846 from [~romseygeek] in branch 'dev/branches/lucene2878' [ https://svn.apache.org/r1643846 ] LUCENE-2878: Set NO_MORE_POSITIONS to be -1 > Allow Scorer to expose positions and payloads aka. nuke spans > -- > > Key: LUCENE-2878 > URL: https://issues.apache.org/jira/browse/LUCENE-2878 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: Positions Branch >Reporter: Simon Willnauer >Assignee: Robert Muir > Labels: gsoc2014 > Fix For: Positions Branch > > Attachments: LUCENE-2878-OR.patch, LUCENE-2878-vs-trunk.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, PosHighlighter.patch, > PosHighlighter.patch > > > Currently we have two somewhat separate types of queries, the one which can > make use of positions (mainly spans) and payloads (spans). Yet Span*Query > doesn't really do scoring comparable to what other queries do and at the end > of the day they are duplicating lot of code all over lucene. Span*Queries are > also limited to other Span*Query instances such that you can not use a > TermQuery or a BooleanQuery with SpanNear or anthing like that. > Beside of the Span*Query limitation other queries lacking a quiet interesting > feature since they can not score based on term proximity since scores doesn't > expose any positional information. All those problems bugged me for a while > now so I stared working on that using the bulkpostings API. I would have done > that first cut on trunk but TermScorer is working on BlockReader that do not > expose positions while the one in this branch does. I started adding a new > Positions class which users can pull from a scorer, to prevent unnecessary > positions enums I added ScorerContext#needsPositions and eventually > Scorere#needsPayloads to create the corresponding enum on demand. Yet, > currently only TermQuery / TermScorer implements this API and other simply > return null instead. > To show that the API really works and our BulkPostings work fine too with > positions I cut over TermSpanQuery to use a TermScorer under the hood and > nuked TermSpans entirely. A nice sideeffect of this was that the Position > BulkReading implementation got some exercise which now :) work all with > positions while Payloads for bulkreading are kind of experimental in the > patch and those only work with Standard codec. > So all spans now work on top of TermScorer ( I truly hate spans since today ) > including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother > to implement the other codecs yet since I want to get feedback on the API and > on this first cut before I go one with it. I will upload the corresponding > patch in a minute. > I also had to cut over SpanQuery.getSpans(IR) to > SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk > first but after that pain today I need a break first :). > The patch passes all core tests > (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't > look into the MemoryIndex BulkPostings API yet) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14237979#comment-14237979 ] ASF subversion and git services commented on LUCENE-2878: - Commit 1643843 from [~romseygeek] in branch 'dev/branches/lucene2878' [ https://svn.apache.org/r1643843 ] LUCENE-2878: Get Solr compiling > Allow Scorer to expose positions and payloads aka. nuke spans > -- > > Key: LUCENE-2878 > URL: https://issues.apache.org/jira/browse/LUCENE-2878 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: Positions Branch >Reporter: Simon Willnauer >Assignee: Robert Muir > Labels: gsoc2014 > Fix For: Positions Branch > > Attachments: LUCENE-2878-OR.patch, LUCENE-2878-vs-trunk.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, PosHighlighter.patch, > PosHighlighter.patch > > > Currently we have two somewhat separate types of queries, the one which can > make use of positions (mainly spans) and payloads (spans). Yet Span*Query > doesn't really do scoring comparable to what other queries do and at the end > of the day they are duplicating lot of code all over lucene. Span*Queries are > also limited to other Span*Query instances such that you can not use a > TermQuery or a BooleanQuery with SpanNear or anthing like that. > Beside of the Span*Query limitation other queries lacking a quiet interesting > feature since they can not score based on term proximity since scores doesn't > expose any positional information. All those problems bugged me for a while > now so I stared working on that using the bulkpostings API. I would have done > that first cut on trunk but TermScorer is working on BlockReader that do not > expose positions while the one in this branch does. I started adding a new > Positions class which users can pull from a scorer, to prevent unnecessary > positions enums I added ScorerContext#needsPositions and eventually > Scorere#needsPayloads to create the corresponding enum on demand. Yet, > currently only TermQuery / TermScorer implements this API and other simply > return null instead. > To show that the API really works and our BulkPostings work fine too with > positions I cut over TermSpanQuery to use a TermScorer under the hood and > nuked TermSpans entirely. A nice sideeffect of this was that the Position > BulkReading implementation got some exercise which now :) work all with > positions while Payloads for bulkreading are kind of experimental in the > patch and those only work with Standard codec. > So all spans now work on top of TermScorer ( I truly hate spans since today ) > including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother > to implement the other codecs yet since I want to get feedback on the API and > on this first cut before I go one with it. I will upload the corresponding > patch in a minute. > I also had to cut over SpanQuery.getSpans(IR) to > SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk > first but after that pain today I need a break first :). > The patch passes all core tests > (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't > look into the MemoryIndex BulkPostings API yet) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14237937#comment-14237937 ] ASF subversion and git services commented on LUCENE-2878: - Commit 1643835 from [~romseygeek] in branch 'dev/branches/lucene2878' [ https://svn.apache.org/r1643835 ] LUCENE-2878: PostingsHighlighter uses Integer.MAX_VALUE to indicate empty docsenum > Allow Scorer to expose positions and payloads aka. nuke spans > -- > > Key: LUCENE-2878 > URL: https://issues.apache.org/jira/browse/LUCENE-2878 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: Positions Branch >Reporter: Simon Willnauer >Assignee: Robert Muir > Labels: gsoc2014 > Fix For: Positions Branch > > Attachments: LUCENE-2878-OR.patch, LUCENE-2878-vs-trunk.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, PosHighlighter.patch, > PosHighlighter.patch > > > Currently we have two somewhat separate types of queries, the one which can > make use of positions (mainly spans) and payloads (spans). Yet Span*Query > doesn't really do scoring comparable to what other queries do and at the end > of the day they are duplicating lot of code all over lucene. Span*Queries are > also limited to other Span*Query instances such that you can not use a > TermQuery or a BooleanQuery with SpanNear or anthing like that. > Beside of the Span*Query limitation other queries lacking a quiet interesting > feature since they can not score based on term proximity since scores doesn't > expose any positional information. All those problems bugged me for a while > now so I stared working on that using the bulkpostings API. I would have done > that first cut on trunk but TermScorer is working on BlockReader that do not > expose positions while the one in this branch does. I started adding a new > Positions class which users can pull from a scorer, to prevent unnecessary > positions enums I added ScorerContext#needsPositions and eventually > Scorere#needsPayloads to create the corresponding enum on demand. Yet, > currently only TermQuery / TermScorer implements this API and other simply > return null instead. > To show that the API really works and our BulkPostings work fine too with > positions I cut over TermSpanQuery to use a TermScorer under the hood and > nuked TermSpans entirely. A nice sideeffect of this was that the Position > BulkReading implementation got some exercise which now :) work all with > positions while Payloads for bulkreading are kind of experimental in the > patch and those only work with Standard codec. > So all spans now work on top of TermScorer ( I truly hate spans since today ) > including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother > to implement the other codecs yet since I want to get feedback on the API and > on this first cut before I go one with it. I will upload the corresponding > patch in a minute. > I also had to cut over SpanQuery.getSpans(IR) to > SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk > first but after that pain today I need a break first :). > The patch passes all core tests > (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't > look into the MemoryIndex BulkPostings API yet) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14237867#comment-14237867 ] ASF subversion and git services commented on LUCENE-2878: - Commit 1643819 from [~romseygeek] in branch 'dev/branches/lucene2878' [ https://svn.apache.org/r1643819 ] LUCENE-2878: Fix compilation in TestGrouping; don't update extremes in PositionQueue with exhausted iterator > Allow Scorer to expose positions and payloads aka. nuke spans > -- > > Key: LUCENE-2878 > URL: https://issues.apache.org/jira/browse/LUCENE-2878 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: Positions Branch >Reporter: Simon Willnauer >Assignee: Robert Muir > Labels: gsoc2014 > Fix For: Positions Branch > > Attachments: LUCENE-2878-OR.patch, LUCENE-2878-vs-trunk.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, PosHighlighter.patch, > PosHighlighter.patch > > > Currently we have two somewhat separate types of queries, the one which can > make use of positions (mainly spans) and payloads (spans). Yet Span*Query > doesn't really do scoring comparable to what other queries do and at the end > of the day they are duplicating lot of code all over lucene. Span*Queries are > also limited to other Span*Query instances such that you can not use a > TermQuery or a BooleanQuery with SpanNear or anthing like that. > Beside of the Span*Query limitation other queries lacking a quiet interesting > feature since they can not score based on term proximity since scores doesn't > expose any positional information. All those problems bugged me for a while > now so I stared working on that using the bulkpostings API. I would have done > that first cut on trunk but TermScorer is working on BlockReader that do not > expose positions while the one in this branch does. I started adding a new > Positions class which users can pull from a scorer, to prevent unnecessary > positions enums I added ScorerContext#needsPositions and eventually > Scorere#needsPayloads to create the corresponding enum on demand. Yet, > currently only TermQuery / TermScorer implements this API and other simply > return null instead. > To show that the API really works and our BulkPostings work fine too with > positions I cut over TermSpanQuery to use a TermScorer under the hood and > nuked TermSpans entirely. A nice sideeffect of this was that the Position > BulkReading implementation got some exercise which now :) work all with > positions while Payloads for bulkreading are kind of experimental in the > patch and those only work with Standard codec. > So all spans now work on top of TermScorer ( I truly hate spans since today ) > including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother > to implement the other codecs yet since I want to get feedback on the API and > on this first cut before I go one with it. I will upload the corresponding > patch in a minute. > I also had to cut over SpanQuery.getSpans(IR) to > SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk > first but after that pain today I need a break first :). > The patch passes all core tests > (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't > look into the MemoryIndex BulkPostings API yet) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14237792#comment-14237792 ] ASF subversion and git services commented on LUCENE-2878: - Commit 1643791 from [~romseygeek] in branch 'dev/branches/lucene2878' [ https://svn.apache.org/r1643791 ] LUCENE-2878: Remove postingsFeatures() from Collector > Allow Scorer to expose positions and payloads aka. nuke spans > -- > > Key: LUCENE-2878 > URL: https://issues.apache.org/jira/browse/LUCENE-2878 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: Positions Branch >Reporter: Simon Willnauer >Assignee: Robert Muir > Labels: gsoc2014 > Fix For: Positions Branch > > Attachments: LUCENE-2878-OR.patch, LUCENE-2878-vs-trunk.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, PosHighlighter.patch, > PosHighlighter.patch > > > Currently we have two somewhat separate types of queries, the one which can > make use of positions (mainly spans) and payloads (spans). Yet Span*Query > doesn't really do scoring comparable to what other queries do and at the end > of the day they are duplicating lot of code all over lucene. Span*Queries are > also limited to other Span*Query instances such that you can not use a > TermQuery or a BooleanQuery with SpanNear or anthing like that. > Beside of the Span*Query limitation other queries lacking a quiet interesting > feature since they can not score based on term proximity since scores doesn't > expose any positional information. All those problems bugged me for a while > now so I stared working on that using the bulkpostings API. I would have done > that first cut on trunk but TermScorer is working on BlockReader that do not > expose positions while the one in this branch does. I started adding a new > Positions class which users can pull from a scorer, to prevent unnecessary > positions enums I added ScorerContext#needsPositions and eventually > Scorere#needsPayloads to create the corresponding enum on demand. Yet, > currently only TermQuery / TermScorer implements this API and other simply > return null instead. > To show that the API really works and our BulkPostings work fine too with > positions I cut over TermSpanQuery to use a TermScorer under the hood and > nuked TermSpans entirely. A nice sideeffect of this was that the Position > BulkReading implementation got some exercise which now :) work all with > positions while Payloads for bulkreading are kind of experimental in the > patch and those only work with Standard codec. > So all spans now work on top of TermScorer ( I truly hate spans since today ) > including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother > to implement the other codecs yet since I want to get feedback on the API and > on this first cut before I go one with it. I will upload the corresponding > patch in a minute. > I also had to cut over SpanQuery.getSpans(IR) to > SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk > first but after that pain today I need a break first :). > The patch passes all core tests > (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't > look into the MemoryIndex BulkPostings API yet) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14237768#comment-14237768 ] ASF subversion and git services commented on LUCENE-2878: - Commit 1643788 from [~romseygeek] in branch 'dev/branches/lucene2878' [ https://svn.apache.org/r1643788 ] LUCENE-2878: Remove positions highlighter (for now); MemoryPostingsFormat fix > Allow Scorer to expose positions and payloads aka. nuke spans > -- > > Key: LUCENE-2878 > URL: https://issues.apache.org/jira/browse/LUCENE-2878 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: Positions Branch >Reporter: Simon Willnauer >Assignee: Robert Muir > Labels: gsoc2014 > Fix For: Positions Branch > > Attachments: LUCENE-2878-OR.patch, LUCENE-2878-vs-trunk.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, PosHighlighter.patch, > PosHighlighter.patch > > > Currently we have two somewhat separate types of queries, the one which can > make use of positions (mainly spans) and payloads (spans). Yet Span*Query > doesn't really do scoring comparable to what other queries do and at the end > of the day they are duplicating lot of code all over lucene. Span*Queries are > also limited to other Span*Query instances such that you can not use a > TermQuery or a BooleanQuery with SpanNear or anthing like that. > Beside of the Span*Query limitation other queries lacking a quiet interesting > feature since they can not score based on term proximity since scores doesn't > expose any positional information. All those problems bugged me for a while > now so I stared working on that using the bulkpostings API. I would have done > that first cut on trunk but TermScorer is working on BlockReader that do not > expose positions while the one in this branch does. I started adding a new > Positions class which users can pull from a scorer, to prevent unnecessary > positions enums I added ScorerContext#needsPositions and eventually > Scorere#needsPayloads to create the corresponding enum on demand. Yet, > currently only TermQuery / TermScorer implements this API and other simply > return null instead. > To show that the API really works and our BulkPostings work fine too with > positions I cut over TermSpanQuery to use a TermScorer under the hood and > nuked TermSpans entirely. A nice sideeffect of this was that the Position > BulkReading implementation got some exercise which now :) work all with > positions while Payloads for bulkreading are kind of experimental in the > patch and those only work with Standard codec. > So all spans now work on top of TermScorer ( I truly hate spans since today ) > including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother > to implement the other codecs yet since I want to get feedback on the API and > on this first cut before I go one with it. I will upload the corresponding > patch in a minute. > I also had to cut over SpanQuery.getSpans(IR) to > SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk > first but after that pain today I need a break first :). > The patch passes all core tests > (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't > look into the MemoryIndex BulkPostings API yet) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14237758#comment-14237758 ] ASF subversion and git services commented on LUCENE-2878: - Commit 1643787 from [~romseygeek] in branch 'dev/branches/lucene2878' [ https://svn.apache.org/r1643787 ] LUCENE-2878: Make Interval package-private, remove PositionsCollector > Allow Scorer to expose positions and payloads aka. nuke spans > -- > > Key: LUCENE-2878 > URL: https://issues.apache.org/jira/browse/LUCENE-2878 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: Positions Branch >Reporter: Simon Willnauer >Assignee: Robert Muir > Labels: gsoc2014 > Fix For: Positions Branch > > Attachments: LUCENE-2878-OR.patch, LUCENE-2878-vs-trunk.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, PosHighlighter.patch, > PosHighlighter.patch > > > Currently we have two somewhat separate types of queries, the one which can > make use of positions (mainly spans) and payloads (spans). Yet Span*Query > doesn't really do scoring comparable to what other queries do and at the end > of the day they are duplicating lot of code all over lucene. Span*Queries are > also limited to other Span*Query instances such that you can not use a > TermQuery or a BooleanQuery with SpanNear or anthing like that. > Beside of the Span*Query limitation other queries lacking a quiet interesting > feature since they can not score based on term proximity since scores doesn't > expose any positional information. All those problems bugged me for a while > now so I stared working on that using the bulkpostings API. I would have done > that first cut on trunk but TermScorer is working on BlockReader that do not > expose positions while the one in this branch does. I started adding a new > Positions class which users can pull from a scorer, to prevent unnecessary > positions enums I added ScorerContext#needsPositions and eventually > Scorere#needsPayloads to create the corresponding enum on demand. Yet, > currently only TermQuery / TermScorer implements this API and other simply > return null instead. > To show that the API really works and our BulkPostings work fine too with > positions I cut over TermSpanQuery to use a TermScorer under the hood and > nuked TermSpans entirely. A nice sideeffect of this was that the Position > BulkReading implementation got some exercise which now :) work all with > positions while Payloads for bulkreading are kind of experimental in the > patch and those only work with Standard codec. > So all spans now work on top of TermScorer ( I truly hate spans since today ) > including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother > to implement the other codecs yet since I want to get feedback on the API and > on this first cut before I go one with it. I will upload the corresponding > patch in a minute. > I also had to cut over SpanQuery.getSpans(IR) to > SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk > first but after that pain today I need a break first :). > The patch passes all core tests > (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't > look into the MemoryIndex BulkPostings API yet) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14236974#comment-14236974 ] Yonik Seeley commented on LUCENE-2878: -- bq. It does not matter to me really what solr defaults to I was just trying to determine the likelihood of *anyone* actually having pos=MAX_INT in their index. bq. Changing these constants requires work. Sure - I was only pointing it out as an option (5.0 would be the right time), not strongly advocating for it. > Allow Scorer to expose positions and payloads aka. nuke spans > -- > > Key: LUCENE-2878 > URL: https://issues.apache.org/jira/browse/LUCENE-2878 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: Positions Branch >Reporter: Simon Willnauer >Assignee: Robert Muir > Labels: gsoc2014 > Fix For: Positions Branch > > Attachments: LUCENE-2878-OR.patch, LUCENE-2878-vs-trunk.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, PosHighlighter.patch, > PosHighlighter.patch > > > Currently we have two somewhat separate types of queries, the one which can > make use of positions (mainly spans) and payloads (spans). Yet Span*Query > doesn't really do scoring comparable to what other queries do and at the end > of the day they are duplicating lot of code all over lucene. Span*Queries are > also limited to other Span*Query instances such that you can not use a > TermQuery or a BooleanQuery with SpanNear or anthing like that. > Beside of the Span*Query limitation other queries lacking a quiet interesting > feature since they can not score based on term proximity since scores doesn't > expose any positional information. All those problems bugged me for a while > now so I stared working on that using the bulkpostings API. I would have done > that first cut on trunk but TermScorer is working on BlockReader that do not > expose positions while the one in this branch does. I started adding a new > Positions class which users can pull from a scorer, to prevent unnecessary > positions enums I added ScorerContext#needsPositions and eventually > Scorere#needsPayloads to create the corresponding enum on demand. Yet, > currently only TermQuery / TermScorer implements this API and other simply > return null instead. > To show that the API really works and our BulkPostings work fine too with > positions I cut over TermSpanQuery to use a TermScorer under the hood and > nuked TermSpans entirely. A nice sideeffect of this was that the Position > BulkReading implementation got some exercise which now :) work all with > positions while Payloads for bulkreading are kind of experimental in the > patch and those only work with Standard codec. > So all spans now work on top of TermScorer ( I truly hate spans since today ) > including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother > to implement the other codecs yet since I want to get feedback on the API and > on this first cut before I go one with it. I will upload the corresponding > patch in a minute. > I also had to cut over SpanQuery.getSpans(IR) to > SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk > first but after that pain today I need a break first :). > The patch passes all core tests > (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't > look into the MemoryIndex BulkPostings API yet) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14236966#comment-14236966 ] Robert Muir commented on LUCENE-2878: - It does not matter to me really what solr defaults to, we cant be lenient here. Changing these constants requires work. > Allow Scorer to expose positions and payloads aka. nuke spans > -- > > Key: LUCENE-2878 > URL: https://issues.apache.org/jira/browse/LUCENE-2878 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: Positions Branch >Reporter: Simon Willnauer >Assignee: Robert Muir > Labels: gsoc2014 > Fix For: Positions Branch > > Attachments: LUCENE-2878-OR.patch, LUCENE-2878-vs-trunk.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, PosHighlighter.patch, > PosHighlighter.patch > > > Currently we have two somewhat separate types of queries, the one which can > make use of positions (mainly spans) and payloads (spans). Yet Span*Query > doesn't really do scoring comparable to what other queries do and at the end > of the day they are duplicating lot of code all over lucene. Span*Queries are > also limited to other Span*Query instances such that you can not use a > TermQuery or a BooleanQuery with SpanNear or anthing like that. > Beside of the Span*Query limitation other queries lacking a quiet interesting > feature since they can not score based on term proximity since scores doesn't > expose any positional information. All those problems bugged me for a while > now so I stared working on that using the bulkpostings API. I would have done > that first cut on trunk but TermScorer is working on BlockReader that do not > expose positions while the one in this branch does. I started adding a new > Positions class which users can pull from a scorer, to prevent unnecessary > positions enums I added ScorerContext#needsPositions and eventually > Scorere#needsPayloads to create the corresponding enum on demand. Yet, > currently only TermQuery / TermScorer implements this API and other simply > return null instead. > To show that the API really works and our BulkPostings work fine too with > positions I cut over TermSpanQuery to use a TermScorer under the hood and > nuked TermSpans entirely. A nice sideeffect of this was that the Position > BulkReading implementation got some exercise which now :) work all with > positions while Payloads for bulkreading are kind of experimental in the > patch and those only work with Standard codec. > So all spans now work on top of TermScorer ( I truly hate spans since today ) > including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother > to implement the other codecs yet since I want to get feedback on the API and > on this first cut before I go one with it. I will upload the corresponding > patch in a minute. > I also had to cut over SpanQuery.getSpans(IR) to > SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk > first but after that pain today I need a break first :). > The patch passes all core tests > (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't > look into the MemoryIndex BulkPostings API yet) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14236963#comment-14236963 ] Yonik Seeley commented on LUCENE-2878: -- bq. Its more than that, indexwriter has to reject such positions and tests updated, That part seems easy. bq. old postings formats need to have a check and do something, etc. That part could be more of a pain... not sure. But maybe not necessary if it's never happened? (see below) bq. Because some people like large position increment gap values between field instances, someone can easily blow up positions to large values without actually having that many terms. Solr has defaulted to 100 forever... and even if a user changed to a *huge* value, it would be super unlikely to hit MAX_INT exactly (esp since they would get an exception if they went over... they are already in super dangerous territory). I think the only people that would have MAX_INT position are those that are using positions to encode something other than real positions (i.e. using positions like payloads) and have values going all the way up to MAX_INT, or are using MAX_INT as a special value. It could very well be that no such user exists. > Allow Scorer to expose positions and payloads aka. nuke spans > -- > > Key: LUCENE-2878 > URL: https://issues.apache.org/jira/browse/LUCENE-2878 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: Positions Branch >Reporter: Simon Willnauer >Assignee: Robert Muir > Labels: gsoc2014 > Fix For: Positions Branch > > Attachments: LUCENE-2878-OR.patch, LUCENE-2878-vs-trunk.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, PosHighlighter.patch, > PosHighlighter.patch > > > Currently we have two somewhat separate types of queries, the one which can > make use of positions (mainly spans) and payloads (spans). Yet Span*Query > doesn't really do scoring comparable to what other queries do and at the end > of the day they are duplicating lot of code all over lucene. Span*Queries are > also limited to other Span*Query instances such that you can not use a > TermQuery or a BooleanQuery with SpanNear or anthing like that. > Beside of the Span*Query limitation other queries lacking a quiet interesting > feature since they can not score based on term proximity since scores doesn't > expose any positional information. All those problems bugged me for a while > now so I stared working on that using the bulkpostings API. I would have done > that first cut on trunk but TermScorer is working on BlockReader that do not > expose positions while the one in this branch does. I started adding a new > Positions class which users can pull from a scorer, to prevent unnecessary > positions enums I added ScorerContext#needsPositions and eventually > Scorere#needsPayloads to create the corresponding enum on demand. Yet, > currently only TermQuery / TermScorer implements this API and other simply > return null instead. > To show that the API really works and our BulkPostings work fine too with > positions I cut over TermSpanQuery to use a TermScorer under the hood and > nuked TermSpans entirely. A nice sideeffect of this was that the Position > BulkReading implementation got some exercise which now :) work all with > positions while Payloads for bulkreading are kind of experimental in the > patch and those only work with Standard codec. > So all spans now work on top of TermScorer ( I truly hate spans since today ) > including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother > to implement the other codecs yet since I want to get feedback on the API and > on this first cut before I go one with it. I will upload the corresponding > patch in a minute. > I also had to cut over SpanQuery.getSpans(IR) to > SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk > first but after that pain today I need a break first :). > The patch passes all core tests > (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't > look into the MemoryIndex BulkPostings API yet) -- This message was sent by Atlassian JIRA (v
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14236951#comment-14236951 ] Robert Muir commented on LUCENE-2878: - Its more than that, indexwriter has to reject such positions and tests updated, old postings formats need to have a check and do something, etc. Because some people like large position increment gap values between field instances, someone can easily blow up positions to large values without actually having that many terms. > Allow Scorer to expose positions and payloads aka. nuke spans > -- > > Key: LUCENE-2878 > URL: https://issues.apache.org/jira/browse/LUCENE-2878 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: Positions Branch >Reporter: Simon Willnauer >Assignee: Robert Muir > Labels: gsoc2014 > Fix For: Positions Branch > > Attachments: LUCENE-2878-OR.patch, LUCENE-2878-vs-trunk.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, PosHighlighter.patch, > PosHighlighter.patch > > > Currently we have two somewhat separate types of queries, the one which can > make use of positions (mainly spans) and payloads (spans). Yet Span*Query > doesn't really do scoring comparable to what other queries do and at the end > of the day they are duplicating lot of code all over lucene. Span*Queries are > also limited to other Span*Query instances such that you can not use a > TermQuery or a BooleanQuery with SpanNear or anthing like that. > Beside of the Span*Query limitation other queries lacking a quiet interesting > feature since they can not score based on term proximity since scores doesn't > expose any positional information. All those problems bugged me for a while > now so I stared working on that using the bulkpostings API. I would have done > that first cut on trunk but TermScorer is working on BlockReader that do not > expose positions while the one in this branch does. I started adding a new > Positions class which users can pull from a scorer, to prevent unnecessary > positions enums I added ScorerContext#needsPositions and eventually > Scorere#needsPayloads to create the corresponding enum on demand. Yet, > currently only TermQuery / TermScorer implements this API and other simply > return null instead. > To show that the API really works and our BulkPostings work fine too with > positions I cut over TermSpanQuery to use a TermScorer under the hood and > nuked TermSpans entirely. A nice sideeffect of this was that the Position > BulkReading implementation got some exercise which now :) work all with > positions while Payloads for bulkreading are kind of experimental in the > patch and those only work with Standard codec. > So all spans now work on top of TermScorer ( I truly hate spans since today ) > including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother > to implement the other codecs yet since I want to get feedback on the API and > on this first cut before I go one with it. I will upload the corresponding > patch in a minute. > I also had to cut over SpanQuery.getSpans(IR) to > SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk > first but after that pain today I need a break first :). > The patch passes all core tests > (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't > look into the MemoryIndex BulkPostings API yet) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14236945#comment-14236945 ] Yonik Seeley commented on LUCENE-2878: -- bq. I dont really see it as a choice, the current value is not a reserved value. We could chose to make it a reserved value since we're going into a new major version. Does anyone even have positions of MAX_VALUE? > Allow Scorer to expose positions and payloads aka. nuke spans > -- > > Key: LUCENE-2878 > URL: https://issues.apache.org/jira/browse/LUCENE-2878 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: Positions Branch >Reporter: Simon Willnauer >Assignee: Robert Muir > Labels: gsoc2014 > Fix For: Positions Branch > > Attachments: LUCENE-2878-OR.patch, LUCENE-2878-vs-trunk.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, PosHighlighter.patch, > PosHighlighter.patch > > > Currently we have two somewhat separate types of queries, the one which can > make use of positions (mainly spans) and payloads (spans). Yet Span*Query > doesn't really do scoring comparable to what other queries do and at the end > of the day they are duplicating lot of code all over lucene. Span*Queries are > also limited to other Span*Query instances such that you can not use a > TermQuery or a BooleanQuery with SpanNear or anthing like that. > Beside of the Span*Query limitation other queries lacking a quiet interesting > feature since they can not score based on term proximity since scores doesn't > expose any positional information. All those problems bugged me for a while > now so I stared working on that using the bulkpostings API. I would have done > that first cut on trunk but TermScorer is working on BlockReader that do not > expose positions while the one in this branch does. I started adding a new > Positions class which users can pull from a scorer, to prevent unnecessary > positions enums I added ScorerContext#needsPositions and eventually > Scorere#needsPayloads to create the corresponding enum on demand. Yet, > currently only TermQuery / TermScorer implements this API and other simply > return null instead. > To show that the API really works and our BulkPostings work fine too with > positions I cut over TermSpanQuery to use a TermScorer under the hood and > nuked TermSpans entirely. A nice sideeffect of this was that the Position > BulkReading implementation got some exercise which now :) work all with > positions while Payloads for bulkreading are kind of experimental in the > patch and those only work with Standard codec. > So all spans now work on top of TermScorer ( I truly hate spans since today ) > including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother > to implement the other codecs yet since I want to get feedback on the API and > on this first cut before I go one with it. I will upload the corresponding > patch in a minute. > I also had to cut over SpanQuery.getSpans(IR) to > SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk > first but after that pain today I need a break first :). > The patch passes all core tests > (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't > look into the MemoryIndex BulkPostings API yet) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14236939#comment-14236939 ] Robert Muir commented on LUCENE-2878: - I dont really see it as a choice, the current value is not a reserved value. > Allow Scorer to expose positions and payloads aka. nuke spans > -- > > Key: LUCENE-2878 > URL: https://issues.apache.org/jira/browse/LUCENE-2878 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: Positions Branch >Reporter: Simon Willnauer >Assignee: Robert Muir > Labels: gsoc2014 > Fix For: Positions Branch > > Attachments: LUCENE-2878-OR.patch, LUCENE-2878-vs-trunk.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, PosHighlighter.patch, > PosHighlighter.patch > > > Currently we have two somewhat separate types of queries, the one which can > make use of positions (mainly spans) and payloads (spans). Yet Span*Query > doesn't really do scoring comparable to what other queries do and at the end > of the day they are duplicating lot of code all over lucene. Span*Queries are > also limited to other Span*Query instances such that you can not use a > TermQuery or a BooleanQuery with SpanNear or anthing like that. > Beside of the Span*Query limitation other queries lacking a quiet interesting > feature since they can not score based on term proximity since scores doesn't > expose any positional information. All those problems bugged me for a while > now so I stared working on that using the bulkpostings API. I would have done > that first cut on trunk but TermScorer is working on BlockReader that do not > expose positions while the one in this branch does. I started adding a new > Positions class which users can pull from a scorer, to prevent unnecessary > positions enums I added ScorerContext#needsPositions and eventually > Scorere#needsPayloads to create the corresponding enum on demand. Yet, > currently only TermQuery / TermScorer implements this API and other simply > return null instead. > To show that the API really works and our BulkPostings work fine too with > positions I cut over TermSpanQuery to use a TermScorer under the hood and > nuked TermSpans entirely. A nice sideeffect of this was that the Position > BulkReading implementation got some exercise which now :) work all with > positions while Payloads for bulkreading are kind of experimental in the > patch and those only work with Standard codec. > So all spans now work on top of TermScorer ( I truly hate spans since today ) > including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother > to implement the other codecs yet since I want to get feedback on the API and > on this first cut before I go one with it. I will upload the corresponding > patch in a minute. > I also had to cut over SpanQuery.getSpans(IR) to > SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk > first but after that pain today I need a break first :). > The patch passes all core tests > (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't > look into the MemoryIndex BulkPostings API yet) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14236938#comment-14236938 ] Alan Woodward commented on LUCENE-2878: --- The disadvantage of using -1 to indicate NO_MORE_POSITIONS is that having an exhausted iterator sort 'naturally' after an unexhausted one simplifies a lot of the interval spanning logic in things like the PositionQueue or in OrderedNearQuery. > Allow Scorer to expose positions and payloads aka. nuke spans > -- > > Key: LUCENE-2878 > URL: https://issues.apache.org/jira/browse/LUCENE-2878 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: Positions Branch >Reporter: Simon Willnauer >Assignee: Robert Muir > Labels: gsoc2014 > Fix For: Positions Branch > > Attachments: LUCENE-2878-OR.patch, LUCENE-2878-vs-trunk.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, PosHighlighter.patch, > PosHighlighter.patch > > > Currently we have two somewhat separate types of queries, the one which can > make use of positions (mainly spans) and payloads (spans). Yet Span*Query > doesn't really do scoring comparable to what other queries do and at the end > of the day they are duplicating lot of code all over lucene. Span*Queries are > also limited to other Span*Query instances such that you can not use a > TermQuery or a BooleanQuery with SpanNear or anthing like that. > Beside of the Span*Query limitation other queries lacking a quiet interesting > feature since they can not score based on term proximity since scores doesn't > expose any positional information. All those problems bugged me for a while > now so I stared working on that using the bulkpostings API. I would have done > that first cut on trunk but TermScorer is working on BlockReader that do not > expose positions while the one in this branch does. I started adding a new > Positions class which users can pull from a scorer, to prevent unnecessary > positions enums I added ScorerContext#needsPositions and eventually > Scorere#needsPayloads to create the corresponding enum on demand. Yet, > currently only TermQuery / TermScorer implements this API and other simply > return null instead. > To show that the API really works and our BulkPostings work fine too with > positions I cut over TermSpanQuery to use a TermScorer under the hood and > nuked TermSpans entirely. A nice sideeffect of this was that the Position > BulkReading implementation got some exercise which now :) work all with > positions while Payloads for bulkreading are kind of experimental in the > patch and those only work with Standard codec. > So all spans now work on top of TermScorer ( I truly hate spans since today ) > including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother > to implement the other codecs yet since I want to get feedback on the API and > on this first cut before I go one with it. I will upload the corresponding > patch in a minute. > I also had to cut over SpanQuery.getSpans(IR) to > SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk > first but after that pain today I need a break first :). > The patch passes all core tests > (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't > look into the MemoryIndex BulkPostings API yet) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14235749#comment-14235749 ] Robert Muir commented on LUCENE-2878: - In general, I think anything we can do to split it up is good :) > Allow Scorer to expose positions and payloads aka. nuke spans > -- > > Key: LUCENE-2878 > URL: https://issues.apache.org/jira/browse/LUCENE-2878 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: Positions Branch >Reporter: Simon Willnauer >Assignee: Robert Muir > Labels: gsoc2014 > Fix For: Positions Branch > > Attachments: LUCENE-2878-OR.patch, LUCENE-2878-vs-trunk.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, PosHighlighter.patch, > PosHighlighter.patch > > > Currently we have two somewhat separate types of queries, the one which can > make use of positions (mainly spans) and payloads (spans). Yet Span*Query > doesn't really do scoring comparable to what other queries do and at the end > of the day they are duplicating lot of code all over lucene. Span*Queries are > also limited to other Span*Query instances such that you can not use a > TermQuery or a BooleanQuery with SpanNear or anthing like that. > Beside of the Span*Query limitation other queries lacking a quiet interesting > feature since they can not score based on term proximity since scores doesn't > expose any positional information. All those problems bugged me for a while > now so I stared working on that using the bulkpostings API. I would have done > that first cut on trunk but TermScorer is working on BlockReader that do not > expose positions while the one in this branch does. I started adding a new > Positions class which users can pull from a scorer, to prevent unnecessary > positions enums I added ScorerContext#needsPositions and eventually > Scorere#needsPayloads to create the corresponding enum on demand. Yet, > currently only TermQuery / TermScorer implements this API and other simply > return null instead. > To show that the API really works and our BulkPostings work fine too with > positions I cut over TermSpanQuery to use a TermScorer under the hood and > nuked TermSpans entirely. A nice sideeffect of this was that the Position > BulkReading implementation got some exercise which now :) work all with > positions while Payloads for bulkreading are kind of experimental in the > patch and those only work with Standard codec. > So all spans now work on top of TermScorer ( I truly hate spans since today ) > including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother > to implement the other codecs yet since I want to get feedback on the API and > on this first cut before I go one with it. I will upload the corresponding > patch in a minute. > I also had to cut over SpanQuery.getSpans(IR) to > SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk > first but after that pain today I need a break first :). > The patch passes all core tests > (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't > look into the MemoryIndex BulkPostings API yet) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14235738#comment-14235738 ] Alan Woodward commented on LUCENE-2878: --- Interval (outside the posfilter package) and Collector.postingsFeatures() are basically only used by the PositionsCollector, which I'm using for highlighting and testing. The highlighter can just be removed for now, and worked on in a separate issue - maybe as an extension of PostingsHighlighter? I'll rework the tests as well. > Allow Scorer to expose positions and payloads aka. nuke spans > -- > > Key: LUCENE-2878 > URL: https://issues.apache.org/jira/browse/LUCENE-2878 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: Positions Branch >Reporter: Simon Willnauer >Assignee: Robert Muir > Labels: gsoc2014 > Fix For: Positions Branch > > Attachments: LUCENE-2878-OR.patch, LUCENE-2878-vs-trunk.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, PosHighlighter.patch, > PosHighlighter.patch > > > Currently we have two somewhat separate types of queries, the one which can > make use of positions (mainly spans) and payloads (spans). Yet Span*Query > doesn't really do scoring comparable to what other queries do and at the end > of the day they are duplicating lot of code all over lucene. Span*Queries are > also limited to other Span*Query instances such that you can not use a > TermQuery or a BooleanQuery with SpanNear or anthing like that. > Beside of the Span*Query limitation other queries lacking a quiet interesting > feature since they can not score based on term proximity since scores doesn't > expose any positional information. All those problems bugged me for a while > now so I stared working on that using the bulkpostings API. I would have done > that first cut on trunk but TermScorer is working on BlockReader that do not > expose positions while the one in this branch does. I started adding a new > Positions class which users can pull from a scorer, to prevent unnecessary > positions enums I added ScorerContext#needsPositions and eventually > Scorere#needsPayloads to create the corresponding enum on demand. Yet, > currently only TermQuery / TermScorer implements this API and other simply > return null instead. > To show that the API really works and our BulkPostings work fine too with > positions I cut over TermSpanQuery to use a TermScorer under the hood and > nuked TermSpans entirely. A nice sideeffect of this was that the Position > BulkReading implementation got some exercise which now :) work all with > positions while Payloads for bulkreading are kind of experimental in the > patch and those only work with Standard codec. > So all spans now work on top of TermScorer ( I truly hate spans since today ) > including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother > to implement the other codecs yet since I want to get feedback on the API and > on this first cut before I go one with it. I will upload the corresponding > patch in a minute. > I also had to cut over SpanQuery.getSpans(IR) to > SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk > first but after that pain today I need a break first :). > The patch passes all core tests > (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't > look into the MemoryIndex BulkPostings API yet) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14235718#comment-14235718 ] Robert Muir commented on LUCENE-2878: - Currently this NO_MORE_POSITIONS = Integer.MAX_VALUE? Unlike docids, we don't prevent a user from having a position with this value: we only look for overflow (negative) in IndexWriter. I think -1 works better here? > Allow Scorer to expose positions and payloads aka. nuke spans > -- > > Key: LUCENE-2878 > URL: https://issues.apache.org/jira/browse/LUCENE-2878 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: Positions Branch >Reporter: Simon Willnauer >Assignee: Robert Muir > Labels: gsoc2014 > Fix For: Positions Branch > > Attachments: LUCENE-2878-OR.patch, LUCENE-2878-vs-trunk.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, PosHighlighter.patch, > PosHighlighter.patch > > > Currently we have two somewhat separate types of queries, the one which can > make use of positions (mainly spans) and payloads (spans). Yet Span*Query > doesn't really do scoring comparable to what other queries do and at the end > of the day they are duplicating lot of code all over lucene. Span*Queries are > also limited to other Span*Query instances such that you can not use a > TermQuery or a BooleanQuery with SpanNear or anthing like that. > Beside of the Span*Query limitation other queries lacking a quiet interesting > feature since they can not score based on term proximity since scores doesn't > expose any positional information. All those problems bugged me for a while > now so I stared working on that using the bulkpostings API. I would have done > that first cut on trunk but TermScorer is working on BlockReader that do not > expose positions while the one in this branch does. I started adding a new > Positions class which users can pull from a scorer, to prevent unnecessary > positions enums I added ScorerContext#needsPositions and eventually > Scorere#needsPayloads to create the corresponding enum on demand. Yet, > currently only TermQuery / TermScorer implements this API and other simply > return null instead. > To show that the API really works and our BulkPostings work fine too with > positions I cut over TermSpanQuery to use a TermScorer under the hood and > nuked TermSpans entirely. A nice sideeffect of this was that the Position > BulkReading implementation got some exercise which now :) work all with > positions while Payloads for bulkreading are kind of experimental in the > patch and those only work with Standard codec. > So all spans now work on top of TermScorer ( I truly hate spans since today ) > including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother > to implement the other codecs yet since I want to get feedback on the API and > on this first cut before I go one with it. I will upload the corresponding > patch in a minute. > I also had to cut over SpanQuery.getSpans(IR) to > SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk > first but after that pain today I need a break first :). > The patch passes all core tests > (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't > look into the MemoryIndex BulkPostings API yet) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14235709#comment-14235709 ] ASF subversion and git services commented on LUCENE-2878: - Commit 1643349 from [~romseygeek] in branch 'dev/branches/lucene2878' [ https://svn.apache.org/r1643349 ] LUCENE-2878: return NO_MORE_POSITIONS rather than throwing UOE > Allow Scorer to expose positions and payloads aka. nuke spans > -- > > Key: LUCENE-2878 > URL: https://issues.apache.org/jira/browse/LUCENE-2878 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: Positions Branch >Reporter: Simon Willnauer >Assignee: Robert Muir > Labels: gsoc2014 > Fix For: Positions Branch > > Attachments: LUCENE-2878-OR.patch, LUCENE-2878-vs-trunk.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, PosHighlighter.patch, > PosHighlighter.patch > > > Currently we have two somewhat separate types of queries, the one which can > make use of positions (mainly spans) and payloads (spans). Yet Span*Query > doesn't really do scoring comparable to what other queries do and at the end > of the day they are duplicating lot of code all over lucene. Span*Queries are > also limited to other Span*Query instances such that you can not use a > TermQuery or a BooleanQuery with SpanNear or anthing like that. > Beside of the Span*Query limitation other queries lacking a quiet interesting > feature since they can not score based on term proximity since scores doesn't > expose any positional information. All those problems bugged me for a while > now so I stared working on that using the bulkpostings API. I would have done > that first cut on trunk but TermScorer is working on BlockReader that do not > expose positions while the one in this branch does. I started adding a new > Positions class which users can pull from a scorer, to prevent unnecessary > positions enums I added ScorerContext#needsPositions and eventually > Scorere#needsPayloads to create the corresponding enum on demand. Yet, > currently only TermQuery / TermScorer implements this API and other simply > return null instead. > To show that the API really works and our BulkPostings work fine too with > positions I cut over TermSpanQuery to use a TermScorer under the hood and > nuked TermSpans entirely. A nice sideeffect of this was that the Position > BulkReading implementation got some exercise which now :) work all with > positions while Payloads for bulkreading are kind of experimental in the > patch and those only work with Standard codec. > So all spans now work on top of TermScorer ( I truly hate spans since today ) > including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother > to implement the other codecs yet since I want to get feedback on the API and > on this first cut before I go one with it. I will upload the corresponding > patch in a minute. > I also had to cut over SpanQuery.getSpans(IR) to > SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk > first but after that pain today I need a break first :). > The patch passes all core tests > (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't > look into the MemoryIndex BulkPostings API yet) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14235684#comment-14235684 ] Robert Muir commented on LUCENE-2878: - {quote} Interval is there as a concrete POJO that can be stored, cached, passed around, etc, after the scorer has moved on. Maybe it should be an interface instead, and DocsEnum should implement it? {quote} How often is this POJO needed? I see what you are saying, but we have a fairly advanced collection of scorers today, and none need that with Scorer :) If this "snapshot of docsenum" is only needed for say one or two scorers, maybe it can be explicitly and (package-)privately used by them, whereas the rest just use docsenum? > Allow Scorer to expose positions and payloads aka. nuke spans > -- > > Key: LUCENE-2878 > URL: https://issues.apache.org/jira/browse/LUCENE-2878 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: Positions Branch >Reporter: Simon Willnauer >Assignee: Robert Muir > Labels: gsoc2014 > Fix For: Positions Branch > > Attachments: LUCENE-2878-OR.patch, LUCENE-2878-vs-trunk.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, PosHighlighter.patch, > PosHighlighter.patch > > > Currently we have two somewhat separate types of queries, the one which can > make use of positions (mainly spans) and payloads (spans). Yet Span*Query > doesn't really do scoring comparable to what other queries do and at the end > of the day they are duplicating lot of code all over lucene. Span*Queries are > also limited to other Span*Query instances such that you can not use a > TermQuery or a BooleanQuery with SpanNear or anthing like that. > Beside of the Span*Query limitation other queries lacking a quiet interesting > feature since they can not score based on term proximity since scores doesn't > expose any positional information. All those problems bugged me for a while > now so I stared working on that using the bulkpostings API. I would have done > that first cut on trunk but TermScorer is working on BlockReader that do not > expose positions while the one in this branch does. I started adding a new > Positions class which users can pull from a scorer, to prevent unnecessary > positions enums I added ScorerContext#needsPositions and eventually > Scorere#needsPayloads to create the corresponding enum on demand. Yet, > currently only TermQuery / TermScorer implements this API and other simply > return null instead. > To show that the API really works and our BulkPostings work fine too with > positions I cut over TermSpanQuery to use a TermScorer under the hood and > nuked TermSpans entirely. A nice sideeffect of this was that the Position > BulkReading implementation got some exercise which now :) work all with > positions while Payloads for bulkreading are kind of experimental in the > patch and those only work with Standard codec. > So all spans now work on top of TermScorer ( I truly hate spans since today ) > including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother > to implement the other codecs yet since I want to get feedback on the API and > on this first cut before I go one with it. I will upload the corresponding > patch in a minute. > I also had to cut over SpanQuery.getSpans(IR) to > SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk > first but after that pain today I need a break first :). > The patch passes all core tests > (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't > look into the MemoryIndex BulkPostings API yet) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14235677#comment-14235677 ] Alan Woodward commented on LUCENE-2878: --- bq. I don't think the method shoudl be on collector, this is slow and unrelated to collection. OK, I'll try ripping it out and see what stops working > Allow Scorer to expose positions and payloads aka. nuke spans > -- > > Key: LUCENE-2878 > URL: https://issues.apache.org/jira/browse/LUCENE-2878 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: Positions Branch >Reporter: Simon Willnauer >Assignee: Robert Muir > Labels: gsoc2014 > Fix For: Positions Branch > > Attachments: LUCENE-2878-OR.patch, LUCENE-2878-vs-trunk.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, PosHighlighter.patch, > PosHighlighter.patch > > > Currently we have two somewhat separate types of queries, the one which can > make use of positions (mainly spans) and payloads (spans). Yet Span*Query > doesn't really do scoring comparable to what other queries do and at the end > of the day they are duplicating lot of code all over lucene. Span*Queries are > also limited to other Span*Query instances such that you can not use a > TermQuery or a BooleanQuery with SpanNear or anthing like that. > Beside of the Span*Query limitation other queries lacking a quiet interesting > feature since they can not score based on term proximity since scores doesn't > expose any positional information. All those problems bugged me for a while > now so I stared working on that using the bulkpostings API. I would have done > that first cut on trunk but TermScorer is working on BlockReader that do not > expose positions while the one in this branch does. I started adding a new > Positions class which users can pull from a scorer, to prevent unnecessary > positions enums I added ScorerContext#needsPositions and eventually > Scorere#needsPayloads to create the corresponding enum on demand. Yet, > currently only TermQuery / TermScorer implements this API and other simply > return null instead. > To show that the API really works and our BulkPostings work fine too with > positions I cut over TermSpanQuery to use a TermScorer under the hood and > nuked TermSpans entirely. A nice sideeffect of this was that the Position > BulkReading implementation got some exercise which now :) work all with > positions while Payloads for bulkreading are kind of experimental in the > patch and those only work with Standard codec. > So all spans now work on top of TermScorer ( I truly hate spans since today ) > including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother > to implement the other codecs yet since I want to get feedback on the API and > on this first cut before I go one with it. I will upload the corresponding > patch in a minute. > I also had to cut over SpanQuery.getSpans(IR) to > SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk > first but after that pain today I need a break first :). > The patch passes all core tests > (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't > look into the MemoryIndex BulkPostings API yet) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14235671#comment-14235671 ] Robert Muir commented on LUCENE-2878: - I don't think the method shoudl be on collector, this is slow and unrelated to collection. For highlighting, i think it should actually invoke Weight.scorer(..., PARAM, ...) and advance() scorers to precise docids of interest. It isn't doing collection. > Allow Scorer to expose positions and payloads aka. nuke spans > -- > > Key: LUCENE-2878 > URL: https://issues.apache.org/jira/browse/LUCENE-2878 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: Positions Branch >Reporter: Simon Willnauer >Assignee: Robert Muir > Labels: gsoc2014 > Fix For: Positions Branch > > Attachments: LUCENE-2878-OR.patch, LUCENE-2878-vs-trunk.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, PosHighlighter.patch, > PosHighlighter.patch > > > Currently we have two somewhat separate types of queries, the one which can > make use of positions (mainly spans) and payloads (spans). Yet Span*Query > doesn't really do scoring comparable to what other queries do and at the end > of the day they are duplicating lot of code all over lucene. Span*Queries are > also limited to other Span*Query instances such that you can not use a > TermQuery or a BooleanQuery with SpanNear or anthing like that. > Beside of the Span*Query limitation other queries lacking a quiet interesting > feature since they can not score based on term proximity since scores doesn't > expose any positional information. All those problems bugged me for a while > now so I stared working on that using the bulkpostings API. I would have done > that first cut on trunk but TermScorer is working on BlockReader that do not > expose positions while the one in this branch does. I started adding a new > Positions class which users can pull from a scorer, to prevent unnecessary > positions enums I added ScorerContext#needsPositions and eventually > Scorere#needsPayloads to create the corresponding enum on demand. Yet, > currently only TermQuery / TermScorer implements this API and other simply > return null instead. > To show that the API really works and our BulkPostings work fine too with > positions I cut over TermSpanQuery to use a TermScorer under the hood and > nuked TermSpans entirely. A nice sideeffect of this was that the Position > BulkReading implementation got some exercise which now :) work all with > positions while Payloads for bulkreading are kind of experimental in the > patch and those only work with Standard codec. > So all spans now work on top of TermScorer ( I truly hate spans since today ) > including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother > to implement the other codecs yet since I want to get feedback on the API and > on this first cut before I go one with it. I will upload the corresponding > patch in a minute. > I also had to cut over SpanQuery.getSpans(IR) to > SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk > first but after that pain today I need a break first :). > The patch passes all core tests > (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't > look into the MemoryIndex BulkPostings API yet) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14235669#comment-14235669 ] Alan Woodward commented on LUCENE-2878: --- Hm, it is inconsistent at the moment, as I started out returning -1 and then started throwing UOE to find places that were returning the wrong thing. Probably what it ought to do is return NO_MORE_POSITIONS if it doesn't support positions at all (and -1 for offsets, as that's expected everywhere). I'll update the patch. Interval is there as a concrete POJO that can be stored, cached, passed around, etc, after the scorer has moved on. Maybe it should be an interface instead, and DocsEnum should implement it? Collector.postingsFeatures() is there for things like highlighting. Normally a query wouldn't require offsets, for example, but if you're using the PostingsCollector for highlighting, then you want to retrieve offsets when pulling scorers. > Allow Scorer to expose positions and payloads aka. nuke spans > -- > > Key: LUCENE-2878 > URL: https://issues.apache.org/jira/browse/LUCENE-2878 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: Positions Branch >Reporter: Simon Willnauer >Assignee: Robert Muir > Labels: gsoc2014 > Fix For: Positions Branch > > Attachments: LUCENE-2878-OR.patch, LUCENE-2878-vs-trunk.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, PosHighlighter.patch, > PosHighlighter.patch > > > Currently we have two somewhat separate types of queries, the one which can > make use of positions (mainly spans) and payloads (spans). Yet Span*Query > doesn't really do scoring comparable to what other queries do and at the end > of the day they are duplicating lot of code all over lucene. Span*Queries are > also limited to other Span*Query instances such that you can not use a > TermQuery or a BooleanQuery with SpanNear or anthing like that. > Beside of the Span*Query limitation other queries lacking a quiet interesting > feature since they can not score based on term proximity since scores doesn't > expose any positional information. All those problems bugged me for a while > now so I stared working on that using the bulkpostings API. I would have done > that first cut on trunk but TermScorer is working on BlockReader that do not > expose positions while the one in this branch does. I started adding a new > Positions class which users can pull from a scorer, to prevent unnecessary > positions enums I added ScorerContext#needsPositions and eventually > Scorere#needsPayloads to create the corresponding enum on demand. Yet, > currently only TermQuery / TermScorer implements this API and other simply > return null instead. > To show that the API really works and our BulkPostings work fine too with > positions I cut over TermSpanQuery to use a TermScorer under the hood and > nuked TermSpans entirely. A nice sideeffect of this was that the Position > BulkReading implementation got some exercise which now :) work all with > positions while Payloads for bulkreading are kind of experimental in the > patch and those only work with Standard codec. > So all spans now work on top of TermScorer ( I truly hate spans since today ) > including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother > to implement the other codecs yet since I want to get feedback on the API and > on this first cut before I go one with it. I will upload the corresponding > patch in a minute. > I also had to cut over SpanQuery.getSpans(IR) to > SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk > first but after that pain today I need a break first :). > The patch passes all core tests > (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't > look into the MemoryIndex BulkPostings API yet) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14235660#comment-14235660 ] Robert Muir commented on LUCENE-2878: - Can you explain the core api changes such as DocsAndPositionsEnum being folded into DocsEnum? It looks inconsistent, with some impls returning -1 for the "positions" methods and some throwing UOE. And why do we still have an Interval class, if D&PEnum has what we need? I am also unsure of Collector.postingsFeatures, why do we need this? It seems enough to pass flags to Weight.scorer() > Allow Scorer to expose positions and payloads aka. nuke spans > -- > > Key: LUCENE-2878 > URL: https://issues.apache.org/jira/browse/LUCENE-2878 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: Positions Branch >Reporter: Simon Willnauer >Assignee: Robert Muir > Labels: gsoc2014 > Fix For: Positions Branch > > Attachments: LUCENE-2878-OR.patch, LUCENE-2878-vs-trunk.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, PosHighlighter.patch, > PosHighlighter.patch > > > Currently we have two somewhat separate types of queries, the one which can > make use of positions (mainly spans) and payloads (spans). Yet Span*Query > doesn't really do scoring comparable to what other queries do and at the end > of the day they are duplicating lot of code all over lucene. Span*Queries are > also limited to other Span*Query instances such that you can not use a > TermQuery or a BooleanQuery with SpanNear or anthing like that. > Beside of the Span*Query limitation other queries lacking a quiet interesting > feature since they can not score based on term proximity since scores doesn't > expose any positional information. All those problems bugged me for a while > now so I stared working on that using the bulkpostings API. I would have done > that first cut on trunk but TermScorer is working on BlockReader that do not > expose positions while the one in this branch does. I started adding a new > Positions class which users can pull from a scorer, to prevent unnecessary > positions enums I added ScorerContext#needsPositions and eventually > Scorere#needsPayloads to create the corresponding enum on demand. Yet, > currently only TermQuery / TermScorer implements this API and other simply > return null instead. > To show that the API really works and our BulkPostings work fine too with > positions I cut over TermSpanQuery to use a TermScorer under the hood and > nuked TermSpans entirely. A nice sideeffect of this was that the Position > BulkReading implementation got some exercise which now :) work all with > positions while Payloads for bulkreading are kind of experimental in the > patch and those only work with Standard codec. > So all spans now work on top of TermScorer ( I truly hate spans since today ) > including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother > to implement the other codecs yet since I want to get feedback on the API and > on this first cut before I go one with it. I will upload the corresponding > patch in a minute. > I also had to cut over SpanQuery.getSpans(IR) to > SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk > first but after that pain today I need a break first :). > The patch passes all core tests > (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't > look into the MemoryIndex BulkPostings API yet) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14209827#comment-14209827 ] Alan Woodward commented on LUCENE-2878: --- Because I'm an idiot, mainly. Will move the branch, thanks... > Allow Scorer to expose positions and payloads aka. nuke spans > -- > > Key: LUCENE-2878 > URL: https://issues.apache.org/jira/browse/LUCENE-2878 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: Positions Branch >Reporter: Simon Willnauer >Assignee: Robert Muir > Labels: gsoc2014 > Fix For: Positions Branch > > Attachments: LUCENE-2878-OR.patch, LUCENE-2878-vs-trunk.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, PosHighlighter.patch, > PosHighlighter.patch > > > Currently we have two somewhat separate types of queries, the one which can > make use of positions (mainly spans) and payloads (spans). Yet Span*Query > doesn't really do scoring comparable to what other queries do and at the end > of the day they are duplicating lot of code all over lucene. Span*Queries are > also limited to other Span*Query instances such that you can not use a > TermQuery or a BooleanQuery with SpanNear or anthing like that. > Beside of the Span*Query limitation other queries lacking a quiet interesting > feature since they can not score based on term proximity since scores doesn't > expose any positional information. All those problems bugged me for a while > now so I stared working on that using the bulkpostings API. I would have done > that first cut on trunk but TermScorer is working on BlockReader that do not > expose positions while the one in this branch does. I started adding a new > Positions class which users can pull from a scorer, to prevent unnecessary > positions enums I added ScorerContext#needsPositions and eventually > Scorere#needsPayloads to create the corresponding enum on demand. Yet, > currently only TermQuery / TermScorer implements this API and other simply > return null instead. > To show that the API really works and our BulkPostings work fine too with > positions I cut over TermSpanQuery to use a TermScorer under the hood and > nuked TermSpans entirely. A nice sideeffect of this was that the Position > BulkReading implementation got some exercise which now :) work all with > positions while Payloads for bulkreading are kind of experimental in the > patch and those only work with Standard codec. > So all spans now work on top of TermScorer ( I truly hate spans since today ) > including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother > to implement the other codecs yet since I want to get feedback on the API and > on this first cut before I go one with it. I will upload the corresponding > patch in a minute. > I also had to cut over SpanQuery.getSpans(IR) to > SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk > first but after that pain today I need a break first :). > The patch passes all core tests > (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't > look into the MemoryIndex BulkPostings API yet) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14209562#comment-14209562 ] Luc Vanlerberghe commented on LUCENE-2878: -- Any reason why the branch was not created in the branches folder alongside the other lucene«jira issue» branches? Where it is now it is not replicated on git.apache.org, nor on GitHub... > Allow Scorer to expose positions and payloads aka. nuke spans > -- > > Key: LUCENE-2878 > URL: https://issues.apache.org/jira/browse/LUCENE-2878 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: Positions Branch >Reporter: Simon Willnauer >Assignee: Robert Muir > Labels: gsoc2014 > Fix For: Positions Branch > > Attachments: LUCENE-2878-OR.patch, LUCENE-2878-vs-trunk.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, PosHighlighter.patch, > PosHighlighter.patch > > > Currently we have two somewhat separate types of queries, the one which can > make use of positions (mainly spans) and payloads (spans). Yet Span*Query > doesn't really do scoring comparable to what other queries do and at the end > of the day they are duplicating lot of code all over lucene. Span*Queries are > also limited to other Span*Query instances such that you can not use a > TermQuery or a BooleanQuery with SpanNear or anthing like that. > Beside of the Span*Query limitation other queries lacking a quiet interesting > feature since they can not score based on term proximity since scores doesn't > expose any positional information. All those problems bugged me for a while > now so I stared working on that using the bulkpostings API. I would have done > that first cut on trunk but TermScorer is working on BlockReader that do not > expose positions while the one in this branch does. I started adding a new > Positions class which users can pull from a scorer, to prevent unnecessary > positions enums I added ScorerContext#needsPositions and eventually > Scorere#needsPayloads to create the corresponding enum on demand. Yet, > currently only TermQuery / TermScorer implements this API and other simply > return null instead. > To show that the API really works and our BulkPostings work fine too with > positions I cut over TermSpanQuery to use a TermScorer under the hood and > nuked TermSpans entirely. A nice sideeffect of this was that the Position > BulkReading implementation got some exercise which now :) work all with > positions while Payloads for bulkreading are kind of experimental in the > patch and those only work with Standard codec. > So all spans now work on top of TermScorer ( I truly hate spans since today ) > including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother > to implement the other codecs yet since I want to get feedback on the API and > on this first cut before I go one with it. I will upload the corresponding > patch in a minute. > I also had to cut over SpanQuery.getSpans(IR) to > SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk > first but after that pain today I need a break first :). > The patch passes all core tests > (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't > look into the MemoryIndex BulkPostings API yet) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208082#comment-14208082 ] ASF subversion and git services commented on LUCENE-2878: - Commit 1638800 from [~romseygeek] [ https://svn.apache.org/r1638800 ] Branch for LUCENE-2878 > Allow Scorer to expose positions and payloads aka. nuke spans > -- > > Key: LUCENE-2878 > URL: https://issues.apache.org/jira/browse/LUCENE-2878 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: Positions Branch >Reporter: Simon Willnauer >Assignee: Robert Muir > Labels: gsoc2014 > Fix For: Positions Branch > > Attachments: LUCENE-2878-OR.patch, LUCENE-2878-vs-trunk.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, PosHighlighter.patch, > PosHighlighter.patch > > > Currently we have two somewhat separate types of queries, the one which can > make use of positions (mainly spans) and payloads (spans). Yet Span*Query > doesn't really do scoring comparable to what other queries do and at the end > of the day they are duplicating lot of code all over lucene. Span*Queries are > also limited to other Span*Query instances such that you can not use a > TermQuery or a BooleanQuery with SpanNear or anthing like that. > Beside of the Span*Query limitation other queries lacking a quiet interesting > feature since they can not score based on term proximity since scores doesn't > expose any positional information. All those problems bugged me for a while > now so I stared working on that using the bulkpostings API. I would have done > that first cut on trunk but TermScorer is working on BlockReader that do not > expose positions while the one in this branch does. I started adding a new > Positions class which users can pull from a scorer, to prevent unnecessary > positions enums I added ScorerContext#needsPositions and eventually > Scorere#needsPayloads to create the corresponding enum on demand. Yet, > currently only TermQuery / TermScorer implements this API and other simply > return null instead. > To show that the API really works and our BulkPostings work fine too with > positions I cut over TermSpanQuery to use a TermScorer under the hood and > nuked TermSpans entirely. A nice sideeffect of this was that the Position > BulkReading implementation got some exercise which now :) work all with > positions while Payloads for bulkreading are kind of experimental in the > patch and those only work with Standard codec. > So all spans now work on top of TermScorer ( I truly hate spans since today ) > including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother > to implement the other codecs yet since I want to get feedback on the API and > on this first cut before I go one with it. I will upload the corresponding > patch in a minute. > I also had to cut over SpanQuery.getSpans(IR) to > SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk > first but after that pain today I need a break first :). > The patch passes all core tests > (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't > look into the MemoryIndex BulkPostings API yet) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14202063#comment-14202063 ] Alan Woodward commented on LUCENE-2878: --- The good news is that I have somebody to sponsor me to actually do this work in the next month or so. I have a heap of other things I need to do first though :-( > Allow Scorer to expose positions and payloads aka. nuke spans > -- > > Key: LUCENE-2878 > URL: https://issues.apache.org/jira/browse/LUCENE-2878 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: Positions Branch >Reporter: Simon Willnauer >Assignee: Robert Muir > Labels: gsoc2014 > Fix For: Positions Branch > > Attachments: LUCENE-2878-OR.patch, LUCENE-2878-vs-trunk.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, PosHighlighter.patch, > PosHighlighter.patch > > > Currently we have two somewhat separate types of queries, the one which can > make use of positions (mainly spans) and payloads (spans). Yet Span*Query > doesn't really do scoring comparable to what other queries do and at the end > of the day they are duplicating lot of code all over lucene. Span*Queries are > also limited to other Span*Query instances such that you can not use a > TermQuery or a BooleanQuery with SpanNear or anthing like that. > Beside of the Span*Query limitation other queries lacking a quiet interesting > feature since they can not score based on term proximity since scores doesn't > expose any positional information. All those problems bugged me for a while > now so I stared working on that using the bulkpostings API. I would have done > that first cut on trunk but TermScorer is working on BlockReader that do not > expose positions while the one in this branch does. I started adding a new > Positions class which users can pull from a scorer, to prevent unnecessary > positions enums I added ScorerContext#needsPositions and eventually > Scorere#needsPayloads to create the corresponding enum on demand. Yet, > currently only TermQuery / TermScorer implements this API and other simply > return null instead. > To show that the API really works and our BulkPostings work fine too with > positions I cut over TermSpanQuery to use a TermScorer under the hood and > nuked TermSpans entirely. A nice sideeffect of this was that the Position > BulkReading implementation got some exercise which now :) work all with > positions while Payloads for bulkreading are kind of experimental in the > patch and those only work with Standard codec. > So all spans now work on top of TermScorer ( I truly hate spans since today ) > including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother > to implement the other codecs yet since I want to get feedback on the API and > on this first cut before I go one with it. I will upload the corresponding > patch in a minute. > I also had to cut over SpanQuery.getSpans(IR) to > SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk > first but after that pain today I need a break first :). > The patch passes all core tests > (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't > look into the MemoryIndex BulkPostings API yet) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14202058#comment-14202058 ] David Smiley commented on LUCENE-2878: -- Yeah, tell me about it. I haven't looked at the details yet but I think scorers exposing positions would enable a new highlighter to be built that could accurately know where a match occurred. Hopefully there is control to expose positions for queries that are normally position-less (e.g. wildcard queries). > Allow Scorer to expose positions and payloads aka. nuke spans > -- > > Key: LUCENE-2878 > URL: https://issues.apache.org/jira/browse/LUCENE-2878 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: Positions Branch >Reporter: Simon Willnauer >Assignee: Robert Muir > Labels: gsoc2014 > Fix For: Positions Branch > > Attachments: LUCENE-2878-OR.patch, LUCENE-2878-vs-trunk.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, PosHighlighter.patch, > PosHighlighter.patch > > > Currently we have two somewhat separate types of queries, the one which can > make use of positions (mainly spans) and payloads (spans). Yet Span*Query > doesn't really do scoring comparable to what other queries do and at the end > of the day they are duplicating lot of code all over lucene. Span*Queries are > also limited to other Span*Query instances such that you can not use a > TermQuery or a BooleanQuery with SpanNear or anthing like that. > Beside of the Span*Query limitation other queries lacking a quiet interesting > feature since they can not score based on term proximity since scores doesn't > expose any positional information. All those problems bugged me for a while > now so I stared working on that using the bulkpostings API. I would have done > that first cut on trunk but TermScorer is working on BlockReader that do not > expose positions while the one in this branch does. I started adding a new > Positions class which users can pull from a scorer, to prevent unnecessary > positions enums I added ScorerContext#needsPositions and eventually > Scorere#needsPayloads to create the corresponding enum on demand. Yet, > currently only TermQuery / TermScorer implements this API and other simply > return null instead. > To show that the API really works and our BulkPostings work fine too with > positions I cut over TermSpanQuery to use a TermScorer under the hood and > nuked TermSpans entirely. A nice sideeffect of this was that the Position > BulkReading implementation got some exercise which now :) work all with > positions while Payloads for bulkreading are kind of experimental in the > patch and those only work with Standard codec. > So all spans now work on top of TermScorer ( I truly hate spans since today ) > including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother > to implement the other codecs yet since I want to get feedback on the API and > on this first cut before I go one with it. I will upload the corresponding > patch in a minute. > I also had to cut over SpanQuery.getSpans(IR) to > SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk > first but after that pain today I need a break first :). > The patch passes all core tests > (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't > look into the MemoryIndex BulkPostings API yet) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14202054#comment-14202054 ] Jan Høydahl commented on LUCENE-2878: - This would be a huge selling point for Lucene5.x NUKE SPANS! > Allow Scorer to expose positions and payloads aka. nuke spans > -- > > Key: LUCENE-2878 > URL: https://issues.apache.org/jira/browse/LUCENE-2878 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: Positions Branch >Reporter: Simon Willnauer >Assignee: Robert Muir > Labels: gsoc2014 > Fix For: Positions Branch > > Attachments: LUCENE-2878-OR.patch, LUCENE-2878-vs-trunk.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, PosHighlighter.patch, > PosHighlighter.patch > > > Currently we have two somewhat separate types of queries, the one which can > make use of positions (mainly spans) and payloads (spans). Yet Span*Query > doesn't really do scoring comparable to what other queries do and at the end > of the day they are duplicating lot of code all over lucene. Span*Queries are > also limited to other Span*Query instances such that you can not use a > TermQuery or a BooleanQuery with SpanNear or anthing like that. > Beside of the Span*Query limitation other queries lacking a quiet interesting > feature since they can not score based on term proximity since scores doesn't > expose any positional information. All those problems bugged me for a while > now so I stared working on that using the bulkpostings API. I would have done > that first cut on trunk but TermScorer is working on BlockReader that do not > expose positions while the one in this branch does. I started adding a new > Positions class which users can pull from a scorer, to prevent unnecessary > positions enums I added ScorerContext#needsPositions and eventually > Scorere#needsPayloads to create the corresponding enum on demand. Yet, > currently only TermQuery / TermScorer implements this API and other simply > return null instead. > To show that the API really works and our BulkPostings work fine too with > positions I cut over TermSpanQuery to use a TermScorer under the hood and > nuked TermSpans entirely. A nice sideeffect of this was that the Position > BulkReading implementation got some exercise which now :) work all with > positions while Payloads for bulkreading are kind of experimental in the > patch and those only work with Standard codec. > So all spans now work on top of TermScorer ( I truly hate spans since today ) > including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother > to implement the other codecs yet since I want to get feedback on the API and > on this first cut before I go one with it. I will upload the corresponding > patch in a minute. > I also had to cut over SpanQuery.getSpans(IR) to > SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk > first but after that pain today I need a break first :). > The patch passes all core tests > (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't > look into the MemoryIndex BulkPostings API yet) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13936304#comment-13936304 ] Alan Woodward commented on LUCENE-2878: --- Ooh, hello. So the LUCENE-2878 branch is a bit of a mess, in that it has two semi-working versions of this code: Simon's initial IntervalIterator API, in the o.a.l.search.intervals package, and my DocsEnum.nextPosition() API in o.a.l.search.positions. Simon's code is much more complete, and I've been using a separately maintained version of that in production code for various clients, which you can see at https://github.com/flaxsearch/lucene-solr-intervals. I think the nextPosition() API is nicer, but the IntervalIterator API has the advantage of actually working. The github repository has some other stuff on it too, around making the intervals code work across different fields. The API that I've come up with there is not very nice, though. It would be ace to get this moving again! > Allow Scorer to expose positions and payloads aka. nuke spans > -- > > Key: LUCENE-2878 > URL: https://issues.apache.org/jira/browse/LUCENE-2878 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: Positions Branch >Reporter: Simon Willnauer >Assignee: Robert Muir > Labels: gsoc2014 > Fix For: Positions Branch > > Attachments: LUCENE-2878-OR.patch, LUCENE-2878-vs-trunk.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, PosHighlighter.patch, > PosHighlighter.patch > > > Currently we have two somewhat separate types of queries, the one which can > make use of positions (mainly spans) and payloads (spans). Yet Span*Query > doesn't really do scoring comparable to what other queries do and at the end > of the day they are duplicating lot of code all over lucene. Span*Queries are > also limited to other Span*Query instances such that you can not use a > TermQuery or a BooleanQuery with SpanNear or anthing like that. > Beside of the Span*Query limitation other queries lacking a quiet interesting > feature since they can not score based on term proximity since scores doesn't > expose any positional information. All those problems bugged me for a while > now so I stared working on that using the bulkpostings API. I would have done > that first cut on trunk but TermScorer is working on BlockReader that do not > expose positions while the one in this branch does. I started adding a new > Positions class which users can pull from a scorer, to prevent unnecessary > positions enums I added ScorerContext#needsPositions and eventually > Scorere#needsPayloads to create the corresponding enum on demand. Yet, > currently only TermQuery / TermScorer implements this API and other simply > return null instead. > To show that the API really works and our BulkPostings work fine too with > positions I cut over TermSpanQuery to use a TermScorer under the hood and > nuked TermSpans entirely. A nice sideeffect of this was that the Position > BulkReading implementation got some exercise which now :) work all with > positions while Payloads for bulkreading are kind of experimental in the > patch and those only work with Standard codec. > So all spans now work on top of TermScorer ( I truly hate spans since today ) > including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother > to implement the other codecs yet since I want to get feedback on the API and > on this first cut before I go one with it. I will upload the corresponding > patch in a minute. > I also had to cut over SpanQuery.getSpans(IR) to > SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk > first but after that pain today I need a break first :). > The patch passes all core tests > (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't > look into the MemoryIndex BulkPostings API yet) -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13936240#comment-13936240 ] Simon Willnauer commented on LUCENE-2878: - Now we are talking Sent from my iPhone > Allow Scorer to expose positions and payloads aka. nuke spans > -- > > Key: LUCENE-2878 > URL: https://issues.apache.org/jira/browse/LUCENE-2878 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: Positions Branch >Reporter: Simon Willnauer >Assignee: Robert Muir > Labels: gsoc2014 > Fix For: Positions Branch > > Attachments: LUCENE-2878-OR.patch, LUCENE-2878-vs-trunk.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, PosHighlighter.patch, > PosHighlighter.patch > > > Currently we have two somewhat separate types of queries, the one which can > make use of positions (mainly spans) and payloads (spans). Yet Span*Query > doesn't really do scoring comparable to what other queries do and at the end > of the day they are duplicating lot of code all over lucene. Span*Queries are > also limited to other Span*Query instances such that you can not use a > TermQuery or a BooleanQuery with SpanNear or anthing like that. > Beside of the Span*Query limitation other queries lacking a quiet interesting > feature since they can not score based on term proximity since scores doesn't > expose any positional information. All those problems bugged me for a while > now so I stared working on that using the bulkpostings API. I would have done > that first cut on trunk but TermScorer is working on BlockReader that do not > expose positions while the one in this branch does. I started adding a new > Positions class which users can pull from a scorer, to prevent unnecessary > positions enums I added ScorerContext#needsPositions and eventually > Scorere#needsPayloads to create the corresponding enum on demand. Yet, > currently only TermQuery / TermScorer implements this API and other simply > return null instead. > To show that the API really works and our BulkPostings work fine too with > positions I cut over TermSpanQuery to use a TermScorer under the hood and > nuked TermSpans entirely. A nice sideeffect of this was that the Position > BulkReading implementation got some exercise which now :) work all with > positions while Payloads for bulkreading are kind of experimental in the > patch and those only work with Standard codec. > So all spans now work on top of TermScorer ( I truly hate spans since today ) > including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother > to implement the other codecs yet since I want to get feedback on the API and > on this first cut before I go one with it. I will upload the corresponding > patch in a minute. > I also had to cut over SpanQuery.getSpans(IR) to > SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk > first but after that pain today I need a break first :). > The patch passes all core tests > (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't > look into the MemoryIndex BulkPostings API yet) -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13804377#comment-13804377 ] ASF subversion and git services commented on LUCENE-2878: - Commit 1535436 from [~romseygeek] in branch 'dev/branches/LUCENE-2878' [ https://svn.apache.org/r1535436 ] LUCENE-2878: Merge from trunk > Allow Scorer to expose positions and payloads aka. nuke spans > -- > > Key: LUCENE-2878 > URL: https://issues.apache.org/jira/browse/LUCENE-2878 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: Positions Branch >Reporter: Simon Willnauer >Assignee: Simon Willnauer > Labels: gsoc2013 > Fix For: Positions Branch > > Attachments: LUCENE-2878-OR.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, LUCENE-2878-vs-trunk.patch, > PosHighlighter.patch, PosHighlighter.patch > > > Currently we have two somewhat separate types of queries, the one which can > make use of positions (mainly spans) and payloads (spans). Yet Span*Query > doesn't really do scoring comparable to what other queries do and at the end > of the day they are duplicating lot of code all over lucene. Span*Queries are > also limited to other Span*Query instances such that you can not use a > TermQuery or a BooleanQuery with SpanNear or anthing like that. > Beside of the Span*Query limitation other queries lacking a quiet interesting > feature since they can not score based on term proximity since scores doesn't > expose any positional information. All those problems bugged me for a while > now so I stared working on that using the bulkpostings API. I would have done > that first cut on trunk but TermScorer is working on BlockReader that do not > expose positions while the one in this branch does. I started adding a new > Positions class which users can pull from a scorer, to prevent unnecessary > positions enums I added ScorerContext#needsPositions and eventually > Scorere#needsPayloads to create the corresponding enum on demand. Yet, > currently only TermQuery / TermScorer implements this API and other simply > return null instead. > To show that the API really works and our BulkPostings work fine too with > positions I cut over TermSpanQuery to use a TermScorer under the hood and > nuked TermSpans entirely. A nice sideeffect of this was that the Position > BulkReading implementation got some exercise which now :) work all with > positions while Payloads for bulkreading are kind of experimental in the > patch and those only work with Standard codec. > So all spans now work on top of TermScorer ( I truly hate spans since today ) > including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother > to implement the other codecs yet since I want to get feedback on the API and > on this first cut before I go one with it. I will upload the corresponding > patch in a minute. > I also had to cut over SpanQuery.getSpans(IR) to > SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk > first but after that pain today I need a break first :). > The patch passes all core tests > (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't > look into the MemoryIndex BulkPostings API yet) -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13585466#comment-13585466 ] Alan Woodward commented on LUCENE-2878: --- I've been chipping away at this for a bit. Here's a summary of what I've done: - Applied LUCENE-4524, and also added startPosition() and endPosition() to DocsEnum - Changed the postings readers to return NO_MORE_POSITIONS once nextPosition() has been called freq() times - Extended the ConjunctionScorer and DisjunctionScorer implementations to return positions - Added an abstract PositionFilteredScorer with reset(int doc) and doNextPosition() methods - Added a bunch of concrete implementations (ExactPhraseQuery, NotWithinQuery, OrderedNearQuery, UnorderedNearQuery, RangeFilterQuery) with tests - these are all in the posfilter package I still need to implement SloppyPhraseQuery and MultiPhraseQuery, but I actually think these won't be too difficult with this API. Plus there are a bunch of nocommits regarding freq() calculations, and this doesn't work at all with BooleanScorer - we'll probably need a way to tell the scorer that we do or don't want position information. [~simonw] and I talked about this on IRC the other day, about resolving collisions in ExactPhraseQuery, but I think that problem may go away doing things this way. I may have misunderstood though - if so, could you add a test to TestExactPhraseQuery showing what I'm missing, Simon? > Allow Scorer to expose positions and payloads aka. nuke spans > -- > > Key: LUCENE-2878 > URL: https://issues.apache.org/jira/browse/LUCENE-2878 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: Positions Branch >Reporter: Simon Willnauer >Assignee: Simon Willnauer > Labels: gsoc2011, gsoc2012, lucene-gsoc-11, lucene-gsoc-12, > mentor > Fix For: Positions Branch > > Attachments: LUCENE-2878-OR.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, LUCENE-2878-vs-trunk.patch, > PosHighlighter.patch, PosHighlighter.patch > > > Currently we have two somewhat separate types of queries, the one which can > make use of positions (mainly spans) and payloads (spans). Yet Span*Query > doesn't really do scoring comparable to what other queries do and at the end > of the day they are duplicating lot of code all over lucene. Span*Queries are > also limited to other Span*Query instances such that you can not use a > TermQuery or a BooleanQuery with SpanNear or anthing like that. > Beside of the Span*Query limitation other queries lacking a quiet interesting > feature since they can not score based on term proximity since scores doesn't > expose any positional information. All those problems bugged me for a while > now so I stared working on that using the bulkpostings API. I would have done > that first cut on trunk but TermScorer is working on BlockReader that do not > expose positions while the one in this branch does. I started adding a new > Positions class which users can pull from a scorer, to prevent unnecessary > positions enums I added ScorerContext#needsPositions and eventually > Scorere#needsPayloads to create the corresponding enum on demand. Yet, > currently only TermQuery / TermScorer implements this API and other simply > return null instead. > To show that the API really works and our BulkPostings work fine too with > positions I cut over TermSpanQuery to use a TermScorer under the hood and > nuked TermSpans entirely. A nice sideeffect of this was that the Position > BulkReading implementation got some exercise which now :) work all with > positions while Payloads for bulkreading are kind of experimental in the > patch and those only work with Standard codec. > So all spans now work on top of TermScorer ( I truly hate spans since today ) > including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother > to implement the other codecs yet since I want to get feedback on the API and > on this first cut before I go one with it. I will upload the corresponding > patch in a minute. > I also had to cut over SpanQuery.getSpans(IR) to > SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk > first but after that pain toda
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13573392#comment-13573392 ] Robert Muir commented on LUCENE-2878: - I think this should be explored in the branch versus a separate issue E.g. we shouldnt impose this on postings implementations unless it sorta works with the whole design here. I'd also really recommend NO_MORE_POSITIONS not -1. -1 currently means "invalid" (e.g. you should not have called nextPosition). Its not like any Scorer would need to check for this, because if you try to do prox operations on a field that omits position information, the user should be getting an exception up-front from the Weight. > Allow Scorer to expose positions and payloads aka. nuke spans > -- > > Key: LUCENE-2878 > URL: https://issues.apache.org/jira/browse/LUCENE-2878 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: Positions Branch >Reporter: Simon Willnauer >Assignee: Simon Willnauer > Labels: gsoc2011, gsoc2012, lucene-gsoc-11, lucene-gsoc-12, > mentor > Fix For: Positions Branch > > Attachments: LUCENE-2878-OR.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, LUCENE-2878-vs-trunk.patch, > PosHighlighter.patch, PosHighlighter.patch > > > Currently we have two somewhat separate types of queries, the one which can > make use of positions (mainly spans) and payloads (spans). Yet Span*Query > doesn't really do scoring comparable to what other queries do and at the end > of the day they are duplicating lot of code all over lucene. Span*Queries are > also limited to other Span*Query instances such that you can not use a > TermQuery or a BooleanQuery with SpanNear or anthing like that. > Beside of the Span*Query limitation other queries lacking a quiet interesting > feature since they can not score based on term proximity since scores doesn't > expose any positional information. All those problems bugged me for a while > now so I stared working on that using the bulkpostings API. I would have done > that first cut on trunk but TermScorer is working on BlockReader that do not > expose positions while the one in this branch does. I started adding a new > Positions class which users can pull from a scorer, to prevent unnecessary > positions enums I added ScorerContext#needsPositions and eventually > Scorere#needsPayloads to create the corresponding enum on demand. Yet, > currently only TermQuery / TermScorer implements this API and other simply > return null instead. > To show that the API really works and our BulkPostings work fine too with > positions I cut over TermSpanQuery to use a TermScorer under the hood and > nuked TermSpans entirely. A nice sideeffect of this was that the Position > BulkReading implementation got some exercise which now :) work all with > positions while Payloads for bulkreading are kind of experimental in the > patch and those only work with Standard codec. > So all spans now work on top of TermScorer ( I truly hate spans since today ) > including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother > to implement the other codecs yet since I want to get feedback on the API and > on this first cut before I go one with it. I will upload the corresponding > patch in a minute. > I also had to cut over SpanQuery.getSpans(IR) to > SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk > first but after that pain today I need a break first :). > The patch passes all core tests > (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't > look into the MemoryIndex BulkPostings API yet) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13573389#comment-13573389 ] Simon Willnauer commented on LUCENE-2878: - bq. Hm, OK. So can we change the nextPosition() API to return -1 once the positions have been exhausted, rather than becoming undefined? So a consumer would look something like: yeah I was saying the same thing yesterday when I talked about this to rob. This would make stuff more consistent too. I will open an issue > Allow Scorer to expose positions and payloads aka. nuke spans > -- > > Key: LUCENE-2878 > URL: https://issues.apache.org/jira/browse/LUCENE-2878 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: Positions Branch >Reporter: Simon Willnauer >Assignee: Simon Willnauer > Labels: gsoc2011, gsoc2012, lucene-gsoc-11, lucene-gsoc-12, > mentor > Fix For: Positions Branch > > Attachments: LUCENE-2878-OR.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, LUCENE-2878-vs-trunk.patch, > PosHighlighter.patch, PosHighlighter.patch > > > Currently we have two somewhat separate types of queries, the one which can > make use of positions (mainly spans) and payloads (spans). Yet Span*Query > doesn't really do scoring comparable to what other queries do and at the end > of the day they are duplicating lot of code all over lucene. Span*Queries are > also limited to other Span*Query instances such that you can not use a > TermQuery or a BooleanQuery with SpanNear or anthing like that. > Beside of the Span*Query limitation other queries lacking a quiet interesting > feature since they can not score based on term proximity since scores doesn't > expose any positional information. All those problems bugged me for a while > now so I stared working on that using the bulkpostings API. I would have done > that first cut on trunk but TermScorer is working on BlockReader that do not > expose positions while the one in this branch does. I started adding a new > Positions class which users can pull from a scorer, to prevent unnecessary > positions enums I added ScorerContext#needsPositions and eventually > Scorere#needsPayloads to create the corresponding enum on demand. Yet, > currently only TermQuery / TermScorer implements this API and other simply > return null instead. > To show that the API really works and our BulkPostings work fine too with > positions I cut over TermSpanQuery to use a TermScorer under the hood and > nuked TermSpans entirely. A nice sideeffect of this was that the Position > BulkReading implementation got some exercise which now :) work all with > positions while Payloads for bulkreading are kind of experimental in the > patch and those only work with Standard codec. > So all spans now work on top of TermScorer ( I truly hate spans since today ) > including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother > to implement the other codecs yet since I want to get feedback on the API and > on this first cut before I go one with it. I will upload the corresponding > patch in a minute. > I also had to cut over SpanQuery.getSpans(IR) to > SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk > first but after that pain today I need a break first :). > The patch passes all core tests > (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't > look into the MemoryIndex BulkPostings API yet) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13573386#comment-13573386 ] Robert Muir commented on LUCENE-2878: - Alan: its a nice idea... we should seriously consider something this (-1 or NO_MORE_POSITIONS or whatever) if it would allow this stuff to just work over the existing D&P api. I dont know what the cost would be to existing impls (e.g. Lucene41PostingsReader would need some code changes), but hopefully small or nil. And of course having an API like this would be well worth any small performance hit. > Allow Scorer to expose positions and payloads aka. nuke spans > -- > > Key: LUCENE-2878 > URL: https://issues.apache.org/jira/browse/LUCENE-2878 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: Positions Branch >Reporter: Simon Willnauer >Assignee: Simon Willnauer > Labels: gsoc2011, gsoc2012, lucene-gsoc-11, lucene-gsoc-12, > mentor > Fix For: Positions Branch > > Attachments: LUCENE-2878-OR.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, LUCENE-2878-vs-trunk.patch, > PosHighlighter.patch, PosHighlighter.patch > > > Currently we have two somewhat separate types of queries, the one which can > make use of positions (mainly spans) and payloads (spans). Yet Span*Query > doesn't really do scoring comparable to what other queries do and at the end > of the day they are duplicating lot of code all over lucene. Span*Queries are > also limited to other Span*Query instances such that you can not use a > TermQuery or a BooleanQuery with SpanNear or anthing like that. > Beside of the Span*Query limitation other queries lacking a quiet interesting > feature since they can not score based on term proximity since scores doesn't > expose any positional information. All those problems bugged me for a while > now so I stared working on that using the bulkpostings API. I would have done > that first cut on trunk but TermScorer is working on BlockReader that do not > expose positions while the one in this branch does. I started adding a new > Positions class which users can pull from a scorer, to prevent unnecessary > positions enums I added ScorerContext#needsPositions and eventually > Scorere#needsPayloads to create the corresponding enum on demand. Yet, > currently only TermQuery / TermScorer implements this API and other simply > return null instead. > To show that the API really works and our BulkPostings work fine too with > positions I cut over TermSpanQuery to use a TermScorer under the hood and > nuked TermSpans entirely. A nice sideeffect of this was that the Position > BulkReading implementation got some exercise which now :) work all with > positions while Payloads for bulkreading are kind of experimental in the > patch and those only work with Standard codec. > So all spans now work on top of TermScorer ( I truly hate spans since today ) > including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother > to implement the other codecs yet since I want to get feedback on the API and > on this first cut before I go one with it. I will upload the corresponding > patch in a minute. > I also had to cut over SpanQuery.getSpans(IR) to > SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk > first but after that pain today I need a break first :). > The patch passes all core tests > (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't > look into the MemoryIndex BulkPostings API yet) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13573382#comment-13573382 ] Alan Woodward commented on LUCENE-2878: --- Hm, OK. So can we change the nextPosition() API to return -1 once the positions have been exhausted, rather than becoming undefined? So a consumer would look something like: {{monospaced}} int pos; while ((pos = dp.nextPosition()) != -1) { // do stuff here } {{monospaced}} Implementations that need to know the frequency call freq(), others can iterate lazily. > Allow Scorer to expose positions and payloads aka. nuke spans > -- > > Key: LUCENE-2878 > URL: https://issues.apache.org/jira/browse/LUCENE-2878 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: Positions Branch >Reporter: Simon Willnauer >Assignee: Simon Willnauer > Labels: gsoc2011, gsoc2012, lucene-gsoc-11, lucene-gsoc-12, > mentor > Fix For: Positions Branch > > Attachments: LUCENE-2878-OR.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, LUCENE-2878-vs-trunk.patch, > PosHighlighter.patch, PosHighlighter.patch > > > Currently we have two somewhat separate types of queries, the one which can > make use of positions (mainly spans) and payloads (spans). Yet Span*Query > doesn't really do scoring comparable to what other queries do and at the end > of the day they are duplicating lot of code all over lucene. Span*Queries are > also limited to other Span*Query instances such that you can not use a > TermQuery or a BooleanQuery with SpanNear or anthing like that. > Beside of the Span*Query limitation other queries lacking a quiet interesting > feature since they can not score based on term proximity since scores doesn't > expose any positional information. All those problems bugged me for a while > now so I stared working on that using the bulkpostings API. I would have done > that first cut on trunk but TermScorer is working on BlockReader that do not > expose positions while the one in this branch does. I started adding a new > Positions class which users can pull from a scorer, to prevent unnecessary > positions enums I added ScorerContext#needsPositions and eventually > Scorere#needsPayloads to create the corresponding enum on demand. Yet, > currently only TermQuery / TermScorer implements this API and other simply > return null instead. > To show that the API really works and our BulkPostings work fine too with > positions I cut over TermSpanQuery to use a TermScorer under the hood and > nuked TermSpans entirely. A nice sideeffect of this was that the Position > BulkReading implementation got some exercise which now :) work all with > positions while Payloads for bulkreading are kind of experimental in the > patch and those only work with Standard codec. > So all spans now work on top of TermScorer ( I truly hate spans since today ) > including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother > to implement the other codecs yet since I want to get feedback on the API and > on this first cut before I go one with it. I will upload the corresponding > patch in a minute. > I also had to cut over SpanQuery.getSpans(IR) to > SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk > first but after that pain today I need a break first :). > The patch passes all core tests > (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't > look into the MemoryIndex BulkPostings API yet) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13573353#comment-13573353 ] Simon Willnauer commented on LUCENE-2878: - Alan, I don't think you can cut over to DocsEnum or DocsAndPositionsEnum. DocsAndPosEnum has a significant problem that doesn't allow efficient PosIterator impl underneath it. It defines that DocsAndPosEnum#nextPosition should be called at max DocsEnum#freq() times which is fine on a low level but bogus for lazy pos iterators since we don't know ahead of time how many intervals we might have. I think we first need to fix this problem before we can go and do this refactoring, makes sense? PhraseQuery does only know his freq currently because it's greedy and pulls all intervals at once. > Allow Scorer to expose positions and payloads aka. nuke spans > -- > > Key: LUCENE-2878 > URL: https://issues.apache.org/jira/browse/LUCENE-2878 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: Positions Branch >Reporter: Simon Willnauer >Assignee: Simon Willnauer > Labels: gsoc2011, gsoc2012, lucene-gsoc-11, lucene-gsoc-12, > mentor > Fix For: Positions Branch > > Attachments: LUCENE-2878-OR.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, LUCENE-2878-vs-trunk.patch, > PosHighlighter.patch, PosHighlighter.patch > > > Currently we have two somewhat separate types of queries, the one which can > make use of positions (mainly spans) and payloads (spans). Yet Span*Query > doesn't really do scoring comparable to what other queries do and at the end > of the day they are duplicating lot of code all over lucene. Span*Queries are > also limited to other Span*Query instances such that you can not use a > TermQuery or a BooleanQuery with SpanNear or anthing like that. > Beside of the Span*Query limitation other queries lacking a quiet interesting > feature since they can not score based on term proximity since scores doesn't > expose any positional information. All those problems bugged me for a while > now so I stared working on that using the bulkpostings API. I would have done > that first cut on trunk but TermScorer is working on BlockReader that do not > expose positions while the one in this branch does. I started adding a new > Positions class which users can pull from a scorer, to prevent unnecessary > positions enums I added ScorerContext#needsPositions and eventually > Scorere#needsPayloads to create the corresponding enum on demand. Yet, > currently only TermQuery / TermScorer implements this API and other simply > return null instead. > To show that the API really works and our BulkPostings work fine too with > positions I cut over TermSpanQuery to use a TermScorer under the hood and > nuked TermSpans entirely. A nice sideeffect of this was that the Position > BulkReading implementation got some exercise which now :) work all with > positions while Payloads for bulkreading are kind of experimental in the > patch and those only work with Standard codec. > So all spans now work on top of TermScorer ( I truly hate spans since today ) > including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother > to implement the other codecs yet since I want to get feedback on the API and > on this first cut before I go one with it. I will upload the corresponding > patch in a minute. > I also had to cut over SpanQuery.getSpans(IR) to > SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk > first but after that pain today I need a break first :). > The patch passes all core tests > (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't > look into the MemoryIndex BulkPostings API yet) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13573337#comment-13573337 ] Alan Woodward commented on LUCENE-2878: --- I'm going to try applying the patch from LUCENE-4524 here and see if that helps. Next step would be to add startPosition() and endPosition() to DocsEnum, and try re-implementing the filter queries using methods directly on child scorers, rather than pulling a separate interval iterator. > Allow Scorer to expose positions and payloads aka. nuke spans > -- > > Key: LUCENE-2878 > URL: https://issues.apache.org/jira/browse/LUCENE-2878 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: Positions Branch >Reporter: Simon Willnauer >Assignee: Simon Willnauer > Labels: gsoc2011, gsoc2012, lucene-gsoc-11, lucene-gsoc-12, > mentor > Fix For: Positions Branch > > Attachments: LUCENE-2878-OR.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, LUCENE-2878-vs-trunk.patch, > PosHighlighter.patch, PosHighlighter.patch > > > Currently we have two somewhat separate types of queries, the one which can > make use of positions (mainly spans) and payloads (spans). Yet Span*Query > doesn't really do scoring comparable to what other queries do and at the end > of the day they are duplicating lot of code all over lucene. Span*Queries are > also limited to other Span*Query instances such that you can not use a > TermQuery or a BooleanQuery with SpanNear or anthing like that. > Beside of the Span*Query limitation other queries lacking a quiet interesting > feature since they can not score based on term proximity since scores doesn't > expose any positional information. All those problems bugged me for a while > now so I stared working on that using the bulkpostings API. I would have done > that first cut on trunk but TermScorer is working on BlockReader that do not > expose positions while the one in this branch does. I started adding a new > Positions class which users can pull from a scorer, to prevent unnecessary > positions enums I added ScorerContext#needsPositions and eventually > Scorere#needsPayloads to create the corresponding enum on demand. Yet, > currently only TermQuery / TermScorer implements this API and other simply > return null instead. > To show that the API really works and our BulkPostings work fine too with > positions I cut over TermSpanQuery to use a TermScorer under the hood and > nuked TermSpans entirely. A nice sideeffect of this was that the Position > BulkReading implementation got some exercise which now :) work all with > positions while Payloads for bulkreading are kind of experimental in the > patch and those only work with Standard codec. > So all spans now work on top of TermScorer ( I truly hate spans since today ) > including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother > to implement the other codecs yet since I want to get feedback on the API and > on this first cut before I go one with it. I will upload the corresponding > patch in a minute. > I also had to cut over SpanQuery.getSpans(IR) to > SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk > first but after that pain today I need a break first :). > The patch passes all core tests > (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't > look into the MemoryIndex BulkPostings API yet) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13558606#comment-13558606 ] Simon Willnauer commented on LUCENE-2878: - bq. The only Span functionality that's missing I think is payload queries. If we want to have all the span functionality in here before it can land on trunk I can work on that next. I really think we can skip that for now. bq. It would also be good to do some proper benchmarking. Do we already have something that can compare sets of queries? We do have LuceneUtil but its not like straight forward. I will take a look what we can do here. bq. Is it worth specialising here? Have two query types (or maybe just a flag on the query), so you can optimize for query speed or for scoring. SpanScorer always iterates over all spans, by comparison. I think we should specialize the Scorere here. Visiting the least amount of intervals possible is maybe worth it. So from my perspective what we should try exploring is making the scorer a DocsAndPosEnum in the branch and see if we can remove the Interval API mostly in favor of the DocsAndPos API. The only problem I have with this is really that if a given scorer consumes intervals from a subscorer it needs to buffer all those if it's parent needs all of them too. Not sure if it is worth it at this point. Ideally I would want to have DocsAndPosEnum to be folded into DocsEnum first too. > Allow Scorer to expose positions and payloads aka. nuke spans > -- > > Key: LUCENE-2878 > URL: https://issues.apache.org/jira/browse/LUCENE-2878 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: Positions Branch >Reporter: Simon Willnauer >Assignee: Simon Willnauer > Labels: gsoc2011, gsoc2012, lucene-gsoc-11, lucene-gsoc-12, > mentor > Fix For: Positions Branch > > Attachments: LUCENE-2878-OR.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, LUCENE-2878-vs-trunk.patch, > PosHighlighter.patch, PosHighlighter.patch > > > Currently we have two somewhat separate types of queries, the one which can > make use of positions (mainly spans) and payloads (spans). Yet Span*Query > doesn't really do scoring comparable to what other queries do and at the end > of the day they are duplicating lot of code all over lucene. Span*Queries are > also limited to other Span*Query instances such that you can not use a > TermQuery or a BooleanQuery with SpanNear or anthing like that. > Beside of the Span*Query limitation other queries lacking a quiet interesting > feature since they can not score based on term proximity since scores doesn't > expose any positional information. All those problems bugged me for a while > now so I stared working on that using the bulkpostings API. I would have done > that first cut on trunk but TermScorer is working on BlockReader that do not > expose positions while the one in this branch does. I started adding a new > Positions class which users can pull from a scorer, to prevent unnecessary > positions enums I added ScorerContext#needsPositions and eventually > Scorere#needsPayloads to create the corresponding enum on demand. Yet, > currently only TermQuery / TermScorer implements this API and other simply > return null instead. > To show that the API really works and our BulkPostings work fine too with > positions I cut over TermSpanQuery to use a TermScorer under the hood and > nuked TermSpans entirely. A nice sideeffect of this was that the Position > BulkReading implementation got some exercise which now :) work all with > positions while Payloads for bulkreading are kind of experimental in the > patch and those only work with Standard codec. > So all spans now work on top of TermScorer ( I truly hate spans since today ) > including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother > to implement the other codecs yet since I want to get feedback on the API and > on this first cut before I go one with it. I will upload the corresponding > patch in a minute. > I also had to cut over SpanQuery.getSpans(IR) to > SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk > first but after that pain today I need a break firs
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13558405#comment-13558405 ] Alan Woodward commented on LUCENE-2878: --- So at the moment, IntervalFilterScorer doesn't consume all the intervals on a given document when advancing, it just checks if the document has any matching intervals at all. Which is great for speed, but bad for scoring - you want to iterate through the intervals on a document to get the within-doc frequency, which can then be passed to the docscorer. You also need to iterate through everything to deal with payloads. Is it worth specialising here? Have two query types (or maybe just a flag on the query), so you can optimize for query speed or for scoring. SpanScorer always iterates over all spans, by comparison. > Allow Scorer to expose positions and payloads aka. nuke spans > -- > > Key: LUCENE-2878 > URL: https://issues.apache.org/jira/browse/LUCENE-2878 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: Positions Branch >Reporter: Simon Willnauer >Assignee: Simon Willnauer > Labels: gsoc2011, gsoc2012, lucene-gsoc-11, lucene-gsoc-12, > mentor > Fix For: Positions Branch > > Attachments: LUCENE-2878-OR.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, LUCENE-2878-vs-trunk.patch, > PosHighlighter.patch, PosHighlighter.patch > > > Currently we have two somewhat separate types of queries, the one which can > make use of positions (mainly spans) and payloads (spans). Yet Span*Query > doesn't really do scoring comparable to what other queries do and at the end > of the day they are duplicating lot of code all over lucene. Span*Queries are > also limited to other Span*Query instances such that you can not use a > TermQuery or a BooleanQuery with SpanNear or anthing like that. > Beside of the Span*Query limitation other queries lacking a quiet interesting > feature since they can not score based on term proximity since scores doesn't > expose any positional information. All those problems bugged me for a while > now so I stared working on that using the bulkpostings API. I would have done > that first cut on trunk but TermScorer is working on BlockReader that do not > expose positions while the one in this branch does. I started adding a new > Positions class which users can pull from a scorer, to prevent unnecessary > positions enums I added ScorerContext#needsPositions and eventually > Scorere#needsPayloads to create the corresponding enum on demand. Yet, > currently only TermQuery / TermScorer implements this API and other simply > return null instead. > To show that the API really works and our BulkPostings work fine too with > positions I cut over TermSpanQuery to use a TermScorer under the hood and > nuked TermSpans entirely. A nice sideeffect of this was that the Position > BulkReading implementation got some exercise which now :) work all with > positions while Payloads for bulkreading are kind of experimental in the > patch and those only work with Standard codec. > So all spans now work on top of TermScorer ( I truly hate spans since today ) > including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother > to implement the other codecs yet since I want to get feedback on the API and > on this first cut before I go one with it. I will upload the corresponding > patch in a minute. > I also had to cut over SpanQuery.getSpans(IR) to > SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk > first but after that pain today I need a break first :). > The patch passes all core tests > (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't > look into the MemoryIndex BulkPostings API yet) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13557667#comment-13557667 ] Alan Woodward commented on LUCENE-2878: --- Since the last patch went up I've fixed a bunch of bugs (BrouwerianQuery works properly now, as do various nested IntervalQuery subtypes that were throwing NPEs), as well as adding Span-type scoring and fleshing out the explain() methods. The only Span functionality that's missing I think is payload queries. If we want to have *all* the span functionality in here before it can land on trunk I can work on that next. It would also be good to do some proper benchmarking. Do we already have something that can compare sets of queries? > Allow Scorer to expose positions and payloads aka. nuke spans > -- > > Key: LUCENE-2878 > URL: https://issues.apache.org/jira/browse/LUCENE-2878 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: Positions Branch >Reporter: Simon Willnauer >Assignee: Simon Willnauer > Labels: gsoc2011, gsoc2012, lucene-gsoc-11, lucene-gsoc-12, > mentor > Fix For: Positions Branch > > Attachments: LUCENE-2878-OR.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, LUCENE-2878-vs-trunk.patch, > PosHighlighter.patch, PosHighlighter.patch > > > Currently we have two somewhat separate types of queries, the one which can > make use of positions (mainly spans) and payloads (spans). Yet Span*Query > doesn't really do scoring comparable to what other queries do and at the end > of the day they are duplicating lot of code all over lucene. Span*Queries are > also limited to other Span*Query instances such that you can not use a > TermQuery or a BooleanQuery with SpanNear or anthing like that. > Beside of the Span*Query limitation other queries lacking a quiet interesting > feature since they can not score based on term proximity since scores doesn't > expose any positional information. All those problems bugged me for a while > now so I stared working on that using the bulkpostings API. I would have done > that first cut on trunk but TermScorer is working on BlockReader that do not > expose positions while the one in this branch does. I started adding a new > Positions class which users can pull from a scorer, to prevent unnecessary > positions enums I added ScorerContext#needsPositions and eventually > Scorere#needsPayloads to create the corresponding enum on demand. Yet, > currently only TermQuery / TermScorer implements this API and other simply > return null instead. > To show that the API really works and our BulkPostings work fine too with > positions I cut over TermSpanQuery to use a TermScorer under the hood and > nuked TermSpans entirely. A nice sideeffect of this was that the Position > BulkReading implementation got some exercise which now :) work all with > positions while Payloads for bulkreading are kind of experimental in the > patch and those only work with Standard codec. > So all spans now work on top of TermScorer ( I truly hate spans since today ) > including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother > to implement the other codecs yet since I want to get feedback on the API and > on this first cut before I go one with it. I will upload the corresponding > patch in a minute. > I also had to cut over SpanQuery.getSpans(IR) to > SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk > first but after that pain today I need a break first :). > The patch passes all core tests > (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't > look into the MemoryIndex BulkPostings API yet) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13556372#comment-13556372 ] Simon Willnauer commented on LUCENE-2878: - YEAH!!! > Allow Scorer to expose positions and payloads aka. nuke spans > -- > > Key: LUCENE-2878 > URL: https://issues.apache.org/jira/browse/LUCENE-2878 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: Positions Branch >Reporter: Simon Willnauer >Assignee: Simon Willnauer > Labels: gsoc2011, gsoc2012, lucene-gsoc-11, lucene-gsoc-12, > mentor > Fix For: Positions Branch > > Attachments: LUCENE-2878-OR.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, LUCENE-2878-vs-trunk.patch, > PosHighlighter.patch, PosHighlighter.patch > > > Currently we have two somewhat separate types of queries, the one which can > make use of positions (mainly spans) and payloads (spans). Yet Span*Query > doesn't really do scoring comparable to what other queries do and at the end > of the day they are duplicating lot of code all over lucene. Span*Queries are > also limited to other Span*Query instances such that you can not use a > TermQuery or a BooleanQuery with SpanNear or anthing like that. > Beside of the Span*Query limitation other queries lacking a quiet interesting > feature since they can not score based on term proximity since scores doesn't > expose any positional information. All those problems bugged me for a while > now so I stared working on that using the bulkpostings API. I would have done > that first cut on trunk but TermScorer is working on BlockReader that do not > expose positions while the one in this branch does. I started adding a new > Positions class which users can pull from a scorer, to prevent unnecessary > positions enums I added ScorerContext#needsPositions and eventually > Scorere#needsPayloads to create the corresponding enum on demand. Yet, > currently only TermQuery / TermScorer implements this API and other simply > return null instead. > To show that the API really works and our BulkPostings work fine too with > positions I cut over TermSpanQuery to use a TermScorer under the hood and > nuked TermSpans entirely. A nice sideeffect of this was that the Position > BulkReading implementation got some exercise which now :) work all with > positions while Payloads for bulkreading are kind of experimental in the > patch and those only work with Standard codec. > So all spans now work on top of TermScorer ( I truly hate spans since today ) > including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother > to implement the other codecs yet since I want to get feedback on the API and > on this first cut before I go one with it. I will upload the corresponding > patch in a minute. > I also had to cut over SpanQuery.getSpans(IR) to > SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk > first but after that pain today I need a break first :). > The patch passes all core tests > (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't > look into the MemoryIndex BulkPostings API yet) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13556367#comment-13556367 ] Alan Woodward commented on LUCENE-2878: --- I want to get this moving again - will get the branch up to date tomorrow and then iterate from there. > Allow Scorer to expose positions and payloads aka. nuke spans > -- > > Key: LUCENE-2878 > URL: https://issues.apache.org/jira/browse/LUCENE-2878 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: Positions Branch >Reporter: Simon Willnauer >Assignee: Simon Willnauer > Labels: gsoc2011, gsoc2012, lucene-gsoc-11, lucene-gsoc-12, > mentor > Fix For: Positions Branch > > Attachments: LUCENE-2878-OR.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, LUCENE-2878-vs-trunk.patch, > PosHighlighter.patch, PosHighlighter.patch > > > Currently we have two somewhat separate types of queries, the one which can > make use of positions (mainly spans) and payloads (spans). Yet Span*Query > doesn't really do scoring comparable to what other queries do and at the end > of the day they are duplicating lot of code all over lucene. Span*Queries are > also limited to other Span*Query instances such that you can not use a > TermQuery or a BooleanQuery with SpanNear or anthing like that. > Beside of the Span*Query limitation other queries lacking a quiet interesting > feature since they can not score based on term proximity since scores doesn't > expose any positional information. All those problems bugged me for a while > now so I stared working on that using the bulkpostings API. I would have done > that first cut on trunk but TermScorer is working on BlockReader that do not > expose positions while the one in this branch does. I started adding a new > Positions class which users can pull from a scorer, to prevent unnecessary > positions enums I added ScorerContext#needsPositions and eventually > Scorere#needsPayloads to create the corresponding enum on demand. Yet, > currently only TermQuery / TermScorer implements this API and other simply > return null instead. > To show that the API really works and our BulkPostings work fine too with > positions I cut over TermSpanQuery to use a TermScorer under the hood and > nuked TermSpans entirely. A nice sideeffect of this was that the Position > BulkReading implementation got some exercise which now :) work all with > positions while Payloads for bulkreading are kind of experimental in the > patch and those only work with Standard codec. > So all spans now work on top of TermScorer ( I truly hate spans since today ) > including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother > to implement the other codecs yet since I want to get feedback on the API and > on this first cut before I go one with it. I will upload the corresponding > patch in a minute. > I also had to cut over SpanQuery.getSpans(IR) to > SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk > first but after that pain today I need a break first :). > The patch passes all core tests > (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't > look into the MemoryIndex BulkPostings API yet) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13495661#comment-13495661 ] Alan Woodward commented on LUCENE-2878: --- Just committed a massive test refactoring, which should show where the problems are in DisjunctionIntervalScorer and MultiTermQuery. Lots of the tests fail now, as the previous ones weren't necessarily picking up false positives (UnorderedNearQuery is particularly bad for this). > Allow Scorer to expose positions and payloads aka. nuke spans > -- > > Key: LUCENE-2878 > URL: https://issues.apache.org/jira/browse/LUCENE-2878 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: Positions Branch >Reporter: Simon Willnauer >Assignee: Simon Willnauer > Labels: gsoc2011, gsoc2012, lucene-gsoc-11, lucene-gsoc-12, > mentor > Fix For: Positions Branch > > Attachments: LUCENE-2878-OR.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, LUCENE-2878-vs-trunk.patch, > PosHighlighter.patch, PosHighlighter.patch > > > Currently we have two somewhat separate types of queries, the one which can > make use of positions (mainly spans) and payloads (spans). Yet Span*Query > doesn't really do scoring comparable to what other queries do and at the end > of the day they are duplicating lot of code all over lucene. Span*Queries are > also limited to other Span*Query instances such that you can not use a > TermQuery or a BooleanQuery with SpanNear or anthing like that. > Beside of the Span*Query limitation other queries lacking a quiet interesting > feature since they can not score based on term proximity since scores doesn't > expose any positional information. All those problems bugged me for a while > now so I stared working on that using the bulkpostings API. I would have done > that first cut on trunk but TermScorer is working on BlockReader that do not > expose positions while the one in this branch does. I started adding a new > Positions class which users can pull from a scorer, to prevent unnecessary > positions enums I added ScorerContext#needsPositions and eventually > Scorere#needsPayloads to create the corresponding enum on demand. Yet, > currently only TermQuery / TermScorer implements this API and other simply > return null instead. > To show that the API really works and our BulkPostings work fine too with > positions I cut over TermSpanQuery to use a TermScorer under the hood and > nuked TermSpans entirely. A nice sideeffect of this was that the Position > BulkReading implementation got some exercise which now :) work all with > positions while Payloads for bulkreading are kind of experimental in the > patch and those only work with Standard codec. > So all spans now work on top of TermScorer ( I truly hate spans since today ) > including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother > to implement the other codecs yet since I want to get feedback on the API and > on this first cut before I go one with it. I will upload the corresponding > patch in a minute. > I also had to cut over SpanQuery.getSpans(IR) to > SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk > first but after that pain today I need a break first :). > The patch passes all core tests > (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't > look into the MemoryIndex BulkPostings API yet) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494940#comment-13494940 ] Alan Woodward commented on LUCENE-2878: --- Hi Simon, I'll open separate sub-tasks for the issues. The system I'm building is basically an equivalent of the elasticsearch percolator - we register a bunch of queries, and then run them all against individual documents passing through the system. We then emit the exact positions which have matched, which is a type of highlighting, I guess. The point of the anchor terms is that we don't want to highlight them - if you're searching for a term within five positions of the start of a document, you don't want the first term of the document highlighted as well. > Allow Scorer to expose positions and payloads aka. nuke spans > -- > > Key: LUCENE-2878 > URL: https://issues.apache.org/jira/browse/LUCENE-2878 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: Positions Branch >Reporter: Simon Willnauer >Assignee: Simon Willnauer > Labels: gsoc2011, gsoc2012, lucene-gsoc-11, lucene-gsoc-12, > mentor > Fix For: Positions Branch > > Attachments: LUCENE-2878-OR.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, LUCENE-2878-vs-trunk.patch, > PosHighlighter.patch, PosHighlighter.patch > > > Currently we have two somewhat separate types of queries, the one which can > make use of positions (mainly spans) and payloads (spans). Yet Span*Query > doesn't really do scoring comparable to what other queries do and at the end > of the day they are duplicating lot of code all over lucene. Span*Queries are > also limited to other Span*Query instances such that you can not use a > TermQuery or a BooleanQuery with SpanNear or anthing like that. > Beside of the Span*Query limitation other queries lacking a quiet interesting > feature since they can not score based on term proximity since scores doesn't > expose any positional information. All those problems bugged me for a while > now so I stared working on that using the bulkpostings API. I would have done > that first cut on trunk but TermScorer is working on BlockReader that do not > expose positions while the one in this branch does. I started adding a new > Positions class which users can pull from a scorer, to prevent unnecessary > positions enums I added ScorerContext#needsPositions and eventually > Scorere#needsPayloads to create the corresponding enum on demand. Yet, > currently only TermQuery / TermScorer implements this API and other simply > return null instead. > To show that the API really works and our BulkPostings work fine too with > positions I cut over TermSpanQuery to use a TermScorer under the hood and > nuked TermSpans entirely. A nice sideeffect of this was that the Position > BulkReading implementation got some exercise which now :) work all with > positions while Payloads for bulkreading are kind of experimental in the > patch and those only work with Standard codec. > So all spans now work on top of TermScorer ( I truly hate spans since today ) > including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother > to implement the other codecs yet since I want to get feedback on the API and > on this first cut before I go one with it. I will upload the corresponding > patch in a minute. > I also had to cut over SpanQuery.getSpans(IR) to > SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk > first but after that pain today I need a break first :). > The patch passes all core tests > (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't > look into the MemoryIndex BulkPostings API yet) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494874#comment-13494874 ] Simon Willnauer commented on LUCENE-2878: - hey alan, bq. I've started to use this branch in an (experimental!) system I'm developing for a client. very good news! cool stuff - can you provide more infos what you are doing there? Do you highlight too? regarding your latest patch - commit it! bq. 1) There isn't a replacement for SpanNotQueries - the BrouwerianIterator comes close, but doesn't quite cover all the use cases. I can you provide a testcase what it doesn't cover? you can go ahead and commit it even if you don't have a fix. bq. 2) The API is not very nice when it comes to subclassing Iterators. For example, I have 'anchor' terms at the start and end of documents, which allow I am not sure I understand this. if you have marker terms how do they differ from ordinary terms can't you just do a nearOrdered("X", "_ENDMARKER_") query? I don't see where you need to subclass here. can you elaborate? bq. 4) I found a bug in the iterators() method of DisjunctionSumScorer great, can you submit the testcase? bq. 3) MultiTermQueries don't return iterators unless you set their rewrite policies to something other than CONSTANT_SCORE_REWRITE. yeah the problem here is that we use a filter instead of a scorer, you should see an exception right? I think it would make sense to have a MTQ rewrite a query on a ConstantScoreQuery instead of a filter - we can't get a interval iter from a filter :/ I think overall we should move out of this issue and create separate issues for all you cases. Also for the things robert mentioned like exploring "Scorere extends DocsAndPosEnum" > Allow Scorer to expose positions and payloads aka. nuke spans > -- > > Key: LUCENE-2878 > URL: https://issues.apache.org/jira/browse/LUCENE-2878 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: Positions Branch >Reporter: Simon Willnauer >Assignee: Simon Willnauer > Labels: gsoc2011, gsoc2012, lucene-gsoc-11, lucene-gsoc-12, > mentor > Fix For: Positions Branch > > Attachments: LUCENE-2878-OR.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, LUCENE-2878-vs-trunk.patch, > PosHighlighter.patch, PosHighlighter.patch > > > Currently we have two somewhat separate types of queries, the one which can > make use of positions (mainly spans) and payloads (spans). Yet Span*Query > doesn't really do scoring comparable to what other queries do and at the end > of the day they are duplicating lot of code all over lucene. Span*Queries are > also limited to other Span*Query instances such that you can not use a > TermQuery or a BooleanQuery with SpanNear or anthing like that. > Beside of the Span*Query limitation other queries lacking a quiet interesting > feature since they can not score based on term proximity since scores doesn't > expose any positional information. All those problems bugged me for a while > now so I stared working on that using the bulkpostings API. I would have done > that first cut on trunk but TermScorer is working on BlockReader that do not > expose positions while the one in this branch does. I started adding a new > Positions class which users can pull from a scorer, to prevent unnecessary > positions enums I added ScorerContext#needsPositions and eventually > Scorere#needsPayloads to create the corresponding enum on demand. Yet, > currently only TermQuery / TermScorer implements this API and other simply > return null instead. > To show that the API really works and our BulkPostings work fine too with > positions I cut over TermSpanQuery to use a TermScorer under the hood and > nuked TermSpans entirely. A nice sideeffect of this was that the Position > BulkReading implementation got some exercise which now :) work all with > positions while Payloads for bulkreading are kind of experimental in the > patch and those only work with Standard codec. > So all spans now work on top of TermScorer ( I truly hate spans since today ) > including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother > to implement the other codecs yet since I want to get
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494785#comment-13494785 ] Alan Woodward commented on LUCENE-2878: --- I've started to use this branch in an (experimental!) system I'm developing for a client. The good news is that performance is generally much better than the existing system that uses SpanQueries - faster query time and smaller memory footprint, and also nicer GC behaviour (I can't give exact numbers, but suffice to say that where the previous system regularly ran out of memory, this one hasn't yet)! There are definitely some rough edges, though, which I'll try and smooth out and add as patches. 1) There isn't a replacement for SpanNotQueries - the BrouwerianIterator comes close, but doesn't quite cover all the use cases. In this instance, I need to have the equivalent of a 'not within' operator - match intervals that do not fall within a given another interval. I've written a new iterator, which I've called an 'InverseBrouwerianIntervalIterator' for want of a better name, but it definitely could do with some more eyes on it... 2) The API is not very nice when it comes to subclassing Iterators. For example, I have 'anchor' terms at the start and end of documents, which allow users to query for terms within a certain distance from them. These shouldn't be highlighted, so I created an AnchorTermQuery which returned a different type of IntervalIterator that didn't do anything in its collect() method. To do this, I had to create an AnchorTermWeight, an AnchorTermScorer and an AnchorTermIntervalIterator, all of which were more or less copy-pastes of the equivalent Term* classes; it would be nice to make this easier... 3) MultiTermQueries don't return iterators unless you set their rewrite policies to something other than CONSTANT_SCORE_REWRITE. 4) I found a bug in the iterators() method of DisjunctionSumScorer - if all subscorers are PositionFilterScorers, then you can get NPEs if the subscorers have matches that don't pass the filters. I'll add a test case shortly 5) I had to run this without my scoring patch (this case doesn't actually use scoring, so it doesn't matter that much), because MultiTermQueries can blow up in scoring if they get rewritten into blank queries; I guess this wasn't a problem with Span* queries, but I haven't had a chance to work out how to get round it. Will add another test case for this as well. All in all, though, these are looking much better than the equivalent SpanQueries. Position filters on boolean queries in particular work much better - the semantics of SpanQueries are completely wrong for this, and involved generating very heavy queries for pretty simple cases. Nice work! > Allow Scorer to expose positions and payloads aka. nuke spans > -- > > Key: LUCENE-2878 > URL: https://issues.apache.org/jira/browse/LUCENE-2878 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: Positions Branch >Reporter: Simon Willnauer >Assignee: Simon Willnauer > Labels: gsoc2011, gsoc2012, lucene-gsoc-11, lucene-gsoc-12, > mentor > Fix For: Positions Branch > > Attachments: LUCENE-2878-OR.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, LUCENE-2878-vs-trunk.patch, > PosHighlighter.patch, PosHighlighter.patch > > > Currently we have two somewhat separate types of queries, the one which can > make use of positions (mainly spans) and payloads (spans). Yet Span*Query > doesn't really do scoring comparable to what other queries do and at the end > of the day they are duplicating lot of code all over lucene. Span*Queries are > also limited to other Span*Query instances such that you can not use a > TermQuery or a BooleanQuery with SpanNear or anthing like that. > Beside of the Span*Query limitation other queries lacking a quiet interesting > feature since they can not score based on term proximity since scores doesn't > expose any positional information. All those problems bugged me for a while > now so I stared working on that using the bulkpostings API. I would have done > that first cut on trunk but TermScorer is working on BlockReader that do not > expose positions while the
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13490120#comment-13490120 ] Alan Woodward commented on LUCENE-2878: --- > will you be at apache con? Not this year :-( Will try and make one next year, though! > Allow Scorer to expose positions and payloads aka. nuke spans > -- > > Key: LUCENE-2878 > URL: https://issues.apache.org/jira/browse/LUCENE-2878 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: Positions Branch >Reporter: Simon Willnauer >Assignee: Simon Willnauer > Labels: gsoc2011, gsoc2012, lucene-gsoc-11, lucene-gsoc-12, > mentor > Fix For: Positions Branch > > Attachments: LUCENE-2878-OR.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878_trunk.patch, > LUCENE-2878_trunk.patch, LUCENE-2878-vs-trunk.patch, PosHighlighter.patch, > PosHighlighter.patch > > > Currently we have two somewhat separate types of queries, the one which can > make use of positions (mainly spans) and payloads (spans). Yet Span*Query > doesn't really do scoring comparable to what other queries do and at the end > of the day they are duplicating lot of code all over lucene. Span*Queries are > also limited to other Span*Query instances such that you can not use a > TermQuery or a BooleanQuery with SpanNear or anthing like that. > Beside of the Span*Query limitation other queries lacking a quiet interesting > feature since they can not score based on term proximity since scores doesn't > expose any positional information. All those problems bugged me for a while > now so I stared working on that using the bulkpostings API. I would have done > that first cut on trunk but TermScorer is working on BlockReader that do not > expose positions while the one in this branch does. I started adding a new > Positions class which users can pull from a scorer, to prevent unnecessary > positions enums I added ScorerContext#needsPositions and eventually > Scorere#needsPayloads to create the corresponding enum on demand. Yet, > currently only TermQuery / TermScorer implements this API and other simply > return null instead. > To show that the API really works and our BulkPostings work fine too with > positions I cut over TermSpanQuery to use a TermScorer under the hood and > nuked TermSpans entirely. A nice sideeffect of this was that the Position > BulkReading implementation got some exercise which now :) work all with > positions while Payloads for bulkreading are kind of experimental in the > patch and those only work with Standard codec. > So all spans now work on top of TermScorer ( I truly hate spans since today ) > including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother > to implement the other codecs yet since I want to get feedback on the API and > on this first cut before I go one with it. I will upload the corresponding > patch in a minute. > I also had to cut over SpanQuery.getSpans(IR) to > SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk > first but after that pain today I need a break first :). > The patch passes all core tests > (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't > look into the MemoryIndex BulkPostings API yet) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13490119#comment-13490119 ] Simon Willnauer commented on LUCENE-2878: - +1 to add scoring! go ahead this would be great. will you be at apache con? > Allow Scorer to expose positions and payloads aka. nuke spans > -- > > Key: LUCENE-2878 > URL: https://issues.apache.org/jira/browse/LUCENE-2878 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: Positions Branch >Reporter: Simon Willnauer >Assignee: Simon Willnauer > Labels: gsoc2011, gsoc2012, lucene-gsoc-11, lucene-gsoc-12, > mentor > Fix For: Positions Branch > > Attachments: LUCENE-2878-OR.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878_trunk.patch, > LUCENE-2878_trunk.patch, LUCENE-2878-vs-trunk.patch, PosHighlighter.patch, > PosHighlighter.patch > > > Currently we have two somewhat separate types of queries, the one which can > make use of positions (mainly spans) and payloads (spans). Yet Span*Query > doesn't really do scoring comparable to what other queries do and at the end > of the day they are duplicating lot of code all over lucene. Span*Queries are > also limited to other Span*Query instances such that you can not use a > TermQuery or a BooleanQuery with SpanNear or anthing like that. > Beside of the Span*Query limitation other queries lacking a quiet interesting > feature since they can not score based on term proximity since scores doesn't > expose any positional information. All those problems bugged me for a while > now so I stared working on that using the bulkpostings API. I would have done > that first cut on trunk but TermScorer is working on BlockReader that do not > expose positions while the one in this branch does. I started adding a new > Positions class which users can pull from a scorer, to prevent unnecessary > positions enums I added ScorerContext#needsPositions and eventually > Scorere#needsPayloads to create the corresponding enum on demand. Yet, > currently only TermQuery / TermScorer implements this API and other simply > return null instead. > To show that the API really works and our BulkPostings work fine too with > positions I cut over TermSpanQuery to use a TermScorer under the hood and > nuked TermSpans entirely. A nice sideeffect of this was that the Position > BulkReading implementation got some exercise which now :) work all with > positions while Payloads for bulkreading are kind of experimental in the > patch and those only work with Standard codec. > So all spans now work on top of TermScorer ( I truly hate spans since today ) > including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother > to implement the other codecs yet since I want to get feedback on the API and > on this first cut before I go one with it. I will upload the corresponding > patch in a minute. > I also had to cut over SpanQuery.getSpans(IR) to > SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk > first but after that pain today I need a break first :). > The patch passes all core tests > (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't > look into the MemoryIndex BulkPostings API yet) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13490116#comment-13490116 ] Alan Woodward commented on LUCENE-2878: --- Where do we stand on this now? It sounds as though we need to get more implemented before everyone's happy with it being merged in. I can make a start at cutting TestSpansAdvanced and TestSpansAdvanced2 over to intervals tests this week (at a glance they're the only tests we have for Span scoring at the moment), although I guess things are going to go on hold a bit for the ApacheCon. > Allow Scorer to expose positions and payloads aka. nuke spans > -- > > Key: LUCENE-2878 > URL: https://issues.apache.org/jira/browse/LUCENE-2878 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: Positions Branch >Reporter: Simon Willnauer >Assignee: Simon Willnauer > Labels: gsoc2011, gsoc2012, lucene-gsoc-11, lucene-gsoc-12, > mentor > Fix For: Positions Branch > > Attachments: LUCENE-2878-OR.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878_trunk.patch, > LUCENE-2878_trunk.patch, LUCENE-2878-vs-trunk.patch, PosHighlighter.patch, > PosHighlighter.patch > > > Currently we have two somewhat separate types of queries, the one which can > make use of positions (mainly spans) and payloads (spans). Yet Span*Query > doesn't really do scoring comparable to what other queries do and at the end > of the day they are duplicating lot of code all over lucene. Span*Queries are > also limited to other Span*Query instances such that you can not use a > TermQuery or a BooleanQuery with SpanNear or anthing like that. > Beside of the Span*Query limitation other queries lacking a quiet interesting > feature since they can not score based on term proximity since scores doesn't > expose any positional information. All those problems bugged me for a while > now so I stared working on that using the bulkpostings API. I would have done > that first cut on trunk but TermScorer is working on BlockReader that do not > expose positions while the one in this branch does. I started adding a new > Positions class which users can pull from a scorer, to prevent unnecessary > positions enums I added ScorerContext#needsPositions and eventually > Scorere#needsPayloads to create the corresponding enum on demand. Yet, > currently only TermQuery / TermScorer implements this API and other simply > return null instead. > To show that the API really works and our BulkPostings work fine too with > positions I cut over TermSpanQuery to use a TermScorer under the hood and > nuked TermSpans entirely. A nice sideeffect of this was that the Position > BulkReading implementation got some exercise which now :) work all with > positions while Payloads for bulkreading are kind of experimental in the > patch and those only work with Standard codec. > So all spans now work on top of TermScorer ( I truly hate spans since today ) > including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother > to implement the other codecs yet since I want to get feedback on the API and > on this first cut before I go one with it. I will upload the corresponding > patch in a minute. > I also had to cut over SpanQuery.getSpans(IR) to > SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk > first but after that pain today I need a break first :). > The patch passes all core tests > (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't > look into the MemoryIndex BulkPostings API yet) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13487069#comment-13487069 ] Simon Willnauer commented on LUCENE-2878: - bq. Interval.java looks a lot like a span. I think it would be good to resolve this before landing on trunk. what exactly did you expect? I mean its basically the same thing but reusing the name sucks. what do you wanna resolve here? regarding your comments could you put your ideas up here in a somewhat more compact form than 5 1-line comments? This would be great especially what kind of ideas you have and you resovle / work on on trunk so we can maybe be more productive here. thanks > Allow Scorer to expose positions and payloads aka. nuke spans > -- > > Key: LUCENE-2878 > URL: https://issues.apache.org/jira/browse/LUCENE-2878 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: Positions Branch >Reporter: Simon Willnauer >Assignee: Simon Willnauer > Labels: gsoc2011, gsoc2012, lucene-gsoc-11, lucene-gsoc-12, > mentor > Fix For: Positions Branch > > Attachments: LUCENE-2878-OR.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878_trunk.patch, > LUCENE-2878_trunk.patch, LUCENE-2878-vs-trunk.patch, PosHighlighter.patch, > PosHighlighter.patch > > > Currently we have two somewhat separate types of queries, the one which can > make use of positions (mainly spans) and payloads (spans). Yet Span*Query > doesn't really do scoring comparable to what other queries do and at the end > of the day they are duplicating lot of code all over lucene. Span*Queries are > also limited to other Span*Query instances such that you can not use a > TermQuery or a BooleanQuery with SpanNear or anthing like that. > Beside of the Span*Query limitation other queries lacking a quiet interesting > feature since they can not score based on term proximity since scores doesn't > expose any positional information. All those problems bugged me for a while > now so I stared working on that using the bulkpostings API. I would have done > that first cut on trunk but TermScorer is working on BlockReader that do not > expose positions while the one in this branch does. I started adding a new > Positions class which users can pull from a scorer, to prevent unnecessary > positions enums I added ScorerContext#needsPositions and eventually > Scorere#needsPayloads to create the corresponding enum on demand. Yet, > currently only TermQuery / TermScorer implements this API and other simply > return null instead. > To show that the API really works and our BulkPostings work fine too with > positions I cut over TermSpanQuery to use a TermScorer under the hood and > nuked TermSpans entirely. A nice sideeffect of this was that the Position > BulkReading implementation got some exercise which now :) work all with > positions while Payloads for bulkreading are kind of experimental in the > patch and those only work with Standard codec. > So all spans now work on top of TermScorer ( I truly hate spans since today ) > including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother > to implement the other codecs yet since I want to get feedback on the API and > on this first cut before I go one with it. I will upload the corresponding > patch in a minute. > I also had to cut over SpanQuery.getSpans(IR) to > SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk > first but after that pain today I need a break first :). > The patch passes all core tests > (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't > look into the MemoryIndex BulkPostings API yet) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13486868#comment-13486868 ] Robert Muir commented on LUCENE-2878: - To me Interval.java looks a lot like a span. I think it would be good to resolve this before landing on trunk. If we cant score based on positions, it seems to me the api is not fully baked? e.g. i think it would be better to score based on positions and run benchmarks and so on first. here we also get a lot more index option flags. I dont like how many of these we have: * indexoptions * flags on docsenum * flags on docsandpositionsenum * now here, flags on scorer/collector/etc I am working on some ideas to clean some of this up in trunk separate from this branch. i think this makes the apis really confusing. > Allow Scorer to expose positions and payloads aka. nuke spans > -- > > Key: LUCENE-2878 > URL: https://issues.apache.org/jira/browse/LUCENE-2878 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: Positions Branch >Reporter: Simon Willnauer >Assignee: Simon Willnauer > Labels: gsoc2011, gsoc2012, lucene-gsoc-11, lucene-gsoc-12, > mentor > Fix For: Positions Branch > > Attachments: LUCENE-2878-OR.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878_trunk.patch, > LUCENE-2878_trunk.patch, LUCENE-2878-vs-trunk.patch, PosHighlighter.patch, > PosHighlighter.patch > > > Currently we have two somewhat separate types of queries, the one which can > make use of positions (mainly spans) and payloads (spans). Yet Span*Query > doesn't really do scoring comparable to what other queries do and at the end > of the day they are duplicating lot of code all over lucene. Span*Queries are > also limited to other Span*Query instances such that you can not use a > TermQuery or a BooleanQuery with SpanNear or anthing like that. > Beside of the Span*Query limitation other queries lacking a quiet interesting > feature since they can not score based on term proximity since scores doesn't > expose any positional information. All those problems bugged me for a while > now so I stared working on that using the bulkpostings API. I would have done > that first cut on trunk but TermScorer is working on BlockReader that do not > expose positions while the one in this branch does. I started adding a new > Positions class which users can pull from a scorer, to prevent unnecessary > positions enums I added ScorerContext#needsPositions and eventually > Scorere#needsPayloads to create the corresponding enum on demand. Yet, > currently only TermQuery / TermScorer implements this API and other simply > return null instead. > To show that the API really works and our BulkPostings work fine too with > positions I cut over TermSpanQuery to use a TermScorer under the hood and > nuked TermSpans entirely. A nice sideeffect of this was that the Position > BulkReading implementation got some exercise which now :) work all with > positions while Payloads for bulkreading are kind of experimental in the > patch and those only work with Standard codec. > So all spans now work on top of TermScorer ( I truly hate spans since today ) > including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother > to implement the other codecs yet since I want to get feedback on the API and > on this first cut before I go one with it. I will upload the corresponding > patch in a minute. > I also had to cut over SpanQuery.getSpans(IR) to > SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk > first but after that pain today I need a break first :). > The patch passes all core tests > (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't > look into the MemoryIndex BulkPostings API yet) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13486864#comment-13486864 ] Simon Willnauer commented on LUCENE-2878: - {quote} But then ... this would "typically" be used to find the locations to hilite, right? Ie not for the "main" query? Because if you wanted to do this for the main query you should really use one of the oal.search.intervals.* queries instead, and those pull only a single D&PEnum per Term I think? {quote} correct. {quote} It seems like the new oal.search.interval queries are meant to replace spans? So ... should we remove spans? Or is there functionality missing in intervals? {quote} eventually yes. Currently they don't score based on positions they only filter. My plan was to bring this on trunk including spans. Once on trunk move spans to a module and cut over the functionality query by query. We are currently missing payload support which I think we should add once we are on trunk. makes sense? > Allow Scorer to expose positions and payloads aka. nuke spans > -- > > Key: LUCENE-2878 > URL: https://issues.apache.org/jira/browse/LUCENE-2878 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: Positions Branch >Reporter: Simon Willnauer >Assignee: Simon Willnauer > Labels: gsoc2011, gsoc2012, lucene-gsoc-11, lucene-gsoc-12, > mentor > Fix For: Positions Branch > > Attachments: LUCENE-2878-OR.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878_trunk.patch, > LUCENE-2878_trunk.patch, LUCENE-2878-vs-trunk.patch, PosHighlighter.patch, > PosHighlighter.patch > > > Currently we have two somewhat separate types of queries, the one which can > make use of positions (mainly spans) and payloads (spans). Yet Span*Query > doesn't really do scoring comparable to what other queries do and at the end > of the day they are duplicating lot of code all over lucene. Span*Queries are > also limited to other Span*Query instances such that you can not use a > TermQuery or a BooleanQuery with SpanNear or anthing like that. > Beside of the Span*Query limitation other queries lacking a quiet interesting > feature since they can not score based on term proximity since scores doesn't > expose any positional information. All those problems bugged me for a while > now so I stared working on that using the bulkpostings API. I would have done > that first cut on trunk but TermScorer is working on BlockReader that do not > expose positions while the one in this branch does. I started adding a new > Positions class which users can pull from a scorer, to prevent unnecessary > positions enums I added ScorerContext#needsPositions and eventually > Scorere#needsPayloads to create the corresponding enum on demand. Yet, > currently only TermQuery / TermScorer implements this API and other simply > return null instead. > To show that the API really works and our BulkPostings work fine too with > positions I cut over TermSpanQuery to use a TermScorer under the hood and > nuked TermSpans entirely. A nice sideeffect of this was that the Position > BulkReading implementation got some exercise which now :) work all with > positions while Payloads for bulkreading are kind of experimental in the > patch and those only work with Standard codec. > So all spans now work on top of TermScorer ( I truly hate spans since today ) > including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother > to implement the other codecs yet since I want to get feedback on the API and > on this first cut before I go one with it. I will upload the corresponding > patch in a minute. > I also had to cut over SpanQuery.getSpans(IR) to > SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk > first but after that pain today I need a break first :). > The patch passes all core tests > (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't > look into the MemoryIndex BulkPostings API yet) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To uns
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13486858#comment-13486858 ] Michael McCandless commented on LUCENE-2878: It seems like the new oal.search.interval queries are meant to replace spans? So ... should we remove spans? Or is there functionality missing in intervals? > Allow Scorer to expose positions and payloads aka. nuke spans > -- > > Key: LUCENE-2878 > URL: https://issues.apache.org/jira/browse/LUCENE-2878 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: Positions Branch >Reporter: Simon Willnauer >Assignee: Simon Willnauer > Labels: gsoc2011, gsoc2012, lucene-gsoc-11, lucene-gsoc-12, > mentor > Fix For: Positions Branch > > Attachments: LUCENE-2878-OR.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878_trunk.patch, > LUCENE-2878_trunk.patch, LUCENE-2878-vs-trunk.patch, PosHighlighter.patch, > PosHighlighter.patch > > > Currently we have two somewhat separate types of queries, the one which can > make use of positions (mainly spans) and payloads (spans). Yet Span*Query > doesn't really do scoring comparable to what other queries do and at the end > of the day they are duplicating lot of code all over lucene. Span*Queries are > also limited to other Span*Query instances such that you can not use a > TermQuery or a BooleanQuery with SpanNear or anthing like that. > Beside of the Span*Query limitation other queries lacking a quiet interesting > feature since they can not score based on term proximity since scores doesn't > expose any positional information. All those problems bugged me for a while > now so I stared working on that using the bulkpostings API. I would have done > that first cut on trunk but TermScorer is working on BlockReader that do not > expose positions while the one in this branch does. I started adding a new > Positions class which users can pull from a scorer, to prevent unnecessary > positions enums I added ScorerContext#needsPositions and eventually > Scorere#needsPayloads to create the corresponding enum on demand. Yet, > currently only TermQuery / TermScorer implements this API and other simply > return null instead. > To show that the API really works and our BulkPostings work fine too with > positions I cut over TermSpanQuery to use a TermScorer under the hood and > nuked TermSpans entirely. A nice sideeffect of this was that the Position > BulkReading implementation got some exercise which now :) work all with > positions while Payloads for bulkreading are kind of experimental in the > patch and those only work with Standard codec. > So all spans now work on top of TermScorer ( I truly hate spans since today ) > including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother > to implement the other codecs yet since I want to get feedback on the API and > on this first cut before I go one with it. I will upload the corresponding > patch in a minute. > I also had to cut over SpanQuery.getSpans(IR) to > SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk > first but after that pain today I need a break first :). > The patch passes all core tests > (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't > look into the MemoryIndex BulkPostings API yet) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13486194#comment-13486194 ] Michael McCandless commented on LUCENE-2878: It's sort of disturbing that if you iterate over intervals for a PhraseQuery we pull two DocsAndPositionsEnums per term in the phrase ... But then ... this would "typically" be used to find the locations to hilite, right? Ie not for the "main" query? Because if you wanted to do this for the main query you should really use one of the oal.search.intervals.* queries instead, and those pull only a single D&PEnum per Term I think? > Allow Scorer to expose positions and payloads aka. nuke spans > -- > > Key: LUCENE-2878 > URL: https://issues.apache.org/jira/browse/LUCENE-2878 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: Positions Branch >Reporter: Simon Willnauer >Assignee: Simon Willnauer > Labels: gsoc2011, gsoc2012, lucene-gsoc-11, lucene-gsoc-12, > mentor > Fix For: Positions Branch > > Attachments: LUCENE-2878-OR.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878_trunk.patch, > LUCENE-2878_trunk.patch, LUCENE-2878-vs-trunk.patch, PosHighlighter.patch, > PosHighlighter.patch > > > Currently we have two somewhat separate types of queries, the one which can > make use of positions (mainly spans) and payloads (spans). Yet Span*Query > doesn't really do scoring comparable to what other queries do and at the end > of the day they are duplicating lot of code all over lucene. Span*Queries are > also limited to other Span*Query instances such that you can not use a > TermQuery or a BooleanQuery with SpanNear or anthing like that. > Beside of the Span*Query limitation other queries lacking a quiet interesting > feature since they can not score based on term proximity since scores doesn't > expose any positional information. All those problems bugged me for a while > now so I stared working on that using the bulkpostings API. I would have done > that first cut on trunk but TermScorer is working on BlockReader that do not > expose positions while the one in this branch does. I started adding a new > Positions class which users can pull from a scorer, to prevent unnecessary > positions enums I added ScorerContext#needsPositions and eventually > Scorere#needsPayloads to create the corresponding enum on demand. Yet, > currently only TermQuery / TermScorer implements this API and other simply > return null instead. > To show that the API really works and our BulkPostings work fine too with > positions I cut over TermSpanQuery to use a TermScorer under the hood and > nuked TermSpans entirely. A nice sideeffect of this was that the Position > BulkReading implementation got some exercise which now :) work all with > positions while Payloads for bulkreading are kind of experimental in the > patch and those only work with Standard codec. > So all spans now work on top of TermScorer ( I truly hate spans since today ) > including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother > to implement the other codecs yet since I want to get feedback on the API and > on this first cut before I go one with it. I will upload the corresponding > patch in a minute. > I also had to cut over SpanQuery.getSpans(IR) to > SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk > first but after that pain today I need a break first :). > The patch passes all core tests > (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't > look into the MemoryIndex BulkPostings API yet) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13486067#comment-13486067 ] Robert Muir commented on LUCENE-2878: - I don't like the general style of things like Collector.postingsFeatures() >From the naming, you cant tell this is a "getter". In general I think methods >like this should be getPostingsFeatures() ? > Allow Scorer to expose positions and payloads aka. nuke spans > -- > > Key: LUCENE-2878 > URL: https://issues.apache.org/jira/browse/LUCENE-2878 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: Positions Branch >Reporter: Simon Willnauer >Assignee: Simon Willnauer > Labels: gsoc2011, gsoc2012, lucene-gsoc-11, lucene-gsoc-12, > mentor > Fix For: Positions Branch > > Attachments: LUCENE-2878-OR.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878_trunk.patch, > LUCENE-2878_trunk.patch, LUCENE-2878-vs-trunk.patch, PosHighlighter.patch, > PosHighlighter.patch > > > Currently we have two somewhat separate types of queries, the one which can > make use of positions (mainly spans) and payloads (spans). Yet Span*Query > doesn't really do scoring comparable to what other queries do and at the end > of the day they are duplicating lot of code all over lucene. Span*Queries are > also limited to other Span*Query instances such that you can not use a > TermQuery or a BooleanQuery with SpanNear or anthing like that. > Beside of the Span*Query limitation other queries lacking a quiet interesting > feature since they can not score based on term proximity since scores doesn't > expose any positional information. All those problems bugged me for a while > now so I stared working on that using the bulkpostings API. I would have done > that first cut on trunk but TermScorer is working on BlockReader that do not > expose positions while the one in this branch does. I started adding a new > Positions class which users can pull from a scorer, to prevent unnecessary > positions enums I added ScorerContext#needsPositions and eventually > Scorere#needsPayloads to create the corresponding enum on demand. Yet, > currently only TermQuery / TermScorer implements this API and other simply > return null instead. > To show that the API really works and our BulkPostings work fine too with > positions I cut over TermSpanQuery to use a TermScorer under the hood and > nuked TermSpans entirely. A nice sideeffect of this was that the Position > BulkReading implementation got some exercise which now :) work all with > positions while Payloads for bulkreading are kind of experimental in the > patch and those only work with Standard codec. > So all spans now work on top of TermScorer ( I truly hate spans since today ) > including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother > to implement the other codecs yet since I want to get feedback on the API and > on this first cut before I go one with it. I will upload the corresponding > patch in a minute. > I also had to cut over SpanQuery.getSpans(IR) to > SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk > first but after that pain today I need a break first :). > The patch passes all core tests > (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't > look into the MemoryIndex BulkPostings API yet) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13486064#comment-13486064 ] Robert Muir commented on LUCENE-2878: - There seems to be a lot of unrelated formatting changes in important classes like TermWeight.java etc Can we factor these out and do any formatting changes separately? > Allow Scorer to expose positions and payloads aka. nuke spans > -- > > Key: LUCENE-2878 > URL: https://issues.apache.org/jira/browse/LUCENE-2878 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: Positions Branch >Reporter: Simon Willnauer >Assignee: Simon Willnauer > Labels: gsoc2011, gsoc2012, lucene-gsoc-11, lucene-gsoc-12, > mentor > Fix For: Positions Branch > > Attachments: LUCENE-2878-OR.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878_trunk.patch, > LUCENE-2878_trunk.patch, LUCENE-2878-vs-trunk.patch, PosHighlighter.patch, > PosHighlighter.patch > > > Currently we have two somewhat separate types of queries, the one which can > make use of positions (mainly spans) and payloads (spans). Yet Span*Query > doesn't really do scoring comparable to what other queries do and at the end > of the day they are duplicating lot of code all over lucene. Span*Queries are > also limited to other Span*Query instances such that you can not use a > TermQuery or a BooleanQuery with SpanNear or anthing like that. > Beside of the Span*Query limitation other queries lacking a quiet interesting > feature since they can not score based on term proximity since scores doesn't > expose any positional information. All those problems bugged me for a while > now so I stared working on that using the bulkpostings API. I would have done > that first cut on trunk but TermScorer is working on BlockReader that do not > expose positions while the one in this branch does. I started adding a new > Positions class which users can pull from a scorer, to prevent unnecessary > positions enums I added ScorerContext#needsPositions and eventually > Scorere#needsPayloads to create the corresponding enum on demand. Yet, > currently only TermQuery / TermScorer implements this API and other simply > return null instead. > To show that the API really works and our BulkPostings work fine too with > positions I cut over TermSpanQuery to use a TermScorer under the hood and > nuked TermSpans entirely. A nice sideeffect of this was that the Position > BulkReading implementation got some exercise which now :) work all with > positions while Payloads for bulkreading are kind of experimental in the > patch and those only work with Standard codec. > So all spans now work on top of TermScorer ( I truly hate spans since today ) > including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother > to implement the other codecs yet since I want to get feedback on the API and > on this first cut before I go one with it. I will upload the corresponding > patch in a minute. > I also had to cut over SpanQuery.getSpans(IR) to > SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk > first but after that pain today I need a break first :). > The patch passes all core tests > (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't > look into the MemoryIndex BulkPostings API yet) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13486060#comment-13486060 ] Robert Muir commented on LUCENE-2878: - {quote} Instead of PostingFeatures.isProximityFeature, can we just use X.compareTo(PostingsFeatures.POSITIONS) >= 0? (We do this for IndexOptions). {quote} I think this is a confusing part of the current patch. For example: {noformat} // Check if we can return a BooleanScorer - if (!scoreDocsInOrder && topScorer && required.size() == 0) { + if (!scoreDocsInOrder && flags == PostingFeatures.DOCS_AND_FREQS && topScorer && required.size() == 0) { {noformat} I don't think we should be doing these == comparisons. What if someone sends DOCS_ONLY? (which really ConstantScoreQuery i think should pass down to its subs and so on, so they can skip freq blocks, but thats another thing to tackle). > Allow Scorer to expose positions and payloads aka. nuke spans > -- > > Key: LUCENE-2878 > URL: https://issues.apache.org/jira/browse/LUCENE-2878 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: Positions Branch >Reporter: Simon Willnauer >Assignee: Simon Willnauer > Labels: gsoc2011, gsoc2012, lucene-gsoc-11, lucene-gsoc-12, > mentor > Fix For: Positions Branch > > Attachments: LUCENE-2878-OR.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878_trunk.patch, > LUCENE-2878_trunk.patch, LUCENE-2878-vs-trunk.patch, PosHighlighter.patch, > PosHighlighter.patch > > > Currently we have two somewhat separate types of queries, the one which can > make use of positions (mainly spans) and payloads (spans). Yet Span*Query > doesn't really do scoring comparable to what other queries do and at the end > of the day they are duplicating lot of code all over lucene. Span*Queries are > also limited to other Span*Query instances such that you can not use a > TermQuery or a BooleanQuery with SpanNear or anthing like that. > Beside of the Span*Query limitation other queries lacking a quiet interesting > feature since they can not score based on term proximity since scores doesn't > expose any positional information. All those problems bugged me for a while > now so I stared working on that using the bulkpostings API. I would have done > that first cut on trunk but TermScorer is working on BlockReader that do not > expose positions while the one in this branch does. I started adding a new > Positions class which users can pull from a scorer, to prevent unnecessary > positions enums I added ScorerContext#needsPositions and eventually > Scorere#needsPayloads to create the corresponding enum on demand. Yet, > currently only TermQuery / TermScorer implements this API and other simply > return null instead. > To show that the API really works and our BulkPostings work fine too with > positions I cut over TermSpanQuery to use a TermScorer under the hood and > nuked TermSpans entirely. A nice sideeffect of this was that the Position > BulkReading implementation got some exercise which now :) work all with > positions while Payloads for bulkreading are kind of experimental in the > patch and those only work with Standard codec. > So all spans now work on top of TermScorer ( I truly hate spans since today ) > including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother > to implement the other codecs yet since I want to get feedback on the API and > on this first cut before I go one with it. I will upload the corresponding > patch in a minute. > I also had to cut over SpanQuery.getSpans(IR) to > SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk > first but after that pain today I need a break first :). > The patch passes all core tests > (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't > look into the MemoryIndex BulkPostings API yet) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13486013#comment-13486013 ] Simon Willnauer commented on LUCENE-2878: - bq. I think the patch for review is incomplete? e.g. I see PostingsFeatures was added its a inner class of Weight in that patch but we might move it out! > Allow Scorer to expose positions and payloads aka. nuke spans > -- > > Key: LUCENE-2878 > URL: https://issues.apache.org/jira/browse/LUCENE-2878 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: Positions Branch >Reporter: Simon Willnauer >Assignee: Simon Willnauer > Labels: gsoc2011, gsoc2012, lucene-gsoc-11, lucene-gsoc-12, > mentor > Fix For: Positions Branch > > Attachments: LUCENE-2878-OR.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878_trunk.patch, > LUCENE-2878_trunk.patch, LUCENE-2878-vs-trunk.patch, PosHighlighter.patch, > PosHighlighter.patch > > > Currently we have two somewhat separate types of queries, the one which can > make use of positions (mainly spans) and payloads (spans). Yet Span*Query > doesn't really do scoring comparable to what other queries do and at the end > of the day they are duplicating lot of code all over lucene. Span*Queries are > also limited to other Span*Query instances such that you can not use a > TermQuery or a BooleanQuery with SpanNear or anthing like that. > Beside of the Span*Query limitation other queries lacking a quiet interesting > feature since they can not score based on term proximity since scores doesn't > expose any positional information. All those problems bugged me for a while > now so I stared working on that using the bulkpostings API. I would have done > that first cut on trunk but TermScorer is working on BlockReader that do not > expose positions while the one in this branch does. I started adding a new > Positions class which users can pull from a scorer, to prevent unnecessary > positions enums I added ScorerContext#needsPositions and eventually > Scorere#needsPayloads to create the corresponding enum on demand. Yet, > currently only TermQuery / TermScorer implements this API and other simply > return null instead. > To show that the API really works and our BulkPostings work fine too with > positions I cut over TermSpanQuery to use a TermScorer under the hood and > nuked TermSpans entirely. A nice sideeffect of this was that the Position > BulkReading implementation got some exercise which now :) work all with > positions while Payloads for bulkreading are kind of experimental in the > patch and those only work with Standard codec. > So all spans now work on top of TermScorer ( I truly hate spans since today ) > including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother > to implement the other codecs yet since I want to get feedback on the API and > on this first cut before I go one with it. I will upload the corresponding > patch in a minute. > I also had to cut over SpanQuery.getSpans(IR) to > SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk > first but after that pain today I need a break first :). > The patch passes all core tests > (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't > look into the MemoryIndex BulkPostings API yet) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13485994#comment-13485994 ] Robert Muir commented on LUCENE-2878: - I think the patch for review is incomplete? e.g. I see PostingsFeatures was added to the Scorer api but its not in the patch. > Allow Scorer to expose positions and payloads aka. nuke spans > -- > > Key: LUCENE-2878 > URL: https://issues.apache.org/jira/browse/LUCENE-2878 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: Positions Branch >Reporter: Simon Willnauer >Assignee: Simon Willnauer > Labels: gsoc2011, gsoc2012, lucene-gsoc-11, lucene-gsoc-12, > mentor > Fix For: Positions Branch > > Attachments: LUCENE-2878-OR.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878_trunk.patch, > LUCENE-2878_trunk.patch, LUCENE-2878-vs-trunk.patch, PosHighlighter.patch, > PosHighlighter.patch > > > Currently we have two somewhat separate types of queries, the one which can > make use of positions (mainly spans) and payloads (spans). Yet Span*Query > doesn't really do scoring comparable to what other queries do and at the end > of the day they are duplicating lot of code all over lucene. Span*Queries are > also limited to other Span*Query instances such that you can not use a > TermQuery or a BooleanQuery with SpanNear or anthing like that. > Beside of the Span*Query limitation other queries lacking a quiet interesting > feature since they can not score based on term proximity since scores doesn't > expose any positional information. All those problems bugged me for a while > now so I stared working on that using the bulkpostings API. I would have done > that first cut on trunk but TermScorer is working on BlockReader that do not > expose positions while the one in this branch does. I started adding a new > Positions class which users can pull from a scorer, to prevent unnecessary > positions enums I added ScorerContext#needsPositions and eventually > Scorere#needsPayloads to create the corresponding enum on demand. Yet, > currently only TermQuery / TermScorer implements this API and other simply > return null instead. > To show that the API really works and our BulkPostings work fine too with > positions I cut over TermSpanQuery to use a TermScorer under the hood and > nuked TermSpans entirely. A nice sideeffect of this was that the Position > BulkReading implementation got some exercise which now :) work all with > positions while Payloads for bulkreading are kind of experimental in the > patch and those only work with Standard codec. > So all spans now work on top of TermScorer ( I truly hate spans since today ) > including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother > to implement the other codecs yet since I want to get feedback on the API and > on this first cut before I go one with it. I will upload the corresponding > patch in a minute. > I also had to cut over SpanQuery.getSpans(IR) to > SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk > first but after that pain today I need a break first :). > The patch passes all core tests > (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't > look into the MemoryIndex BulkPostings API yet) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13485952#comment-13485952 ] Michael McCandless commented on LUCENE-2878: bq. a better name for "collect"? Yeah, to somehow reflect that it's visiting/collecting/recursing on the full interval tree ... but nothing comes to mind ... When I first saw it / read the docs I thought this was analogous to Scorer.score(Collector), ie that it would "bulk collect" all intervals from the iterator. > Allow Scorer to expose positions and payloads aka. nuke spans > -- > > Key: LUCENE-2878 > URL: https://issues.apache.org/jira/browse/LUCENE-2878 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: Positions Branch >Reporter: Simon Willnauer >Assignee: Simon Willnauer > Labels: gsoc2011, gsoc2012, lucene-gsoc-11, lucene-gsoc-12, > mentor > Fix For: Positions Branch > > Attachments: LUCENE-2878-OR.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878_trunk.patch, > LUCENE-2878_trunk.patch, LUCENE-2878-vs-trunk.patch, PosHighlighter.patch, > PosHighlighter.patch > > > Currently we have two somewhat separate types of queries, the one which can > make use of positions (mainly spans) and payloads (spans). Yet Span*Query > doesn't really do scoring comparable to what other queries do and at the end > of the day they are duplicating lot of code all over lucene. Span*Queries are > also limited to other Span*Query instances such that you can not use a > TermQuery or a BooleanQuery with SpanNear or anthing like that. > Beside of the Span*Query limitation other queries lacking a quiet interesting > feature since they can not score based on term proximity since scores doesn't > expose any positional information. All those problems bugged me for a while > now so I stared working on that using the bulkpostings API. I would have done > that first cut on trunk but TermScorer is working on BlockReader that do not > expose positions while the one in this branch does. I started adding a new > Positions class which users can pull from a scorer, to prevent unnecessary > positions enums I added ScorerContext#needsPositions and eventually > Scorere#needsPayloads to create the corresponding enum on demand. Yet, > currently only TermQuery / TermScorer implements this API and other simply > return null instead. > To show that the API really works and our BulkPostings work fine too with > positions I cut over TermSpanQuery to use a TermScorer under the hood and > nuked TermSpans entirely. A nice sideeffect of this was that the Position > BulkReading implementation got some exercise which now :) work all with > positions while Payloads for bulkreading are kind of experimental in the > patch and those only work with Standard codec. > So all spans now work on top of TermScorer ( I truly hate spans since today ) > including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother > to implement the other codecs yet since I want to get feedback on the API and > on this first cut before I go one with it. I will upload the corresponding > patch in a minute. > I also had to cut over SpanQuery.getSpans(IR) to > SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk > first but after that pain today I need a break first :). > The patch passes all core tests > (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't > look into the MemoryIndex BulkPostings API yet) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13485899#comment-13485899 ] Simon Willnauer commented on LUCENE-2878: - thanks mike for the commits! much apprecitated! {quote} OK so it sounds like I pull one IntervalIterator up front (and use it for the whole time), and it's my job to call .scorerAdvanced(docID) every time I either .nextDoc or .advance the original Scorer? Ie this "resets" my IntervalIterator onto the current doc's intervals. {quote} exactly! {quote} Ahhh so it visits all intervals in the query tree leading up to the current match interval (of the top query) that you've iterated to? OK. Maybe we can find a better name ... can't think of one now {quote} a better name for "collect"? > Allow Scorer to expose positions and payloads aka. nuke spans > -- > > Key: LUCENE-2878 > URL: https://issues.apache.org/jira/browse/LUCENE-2878 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: Positions Branch >Reporter: Simon Willnauer >Assignee: Simon Willnauer > Labels: gsoc2011, gsoc2012, lucene-gsoc-11, lucene-gsoc-12, > mentor > Fix For: Positions Branch > > Attachments: LUCENE-2878-OR.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878_trunk.patch, > LUCENE-2878_trunk.patch, LUCENE-2878-vs-trunk.patch, PosHighlighter.patch, > PosHighlighter.patch > > > Currently we have two somewhat separate types of queries, the one which can > make use of positions (mainly spans) and payloads (spans). Yet Span*Query > doesn't really do scoring comparable to what other queries do and at the end > of the day they are duplicating lot of code all over lucene. Span*Queries are > also limited to other Span*Query instances such that you can not use a > TermQuery or a BooleanQuery with SpanNear or anthing like that. > Beside of the Span*Query limitation other queries lacking a quiet interesting > feature since they can not score based on term proximity since scores doesn't > expose any positional information. All those problems bugged me for a while > now so I stared working on that using the bulkpostings API. I would have done > that first cut on trunk but TermScorer is working on BlockReader that do not > expose positions while the one in this branch does. I started adding a new > Positions class which users can pull from a scorer, to prevent unnecessary > positions enums I added ScorerContext#needsPositions and eventually > Scorere#needsPayloads to create the corresponding enum on demand. Yet, > currently only TermQuery / TermScorer implements this API and other simply > return null instead. > To show that the API really works and our BulkPostings work fine too with > positions I cut over TermSpanQuery to use a TermScorer under the hood and > nuked TermSpans entirely. A nice sideeffect of this was that the Position > BulkReading implementation got some exercise which now :) work all with > positions while Payloads for bulkreading are kind of experimental in the > patch and those only work with Standard codec. > So all spans now work on top of TermScorer ( I truly hate spans since today ) > including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother > to implement the other codecs yet since I want to get feedback on the API and > on this first cut before I go one with it. I will upload the corresponding > patch in a minute. > I also had to cut over SpanQuery.getSpans(IR) to > SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk > first but after that pain today I need a break first :). > The patch passes all core tests > (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't > look into the MemoryIndex BulkPostings API yet) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13485723#comment-13485723 ] Michael McCandless commented on LUCENE-2878: {quote} bq. I'm confused on how one uses IntervalIterator along with the Scorer it "belongs" to. Say I want to visit all Intervals for a given TermQuery ... do I first get the TermScorer and then call .intervals, up front? And then call TermScorer.nextDoc(), but then how to iterate over all intervals for that one document? EG, is the caller supposed to call IntervalIterator.scorerAdvanced for each next'd doc? so my major goal here was to make this totally detached, optional and lazy ie no additional code in scorer except of IntervalIterator creation on demand. once you have a scorer you can call intervals() and get an iterator. This instance can and should be reused while docs are collected / scored / matched on a given reader. For each doc I need to iterate over intervals I call scorerAdvanced and update the internal structures this prevents any additional work if it is not really needed ie. on a complex query/scorer tree. Once the iterator is setup (scorerAdvanced is called) you can just call next() on it in a loop --> while ((interval = iter.next) != null) and get all the intervals. makes sense? {quote} OK so it sounds like I pull one IntervalIterator up front (and use it for the whole time), and it's my job to call .scorerAdvanced(docID) every time I either .nextDoc or .advance the original Scorer? Ie this "resets" my IntervalIterator onto the current doc's intervals. bq. I think the scorer#interval method javadoc make this clear, no? if not I should make it clear! I was still confused :) I'll take a stab at improving it ... also we should add @experimental... {quote} bq. I'm also confused why TermIntervalIterator.collect only collects one interval (I think?). Shouldn't it collect all intervals for the current doc? the collect method is special. It's an interface that allows to collect the "current" interval or all "current" intervals that contributed to a higher level interval. For each next call you should call collect if you need all the subtrees intervals or the leaves. one usecase where we do this right now is highlighing. you can highlight based on phrases ie. if you collect on a BQ or you can do individual terms ie. collect leaves. makes sense? {quote} Ahhh so it visits all intervals in the query tree leading up to the current match interval (of the top query) that you've iterated to? OK. Maybe we can find a better name ... can't think of one now :) > Allow Scorer to expose positions and payloads aka. nuke spans > -- > > Key: LUCENE-2878 > URL: https://issues.apache.org/jira/browse/LUCENE-2878 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: Positions Branch >Reporter: Simon Willnauer >Assignee: Simon Willnauer > Labels: gsoc2011, gsoc2012, lucene-gsoc-11, lucene-gsoc-12, > mentor > Fix For: Positions Branch > > Attachments: LUCENE-2878-OR.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878_trunk.patch, > LUCENE-2878_trunk.patch, LUCENE-2878-vs-trunk.patch, PosHighlighter.patch, > PosHighlighter.patch > > > Currently we have two somewhat separate types of queries, the one which can > make use of positions (mainly spans) and payloads (spans). Yet Span*Query > doesn't really do scoring comparable to what other queries do and at the end > of the day they are duplicating lot of code all over lucene. Span*Queries are > also limited to other Span*Query instances such that you can not use a > TermQuery or a BooleanQuery with SpanNear or anthing like that. > Beside of the Span*Query limitation other queries lacking a quiet interesting > feature since they can not score based on term proximity since scores doesn't > expose any positional information. All those problems bugged me for a while > now so I stared working on that using the bulkpostings API. I would have done > that first cut on trunk but TermScorer is working on BlockReader that do not > expose positions while the one in this branch does. I started adding a new > Positions class which users can pull from a scorer, to prevent unnecessary > positions enums I added ScorerCont
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13485685#comment-13485685 ] Simon Willnauer commented on LUCENE-2878: - thanks mike for taking a look at this. It still has it's edges so every review is very valuable. {quote}Instead of PostingFeatures.isProximityFeature, can we just use X.compareTo(PostingsFeatures.POSITIONS) >= 0? (We do this for IndexOptions).{quote} sure, I was actually thinking about this for a while though. After a day of playing with different ways of doing it I really asked myself why we have 2 different docs enums and why not just one and one set of features / flags this would make a lot of things easier. Different discussion / progress over perfection.. {quote}Should we move PostingFeatures to its own source instead of hiding it in Weight.java?{quote} alone the same lines, sure lets move it out. {quote}Can we put back single imports (not wildcard, eg "import org.apache.lucene.index.*")?{quote} yeah I saw that when I created the diff I will go over it and bring it back. {quote}PostingFeatures is very similar to FieldInfo.IndexOptions (except the latter does not cover payloads) ... would be nice if we could somehow combine them ...{quote} I agree it would be nice to unify all of this. Lets open another issue - we have a good set of usecases now. {quote} I'm confused on how one uses IntervalIterator along with the Scorer it "belongs" to. Say I want to visit all Intervals for a given TermQuery ... do I first get the TermScorer and then call .intervals, up front? And then call TermScorer.nextDoc(), but then how to iterate over all intervals for that one document? EG, is the caller supposed to call IntervalIterator.scorerAdvanced for each next'd doc?{quote} so my major goal here was to make this totally detached, optional and lazy ie no additional code in scorer except of IntervalIterator creation on demand. once you have a scorer you can call intervals() and get an iterator. This instance can and should be reused while docs are collected / scored / matched on a given reader. For each doc I need to iterate over intervals I call scorerAdvanced and update the internal structures this prevents any additional work if it is not really needed ie. on a complex query/scorer tree. Once the iterator is setup (scorerAdvanced is called) you can just call next() on it in a loop --> while ((interval = iter.next) != null) and get all the intervals. makes sense? {quote}Or ... am I supposed to call .intervals() after each .nextDoc() (but that looks rather costly/wasteful since it's a newly alloc'd TermIntervalIterator each time).{quote} no that is not what you should do. I think the scorer#interval method javadoc make this clear, no? if not I should make it clear! {quote} I'm also confused why TermIntervalIterator.collect only collects one interval (I think?). Shouldn't it collect all intervals for the current doc?{quote} the collect method is special. It's an interface that allows to collect the "current" interval or all "current" intervals that contributed to a higher level interval. For each next call you should call collect if you need all the subtrees intervals or the leaves. one usecase where we do this right now is highlighing. you can highlight based on phrases ie. if you collect on a BQ or you can do individual terms ie. collect leaves. makes sense? > Allow Scorer to expose positions and payloads aka. nuke spans > -- > > Key: LUCENE-2878 > URL: https://issues.apache.org/jira/browse/LUCENE-2878 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: Positions Branch >Reporter: Simon Willnauer >Assignee: Simon Willnauer > Labels: gsoc2011, gsoc2012, lucene-gsoc-11, lucene-gsoc-12, > mentor > Fix For: Positions Branch > > Attachments: LUCENE-2878-OR.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, > LUCENE-2878-vs-trunk.patch, PosHighlighter.patch, PosHighlighter.patch > > > Currently we have two somewhat separate types of queries, the one which can > make use of positions (mainly spans) and payloads (spans). Yet Span*Query > doesn't really do scoring comparable to what other queries do and at the end > of the day they are duplicating lot of code a
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13485640#comment-13485640 ] Michael McCandless commented on LUCENE-2878: I'm confused on how one uses IntervalIterator along with the Scorer it "belongs" to. Say I want to visit all Intervals for a given TermQuery ... do I first get the TermScorer and then call .intervals, up front? And then call TermScorer.nextDoc(), but then how to iterate over all intervals for that one document? EG, is the caller supposed to call IntervalIterator.scorerAdvanced for each next'd doc? Or ... am I supposed to call .intervals() after each .nextDoc() (but that looks rather costly/wasteful since it's a newly alloc'd TermIntervalIterator each time). I'm also confused why TermIntervalIterator.collect only collects one interval (I think?). Shouldn't it collect all intervals for the current doc? > Allow Scorer to expose positions and payloads aka. nuke spans > -- > > Key: LUCENE-2878 > URL: https://issues.apache.org/jira/browse/LUCENE-2878 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: Positions Branch >Reporter: Simon Willnauer >Assignee: Simon Willnauer > Labels: gsoc2011, gsoc2012, lucene-gsoc-11, lucene-gsoc-12, > mentor > Fix For: Positions Branch > > Attachments: LUCENE-2878-OR.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, > LUCENE-2878-vs-trunk.patch, PosHighlighter.patch, PosHighlighter.patch > > > Currently we have two somewhat separate types of queries, the one which can > make use of positions (mainly spans) and payloads (spans). Yet Span*Query > doesn't really do scoring comparable to what other queries do and at the end > of the day they are duplicating lot of code all over lucene. Span*Queries are > also limited to other Span*Query instances such that you can not use a > TermQuery or a BooleanQuery with SpanNear or anthing like that. > Beside of the Span*Query limitation other queries lacking a quiet interesting > feature since they can not score based on term proximity since scores doesn't > expose any positional information. All those problems bugged me for a while > now so I stared working on that using the bulkpostings API. I would have done > that first cut on trunk but TermScorer is working on BlockReader that do not > expose positions while the one in this branch does. I started adding a new > Positions class which users can pull from a scorer, to prevent unnecessary > positions enums I added ScorerContext#needsPositions and eventually > Scorere#needsPayloads to create the corresponding enum on demand. Yet, > currently only TermQuery / TermScorer implements this API and other simply > return null instead. > To show that the API really works and our BulkPostings work fine too with > positions I cut over TermSpanQuery to use a TermScorer under the hood and > nuked TermSpans entirely. A nice sideeffect of this was that the Position > BulkReading implementation got some exercise which now :) work all with > positions while Payloads for bulkreading are kind of experimental in the > patch and those only work with Standard codec. > So all spans now work on top of TermScorer ( I truly hate spans since today ) > including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother > to implement the other codecs yet since I want to get feedback on the API and > on this first cut before I go one with it. I will upload the corresponding > patch in a minute. > I also had to cut over SpanQuery.getSpans(IR) to > SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk > first but after that pain today I need a break first :). > The patch passes all core tests > (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't > look into the MemoryIndex BulkPostings API yet) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13485637#comment-13485637 ] Michael McCandless commented on LUCENE-2878: I'm still trying to catch up here (net/net this looks awesome!), but here's some minor stuff I noticed: Instead of PostingFeatures.isProximityFeature, can we just use X.compareTo(PostingsFeatures.POSITIONS) >= 0? (We do this for IndexOptions). Should we move PostingFeatures to its own source instead of hiding it in Weight.java? Can we put back single imports (not wildcard, eg "import org.apache.lucene.index.*")? PostingFeatures is very similar to FieldInfo.IndexOptions (except the latter does not cover payloads) ... would be nice if we could somehow combine them ... > Allow Scorer to expose positions and payloads aka. nuke spans > -- > > Key: LUCENE-2878 > URL: https://issues.apache.org/jira/browse/LUCENE-2878 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: Positions Branch >Reporter: Simon Willnauer >Assignee: Simon Willnauer > Labels: gsoc2011, gsoc2012, lucene-gsoc-11, lucene-gsoc-12, > mentor > Fix For: Positions Branch > > Attachments: LUCENE-2878-OR.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, > LUCENE-2878-vs-trunk.patch, PosHighlighter.patch, PosHighlighter.patch > > > Currently we have two somewhat separate types of queries, the one which can > make use of positions (mainly spans) and payloads (spans). Yet Span*Query > doesn't really do scoring comparable to what other queries do and at the end > of the day they are duplicating lot of code all over lucene. Span*Queries are > also limited to other Span*Query instances such that you can not use a > TermQuery or a BooleanQuery with SpanNear or anthing like that. > Beside of the Span*Query limitation other queries lacking a quiet interesting > feature since they can not score based on term proximity since scores doesn't > expose any positional information. All those problems bugged me for a while > now so I stared working on that using the bulkpostings API. I would have done > that first cut on trunk but TermScorer is working on BlockReader that do not > expose positions while the one in this branch does. I started adding a new > Positions class which users can pull from a scorer, to prevent unnecessary > positions enums I added ScorerContext#needsPositions and eventually > Scorere#needsPayloads to create the corresponding enum on demand. Yet, > currently only TermQuery / TermScorer implements this API and other simply > return null instead. > To show that the API really works and our BulkPostings work fine too with > positions I cut over TermSpanQuery to use a TermScorer under the hood and > nuked TermSpans entirely. A nice sideeffect of this was that the Position > BulkReading implementation got some exercise which now :) work all with > positions while Payloads for bulkreading are kind of experimental in the > patch and those only work with Standard codec. > So all spans now work on top of TermScorer ( I truly hate spans since today ) > including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother > to implement the other codecs yet since I want to get feedback on the API and > on this first cut before I go one with it. I will upload the corresponding > patch in a minute. > I also had to cut over SpanQuery.getSpans(IR) to > SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk > first but after that pain today I need a break first :). > The patch passes all core tests > (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't > look into the MemoryIndex BulkPostings API yet) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13485594#comment-13485594 ] Simon Willnauer commented on LUCENE-2878: - I renamed the package and fixed the package html. > Allow Scorer to expose positions and payloads aka. nuke spans > -- > > Key: LUCENE-2878 > URL: https://issues.apache.org/jira/browse/LUCENE-2878 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: Positions Branch >Reporter: Simon Willnauer >Assignee: Simon Willnauer > Labels: gsoc2011, gsoc2012, lucene-gsoc-11, lucene-gsoc-12, > mentor > Fix For: Positions Branch > > Attachments: LUCENE-2878-OR.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, > PosHighlighter.patch, PosHighlighter.patch > > > Currently we have two somewhat separate types of queries, the one which can > make use of positions (mainly spans) and payloads (spans). Yet Span*Query > doesn't really do scoring comparable to what other queries do and at the end > of the day they are duplicating lot of code all over lucene. Span*Queries are > also limited to other Span*Query instances such that you can not use a > TermQuery or a BooleanQuery with SpanNear or anthing like that. > Beside of the Span*Query limitation other queries lacking a quiet interesting > feature since they can not score based on term proximity since scores doesn't > expose any positional information. All those problems bugged me for a while > now so I stared working on that using the bulkpostings API. I would have done > that first cut on trunk but TermScorer is working on BlockReader that do not > expose positions while the one in this branch does. I started adding a new > Positions class which users can pull from a scorer, to prevent unnecessary > positions enums I added ScorerContext#needsPositions and eventually > Scorere#needsPayloads to create the corresponding enum on demand. Yet, > currently only TermQuery / TermScorer implements this API and other simply > return null instead. > To show that the API really works and our BulkPostings work fine too with > positions I cut over TermSpanQuery to use a TermScorer under the hood and > nuked TermSpans entirely. A nice sideeffect of this was that the Position > BulkReading implementation got some exercise which now :) work all with > positions while Payloads for bulkreading are kind of experimental in the > patch and those only work with Standard codec. > So all spans now work on top of TermScorer ( I truly hate spans since today ) > including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother > to implement the other codecs yet since I want to get feedback on the API and > on this first cut before I go one with it. I will upload the corresponding > patch in a minute. > I also had to cut over SpanQuery.getSpans(IR) to > SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk > first but after that pain today I need a break first :). > The patch passes all core tests > (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't > look into the MemoryIndex BulkPostings API yet) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13485585#comment-13485585 ] Alan Woodward commented on LUCENE-2878: --- > I'd rename the package o.a.l.s.positions to o.a.l.s.intervals what do you > think? +1 > Allow Scorer to expose positions and payloads aka. nuke spans > -- > > Key: LUCENE-2878 > URL: https://issues.apache.org/jira/browse/LUCENE-2878 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: Positions Branch >Reporter: Simon Willnauer >Assignee: Simon Willnauer > Labels: gsoc2011, gsoc2012, lucene-gsoc-11, lucene-gsoc-12, > mentor > Fix For: Positions Branch > > Attachments: LUCENE-2878-OR.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, > PosHighlighter.patch, PosHighlighter.patch > > > Currently we have two somewhat separate types of queries, the one which can > make use of positions (mainly spans) and payloads (spans). Yet Span*Query > doesn't really do scoring comparable to what other queries do and at the end > of the day they are duplicating lot of code all over lucene. Span*Queries are > also limited to other Span*Query instances such that you can not use a > TermQuery or a BooleanQuery with SpanNear or anthing like that. > Beside of the Span*Query limitation other queries lacking a quiet interesting > feature since they can not score based on term proximity since scores doesn't > expose any positional information. All those problems bugged me for a while > now so I stared working on that using the bulkpostings API. I would have done > that first cut on trunk but TermScorer is working on BlockReader that do not > expose positions while the one in this branch does. I started adding a new > Positions class which users can pull from a scorer, to prevent unnecessary > positions enums I added ScorerContext#needsPositions and eventually > Scorere#needsPayloads to create the corresponding enum on demand. Yet, > currently only TermQuery / TermScorer implements this API and other simply > return null instead. > To show that the API really works and our BulkPostings work fine too with > positions I cut over TermSpanQuery to use a TermScorer under the hood and > nuked TermSpans entirely. A nice sideeffect of this was that the Position > BulkReading implementation got some exercise which now :) work all with > positions while Payloads for bulkreading are kind of experimental in the > patch and those only work with Standard codec. > So all spans now work on top of TermScorer ( I truly hate spans since today ) > including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother > to implement the other codecs yet since I want to get feedback on the API and > on this first cut before I go one with it. I will upload the corresponding > patch in a minute. > I also had to cut over SpanQuery.getSpans(IR) to > SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk > first but after that pain today I need a break first :). > The patch passes all core tests > (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't > look into the MemoryIndex BulkPostings API yet) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13485584#comment-13485584 ] Simon Willnauer commented on LUCENE-2878: - bq. I committed a few more javadoc fixes. Ant precommit passes when run from the top level. Let's get this in trunk! good stuff! lets give other folks the chance to jump on it / comment on what we have and then move forward! BTW. I'd rename the package o.a.l.s.positions to o.a.l.s.intervals what do you think? > Allow Scorer to expose positions and payloads aka. nuke spans > -- > > Key: LUCENE-2878 > URL: https://issues.apache.org/jira/browse/LUCENE-2878 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: Positions Branch >Reporter: Simon Willnauer >Assignee: Simon Willnauer > Labels: gsoc2011, gsoc2012, lucene-gsoc-11, lucene-gsoc-12, > mentor > Fix For: Positions Branch > > Attachments: LUCENE-2878-OR.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, > PosHighlighter.patch, PosHighlighter.patch > > > Currently we have two somewhat separate types of queries, the one which can > make use of positions (mainly spans) and payloads (spans). Yet Span*Query > doesn't really do scoring comparable to what other queries do and at the end > of the day they are duplicating lot of code all over lucene. Span*Queries are > also limited to other Span*Query instances such that you can not use a > TermQuery or a BooleanQuery with SpanNear or anthing like that. > Beside of the Span*Query limitation other queries lacking a quiet interesting > feature since they can not score based on term proximity since scores doesn't > expose any positional information. All those problems bugged me for a while > now so I stared working on that using the bulkpostings API. I would have done > that first cut on trunk but TermScorer is working on BlockReader that do not > expose positions while the one in this branch does. I started adding a new > Positions class which users can pull from a scorer, to prevent unnecessary > positions enums I added ScorerContext#needsPositions and eventually > Scorere#needsPayloads to create the corresponding enum on demand. Yet, > currently only TermQuery / TermScorer implements this API and other simply > return null instead. > To show that the API really works and our BulkPostings work fine too with > positions I cut over TermSpanQuery to use a TermScorer under the hood and > nuked TermSpans entirely. A nice sideeffect of this was that the Position > BulkReading implementation got some exercise which now :) work all with > positions while Payloads for bulkreading are kind of experimental in the > patch and those only work with Standard codec. > So all spans now work on top of TermScorer ( I truly hate spans since today ) > including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother > to implement the other codecs yet since I want to get feedback on the API and > on this first cut before I go one with it. I will upload the corresponding > patch in a minute. > I also had to cut over SpanQuery.getSpans(IR) to > SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk > first but after that pain today I need a break first :). > The patch passes all core tests > (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't > look into the MemoryIndex BulkPostings API yet) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13485582#comment-13485582 ] Alan Woodward commented on LUCENE-2878: --- I committed a few more javadoc fixes. Ant precommit passes when run from the top level. Let's get this in trunk! > Allow Scorer to expose positions and payloads aka. nuke spans > -- > > Key: LUCENE-2878 > URL: https://issues.apache.org/jira/browse/LUCENE-2878 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: Positions Branch >Reporter: Simon Willnauer >Assignee: Simon Willnauer > Labels: gsoc2011, gsoc2012, lucene-gsoc-11, lucene-gsoc-12, > mentor > Fix For: Positions Branch > > Attachments: LUCENE-2878-OR.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, > PosHighlighter.patch, PosHighlighter.patch > > > Currently we have two somewhat separate types of queries, the one which can > make use of positions (mainly spans) and payloads (spans). Yet Span*Query > doesn't really do scoring comparable to what other queries do and at the end > of the day they are duplicating lot of code all over lucene. Span*Queries are > also limited to other Span*Query instances such that you can not use a > TermQuery or a BooleanQuery with SpanNear or anthing like that. > Beside of the Span*Query limitation other queries lacking a quiet interesting > feature since they can not score based on term proximity since scores doesn't > expose any positional information. All those problems bugged me for a while > now so I stared working on that using the bulkpostings API. I would have done > that first cut on trunk but TermScorer is working on BlockReader that do not > expose positions while the one in this branch does. I started adding a new > Positions class which users can pull from a scorer, to prevent unnecessary > positions enums I added ScorerContext#needsPositions and eventually > Scorere#needsPayloads to create the corresponding enum on demand. Yet, > currently only TermQuery / TermScorer implements this API and other simply > return null instead. > To show that the API really works and our BulkPostings work fine too with > positions I cut over TermSpanQuery to use a TermScorer under the hood and > nuked TermSpans entirely. A nice sideeffect of this was that the Position > BulkReading implementation got some exercise which now :) work all with > positions while Payloads for bulkreading are kind of experimental in the > patch and those only work with Standard codec. > So all spans now work on top of TermScorer ( I truly hate spans since today ) > including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother > to implement the other codecs yet since I want to get feedback on the API and > on this first cut before I go one with it. I will upload the corresponding > patch in a minute. > I also had to cut over SpanQuery.getSpans(IR) to > SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk > first but after that pain today I need a break first :). > The patch passes all core tests > (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't > look into the MemoryIndex BulkPostings API yet) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13485507#comment-13485507 ] Simon Willnauer commented on LUCENE-2878: - alan, I just committed some more javadocs including more content for the package.html (review would be appreciated) I also fixed all nocommits in the latest commit. The nocommit in SloppyPhraseScorer I removed last week or so while I cheated here a bit. I current throw an UnsupportedOE if there are multiple terms per position right now since I think its not crucial for us to have this for now. I really want it since its the entire point of this feature but moving towards trunk is really what we want so other people get into it too. Being on trunk is very helpful. Regarding tests - I agree we should have more tests especially with bigger documents. I might add a couple of random tests next week but feel free to jump on it. Next step would also be to run ant precommit on top level and see where it barfs. Other than that I really need other committers to look over the API but if nobody does we just gonna put up a patch and tell them we gonna reintegrate in X days :) simon > Allow Scorer to expose positions and payloads aka. nuke spans > -- > > Key: LUCENE-2878 > URL: https://issues.apache.org/jira/browse/LUCENE-2878 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: Positions Branch >Reporter: Simon Willnauer >Assignee: Simon Willnauer > Labels: gsoc2011, gsoc2012, lucene-gsoc-11, lucene-gsoc-12, > mentor > Fix For: Positions Branch > > Attachments: LUCENE-2878-OR.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, > PosHighlighter.patch, PosHighlighter.patch > > > Currently we have two somewhat separate types of queries, the one which can > make use of positions (mainly spans) and payloads (spans). Yet Span*Query > doesn't really do scoring comparable to what other queries do and at the end > of the day they are duplicating lot of code all over lucene. Span*Queries are > also limited to other Span*Query instances such that you can not use a > TermQuery or a BooleanQuery with SpanNear or anthing like that. > Beside of the Span*Query limitation other queries lacking a quiet interesting > feature since they can not score based on term proximity since scores doesn't > expose any positional information. All those problems bugged me for a while > now so I stared working on that using the bulkpostings API. I would have done > that first cut on trunk but TermScorer is working on BlockReader that do not > expose positions while the one in this branch does. I started adding a new > Positions class which users can pull from a scorer, to prevent unnecessary > positions enums I added ScorerContext#needsPositions and eventually > Scorere#needsPayloads to create the corresponding enum on demand. Yet, > currently only TermQuery / TermScorer implements this API and other simply > return null instead. > To show that the API really works and our BulkPostings work fine too with > positions I cut over TermSpanQuery to use a TermScorer under the hood and > nuked TermSpans entirely. A nice sideeffect of this was that the Position > BulkReading implementation got some exercise which now :) work all with > positions while Payloads for bulkreading are kind of experimental in the > patch and those only work with Standard codec. > So all spans now work on top of TermScorer ( I truly hate spans since today ) > including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother > to implement the other codecs yet since I want to get feedback on the API and > on this first cut before I go one with it. I will upload the corresponding > patch in a minute. > I also had to cut over SpanQuery.getSpans(IR) to > SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk > first but after that pain today I need a break first :). > The patch passes all core tests > (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't > look into the MemoryIndex BulkPostings API yet) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on J
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13485183#comment-13485183 ] Alan Woodward commented on LUCENE-2878: --- I've committed a whole bunch more javadocs, and a package.html. There's still a big nocommit in SloppyPhraseScorer, but other than that we're looking good. We could probably do with more test coverage, but then that's never not the case, so... > Allow Scorer to expose positions and payloads aka. nuke spans > -- > > Key: LUCENE-2878 > URL: https://issues.apache.org/jira/browse/LUCENE-2878 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: Positions Branch >Reporter: Simon Willnauer >Assignee: Simon Willnauer > Labels: gsoc2011, gsoc2012, lucene-gsoc-11, lucene-gsoc-12, > mentor > Fix For: Positions Branch > > Attachments: LUCENE-2878-OR.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, > PosHighlighter.patch, PosHighlighter.patch > > > Currently we have two somewhat separate types of queries, the one which can > make use of positions (mainly spans) and payloads (spans). Yet Span*Query > doesn't really do scoring comparable to what other queries do and at the end > of the day they are duplicating lot of code all over lucene. Span*Queries are > also limited to other Span*Query instances such that you can not use a > TermQuery or a BooleanQuery with SpanNear or anthing like that. > Beside of the Span*Query limitation other queries lacking a quiet interesting > feature since they can not score based on term proximity since scores doesn't > expose any positional information. All those problems bugged me for a while > now so I stared working on that using the bulkpostings API. I would have done > that first cut on trunk but TermScorer is working on BlockReader that do not > expose positions while the one in this branch does. I started adding a new > Positions class which users can pull from a scorer, to prevent unnecessary > positions enums I added ScorerContext#needsPositions and eventually > Scorere#needsPayloads to create the corresponding enum on demand. Yet, > currently only TermQuery / TermScorer implements this API and other simply > return null instead. > To show that the API really works and our BulkPostings work fine too with > positions I cut over TermSpanQuery to use a TermScorer under the hood and > nuked TermSpans entirely. A nice sideeffect of this was that the Position > BulkReading implementation got some exercise which now :) work all with > positions while Payloads for bulkreading are kind of experimental in the > patch and those only work with Standard codec. > So all spans now work on top of TermScorer ( I truly hate spans since today ) > including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother > to implement the other codecs yet since I want to get feedback on the API and > on this first cut before I go one with it. I will upload the corresponding > patch in a minute. > I also had to cut over SpanQuery.getSpans(IR) to > SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk > first but after that pain today I need a break first :). > The patch passes all core tests > (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't > look into the MemoryIndex BulkPostings API yet) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13485017#comment-13485017 ] Simon Willnauer commented on LUCENE-2878: - alan FYI - I committed some refactorings (renamed Scorer#positions to Scorere#intervals) etc. so you should update. I also committed your lattest patch > Allow Scorer to expose positions and payloads aka. nuke spans > -- > > Key: LUCENE-2878 > URL: https://issues.apache.org/jira/browse/LUCENE-2878 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: Positions Branch >Reporter: Simon Willnauer >Assignee: Simon Willnauer > Labels: gsoc2011, gsoc2012, lucene-gsoc-11, lucene-gsoc-12, > mentor > Fix For: Positions Branch > > Attachments: LUCENE-2878-OR.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, > PosHighlighter.patch, PosHighlighter.patch > > > Currently we have two somewhat separate types of queries, the one which can > make use of positions (mainly spans) and payloads (spans). Yet Span*Query > doesn't really do scoring comparable to what other queries do and at the end > of the day they are duplicating lot of code all over lucene. Span*Queries are > also limited to other Span*Query instances such that you can not use a > TermQuery or a BooleanQuery with SpanNear or anthing like that. > Beside of the Span*Query limitation other queries lacking a quiet interesting > feature since they can not score based on term proximity since scores doesn't > expose any positional information. All those problems bugged me for a while > now so I stared working on that using the bulkpostings API. I would have done > that first cut on trunk but TermScorer is working on BlockReader that do not > expose positions while the one in this branch does. I started adding a new > Positions class which users can pull from a scorer, to prevent unnecessary > positions enums I added ScorerContext#needsPositions and eventually > Scorere#needsPayloads to create the corresponding enum on demand. Yet, > currently only TermQuery / TermScorer implements this API and other simply > return null instead. > To show that the API really works and our BulkPostings work fine too with > positions I cut over TermSpanQuery to use a TermScorer under the hood and > nuked TermSpans entirely. A nice sideeffect of this was that the Position > BulkReading implementation got some exercise which now :) work all with > positions while Payloads for bulkreading are kind of experimental in the > patch and those only work with Standard codec. > So all spans now work on top of TermScorer ( I truly hate spans since today ) > including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother > to implement the other codecs yet since I want to get feedback on the API and > on this first cut before I go one with it. I will upload the corresponding > patch in a minute. > I also had to cut over SpanQuery.getSpans(IR) to > SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk > first but after that pain today I need a break first :). > The patch passes all core tests > (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't > look into the MemoryIndex BulkPostings API yet) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13484997#comment-13484997 ] Alan Woodward commented on LUCENE-2878: --- OK! I think we're nearly there... > Allow Scorer to expose positions and payloads aka. nuke spans > -- > > Key: LUCENE-2878 > URL: https://issues.apache.org/jira/browse/LUCENE-2878 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: Positions Branch >Reporter: Simon Willnauer >Assignee: Simon Willnauer > Labels: gsoc2011, gsoc2012, lucene-gsoc-11, lucene-gsoc-12, > mentor > Fix For: Positions Branch > > Attachments: LUCENE-2878-OR.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, > PosHighlighter.patch, PosHighlighter.patch > > > Currently we have two somewhat separate types of queries, the one which can > make use of positions (mainly spans) and payloads (spans). Yet Span*Query > doesn't really do scoring comparable to what other queries do and at the end > of the day they are duplicating lot of code all over lucene. Span*Queries are > also limited to other Span*Query instances such that you can not use a > TermQuery or a BooleanQuery with SpanNear or anthing like that. > Beside of the Span*Query limitation other queries lacking a quiet interesting > feature since they can not score based on term proximity since scores doesn't > expose any positional information. All those problems bugged me for a while > now so I stared working on that using the bulkpostings API. I would have done > that first cut on trunk but TermScorer is working on BlockReader that do not > expose positions while the one in this branch does. I started adding a new > Positions class which users can pull from a scorer, to prevent unnecessary > positions enums I added ScorerContext#needsPositions and eventually > Scorere#needsPayloads to create the corresponding enum on demand. Yet, > currently only TermQuery / TermScorer implements this API and other simply > return null instead. > To show that the API really works and our BulkPostings work fine too with > positions I cut over TermSpanQuery to use a TermScorer under the hood and > nuked TermSpans entirely. A nice sideeffect of this was that the Position > BulkReading implementation got some exercise which now :) work all with > positions while Payloads for bulkreading are kind of experimental in the > patch and those only work with Standard codec. > So all spans now work on top of TermScorer ( I truly hate spans since today ) > including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother > to implement the other codecs yet since I want to get feedback on the API and > on this first cut before I go one with it. I will upload the corresponding > patch in a minute. > I also had to cut over SpanQuery.getSpans(IR) to > SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk > first but after that pain today I need a break first :). > The patch passes all core tests > (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't > look into the MemoryIndex BulkPostings API yet) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13484989#comment-13484989 ] Simon Willnauer commented on LUCENE-2878: - alan +1 to the patch BooleanIntervalIterator is a relict. I will go ahead and commit it. bq. Other than writing javadocs, we need to replace PayloadTermQuery and PayloadNearQuery, I think. I'll work on that next. Honestly, fuck it! PayloadTermQuery and PayloadNearQuery are so exotic I'd leave it out and move it into a sep. issue and maybe add them once we are on trunk. We can still just convert them to pos iters eventually. For now that is not important. we should focus on getting this on trunk. > Allow Scorer to expose positions and payloads aka. nuke spans > -- > > Key: LUCENE-2878 > URL: https://issues.apache.org/jira/browse/LUCENE-2878 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: Positions Branch >Reporter: Simon Willnauer >Assignee: Simon Willnauer > Labels: gsoc2011, gsoc2012, lucene-gsoc-11, lucene-gsoc-12, > mentor > Fix For: Positions Branch > > Attachments: LUCENE-2878-OR.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, > PosHighlighter.patch, PosHighlighter.patch > > > Currently we have two somewhat separate types of queries, the one which can > make use of positions (mainly spans) and payloads (spans). Yet Span*Query > doesn't really do scoring comparable to what other queries do and at the end > of the day they are duplicating lot of code all over lucene. Span*Queries are > also limited to other Span*Query instances such that you can not use a > TermQuery or a BooleanQuery with SpanNear or anthing like that. > Beside of the Span*Query limitation other queries lacking a quiet interesting > feature since they can not score based on term proximity since scores doesn't > expose any positional information. All those problems bugged me for a while > now so I stared working on that using the bulkpostings API. I would have done > that first cut on trunk but TermScorer is working on BlockReader that do not > expose positions while the one in this branch does. I started adding a new > Positions class which users can pull from a scorer, to prevent unnecessary > positions enums I added ScorerContext#needsPositions and eventually > Scorere#needsPayloads to create the corresponding enum on demand. Yet, > currently only TermQuery / TermScorer implements this API and other simply > return null instead. > To show that the API really works and our BulkPostings work fine too with > positions I cut over TermSpanQuery to use a TermScorer under the hood and > nuked TermSpans entirely. A nice sideeffect of this was that the Position > BulkReading implementation got some exercise which now :) work all with > positions while Payloads for bulkreading are kind of experimental in the > patch and those only work with Standard codec. > So all spans now work on top of TermScorer ( I truly hate spans since today ) > including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother > to implement the other codecs yet since I want to get feedback on the API and > on this first cut before I go one with it. I will upload the corresponding > patch in a minute. > I also had to cut over SpanQuery.getSpans(IR) to > SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk > first but after that pain today I need a break first :). > The patch passes all core tests > (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't > look into the MemoryIndex BulkPostings API yet) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13484107#comment-13484107 ] Simon Willnauer commented on LUCENE-2878: - Alan, I merged up with trunk and fixed some small bugs. +1 to all the cleanups > Allow Scorer to expose positions and payloads aka. nuke spans > -- > > Key: LUCENE-2878 > URL: https://issues.apache.org/jira/browse/LUCENE-2878 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: Positions Branch >Reporter: Simon Willnauer >Assignee: Simon Willnauer > Labels: gsoc2011, gsoc2012, lucene-gsoc-11, lucene-gsoc-12, > mentor > Fix For: Positions Branch > > Attachments: LUCENE-2878-OR.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, PosHighlighter.patch, > PosHighlighter.patch > > > Currently we have two somewhat separate types of queries, the one which can > make use of positions (mainly spans) and payloads (spans). Yet Span*Query > doesn't really do scoring comparable to what other queries do and at the end > of the day they are duplicating lot of code all over lucene. Span*Queries are > also limited to other Span*Query instances such that you can not use a > TermQuery or a BooleanQuery with SpanNear or anthing like that. > Beside of the Span*Query limitation other queries lacking a quiet interesting > feature since they can not score based on term proximity since scores doesn't > expose any positional information. All those problems bugged me for a while > now so I stared working on that using the bulkpostings API. I would have done > that first cut on trunk but TermScorer is working on BlockReader that do not > expose positions while the one in this branch does. I started adding a new > Positions class which users can pull from a scorer, to prevent unnecessary > positions enums I added ScorerContext#needsPositions and eventually > Scorere#needsPayloads to create the corresponding enum on demand. Yet, > currently only TermQuery / TermScorer implements this API and other simply > return null instead. > To show that the API really works and our BulkPostings work fine too with > positions I cut over TermSpanQuery to use a TermScorer under the hood and > nuked TermSpans entirely. A nice sideeffect of this was that the Position > BulkReading implementation got some exercise which now :) work all with > positions while Payloads for bulkreading are kind of experimental in the > patch and those only work with Standard codec. > So all spans now work on top of TermScorer ( I truly hate spans since today ) > including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother > to implement the other codecs yet since I want to get feedback on the API and > on this first cut before I go one with it. I will upload the corresponding > patch in a minute. > I also had to cut over SpanQuery.getSpans(IR) to > SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk > first but after that pain today I need a break first :). > The patch passes all core tests > (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't > look into the MemoryIndex BulkPostings API yet) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483976#comment-13483976 ] Alan Woodward commented on LUCENE-2878: --- OK, done, added some more javadocs as well. Next cleanup is to make the distinction between iterators and filters a bit more explicit, I think. We've got some iterators that also act as filters, and some which are distinct. I think they should all be separate classes - filters are a public API that clients can use to create queries, whereas Iterators are an implementation detail. > Allow Scorer to expose positions and payloads aka. nuke spans > -- > > Key: LUCENE-2878 > URL: https://issues.apache.org/jira/browse/LUCENE-2878 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: Positions Branch >Reporter: Simon Willnauer >Assignee: Simon Willnauer > Labels: gsoc2011, gsoc2012, lucene-gsoc-11, lucene-gsoc-12, > mentor > Fix For: Positions Branch > > Attachments: LUCENE-2878-OR.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, PosHighlighter.patch, > PosHighlighter.patch > > > Currently we have two somewhat separate types of queries, the one which can > make use of positions (mainly spans) and payloads (spans). Yet Span*Query > doesn't really do scoring comparable to what other queries do and at the end > of the day they are duplicating lot of code all over lucene. Span*Queries are > also limited to other Span*Query instances such that you can not use a > TermQuery or a BooleanQuery with SpanNear or anthing like that. > Beside of the Span*Query limitation other queries lacking a quiet interesting > feature since they can not score based on term proximity since scores doesn't > expose any positional information. All those problems bugged me for a while > now so I stared working on that using the bulkpostings API. I would have done > that first cut on trunk but TermScorer is working on BlockReader that do not > expose positions while the one in this branch does. I started adding a new > Positions class which users can pull from a scorer, to prevent unnecessary > positions enums I added ScorerContext#needsPositions and eventually > Scorere#needsPayloads to create the corresponding enum on demand. Yet, > currently only TermQuery / TermScorer implements this API and other simply > return null instead. > To show that the API really works and our BulkPostings work fine too with > positions I cut over TermSpanQuery to use a TermScorer under the hood and > nuked TermSpans entirely. A nice sideeffect of this was that the Position > BulkReading implementation got some exercise which now :) work all with > positions while Payloads for bulkreading are kind of experimental in the > patch and those only work with Standard codec. > So all spans now work on top of TermScorer ( I truly hate spans since today ) > including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother > to implement the other codecs yet since I want to get feedback on the API and > on this first cut before I go one with it. I will upload the corresponding > patch in a minute. > I also had to cut over SpanQuery.getSpans(IR) to > SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk > first but after that pain today I need a break first :). > The patch passes all core tests > (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't > look into the MemoryIndex BulkPostings API yet) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org