OK, so the most straightforward way to do that would be to change the signature to positions(boolean needsPayloads, boolean needsOffsets), I guess. This is a new API so it's not breaking anything.
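To make that concrete, here is a rough sketch of what I have in mind. It is not code from the branch: Hit, PositionSource and ArraySource are made-up stand-ins, and it returns a plain List where the branch pulls positions lazily through an iterator. The only point is that the two flags let the caller say what it needs, so the implementation can skip reading offsets or payloads nobody asked for.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch only: none of these types are Lucene classes. */
public class PositionsSketch {

    /** One term occurrence; fields the caller did not ask for stay unset. */
    static final class Hit {
        final int position;
        final int startOffset; // -1 when offsets were not requested
        final int endOffset;   // -1 when offsets were not requested
        final byte[] payload;  // null when payloads were not requested

        Hit(int position, int startOffset, int endOffset, byte[] payload) {
            this.position = position;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
            this.payload = payload;
        }
    }

    /** Hypothetical stand-in for the Scorer method discussed in the thread. */
    interface PositionSource {
        List<Hit> positions(boolean needsPayloads, boolean needsOffsets);
    }

    /** Toy source backed by in-memory arrays instead of a postings list. */
    static final class ArraySource implements PositionSource {
        private final int[] positions;
        private final int[][] offsets;   // {start, end} for each position
        private final byte[][] payloads; // one payload for each position
        int offsetReads = 0;             // counts how often offsets were touched
        int payloadReads = 0;            // counts how often payloads were touched

        ArraySource(int[] positions, int[][] offsets, byte[][] payloads) {
            this.positions = positions;
            this.offsets = offsets;
            this.payloads = payloads;
        }

        @Override
        public List<Hit> positions(boolean needsPayloads, boolean needsOffsets) {
            List<Hit> hits = new ArrayList<>();
            for (int i = 0; i < positions.length; i++) {
                int start = -1;
                int end = -1;
                byte[] payload = null;
                if (needsOffsets) {
                    start = offsets[i][0];
                    end = offsets[i][1];
                    offsetReads++;
                }
                if (needsPayloads) {
                    payload = payloads[i];
                    payloadReads++;
                }
                hits.add(new Hit(positions[i], start, end, payload));
            }
            return hits;
        }
    }

    /** Two hits at positions 3 and 7, each with offsets and a one-byte payload. */
    static ArraySource demoSource() {
        return new ArraySource(
                new int[] {3, 7},
                new int[][] {{10, 15}, {30, 36}},
                new byte[][] {{1}, {2}});
    }

    public static void main(String[] args) {
        // A highlighter would ask for offsets only and leave payloads alone.
        for (Hit h : demoSource().positions(false, true)) {
            System.out.println(h.position + " [" + h.startOffset + "," + h.endOffset
                    + ") payload=" + (h.payload != null));
        }
    }
}
```

Two booleans are the smallest change; an int of bit flags would be the obvious alternative if more options turn up later.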
It'll be tomorrow morning before I have a proper go at this now (Cambridge Beer Festival tonight…). Is the mailing list the best place to discuss this, or is JIRA/IRC better?

On 23 May 2012, at 13:43, Simon Willnauer wrote:

> hey alan,
>
> I added position iterator support to ConjunctionTermScorer and
> committed it to the branch. All tests that don't rely on payloads are
> passing in core. Previously we had to decide if we need positions up
> front, the current code can pull them lazily which causes less changes
> on the Scorer API. I think we should keep it that way, the only
> problem is that we have currently no way to pass information to the
> iterators if we need payloads or not. Same is true for offsets since
> they are now in the index. I think it would be good if you could
> tackle the payloads first and pass some info to the Scorer#positions()
> method so we can pull the right thing.
>
> happy coding.
>
> simon
>
> On Wed, May 23, 2012 at 1:23 PM, Alan Woodward
> <[email protected]> wrote:
>> Sweet, thanks Simon. I'll have a go at getting some failing tests passing
>> to begin with.
>>
>> On 23 May 2012, at 11:59, Simon Willnauer wrote:
>>
>>> alan,
>>>
>>> I merged the branch manually and created a new branch from it. it's
>>> here: https://svn.apache.org/repos/asf/lucene/dev/branches/LUCENE-2878
>>> the branch compiles but lots of nocommits / todos
>>>
>>> if you have questions please ask I will help as much as I can
>>>
>>> simon
>>>
>>> On Tue, May 22, 2012 at 8:38 PM, Alan Woodward
>>> <[email protected]> wrote:
>>>> Hey, I reckon I can have a decent go at getting the branch updated. Is it
>>>> best to work this out as a patch applying to trunk? Any patch that merges
>>>> in all the trunk changes to the branch is going to be absolutely massive…
>>>>
>>>> On 17 May 2012, at 13:15, Simon Willnauer wrote:
>>>>
>>>>> ok man. I will try to merge up the branch.
>>>>> I tell you this is going to
>>>>> be messy and it might not compile but I will make it reasonable so you
>>>>> can start.
>>>>>
>>>>> simon
>>>>>
>>>>> On Thu, May 17, 2012 at 8:03 AM, Alan Woodward
>>>>> <[email protected]> wrote:
>>>>>> Sorry for vanishing for so long, life unexpectedly caught up with me...
>>>>>> I'm going to have some time to look at this again next week though, if
>>>>>> you're interested in picking it up again.
>>>>>>
>>>>>> On 21 Mar 2012, at 09:02, Alan Woodward wrote:
>>>>>>
>>>>>>> That would be great, thanks! I had a go at merging it last night, but
>>>>>>> there are a *lot* of changes that I haven't got my head round yet, so
>>>>>>> it was getting pretty messy.
>>>>>>>
>>>>>>> On 21 Mar 2012, at 08:49, Simon Willnauer wrote:
>>>>>>>
>>>>>>>> Alan, if you want I can just merge the branch up next week and we
>>>>>>>> iterate from there?
>>>>>>>>
>>>>>>>> simon
>>>>>>>>
>>>>>>>> On Tue, Mar 20, 2012 at 12:34 PM, Erick Erickson
>>>>>>>> <[email protected]> wrote:
>>>>>>>>> Yep, the first challenge is always getting the old patch(es) to
>>>>>>>>> apply.....
>>>>>>>>>
>>>>>>>>> On Tue, Mar 20, 2012 at 4:09 AM, Alan Woodward
>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>> Thanks for all the offers of help! It looks as though most of the
>>>>>>>>>> hard work has already been done, which is exactly where I like to
>>>>>>>>>> pick up projects. :-)
>>>>>>>>>>
>>>>>>>>>> Maybe the best place to start would be for me to rebase the branch
>>>>>>>>>> against trunk, and see what still fits? I think there have been
>>>>>>>>>> some fairly major changes in the internals since July last year.
>>>>>>>>>>
>>>>>>>>>> On 19 Mar 2012, at 17:07, Mike Sokolov wrote:
>>>>>>>>>>
>>>>>>>>>>> I posted a patch with a Collector somewhat similar to what you
>>>>>>>>>>> described, Alan - it's attached to one of the sub-issues
>>>>>>>>>>> https://issues.apache.org/jira/browse/LUCENE-3318.
>>>>>>>>>>> It is in a
>>>>>>>>>>> fairly complete "alpha" state, but has seen no production use of
>>>>>>>>>>> course, since it relies on the remainder of the unfinished work in
>>>>>>>>>>> that branch. It works by creating a TokenStream based on match
>>>>>>>>>>> positions returned from the query and passing that to the existing
>>>>>>>>>>> Highlighter. Please feel free to get in touch if you decide to
>>>>>>>>>>> look into that and have questions.
>>>>>>>>>>>
>>>>>>>>>>> -Mike
>>>>>>>>>>>
>>>>>>>>>>> On 03/19/2012 11:51 AM, Simon Willnauer wrote:
>>>>>>>>>>>> On Mon, Mar 19, 2012 at 4:50 PM, Uwe Schindler <[email protected]>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Have you marked that for GSOC? Would be a good idea!
>>>>>>>>>>>>>
>>>>>>>>>>>> yes I did
>>>>>>>>>>>>
>>>>>>>>>>>>> -----
>>>>>>>>>>>>> Uwe Schindler
>>>>>>>>>>>>> H.-H.-Meier-Allee 63, D-28213 Bremen
>>>>>>>>>>>>> http://www.thetaphi.de
>>>>>>>>>>>>> eMail: [email protected]
>>>>>>>>>>>>>
>>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>>> From: Simon Willnauer [mailto:[email protected]]
>>>>>>>>>>>>>> Sent: Monday, March 19, 2012 4:43 PM
>>>>>>>>>>>>>> To: [email protected]
>>>>>>>>>>>>>> Subject: Re: Using term offsets for hit highlighting
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Alan, you made my day!
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The branch is kind of outdated but I looked at it lately and I
>>>>>>>>>>>>>> can certainly help to get it up to speed. The feature in that
>>>>>>>>>>>>>> branch is quite a big one and it's in a very early stage. Still I
>>>>>>>>>>>>>> want to encourage you to take a look and work on it. I promise all
>>>>>>>>>>>>>> my help with the issues!
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> let me know if you have questions!
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> simon
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Mon, Mar 19, 2012 at 3:52 PM, Alan Woodward
>>>>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Cool, thanks Robert.
>>>>>>>>>>>>>>> I'll take a look at the JIRA ticket.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 19 Mar 2012, at 14:44, Robert Muir wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Mon, Mar 19, 2012 at 10:38 AM, Alan Woodward
>>>>>>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The project I'm currently working on requires the reporting of exact
>>>>>>>>>>>>>>>>> hit positions from some pretty hairy queries, not all of which are
>>>>>>>>>>>>>>>>> covered by the existing highlighter modules. I'm working round this
>>>>>>>>>>>>>>>>> by translating everything into SpanQueries, and using the getSpans()
>>>>>>>>>>>>>>>>> method to locate hits (I've extended the Spans interface to make
>>>>>>>>>>>>>>>>> term offsets available - see
>>>>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/LUCENE-3826). This works for
>>>>>>>>>>>>>>>>> our use-case, but isn't terribly efficient, and obviously isn't
>>>>>>>>>>>>>>>>> applicable to non-Span queries.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I've seen a bit of chatter on the list about using term offsets to
>>>>>>>>>>>>>>>>> provide accurate highlighting in Lucene. I'm going to have a couple
>>>>>>>>>>>>>>>>> of weeks free in April, and I thought I might have a go at
>>>>>>>>>>>>>>>>> implementing this. Mainly I'm wondering if there's already been
>>>>>>>>>>>>>>>>> thoughts about how to do it.
>>>>>>>>>>>>>>>>> My current thoughts are to somehow
>>>>>>>>>>>>>>>>> extend the Weight and Scorer interface to make term offsets
>>>>>>>>>>>>>>>>> available; to get highlights for a given set of documents, you'd
>>>>>>>>>>>>>>>>> essentially run the query again, with a filter on just the documents
>>>>>>>>>>>>>>>>> you want highlighted, and have a custom collector that gets the term
>>>>>>>>>>>>>>>>> offsets in place of the scores.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi Alan, Simon started some initial work on
>>>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/LUCENE-2878
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Some work and prototypes were done in a branch, but it might be
>>>>>>>>>>>>>>>> lagging behind trunk a bit.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Additionally at the time it was first done, I think we didn't yet
>>>>>>>>>>>>>>>> support offsets in the postings lists.
>>>>>>>>>>>>>>>> We've since added this and several codecs support it.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> lucidimagination.com
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>>>>>>>>> To unsubscribe, e-mail: [email protected]
>>>>>>>>>>>>>>>> For additional commands, e-mail: [email protected]
