Hey, I reckon I can have a decent go at getting the branch updated. Is it best to work this out as a patch applying to trunk? Any patch that merges in all the trunk changes to the branch is going to be absolutely massive…
On 17 May 2012, at 13:15, Simon Willnauer wrote:
> ok man. I will try to merge up the branch. I tell you this is going to
> be messy and it might not compile but I will make it reasonable so you
> can start.
>
> simon
>
> On Thu, May 17, 2012 at 8:03 AM, Alan Woodward
> <[email protected]> wrote:
>> Sorry for vanishing for so long, life unexpectedly caught up with me... I'm
>> going to have some time to look at this again next week though, if you're
>> interested in picking it up again.
>>
>> On 21 Mar 2012, at 09:02, Alan Woodward wrote:
>>
>>> That would be great, thanks! I had a go at merging it last night, but
>>> there are a *lot* of changes that I haven't got my head round yet, so it
>>> was getting pretty messy.
>>>
>>> On 21 Mar 2012, at 08:49, Simon Willnauer wrote:
>>>
>>>> Alan, if you want I can just merge the branch up next week and we
>>>> iterate from there?
>>>>
>>>> simon
>>>>
>>>> On Tue, Mar 20, 2012 at 12:34 PM, Erick Erickson
>>>> <[email protected]> wrote:
>>>>> Yep, the first challenge is always getting the old patch(es) to apply.....
>>>>>
>>>>> On Tue, Mar 20, 2012 at 4:09 AM, Alan Woodward
>>>>> <[email protected]> wrote:
>>>>>> Thanks for all the offers of help! It looks as though most of the hard
>>>>>> work has already been done, which is exactly where I like to pick up
>>>>>> projects. :-)
>>>>>>
>>>>>> Maybe the best place to start would be for me to rebase the branch
>>>>>> against trunk, and see what still fits? I think there have been some
>>>>>> fairly major changes in the internals since July last year.
>>>>>>
>>>>>> On 19 Mar 2012, at 17:07, Mike Sokolov wrote:
>>>>>>
>>>>>>> I posted a patch with a Collector somewhat similar to what you
>>>>>>> described, Alan - it's attached to one of the sub-issues
>>>>>>> https://issues.apache.org/jira/browse/LUCENE-3318. It is in a fairly
>>>>>>> complete "alpha" state, but has seen no production use of course, since
>>>>>>> it relies on the remainder of the unfinished work in that branch.
>>>>>>> It works by creating a TokenStream based on match positions returned from
>>>>>>> the query and passing that to the existing Highlighter. Please feel
>>>>>>> free to get in touch if you decide to look into that and have questions.
>>>>>>>
>>>>>>> -Mike
>>>>>>>
>>>>>>> On 03/19/2012 11:51 AM, Simon Willnauer wrote:
>>>>>>>> On Mon, Mar 19, 2012 at 4:50 PM, Uwe Schindler<[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Have you marked that for GSOC? Would be a good idea!
>>>>>>>>>
>>>>>>>> yes I did
>>>>>>>>
>>>>>>>>> -----
>>>>>>>>> Uwe Schindler
>>>>>>>>> H.-H.-Meier-Allee 63, D-28213 Bremen
>>>>>>>>> http://www.thetaphi.de
>>>>>>>>> eMail: [email protected]
>>>>>>>>>
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: Simon Willnauer [mailto:[email protected]]
>>>>>>>>>> Sent: Monday, March 19, 2012 4:43 PM
>>>>>>>>>> To: [email protected]
>>>>>>>>>> Subject: Re: Using term offsets for hit highlighting
>>>>>>>>>>
>>>>>>>>>> Alan, you made my day!
>>>>>>>>>>
>>>>>>>>>> The branch is kind of outdated but I looked at it lately and I can
>>>>>>>>>> certainly help to get it up to speed. The feature in that branch is
>>>>>>>>>> quite a big one and its in a very early stage. Still I want to
>>>>>>>>>> encourage you to take a look and work on it. I promise all my help
>>>>>>>>>> with the issues!
>>>>>>>>>>
>>>>>>>>>> let me know if you have questions!
>>>>>>>>>>
>>>>>>>>>> simon
>>>>>>>>>>
>>>>>>>>>> On Mon, Mar 19, 2012 at 3:52 PM, Alan Woodward
>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Cool, thanks Robert. I'll take a look at the JIRA ticket.
>>>>>>>>>>>
>>>>>>>>>>> On 19 Mar 2012, at 14:44, Robert Muir wrote:
>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Mar 19, 2012 at 10:38 AM, Alan Woodward
>>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>
>>>>>>>>>>>>> The project I'm currently working on requires the reporting of
>>>>>>>>>>>>> exact hit positions from some pretty hairy queries, not all of
>>>>>>>>>>>>> which are covered by the existing highlighter modules. I'm
>>>>>>>>>>>>> working round this by translating everything into SpanQueries,
>>>>>>>>>>>>> and using the getSpans() method to locate hits (I've extended
>>>>>>>>>>>>> the Spans interface to make term offsets available - see
>>>>>>>>>>>>> https://issues.apache.org/jira/browse/LUCENE-3826). This works
>>>>>>>>>>>>> for our use-case, but isn't terribly efficient, and obviously
>>>>>>>>>>>>> isn't applicable to non-Span queries.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I've seen a bit of chatter on the list about using term offsets
>>>>>>>>>>>>> to provide accurate highlighting in Lucene. I'm going to have a
>>>>>>>>>>>>> couple of weeks free in April, and I thought I might have a go
>>>>>>>>>>>>> at implementing this. Mainly I'm wondering if there's already
>>>>>>>>>>>>> been thoughts about how to do it. My current thoughts are to
>>>>>>>>>>>>> somehow extend the Weight and Scorer interface to make term
>>>>>>>>>>>>> offsets available; to get highlights for a given set of
>>>>>>>>>>>>> documents, you'd essentially run the query again, with a filter
>>>>>>>>>>>>> on just the documents you want highlighted, and have a custom
>>>>>>>>>>>>> collector that gets the term offsets in place of the scores.
>>>>>>>>>>>>
>>>>>>>>>>>> Hi Alan, Simon started some initial work on
>>>>>>>>>>>> https://issues.apache.org/jira/browse/LUCENE-2878
>>>>>>>>>>>>
>>>>>>>>>>>> Some work and prototypes were done in a branch, but it might be
>>>>>>>>>>>> lagging behind trunk a bit.
>>>>>>>>>>>>
>>>>>>>>>>>> Additionally at the time it was first done, I think we didn't yet
>>>>>>>>>>>> support offsets in the postings lists.
>>>>>>>>>>>> We've since added this and several codecs support it.
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> lucidimagination.com
>>>>>>>>>>>>
>>>>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>>>>> To unsubscribe, e-mail: [email protected]
>>>>>>>>>>>> For additional commands, e-mail: [email protected]
