Hey, I reckon I can have a decent go at getting the branch updated.  Is it best 
to work this out as a patch applying to trunk?  Any patch that merges in all 
the trunk changes to the branch is going to be absolutely massive...

On 17 May 2012, at 13:15, Simon Willnauer wrote:

> ok man. I will try to merge up the branch. I tell you this is going to
> be messy and it might not compile but I will make it reasonable so you
> can start.
> 
> simon
> 
> On Thu, May 17, 2012 at 8:03 AM, Alan Woodward
> <alan.woodw...@romseysoftware.co.uk> wrote:
>> Sorry for vanishing for so long, life unexpectedly caught up with me...  I'm 
>> going to have some time to look at this again next week though, if you're 
>> interested in picking it up again.
>> 
>> On 21 Mar 2012, at 09:02, Alan Woodward wrote:
>> 
>>> That would be great, thanks!  I had a go at merging it last night, but 
>>> there are a *lot* of changes that I haven't got my head round yet, so it 
>>> was getting pretty messy.
>>> 
>>> On 21 Mar 2012, at 08:49, Simon Willnauer wrote:
>>> 
>>>> Alan, if you want I can just merge the branch up next week and we
>>>> iterate from there?
>>>> 
>>>> simon
>>>> 
>>>> On Tue, Mar 20, 2012 at 12:34 PM, Erick Erickson
>>>> <erickerick...@gmail.com> wrote:
>>>>> Yep, the first challenge is always getting the old patch(es) to apply.....
>>>>> 
>>>>> On Tue, Mar 20, 2012 at 4:09 AM, Alan Woodward
>>>>> <alan.woodw...@romseysoftware.co.uk> wrote:
>>>>>> Thanks for all the offers of help!  It looks as though most of the hard 
>>>>>> work has already been done, which is exactly where I like to pick up 
>>>>>> projects.  :-)
>>>>>> 
>>>>>> Maybe the best place to start would be for me to rebase the branch 
>>>>>> against trunk, and see what still fits?  I think there have been some 
>>>>>> fairly major changes in the internals since July last year.
>>>>>> 
>>>>>> On 19 Mar 2012, at 17:07, Mike Sokolov wrote:
>>>>>> 
>>>>>>> I posted a patch with a Collector somewhat similar to what you 
>>>>>>> described, Alan - it's attached to one of the sub-issues 
>>>>>>> https://issues.apache.org/jira/browse/LUCENE-3318.   It is in a fairly 
>>>>>>> complete "alpha" state, but has seen no production use of course, since 
>>>>>>> it relies on the remainder of the unfinished work in that branch.  It 
>>>>>>> works by creating a TokenStream based on match positions returned from 
>>>>>>> the query and passing that to the existing Highlighter.  Please feel 
>>>>>>> free to get in touch if you decide to look into that and have questions.
>>>>>>> 
>>>>>>> 
>>>>>>> -Mike
>>>>>>> 
>>>>>>> On 03/19/2012 11:51 AM, Simon Willnauer wrote:
>>>>>>>> On Mon, Mar 19, 2012 at 4:50 PM, Uwe Schindler <u...@thetaphi.de> wrote:
>>>>>>>> 
>>>>>>>>> Have you marked that for GSOC? Would be a good idea!
>>>>>>>>> 
>>>>>>>> yes I did
>>>>>>>> 
>>>>>>>>> -----
>>>>>>>>> Uwe Schindler
>>>>>>>>> H.-H.-Meier-Allee 63, D-28213 Bremen
>>>>>>>>> http://www.thetaphi.de
>>>>>>>>> eMail: u...@thetaphi.de
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: Simon Willnauer [mailto:simon.willna...@googlemail.com]
>>>>>>>>>> Sent: Monday, March 19, 2012 4:43 PM
>>>>>>>>>> To: dev@lucene.apache.org
>>>>>>>>>> Subject: Re: Using term offsets for hit highlighting
>>>>>>>>>> 
>>>>>>>>>> Alan, you made my day!
>>>>>>>>>> 
>>>>>>>>>> The branch is kind of outdated but I looked at it lately and I can
>>>>>>>>>> certainly help to get it up to speed. The feature in that branch is
>>>>>>>>>> quite a big one and it's in a very early stage. Still I want to
>>>>>>>>>> encourage you to take a look and work on it. I promise all my help
>>>>>>>>>> with the issues!
>>>>>>>>>> 
>>>>>>>>>> let me know if you have questions!
>>>>>>>>>> 
>>>>>>>>>> simon
>>>>>>>>>> 
>>>>>>>>>> On Mon, Mar 19, 2012 at 3:52 PM, Alan Woodward
>>>>>>>>>> <alan.woodw...@romseysoftware.co.uk>  wrote:
>>>>>>>>>> 
>>>>>>>>>>> Cool, thanks Robert.  I'll take a look at the JIRA ticket.
>>>>>>>>>>> 
>>>>>>>>>>> On 19 Mar 2012, at 14:44, Robert Muir wrote:
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>>> On Mon, Mar 19, 2012 at 10:38 AM, Alan Woodward
>>>>>>>>>>>> <alan.woodw...@romseysoftware.co.uk>  wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> The project I'm currently working on requires the reporting of exact
>>>>>>>>>>>>> hit positions from some pretty hairy queries, not all of which are
>>>>>>>>>>>>> covered by the existing highlighter modules.  I'm working round this
>>>>>>>>>>>>> by translating everything into SpanQueries, and using the getSpans()
>>>>>>>>>>>>> method to locate hits (I've extended the Spans interface to make
>>>>>>>>>>>>> term offsets available - see
>>>>>>>>>>>>> https://issues.apache.org/jira/browse/LUCENE-3826).  This works for
>>>>>>>>>>>>> our use-case, but isn't terribly efficient, and obviously isn't
>>>>>>>>>>>>> applicable to non-Span queries.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I've seen a bit of chatter on the list about using term offsets to
>>>>>>>>>>>>> provide accurate highlighting in Lucene.  I'm going to have a couple
>>>>>>>>>>>>> of weeks free in April, and I thought I might have a go at
>>>>>>>>>>>>> implementing this.  Mainly I'm wondering if there's already been
>>>>>>>>>>>>> thoughts about how to do it.  My current thoughts are to somehow
>>>>>>>>>>>>> extend the Weight and Scorer interface to make term offsets
>>>>>>>>>>>>> available; to get highlights for a given set of documents, you'd
>>>>>>>>>>>>> essentially run the query again, with a filter on just the documents
>>>>>>>>>>>>> you want highlighted, and have a custom collector that gets the term
>>>>>>>>>>>>> offsets in place of the scores.
>>>>>>>>>>>>> 
>>>>>>>>>>>> Hi Alan, Simon started some initial work on
>>>>>>>>>>>> https://issues.apache.org/jira/browse/LUCENE-2878
>>>>>>>>>>>> 
>>>>>>>>>>>> Some work and prototypes were done in a branch, but it might be
>>>>>>>>>>>> lagging behind trunk a bit.
>>>>>>>>>>>> 
>>>>>>>>>>>> Additionally at the time it was first done, I think we didn't yet
>>>>>>>>>>>> support offsets in the postings lists.
>>>>>>>>>>>> We've since added this and several codecs support it.
>>>>>>>>>>>> 
>>>>>>>>>>>> --
>>>>>>>>>>>> lucidimagination.com
>>>>>>>>>>>> 
>>>>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>>>>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For
>>>>>>>>>>>> additional commands, e-mail: dev-h...@lucene.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
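
Editorial sketch of the idea raised in the original message of this thread
(collecting term offsets in place of scores, then highlighting from them).
It does not use any Lucene API and is not the code from the LUCENE-2878
branch or the LUCENE-3318 patch. Every name below (OffsetHighlightSketch,
Match, OffsetCollector, highlight) is hypothetical and exists only for
illustration. It assumes offsets are character positions into the stored
text, start inclusive and end exclusive.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical, Lucene-free illustration: gather match offsets, then
// wrap each matched span of the original text in highlight markers.
class OffsetHighlightSketch {

    /** One matched span: start inclusive, end exclusive (assumed convention). */
    static final class Match {
        final int start;
        final int end;

        Match(int start, int end) {
            if (start < 0 || end < start) {
                throw new IllegalArgumentException("bad span " + start + "-" + end);
            }
            this.start = start;
            this.end = end;
        }
    }

    /** Stand-in for a collector that records offsets rather than scores. */
    static final class OffsetCollector {
        private final List<Match> matches = new ArrayList<>();

        void collect(int start, int end) {
            matches.add(new Match(start, end));
        }

        List<Match> matches() {
            return matches;
        }
    }

    /** Wraps each non-overlapping, in-range match in pre/post markers. */
    static String highlight(String text, List<Match> matches, String pre, String post) {
        List<Match> sorted = new ArrayList<>(matches);
        sorted.sort(Comparator.comparingInt((Match m) -> m.start));

        StringBuilder out = new StringBuilder();
        int pos = 0; // end of the text already copied to the output
        for (Match m : sorted) {
            // Skip spans that overlap an earlier one or run past the text.
            if (m.start < pos || m.end > text.length()) {
                continue;
            }
            out.append(text, pos, m.start)
               .append(pre)
               .append(text, m.start, m.end)
               .append(post);
            pos = m.end;
        }
        out.append(text, pos, text.length());
        return out.toString();
    }

    public static void main(String[] args) {
        OffsetCollector collector = new OffsetCollector();
        collector.collect(16, 19); // "fox"
        collector.collect(4, 9);   // "quick"
        System.out.println(
            highlight("the quick brown fox", collector.matches(), "<b>", "</b>"));
    }
}
```

In a real implementation the collect() calls would be driven by whatever
offsets the query's scorers expose, which is the part the branch work is
about. Overlapping matches are simply dropped here to keep the sketch short.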
