Sweet, thanks Simon.  I'll have a go at getting some failing tests passing to 
begin with.

On 23 May 2012, at 11:59, Simon Willnauer wrote:

> alan,
> 
> I merged the branch manually and created a new branch from it. It's
> here: https://svn.apache.org/repos/asf/lucene/dev/branches/LUCENE-2878
> The branch compiles but has lots of nocommits / todos.
> 
> if you have questions please ask I will help as much as I can
> 
> simon
> 
> On Tue, May 22, 2012 at 8:38 PM, Alan Woodward
> <alan.woodw...@romseysoftware.co.uk> wrote:
>> Hey, I reckon I can have a decent go at getting the branch updated.  Is it 
>> best to work this out as a patch applying to trunk?  Any patch that merges 
>> in all the trunk changes to the branch is going to be absolutely massive...
>> 
>> On 17 May 2012, at 13:15, Simon Willnauer wrote:
>> 
>>> ok man. I will try to merge up the branch. I tell you this is going to
>>> be messy and it might not compile but I will make it reasonable so you
>>> can start.
>>> 
>>> simon
>>> 
>>> On Thu, May 17, 2012 at 8:03 AM, Alan Woodward
>>> <alan.woodw...@romseysoftware.co.uk> wrote:
>>>> Sorry for vanishing for so long, life unexpectedly caught up with me...  
>>>> I'm going to have some time to look at this again next week though, if 
>>>> you're interested in picking it up again.
>>>> 
>>>> On 21 Mar 2012, at 09:02, Alan Woodward wrote:
>>>> 
>>>>> That would be great, thanks!  I had a go at merging it last night, but 
>>>>> there are a *lot* of changes that I haven't got my head round yet, so it 
>>>>> was getting pretty messy.
>>>>> 
>>>>> On 21 Mar 2012, at 08:49, Simon Willnauer wrote:
>>>>> 
>>>>>> Alan, if you want I can just merge the branch up next week and we
>>>>>> iterate from there?
>>>>>> 
>>>>>> simon
>>>>>> 
>>>>>> On Tue, Mar 20, 2012 at 12:34 PM, Erick Erickson
>>>>>> <erickerick...@gmail.com> wrote:
>>>>>>> Yep, the first challenge is always getting the old patch(es) to 
>>>>>>> apply.....
>>>>>>> 
>>>>>>> On Tue, Mar 20, 2012 at 4:09 AM, Alan Woodward
>>>>>>> <alan.woodw...@romseysoftware.co.uk> wrote:
>>>>>>>> Thanks for all the offers of help!  It looks as though most of the 
>>>>>>>> hard work has already been done, which is exactly where I like to pick 
>>>>>>>> up projects.  :-)
>>>>>>>> 
>>>>>>>> Maybe the best place to start would be for me to rebase the branch 
>>>>>>>> against trunk, and see what still fits?  I think there have been some 
>>>>>>>> fairly major changes in the internals since July last year.
>>>>>>>> 
>>>>>>>> On 19 Mar 2012, at 17:07, Mike Sokolov wrote:
>>>>>>>> 
>>>>>>>>> I posted a patch with a Collector somewhat similar to what you 
>>>>>>>>> described, Alan - it's attached to one of the sub-issues 
>>>>>>>>> https://issues.apache.org/jira/browse/LUCENE-3318.   It is in a 
>>>>>>>>> fairly complete "alpha" state, but has seen no production use of 
>>>>>>>>> course, since it relies on the remainder of the unfinished work in 
>>>>>>>>> that branch.  It works by creating a TokenStream based on match 
>>>>>>>>> positions returned from the query and passing that to the existing 
>>>>>>>>> Highlighter.  Please feel free to get in touch if you decide to look 
>>>>>>>>> into that and have questions.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> -Mike
>>>>>>>>> 
>>>>>>>>> On 03/19/2012 11:51 AM, Simon Willnauer wrote:
>>>>>>>>>> On Mon, Mar 19, 2012 at 4:50 PM, Uwe Schindler<u...@thetaphi.de>  
>>>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>>> Have you marked that for GSOC? Would be a good idea!
>>>>>>>>>>> 
>>>>>>>>>> yes I did
>>>>>>>>>> 
>>>>>>>>>>> -----
>>>>>>>>>>> Uwe Schindler
>>>>>>>>>>> H.-H.-Meier-Allee 63, D-28213 Bremen
>>>>>>>>>>> http://www.thetaphi.de
>>>>>>>>>>> eMail: u...@thetaphi.de
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>> From: Simon Willnauer [mailto:simon.willna...@googlemail.com]
>>>>>>>>>>>> Sent: Monday, March 19, 2012 4:43 PM
>>>>>>>>>>>> To: dev@lucene.apache.org
>>>>>>>>>>>> Subject: Re: Using term offsets for hit highlighting
>>>>>>>>>>>> 
>>>>>>>>>>>> Alan, you made my day!
>>>>>>>>>>>> 
>>>>>>>>>>>> The branch is kind of outdated but I looked at it lately and I can
>>>>>>>>>>>> certainly help to get it up to speed. The feature in that branch is
>>>>>>>>>>>> quite a big one and it's in a very early stage. Still I want to
>>>>>>>>>>>> encourage you to take a look and work on it. I promise all my help
>>>>>>>>>>>> with the issues!
>>>>>>>>>>>> 
>>>>>>>>>>>> let me know if you have questions!
>>>>>>>>>>>> 
>>>>>>>>>>>> simon
>>>>>>>>>>>> 
>>>>>>>>>>>> On Mon, Mar 19, 2012 at 3:52 PM, Alan Woodward
>>>>>>>>>>>> <alan.woodw...@romseysoftware.co.uk>  wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> Cool, thanks Robert.  I'll take a look at the JIRA ticket.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On 19 Mar 2012, at 14:44, Robert Muir wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Mon, Mar 19, 2012 at 10:38 AM, Alan Woodward
>>>>>>>>>>>>>> <alan.woodw...@romseysoftware.co.uk>  wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> The project I'm currently working on requires the reporting of exact
>>>>>>>>>>>>>>> hit positions from some pretty hairy queries, not all of which are
>>>>>>>>>>>>>>> covered by the existing highlighter modules.  I'm working round this
>>>>>>>>>>>>>>> by translating everything into SpanQueries, and using the getSpans()
>>>>>>>>>>>>>>> method to locate hits (I've extended the Spans interface to make
>>>>>>>>>>>>>>> term offsets available - see
>>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/LUCENE-3826).  This works for
>>>>>>>>>>>>>>> our use-case, but isn't terribly efficient, and obviously isn't
>>>>>>>>>>>>>>> applicable to non-Span queries.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> I've seen a bit of chatter on the list about using term offsets to
>>>>>>>>>>>>>>> provide accurate highlighting in Lucene.  I'm going to have a couple
>>>>>>>>>>>>>>> of weeks free in April, and I thought I might have a go at
>>>>>>>>>>>>>>> implementing this.  Mainly I'm wondering if there's already been
>>>>>>>>>>>>>>> thoughts about how to do it.  My current thoughts are to somehow
>>>>>>>>>>>>>>> extend the Weight and Scorer interface to make term offsets
>>>>>>>>>>>>>>> available; to get highlights for a given set of documents, you'd
>>>>>>>>>>>>>>> essentially run the query again, with a filter on just the documents
>>>>>>>>>>>>>>> you want highlighted, and have a custom collector that gets the term
>>>>>>>>>>>>>>> offsets in place of the scores.
>>>>>>>>>>>>>>> 
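The approach Alan outlines above (run the query again, restricted to the
documents to be highlighted, and collect term offsets in place of scores)
can be sketched in plain Java.  This is a toy model under stated
assumptions, not Lucene code: Posting, the in-memory index map and
collectOffsets are hypothetical names invented for illustration, and the
real Weight, Scorer and Collector classes are not used at all.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model only: none of these types are Lucene classes. */
public class OffsetCollectorSketch {

    /** One occurrence of a term in a document, with character offsets. */
    static final class Posting {
        final int docId;
        final int startOffset;
        final int endOffset;

        Posting(int docId, int startOffset, int endOffset) {
            this.docId = docId;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }
    }

    /**
     * Re-runs a single-term "query" over the toy postings, restricted to the
     * documents we want highlighted, and gathers offsets instead of scores.
     */
    static Map<Integer, List<int[]>> collectOffsets(
            Map<String, List<Posting>> index, String term, Set<Integer> docsToHighlight) {
        Map<Integer, List<int[]>> offsetsByDoc = new LinkedHashMap<>();
        for (Posting p : index.getOrDefault(term, List.of())) {
            if (!docsToHighlight.contains(p.docId)) {
                continue; // the "filter": skip documents that are not being highlighted
            }
            offsetsByDoc.computeIfAbsent(p.docId, id -> new ArrayList<>())
                        .add(new int[] {p.startOffset, p.endOffset});
        }
        return offsetsByDoc;
    }

    public static void main(String[] args) {
        // Hypothetical postings for one term across two documents.
        Map<String, List<Posting>> index = Map.of(
            "lucene", List.of(new Posting(0, 0, 6), new Posting(0, 20, 26), new Posting(2, 5, 11)));

        // Only document 0 is being highlighted, so document 2 is filtered out.
        Map<Integer, List<int[]>> hits = collectOffsets(index, "lucene", Set.of(0));
        hits.forEach((doc, spans) -> spans.forEach(
            s -> System.out.println("doc " + doc + ": [" + s[0] + ", " + s[1] + ")")));
    }
}
```

The point of the sketch is only the shape of the idea: the second pass
touches just the documents on the results page, and the offsets come from
the postings rather than from re-analysing the stored text.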
>>>>>>>>>>>>>> Hi Alan, Simon started some initial work on
>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/LUCENE-2878
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Some work and prototypes were done in a branch, but it might be
>>>>>>>>>>>>>> lagging behind trunk a bit.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Additionally at the time it was first done, I think we didn't yet
>>>>>>>>>>>>>> support offsets in the postings lists.
>>>>>>>>>>>>>> We've since added this and several codecs support it.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> lucidimagination.com
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>>>>>>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>>>>>>>>>>>>>> For additional commands, e-mail: dev-h...@lucene.apache.org