alan,

I merged the branch manually and created a new branch from it. it's
here: https://svn.apache.org/repos/asf/lucene/dev/branches/LUCENE-2878
the branch compiles but still has lots of nocommits / todos

if you have questions please ask I will help as much as I can

simon

On Tue, May 22, 2012 at 8:38 PM, Alan Woodward
<[email protected]> wrote:
> Hey, I reckon I can have a decent go at getting the branch updated.  Is it 
> best to work this out as a patch applying to trunk?  Any patch that merges in 
> all the trunk changes to the branch is going to be absolutely massive…
>
> On 17 May 2012, at 13:15, Simon Willnauer wrote:
>
>> ok man. I will try to merge up the branch. I tell you this is going to
>> be messy and it might not compile but I will make it reasonable so you
>> can start.
>>
>> simon
>>
>> On Thu, May 17, 2012 at 8:03 AM, Alan Woodward
>> <[email protected]> wrote:
>>> Sorry for vanishing for so long, life unexpectedly caught up with me...  
>>> I'm going to have some time to look at this again next week though, if 
>>> you're interested in picking it up again.
>>>
>>> On 21 Mar 2012, at 09:02, Alan Woodward wrote:
>>>
>>>> That would be great, thanks!  I had a go at merging it last night, but 
>>>> there are a *lot* of changes that I haven't got my head round yet, so it 
>>>> was getting pretty messy.
>>>>
>>>> On 21 Mar 2012, at 08:49, Simon Willnauer wrote:
>>>>
>>>>> Alan, if you want I can just merge the branch up next week and we
>>>>> iterate from there?
>>>>>
>>>>> simon
>>>>>
>>>>> On Tue, Mar 20, 2012 at 12:34 PM, Erick Erickson
>>>>> <[email protected]> wrote:
>>>>>> Yep, the first challenge is always getting the old patch(es) to 
>>>>>> apply.....
>>>>>>
>>>>>> On Tue, Mar 20, 2012 at 4:09 AM, Alan Woodward
>>>>>> <[email protected]> wrote:
>>>>>>> Thanks for all the offers of help!  It looks as though most of the hard 
>>>>>>> work has already been done, which is exactly where I like to pick up 
>>>>>>> projects.  :-)
>>>>>>>
>>>>>>> Maybe the best place to start would be for me to rebase the branch 
>>>>>>> against trunk, and see what still fits?  I think there have been some 
>>>>>>> fairly major changes in the internals since July last year.
>>>>>>>
>>>>>>> On 19 Mar 2012, at 17:07, Mike Sokolov wrote:
>>>>>>>
>>>>>>>> I posted a patch with a Collector somewhat similar to what you 
>>>>>>>> described, Alan - it's attached to one of the sub-issues 
>>>>>>>> https://issues.apache.org/jira/browse/LUCENE-3318.   It is in a fairly 
>>>>>>>> complete "alpha" state, but has seen no production use of course, 
>>>>>>>> since it relies on the remainder of the unfinished work in that 
>>>>>>>> branch.  It works by creating a TokenStream based on match positions 
>>>>>>>> returned from the query and passing that to the existing Highlighter.  
>>>>>>>> Please feel free to get in touch if you decide to look into that and 
>>>>>>>> have questions.
>>>>>>>>
>>>>>>>>
>>>>>>>> -Mike
>>>>>>>>
>>>>>>>> On 03/19/2012 11:51 AM, Simon Willnauer wrote:
>>>>>>>>> On Mon, Mar 19, 2012 at 4:50 PM, Uwe Schindler <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Have you marked that for GSOC? Would be a good idea!
>>>>>>>>>>
>>>>>>>>> yes I did
>>>>>>>>>
>>>>>>>>>> -----
>>>>>>>>>> Uwe Schindler
>>>>>>>>>> H.-H.-Meier-Allee 63, D-28213 Bremen
>>>>>>>>>> http://www.thetaphi.de
>>>>>>>>>> eMail: [email protected]
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>> From: Simon Willnauer [mailto:[email protected]]
>>>>>>>>>>> Sent: Monday, March 19, 2012 4:43 PM
>>>>>>>>>>> To: [email protected]
>>>>>>>>>>> Subject: Re: Using term offsets for hit highlighting
>>>>>>>>>>>
>>>>>>>>>>> Alan, you made my day!
>>>>>>>>>>>
>>>>>>>>>>> The branch is kind of outdated but I looked at it lately and I can
>>>>>>>>>>> certainly help to get it up to speed. The feature in that branch is
>>>>>>>>>>> quite a big one and it's in a very early stage. Still I want to
>>>>>>>>>>> encourage you to take a look and work on it. I promise all my help
>>>>>>>>>>> with the issues!
>>>>>>>>>>>
>>>>>>>>>>> let me know if you have questions!
>>>>>>>>>>>
>>>>>>>>>>> simon
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Mar 19, 2012 at 3:52 PM, Alan Woodward
>>>>>>>>>>> <[email protected]>  wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Cool, thanks Robert.  I'll take a look at the JIRA ticket.
>>>>>>>>>>>>
>>>>>>>>>>>> On 19 Mar 2012, at 14:44, Robert Muir wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Mar 19, 2012 at 10:38 AM, Alan Woodward
>>>>>>>>>>>>> <[email protected]>  wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The project I'm currently working on requires the reporting of exact
>>>>>>>>>>>>>> hit positions from some pretty hairy queries, not all of which are
>>>>>>>>>>>>>> covered by the existing highlighter modules.  I'm working round this
>>>>>>>>>>>>>> by translating everything into SpanQueries, and using the getSpans()
>>>>>>>>>>>>>> method to locate hits (I've extended the Spans interface to make
>>>>>>>>>>>>>> term offsets available - see
>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/LUCENE-3826).  This works for
>>>>>>>>>>>>>> our use-case, but isn't terribly efficient, and obviously isn't
>>>>>>>>>>>>>> applicable to non-Span queries.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I've seen a bit of chatter on the list about using term offsets to
>>>>>>>>>>>>>> provide accurate highlighting in Lucene.  I'm going to have a couple
>>>>>>>>>>>>>> of weeks free in April, and I thought I might have a go at
>>>>>>>>>>>>>> implementing this.  Mainly I'm wondering if there's already been
>>>>>>>>>>>>>> thoughts about how to do it.  My current thoughts are to somehow
>>>>>>>>>>>>>> extend the Weight and Scorer interface to make term offsets
>>>>>>>>>>>>>> available; to get highlights for a given set of documents, you'd
>>>>>>>>>>>>>> essentially run the query again, with a filter on just the documents
>>>>>>>>>>>>>> you want highlighted, and have a custom collector that gets the term
>>>>>>>>>>>>>> offsets in place of the scores.
>>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Alan, Simon started some initial work on
>>>>>>>>>>>>> https://issues.apache.org/jira/browse/LUCENE-2878
>>>>>>>>>>>>>
>>>>>>>>>>>>> Some work and prototypes were done in a branch, but it might be
>>>>>>>>>>>>> lagging behind trunk a bit.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Additionally at the time it was first done, I think we didn't yet
>>>>>>>>>>>>> support offsets in the postings lists.
>>>>>>>>>>>>> We've since added this and several codecs support it.
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> lucidimagination.com
>>>>>>>>>>>>>
>>>>>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>>>>>> To unsubscribe, e-mail: [email protected] For
>>>>>>>>>>>>> additional commands, e-mail: [email protected]

