[ 
https://issues.apache.org/jira/browse/LUCENE-7526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15616239#comment-15616239
 ] 

ASF GitHub Bot commented on LUCENE-7526:
----------------------------------------

GitHub user Timothy055 opened a pull request:

    https://github.com/apache/lucene-solr/pull/105

    LUCENE-7526 Improvements to UnifiedHighlighter OffsetStrategies

    Pull request for LUCENE-7526

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/Timothy055/lucene-solr master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/lucene-solr/pull/105.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #105
    
----
commit 02e932c4a6146363680b88f4947a693c6697c955
Author: Timothy Rodriguez <trodrigue...@bloomberg.net>
Date:   2016-09-01T19:23:50Z

    Initial fork of PostingsHighlighter for UnifiedHighlighter

commit 9d88411b3985a98851384d78d681431dba710e89
Author: Timothy Rodriguez <trodrigue...@bloomberg.net>
Date:   2016-09-01T23:17:06Z

    Initial commit of the UnifiedHighlighter for OSS contribution

commit e45e39bc4b07ea33e4423b264c2fefb9aa08777a
Author: David Smiley <david.w.smi...@gmail.com>
Date:   2016-09-02T12:45:49Z

    Fix misc issues; "ant test" now works. (#1)

commit 046a28ef31acf4cea7d255bbbb4b827e6a714e3d
Author: Timothy Rodriguez <trodrigue...@bloomberg.net>
Date:   2016-09-02T20:58:31Z

    Minor refactoring of the AnalysisFieldHighlighter

commit ccd1a2280abd4b48cfef8122696e5d9cfd12920f
Author: David Smiley <dsmi...@apache.org>
Date:   2016-09-03T12:55:20Z

    AbstractFieldHighlighter: order methods more sensibly; renamed a couple.

commit d4714a04a3e41d5e95bbe942b275c32ed69b9c2e
Author: David Smiley <dsmi...@apache.org>
Date:   2016-09-04T01:03:29Z

    Improve javadocs and @lucene.external/internal labeling & scope.
    "ant precommit" now passes.

commit e0659f18a59bf2893076da6d7643ff30f2fa5a52
Author: David Smiley <dsmi...@apache.org>
Date:   2016-09-04T01:25:55Z

    Analysis: remove dubious filter() method

commit ccd7ce707bff2c06da89b31853cca9aecea72008
Author: David Smiley <dsmi...@apache.org>
Date:   2016-09-04T01:44:01Z

    getStrictPhraseHelper -> rm "Strict", getHighlightAccuracy -> getFlags, and
    only call filterExtractedTerms once.

commit ffc2a22c700b8abcbf87673d5d05bb3659d177c9
Author: David Smiley <david.w.smi...@gmail.com>
Date:   2016-09-04T15:21:08Z

    UnifiedHighlighter round 2 (#2)
    
    * AbstractFieldHighlighter: order methods more sensibly; renamed a couple.
    
    * Improve javadocs and @lucene.external/internal labeling & scope.
    "ant precommit" now passes.
    
    * Analysis: remove dubious filter() method
    
    * getStrictPhraseHelper -> rm "Strict", getHighlightAccuracy -> getFlags,
    and only call filterExtractedTerms once.

commit 5f95e05595db462d3ab5bffc68c2c92f70875072
Author: David Smiley <dsmi...@apache.org>
Date:   2016-09-04T16:12:33Z

    Refactor: FieldOffsetStrategy

commit 86fb6265fbbdb955ead6d4baf944bf708175715e
Author: David Smiley <dsmi...@apache.org>
Date:   2016-09-04T16:21:32Z

    stop passing maxPassages into highlightFieldForDoc()

commit f6fd80544eae9fab953b94b1e9346c0883f956eb
Author: David Smiley <dsmi...@apache.org>
Date:   2016-09-04T16:12:33Z

    Refactor: FieldOffsetStrategy

commit b335a673c2ce45904890c1e9af7cbfda2bd27b0f
Author: David Smiley <dsmi...@apache.org>
Date:   2016-09-04T16:21:32Z

    stop passing maxPassages into highlightFieldForDoc()

commit 478db9437b92214cbf459f82ba2e3a67c966a150
Author: David Smiley <dsmi...@apache.org>
Date:   2016-09-04T18:29:44Z

    Rename subclasses of FieldOffsetStrategy.

commit dbf4280755c11420a5032445cd618fadb7444b61
Author: David Smiley <dsmi...@apache.org>
Date:   2016-09-04T18:31:34Z

    Re-order and harmonize params on methods called by UH.getFieldHighlighter()

commit f0340e27e61dcda2e11992f08ec07a72fad6c24c
Author: David Smiley <dsmi...@apache.org>
Date:   2016-09-04T18:53:51Z

    FieldHighlighter: harmonize field/param order. And don't apply
    maxNoHighlightPasses twice.

commit 817f63c1d48fd523c13b9c40a2ae9b8a4047209a
Author: Timothy Rodriguez <trodrigue...@bloomberg.net>
Date:   2016-09-06T20:43:20Z

    Merge of renaming changes

commit 0f644a4f53c1ed4d41d562848f6fe51a87442a75
Author: Timothy Rodriguez <trodrigue...@bloomberg.net>
Date:   2016-09-06T20:54:13Z

    add visibility tests

commit 9171f49e117085e7d086267bb73836831ff07f8e
Author: Timothy Rodriguez <trodrigue...@bloomberg.net>
Date:   2016-09-07T14:26:59Z

    Add additional extensibility test

commit 7ce488147cb811e15cb6e9125a835171157746f2
Author: Timothy Rodriguez <trodrigue...@bloomberg.net>
Date:   2016-09-28T22:04:15Z

    Reduce visibility of MultiTermHighlighting to package protected

commit 2f08465020448592b0e8750db568ade5a9218267
Author: Timothy M. Rodriguez <timothy.rodrig...@gmail.com>
Date:   2016-10-11T16:44:29Z

    Initial commit that will use memory index to generate offsets enum if the
    tokenstream is null

commit 357f3dfb9ace4deef20787af19bc2e5a6b4ff61e
Author: Timothy M. Rodriguez <timothy.rodrig...@gmail.com>
Date:   2016-10-11T17:34:51Z

    Switched analysis offset strategy to not re-build a tokenstream

commit 64153d288db5714cdaf3726328557f65c1635610
Author: Timothy M. Rodriguez <timothy.rodrig...@gmail.com>
Date:   2016-10-11T17:42:12Z

    Switched to using chars ref builder

commit f137779b1e1b7e57c4b78652614a04507b9e09e1
Author: Timothy Rodriguez <trodrigue...@bloomberg.net>
Date:   2016-10-21T19:07:48Z

    minor cleanup

commit ec814f974db6459eba7aa45bf7a4cdae04e6ad6f
Author: Timothy M. Rodriguez <timothy.rodrig...@gmail.com>
Date:   2016-10-21T21:25:57Z

    switch to use of a CompositePostingsEnum that wraps the postings of
    wildcard matches

commit 955a1e79b5189492fae2c95da39343c29e1cdb25
Author: Timothy M. Rodriguez <timothy.rodrig...@gmail.com>
Date:   2016-10-21T21:32:48Z

    merge conflicts on PhraseHelper rename

commit d35bd1cffd3c6e2aed67aeccfd03959bf855670a
Author: Timothy Rodriguez <trodrigue...@bloomberg.net>
Date:   2016-10-21T22:13:51Z

    minor cleanup of how automata are handled in the FieldOffsetStrategy

commit aa8c92667272e5f397b9566cde05eae7e31bcce5
Author: Timothy Rodriguez <trodrigue...@bloomberg.net>
Date:   2016-10-24T20:03:08Z

    Removed most use of TokenStreams except in pure Analysis

commit db42d6a959ca19a77dee3cc7b09496a21c631bd6
Author: Timothy Rodriguez <trodrigue...@bloomberg.net>
Date:   2016-10-24T20:08:25Z

    simplified some logic to not use continue statements

commit 657c2a70c4d8ec8ef850e9f66b93cf85ec16f636
Author: Timothy Rodriguez <trodrigue...@bloomberg.net>
Date:   2016-10-24T22:56:17Z

    split analysis mode into two, moved all offset sources from readers into
    the FieldOffsetStrategy

----


> Improvements to UnifiedHighlighter OffsetStrategies
> ---------------------------------------------------
>
>                 Key: LUCENE-7526
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7526
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/highlighter
>            Reporter: Timothy M. Rodriguez
>            Assignee: David Smiley
>            Priority: Minor
>             Fix For: 6.4
>
>
> This ticket improves several of the UnifiedHighlighter FieldOffsetStrategies
> by reducing reliance on creating or re-creating TokenStreams.
> The primary changes are as follows:
> * AnalysisOffsetStrategy - split into two offset strategies
>   ** MemoryIndexOffsetStrategy - the primary analysis mode, which uses a
>      MemoryIndex to produce offsets.
>   ** TokenStreamOffsetStrategy - an offset strategy that avoids creating a
>      MemoryIndex. It can only be used if the query distills down to terms
>      and automata.
> * TokenStream removal
>   ** MemoryIndexOffsetStrategy - previously, a TokenStream was created to
>      fill the memory index and, once it was consumed, a new one was
>      generated by uninverting the MemoryIndex back into a TokenStream if
>      automata (wildcard/MTQ queries) were involved. This is now avoided,
>      which should save memory and avoid a second pass over the data.
>   ** TermVectorOffsetStrategy - refactored in a similar way to avoid
>      generating a TokenStream if automata are involved.
>   ** PostingsWithTermVectorsOffsetStrategy - similar refactoring.
> * CompositePostingsEnum - aggregates several underlying PostingsEnums for
>   wildcard/MTQ queries. This should improve relevancy by providing unified
>   metrics for a wildcard across all its term matches.
> * Added a HighlightFlag for enabling the newly separated
>   TokenStreamOffsetStrategy, since it can adversely affect passage relevancy.
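
For readers unfamiliar with the idea behind CompositePostingsEnum, here is a
rough, library-free sketch of aggregating the offsets of several terms matched
by one wildcard into a single ordered stream. This is not the code from the
pull request and does not use Lucene's PostingsEnum API; the class name
CompositeOffsetsSketch, the Offset type, and the methods merge() and
totalFreq() are all hypothetical. It also assumes that "unified metrics" means
roughly a combined frequency across the matching terms.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Illustrative only: hypothetical names, not the Lucene CompositePostingsEnum. */
public class CompositeOffsetsSketch {

    /** One occurrence of a term in the text, as character offsets. */
    public static final class Offset {
        public final int start;
        public final int end;

        public Offset(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    /** Cursor over one term's occurrences, assumed sorted by start offset. */
    private static final class Cursor {
        private final List<Offset> offsets;
        private int index = 0;

        Cursor(List<Offset> offsets) {
            this.offsets = offsets;
        }

        Offset current() {
            return offsets.get(index);
        }

        /** Moves to the next occurrence; returns false when exhausted. */
        boolean advance() {
            return ++index < offsets.size();
        }
    }

    /** Merges the per-term offset lists into one list ordered by start offset. */
    public static List<Offset> merge(List<List<Offset>> perTermOffsets) {
        PriorityQueue<Cursor> queue = new PriorityQueue<>(
            Comparator.comparingInt((Cursor c) -> c.current().start));
        for (List<Offset> offsets : perTermOffsets) {
            if (!offsets.isEmpty()) {
                queue.add(new Cursor(offsets));
            }
        }
        List<Offset> merged = new ArrayList<>();
        while (!queue.isEmpty()) {
            Cursor top = queue.poll();      // cursor with the smallest start offset
            merged.add(top.current());
            if (top.advance()) {
                queue.add(top);             // re-insert with its new head element
            }
        }
        return merged;
    }

    /** Combined frequency: total occurrences across all matching terms. */
    public static int totalFreq(List<List<Offset>> perTermOffsets) {
        int total = 0;
        for (List<Offset> offsets : perTermOffsets) {
            total += offsets.size();
        }
        return total;
    }
}
```

With a wildcard such as "foo*" matching the terms "food" and "fool", passing
each term's offset list to merge() yields one stream in document order, and
totalFreq() treats the wildcard as a single unit for scoring purposes instead
of scoring each expanded term separately.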



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

