[
https://issues.apache.org/jira/browse/LUCENE-7526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15616239#comment-15616239
]
ASF GitHub Bot commented on LUCENE-7526:
----------------------------------------
GitHub user Timothy055 opened a pull request:
https://github.com/apache/lucene-solr/pull/105
LUCENE-7526 Improvements to UnifiedHighlighter OffsetStrategies
Pull request for LUCENE-7526
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/Timothy055/lucene-solr master
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/lucene-solr/pull/105.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #105
----
commit 02e932c4a6146363680b88f4947a693c6697c955
Author: Timothy Rodriguez <[email protected]>
Date: 2016-09-01T19:23:50Z
Initial fork of PostingsHighlighter for UnifiedHighlighter
commit 9d88411b3985a98851384d78d681431dba710e89
Author: Timothy Rodriguez <[email protected]>
Date: 2016-09-01T23:17:06Z
Initial commit of the UnifiedHighlighter for OSS contribution
commit e45e39bc4b07ea33e4423b264c2fefb9aa08777a
Author: David Smiley <[email protected]>
Date: 2016-09-02T12:45:49Z
Fix misc issues; "ant test" now works. (#1)
commit 046a28ef31acf4cea7d255bbbb4b827e6a714e3d
Author: Timothy Rodriguez <[email protected]>
Date: 2016-09-02T20:58:31Z
Minor refactoring of the AnalysisFieldHighlighter
commit ccd1a2280abd4b48cfef8122696e5d9cfd12920f
Author: David Smiley <[email protected]>
Date: 2016-09-03T12:55:20Z
AbstractFieldHighlighter: order methods more sensibly; renamed a couple.
commit d4714a04a3e41d5e95bbe942b275c32ed69b9c2e
Author: David Smiley <[email protected]>
Date: 2016-09-04T01:03:29Z
Improve javadocs and @lucene.external/internal labeling & scope.
"ant precommit" now passes.
commit e0659f18a59bf2893076da6d7643ff30f2fa5a52
Author: David Smiley <[email protected]>
Date: 2016-09-04T01:25:55Z
Analysis: remove dubious filter() method
commit ccd7ce707bff2c06da89b31853cca9aecea72008
Author: David Smiley <[email protected]>
Date: 2016-09-04T01:44:01Z
getStrictPhraseHelper -> rm "Strict", getHighlightAccuracy -> getFlags, and
only call filterExtractedTerms once.
commit ffc2a22c700b8abcbf87673d5d05bb3659d177c9
Author: David Smiley <[email protected]>
Date: 2016-09-04T15:21:08Z
UnifiedHighlighter round 2 (#2)
* AbstractFieldHighlighter: order methods more sensibly; renamed a couple.
* Improve javadocs and @lucene.external/internal labeling & scope.
"ant precommit" now passes.
* Analysis: remove dubious filter() method
* getStrictPhraseHelper -> rm "Strict", getHighlightAccuracy -> getFlags,
and only call filterExtractedTerms once.
commit 5f95e05595db462d3ab5bffc68c2c92f70875072
Author: David Smiley <[email protected]>
Date: 2016-09-04T16:12:33Z
Refactor: FieldOffsetStrategy
commit 86fb6265fbbdb955ead6d4baf944bf708175715e
Author: David Smiley <[email protected]>
Date: 2016-09-04T16:21:32Z
stop passing maxPassages into highlightFieldForDoc()
commit f6fd80544eae9fab953b94b1e9346c0883f956eb
Author: David Smiley <[email protected]>
Date: 2016-09-04T16:12:33Z
Refactor: FieldOffsetStrategy
commit b335a673c2ce45904890c1e9af7cbfda2bd27b0f
Author: David Smiley <[email protected]>
Date: 2016-09-04T16:21:32Z
stop passing maxPassages into highlightFieldForDoc()
commit 478db9437b92214cbf459f82ba2e3a67c966a150
Author: David Smiley <[email protected]>
Date: 2016-09-04T18:29:44Z
Rename subclasses of FieldOffsetStrategy.
commit dbf4280755c11420a5032445cd618fadb7444b61
Author: David Smiley <[email protected]>
Date: 2016-09-04T18:31:34Z
Re-order and harmonize params on methods called by UH.getFieldHighlighter()
commit f0340e27e61dcda2e11992f08ec07a72fad6c24c
Author: David Smiley <[email protected]>
Date: 2016-09-04T18:53:51Z
FieldHighlighter: harmonize field/param order. And don't apply
maxNoHighlightPasses twice.
commit 817f63c1d48fd523c13b9c40a2ae9b8a4047209a
Author: Timothy Rodriguez <[email protected]>
Date: 2016-09-06T20:43:20Z
Merge of renaming changes
commit 0f644a4f53c1ed4d41d562848f6fe51a87442a75
Author: Timothy Rodriguez <[email protected]>
Date: 2016-09-06T20:54:13Z
add visibility tests
commit 9171f49e117085e7d086267bb73836831ff07f8e
Author: Timothy Rodriguez <[email protected]>
Date: 2016-09-07T14:26:59Z
Add additional extensibility test
commit 7ce488147cb811e15cb6e9125a835171157746f2
Author: Timothy Rodriguez <[email protected]>
Date: 2016-09-28T22:04:15Z
Reduce visibility of MultiTermHighlighting to package protected
commit 2f08465020448592b0e8750db568ade5a9218267
Author: Timothy M. Rodriguez <[email protected]>
Date: 2016-10-11T16:44:29Z
Initial commit that will use memory index to generate offsets enum if the
tokenstream is null
commit 357f3dfb9ace4deef20787af19bc2e5a6b4ff61e
Author: Timothy M. Rodriguez <[email protected]>
Date: 2016-10-11T17:34:51Z
Switched analysis offset strategy to not re-build a tokenstream
commit 64153d288db5714cdaf3726328557f65c1635610
Author: Timothy M. Rodriguez <[email protected]>
Date: 2016-10-11T17:42:12Z
Switched to using chars ref builder
commit f137779b1e1b7e57c4b78652614a04507b9e09e1
Author: Timothy Rodriguez <[email protected]>
Date: 2016-10-21T19:07:48Z
minor cleanup
commit ec814f974db6459eba7aa45bf7a4cdae04e6ad6f
Author: Timothy M. Rodriguez <[email protected]>
Date: 2016-10-21T21:25:57Z
switch to use of a CompositePostingsEnum that wraps the postings of
wildcard matches
commit 955a1e79b5189492fae2c95da39343c29e1cdb25
Author: Timothy M. Rodriguez <[email protected]>
Date: 2016-10-21T21:32:48Z
merge conflicts on PhraseHelper rename
commit d35bd1cffd3c6e2aed67aeccfd03959bf855670a
Author: Timothy Rodriguez <[email protected]>
Date: 2016-10-21T22:13:51Z
minor cleanup of how automata are handled in the FieldOffsetStrategy
commit aa8c92667272e5f397b9566cde05eae7e31bcce5
Author: Timothy Rodriguez <[email protected]>
Date: 2016-10-24T20:03:08Z
Removed most use of TokenStreams except in pure Analysis
commit db42d6a959ca19a77dee3cc7b09496a21c631bd6
Author: Timothy Rodriguez <[email protected]>
Date: 2016-10-24T20:08:25Z
simplified some logic to not use continue statements
commit 657c2a70c4d8ec8ef850e9f66b93cf85ec16f636
Author: Timothy Rodriguez <[email protected]>
Date: 2016-10-24T22:56:17Z
split analysis mode into two, moved all offset sources from readers into
the FieldOffsetStrategy
----
> Improvements to UnifiedHighlighter OffsetStrategies
> ---------------------------------------------------
>
> Key: LUCENE-7526
> URL: https://issues.apache.org/jira/browse/LUCENE-7526
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/highlighter
> Reporter: Timothy M. Rodriguez
> Assignee: David Smiley
> Priority: Minor
> Fix For: 6.4
>
>
> This ticket improves several of the UnifiedHighlighter FieldOffsetStrategies
> by reducing reliance on creating or re-creating TokenStreams.
> The primary changes are as follows:
> * AnalysisOffsetStrategy - split into two offset strategies
> ** MemoryIndexOffsetStrategy - the primary analysis mode that utilizes a
> MemoryIndex for producing Offsets
> ** TokenStreamOffsetStrategy - an offset strategy that avoids creating a
> MemoryIndex. Can only be used if the query distills down to terms and
> automata.
> * TokenStream removal
> ** MemoryIndexOffsetStrategy - previously, a TokenStream was created to fill
> the MemoryIndex and, once it was consumed, a second one was generated by
> uninverting the MemoryIndex back into a TokenStream if automata
> (wildcard/MTQ queries) were involved. This is now avoided, which should save
> memory and avoid a second pass over the data.
> ** TermVectorOffsetStrategy - this was refactored in a similar way to avoid
> generating a TokenStream if automata are involved.
> ** PostingsWithTermVectorsOffsetStrategy - similar refactoring
> * CompositePostingsEnum - aggregates several underlying PostingsEnums for
> wildcard/MTQ queries. This should improve relevancy by providing unified
> metrics for a wildcard across all its term matches.
> * Added a HighlightFlag for enabling the newly separated
> TokenStreamOffsetStrategy, since it can adversely affect passage relevancy.
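
To illustrate the aggregation idea behind CompositePostingsEnum, here is a minimal, self-contained sketch. It is not the code from this pull request and does not use Lucene's PostingsEnum API: the class CompositeOffsetsSketch, its Offset and Cursor types, and the merge() method are hypothetical names invented for this example. It assumes each matched term contributes a list of offsets already sorted by start offset, and it merges those lists into one ordered stream so a wildcard can be treated as a single logical term.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical illustration only; not Lucene's CompositePostingsEnum. */
class CompositeOffsetsSketch {

    /** One occurrence of a term: character offsets into the field text. */
    static final class Offset {
        final int start;
        final int end;
        final String term;

        Offset(int start, int end, String term) {
            this.start = start;
            this.end = end;
            this.term = term;
        }
    }

    /** Cursor over one term's occurrences (assumed sorted by start offset). */
    private static final class Cursor {
        final Iterator<Offset> it;
        Offset current;

        Cursor(List<Offset> offsets) {
            this.it = offsets.iterator();
            this.current = it.next(); // caller guarantees a non-empty list
        }

        /** Moves to the next occurrence; returns false when exhausted. */
        boolean advance() {
            if (it.hasNext()) {
                current = it.next();
                return true;
            }
            return false;
        }
    }

    /**
     * Merges the per-term offset lists of every term matched by a wildcard
     * into a single list ordered by start offset. The size of the result is
     * the combined frequency of the wildcard in the document.
     */
    static List<Offset> merge(List<List<Offset>> perTerm) {
        PriorityQueue<Cursor> queue = new PriorityQueue<>(
                Comparator.comparingInt((Cursor c) -> c.current.start));
        for (List<Offset> offsets : perTerm) {
            if (!offsets.isEmpty()) {
                queue.add(new Cursor(offsets));
            }
        }
        List<Offset> merged = new ArrayList<>();
        while (!queue.isEmpty()) {
            Cursor top = queue.poll();   // cursor with the smallest start offset
            merged.add(top.current);
            if (top.advance()) {
                queue.add(top);          // re-insert at its new position
            }
        }
        return merged;
    }
}
```

Because the merged stream carries one combined frequency rather than one frequency per matching term, a passage scorer that consumes it would see the wildcard as a single term, which is the "unified metrics" effect described above.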