[
https://issues.apache.org/jira/browse/LUCENE-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15470875#comment-15470875
]
ASF GitHub Bot commented on LUCENE-7438:
----------------------------------------
GitHub user Timothy055 opened a pull request:
https://github.com/apache/lucene-solr/pull/79
LUCENE-7438 UnifiedHighlighter
Initial pull request for
[LUCENE-7438](https://issues.apache.org/jira/browse/LUCENE-7438)
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/Timothy055/lucene-solr master
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/lucene-solr/pull/79.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #79
----
commit 02e932c4a6146363680b88f4947a693c6697c955
Author: Timothy Rodriguez <[email protected]>
Date: 2016-09-01T19:23:50Z
Initial fork of PostingsHighlighter for UnifiedHighlighter
commit 9d88411b3985a98851384d78d681431dba710e89
Author: Timothy Rodriguez <[email protected]>
Date: 2016-09-01T23:17:06Z
Initial commit of the UnifiedHighlighter for OSS contribution
commit e45e39bc4b07ea33e4423b264c2fefb9aa08777a
Author: David Smiley <[email protected]>
Date: 2016-09-02T12:45:49Z
Fix misc issues; "ant test" now works. (#1)
commit 046a28ef31acf4cea7d255bbbb4b827e6a714e3d
Author: Timothy Rodriguez <[email protected]>
Date: 2016-09-02T20:58:31Z
Minor refactoring of the AnalysisFieldHighlighter
commit ccd1a2280abd4b48cfef8122696e5d9cfd12920f
Author: David Smiley <[email protected]>
Date: 2016-09-03T12:55:20Z
AbstractFieldHighlighter: order methods more sensibly; renamed a couple.
commit d4714a04a3e41d5e95bbe942b275c32ed69b9c2e
Author: David Smiley <[email protected]>
Date: 2016-09-04T01:03:29Z
Improve javadocs and @lucene.external/internal labeling & scope.
"ant precommit" now passes.
commit e0659f18a59bf2893076da6d7643ff30f2fa5a52
Author: David Smiley <[email protected]>
Date: 2016-09-04T01:25:55Z
Analysis: remove dubious filter() method
commit ccd7ce707bff2c06da89b31853cca9aecea72008
Author: David Smiley <[email protected]>
Date: 2016-09-04T01:44:01Z
getStrictPhraseHelper -> rm "Strict", getHighlightAccuracy -> getFlags, and
only call filterExtractedTerms once.
commit ffc2a22c700b8abcbf87673d5d05bb3659d177c9
Author: David Smiley <[email protected]>
Date: 2016-09-04T15:21:08Z
UnifiedHighlighter round 2 (#2)
* AbstractFieldHighlighter: order methods more sensibly; renamed a couple.
* Improve javadocs and @lucene.external/internal labeling & scope.
"ant precommit" now passes.
* Analysis: remove dubious filter() method
* getStrictPhraseHelper -> rm "Strict", getHighlightAccuracy -> getFlags,
and only call filterExtractedTerms once.
commit 5f95e05595db462d3ab5bffc68c2c92f70875072
Author: David Smiley <[email protected]>
Date: 2016-09-04T16:12:33Z
Refactor: FieldOffsetStrategy
commit 86fb6265fbbdb955ead6d4baf944bf708175715e
Author: David Smiley <[email protected]>
Date: 2016-09-04T16:21:32Z
stop passing maxPassages into highlightFieldForDoc()
commit f6fd80544eae9fab953b94b1e9346c0883f956eb
Author: David Smiley <[email protected]>
Date: 2016-09-04T16:12:33Z
Refactor: FieldOffsetStrategy
commit b335a673c2ce45904890c1e9af7cbfda2bd27b0f
Author: David Smiley <[email protected]>
Date: 2016-09-04T16:21:32Z
stop passing maxPassages into highlightFieldForDoc()
commit 478db9437b92214cbf459f82ba2e3a67c966a150
Author: David Smiley <[email protected]>
Date: 2016-09-04T18:29:44Z
Rename subclasses of FieldOffsetStrategy.
commit dbf4280755c11420a5032445cd618fadb7444b61
Author: David Smiley <[email protected]>
Date: 2016-09-04T18:31:34Z
Re-order and harmonize params on methods called by UH.getFieldHighlighter()
commit f0340e27e61dcda2e11992f08ec07a72fad6c24c
Author: David Smiley <[email protected]>
Date: 2016-09-04T18:53:51Z
FieldHighlighter: harmonize field/param order. And don't apply
maxNoHighlightPasses twice.
commit 817f63c1d48fd523c13b9c40a2ae9b8a4047209a
Author: Timothy Rodriguez <[email protected]>
Date: 2016-09-06T20:43:20Z
Merge of renaming changes
commit 0f644a4f53c1ed4d41d562848f6fe51a87442a75
Author: Timothy Rodriguez <[email protected]>
Date: 2016-09-06T20:54:13Z
add visibility tests
commit 9171f49e117085e7d086267bb73836831ff07f8e
Author: Timothy Rodriguez <[email protected]>
Date: 2016-09-07T14:26:59Z
Add additional extensibility test
----
> UnifiedHighlighter
> ------------------
>
> Key: LUCENE-7438
> URL: https://issues.apache.org/jira/browse/LUCENE-7438
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/highlighter
> Affects Versions: 6.2
> Reporter: Timothy M. Rodriguez
> Assignee: David Smiley
>
> The UnifiedHighlighter is an evolution of the PostingsHighlighter that is
> able to highlight using offsets in either postings, term vectors, or from
> analysis (a TokenStream). Lucene’s existing highlighters are mostly
> demarcated along offset source lines, whereas here it is unified -- hence
> this proposed name. In this highlighter, the offset source strategy is
> separated from the core highlighting functionality. The UnifiedHighlighter
> further improves on the PostingsHighlighter’s design by supporting accurate
> phrase highlighting using an approach similar to the standard highlighter’s
> WeightedSpanTermExtractor. The next major improvement is a hybrid offset
> source strategy that utilizes postings and “light” term vectors (i.e. just the
> terms) for highlighting multi-term queries (wildcards) without resorting to
> analysis. Phrase highlighting and wildcard highlighting can both be disabled
> if you’d rather highlight a little faster albeit not as accurately reflecting
> the query.
> We’ve benchmarked an earlier version of this highlighter comparing it to the
> other highlighters and the results were exciting! It’s tempting to share
> those results but it’s definitely due for another benchmark, so we’ll work on
> that. Performance was the main motivator for creating the UnifiedHighlighter,
> as the standard Highlighter (the only one meeting Bloomberg Law’s accuracy
> requirements) wasn’t fast enough, even with term vectors along with several
> improvements we contributed back, and even after we forked it to highlight in
> multiple threads.
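
To illustrate the design point in the description above (the offset source is separated from the core highlighting logic), here is a minimal, self-contained Java sketch. It does not use Lucene and is not the UnifiedHighlighter API: `OffsetSource`, `ReanalysisSource`, `PrecomputedSource`, and `highlight` are hypothetical names invented for this example. The commit log mentions a `FieldOffsetStrategy` class in the actual patch, but its real signatures are not reproduced here. The sketch assumes that offsets are non-overlapping and that terms are matched exactly after lowercasing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class OffsetStrategySketch {

    /** A match span: [start, end) character offsets into the field text. */
    static final class Offset {
        final int start;
        final int end;
        Offset(int start, int end) { this.start = start; this.end = end; }
    }

    /** Hypothetical abstraction over where offsets come from. */
    interface OffsetSource {
        List<Offset> offsets(String text, Set<String> queryTerms);
    }

    /** Stands in for "analysis": re-tokenizes the text at highlight time. */
    static final class ReanalysisSource implements OffsetSource {
        public List<Offset> offsets(String text, Set<String> queryTerms) {
            List<Offset> out = new ArrayList<>();
            int i = 0;
            while (i < text.length()) {
                // Skip separators, then consume one run of letters/digits as a token.
                while (i < text.length() && !Character.isLetterOrDigit(text.charAt(i))) i++;
                int start = i;
                while (i < text.length() && Character.isLetterOrDigit(text.charAt(i))) i++;
                if (i > start) {
                    String token = text.substring(start, i).toLowerCase(Locale.ROOT);
                    if (queryTerms.contains(token)) out.add(new Offset(start, i));
                }
            }
            return out;
        }
    }

    /** Stands in for postings or term vectors: offsets were stored at index time. */
    static final class PrecomputedSource implements OffsetSource {
        private final Map<String, List<Offset>> index;
        PrecomputedSource(Map<String, List<Offset>> index) { this.index = index; }
        public List<Offset> offsets(String text, Set<String> queryTerms) {
            List<Offset> out = new ArrayList<>();
            for (String term : queryTerms) {
                out.addAll(index.getOrDefault(term, List.of()));
            }
            // Merge per-term lists into document order.
            out.sort((a, b) -> Integer.compare(a.start, b.start));
            return out;
        }
    }

    /** Core highlighting: identical regardless of which source supplied the offsets. */
    static String highlight(String text, Set<String> queryTerms, OffsetSource source) {
        StringBuilder sb = new StringBuilder();
        int last = 0;
        for (Offset o : source.offsets(text, queryTerms)) {
            sb.append(text, last, o.start)
              .append("<b>").append(text, o.start, o.end).append("</b>");
            last = o.end;
        }
        return sb.append(text.substring(last)).toString();
    }

    public static void main(String[] args) {
        String text = "Lucene highlights query terms";
        Set<String> terms = Set.of("query", "terms");
        Map<String, List<Offset>> index = Map.of(
            "query", List.of(new Offset(18, 23)),
            "terms", List.of(new Offset(24, 29)));
        System.out.println(highlight(text, terms, new ReanalysisSource()));
        System.out.println(highlight(text, terms, new PrecomputedSource(index)));
    }
}
```

Both calls in `main` print `Lucene highlights <b>query</b> <b>terms</b>`: swapping the offset source changes how offsets are obtained, not how the passage is marked up.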