[ https://issues.apache.org/jira/browse/LUCENE-7526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15616239#comment-15616239 ]
ASF GitHub Bot commented on LUCENE-7526:
----------------------------------------

GitHub user Timothy055 opened a pull request:

    https://github.com/apache/lucene-solr/pull/105

    LUCENE-7526 Improvements to UnifiedHighlighter OffsetStrategies

    Pull request for LUCENE-7526

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/Timothy055/lucene-solr master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/lucene-solr/pull/105.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #105

----
commit 02e932c4a6146363680b88f4947a693c6697c955
Author: Timothy Rodriguez <trodrigue...@bloomberg.net>
Date:   2016-09-01T19:23:50Z
    Initial fork of PostingsHighlighter for UnifiedHighlighter
commit 9d88411b3985a98851384d78d681431dba710e89
Author: Timothy Rodriguez <trodrigue...@bloomberg.net>
Date:   2016-09-01T23:17:06Z
    Initial commit of the UnifiedHighlighter for OSS contribution
commit e45e39bc4b07ea33e4423b264c2fefb9aa08777a
Author: David Smiley <david.w.smi...@gmail.com>
Date:   2016-09-02T12:45:49Z
    Fix misc issues; "ant test" now works. (#1)
commit 046a28ef31acf4cea7d255bbbb4b827e6a714e3d
Author: Timothy Rodriguez <trodrigue...@bloomberg.net>
Date:   2016-09-02T20:58:31Z
    Minor refactoring of the AnalysisFieldHighlighter
commit ccd1a2280abd4b48cfef8122696e5d9cfd12920f
Author: David Smiley <dsmi...@apache.org>
Date:   2016-09-03T12:55:20Z
    AbstractFieldHighlighter: order methods more sensibly; renamed a couple.
commit d4714a04a3e41d5e95bbe942b275c32ed69b9c2e
Author: David Smiley <dsmi...@apache.org>
Date:   2016-09-04T01:03:29Z
    Improve javadocs and @lucene.external/internal labeling & scope. "ant precommit" now passes.
commit e0659f18a59bf2893076da6d7643ff30f2fa5a52
Author: David Smiley <dsmi...@apache.org>
Date:   2016-09-04T01:25:55Z
    Analysis: remove dubious filter() method
commit ccd7ce707bff2c06da89b31853cca9aecea72008
Author: David Smiley <dsmi...@apache.org>
Date:   2016-09-04T01:44:01Z
    getStrictPhraseHelper -> rm "Strict", getHighlightAccuracy -> getFlags, and only call filterExtractedTerms once.
commit ffc2a22c700b8abcbf87673d5d05bb3659d177c9
Author: David Smiley <david.w.smi...@gmail.com>
Date:   2016-09-04T15:21:08Z
    UnifiedHighlighter round 2 (#2)
    * AbstractFieldHighlighter: order methods more sensibly; renamed a couple.
    * Improve javadocs and @lucene.external/internal labeling & scope. "ant precommit" now passes.
    * Analysis: remove dubious filter() method
    * getStrictPhraseHelper -> rm "Strict", getHighlightAccuracy -> getFlags, and only call filterExtractedTerms once.
commit 5f95e05595db462d3ab5bffc68c2c92f70875072
Author: David Smiley <dsmi...@apache.org>
Date:   2016-09-04T16:12:33Z
    Refactor: FieldOffsetStrategy
commit 86fb6265fbbdb955ead6d4baf944bf708175715e
Author: David Smiley <dsmi...@apache.org>
Date:   2016-09-04T16:21:32Z
    stop passing maxPassages into highlightFieldForDoc()
commit f6fd80544eae9fab953b94b1e9346c0883f956eb
Author: David Smiley <dsmi...@apache.org>
Date:   2016-09-04T16:12:33Z
    Refactor: FieldOffsetStrategy
commit b335a673c2ce45904890c1e9af7cbfda2bd27b0f
Author: David Smiley <dsmi...@apache.org>
Date:   2016-09-04T16:21:32Z
    stop passing maxPassages into highlightFieldForDoc()
commit 478db9437b92214cbf459f82ba2e3a67c966a150
Author: David Smiley <dsmi...@apache.org>
Date:   2016-09-04T18:29:44Z
    Rename subclasses of FieldOffsetStrategy.
commit dbf4280755c11420a5032445cd618fadb7444b61
Author: David Smiley <dsmi...@apache.org>
Date:   2016-09-04T18:31:34Z
    Re-order and harmonize params on methods called by UH.getFieldHighlighter()
commit f0340e27e61dcda2e11992f08ec07a72fad6c24c
Author: David Smiley <dsmi...@apache.org>
Date:   2016-09-04T18:53:51Z
    FieldHighlighter: harmonize field/param order. And don't apply maxNoHighlightPasses twice.
commit 817f63c1d48fd523c13b9c40a2ae9b8a4047209a
Author: Timothy Rodriguez <trodrigue...@bloomberg.net>
Date:   2016-09-06T20:43:20Z
    Merge of renaming changes
commit 0f644a4f53c1ed4d41d562848f6fe51a87442a75
Author: Timothy Rodriguez <trodrigue...@bloomberg.net>
Date:   2016-09-06T20:54:13Z
    add visibility tests
commit 9171f49e117085e7d086267bb73836831ff07f8e
Author: Timothy Rodriguez <trodrigue...@bloomberg.net>
Date:   2016-09-07T14:26:59Z
    ADd additional extensibility test
commit 7ce488147cb811e15cb6e9125a835171157746f2
Author: Timothy Rodriguez <trodrigue...@bloomberg.net>
Date:   2016-09-28T22:04:15Z
    Reduce visibility of MultiTermHighlighting to package protected
commit 2f08465020448592b0e8750db568ade5a9218267
Author: Timothy M. Rodriguez <timothy.rodrig...@gmail.com>
Date:   2016-10-11T16:44:29Z
    Initial commit that will use memory index to generate offsets enum if the tokenstream is null
commit 357f3dfb9ace4deef20787af19bc2e5a6b4ff61e
Author: Timothy M. Rodriguez <timothy.rodrig...@gmail.com>
Date:   2016-10-11T17:34:51Z
    Switched analysis offset strategy to not re-build a tokenstream
commit 64153d288db5714cdaf3726328557f65c1635610
Author: Timothy M. Rodriguez <timothy.rodrig...@gmail.com>
Date:   2016-10-11T17:42:12Z
    Switched to using chars ref builder
commit f137779b1e1b7e57c4b78652614a04507b9e09e1
Author: Timothy Rodriguez <trodrigue...@bloomberg.net>
Date:   2016-10-21T19:07:48Z
    minor cleanup
commit ec814f974db6459eba7aa45bf7a4cdae04e6ad6f
Author: Timothy M. Rodriguez <timothy.rodrig...@gmail.com>
Date:   2016-10-21T21:25:57Z
    switch to use of a CompositePostingsEnum that wraps the postings of wildcard matches
commit 955a1e79b5189492fae2c95da39343c29e1cdb25
Author: Timothy M. Rodriguez <timothy.rodrig...@gmail.com>
Date:   2016-10-21T21:32:48Z
    merge conflicts on PhraseHelper rename
commit d35bd1cffd3c6e2aed67aeccfd03959bf855670a
Author: Timothy Rodriguez <trodrigue...@bloomberg.net>
Date:   2016-10-21T22:13:51Z
    minor cleanup of how automata are handled in the FieldOffsetStrategy
commit aa8c92667272e5f397b9566cde05eae7e31bcce5
Author: Timothy Rodriguez <trodrigue...@bloomberg.net>
Date:   2016-10-24T20:03:08Z
    Removed most use of TokenStreams except in pure Analysis
commit db42d6a959ca19a77dee3cc7b09496a21c631bd6
Author: Timothy Rodriguez <trodrigue...@bloomberg.net>
Date:   2016-10-24T20:08:25Z
    simplified some logic to not use continue statements
commit 657c2a70c4d8ec8ef850e9f66b93cf85ec16f636
Author: Timothy Rodriguez <trodrigue...@bloomberg.net>
Date:   2016-10-24T22:56:17Z
    split analysis mode into two, moved all offset sources from readers into the FieldOffsetStrategy
----

> Improvements to UnifiedHighlighter OffsetStrategies
> ---------------------------------------------------
>
>                 Key: LUCENE-7526
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7526
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/highlighter
>            Reporter: Timothy M. Rodriguez
>            Assignee: David Smiley
>            Priority: Minor
>             Fix For: 6.4
>
>
> This ticket improves several of the UnifiedHighlighter FieldOffsetStrategies
> by reducing reliance on creating or re-creating TokenStreams.
> The primary changes are as follows:
> * AnalysisOffsetStrategy - split into two offset strategies
> ** MemoryIndexOffsetStrategy - the primary analysis mode that utilizes a
>    MemoryIndex for producing Offsets
> ** TokenStreamOffsetStrategy - an offset strategy that avoids creating a
>    MemoryIndex.
>    Can only be used if the query distills down to terms and automata.
> * TokenStream removal
> ** MemoryIndexOffsetStrategy - previously, a TokenStream was created to fill
>    the memory index and then, once consumed, a new one was generated by
>    uninverting the MemoryIndex back into a TokenStream if there were automata
>    (wildcard/mtq queries) involved. Now this is avoided, which should save
>    memory and avoid a second pass over the data.
> ** TermVectorOffsetStrategy - this was refactored in a similar way to avoid
>    generating a TokenStream if automata are involved.
> ** PostingsWithTermVectorsOffsetStrategy - similar refactoring
> * CompositePostingsEnum - aggregates several underlying PostingsEnums for
>   wildcard/mtq queries. This should improve relevancy by providing unified
>   metrics for a wildcard across all its term matches.
> * Added a HighlightFlag for enabling the newly separated
>   TokenStreamOffsetStrategy, since it can adversely affect passage relevancy

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
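For readers unfamiliar with the CompositePostingsEnum item in the issue description, the following is a rough, self-contained sketch of the general idea only: several per-term offset lists (as the terms matched by one wildcard would produce) are merged into a single ordered stream, so the wildcard can be counted as one unit. It is not the code from the pull request and does not use the Lucene API; the class name CompositeOffsets, the Offset type, and the merge method are all hypothetical names invented for this illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical illustration only; this is NOT Lucene's CompositePostingsEnum. */
public class CompositeOffsets {

    /** Start/end character offsets of one term occurrence. */
    public static final class Offset {
        public final int start;
        public final int end;

        public Offset(int start, int end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "[" + start + "," + end + "]";
        }
    }

    /** Cursor over one term's offsets, assumed to be sorted by start offset. */
    private static final class Cursor {
        final List<Offset> offsets;
        int pos = 0;

        Cursor(List<Offset> offsets) {
            this.offsets = offsets;
        }

        Offset current() {
            return offsets.get(pos);
        }

        /** Moves to the next offset; returns false when the list is exhausted. */
        boolean advance() {
            return ++pos < offsets.size();
        }
    }

    /**
     * Merges the per-term offset lists into one list ordered by start offset
     * (ties broken by end offset), using a priority queue of cursors.
     */
    public static List<Offset> merge(List<List<Offset>> perTerm) {
        PriorityQueue<Cursor> queue = new PriorityQueue<>(
            Comparator.comparingInt((Cursor c) -> c.current().start)
                      .thenComparingInt(c -> c.current().end));
        for (List<Offset> offsets : perTerm) {
            if (!offsets.isEmpty()) {
                queue.add(new Cursor(offsets));
            }
        }
        List<Offset> merged = new ArrayList<>();
        while (!queue.isEmpty()) {
            Cursor top = queue.poll();      // cursor with the smallest current offset
            merged.add(top.current());
            if (top.advance()) {
                queue.add(top);             // re-insert so it is re-ordered by its next offset
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // Pretend "foo*" matched two indexed terms with these offsets.
        List<Offset> food = List.of(new Offset(0, 4), new Offset(20, 24));
        List<Offset> fool = List.of(new Offset(10, 14));
        List<Offset> merged = merge(List.of(food, fool));
        // The merged size acts as a single frequency for the whole wildcard.
        System.out.println(merged + " freq=" + merged.size());
    }
}
```

The priority queue keeps the merge at O(n log k) for n offsets across k matched terms. Whether the actual patch merges this way is not stated in the message above, so treat the approach as an assumption.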